We (Quickwit) would love to parse those JSON logs, but we need to consider how to handle that correctly.
Taking inspiration from the OTEL log data model, here is what we propose to do:
- We try to parse the log line as JSON.
- If parsing succeeds, we put the JSON in the field `body`. All subfields of `body` will be tokenized so users can run full-text search queries on them. We also propose to extract the `attributes`, `resources`, and `severity_text` fields present in the JSON and populate the log accordingly. The values of those fields won't be tokenized, and users will be able to run analytics queries on them. For example, this opens the possibility to do aggregations on `status` or `method` if those fields are under `attributes` or `resources`.
- If the parsing fails, we fall back to the current behavior with a slight change: we put the log line in the field `body.message`.
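The steps above can be sketched roughly as follows. This is a minimal illustration of the proposed logic, not the actual Quickwit implementation; the function name and structure are hypothetical.

```python
import json

# OTEL-inspired fields we promote to the top level of the log document.
SPECIAL_FIELDS = ("attributes", "resources", "severity_text")

def transform(log_line: str) -> dict:
    """Try to parse the line as JSON; on failure, fall back to body.message."""
    try:
        parsed = json.loads(log_line)
    except json.JSONDecodeError:
        parsed = None
    if not isinstance(parsed, dict):
        # Parsing failed (or the JSON is not an object):
        # keep the raw line under body.message.
        return {"body": {"message": log_line}}
    doc = {}
    # Extract the special fields; their values are not tokenized,
    # so they can be used for analytics queries and aggregations.
    for field in SPECIAL_FIELDS:
        if field in parsed:
            doc[field] = parsed.pop(field)
    # Everything else lands under body, tokenized for full-text search.
    doc["body"] = parsed
    return doc
```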
Let’s take a concrete example with this JSON log:
```json
{
  "pid": "#PID<0.624304.0>",
  "severity_text": "info",
  "attributes": {
    "request_id": "F8GVvVNs9-rHr5oAbNeB",
    "method": "GET",
    "duration": 0.319,
    "status": 304
  }
}
```
This will be transformed into the following log:
```json
{
  "fly": {
    "app": {
      "id": 1,
      "instance": "instance-id",
      "name": "my-app"
    },
    "org": {
      "id": 1
    },
    "region": "fra"
  },
  "log": {
    "level": "info"
  },
  "body": {
    "pid": "#PID<0.624304.0>"
  },
  "attributes": {
    "request_id": "F8GVvVNs9-rHr5oAbNeB",
    "method": "GET",
    "duration": 0.319,
    "status": 304
  }
}
```
This way, you will be able to execute queries like these:
- `attributes.method:GET attributes.status:304 body.pid:624304`
- a date histogram + term aggregation on `attributes.status`, so you can follow the evolution of log count per status
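For illustration, the date histogram + term aggregation could look something like this in an Elasticsearch-compatible aggregation request. The timestamp field name and interval here are assumptions, not part of the proposal:

```json
{
  "query": "*",
  "max_hits": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1m"
      },
      "aggs": {
        "per_status": {
          "terms": { "field": "attributes.status" }
        }
      }
    }
  }
}
```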
WDYT? (ping @Cade @tj1 @BrickInTheWall )