We (Quickwit) would love to parse those JSON logs, but we need to consider how to handle that correctly.
Taking inspiration from the OTEL log data model, here is what we propose to do:
- We try to parse the log line as JSON.
- If parsing succeeds, we put the JSON in the field `body`. All subfields of `body` will be tokenized so users can run full-text search queries on them. We also propose to extract the `attributes`, `resources`, and `severity_text` fields present in the JSON and populate the log accordingly. The values of those fields won't be tokenized, and users will be able to run analytics queries on them. For example, this opens the possibility of doing aggregations on `status` or `method` if those fields sit under the `attributes` or `resources` fields.
- If parsing fails, we fall back to the current behavior with a slight change: we put the log line in the field `body.message`.
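To make the rules concrete, here is a minimal Python sketch of the proposed parse-or-fallback logic. It is illustrative only (the real implementation would live in the ingestion pipeline, and the `fly.*` platform metadata shown in the example below is omitted here); the mapping of `severity_text` to `log.level` follows the example transformation.

```python
import json

# Fields promoted out of the parsed JSON into dedicated,
# non-tokenized fields usable in analytics queries.
PROMOTED_FIELDS = ("attributes", "resources")

def transform(log_line: str) -> dict:
    """Apply the proposed parse-or-fallback rules to one raw log line."""
    try:
        parsed = json.loads(log_line)
        if not isinstance(parsed, dict):
            raise ValueError("not a JSON object")
    except (json.JSONDecodeError, ValueError):
        # Parsing failed: keep the current behavior, with the raw
        # line nested under body.message.
        return {"body": {"message": log_line}}

    record = {}
    # severity_text populates the log level, as in the example transformation.
    if "severity_text" in parsed:
        record["log"] = {"level": parsed.pop("severity_text")}
    for field in PROMOTED_FIELDS:
        if field in parsed:
            record[field] = parsed.pop(field)
    # Everything else stays under body, whose subfields are tokenized
    # for full-text search.
    record["body"] = parsed
    return record
```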
Let’s take a concrete example with this JSON log:
```json
{
  "pid": "#PID<0.624304.0>",
  "severity_text": "info",
  "attributes": {
    "request_id": "F8GVvVNs9-rHr5oAbNeB",
    "method": "GET",
    "duration": 0.319,
    "status": 304
  }
}
```
This will be transformed into the following log:
```json
{
  "fly": {
    "app": {
      "id": 1,
      "instance": "instance-id",
      "name": "my-app"
    },
    "org": {
      "id": 1
    },
    "region": "fra"
  },
  "log": {
    "level": "info"
  },
  "body": {
    "pid": "#PID<0.624304.0>"
  },
  "attributes": {
    "request_id": "F8GVvVNs9-rHr5oAbNeB",
    "method": "GET",
    "duration": 0.319,
    "status": 304
  }
}
```
This way you will be able to execute these kinds of queries:
- `attributes.method:GET attributes.status:304 body.pid:624304`
- do a date histogram + terms aggregation on `attributes.status` so you can follow the evolution of log count per status
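Sketched in the Elasticsearch-style aggregation DSL that Quickwit supports, the date histogram + terms aggregation could look roughly like this (the `timestamp` field name and the one-hour interval are illustrative assumptions):

```json
{
  "query": "*",
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "1h"
      },
      "aggs": {
        "per_status": {
          "terms": { "field": "attributes.status" }
        }
      }
    }
  }
}
```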
WDYT? (ping @Cade @tj1 @BrickInTheWall )