Searchable application logs in Grafana

From my understanding, attributes is a map from the application.

  1. If params goes into the body in the following query, would that be fully queryable?
  2. What if attributes/params has a deeper map? I don’t use that myself, but I could imagine others would. As an example, I added last_login as an array. Maybe someone wants to search for all users on desktop who last logged in within the past week.
{
  "pid": "#PID<0.52431.0>",
  "time": "2024-04-15T15:52:07.533725Z",
  "level": "info",
  "ip": "66.42.26.1",
  "region": "dfw",
  "user_id": 123,
  "guid": "121268d4-953c-4015-8540-ac0e560077a5",
  "event": "device",
  "params": {
    "height": 820,
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15",
    "width": 1440,
    "last_login": [1234, 5678]
  }
}

If params goes into the body in the following query, would that be fully queryable?

First, we need to define “fully queryable.” If you mean the ability to run analytics/aggregation queries, it won’t work. You will need to put the params in the attributes field to run term aggregations on the field attributes.params.last_login. This will allow you to retrieve all IDs. You will also be able to filter logs for a specific user ID with attributes.params.last_login:1234.

In your case, you probably want to put almost all your fields in the attributes field, like this:

{
  "time": "2024-04-15T15:52:07.533725Z",
  "severity_text": "info",
  "attributes": {
    "pid": "#PID<0.52431.0>",
    "ip": "66.42.26.1",
    "region": "dfw",
    "user_id": 123,
    "guid": "121268d4-953c-4015-8540-ac0e560077a5",
    "event": "device",
    "params": {
      "height": 820,
      "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.3.1 Safari/605.1.15",
      "width": 1440,
      "last_login": [1234, 5678]
    }
  }
}
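
If you would rather do this restructuring in the application before the log line is emitted, here is a minimal Python sketch. It assumes only the layout shown above: time stays at the top level, level is renamed to severity_text, and everything else moves under attributes. The function name to_attributes_layout and the TOP_LEVEL mapping are made up for illustration and are not part of any Quickwit or Fly.io API.

```python
import json

# Assumption: only these two keys stay at the top level, and "level" is
# renamed to "severity_text", mirroring the example layout above.
TOP_LEVEL = {"time": "time", "level": "severity_text"}


def to_attributes_layout(record: dict) -> dict:
    """Return a copy of record with non-top-level fields nested under "attributes"."""
    out = {}
    attributes = {}
    for key, value in record.items():
        if key in TOP_LEVEL:
            out[TOP_LEVEL[key]] = value
        else:
            # Nested maps and arrays (e.g. params.last_login) are kept as-is.
            attributes[key] = value
    out["attributes"] = attributes
    return out


flat = {
    "pid": "#PID<0.52431.0>",
    "time": "2024-04-15T15:52:07.533725Z",
    "level": "info",
    "user_id": 123,
    "event": "device",
    "params": {"height": 820, "width": 1440, "last_login": [1234, 5678]},
}

restructured = to_attributes_layout(flat)
print(json.dumps(restructured, indent=2))
```

With this shape, the nested value ends up at attributes.params.last_login, which is the field path used in the filter example above.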

One of my pet peeves with log search tools is that they usually throw you in the deep end of the query language from the get-go… (This goes for most, if not all, log solutions, or at least the few I have experience with, BetterStack probably being the least bad.) If Quickwit can also solve for that, it would definitely put them many steps ahead IMO.

Any junior teammate, or anyone whose day-to-day work doesn’t involve this very specific log query language, is usually left very confused: they search for veeery basic things, and it doesn’t return what they want.

The peak bad experience is when everything usually “just works”, no one is very used to the specialized query language, and once a quarter, when you DO have an incident and NEED to jump on the logs to search for something, it feels like they are working against you while you are frantically trying to find the simplest thing to extinguish a fire.

IMO, plaintext search is usually more than enough in most debug scenarios, and the average user is completely oblivious to the specialized query language. Or something akin to splitting the plaintext on space characters and searching for all messages that contain all the keywords… But I have no idea if that plays nicely with whatever index you have powering this.

There could also be 2 separate query fields, one specifically labeled “basic search” and one “advanced search”… Or it could just be a fallback: if the input is simply not a valid query, don’t force it through the query language, and fall back to plaintext… IDK…
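
To make the “basic search” idea concrete, here is a rough Python sketch of what I mean: split the input on whitespace and keep only the messages that contain every keyword. This is just an illustration over an in-memory list; the function names are made up, it says nothing about how Quickwit’s index actually works, and the case-insensitive matching is my own assumption.

```python
def matches_all_keywords(message: str, query: str) -> bool:
    """True if every whitespace-separated keyword occurs in the message."""
    text = message.lower()
    keywords = query.lower().split()
    # Note: an empty query has no keywords, so it matches every message.
    return all(keyword in text for keyword in keywords)


def basic_search(messages: list[str], query: str) -> list[str]:
    """Return the messages that contain all keywords, in their original order."""
    return [m for m in messages if matches_all_keywords(m, query)]


logs = [
    "pid=322 class=Example::My::Class INFO: started processing whatever, with id 12345",
    "pid=322 class=Example::Other::Class INFO: finished processing",
    "pid=400 class=Example::My::Class ERROR: failed with id 999",
]

for line in basic_search(logs, "Example::My::Class started"):
    print(line)
```

No special characters to escape and no field prefixes to guess: the colons in the class name are treated like any other text.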

I work mainly with Ruby, and one of the most common things for me to search for is some error/message happening on/inside a specific class. When I opened the logs for the first time, I saw a few logs about one of these classes, and my first instinct was to search for that class name. Then it became a looong game of guessing how to query for it…

Here is an example log that I saw, and was expecting to find things similar to it:

2024-05-27T12:24:23.073Z pid=322 tid=c0ku class=Example::My::Class jid=ef21583701f2666ab06234534 INFO: started processing whatever, with id 12345
                                          ˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆˆ

And my search attempts until it finally worked:

  • Example::My::Class → Error, and saying “No data”
  • "Example::My::Class" → Error, and saying “No data”
  • 'Example::My::Class' → Error, and saying “No data”
  • Example\:\:My\:\:Class → Error, and saying “No data”
  • "Example\:\:My\:\:Class" → Error, and saying “No data”
  • 'Example\:\:My\:\:Class' → No error, but also “No data”
  • Gave up on trying it blindly and had to search on Google for this Fly feature… found this topic, found out that it was powered by Quickwit, went to their site, looked for the docs, and took a minute to find the query language docs. Looks like the message: prefix may help, even though it feels like it should yield the same results as not adding the field prefix.
  • message:Example::My::Class → Error, and saying “No data”
  • message:"Example::My::Class" → Error, and saying “No data”
  • message:'Example::My::Class' → Finally! Can see the results I would expect…
  • Just for completion’s sake, message:Example\:\:My\:\:Class → Error, and saying “No data”

Anyway, even if we disagree on all of the above; at a MINIMUM, on that Grafana page, you should have links to the Quickwit docs about the query language. Like this and this.

Hope this feedback is useful to make this feature one of the best out there. Love it being integrated into the Fly solutions. Would definitely pay extra for it.


Hi, Quickwit developer here.

I’m sorry you had trouble getting started. Thanks for taking the time to describe your thought process!

Your first try doesn’t work, and is unlikely to ever work, because colons are a special element of syntax. However, there is room for improvement for us on the other things you tried.
Double-quoted and single-quoted strings have slightly different meanings: the former needs more information to be indexed (which isn’t indexed with the current configuration). However, we should improve how the error is reported to the user: if you don’t know our syntax, it’s not obvious why it doesn’t work, nor why single quotes would work differently.
I would have expected 'Example::My::Class' to work, but apparently there is a bug in our query parser. I’ve opened a ticket about that, and will try to get on it shortly.
Escape sequences with \ are currently something we handle only inside single/double-quoted strings (meant to escape single/double quotes, though it should actually work with any character). Having them also work outside of quotes does seem like a reasonable thing to improve on (ticket).
I’m not sure why 'Example\:\:My\:\:Class' did not work. It seems to work for me.
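
In the meantime, if you build queries programmatically, a small helper along these lines may save some guessing. It is only a sketch based on the behaviour described in this thread (a field prefix plus a single-quoted string, with \ used to escape quotes inside it). The name single_quoted_term is hypothetical, escaping the backslash itself is an assumption, and whether the resulting query returns data still depends on the index configuration.

```python
def single_quoted_term(field: str, value: str) -> str:
    """Build a field:'value' query term, escaping characters that would end the string.

    Assumption: inside single quotes, a backslash escapes the next character,
    so literal backslashes and single quotes in the value are both escaped.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"{field}:'{escaped}'"


# The form that worked in the report above.
print(single_quoted_term("message", "Example::My::Class"))

# A value containing a single quote gets a backslash in front of it.
print(single_quoted_term("message", "can't connect"))
```

The first call produces message:'Example::My::Class', the same form that returned results in the list above.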

Thanks a lot for your feedback. I hope we’ll be able to make Quickwit better match your expectations in the future!


Thanks for this, feeling lucky to have found this when I was this close to setting up the fly-log-shipper, when this is more than enough for my basic logging needs for now.

I searched as much as I could, but couldn’t find a way to reverse the log order. When using fly logs in the terminal, and in other logging services I’ve tried, the latest logs appear at the bottom. I find it a bit confusing to read the logs bottom-to-top in the dashboard.

Is there a way to change the log order?


My log shipper instance for my prod app crashed and there’s nothing easy out-of-the-box for monitoring it.

This new feature saved me! Well, almost. Let’s say I’m debugging a crashed job that has a uuid. I search the uuid, and find some log lines related to it. How do I quickly/easily see other logs from the time range where the uuid log lines are present?

Reason - sometimes logging doesn’t maintain the context w/ the uuid and such, and I have to look around in the same time window. In Betterstack, I was clicking the timestamp to the left and it was taking me to that point in time.

If I can get this sorted, I’ll drop my log shipper and my Betterstack subscription!

OK, found it! For anyone else: Hover over the log entry, click “Show context” button on the right. You might have to delete some of the filtering in the search box that pops up.


Can you link to a log entry’s details, like Datadog supports? (I don’t see any updates to the URL query when viewing the details, so probably not?)


Hello, Is there a way to add the quickwit datasource to our own grafana cloud instance like we can for prometheus? I’m looking for the INDEX_ID to try and configure it.

Is this still the strategy Quickwit uses to parse JSON logs? I’ve tried to follow this format, and my logs still seem to be un-parsed - I cannot query by the keys under “attributes”. Here’s an example:

Is there some configuration I’m missing? This is in the Grafana dashboard that came off-the-shelf with my fly.io app.


That’s awesome! Are there any plans for alerts and increased log retention/archiving?

I’m currently relying on Betterstack to notify me about errors or other weird things in logs

How can I download all the logs from my Quickwit log cluster to a text file on my computer? My 30-day expiration period is almost over, so I really need to export these logs somehow.

Hi, is the log search broken? I only see a 404 page

Hey there. It’s not possible to download the logs right now. You do, in theory, have direct API access to the Quickwit log storage endpoint where you could query the logs yourself. If you need more information about that, let us know.


Can you try logging out of Grafana, then back in again?

I don’t think I am logged into Grafana. How does that affect this? And which domain should I be logged out of?

The screenshot above is from Grafana, which is what’s used to show you your app metrics and logs. It looks like you’re having trouble logging in. You can try logging out of Fly.io itself, then going to your logs dashboard.

Any solution about this?

More information about how to query the Quickwit log storage endpoint would be great! I tried to figure it out on my own, but I couldn’t figure out the auth structure.

Please let me know how to do that as my logs are starting to be deleted as they are beyond the 30 day period.

Thanks!

After looking into this a bit, it’s not going to be possible to download logs from Quickwit right now without some work on our end.

Is the problem that you don’t have enough log retention? What level of retention would be suitable for you?