Fly Logs over NATS

Yay!

I was/am connected. The trouble is fdaa::3 is not reachable from my machine. But then you linking me the docs reminded me about that tip earlier in the document about how the DNS server is accessible from fdaa:<org prefix>:3. I already noticed that the NATS endpoint and DNS resolver are sitting on the same IP.

So now this works for me from my dev machine:

$ nats -s 'nats://[fdaa:0:3119::3]:4223' --user invidious --password $(fly auth token) sub 'logs.>'
[#1] Received on "logs.>"
--SNIP--

Question/feature-suggestion: can I get my org’s 6PN prefix from flyctl somehow?

1 Like

You should be able to get it by doing flyctl ips private and “deducing” from the private IPs shown.

@samcday we’re looking into this this week.

1 Like

I keep getting the error nats: Authorization Violation. What am I doing wrong?

App logs:

2021-08-19T18:36:39.017793568Z app[46a406a4] lhr [info] Starting init (commit: 721b5c7)...
2021-08-19T18:36:39.041950092Z app[46a406a4] lhr [info] Running: `bash start-fly-log-transporter.sh` as root
2021-08-19T18:36:39.056902567Z app[46a406a4] lhr [info] 2021/08/19 18:36:39 listening on [fdaa:0:31a0:a7b:a98:46a4:6a4:2]:22 (DNS: [fdaa::3]:53)
2021-08-19T18:36:39.182004430Z app[46a406a4] lhr [info] Aug 19 18:36:39.178  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info" enable_datadog_tracing=false
2021-08-19T18:36:39.190296977Z app[46a406a4] lhr [info] Aug 19 18:36:39.188  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
2021-08-19T18:36:39.236947257Z app[46a406a4] lhr [info] Aug 19 18:36:39.235  INFO vector::topology: Running healthchecks.
2021-08-19T18:36:39.239726946Z app[46a406a4] lhr [info] Aug 19 18:36:39.237  INFO vector::topology: Starting source. name="fly_log_metrics"
2021-08-19T18:36:39.243471216Z app[46a406a4] lhr [info] Aug 19 18:36:39.242  INFO vector::topology: Starting source. name="fly_socket"
2021-08-19T18:36:39.244903290Z app[46a406a4] lhr [info] Aug 19 18:36:39.243  INFO vector::topology: Starting transform. name="log_json"
2021-08-19T18:36:39.246048279Z app[46a406a4] lhr [info] Aug 19 18:36:39.243  INFO vector::topology: Starting sink. name="blackhole"
2021-08-19T18:36:39.247389974Z app[46a406a4] lhr [info] Aug 19 18:36:39.243  INFO vector::topology: Starting sink. name="fly_log_metrics_prometheus"
2021-08-19T18:36:39.248497824Z app[46a406a4] lhr [info] Aug 19 18:36:39.242  INFO vector::topology::builder: Healthcheck: Passed.
2021-08-19T18:36:39.249385818Z app[46a406a4] lhr [info] Aug 19 18:36:39.247  INFO vector::topology::builder: Healthcheck: Passed.
2021-08-19T18:36:39.250885050Z app[46a406a4] lhr [info] Aug 19 18:36:39.249  INFO vector: Vector has started. debug="false" version="0.15.2" arch="x86_64" build_id="a216c34 2021-08-11"
2021-08-19T18:36:39.252579829Z app[46a406a4] lhr [info] Aug 19 18:36:39.251  INFO vector::internal_events::api: API server running. address=0.0.0.0:8686 playground=http://0.0.0.0:8686/playground
2021-08-19T18:36:39.257371651Z app[46a406a4] lhr [info] Aug 19 18:36:39.255  INFO source{component_kind="source" component_name=fly_socket component_type=socket}: vector::sources::util::unix_stream: Listening. path="/var/run/vector.sock" type="unix"
2021-08-19T18:36:40.613773971Z app[46a406a4] lhr [info] nats: Authorization Violation
2021-08-19T18:36:40.618172580Z app[46a406a4] lhr [info] Aug 19 18:36:40.614  INFO source{component_kind="source" component_name=fly_socket component_type=socket}:connection: vector::sources::util::unix_stream: Finished sending.
2021-08-19T18:36:40.620132433Z app[46a406a4] lhr [info] Aug 19 18:36:40.619  INFO vector: Vector has stopped.
2021-08-19T18:36:41.052395384Z app[46a406a4] lhr [info] Main child exited normally with code: 0
2021-08-19T18:36:41.053597012Z app[46a406a4] lhr [info] Reaped child process with pid: 518, exit code: 0
2021-08-19T18:36:41.054202470Z app[46a406a4] lhr [info] Starting clean up.

These are the steps I executed:

  • Clone the repo
  • fly launch
    • An existing fly.toml file was found for app <appname>
      ? Would you like to copy its configuration to the new app? Yes
      Creating app in /Users/arno/dev/fly-log-shipper
      Scanning source code
      Detected Dockerfile app
      ? Select organization: <orgname>
      ? Select region: lhr (London, United Kingdom)
      Created app <appname> in organization <orgname>
      Wrote config file fly.toml
      Your app is ready. Deploy with `flyctl deploy`
      ? Would you like to deploy now? No
      
  • Generate a personal access token in my personal account settings
  • fly secrets set ORG=<orgname> ACCESS_TOKEN=<accesstoken> SUBJECT="logs.>"
  • fly deploy
  • The app builds successfully and emits the logs pasted above

I tried it with a sink the first time, then made another attempt without one, as described above.

EDIT: I’m not an admin of the org, just a normal member. But I created the log-shipper app myself, inside the org. Could that be it?

Can you try again?

The NATS proxy we run was having database connection issues specifically in lhr. It should be resolved now, and I’m adding more monitoring around that error case.

Yep, it works now. Thanks a lot! :pray:

Now that it works, the log-shipper instance only seems to subscribe to its own logs. What I try to do, and what I understand to be the idea here, is to subscribe to logs from a different app in the same org. Is that supposed to be the case?

that should indeed be the case. Can you let me know your app name and I can have a look in to it further for you?

Thanks @steveberryman, I’ve sent you the app details via DM.

We’ve identified the issue where the Rust client (which seems to be a wrapper around the sync client, looks pretty bad) is expecting a pretty strict “server info” response.

We aren’t sending the “client id”. We’ll be fixing this shortly!

1 Like

@samcday I believe this is now fixed.

We’ve set up logging to a stage where it’s working and it’s a really nice approach so far!

The top-level of the logs, however, seems to be in Elixir format, which is not very common and can’t be parsed by Vector. Could this be changed to JSON, in general?

I know this would change the main format for people already running the log-shipper, but I think it really should be encoded in JSON, not Elixir.

Which log lines aren’t JSON?

Every messages published by our NATS cluster should be in JSON format. It follows the ECS (Elastic Common Schema) format in general.

If you mean the contents of the message field in the log JSON, then you’re right, our init software does not output JSON format.

For us, every log line looks like this right now:

 %{"event" => %{"provider" => "app"}, "fly" => %{"app" => %{"instance" => "450b585a", "name" => "ev2"}, "region" => "cdg"}, "host" => "6c00", "log" => %{"level" => "info"}, "message" => "{\"application\":\"phoenix\",\" ...

This is in Logflare, which we forward logs to from the log_json source via a plain HTTP sink.

We’re investigating.

However: I’d be surprised if we outputted logs that way. We have no Elixir in that infrastructure (it’s mostly Go and Rust). We’re also using NATS logs ourselves in various places and they’re all in “normal” JSON format.

Nevermind, sorry. It must have been converted on Logflare’s side. When sending it to a requestbin.com bin, it shows as JSON on that level. Thank you anyway for your response :blush:

@steveberryman Thanks again for the feedback. Managed to make it all work nicely. One little thing i noticed, if you run vector validate vector.toml you get a warning that data_dir = "/var/lib/vector/"doesn’t exist. Creating the dir solves that.

Here is the transform and sink for LogDNA:

[transforms.log_json]
type = "remap"
inputs = ["fly_socket"]
source = '''
    .= parse_json!(.message)
    . |= parse_regex!(.message, r'^\[(?P<app>\w+)\]\s+(?P<message>.*)$')
    del(.log)
    if ends_with(.fly.app.name, "-staging") {
        .env = "staging"
    }
    '''

[sinks.logdna]
type = "logdna"
inputs = ["log_json"]
api_key = "${LOGDNA_APIKEY}"
hostname = "{{fly.app.name}}"
default_app = "system"
tags = ["${ORG}"]

I am running an S6 multi-process setup with multiple services in a single APP. The logs for each service are prepended with the service name in between brackets. This is done in the S6 log script.

I’m applying following parsing rules for specific fields recognised by LogDNA

  • hostname: Set based on app name
  • app: Parsing out the service name from the log message prefix
  • env: Parsing out from app name based on convention (app name ends with either -staging or -production)
  • tags: Set based on the ORG name

I’m also deleting log.level as LogDNA handles this out of the box and it seems the info by fly is always info ?

Keep up the great work!

5 Likes

@johan , this is really helpful, thank you! I’m curious about your last line regarding deleting log_level as everything is coming in as info. We’re seeing this too. How did you go about deleting/fixing this so that the log_level is accurate?

Edit: nevermind, it seems our log levels actually are coming through correctly.

1 Like

Thanks! Good question I haven’t looking into this again as it was not an issue for us. Will do that and get back to you with more info.

1 Like

Trying follow up where relevant…

We (Logflare) have a pull request in to add Logflare to the fly-log-shipper but you can use our fork until that happens!

A bit more info here: Is it possible to send logs elsewhere, or to grep logs back in time? - #9 by user69

2 Likes