Fly Logs over NATS

Hi folks!

I’ve been working on exposing a NATS endpoint for logs recently and it’s finally ready for some people to try out! I’ve written a proxy that lets you connect a NATS client to nats://[fdaa::3]:4223 from inside a fly app, using an organization slug (available from fly orgs list) as the username and a personal access token as the password, and subscribe to your fly logs to do what you please with them. The subject schema is logs.<app_name>.<region>.<instance_id> and you can use all the normal NATS wildcards to subscribe to a stream, such as logs.> to get all of your logs, or logs.*.lhr.> to just get logs from lhr.

I’ve made a repo with an example of a vector setup with a small Go nats client that takes a number of secrets to configure Vector to output to a number of places. Have a look here: GitHub - superfly/fly-log-shipper: Ship logs from fly to other providers. Let me know what works and what doesn’t and any thoughts you have (and any bugs you find)!

12 Likes

Great work. I have been waiting for this. Can I open a pull request for uptrace? Also this could be a great opportunity to push for a nats vector source since only a sink is implemented.

Please do!

And I agree about the nats source! That would be much better than my hacky go to socket solution.

1 Like

I was just trying out this feature with uptrace:

[sinks.uptrace]
  type = "http"
  inputs = ["log_json"]
  encoding.codec = "ndjson"
  compression = "gzip"
  uri = "https://api.uptrace.dev/api/v1/vector-logs"
  headers.uptrace-dsn = "https://${UPTRACE_API_KEY}@api.uptrace.dev/${UPTRACE_PROJECT}"

For some reason, I’m failing health checks. According to these logs, what do you think could be the issue:

2021-05-24T20:38:10.411335985Z runner[b5bb4381] lhr [info] Starting virtual machine
2021-05-24T20:38:10.550735408Z app[b5bb4381] lhr [info] Starting init (commit: cc4f071)...
2021-05-24T20:38:10.575069570Z app[b5bb4381] lhr [info] Running: `bash start-fly-log-transporter.sh` as root
2021-05-24T20:38:10.586684182Z app[b5bb4381] lhr [info] 2021/05/24 20:38:10 listening on [fdaa:0:1af2:a7b:a98:b5bb:4381:2]:22 (DNS: [fdaa::3]:53)
2021-05-24T20:38:10.671333540Z app[b5bb4381] lhr [info] May 24 20:38:10.668  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info"
2021-05-24T20:38:10.673034997Z app[b5bb4381] lhr [info] May 24 20:38:10.671  INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root.
2021-05-24T20:38:10.674704414Z app[b5bb4381] lhr [info] May 24 20:38:10.673  INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root.
2021-05-24T20:38:10.677967630Z app[b5bb4381] lhr [info] May 24 20:38:10.676  INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", None)]
2021-05-24T20:38:10.736795094Z app[b5bb4381] lhr [info] May 24 20:38:10.735  WARN vector::sinks::util::http: Option `headers` has been deprecated. Use `request.headers` instead.
2021-05-24T20:38:10.737964941Z app[b5bb4381] lhr [info] May 24 20:38:10.737  INFO vector::topology: Running healthchecks.
2021-05-24T20:38:10.739105669Z app[b5bb4381] lhr [info] May 24 20:38:10.738  INFO vector::topology::builder: Healthcheck: Passed.
2021-05-24T20:38:10.739820425Z app[b5bb4381] lhr [info] May 24 20:38:10.738  INFO vector::topology::builder: Healthcheck: Passed.
2021-05-24T20:38:10.740481982Z app[b5bb4381] lhr [info] May 24 20:38:10.738  INFO vector::topology::builder: Healthcheck: Passed.
2021-05-24T20:38:10.742572841Z app[b5bb4381] lhr [info] May 24 20:38:10.741  INFO vector::topology: Starting source. name="fly_socket"
2021-05-24T20:38:10.744282378Z app[b5bb4381] lhr [info] May 24 20:38:10.743  INFO vector::topology: Starting source. name="fly_log_metrics"
2021-05-24T20:38:10.745145161Z app[b5bb4381] lhr [info] May 24 20:38:10.744  INFO vector::topology: Starting transform. name="log_json"
2021-05-24T20:38:10.747592533Z app[b5bb4381] lhr [info] May 24 20:38:10.745  INFO source{component_kind="source" component_name=fly_socket component_type=socket}: vector::sources::util::unix_stream: Listening. path="/var/run/vector.sock" type="unix"
2021-05-24T20:38:10.748797939Z app[b5bb4381] lhr [info] May 24 20:38:10.747  INFO vector::topology: Starting sink. name="blackhole"
2021-05-24T20:38:10.750048314Z app[b5bb4381] lhr [info] May 24 20:38:10.748  INFO vector::topology: Starting sink. name="uptrace"
2021-05-24T20:38:10.751527605Z app[b5bb4381] lhr [info] May 24 20:38:10.750  INFO vector::topology: Starting sink. name="fly_log_metrics_prometheus"
2021-05-24T20:38:10.753857689Z app[b5bb4381] lhr [info] May 24 20:38:10.751  INFO vector: Vector has started. version="0.13.1" git_version="v0.13.1" released="Thu, 29 Apr 2021 14:20:13 +0000" arch="x86_64"
2021-05-24T20:38:10.756241223Z app[b5bb4381] lhr [info] May 24 20:38:10.753  INFO vector::internal_events::api: API server running. address=127.0.0.1:8686 playground=http://127.0.0.1:8686/playground
2021-05-24T20:38:29.417805936Z proxy[b5bb4381] lhr [error] Health check status changed 'warning' => 'critical'
2021-05-24T20:39:28.051858416Z runner[b5bb4381] lhr [info] Shutting down virtual machine
2021-05-24T20:39:28.141125853Z app[b5bb4381] lhr [info] Sending signal SIGINT to main child process w/ PID 507
2021-05-24T20:39:33.194861716Z proxy[b5bb4381] lhr [info] Health check status changed 'critical' => 'passing'
Error 404 Not Found

I did set the secrets correctly:

$ flyctl secrets list        
NAME            DIGEST                           DATE      
ACCESS_TOKEN    6dbb2324ce87424071dff69bd7c386a8 1m50s ago 
ORG             6c3a0a43926e094574453dc9f27ea23f 1m50s ago 
SUBJECT         092a92ad88b481f7bd7c7d3d4fcce9fb 1m50s ago 
UPTRACE_API_KEY 65c1b2d013e61974a03f11fd9930558e 1m50s ago 
UPTRACE_PROJECT 2b44928ae11fb9384c4cf38708677c48 1m50s ago 

I’m following this example: https://docs.uptrace.dev/guide/logging.html#third-party-logs. And for the record I did manage to connect to uptrace for a brief moment before the app was failed:


Caption: logs belong to a caddy app

1 Like

Never mind just had to specifically bind to:

#vector.toml
....
[api]
  enabled = true
  address = "0.0.0.0:8686"

Now it’s working perfectly, and it’s so cool:


3 Likes

Oh that’s dope, nice job. Why do we have a public service on that example logs project?

Do mean the one in the images?

I think the answer to this is “I got too excited and didn’t remove it when I was writing things”. There isn’t a good reason for there to be a public service!

Just have to once again share my absolute admiration for all the work you guys do on Fly. Thanks, seriously!

4 Likes

@steveberryman Nice work. Log sync with New Relic is working great. Found a small issue submitted a PR: Fix typo by johanjanssens · Pull Request #2 · superfly/fly-log-shipper · GitHub

Quick question, in NewRelic there are 3 specific fly attributes I can filter on.

  • fly.region
  • fly.app.instance
  • fly.app.name

If i wanted to add custom attributes, for example for different services I’m running, any hints where to look how to go about that?

1 Like

Thanks for the PR!

To get custom attributes, you’d have to edit the vector config to parse your log messages a bit more than we do in the example. Vector has a pretty powerful DSL that you can use to parse your log lines a number of ways (for example, if your app outputs json logs, you could parse that in to separate attributes really easily with the parse_json function). Docs for the Vector Remap Language (VRL) are here: Vector Remap Language (VRL) | Vector documentation.

We already have a section that uses VRL to parse the json from the NATS message in the vector.toml:

[transforms.log_json]
  type = "remap"
  inputs = ["fly_socket"]
  source = '''
  . = parse_json!(.message)
  '''

You would need to augment that to further parse the .message attribute (which after the parse_json line there will have been remapped to just be your plain log line).

If you give me an example of one of your apps log lines and what you’d like to break in to attributes (or an example of some lines and attributes you’d like), feel free to post them here or privately and I’d be happy to help figure out a config that might do what you were after!

2 Likes

Thanks Steve, you are most welcome for the PR. I hope to be able to contribute more. I’m working on a little S6 LAMP stack for web apps.

About VRL, thanks for the hints. Thats more then enough to get me started. Will try to get it working myself, and bother you if I really get stuck.

A kind soul on the internet has contributed an initial implementation for a NATS source in Vector a few days ago.

I’m gonna play around with it today!

I saw this a couple of days ago! Very exciting. Unfortunately it doesn’t support authentication (yet) which is needed for Fly logs, but hopefully that’s an easy addition!

Does the sink implement auth in the upstream repo? Or is it just in that branch of yours?

Yeah, I noticed that shortly after my post here :smiley:

Code changes are pretty straightforward, I started up a draft PR here (#8674).

async-nats will also need to be bumped because 0.9 (currently used by Vector) doesn’t support IPv6 addresses.

I haven’t managed a build with my changes working in Fly.io yet. For one, producing a Docker image of Vector on a potato computer is an exercise in futility patience. I’m currently looking at this error: Aug 11 19:56:32.653 ERROR vector::topology: Configuration error. error=Source "fly_logs": cannot parse server info which seems to be coming from deep in the protobuf code of nats.rs, which is kinda scary.

Anyway, I’ll keep pluggin’ away at it today and report back if I have anything interesting to share.

1 Like

Okay, reporting back. Looks like the async-nats crate cannot connect to the Fly.io NATS server at fdaa::3.

I couldn’t remember how to do iptables things, so I SSH’d into a Fly.io instance, and from there did a ssh -R 1234:[fdaa::3]:4223 sam@my-dev-machine to access the NATS endpoint from my local machine for testing. I then wrote a simple Rust client that does this:

use color_eyre::Report;

#[tokio::main]
async fn main() -> Result<(), Report> {
    let nc = async_nats::Options::with_user_pass("fly-org", "fly-token").connect("nats://127.0.0.1:1234").await?;

    let sub = nc.subscribe("logs.>").await?;

    while let Some(msg) = sub.next().await {
        println!("{}", String::from_utf8(msg.data)?.to_owned());
    }

    Ok(())
}

This does not work, running it yields the following result:

$ cargo run

Error: cannot parse server info

I can, however, run the following successfully with the Go client:

package main

import (
	"fmt"
	"github.com/nats-io/nats.go"
	"time"
)

func main() {
	nc, err := nats.Connect("nats://fly-org:authToken@127.0.0.1:1234")
	if err != nil {
		panic(err)
	}

	sub, err := nc.SubscribeSync("logs.>")
	if err != nil {
		panic (err)
	}

	for {
		msg, err := sub.NextMsg(time.Minute)
		if err != nil {
			panic (err)
		}

		fmt.Printf("%s\n", string(msg.Data))
	}
}

That successfully yields the log stream in my Fly.io org.

Here’s the thing, I only just learned that NATS isn’t a terrifying shouted plural of Network Address Translation a couple of days ago. It’s possible my understanding of NATS is so limited that I’m doing something wrong in my Rust client code. Though I did confirm that authentication via with_user_pass in async-nats works via an integration test as part of that PR I previously linked so … :man_shrugging:

From a cursory glance at fly.rs, it looks like it’s expecting to be able to get server info immediately on a new connection. I noticed that if I tried to run something like this with the natscli, I’d see this:

$ nats -s 'nats://127.0.0.1:1234' --user invidious --password $(fly auth token) server info 
nats: error: no results received, ensure the account used has system privileges and appropriate permissions, try --help

And this is despite direct subscription working:

$ nats -s 'nats://127.0.0.1:1234' --user invidious --password $(fly auth token) sub 'logs.>'      
14:06:20 Subscribing on logs.>
[#1] Received on "logs.>"
--SNIP--

/cc @steveberryman

1 Like

We are “proxying” NATS (to prevent users from getting messages not destined for them), so there’s a fair chance some things aren’t implemented exactly to spec :slight_smile:

We can fix this though! I can take a look in a little bit.

Re not being able to access it via fdaa::3, that’s normal since this is the private network. You’ll have to connect via wireguard from your computer (I believe that would work :thinking:). It’s much easier from one of your VMs though!