fly-log-shipper isn't parsing JSON messages

I have an instance of fly-log-shipper deployed that sends logs to a Grafana Cloud Loki instance. My Elixir backend uses the logger_json library to output JSON log messages.

The log messages appear in Loki with the inner JSON escaped, like this:

{
  "event": {
    "provider": "app"
  },
  "fly": {
    "app": {
      "instance": "instance-id",
      "name": "app-name"
    },
    "region": "lax"
  },
  "host": "f2a8",
  "level": "info",
  "log": {
    "level": "info"
  },
  "message": "\u001b[1;38;5;2mdashboard          \u001b[0m | {\"time\":\"2022-03-09T04:03:01.711Z\",\"severity\":\"info\",\"message\":\"\",\"metadata\":{\"domain\":[\"elixir\"],\"erl_level\":\"info\",\"httpRequest\":{\"requestMethod\":\"GET\",\"requestUrl\":\"https://dashboard-composed-pr-1585-preview.fly.dev/login\",\"status\":200,\"userAgent\":\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.109 Safari/537.36\",\"remoteIp\":\"\",\"referer\":\"https://fly.io/\",\"latency\":\"0.00694278s\"},\"node\":{\"hostname\":\"e5a049d5\",\"vm_pid\":536},\"phoenix\":{\"controller\":\"Elixir.VrepsWeb.SessionController\",\"action\":\"new\"},\"request_id\":\"Ftqalb8Q0WFmTw8AAAFR\"}}"
}

When running flyctl logs, I get output that looks like this:

app[b6748c8f] sea [info]dashboard           | {"time":"2022-03-09T09:00:58.546Z","severity":"info","message":"tzdata release in place is from a file last modified Tue, 22 Dec 2020 23:35:21 GMT. Release file on server was last modified Fri, 22 Oct 2021 02:20:47 GMT.","metadata":{"domain":["elixir"],"erl_level":"info"}}

It appears that fly-log-shipper is failing to parse the message because of these leading identifiers (instance id, region, app name) in front of the JSON payload. When I test this with a local copy of Vector, parsing fails there too. Am I misunderstanding something, or are other people seeing the same issue?

This is the vector.toml that fly-log-shipper created for my instance:

data_dir = "/var/lib/vector/"

[api]
  enabled = true
  address = "0.0.0.0:8686"

[sources.fly_log_metrics]
  type = "internal_metrics"

[sources.fly_socket]
  type = "socket"
  mode = "unix"
  path = "/var/run/vector.sock"

[transforms.log_json]
  type = "remap"
  inputs = ["fly_socket"]
  source = '''
  . = parse_json!(.message)
  '''

[sinks.fly_log_metrics_prometheus]
  type = "prometheus_exporter" # required
  inputs = ["fly_log_metrics"] # required
  address = "0.0.0.0:9598" # required
  default_namespace = "fly-logs" # optional, no default

[sinks.blackhole]
  type = "blackhole"
  inputs = ["log_json"]
  print_interval_secs = 100000

[transforms.loki_json]
  type = "remap"
  inputs = ["log_json"]
  source = '''
  .level = .log.level
  if starts_with(.message, "{") ?? false {
    # parse json messages
    json = object!(parse_json!(.message))
    del(.message)
    . |= json
  }
  '''

[sinks.loki]
  type = "loki"
  inputs = ["loki_json"]
  endpoint = "${LOKI_URL}"
  compression = "gzip"
  auth.strategy = "basic"
  auth.user = "${LOKI_USERNAME}"
  auth.password = "${LOKI_PASSWORD}"
  encoding.codec = "json"

  labels.event_provider = "{{event.provider}}"
  labels.fly_region = "{{fly.region}}"
  labels.fly_app_name = "{{fly.app.name}}"
  labels.fly_app_instance = "{{fly.app.instance}}"
  labels.host = "{{host}}"
  labels.level = "{{level}}"

Hi @Matt_Stewart-Ronnisc

Those extra characters at the start of your log message would indeed cause parsing to fail. Try tracking down what's prepending them to your log messages.

We’re successfully running fly-log-shipper with Grafana Cloud Loki and have no trouble parsing the JSON messages.

Upon further investigation, it looks like those extra characters might be getting added by fly.io if you use their processes feature.

@Matt_Stewart-Ronnisc can you please confirm whether you use that feature and have a process called dashboard? (If you’re unsure, you can post your fly.toml file.)

@joshua, should the process name be part of the JSON provided by Fly alongside the app name, rather than embedded in the message field?

Thanks! Your line of thought pointed me at exactly what may be causing this. I use overmind for process management, as recommended here; it is likely what's prepending these prefixes to the logs. I’ll dive into its documentation to find out how to fix this.
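In the meantime, a possible workaround on the Vector side is to strip the prefix before the JSON parse ever runs. A rough sketch (the transform name is mine, and it assumes the prefix overmind adds always ends with "| " before the opening brace):

```toml
# Insert this transform between log_json and loki_json, and point
# loki_json's inputs at ["strip_prefix"] instead of ["log_json"].
[transforms.strip_prefix]
  type = "remap"
  inputs = ["log_json"]
  source = '''
  # Drop everything up to the first "{" (ANSI color codes, the
  # process name, and the "| " separator) so the message starts
  # with bare JSON. Assumes the payload itself begins with "{".
  .message = replace(string!(.message), r'^[^{]*\| ', "")
  '''
```

With that in place, the starts_with(.message, "{") check in loki_json should pass and the inner JSON gets merged into the event as before.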