Fly + Datadog vector error

Hello there! We’ve been using the fly-log-shipper.

Today, all of a sudden, it started failing: the app spins up a machine, and the machine gets killed almost immediately. These are the logs:

2024-06-25T16:49:22.868 app[e28675d2fe0578] lax [info] INFO Main child exited normally with code: 78
2024-06-25T16:49:22.881 app[e28675d2fe0578] lax [info] INFO Starting clean up.
2024-06-25T16:49:22.883 app[e28675d2fe0578] lax [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-06-25T16:49:22.883 app[e28675d2fe0578] lax [info] [ 1.591123] reboot: Restarting system
2024-06-25T16:49:22.993 runner[e28675d2fe0578] lax [info] machine did not have a restart policy, defaulting to restart
2024-06-25T16:49:26.721 app[e28675d2fe0578] lax [info] INFO Preparing to run: `bash start-fly-log-transporter.sh` as root
2024-06-25T16:49:26.729 app[e28675d2fe0578] lax [info] 2024/06/25 16:49:26 INFO SSH listening listen_address=[fdaa:0:69a9:a7b:2b8:e6ed:cdf:2]:22 dns_server=[fdaa::3]:53
2024-06-25T16:49:26.785 app[e28675d2fe0578] lax [info] 2024-06-25T16:49:26.784542Z INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,kube=info"
2024-06-25T16:49:26.796 app[e28675d2fe0578] lax [info] 2024-06-25T16:49:26.796779Z ERROR vector::cli: Configuration error. error=redefinition of table `transforms.boost_logs` for key `transforms.boost_logs` at line 52 column 1

Any ideas where to start looking? We haven’t redeployed this app since forever and nothing has changed in quite some time.
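For anyone hitting the same "redefinition of table" message: Vector reports it when the same TOML table header appears twice in one config file. A minimal sketch for spotting the duplicate, assuming you can get a copy of the rendered config (the log further down mentions /etc/vector/vector.toml). The function name toml_duplicate_tables is hypothetical, and the check only looks at plain [table] headers that start in the first column:

```shell
#!/bin/sh
# Hypothetical helper: print every plain [table] header that appears more
# than once in a TOML file. Array-of-tables headers ([[name]]) are skipped
# on purpose, because repeating those is legal TOML.
toml_duplicate_tables() {
  grep -oE '^\[[^][]+\]' "$1" | sort | uniq -d
}
```

Running `toml_duplicate_tables /etc/vector/vector.toml` on the machine (or on a local copy of the file) should list the repeated header, if there is one.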

Interestingly, I tried setting up all three necessary secrets again:
ACCESS_TOKEN
DATADOG_API_KEY
ORG

and now I’m getting errors like this one:

  2024-06-25T20:04:05.941057Z  INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info,buffers=info,lapin=info,kube=info"
  2024-06-25T20:04:05.946149Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml", "/etc/vector/sinks"]
  2024-06-25T20:04:06.919915Z ERROR vector::topology: Configuration error. error=Source "nats": NATS Connect Error: unexpected line while connecting: Err("Authorization Violation")

I’ve checked multiple times, and the credentials are correct. I’m generating the ACCESS_TOKEN via fly secrets set ACCESS_TOKEN=$(fly auth token) (as per the GitHub README).

I’m really at a loss here.

Hey @dilirity, can you try restarting and/or redeploying your log shipper app? It should be fixed now.

This one was on our side. We shipped a change to some auth pieces earlier today that had unintended side effects on Log Shipper apps (or any apps that connect to NATS with an OAuth token).

Thank you @Sam-Fly!

We’ve redeployed the app, but now we’re getting these entries in the logs:

2024-06-26T04:07:44.619 app[080eed5f044708] dfw [info] 2024-06-26T04:07:44.618815Z INFO vector::internal_events::api: API server running. address=[::]:8686 playground=http://:::8686/playground

2024-06-26T04:07:44.620 app[080eed5f044708] dfw [info] 2024-06-26T04:07:44.620281Z INFO vector::sinks::blackhole::sink: Collected events. events=5 raw_bytes_collected=1534

2024-06-26T04:07:44.800 app[080eed5f044708] dfw [info] 2024-06-26T04:07:44.800529Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Unexpected status: 403 Forbidden component_kind="sink" component_type="datadog_logs" component_id=datadog component_name=datadog

2024-06-26T04:07:49.770 app[080eed5f044708] dfw [info] 2024-06-26T04:07:49.770631Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=Client request was forbidden. internal_log_rate_limit=true

2024-06-26T04:07:50.806 app[080eed5f044708] dfw [info] 2024-06-26T04:07:50.805808Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being rate limited.

2024-06-26T04:08:01.949 app[080eed5f044708] dfw [info] 2024-06-26T04:08:01.949182Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been rate limited 4 times.

The 403 seems to come from Datadog. I recommended that your colleague double-check your DATADOG_API_KEY in case it got improperly set or something similar.

  • Daniel
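One way to double-check a key outside of Vector is Datadog’s key validation endpoint (GET /api/v1/validate with a DD-API-KEY header). A rough sketch, assuming curl is available; the function names dd_validate_url and dd_check_key are made up for illustration. Note that the API host depends on the Datadog site the account lives on, so a key that is valid on one site will be rejected on another:

```shell
#!/bin/sh
# Build the validation URL for a Datadog site (datadoghq.com if none given).
dd_validate_url() {
  printf 'https://api.%s/api/v1/validate\n' "${1:-datadoghq.com}"
}

# Print the HTTP status code Datadog returns for a key on a given site.
# Arguments: $1 = API key, $2 = site (optional). Needs curl and network access.
dd_check_key() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "DD-API-KEY: $1" "$(dd_validate_url "$2")"
}
```

For example, `dd_check_key "$DATADOG_API_KEY" datadoghq.eu` prints the status for that site; the site value here is only an example. A 200 means the key is accepted there, and a 403 means it is not.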

Thanks @roadmr

I’ve double-checked the secrets and they are correct. After a redeploy, we’re still getting these entries:

2024-06-26T16:29:19.983 app[9185e715f90783] lax [info] 2024-06-26T16:29:19.983052Z ERROR vector::topology::builder: msg="Healthcheck failed." error=Unexpected status: 403 Forbidden component_kind="sink" component_type="datadog_logs" component_id=datadog component_name=datadog

2024-06-26T16:29:24.987 app[9185e715f90783] lax [info] 2024-06-26T16:29:24.987215Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=Client request was forbidden. internal_log_rate_limit=true

2024-06-26T16:29:26.056 app[9185e715f90783] lax [info] 2024-06-26T16:29:26.055967Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] is being rate limited.

2024-06-26T16:29:37.343 app[9185e715f90783] lax [info] 2024-06-26T16:29:37.343139Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Internal log [Retrying after error.] has been rate limited 4 times.

2024-06-26T16:29:37.343 app[9185e715f90783] lax [info] 2024-06-26T16:29:37.343210Z WARN sink{component_kind="sink" component_id=datadog component_type=datadog_logs component_name=datadog}:request{request_id=1}: vector::sinks::util::retries: Retrying after error. error=Client request was forbidden. internal_log_rate_limit=true

I’ve tried several API keys generated in Datadog under my organization’s account.

I’m setting the secrets via:

fly secrets set ACCESS_TOKEN=xxx ORG=xxx DATADOG_API_KEY=xxx -a xxx

I even deleted them manually via the Fly UI for the app and set them again.

I’ve double-checked the ORG name, and it is correct.
The Fly.io user I’m creating the access token for (the token passed via ACCESS_TOKEN) is in the same ORG. I’m adding the access token on the dashboard page titled “Sign in to Your Account · Fly”.
The DATADOG_API_KEY was changed multiple times during testing.

Our DATADOG_SITE is also configured properly.

Is there something else that I’m missing?

It turns out that DATADOG_SITE wasn’t configured properly.

We were redeploying the image without DATADOG_SITE set, so it defaulted to datadoghq.com.
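In case it helps someone else, here is a small sketch of the kind of pre-deploy check that would have caught this. It only mirrors the fallback described above (an unset DATADOG_SITE ends up as datadoghq.com); the function names are hypothetical and are not part of fly-log-shipper:

```shell
#!/bin/sh
# Resolve the site the way this thread describes it: fall back to
# datadoghq.com when DATADOG_SITE is unset or empty.
effective_dd_site() {
  printf '%s\n' "${DATADOG_SITE:-datadoghq.com}"
}

# Hypothetical pre-deploy check: return non-zero and warn on stderr when
# the default would be used silently.
warn_if_default_site() {
  if [ -z "${DATADOG_SITE:-}" ]; then
    echo "DATADOG_SITE is not set; falling back to $(effective_dd_site)" >&2
    return 1
  fi
}
```

The fix itself is just setting the secret next to the others, e.g. fly secrets set DATADOG_SITE=datadoghq.eu -a xxx, where the site value is an example and should be whatever site your Datadog account is on.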
