Hi,
I am trying to run Telegraf with an MQTT consumer on fly.io, but it never starts after a deploy and the machines stay in a stopped state. I get no application logs at all; the machine logs stop at “Configuring firecracker”. Running the container locally works perfectly.
Logs:
2025-05-14 10:15:35.350 Configuring firecracker
2025-05-14 10:15:35.191 Configuring firecracker
2025-05-14 10:15:33.990 Successfully prepared image registry.fly.io/telegraf-emqx:deployment-01JV71706AWE2SJXFTJB1QJATW (881.796796ms)
2025-05-14 10:15:33.861 Successfully prepared image registry.fly.io/telegraf-emqx:deployment-01JV71706AWE2SJXFTJB1QJATW (847.777961ms)
2025-05-14 10:15:33.108 Pulling container image registry.fly.io/telegraf-emqx:deployment-01JV71706AWE2SJXFTJB1QJATW
2025-05-14 10:15:33.013 Pulling container image registry.fly.io/telegraf-emqx:deployment-01JV71706AWE2SJXFTJB1QJATW
Dockerfile:
FROM telegraf
COPY ./telegraf.conf /etc/telegraf/
COPY ./emqxsl-ca.pem /etc/telegraf/
fly.toml:
app = 'telegraf-emqx'
primary_region = 'fra'
[build]
[http_service]
internal_port = 1883
force_https = true
auto_stop_machines = 'off'
auto_start_machines = true
min_machines_running = 1
processes = ['app']
[[services]]
internal_port = 8883
protocol = "tcp"
auto_stop_machines = "off"
auto_start_machines = true
min_machines_running = 1
processes = ['app']
[[services.ports]]
port = 8883
[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
The MQTT consumer establishes an outbound connection (TLS, port 8883) to EMQX and subscribes to topics. Telegraf writes the data to serverless InfluxDB over HTTPS (outbound again, regular TLS port 443). That is, I don’t think I need any [[services]] or [http_service] section at all.
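For reference, this is roughly what I would expect a fly.toml for an outbound-only workload to look like. It is only a sketch: it reuses the app name, region and VM settings from the config above, and it assumes that an app with no service sections is acceptable when nothing needs to accept inbound connections.

```toml
# Sketch of a fly.toml with no inbound services (assumption: Telegraf only
# makes outbound connections to EMQX on 8883 and InfluxDB on 443).
app = 'telegraf-emqx'
primary_region = 'fra'

[build]

# No [http_service] or [[services]] sections: nothing listens for inbound traffic.

[[vm]]
memory = '1gb'
cpu_kind = 'shared'
cpus = 1
```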
Things I have tried:
- fly.toml without the [[services]] section.
- fly.toml without the [http_service] section.
- fly.toml with neither [[services]] nor [http_service].
- Adding a [[processes]] section with app = "telegraf".
- Copying the entrypoint.sh from the base layer into my layer and adding echo statements everywhere - they never show up.
- Adding:
[experimental]
exec = ["sleep", "1d"]
so that I could SSH in to investigate, but the machine is still stopped after re-deploying.
Does anyone have any ideas?