Problem:
My machine, which had been running for weeks, was suddenly restarted by flyd several times. There are no further logs or any other indication of why flyd restarted the machine.
It is a shared machine. Does anyone know whether this is expected behaviour, and where I can see why it occurred?
could not find a good candidate within 20 attempts at load balancing
Oct 20, 2025 @ 07:38:13.376 | PU01 | client problem: invalid authority
Oct 20, 2025 @ 02:09:41.114 | PU01 | client problem: invalid authority
Oct 20, 2025 @ 02:09:26.489 | PU01 | client problem: invalid authority
Oct 19, 2025 @ 13:26:00.400 | PU01 | client problem: invalid authority
You can verify by searching this forum that PU01 is related to the Host header, which could have caused the PR04 at Oct 20, 2025 @ 23:25:09.357 (close to your restarts).
Edit: or perhaps your app takes a long time to bind the HTTP port, so the proxy could not see that machine as a candidate?
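To check the slow-bind theory, one option is to time how long the app takes to start accepting connections after it launches. The following is only a rough Python sketch: the function name `seconds_until_listening`, the timeout, and the polling interval are all made up for illustration, and it says nothing about how the proxy itself decides a machine is a candidate.

```python
import socket
import time
from typing import Optional


def seconds_until_listening(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.1
) -> Optional[float]:
    """Poll host:port until a TCP connection succeeds.

    Returns the elapsed seconds, or None if nothing accepted a
    connection before `timeout` ran out. Hypothetical helper.
    """
    start = time.monotonic()
    deadline = start + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is bound and listening.
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            # Refused or timed out: wait a little and try again.
            time.sleep(interval)
    return None
```

Running this right after starting the app locally (for example against `127.0.0.1` and the app's internal port) gives a rough idea of whether startup is slow enough to matter.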
Exit code 1 means that your app’s main process is exiting with an error. When that occurs, flyd stops the machine because your app has exited. Depending on your restart policy, we’ll try to restart the machine. The default is on-failure, which restarts it after a non-zero exit code.
Multiple exit code 1s in a row indicate that your app is unhealthy and crashing frequently. The Grafana-hosted logs at fly-metrics.net should have 30d retention and would be the best place to look to diagnose why your app is crashing.
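As a rough illustration of the restart-policy behaviour described above, here is a simplified Python model. It is not flyd's actual implementation: the `RestartPolicy` enum and `should_restart` function are hypothetical names, and real schedulers typically also apply limits such as a retry count, which this sketch ignores.

```python
from enum import Enum, auto


class RestartPolicy(Enum):
    """Illustrative policy names; not taken from flyd's source."""
    NO = auto()
    ALWAYS = auto()
    ON_FAILURE = auto()


def should_restart(policy: RestartPolicy, exit_code: int) -> bool:
    """Decide whether an exited process would be restarted (simplified)."""
    if policy is RestartPolicy.ALWAYS:
        return True
    if policy is RestartPolicy.ON_FAILURE:
        # Only a non-zero exit code counts as a failure.
        return exit_code != 0
    return False
```

Under this model, an app that keeps exiting with code 1 under the on-failure policy would be restarted each time, which matches the repeated restarts you would see from a crashing app.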
@szindel oh, yes I missed the - in there. That does change the answer slightly.
In that case, it indicates that our scheduler (flyd) hit an issue starting the process in the first place, before your app code started running. That’d be why you’re not seeing anything in the logs. The state / event columns in your first screenshot support that: the machine repeats stopped -> starting -> (exit -1) -> stopped without ever actually reaching a started state.
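If you export the event list and want to spot this pattern without reading it by eye, a small script can count the start attempts that never reached a started state. This is a sketch only: the state strings mirror the ones mentioned above, but the list format and the function name `count_failed_start_attempts` are assumptions, not an actual API response shape.

```python
from typing import List


def count_failed_start_attempts(states: List[str]) -> int:
    """Count 'starting' events that fell back to 'stopped'
    without a 'started' event in between.

    `states` is assumed to be in chronological order.
    """
    failed = 0
    start_pending = False
    for state in states:
        if state == "starting":
            start_pending = True
        elif state == "started":
            # The attempt succeeded, so it is no longer pending.
            start_pending = False
        elif state == "stopped" and start_pending:
            # Went back to stopped before ever being started.
            failed += 1
            start_pending = False
    return failed


# Two failed attempts, then a successful start.
events = ["stopped", "starting", "stopped",
          "starting", "stopped", "starting", "started"]
print(count_failed_start_attempts(events))
```

A non-zero count alongside empty app logs would be consistent with the process failing before the app code ran.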
I took a look at the host your machine’s on, and it looks like it was taken down for a reboot right before you saw those errors. From the timestamps, those exit -1 events occurred as the host was first coming back online. My guess is that the scheduler saw the host come back online and tried to start the machine a few times before the host was ready. Once it was ready, the last start succeeded.
For production apps (or most apps, really) we recommend running multiple machines so that downtime of a single host doesn’t cause downtime for your app.