Hi there! I have a Fly Machine image that was previously working great. Recently an issue has arisen with Machines configured with restart: always.
A new Machine builds and starts fine, the first restart succeeds, and then subsequent restarts fail, so the Machine goes down and doesn’t come back up.
The error logs aren’t particularly insightful (to me, at least!).
Initial start is fine:
2024-06-11T10:58:56.177 runner[185e76df4246e8] lhr [info] Machine created and started in 3.707s
....
First restart succeeds:
2024-06-11T10:59:16.196 app[185e76df4246e8] lhr [info] [ 20.460491] reboot: Restarting system
2024-06-11T10:59:16.615 app[185e76df4246e8] lhr [info] 2024-06-11T10:59:16.615569983 [01J03F6RVSVNNBEX4FSGKPNDA1:main] Running Firecracker v1.7.0
2024-06-11T10:59:16.747 app[185e76df4246e8] lhr [info] [ 0.048541] PCI: Fatal: No config space access function found
2024-06-11T10:59:17.089 app[185e76df4246e8] lhr [info] INFO Starting init (commit: dec752a2)...
2024-06-11T10:59:17.130 app[185e76df4246e8] lhr [info] INFO Preparing to run: `docker-entrypoint.sh node ../../modules/cli-scripts/bin/set-env.js -c APP_ENV -- node dist/worker.js` as root
2024-06-11T10:59:17.137 app[185e76df4246e8] lhr [info] INFO [fly api proxy] listening at /.fly/api
2024-06-11T10:59:17.141 app[185e76df4246e8] lhr [info] 2024/06/11 10:59:17 INFO SSH listening listen_address=[fdaa:2:967a:a7b:15d:6ba8:fd40:2]:22 dns_server=[fdaa::3]:53
2024-06-11T10:59:17.161 runner[185e76df4246e8] lhr [info] Machine started in 625ms
...
Subsequent restarts fail, after which the machine can’t be restarted successfully:
2024-06-11T10:59:33.156 app[185e76df4246e8] lhr [info] INFO Main child exited normally with code: 0
2024-06-11T10:59:33.169 app[185e76df4246e8] lhr [info] INFO Starting clean up.
2024-06-11T10:59:33.170 app[185e76df4246e8] lhr [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2024-06-11T10:59:33.171 app[185e76df4246e8] lhr [info] [ 16.471415] reboot: Restarting system
2024-06-11T10:59:33.568 app[185e76df4246e8] lhr [info] 2024-06-11T10:59:33.568044282 [01J03F6RVSVNNBEX4FSGKPNDA1:main] Running Firecracker v1.7.0
2024-06-11T10:59:33.693 app[185e76df4246e8] lhr [info] [ 0.047461] PCI: Fatal: No config space access function found
2024-06-11T10:59:34.018 app[185e76df4246e8] lhr [info] INFO Starting init (commit: dec752a2)...
2024-06-11T10:59:34.061 app[185e76df4246e8] lhr [info] ERROR Error: an unhandled IO error occurred: File exists (os error 17)
2024-06-11T10:59:34.062 app[185e76df4246e8] lhr [info] [ 0.415046] reboot: Restarting system
2024-06-11T10:59:34.144 app[185e76df4246e8] lhr [warn] Virtual machine exited abruptly
2024-06-11T10:59:34.553 app[185e76df4246e8] lhr [info] 2024-06-11T10:59:34.553241506 [01J03F6RVSVNNBEX4FSGKPNDA1:main] Running Firecracker v1.7.0
2024-06-11T10:59:34.689 app[185e76df4246e8] lhr [info] [ 0.052915] PCI: Fatal: No config space access function found
2024-06-11T10:59:35.015 app[185e76df4246e8] lhr [info] INFO Starting init (commit: dec752a2)...
2024-06-11T10:59:35.056 app[185e76df4246e8] lhr [info] ERROR Error: an unhandled IO error occurred: File exists (os error 17)
2024-06-11T10:59:35.057 app[185e76df4246e8] lhr [info] [ 0.420334] reboot: Restarting system
2024-06-11T10:59:35.160 app[185e76df4246e8] lhr [warn] Virtual machine exited abruptly
...
…and so on, ad infinitum.
The relevant error appears to be ERROR Error: an unhandled IO error occurred: File exists (os error 17)
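For reference, my understanding is that "os error 17" is EEXIST on Linux, i.e. something is trying to create a file or directory that is already there. I don't know which path init is tripping over, so the sketch below only reproduces the error code itself in Python. The `create_twice` helper and the `demo` directory name are hypothetical and have nothing to do with Fly's init:

```python
import errno
import os
import tempfile


def create_twice(path):
    """Create `path` as a directory twice and return the errno of the second attempt."""
    os.mkdir(path)
    try:
        # The second mkdir fails because the directory already exists.
        os.mkdir(path)
    except FileExistsError as exc:
        return exc.errno
    return None


with tempfile.TemporaryDirectory() as tmp:
    # "demo" is a made-up name, purely for illustration.
    code = create_twice(os.path.join(tmp, "demo"))
    print(code, errno.errorcode.get(code))
```

If that reading is right, it would suggest something left over from the previous boot is still present when init starts again, but that is a guess on my part.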
Any insight into what may be causing this?
Thanks!