Machine is intermittently suspending

My website keeps going down intermittently. The logs say:

2025-06-04T16:52:24Z proxy[2865100f961158] sjc [info]App naagm has excess capacity, autostopping machine 2865100f961158. 0 out of 1 machines left running (region=sjc, process group=app)
2025-06-04T16:52:24Z app[2865100f961158] sjc [info] INFO Sending signal SIGTERM to main child process w/ PID 649
2025-06-04T16:52:24Z app[2865100f961158] sjc [info]16:52:24.691 [notice] SIGTERM received - shutting down
2025-06-04T16:52:25Z app[2865100f961158] sjc [info] WARN Reaped child process with pid: 718 and signal: SIGUSR1, core dumped? false
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Main child exited normally with code: 0
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Starting clean up.
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Umounting /dev/vdc from /mnt/db
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-06-04T16:52:26Z app[2865100f961158] sjc [info][11060.842739] reboot: Restarting system
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [info]Starting machine
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [error][PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [info]Starting machine

When I look in the dashboard it says my machine is ‘suspended’.

I’m not sure how to proceed. Redeploying the app fixes it, but then the problem comes back later. The app is deployed in SJC.

The app doesn’t get much traffic or do anything resource-intensive. I shouldn’t need to scale it.

Can someone please help?

Would you add your TOML config file to this thread?

app = 'naagm'
primary_region = 'sjc'
kill_signal = 'SIGTERM'

[build]

[env]
DATABASE_PATH = '/mnt/db/naagm.db'
PHX_HOST = 'naagm.fly.dev'
PORT = '8080'

[[mounts]]
source = 'db'
destination = '/mnt/db'

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
min_machines_running = 0
processes = ['app']

[http_service.concurrency]
type = 'connections'
hard_limit = 1000
soft_limit = 1000

[[vm]]
memory = '2gb'
cpu_kind = 'shared'
cpus = 2

This was auto-generated for my Elixir/Phoenix project. The only things I edited were the memory and CPU settings.

Generally you do want auto-stop (auto_stop_machines), since it will save you a lot of money on a Machine that large…

https://fly.io/docs/reference/configuration/#the-http_service-section

(But you can change it to false if you want to turn it off.)
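For reference, here is a rough sketch of what that change could look like in the [http_service] section. It assumes you want the Machine to stay on at all times and accept paying for it around the clock. 'off' is the string form of the setting in current Fly.io docs (false is also accepted). The commented-out min_machines_running line shows a middle ground that keeps auto-stop but never stops the last Machine in the primary region.

```toml
[http_service]
internal_port = 8080
force_https = true
# Disable auto-stop entirely so the Machine is never stopped for being idle.
# Equivalent to auto_stop_machines = false.
auto_stop_machines = 'off'
auto_start_machines = true
# Alternative: keep auto_stop_machines = 'stop' and instead require at least
# one Machine to stay running in the primary region (sjc here).
# min_machines_running = 1
processes = ['app']
```

Either way, a Machine that is never stopped keeps its resources on its host, so it does not have to win a new reservation on every auto-start.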

1 Like

OK, got it. I understand that it auto-stops when it’s idle to save money. But why can’t it start back up when it gets traffic? auto_start_machines is set to true in my fly.toml.

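It’s trying to, but you can see the problem near the end of the log snippet: the machines API returned “could not reserve resource for machine: insufficient memory available to fulfill request”.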

This is one of the reasons why it’s unwise to run just a single Machine on the Fly.io platform.

(Another is the high risk of permanent data loss on the volume.)

The Machine is pinned to a single underlying physical host. Although Fly.io does sometimes migrate Machines between hosts these days, you can’t rely on that happening on the timescale of an auto-start. If that host is short on capacity when the Machine tries to start, your site stays down until resources free up.


I’d suggest rethinking your architecture to match the platform’s strengths and limitations: for example, a managed Postgres database plus two Elixir app Machines instead.
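As a rough, hypothetical sketch of that setup in fly.toml terms: it assumes the app has been switched from SQLite to Postgres (which also means changing the Ecto adapter, not shown) and that the connection string arrives as a DATABASE_URL secret, which is what attaching a Fly Postgres cluster with `fly postgres attach` sets. With no local database, the [[mounts]] section and DATABASE_PATH go away.

```toml
# Hypothetical two-Machine layout; DATABASE_URL is assumed to be a secret, not an env var.
app = 'naagm'
primary_region = 'sjc'
kill_signal = 'SIGTERM'

[env]
PHX_HOST = 'naagm.fly.dev'
PORT = '8080'

[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
# Keep one Machine always running; the other may auto-stop and start on demand.
min_machines_running = 1
processes = ['app']

[[vm]]
memory = '2gb'
cpu_kind = 'shared'
cpus = 2

# The number of Machines is not set here; run `fly scale count 2` after deploying.
```

The idea is that if the second Machine can’t reserve resources on its host, the always-on one keeps serving traffic.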

(Possibly a different hosting service entirely, if you really do just want 1 of everything…)