My website has been going down somewhat periodically. The logs say:
2025-06-04T16:52:24Z proxy[2865100f961158] sjc [info]App naagm has excess capacity, autostopping machine 2865100f961158. 0 out of 1 machines left running (region=sjc, process group=app)
2025-06-04T16:52:24Z app[2865100f961158] sjc [info] INFO Sending signal SIGTERM to main child process w/ PID 649
2025-06-04T16:52:24Z app[2865100f961158] sjc [info]16:52:24.691 [notice] SIGTERM received - shutting down
2025-06-04T16:52:25Z app[2865100f961158] sjc [info] WARN Reaped child process with pid: 718 and signal: SIGUSR1, core dumped? false
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Main child exited normally with code: 0
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Starting clean up.
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] INFO Umounting /dev/vdc from /mnt/db
2025-06-04T16:52:26Z app[2865100f961158] sjc [info] WARN could not unmount /rootfs: EINVAL: Invalid argument
2025-06-04T16:52:26Z app[2865100f961158] sjc [info][11060.842739] reboot: Restarting system
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [info]Starting machine
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [error][PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"
2025-06-04T17:21:57Z proxy[2865100f961158] sjc [info]Starting machine
When I look in the dashboard it says my machine is ‘suspended’.
I don’t really know how to proceed. I can fix it by redeploying the app, but then it happens again later. The app is deployed in SJC.
The app doesn’t get much traffic or do anything resource-intensive. I shouldn’t need to scale it.
Can someone please help?
Would you add your TOML config file to this thread?
app = 'naagm'
primary_region = 'sjc'
kill_signal = 'SIGTERM'
[build]
[env]
DATABASE_PATH = '/mnt/db/naagm.db'
PHX_HOST = 'naagm.fly.dev'
PORT = '8080'
[[mounts]]
source = 'db'
destination = '/mnt/db'
[http_service]
internal_port = 8080
force_https = true
auto_stop_machines = 'stop'
auto_start_machines = true
min_machines_running = 0
processes = ['app']
[http_service.concurrency]
type = 'connections'
hard_limit = 1000
soft_limit = 1000
[[vm]]
memory = '2gb'
cpu_kind = 'shared'
cpus = 2
This was auto-generated for my Elixir/Phoenix project. The only thing I edited was memory/cpus.
Generally you do want auto-stop (auto_stop_machines) enabled, since it will save you a lot of money for a Machine that large…
https://fly.io/docs/reference/configuration/#the-http_service-section
(But you can change it to false if you want to turn it off.)
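As a rough sketch of what turning it off could look like in the config posted above: this assumes the current string form of the setting ('off', alongside 'stop' and 'suspend'), which is my reading of the configuration reference linked above rather than something confirmed in this thread. The rest of the section is copied from the posted fly.toml.

```toml
# Sketch only: the posted [http_service] section with auto-stop disabled.
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = 'off'   # assumption: 'off' is the string form of false
  auto_start_machines = true
  min_machines_running = 0     # has no practical effect once auto-stop is off
  processes = ['app']
```

The trade-off is that the Machine is then billed continuously instead of only while it is serving traffic.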
Ok, got it. I get that it auto stops when it’s idle to save money. But why can’t it start back up when it gets traffic? auto_start_machines is set to true in my toml.
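It’s trying to, but you can see the problem near the end of the log snippet:
[PM01] machines API returned an error: "could not reserve resource for machine: insufficient memory available to fulfill request"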
This is one of the reasons why it’s unwise to run just a single Machine on the Fly.io platform.
(Another is the high risk of permanent data loss on the volume.)
The Machine is pinned to a single underlying physical host, and although they do migrate them sometimes these days, you can’t rely on it happening on the timescale of auto-start, etc. If there’s a capacity crunch there, then your site is down for a while.
I’d suggest rethinking your architecture to match the platform’s strengths and limitations. E.g., a managed Postgres database + 2 Elixir app Machines, instead.
(Possibly a different hosting service entirely, if you really do just want 1 of everything…)
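For the two-Machine setup suggested above, the fly.toml side might look roughly like the sketch below. It assumes the second Machine is created outside fly.toml (for example with fly scale count 2) and that the SQLite volume has been replaced by an external database, so the [[mounts]] section is no longer needed. Neither step is shown in this thread.

```toml
# Sketch only: keep auto-stop on, but never stop the last running Machine.
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 1     # applies to Machines in the primary region
  processes = ['app']
```

With one Machine always running, a failed auto-start on the other host means reduced capacity rather than an outage.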