SIGKILL Reaped Child Processes during job on L40S VM

siliconslayer · October 15, 2025, 3:06pm

Hey all,

I’m using a 16 vCPU, 64GB machine with an L40S on it, and my task is simple: I render a few thousand frames in a headless browser using Remotion (Node.js) and encode a video from them using FFmpeg.

During rendering frames I’m constantly getting:

INFO Main child exited normally with code: 130
WARN Reaped child process with pid: 894 and signal: SIGKILL, core dumped? false
WARN Reaped child process with pid: 922 and signal: SIGKILL, core dumped? false
…
reboot: Power down

I have a pretty regular toml file, I autoscale from 0, I stop machines when idle, and I use a soft and hard limit of 0 for concurrency. So I’m sort of mimicking a lambda situation here.

I was wondering if any of you have experienced the same errors? What was it for, and how did you manage to fix it?

Thanking you in advance!

khuezy · October 15, 2025, 3:29pm

How many concurrent headless instances are you running? They’re pretty resource-intensive, which would explain the OOM warnings.

PeterCxy · October 15, 2025, 3:34pm

How does computation in this machine work? Do you use an HTTP request to start the computation, and does the HTTP client wait for the computation to complete while holding the connection open? If nothing holds the connection open, then fly-proxy is free to scale the instance down (i.e. stop it) since from its point of view the machine is serving 0 requests.
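One way to keep a request in flight for the whole job is to delay the response until the work is done. This is a minimal sketch, assuming a plain Node.js `http` server. `renderJob`, `readBody` and `createServer` are hypothetical names standing in for the real Remotion and FFmpeg pipeline, not anything from OP's code:

```javascript
const http = require("node:http");

// Hypothetical stand-in for the real render + encode pipeline.
async function renderJob(payload) {
  await new Promise((resolve) => setTimeout(resolve, 50));
  return { frames: payload.frames ?? 0, status: "done" };
}

// Reads the full request body as a string.
function readBody(req) {
  return new Promise((resolve, reject) => {
    let data = "";
    req.on("data", (chunk) => (data += chunk));
    req.on("end", () => resolve(data));
    req.on("error", reject);
  });
}

function createServer(job = renderJob) {
  return http.createServer(async (req, res) => {
    if (req.method !== "POST") {
      res.writeHead(405).end();
      return;
    }
    try {
      const payload = JSON.parse((await readBody(req)) || "{}");
      // The response is written only after the job resolves, so the
      // request stays in flight for the whole duration of the job.
      const result = await job(payload);
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(result));
    } catch (err) {
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
}
```

Calling `createServer().listen(port)` with the app's internal port would start it. Whether a request lasting several minutes survives client and proxy timeouts is not something this sketch addresses.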

If you do need background tasks to keep running even without an active client-side request, you might want to consider disabling autostop and instead have your machine exit (by exiting the main process) once it is done processing all jobs.

khuezy · October 15, 2025, 3:40pm

Good question but doesn’t Fly usually exit with code 0 when it autoscales down to 0?

OP said

So I’m assuming OP is manually handling when the app is idle.

PeterCxy · October 15, 2025, 3:45pm

It depends on how the process inside the machine reacts to the kill signal. Exit code 130 means the process was killed with SIGINT and is the default behavior for a process without a custom signal handler. SIGINT is also the default kill_signal we send to machines on stop.
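The number 130 follows the usual shell convention of 128 plus the signal number, and SIGINT is signal 2. The sketch below shows that arithmetic and what a custom handler could look like. `exitCodeForSignal`, `installStopHandler` and the `cleanup` hook are hypothetical names used for illustration only:

```javascript
const os = require("node:os");

// Shell convention: a process terminated by signal number N is reported
// with exit code 128 + N.
function exitCodeForSignal(name) {
  const number = os.constants.signals[name];
  if (number === undefined) {
    throw new Error(`unknown signal: ${name}`);
  }
  return 128 + number;
}

// Registers a one-time SIGINT handler that awaits `cleanup` (a hypothetical
// hook, e.g. closing the browser) and then exits with code 0 instead of
// being terminated by the signal.
function installStopHandler(cleanup, exit = (code) => process.exit(code)) {
  const handler = async () => {
    try {
      await cleanup();
    } finally {
      exit(0);
    }
  };
  process.once("SIGINT", handler);
  return handler;
}
```

On Linux, `exitCodeForSignal("SIGINT")` gives 130, matching the log line in the first post. A handler like this only changes how the process reacts to the stop signal. It does not prevent the stop itself.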

siliconslayer · October 15, 2025, 3:48pm

I’m spinning up exactly one Chromium instance using Remotion itself, so it’s not even custom. I use Remotion’s own openBrowser function to spin it up.

siliconslayer · October 15, 2025, 3:50pm

Good point… Yes, I’m starting the job using a POST request, and the client doesn’t await it; it’s a background job. Since Fly is dropping it in the middle of the pipeline, how would adding manual exits help…?

siliconslayer · October 15, 2025, 3:51pm

Sorry if I confused you. No, I’m using auto_stop of “stop” and a min machines of 0.

PeterCxy · October 15, 2025, 3:53pm

Setting auto_stop_machines to off would prevent Fly from stopping the machine at all, and then adding a manual exit after the job is done guarantees that your machine will only be stopped when it knows it is done. This is the recommended configuration for machines with client-side triggered long-running jobs, since there’s no good way for the Fly platform to tell whether the job in your machine is actually done or not without active connections.

siliconslayer · October 15, 2025, 4:10pm

Just to clarify, by manual exit you mean a process.exit(0) or process.exit(1) call?

khuezy · October 15, 2025, 4:20pm

Exit 0; non-zero would restart the machine, I believe.

PeterCxy · October 15, 2025, 5:10pm

Yes, just exit the process with code 0 when it is done. That’ll put the machine into a stopped state – and you can rely on autostart to get the machine to start again if a new request comes in.
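A rough sketch of that pattern for a fire-and-forget POST handler, assuming autostop is disabled so that only the process itself decides when the machine stops. `createJobTracker` is a hypothetical helper, not part of Remotion or of any Fly library. It counts in-flight jobs so that a second request arriving mid-render is not cut off by the first job's exit:

```javascript
// Tracks background jobs and exits with code 0 once none are left.
// `exit` is injectable so the logic can be exercised without killing
// the current process.
function createJobTracker(exit = (code) => process.exit(code)) {
  let active = 0;
  return {
    get active() {
      return active;
    },
    // Starts `job` (an async function). The caller does not need to await
    // the returned promise; it settles when the job has ended.
    async run(job) {
      active += 1;
      try {
        await job();
      } catch (err) {
        // Assumption: a failed job is logged and the machine still stops
        // cleanly. Exit non-zero here instead if a restart is wanted.
        console.error("job failed:", err);
      } finally {
        active -= 1;
        if (active === 0) {
          exit(0);
        }
      }
    },
  };
}
```

In the request handler this would look like replying with a 202 right away and then calling `tracker.run(() => renderAndEncode(payload))`, where `renderAndEncode` stands for the actual pipeline.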

siliconslayer · October 16, 2025, 1:10pm

Thanks Peter, I’m testing this method. I’ll mark it as a solution once I get it running!