project exited silently

user121 · January 10, 2023, 4:59pm

This morning we noticed that requests to one of our projects were failing. We checked the deployment and it showed the maintenance icon.

We checked the logs, but there was nothing about it exiting, just the entries for the previous request.

So we just restarted it and it came back online.
Here are the logs: the last successful request, and then the restart.

2023-01-10T15:25:28.762 app[47e967c0] gru [info] {"latencyInNs":228000000,"level":"info","message":"POST /token 200 228ms","method":"POST","statusCode":200,"url":"/token"}

2023-01-10T16:42:18.004 runner[47e967c0] gru [info] Starting instance

2023-01-10T16:42:32.307 runner[47e967c0] gru [info] Configuring virtual machine

2023-01-10T16:42:41.891 runner[47e967c0] gru [info] Pulling container image

2023-01-10T16:46:31.809 runner[47e967c0] gru [info] Unpacking image

2023-01-10T16:46:58.481 runner[47e967c0] gru [info] Preparing kernel init

2023-01-10T16:48:14.311 runner[47e967c0] gru [info] Configuring firecracker

2023-01-10T16:48:16.056 runner[47e967c0] gru [info] Starting virtual machine

2023-01-10T16:48:16.281 app[47e967c0] gru [info] Starting init (commit: f447594)...

2023-01-10T16:48:16.355 app[47e967c0] gru [info] Preparing to run: `docker-entrypoint.sh pnpm run start` as root

2023-01-10T16:48:16.389 app[47e967c0] gru [info] 2023/01/10 16:48:16 listening on [fdaa:0:3bd8:a7b:1f63:47e9:67c0:2]:22 (DNS: [fdaa::3]:53)

In Grafana it looks like the project was simply off for that whole period.
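As a rough sketch, the length of the silent period can be read straight off the timestamps of the last request and the first restart line above. The helper names (`parse_timestamp`, `largest_gap`) are made up for illustration, and the code assumes every log line starts with a timestamp in the format shown in this post.

```python
from datetime import datetime, timedelta

# The two log lines on either side of the silent period, copied from above.
LOG_LINES = [
    '2023-01-10T15:25:28.762 app[47e967c0] gru [info] {"latencyInNs":228000000,"level":"info","message":"POST /token 200 228ms","method":"POST","statusCode":200,"url":"/token"}',
    '2023-01-10T16:42:18.004 runner[47e967c0] gru [info] Starting instance',
]


def parse_timestamp(line: str) -> datetime:
    """Parse the leading timestamp (assumed format: YYYY-MM-DDTHH:MM:SS.mmm)."""
    stamp = line.split(" ", 1)[0]
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")


def largest_gap(lines: list[str]) -> tuple[timedelta, datetime, datetime]:
    """Return (gap, start, end) for the longest pause between consecutive lines."""
    stamps = [parse_timestamp(line) for line in lines if line.strip()]
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(stamps, stamps[1:])]
    return max(gaps)


if __name__ == "__main__":
    gap, start, end = largest_gap(LOG_LINES)
    print(f"No log output for {gap} (from {start} to {end})")
```

For these two lines the gap works out to a little under 1 hour 17 minutes.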

My question is, why didn’t the deployment restart? If something failed, it should show those logs and restart. And what can I do so this doesn’t happen again?

Thanks for the help!

ignoramous · January 10, 2023, 6:27pm

To persist logs, one needs to set up fly-log-shipper.

What is restart_limit set to in the health check section (services.tcp_checks) of your app’s fly.toml? If health checks fail or the app OOMs, Fly’s control plane should ideally attempt to auto-restart the app up to restart_limit times (afaik).

That said, in the past when apps have gone down without warning to never come back up, it has been due to VM (and volume) migrations slipping through the cracks when decommissioning lemon hosts.

One solution is to run at least 2 instances, possibly in different regions.

wjordan · January 10, 2023, 7:31pm

Hi @user121,

The host server where your application was deployed hit a Linux-kernel bug that required a reboot to resolve. We’ve been investigating this kernel bug and also looking into future improvements to our apps platform to help deployments automatically migrate away from unresponsive servers more quickly.

In general, if your application needs to be highly available, we recommend running two or more instances as @ignoramous suggested.
