Internal problem errors and server restarts without any logs

We have an application deployed on Fly.io, on top of Cloudflare. We’re receiving random errors with no explanation in the logs.

2022-02-08T19:18:30Z proxy[ba906637] sea [error]Health check status changed 'passing' => 'critical'
2022-02-08T19:18:33Z runner[f25e536b] cdg [info]Shutting down virtual machine
2022-02-08T19:18:33Z app[f25e536b] cdg [info]Sending signal SIGINT to main child process w/ PID 511
2022-02-08T19:18:43Z proxy[f25e536b] cdg [error]Error 2: Internal problem
2022-02-08T19:18:43Z proxy[f25e536b] cdg [error]Error 2: Internal problem
2022-02-08T19:18:43Z proxy[f25e536b] cdg [error]Error 2: Internal problem
2022-02-08T19:18:43Z proxy[f25e536b] cdg [error]Error 2: Internal problem
2022-02-08T19:18:43Z proxy[f25e536b] cdg [error]Error 2: Internal problem
2022-02-08T19:18:44Z runner[09d4ce6f] cdg [info]Starting instance
2022-02-08T19:18:44Z runner[09d4ce6f] cdg [info]Configuring virtual machine
2022-02-08T19:18:44Z runner[09d4ce6f] cdg [info]Pulling container image
2022-02-08T19:18:45Z proxy[f25e536b] cdg [error]Error 2001: App connection timed out
2022-02-08T19:18:46Z proxy[f25e536b] cdg [error]Error 2001: App connection timed out
2022-02-08T19:18:47Z runner[09d4ce6f] cdg [info]Unpacking image
2022-02-08T19:18:51Z proxy[f25e536b] cdg [error]Error 2001: App connection timed out
2022-02-08T19:18:55Z runner[09d4ce6f] cdg [info]Preparing kernel init
2022-02-08T19:18:55Z proxy[ba906637] sea [info]Health check status changed 'critical' => 'passing'

An “Internal problem” error means this is our fault.

Looking into it now.

Ok, this error message is there for lack of a better one. Basically, we have a check to make sure we don’t proxy a connection to the wrong app. Without this check, that could happen because our state doesn’t propagate around the world fast enough.
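Fly.io hasn’t described how this check is implemented, so the following is only a hypothetical Python sketch of the idea: before handing over a connection, the receiving side confirms the target is a live instance of the app the edge thinks it is routing to, and rejects it otherwise. The names `Instance`, `check_target`, `InternalProblem` and the app name `my-app` are all made up for illustration; the instance ID comes from the log above.

```python
from dataclasses import dataclass


class InternalProblem(Exception):
    """Hypothetical stand-in for the proxy's 'Error 2: Internal problem'."""


@dataclass
class Instance:
    instance_id: str
    app_name: str
    running: bool = True


def check_target(instance: Instance, expected_app: str) -> None:
    """Refuse the connection unless the target is a running instance
    of the app the (possibly stale) routing state pointed at."""
    if not instance.running or instance.app_name != expected_app:
        raise InternalProblem(
            f"refusing to proxy {expected_app!r} to {instance.instance_id}"
        )


# The edge's view is stale: it still lists f25e536b, which has shut down.
old_vm = Instance("f25e536b", "my-app", running=False)
try:
    check_target(old_vm, "my-app")
except InternalProblem as err:
    print("Error 2:", err)
```

Failing closed like this turns a potential misroute into a visible error, which is the trade-off described above.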

What you’re seeing here are requests routed to an app instance that has been shut down. In these cases our check fails and we return an error. These requests are usually retriable, but not always: for example, if retry attempts are exhausted, or if the request body has somehow already been read past a certain point.
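The two exceptions above can be expressed as a small retry decision. This is a hypothetical sketch, not Fly.io’s proxy logic: `RetryPolicy`, `can_retry`, and both limits are assumed values chosen for illustration. The idea is that a request can only be resent if attempts remain and every body byte consumed so far is still buffered, so it can be replayed unchanged to another instance.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int = 3                 # hypothetical attempt limit
    replay_buffer_bytes: int = 64 * 1024  # hypothetical replayable body size


def can_retry(policy: RetryPolicy, attempts_made: int, body_bytes_read: int) -> bool:
    """Return True if a rejected request may be resent to another instance."""
    if attempts_made >= policy.max_attempts:
        return False  # retry attempts exhausted
    if body_bytes_read > policy.replay_buffer_bytes:
        return False  # body read past the point we can replay it
    return True


policy = RetryPolicy()
print(can_retry(policy, attempts_made=1, body_bytes_read=0))        # fresh request
print(can_retry(policy, attempts_made=3, body_bytes_read=0))        # out of attempts
print(can_retry(policy, attempts_made=1, body_bytes_read=100_000))  # body too far gone
```

With these assumed limits the three calls print `True`, `False`, `False`.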

Do you have more than 1 instance running?

Yes, I do, and I see this error every time I deploy.

You can ignore these errors in the logs. They are an artifact of our somewhat slow propagation of service state to the edges. When a request comes in to one of our edges, the edge may not yet know that your previous VM has stopped, so it sends the request there anyway. When we see this kind of error, we retry.
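To illustrate why a stale edge plus a retry usually still serves the request when more than one instance is running, here is a hypothetical Python simulation. `forward` and `proxy_request` are invented names, and the instance IDs are borrowed from the log purely as labels: the edge still lists the stopped `f25e536b` first, but `ba906637` is live.

```python
from collections import deque


class InternalProblem(Exception):
    """Hypothetical: the target host rejected the connection (stale route)."""


def forward(instance_id: str, live: set[str]) -> str:
    """Pretend to forward a request; it is rejected if the VM is gone."""
    if instance_id not in live:
        raise InternalProblem(instance_id)
    return f"200 OK from {instance_id}"


def proxy_request(stale_view: list[str], live: set[str], max_attempts: int = 3) -> str:
    """Try instances from the edge's possibly stale view, in order,
    until one accepts the request or attempts run out."""
    candidates = deque(stale_view)
    for _ in range(max_attempts):
        if not candidates:
            break
        target = candidates.popleft()
        try:
            return forward(target, live)
        except InternalProblem:
            # Drop this VM from the candidates; the next attempt tries another one.
            continue
    return "error: retries exhausted"


print(proxy_request(["f25e536b", "ba906637"], live={"ba906637"}))
```

This prints `200 OK from ba906637`. With only the stopped VM in the edge’s view, the same call returns `error: retries exhausted`, which is the case where an error actually reaches the client.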

We’re working on speeding up service discovery. You are unlikely to see these errors when we get our new proxies rolled out everywhere. It’s getting close!