When the instance ID you return in a Fly-Replay header is valid but the instance doesn’t exist, the connection hangs. It would be great if the connection either closed immediately or returned an HTTP error, so that it can be handled client-side.
My personal use case: if the instance is not found, either because it’s been replaced or because it never existed in the first place, I want to tell the user and offer to connect them to another (random) instance.
Sidenote: through experimentation, I found that 3 types of IDs are considered valid (or at least, these can hang silently, whereas other strings give you errors in the logs):
8-digit hex strings (what you find in the dashboard, i.e. the beginning of $FLY_ALLOC_ID, which is not made explicit in the docs)
I found a workaround: before sending a Fly-Replay header, I issue a DNS query (<alloc_id>.vm.<appname>.internal) to check whether the instance exists, and if it doesn’t, I return a 404.
That said, even though the window is short, there is no guarantee the instance won’t disappear between the DNS query and my response. It would be great if there was a real fix for this.
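For anyone who wants to try the same workaround, here is a rough sketch in Python of the DNS pre-check. The function names, the `resolve` parameter (there only so the lookup can be stubbed out in tests) and the 200 status are my own placeholders, not anything from Fly. It assumes the code runs inside the app's private network, where `.internal` names resolve to IPv6 addresses, and it has the same race described above.

```python
import socket


def instance_exists(alloc_id, app_name, resolve=socket.getaddrinfo):
    """Return True if <alloc_id>.vm.<app_name>.internal currently resolves."""
    host = f"{alloc_id}.vm.{app_name}.internal"
    try:
        # .internal names are only resolvable from inside the private network.
        return len(resolve(host, None, socket.AF_INET6)) > 0
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the name
        # does not resolve; treat that as "instance not found".
        return False


def replay_or_404(alloc_id, app_name, resolve=socket.getaddrinfo):
    """Return a (status, headers) pair: a replay header, or a 404 if the
    instance is not found in DNS. The 200 status is a placeholder."""
    if not instance_exists(alloc_id, app_name, resolve):
        return 404, {}
    return 200, {"fly-replay": f"instance={alloc_id}"}
```

In a real handler you would call `replay_or_404(requested_id, os.environ["FLY_APP_NAME"])` and copy the returned status and headers into the response. Note that the lookup blocks, so in an async server it should run in a thread or use an async resolver.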
(I really like Fly-Replay. It’s ideal for game servers.)
We know this is a bit of a pain point. For reasons related to how our networking layer is implemented, there isn’t a simple fix. At the very least, we are working on a way to provide more feedback, in the form of logs, about what is happening when fly-replay appears to hang.