Fly-Replay hangs

When the instance ID you return in a Fly-Replay header is valid, but the instance doesn’t exist, the connection hangs. It would be great if the connection would either close immediately, or return an HTTP error, so it can be handled client-side.

My personal use case: if the instance is not found, either because it’s been replaced or because it never existed in the first place, I want to communicate it to the user and propose to connect to another (random) instance.

Sidenote: through experimentation, I found that 3 types of IDs are considered valid (or at least, these can hang silently, other strings give you errors in the logs):

  • 8 digits hex strings (what you can find in the dashboard, aka the beginning of $FLY_ALLOC_ID - which is not explicited in the docs)
  • 12 digits hex strings (which correspond to machine IDs, as in this official blog post)
  • UUIDs (but you cannot send $FLY_ALLOC_ID, you really need to trim it to the first 8 characters. That confused me for a good while)

I found a workaround: I will issue a DNS query (<alloc_id>.vm.<appname>.internal) before I send a Fly-Replay header, to check if the instance exists, and if not I’ll return a 404.

That said, it’s a short amount of time, but there is no guarantee the instance won’t disappear between the DNS query and my response. It would be great if there was a real fix for this.

(I really like Fly-Replay. :slightly_smiling_face: It’s ideal for game servers.)

We know this is a bit of a pain point. For reasons related to how our networking layer is implemented there isn’t a simple fix. We are working on a way to at the very least provide more feedback in the form of logs about what is happening when you see fly-replay appear to hang.

Nice, thank you. :slightly_smiling_face:

So I guess my workaround is the way to go. Is it reasonable to do so many DNS queries, do I risk being rate-limited?

You should be fine! If you run into any problems with it let me know here.

1 Like