You’ve done the things that should work (the IPv6/listen setup, and making the request to the internal port, both of which are easy to miss).
And you say it is working … sometimes.
So the question is why it sometimes doesn’t resolve. A few thoughts:
- Is your base image Alpine-based? I used to keep getting random DNS errors with Node on Alpine, and so have other people. As far as I know, nobody has figured out why. I switched to a -slim image and the DNS issues went away.
- Is it correct that you have two different processes listening on the same port and protocol? In case there is a conflict or race, I’d change one of the ports or temporarily remove one of the processes entirely. Get e.g. your /graphql endpoint working consistently first, then add the webhook back in. If that breaks it, that would prove the webhook was the cause.
- If none of that helps, experiment with the other hostnames available on the private network. Do any of those work any better? It shouldn’t be needed, but by this point it is guesswork. See: Private Networking
- If still no luck, debug which hostnames the app can actually resolve. My example uses Fastify but the Express version will be similar. Check the bit starting `import dns` at the end of the long post: (Node) How do I connect to my API app from my WEB app in Fly? - #10 by greg. Call that from e.g. different regions to test your hunch that a new VM, or a VM in a certain region, is the culprit. If the DNS can’t resolve, that would explain why it could not connect.
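
For reference, here is a minimal sketch of that kind of DNS check. It is not the code from the linked post, just my rough idea of it. It uses Node’s built-in `dns.promises.resolve6` (private addresses on Fly are IPv6) and records either the addresses or the error code for each hostname. The `checkHosts` name and the injectable `resolve6` parameter are my own, and it uses `require` so it runs as a plain script; swap in `import dns from "node:dns"` if your app uses ES modules.

```javascript
// Sketch only: checkHosts is a made-up helper name, not from the linked post.
const dns = require("node:dns");

// Resolve each hostname to its AAAA (IPv6) records and report either the
// addresses or the DNS error code, plus how long the lookup took.
// `resolve6` can be swapped out so the function can be tried without a network.
async function checkHosts(hostnames, resolve6 = (host) => dns.promises.resolve6(host)) {
  const results = [];
  for (const host of hostnames) {
    const started = Date.now();
    try {
      const addresses = await resolve6(host);
      results.push({ host, ok: true, addresses, ms: Date.now() - started });
    } catch (err) {
      // err.code is e.g. "ENOTFOUND" or "ETIMEOUT" when resolution fails
      results.push({ host, ok: false, error: err.code || err.message, ms: Date.now() - started });
    }
  }
  return results;
}
```

You could return this from a temporary debug route, e.g. `res.json(await checkHosts(["my-api.internal", "top1.nearest.of.my-api.internal"]))`, where `my-api` is a placeholder for your API app’s name. Including `process.env.FLY_REGION` in the response should help tell apart which region answered.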