Internal routing flakiness (getaddrinfo ENOTFOUND)

We have been observing flakiness when making TCP requests between regional instances of the same app.

Log snippet:

2022-09-04T13:50:37.030 app[24d8907eb49587] mia [info] store unsubscribeUser {"stateId":"XXXX","userId":"XXXX"}
2022-09-04T13:51:01.945 app[591854ea39d836] ewr [info] AxiosError: getaddrinfo ENOTFOUND 24d8907eb49587.vm.hathora-games-coordinator.internal

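This shows the ewr instance failing to reach the mia instance. Note that getaddrinfo ENOTFOUND is a DNS lookup error, so the request fails while resolving the internal hostname, before any connection is attempted. The request succeeds about 90% of the time and fails with the above error the other 10%, with no pattern we can identify. The mia instance appears to be up and available when the failures occur (it is logging normally less than a minute before the error above).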

Please advise on whether we should be doing something different (such as using the instance's IPv6 address instead of the internal hostname) and on how we can debug this further.
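
In the meantime, one stopgap we could apply on our side is to retry only when the failure is a DNS lookup error. The following is a minimal sketch, not something we run today: the helper name withDnsRetry and its parameters (maxAttempts, baseDelayMs) are hypothetical, and it assumes the thrown error exposes a code property equal to "ENOTFOUND", as the AxiosError in the log above appears to.

```typescript
// Hypothetical helper (not part of Axios or the platform): retries an async
// operation only when it fails with a DNS resolution error (code "ENOTFOUND").
async function withDnsRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Assumption: the error carries a string `code`, as Node and Axios errors do.
      const code = (error as { code?: string } | null)?.code;
      // Rethrow anything that is not a DNS lookup failure, or the final failure.
      if (code !== "ENOTFOUND" || attempt === maxAttempts) {
        throw error;
      }
      // Linear backoff: wait baseDelayMs, then 2 * baseDelayMs, and so on.
      await new Promise<void>((resolve) =>
        setTimeout(resolve, baseDelayMs * attempt),
      );
    }
  }
  // Only reached if maxAttempts is less than 1.
  throw lastError;
}
```

A call site would wrap the existing request, for example withDnsRetry(() => axios.get(url)). This only masks the symptom; it does not explain why the lookup fails in the first place.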
