fly dig is returning 65 IP addresses for one of our apps, which has only one instance (and has only ever had 9 instances in total). Most of these IP addresses do not appear to work, and other Fly services cannot talk to this service as a result.
I have tried restarting and re-deploying the app, but neither fixed the problem. Every other app I have checked on Fly seems fine.
$ fly status
App
Name = codeday-labs-gql
Owner = codeday
Version = 9
Status = running
Hostname = codeday-labs-gql.fly.dev
Deployment Status
ID = f93ceb2d-c9b1-5ea1-9147-976101fd24da
Version = v9
Status = successful
Description = Deployment completed successfully
Instances = 1 desired, 1 placed, 1 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
2b6af384 app 9 lax(B) run running 0 3m13s ago
$ fly dig codeday-labs-gql.internal -a codeday-gql
;; opcode: QUERY, status: NOERROR, id: 12671
;; flags: qr rd; QUERY: 1, ANSWER: 65, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;codeday-labs-gql.internal. IN AAAA
;; ANSWER SECTION:
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:fcd3:b003:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:74c4:6969:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:7191:a733:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:86ad:aa70:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:64fa:4645:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:68e1:b83d:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:2f1a:8960:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:5aed:177a:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:e4b3:dfe:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:15cc:fe26:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:1fec:bcb:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:d494:fbb1:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:e431:1e96:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:15f8:8e2:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:6473:eb20:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:78bd:b793:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:d240:1db:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:c799:526e:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:a132:6e29:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:8d96:f71a:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:7007:6e01:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:9447:29e7:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:6943:94a8:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:4c0:4362:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:14e2:2244:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:531:c744:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:d0ef:f510:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:489b:360:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:637d:29d:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:e392:b38b:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:d0f9:570b:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:e573:55bd:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:a69e:2b0d:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:4c48:6fd5:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:e933:e658:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:eff:e61b:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:c7da:e035:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:a86d:7b82:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:8adb:b8b2:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:863c:bf17:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:50a1:cda7:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:d337:dc84:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:54c2:e32e:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:b142:2d37:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:a52c:4a6a:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:68cf:dada:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:8c69:bbee:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:5c70:fe6c:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:7de2:b971:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:21be:1404:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:5763:2867:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:b9e9:8e72:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:e769:c802:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:f022:f5bf:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:c2f5:1232:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:803e:bac8:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:f42e:dd50:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:3daa:b495:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2d30:b13e:649c:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2c60:6f53:a8e0:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:4516:62da:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:8f4:1254:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:b907:6cac:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:58d1:204a:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:85:2b6a:f384:2
Other DNS names (e.g. top5.nearest.of) also return invalid IPs.
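For reference, the same lookup can be reproduced from inside a VM on the 6PN network using Node's resolver. A minimal TypeScript sketch (the fdaa:0:39fa::3 resolver address is my assumption, derived from our network prefix; check /etc/resolv.conf on the VM to be sure):

// check-dns.ts: count the AAAA records behind an app's .internal name.
// Assumes this runs inside the app's 6PN network; the resolver address
// below is a guess based on our prefix (fdaa:0:39fa::/48).
import { Resolver } from "node:dns/promises";

async function main() {
  const resolver = new Resolver();
  resolver.setServers(["fdaa:0:39fa::3"]); // assumed 6PN resolver address
  const addrs = await resolver.resolve6("codeday-labs-gql.internal");
  console.log(`got ${addrs.length} AAAA record(s)`);
  for (const addr of addrs) console.log(addr);
}

main().catch((err) => {
  console.error("lookup failed:", err);
  process.exit(1);
});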
That’s super weird! I bounced the DNS on the worker you’re deployed on, which looks like it’s cleared that up, but I’m investigating now to see what’s happening there.
Folks, this is still broken for us and you’ve already said this is an internal problem for you. Can someone please take a look at restarting our DNS worker?
Thanks; is there a way to force the app to associate with a new DNS worker without destroying and re-creating the app from scratch?
I’d be surprised if this is anything special about the app – the container is pretty stock Alpine, and the TS code is based on the same template as most of the rest of the apps we’re using on Fly.
Hey! I think we’ve tracked this down (thanks!). We’re getting phantom updates from worker servers we decommissioned. Kurt says we’re going to give up building platforms and do to-do lists instead.
I’m writing some code real quick to scrub these entries off all the servers in our fleet.
Hi @thomas, we’re still having problems with DNS on this instance.
This time it’s returning three DNS entries, and only one resolves:
$ fly dig -a codeday-gql codeday-labs-gql.internal
;; opcode: QUERY, status: NOERROR, id: 27772
;; flags: qr rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;codeday-labs-gql.internal. IN AAAA
;; ANSWER SECTION:
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:ab2:96a9:1fef:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:ab3:7ce8:cec4:2
codeday-labs-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:ab2:ffbd:688a:2
$ fly status
App
Name = codeday-labs-gql
Owner = codeday
Version = 2
Status = running
Hostname = codeday-labs-gql.fly.dev
Deployment Status
ID = bab0171e-e3ad-dacd-e4b4-800e9cf1f1dd
Version = v2
Status = successful
Description = Deployment completed successfully
Instances = 1 desired, 1 placed, 1 healthy, 0 unhealthy
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
ffbd688a app 2 ewr run running 0 2m42s ago
As you can see, I tried deleting and recreating the service, but no luck; it’s still returning 3 entries.
Other resources in our account still return the correct number of entries:
$ fly dig -a codeday-gql codeday-showcase-gql.internal
;; opcode: QUERY, status: NOERROR, id: 49495
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;codeday-showcase-gql.internal. IN AAAA
;; ANSWER SECTION:
codeday-showcase-gql.internal. 5 IN AAAA fdaa:0:39fa:a7b:2dbb:bd7c:1c0a:2
(Please let me know if you need to do anything that will change the IP address of the worker in order to fix this – I had to hard-code the address to keep everything working.)
It seems like the problem is that whenever I do a new deploy, the old addresses stick around in DNS for an indefinite amount of time. (I just kicked off a new deploy, and the old address dropped off about 60 seconds after the old instance shut down.)
I’m not totally sure if that’s what was happening in the post I made this morning, because I hadn’t written down the original IPs. If so, it was definitely worse then, because the IPs were still showing up about 10 minutes after the old instances shut down.
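In case it helps anyone hitting the same thing, a rough client-side stopgap is to try each returned address until one accepts a connection, instead of trusting the first record. An untested sketch (port 4000 is a stand-in for your service port):

// pick-live-addr.ts: try each AAAA record until one accepts a TCP
// connection, as a stopgap against stale DNS entries.
import { Resolver } from "node:dns/promises";
import net from "node:net";

function tryConnect(host: string, port: number, timeoutMs = 1000): Promise<boolean> {
  return new Promise((resolve) => {
    const sock = net.connect({ host, port, family: 6 });
    const done = (ok: boolean) => { sock.destroy(); resolve(ok); };
    sock.setTimeout(timeoutMs, () => done(false));
    sock.once("connect", () => done(true));
    sock.once("error", () => done(false));
  });
}

async function pickLiveAddress(name: string, port: number): Promise<string> {
  const addrs = await new Resolver().resolve6(name);
  for (const addr of addrs) {
    if (await tryConnect(addr, port)) return addr; // first reachable wins
  }
  throw new Error(`no reachable instance among ${addrs.length} record(s)`);
}

pickLiveAddress("codeday-labs-gql.internal", 4000) // 4000 is a placeholder port
  .then((addr) => console.log("connecting to", addr))
  .catch((err) => { console.error(err); process.exit(1); });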
I’m looking across the fleet to see if any other host has spurious entries for this app, and not finding any.
We had a fleet-wide NATS outage yesterday, which I think is what caused this. We’re deploying a new version of the DNS server in the next day or so that will make us more resilient to that problem.
Interested in workarounds: would you suggest the VM restart itself (i.e., the process kills itself) when it expects a functioning 6PN IP but doesn’t find one in its top<n>.nearest.of.<appname>.internal queries, if that will get rid of the incorrect entries?
Looks unlikely to work, since deploys didn’t fix it for the OP.
Or, is there a way to signal the DNS worker to refresh its entries/cache? (edit) For instance, on Android there are APIs to inform the OS of network connectivity issues, which the OS may treat as a signal to tear down the active network and attempt to bring it back up.
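To make the first suggestion concrete, a sketch of what the self-restart could look like: periodically resolve the app’s .internal name and exit if the VM’s own 6PN address is missing, letting the supervisor bring it back up. This assumes the VM can learn its own 6PN address (FLY_PRIVATE_IP here is a placeholder for however you obtain it), and per the reply above it may not help if stale entries survive restarts:

// self-heal.ts: restart the process if our own 6PN address drops out
// of the app's DNS records.
import { Resolver } from "node:dns/promises";

const APP_NAME = "codeday-labs-gql.internal"; // the affected app, for illustration
const selfAddr = process.env.FLY_PRIVATE_IP; // placeholder: however you obtain the VM's own 6PN address

async function check(): Promise<void> {
  if (!selfAddr) return; // can't self-check without knowing our own address
  try {
    const addrs = await new Resolver().resolve6(APP_NAME);
    if (!addrs.includes(selfAddr)) {
      console.error("own 6PN address missing from DNS; exiting so the supervisor restarts us");
      process.exit(1);
    }
  } catch {
    // transient resolver failure; don't kill the process over it
  }
}

setInterval(check, 30_000); // poll every 30 seconds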
I encountered this identical issue with one app. There is only one instance running, but nslookup returns two addresses:
C:\test>nslookup <redactedhostname>.internal
Server: UnKnown
Address: fdaa:0:78e2::3
Name: <redactedhostname>.internal
Addresses: fdaa:0:78e2:a7b:c207:3670:9e29:2 <- some phantom
fdaa:0:78e2:a7b:c207:ee76:bdd0:2 <- this is only running instance
I tried scaling to 0, restarting, and re-deploying, and nothing seems to work.