.internal DNS returns large number of invalid IP addresses

fly dig is returning 65 IP addresses for one of our apps which only has one instance (and has only ever had 9 instances in total). Most of these IP addresses do not appear to work, and other Fly services cannot talk to this service as a result.

I have tried restarting and re-deploying the app, but neither fixed the problem. Every other app I have checked on Fly seems fine.

$ fly status
App
  Name     = codeday-labs-gql          
  Owner    = codeday                   
  Version  = 9                         
  Status   = running                   
  Hostname = codeday-labs-gql.fly.dev  

Deployment Status
  ID          = f93ceb2d-c9b1-5ea1-9147-976101fd24da         
  Version     = v9                                           
  Status      = successful                                   
  Description = Deployment completed successfully            
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy  

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS	RESTARTS	CREATED   
2b6af384	app    	9      	lax(B)	run    	running	             	0       	3m13s ago

$ fly dig codeday-labs-gql.internal -a codeday-gql
;; opcode: QUERY, status: NOERROR, id: 12671
;; flags: qr rd; QUERY: 1, ANSWER: 65, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;codeday-labs-gql.internal.	IN	 AAAA

;; ANSWER SECTION:
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:fcd3:b003:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:74c4:6969:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:7191:a733:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:86ad:aa70:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:64fa:4645:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:68e1:b83d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:2f1a:8960:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:5aed:177a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:e4b3:dfe:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:15cc:fe26:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:1fec:bcb:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:d494:fbb1:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:e431:1e96:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:15f8:8e2:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:6473:eb20:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:78bd:b793:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:d240:1db:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:c799:526e:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:a132:6e29:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:8d96:f71a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:7007:6e01:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:9447:29e7:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:6943:94a8:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:4c0:4362:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:14e2:2244:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:531:c744:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d0ef:f510:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:489b:360:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:637d:29d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e392:b38b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d0f9:570b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e573:55bd:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:a69e:2b0d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:4c48:6fd5:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e933:e658:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:eff:e61b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:c7da:e035:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:a86d:7b82:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:8adb:b8b2:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:863c:bf17:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:50a1:cda7:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d337:dc84:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:54c2:e32e:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:b142:2d37:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:a52c:4a6a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:68cf:dada:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:8c69:bbee:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:5c70:fe6c:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:7de2:b971:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:21be:1404:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:5763:2867:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:b9e9:8e72:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e769:c802:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:f022:f5bf:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:c2f5:1232:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:803e:bac8:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:f42e:dd50:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:3daa:b495:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:b13e:649c:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:6f53:a8e0:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:4516:62da:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:8f4:1254:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:b907:6cac:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:58d1:204a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:85:2b6a:f384:2

Other DNS queries (e.g. top5.nearest.of) also return invalid IPs.
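
For anyone hitting the same symptom, one quick way to see which of the returned addresses actually accept connections is to probe each one over TCP. This is only a minimal Python sketch, not anything provided by Fly: the function names are made up for illustration, the port is whatever your service listens on, and it has to run from a machine that can reach the private network.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def split_reachable(addresses, port, timeout=2.0):
    """Partition addresses into (reachable, unreachable) lists, keeping order."""
    reachable, unreachable = [], []
    for addr in addresses:
        if is_reachable(addr, port, timeout):
            reachable.append(addr)
        else:
            unreachable.append(addr)
    return reachable, unreachable
```

Passing the addresses from the ANSWER SECTION above together with the service port would give a rough live/dead split. A client could use the same idea as a stopgap by trying each address until one connects.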

That’s super weird! I bounced the DNS on the worker you’re deployed on, which looks like it’s cleared that up, but I’m investigating now to see what’s happening there.

@thomas bug is back again, and once again it is returning exactly 65 entries:

$ fly dig -a codeday-labs-gql codeday-labs-gql.internal
;; opcode: QUERY, status: NOERROR, id: 29899
;; flags: qr rd; QUERY: 1, ANSWER: 65, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;codeday-labs-gql.internal.	IN	 AAAA

;; ANSWER SECTION:
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:fcd3:b003:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:74c4:6969:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:7191:a733:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:86ad:aa70:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:64fa:4645:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:68e1:b83d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:2f1a:8960:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:5aed:177a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:e4b3:dfe:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:15cc:fe26:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:1fec:bcb:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:d494:fbb1:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:e431:1e96:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:15f8:8e2:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:6473:eb20:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:78bd:b793:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:d240:1db:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:c799:526e:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:a132:6e29:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:8d96:f71a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:7007:6e01:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:9447:29e7:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:6943:94a8:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:4c0:4362:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:14e2:2244:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:531:c744:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d0ef:f510:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:489b:360:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:637d:29d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e392:b38b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d0f9:570b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e573:55bd:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:a69e:2b0d:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:4c48:6fd5:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e933:e658:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:eff:e61b:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:c7da:e035:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:a86d:7b82:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:8adb:b8b2:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:863c:bf17:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:50a1:cda7:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:d337:dc84:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:54c2:e32e:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:b142:2d37:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:a52c:4a6a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:68cf:dada:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:8c69:bbee:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:5c70:fe6c:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:7de2:b971:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:21be:1404:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:5763:2867:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:b9e9:8e72:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:e769:c802:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:f022:f5bf:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:c2f5:1232:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:803e:bac8:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:f42e:dd50:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:3daa:b495:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2d30:b13e:649c:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2c60:6f53:a8e0:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:4516:62da:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:8f4:1254:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:b907:6cac:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:58d1:204a:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:85:2b6a:f384:2

Folks, this is still broken for us and you’ve already said this is an internal problem for you. Can someone please take a look at restarting our DNS worker?

Kicked it, it looks OK now. We’re investigating whether this is something distinctive about the way your app is running.

Thanks; is there a way to force the app to associate to a new DNS worker without destroying and re-creating the app from scratch?

I’d be surprised if this is anything special about the app – the container is pretty stock Alpine and the TS code is based on the same template as most of the other apps we’re running on Fly.

Hey! I think we’ve tracked this down (thanks!). We’re getting phantom updates from worker servers we decommissioned. Kurt says we’re going to give up building platforms and do to-do lists instead.

I’m writing some code real quick to scrub these entries off all the servers in our fleet.

Glad to hear it (well, other than the to-do lists). Thank you!

Hi @thomas, we’re still having problems with DNS on this instance.

This time it’s returning three DNS entries, and only one resolves:

$ fly dig -a codeday-gql codeday-labs-gql.internal
Update available 0.0.311 -> v0.0.332.
Run "flyctl version update" to upgrade.
;; opcode: QUERY, status: NOERROR, id: 27772
;; flags: qr rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;codeday-labs-gql.internal.	IN	 AAAA

;; ANSWER SECTION:
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:ab2:96a9:1fef:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:ab3:7ce8:cec4:2
codeday-labs-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:ab2:ffbd:688a:2

$ fly status
Update available 0.0.311 -> v0.0.332.
Run "flyctl version update" to upgrade.
App
  Name     = codeday-labs-gql          
  Owner    = codeday                   
  Version  = 2                         
  Status   = running                   
  Hostname = codeday-labs-gql.fly.dev  

Deployment Status
  ID          = bab0171e-e3ad-dacd-e4b4-800e9cf1f1dd         
  Version     = v2                                           
  Status      = successful                                   
  Description = Deployment completed successfully            
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy  

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS	RESTARTS	CREATED   
ffbd688a	app    	2      	ewr   	run    	running	             	0       	2m42s ago

As you can see, I tried deleting and recreating the service, but no luck: it’s still returning 3 entries.
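
One pattern visible in the output pasted in this thread: in both dig results, the address belonging to the running instance has sixth and seventh groups that spell out the instance ID from fly status (2b6a:f384 for 2b6af384 earlier, ffbd:688a for ffbd688a here). That is only an observation from this data, not a documented guarantee, but under that assumption a short Python sketch can separate the likely-live address from the phantoms. The function names are hypothetical.

```python
import ipaddress


def embedded_id(addr: str) -> str:
    """Return groups 6 and 7 of an IPv6 address as 8 hex digits.

    Uses the exploded form so groups with dropped leading zeros
    (e.g. "dfe") are padded back to four digits ("0dfe").
    """
    groups = ipaddress.IPv6Address(addr).exploded.split(":")
    return groups[5] + groups[6]


def split_by_instance(addresses, instance_ids):
    """Partition addresses into (live, stale) based on known instance IDs.

    ASSUMPTION: the instance ID is embedded in the address, as it appears
    to be in this thread's output. Not a documented guarantee.
    """
    ids = {i.lower() for i in instance_ids}
    live = [a for a in addresses if embedded_id(a) in ids]
    stale = [a for a in addresses if embedded_id(a) not in ids]
    return live, stale
```

With the three answers above and the single instance ID ffbd688a, this keeps fdaa:0:39fa:a7b:ab2:ffbd:688a:2 as live and flags the other two as stale.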

Other resources in our account still return the correct number of entries:

$ fly dig -a codeday-gql codeday-showcase-gql.internal
Update available 0.0.311 -> v0.0.332.
Run "flyctl version update" to upgrade.
;; opcode: QUERY, status: NOERROR, id: 49495
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;codeday-showcase-gql.internal.	IN	 AAAA

;; ANSWER SECTION:
codeday-showcase-gql.internal.	5	IN	AAAA	fdaa:0:39fa:a7b:2dbb:bd7c:1c0a:2

(Please let me know if you need to do anything which will change the IP address of the worker in order to fix this – I had to hard-code the address in order to keep everything working)

How is it looking now?

It’s kind of working.

It seems like the problem is that whenever I do a new deploy, the old addresses stick around in DNS for an indefinite amount of time. (I just kicked off a new deploy and it dropped off 60 seconds after the old instance shut down.)

I’m not totally sure if that’s what was happening in the post I made this morning, because I hadn’t written down the original IPs. If so, it was definitely worse then, because the IPs were still showing up about 10 minutes after the old instances shut down.
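
To put a number on how long old addresses linger after a deploy, a small poller can help. The Python sketch below is an illustration under assumptions: `resolve_aaaa` relies on the system resolver (which may cache on its own), the helper names are invented, and the expected set is whatever addresses you believe are current.

```python
import socket
import time


def resolve_aaaa(name: str) -> set:
    """Resolve a name to its set of IPv6 addresses via the system resolver."""
    infos = socket.getaddrinfo(name, None, socket.AF_INET6, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def wait_until_clean(expected, resolve, interval=5.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll resolve() until it returns no addresses outside `expected`.

    Returns the elapsed seconds once the answers are clean, or None if
    unexpected addresses are still present when `timeout` is reached.
    """
    expected = set(expected)
    start = clock()
    while True:
        extra = set(resolve()) - expected
        elapsed = clock() - start
        if not extra:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)
```

Run inside the private network right after a deploy, something like `wait_until_clean({new_ip}, lambda: resolve_aaaa("codeday-labs-gql.internal"))` would report roughly how long the old entries stayed in DNS.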

I’m looking across the fleet to see if any other host has spurious entries for this app, and not finding any.

We had a fleetwide NATS outage yesterday, which I think is what caused this. We’re deploying a new version of the DNS server in the next day or so that will make us more resilient to that problem.

Interested in workarounds: would you suggest that the VM restart itself (the process kills itself) when it expects a functioning 6pn IP but doesn’t find one in its top.n.nearest.of.appname.internal queries, if that would get rid of the incorrect entries?

Looks unlikely since deploys didn’t fix it for OP.

Or is there a way to signal the DNS worker to refresh its entries or cache? (edit) For instance, Android has APIs (ex) that let an app inform the OS of network connectivity issues, which the OS may then treat as a signal to tear down the active network and bring it back up.
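
The detection half of that idea can be sketched in Python. This only reports whether the instance's own address is among the DNS answers and which other addresses came back; it does not fix anything, and as noted above restarts did not help the OP. The function name is hypothetical, and where the instance learns its own 6pn address (for example an environment variable such as FLY_PRIVATE_IP) is an assumption to verify in your own runtime.

```python
import ipaddress


def check_self(own_ip: str, answers):
    """Return (listed, extras) for a set of DNS answers.

    listed: True if own_ip appears among the answers.
    extras: sorted list of the other addresses that were returned.
    Addresses are parsed first so different spellings of the same
    IPv6 address compare equal.
    """
    own = ipaddress.IPv6Address(own_ip)
    parsed = {ipaddress.IPv6Address(a) for a in answers}
    extras = sorted(str(a) for a in parsed if a != own)
    return own in parsed, extras
```

A single-instance app that sees `listed` as False, or a non-empty `extras`, could log the discrepancy or raise an alert rather than restart.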

Hi,

I encountered this identical issue with one app. There is only one instance running but nslookup returns two addresses:

C:\test>nslookup <redactedhostname>.internal
Server:  UnKnown
Address:  fdaa:0:78e2::3

Name:    <redactedhostname>.internal
Addresses:  fdaa:0:78e2:a7b:c207:3670:9e29:2 <- some phantom
          fdaa:0:78e2:a7b:c207:ee76:bdd0:2 <- this is only running instance

I tried scaling to 0, restarting, and re-deploying, but nothing seems to work.

Is there some way to flush these entries?

edit: now it seems to be sorted out