`fly ssh console -s` keeps showing old VMs

This might have been mentioned before but it’s reaaaally annoying :frowning:

I think this is also why `fly ssh console` itself often fails: it tries to connect to one of the VMs from that list, which can be a dead VM.

I can imagine how that would get in the way, thanks for bringing it up.

Do you see the dead VMs when you run `fly status -a staxcloud-staging`, or if you `dig vms.staxcloud-staging.internal @fdaa:0:8efd::3` from your org’s 6PN?

> Do you see the dead VMs when you run `fly status -a staxcloud-staging`

Nope. That’s what I use now: I run `fly status`, pick one of the 3 IDs, and then match it against the instances that show up in `fly ssh console -s`.

That `dig` command gives a timeout.

Ah, my mistake. I think you’d want to `dig TXT` here instead.

What exactly is the notation? :grimacing: Sorry I don’t know much about this topic.

You’ll need both `TXT` and `@fdaa:0:33::3` (a different IP), or: `fly dig -a staxcloud-staging TXT staxcloud-staging.internal`

Not sure if this shows anything interesting :grin:

Oh sorry:

fly dig -a staxcloud-staging TXT vms.staxcloud-staging.internal
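
If you want to check the result against `fly status` without comparing IDs by eye, here is a minimal Python sketch. It assumes the TXT value for `vms.<app>.internal` is a comma-separated list of `<id> <region>` pairs; that format, the function names, and the IDs below are illustrative assumptions, not output copied from this app.

```python
def parse_vms_txt(txt_value):
    """Map instance ID -> region from a vms.<app>.internal TXT value.

    Assumes the value looks like "id1 region1,id2 region2" (an assumption;
    adjust the parsing if your output differs).
    """
    entries = {}
    for item in txt_value.strip().strip('"').split(","):
        parts = item.split()
        if len(parts) == 2:  # skip empty or malformed items
            vm_id, region = parts
            entries[vm_id] = region
    return entries


def stale_ids(txt_value, live_ids):
    """Return IDs that DNS still lists but that are not in the live set."""
    return set(parse_vms_txt(txt_value)) - set(live_ids)


if __name__ == "__main__":
    # Made-up IDs and regions, for illustration only.
    txt = "aaaa1111 ams,bbbb2222 ams,cccc3333 fra"
    live = {"aaaa1111", "bbbb2222"}  # e.g. the IDs you see in `fly status`
    print(sorted(stale_ids(txt, live)))
```

Paste the TXT value and the IDs from `fly status` in by hand; anything the script prints is an entry DNS still lists that `fly status` no longer shows.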

It seems fine. Let me see if I can reproduce it. `fly ssh console -s` is not showing dead VMs right now.

Yes, that seems correct. I expect the list from `fly ssh console -s -a staxcloud-staging` is also correct now?

When there’s a deploy, there’s a short period where, due to the nature of distributed systems, we’re reconciling state and it’s possible for old VMs to show up.

Yeah, alright, I understand the technical difficulty. It would be nice if it weren’t a problem for us, though :slight_smile:

I have a few ideas I can try.

> there’s a short period where, due to the nature of distributed systems

It’s not really a short period, though. I don’t know what you consider short, but I am comparing it to read replicas catching up to write replicas. The dead VMs sometimes show up for up to 5 minutes in `fly ssh console -s`.

A few seconds is what I’d expect.

I dug further and, based on your other post, it looks like you’re connecting to one of our “backup” gateways, which is using an older, slower-to-replicate version of our DNS server.

Can you try `fly wireguard reset` for the org that the app you’re working with is in? This will get you a new WireGuard peer in a primary region.

I’m going to be investigating how you got a peer on that specific gateway so it doesn’t happen again.

I understood half of what you said, but I ran `fly wireguard reset` for the organisation :grin: Thanks for looking into it.

Yep, took like 2 secs today for (machine) vm list to get up to date, for me.

And this has been the case for quite some time now: Does stopped VMs incur costs? - #3 by ignoramous