`flyctl ssh console` intermittently times out

julia · August 2, 2022, 7:09pm

About half the time I run flyctl ssh console, it times out:

$ flyctl ssh console
Connecting to tunnel ⣽ Error tunnel unavailable: failed probing "personal": read tcp [fdaa:0:bff:a7b:1221:0:a:0]:43484->[fdaa:0:bff::3]:53: i/o timeout

It’s also slow in general – flyctl ssh console -c 'ls' takes about 5 seconds.

I have a workaround so this isn’t a big issue – just using plain ssh (ssh -o "StrictHostKeyChecking=no" root@mess-with-dns.internal) is working every time and is a lot faster (maybe 500ms instead of 5s).

tj1 · August 2, 2022, 7:57pm

I’m curious, how fast is fly ssh console -s, where you have to select the actual instance?

julia · August 2, 2022, 8:03pm

it’s hard to tell because the variance is really high, it takes between 1.8 and 10 seconds.
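
With variance this high, a single run says little, so it can help to time the same command several times and compare min, median and max. The following is a minimal sketch, not part of flyctl: `time_command` and its parameters are hypothetical names, and the command to time is whatever you pass in.

```python
import statistics
import subprocess
import time


def time_command(cmd, runs=10, timeout=30.0):
    """Run `cmd` `runs` times and summarize wall-clock time in seconds.

    Failed or timed-out runs are still timed, and counted in "failures".
    """
    samples = []
    failures = 0
    for _ in range(runs):
        start = time.perf_counter()
        try:
            result = subprocess.run(
                cmd,
                stdin=subprocess.DEVNULL,   # never wait for keyboard input
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            ok = result.returncode == 0
        except subprocess.TimeoutExpired:
            ok = False
        samples.append(time.perf_counter() - start)
        if not ok:
            failures += 1
    return {
        "samples": samples,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "failures": failures,
    }
```

Passing the argument list for the flyctl command and then the one for the plain ssh command gives two summaries that can be compared directly.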

rahmatjunaid · August 2, 2022, 8:04pm

Just to see where this extra time is going can you run LOG_LEVEL=debug fly ssh console and paste the logs that you see?

julia · August 2, 2022, 8:08pm

Here’s the output:

DEBUG Loaded flyctl config from /home/bork/.fly/config.yml
DEBUG determined hostname: "kiwi"
DEBUG determined working directory: "/home/bork/work/mess-with-dns"
DEBUG determined user home directory: "/home/bork"
DEBUG determined config directory: "/home/bork/.fly"
DEBUG ensured config directory exists.
DEBUG ensured config directory perms.
DEBUG cache loaded.
DEBUG config initialized.
DEBUG initialized task manager.
DEBUG skipped querying for new release
DEBUG client initialized.
DEBUG --> POST https://api.fly.io/graphql

{
  "query": "query ($appName: String!) { appbasic:app(name: $appName) { id name platformVersion organization { id slug } } }",
  "variables": {
    "appName": "mess-with-dns"
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (2.78s)

{
  "data": {
    "appbasic": {
      "id": "mess-with-dns",
      "name": "mess-with-dns",
      "platformVersion": "nomad",
      "organization": {
        "id": "aaV5JD7y9pVvoTGeGQLvZ4RLvqiOee",
        "slug": "personal"
      }
    }
  }
}
DEBUG app config loaded from /home/bork/work/mess-with-dns/fly.toml
DEBUG Retrieving app info for mess-with-dns
DEBUG --> POST https://api.fly.io/graphql

{
  "query": "query ($appName: String!) { appcompact:app(name: $appName) { id name hostname deployed status appUrl platformVersion organization { id slug } } }",
  "variables": {
    "appName": "mess-with-dns"
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (83.96ms)

{
  "data": {
    "appcompact": {
      "id": "mess-with-dns",
      "name": "mess-with-dns",
      "hostname": "mess-with-dns.fly.dev",
      "deployed": true,
      "appUrl": "https://213.188.214.254",
      "platformVersion": "nomad",
      "organization": {
        "id": "aaV5JD7y9pVvoTGeGQLvZ4RLvqiOee",
        "slug": "personal"
      },
      "status": "running"
    }
  }
}
DEBUG --> POST https://api.fly.io/graphql

{
  "query": "mutation($input: ValidateWireGuardPeersInput!) { validateWireGuardPeers(input: $input) { invalidPeerIps } }",
  "variables": {
    "input": {
      "peerIps": [
        "fdaa:0:bff:a7b:1221:0:a:2"
      ]
    }
  }
}

DEBUG {}
DEBUG <-- 200 https://api.fly.io/graphql (62.09ms)

{
  "data": {
    "validateWireGuardPeers": {
      "invalidPeerIps": []
    }
  }
}
Connecting to tunnel ⣽ Error tunnel unavailable: failed probing "personal": read tcp [fdaa:0:bff:a7b:1221:0:a:0]:43626->[fdaa:0:bff::3]:53: i/o timeout

rahmatjunaid · August 2, 2022, 8:16pm

Thanks @julia, this is looking like an issue related to your wireguard peer.

Can you run fly doctor and paste the results?

This will help pinpoint it. You might need to create a new wireguard peer connection with flyctl wireguard create.

julia · August 2, 2022, 8:21pm

$ fly doctor
Testing authentication token... PASSED
Testing flyctl agent... PASSED
Testing local Docker instance... PASSED
Pinging WireGuard gateway (give us a sec)... PASSED

rahmatjunaid · August 2, 2022, 8:28pm

Um that’s interesting.

Can you try creating a new wireguard peer and then rerun the fly ssh console command

julia · August 2, 2022, 8:30pm

How do I do that?

thomas · August 2, 2022, 8:31pm

This is definitely the problem, and it’s presumably something on our side.

You can force us to create a new peer for you by running flyctl wireguard reset. I’m poking around now.

rahmatjunaid · August 2, 2022, 8:52pm

Sorry didn’t realise some of my message was missing, it was supposed to say:

Can you try creating a new wireguard peer with flyctl wireguard reset and then rerun the fly ssh console command

julia · August 2, 2022, 9:13pm

Resetting it seems to have fixed the problem, thanks!

thomas · August 2, 2022, 9:15pm

Hrm. Curious. I’ve got enough info from your debug dump to do some hunting, but yeah, for future reference: your “interactive” WireGuard peers (the ones flyctl makes for you; they all have interactive in the name) are effectively disposable; if you delete them, flyctl (and flyctl agent) will notice and just make a new one for you. So if you’re seeing WireGuard-related wonkiness, you can always just flyctl wireguard reset to shake off the misbehaving peer connection.
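
The "reset and try again" advice can be wrapped in a small script. This is a rough sketch, not an official tool: `run_with_reset` is a hypothetical helper, and it assumes `flyctl wireguard reset` (the command named in this thread) completes without needing interactive input. If it prompts on your setup, run the reset by hand instead.

```python
import subprocess
import sys

# Reset command as given in this thread. Assumption: it runs non-interactively.
RESET_CMD = ["flyctl", "wireguard", "reset"]


def run_with_reset(cmd, reset_cmd=RESET_CMD, run=subprocess.run):
    """Run `cmd`; if it exits non-zero, reset the WireGuard peer once and retry.

    Returns the exit code of the last command that was run.
    """
    first = run(cmd)
    if first.returncode == 0:
        return 0
    print("command failed; resetting WireGuard peer and retrying", file=sys.stderr)
    reset = run(reset_cmd)
    if reset.returncode != 0:
        return reset.returncode
    return run(cmd).returncode
```

One caveat: the wrapper cannot tell a tunnel failure from any other non-zero exit, so it suits one-off commands better than long interactive sessions.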

But of course, this shouldn’t be happening in the first place!

julia · August 2, 2022, 9:22pm

For my mental model: are the WireGuard peers not used when I do ssh root@mess-with-dns.internal? (I’m a bit confused about why one way of sshing worked and the other way didn’t)

thomas · August 2, 2022, 9:27pm

It’s a good question. If you can use native ssh, a la ssh root@mess-with-dns.internal, you’ve got a “static” WireGuard peer set up that you created explicitly with flyctl wireguard create, added to your host WireGuard, and set up the DNS for. Presumably, you either have that WireGuard connection always-on, or explicitly turn it on before working with stuff in your organization.

When you use flyctl ssh console, we run WireGuard for you, in userland, behind the scenes (along with a complete TCP/IP stack). We keep those WireGuard connections in the flyctl agent, which is just a program that runs in the background that tries to keep WireGuard peers available and shareable across different invocations of flyctl.

So when you’re using flyctl ssh console, you’re asking the flyctl agent to enable the WireGuard peer (creating it if it isn’t already there), then probe it to see if it’s live (we do a trial DNS query across it to make sure it’s working), and only then make the actual 22/tcp SSH connection.
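
To make the "trial DNS query" concrete, here is a minimal sketch of a DNS-over-TCP probe. It is not flyctl's code. The function names, the AAAA query type, the fixed query ID and the 2-second timeout are all assumptions. A script like this can only reach the DNS server when a host-level WireGuard tunnel is up, since the agent's tunnel lives inside its own userland network stack.

```python
import socket
import struct


def build_query(name, qtype=28, query_id=0x1234):
    """Build a minimal DNS query for `name` (qtype 28 is AAAA)."""
    # Header: ID, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class IN


def frame_tcp(message):
    """DNS over TCP prefixes each message with a 2-byte length (RFC 1035, 4.2.2)."""
    return struct.pack("!H", len(message)) + message


def _read_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-response")
        buf += chunk
    return buf


def probe(server, name, timeout=2.0):
    """Return True if `server` answers a DNS query for `name` within `timeout`."""
    query = build_query(name)
    try:
        with socket.create_connection((server, 53), timeout=timeout) as sock:
            sock.sendall(frame_tcp(query))
            (length,) = struct.unpack("!H", _read_exact(sock, 2))
            reply = _read_exact(sock, length)
    except OSError:  # covers timeouts, refused connections, unreachable hosts
        return False
    # A live resolver echoes the query ID back in the first two bytes.
    return len(reply) >= 2 and reply[:2] == query[:2]
```

With the values from this thread, the call would be `probe("fdaa:0:bff::3", "mess-with-dns.internal")`. A timeout there matches the `i/o timeout` on port 53 in the error message above.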

If flyctl agent’s WireGuard probe fails, we start over from the top, creating a new WireGuard peer (which will add a couple seconds of latency as we orchestrate the peer in our backend), probing it, and only then make the new connection.
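
The sequence described here (enable the peer, probe it, recreate it on failure, then connect) can be modeled in a few lines. This is a sketch of the control flow only, not the agent's implementation. The four callables are hypothetical stand-ins, and the limit of one recreation is an assumption.

```python
class TunnelUnavailable(Exception):
    """Raised when the peer still fails its probe after being recreated."""


def connect_via_agent(ensure_peer, probe, recreate_peer, connect, max_recreates=1):
    """Probe the peer before connecting; recreate it if the probe fails.

    ensure_peer()   -> peer   reuse the existing peer or create one
    probe(peer)     -> bool   trial DNS query across the tunnel
    recreate_peer() -> peer   discard the peer and create a fresh one
    connect(peer)   -> any    open the actual SSH connection
    """
    peer = ensure_peer()
    recreates = 0
    while not probe(peer):
        if recreates >= max_recreates:
            raise TunnelUnavailable("probe failed after recreating the peer")
        peer = recreate_peer()  # the slow step: the backend sets up a new peer
        recreates += 1
    return connect(peer)
```

In this model the SSH connection is never attempted over a peer that has not just passed a probe, which is why a stale peer shows up as extra latency or a probe timeout rather than a hung SSH session.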

The advantage to flyctl ssh console is that you don’t need root to set it up, and don’t have to change any configuration on your dev machine. But native WireGuard will always be faster, and probably? more reliable, though flyctl ssh should always eventually work.

julia · August 2, 2022, 9:32pm

that mostly makes sense, thanks!

dedolence · October 11, 2022, 7:16pm

this has also worked for me. cheers.

edit: worked, yes! however now it appears i need to reset wireguard every time i access the console.
edit edit: maybe not??? idk, ignore me!!
