To add another datapoint: It works fine for me using a minimal Dockerfile app (see below).
(Aside: I believe flyctl ssh issue --agent is only necessary if you want to use ssh directly; since you are using flyctl ssh console it shouldn’t be necessary.)
Full steps:
mkdir app && cd app
cat > Dockerfile <<"EOF"
FROM alpine
CMD sleep 1000000
EOF
fly launch # accept defaults
fly deploy --detach # we skip monitoring because the health checks will fail
# wait a couple of seconds for the VM to finish starting up
fly ssh console # no need for "-a appname" thanks to generated fly.toml
Output:
Connecting to fdaa:xxx:2... complete
/ #
Update: This is an unrelated issue, but I am getting weird errors when using fly ssh console -C. Sometimes it works but often it doesn’t (the output of the remote command isn’t displayed):
$ fly ssh console -C id
Connecting to fdaa:xxx:2... complete
uid=0(root) gid=0(root)
$ fly ssh console -C id
Connecting to fdaa:xxx:2... complete
$ fly ssh console -C id
Connecting to fdaa:xxx:2... complete
$ fly version
flyctl v0.0.450 linux/amd64 Commit: 51325e4a BuildDate: 2023-01-13T21:57:29Z
However, fly ssh console (without -C) always works for me.
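Until the cause is found, one possible workaround for the missing output is to re-run the command until it prints something. The sketch below is a hypothetical helper, not part of flyctl (the name retry_until_output and its arguments are made up for illustration). It assumes that a failed run prints nothing except the "Connecting to ..." banner, which matches what I see above but may not hold in every case.

```shell
# retry_until_output MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not part of flyctl): re-runs COMMAND until its stdout
# contains at least one line other than the "Connecting to ..." banner.
retry_until_output() {
  max_tries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_tries" ]; do
    # Capture stdout and drop the banner so only the remote output remains.
    # "|| true" keeps going when grep finds nothing to print.
    output=$("$@" | grep -v '^Connecting to ' || true)
    if [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
    # Pause before the next attempt, but not after the last one.
    if [ "$attempt" -lt "$max_tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  echo "no output after $max_tries attempts" >&2
  return 1
}
```

For example, `retry_until_output 5 2 fly ssh console -C id` would try up to five times with a two-second pause between attempts. This only hides the symptom, and it is not suitable for remote commands that have side effects or that legitimately print nothing.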
Several months ago a similar issue was described here. That thread mentions that the logs (flyctl logs) contained some more information, and that SSH worked after completely removing the app and then launching and deploying it again.
There was also another similar issue a couple of months ago (SSH Handshake Failed), which was resolved internally.
The output of fly logs might be useful (especially if it has any lines like “unexpected error: transient SSH server error: can’t resolve _orgcert.internal”), and it might also be useful to know which region your app is deployed in.
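To make those lines easier to spot, the log stream can be piped through a small filter. This is only a sketch: orgcert_errors is a hypothetical helper name, and the labels are based on the "transient SSH server error" wording quoted above plus the "durable SSH server error" wording that has also been reported for this record. Other wordings may exist.

```shell
# orgcert_errors: hypothetical filter (not part of flyctl).
# Reads log lines on stdin, keeps only those that mention the
# _orgcert.internal record, and prefixes each with the kind of error.
orgcert_errors() {
  awk '
    !/_orgcert\.internal/        { next }
    /transient SSH server error/ { print "transient: " $0; next }
    /durable SSH server error/   { print "durable: " $0; next }
                                 { print "other: " $0 }
  '
}
```

Usage: `fly logs | orgcert_errors`. Since fly logs keeps streaming, press Ctrl-C once you have tried to connect and seen (or not seen) a match.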
Since a minimal app works, you could try deploying a copy of your app to try to identify if the problem is related to your app. (The nuclear option is to completely remove the app then launch+deploy it again, but beware of data loss!)
As a data point: I got the same error (with an application in LHR, scale count of 2) on my first attempt to connect. I re-ran the command and the connection succeeded, as have all subsequent retries. I did not make any changes to my app; I just waited the time it took me to search this forum for the error message, about 2 minutes.
1st try:
fly ssh console
Connecting to [redacted]... complete
Error error connecting to SSH server: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
2nd try:
fly ssh console
Connecting to [redacted]... complete
/ #
So, if it is the DNS issue linked by @tom93 above, it looks like it’s (intermittently?) impacting LHR too. Maybe something for @kurt to confirm.
edit for logs:
2023-01-27T13:07:03Z app[b5c698f6] lhr [info]2023/01/27 13:07:03 unexpected error: durable SSH server error: malformed _orgcert.internal record: Invalid public key format
2023-01-27T13:07:03Z app[b5c698f6] lhr [info]2023/01/27 13:07:03 unexpected error: [ssh: no auth passed yet, durable SSH server error: malformed _orgcert.internal record: Invalid public key format]