Flyctl agent won't start

I run flyctl postgres db list <name of db> and I get Error can't establish agent: agent: failed to start. I then run flyctl doctor and I get:

Testing authentication token... PASSED
Testing flyctl agent... FAILED
(Error: couldn't ping agent: agent not running)

Can't communicate with flyctl's background agent.

Run 'flyctl agent restart'.

So I run flyctl agent restart and now I get:

Error failed establishing connection to agent: agent: failed to start


The agent failed to start with the following error log:

2022/06/02 09:19:24.450633 srv another instance of the agent is already running

When I run ps aux | grep "flyctl agent", nothing is returned, though.

This is debilitating because it stops me from running anything else.
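
One possible reason the grep comes back empty is that the agent was started under the shorter fly name, so its command line would not contain the exact string "flyctl agent". That is an assumption, not something confirmed in this thread. A slightly wider search might look like the sketch below; find_agent_procs is a helper name made up for this example.

```shell
# Filter `ps` output (read from stdin) for agent processes started as
# either `flyctl agent ...` or `fly agent ...`.
# ASSUMPTION: the agent may have been launched via the `fly` name.
find_agent_procs() {
  grep -E '(^|[ /])fly(ctl)? agent' | grep -v grep
}

# Show any matching processes, or say so if there are none.
ps aux | find_agent_procs || echo "no agent process found"
```

If this prints a process, the "another instance of the agent is already running" message has a likely culprit. If it prints nothing, the stale state is probably somewhere other than a live process.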

There should be some logs in ~/.fly/agent-logs/. Find the one with the latest timestamp and check its contents. There may be a hint as to what’s causing the problem. Did this just start happening suddenly? Also, which version of flyctl?
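
A quick way to grab the newest log, as a rough sketch. It assumes the logs live in ~/.fly/agent-logs/ as mentioned above; latest_agent_log is a helper name invented for this example, and the 40-line tail is an arbitrary choice.

```shell
# Print the full path of the most recently modified file in a directory.
# Defaults to the agent log directory mentioned above; pass another as $1.
latest_agent_log() {
  dir="${1:-$HOME/.fly/agent-logs}"
  # ls -t sorts by modification time, newest first
  newest=$(ls -t "$dir" 2>/dev/null | head -n 1) || true
  if [ -n "$newest" ]; then
    printf '%s/%s\n' "$dir" "$newest"
  fi
}

log=$(latest_agent_log)
if [ -n "$log" ]; then
  echo "Newest agent log: $log"
  tail -n 40 "$log"   # the end of the file is usually the interesting part
else
  echo "No agent logs found"
fi
```

Running flyctl version alongside this answers the version question too.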

It happened when I tried to do postgres attach. I did a bunch of stuff and then (mysteriously) there was an agent that I could kill, so I killed it and now I can run everything again. Honestly, I have no idea how to describe this in a bug report.

This happened to me too. Some commands (e.g. flyctl m list -a <app-name>) wouldn't work, and flyctl doctor got stuck trying to test the agent.

The logs in the ~/.fly/agent-logs/ dir said "connected" and did not seem to point to any errors or exceptions.

logs
# some sort of config change?
2022/09/10 02:33:57.264750 #55 connected...                                    
2022/09/10 02:33:57.264879 srv config change at: 2022-09-10 02:33:57.2598036 +0530 IST
# this is the first and only "dropped" log in the file
2022/09/10 02:33:57.264984 #55 dropped.                                         
2022/09/10 02:33:57.265169 #56 connected ...                          
# more changes?          
2022/09/10 02:33:57.265263 #56 <- (   20) "reestablish personal"                
2022/09/10 02:34:01.543847 #56 -> (  740) "\xe2\x02ok {\"WireGuardState\":{\"org\":\"personal\",\"name\":\"interactive-agent-[redacted]-01GCJ120DVT05DRH336XECMHMX\",\"region\":\"maa\",\"localPrivateKey\":\"[redacted]\",\"localpublic\":\"O5ToEz7fwQuMxaGy5O2/8aRbgoCc4JLNLzu1je64lPY=\",\"dns\":\"\",\"peer\":{\"peerip\":\"fdaa:0:7161:a7b:936e:0:a:202\",\"endpointip\":\"maa2.gateway.6pn.dev\",\"pubkey\":\"RsHJtmGgM6dAO+Fzqr42ruQHWUXtRX7a4jSb/g2d+FU=\"}},\"TunnelConfig\":{\"LocalPrivateKey\":\"[redacted]\",\"LocalNetwork\":\"fdaa:0:7161:a7b:936e:0:a:200/120\",\"RemotePublicKey\":\"RsHJtmGgM6dAO+Fzqr42ruQHWUXtRX7a4jSb/g2d+FU=\",\"RemoteNetwork\":\"fdaa:0:7161::/48\",\"Endpoint\":\"maa2.gateway.6pn.dev:51820\",\"DNS\":\"fdaa:0:7161::3\",\"KeepAlive\":0,\"MTU\":0,\"LogLevel\":0}}\n"
2022/09/10 02:34:01.543908 #56 dropped.                                         
# again, no such "validated" logs in the file apart from these:
2022/09/10 02:35:21.696366 srv validated wireguard peers                        
2022/09/10 02:37:22.135545 srv validated wireguard peers                        
2022/09/10 02:39:22.770927 srv no peer for personal in config - closing tunnel ...
# problem started here, says connected but ...
2022/09/17 02:33:03.718196 #57 connected ...                                    
2022/09/17 02:33:56.216224 #58 connected ...                                    
2022/09/17 02:35:03.859710 #59 connected ...                                    
2022/09/17 02:35:20.388677 #5a connected ...                                    
2022/09/17 02:42:55.763702 #5b connected ...                                    
2022/09/17 02:43:02.672888 #5c connected ...                                    
2022/09/17 02:43:50.339447 #5d connected ...                                    
2022/09/17 02:44:02.040331 #5e connected ...                                    
2022/09/17 02:44:33.292453 #5f connected ...                                    
2022/09/17 02:46:09.419835 #60 connected ...                                    
2022/09/17 02:46:22.865566 #61 connected ...                                    
2022/09/17 02:47:16.754605 #62 connected ...                                    
2022/09/17 02:48:25.993027 #63 connected ...                                    
2022/09/17 02:48:43.473520 #64 connected ...                                    
2022/09/17 02:49:50.773692 #65 connected ...                                    
2022/09/17 02:50:04.269823 #66 connected ...                                    
2022/09/17 02:50:39.416825 #67 connected ...                                    
2022/09/17 20:04:19.054925 #68 connected ...                                    
2022/09/17 20:05:35.966777 #69 connected ...                                    
2022/09/17 20:05:48.564900 #6a connected ...                                    
2022/09/17 20:06:41.611613 #6b connected ...                                    
# after killing the agent
2022/09/17 20:11:52.861372 srv shutting down ...                                
2022/09/17 20:11:52.861626 srv QUIT    

# things start working from here on (no logs)                                      

What did the trick for me was to killall flyctl and kill -3 <pid-of-flyctl-agent-run>
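
For anyone who wants that as something repeatable, here is a minimal sketch of the same idea. agent_pids and quit_agents are names made up for this example, and matching on "fly agent" or "flyctl agent" in the ps output is an assumption about how the process shows up on your machine.

```shell
# Read `ps aux` output on stdin and print the PID (second column)
# of every line that looks like a flyctl agent process.
agent_pids() {
  grep -E '(^|[ /])fly(ctl)? agent' | grep -v grep | awk '{ print $2 }'
}

# Send SIGQUIT (signal 3, same as the kill -3 above) to each agent found.
# Nothing is killed until you actually call this function.
quit_agents() {
  for pid in $(ps aux | agent_pids); do
    echo "sending SIGQUIT to $pid"
    kill -3 "$pid"
  done
}
```

Call quit_agents, then run flyctl agent restart and flyctl doctor to check whether the agent test passes again.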