Hi, we’ve got a long-lived app (last deployed about 8 months ago) that appears to have recently gone dead. flyctl restart doesn’t appear to do anything and flyctl logs are completely silent. Can someone have a look?
sure! would you mind including some additional info to help narrow this down a bit?
- what does fly status --all say?
- have you tried running fly logs with LOG_LEVEL=debug?
- do you have a time estimate for when this app stopped working? (timestamps, minutes, hours, etc.)
Sure, though I’d rather not post all of the requested info in public. Can we take this to DM or similar?
We believe the app was working as recently as last week, though we check it infrequently and can’t be certain.
completely understandable – feel free to redact any info from that output that you’re not comfortable sharing!
hey, just wanted to chime in here with a few follow-up tips:
There are a few things in flyctl you can use to inspect the behavior of individual VMs. This might turn up more information to help you debug!
For example, fly logs -i will show you output from a specific recent instance. This can come in handy in concert with fly status --all, whose display includes completed instances. You can also always use fly vm status <id>, which displays things like exit codes, health checks, etc.
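As a rough sketch of how those three commands fit together: the helper name inspect_instance below is made up for illustration (it is not part of flyctl), and it assumes you run it from the app's directory so flyctl can pick up fly.toml.

```shell
# Hypothetical helper; "inspect_instance" is an illustrative name, not a flyctl command.
# It only chains the three commands mentioned above.
inspect_instance() {
    id="$1"
    if [ -z "$id" ]; then
        echo "usage: inspect_instance <instance-id>" >&2
        return 1
    fi
    # List current and completed instances (this is where instance IDs come from).
    fly status --all
    # Show exit codes, health checks and restarts for the chosen instance.
    fly vm status "$id"
    # Show log output from that instance only. This goes last because
    # fly logs may keep streaming until interrupted.
    fly logs -i "$id"
}
```

You would call it as inspect_instance <id>, using an ID taken from the Instances table of fly status --all.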
Hopefully this is still helpful in its fully-redacted state.
Unfortunately, no instances (previous or current) are shown:
flyctl status --all
Update available 0.0.328 -> v0.0.330.
Run "flyctl version update" to upgrade.
App
Name = appname
Owner = orgname
Version = 37
Status = dead
Hostname = appname.fly.dev
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
Here’s the log debug output:
LOG_LEVEL=debug flyctl logs
DEBUG Loaded flyctl config from/Users/username/.fly/config.yml
DEBUG determined hostname: "hostname"
DEBUG determined working directory: "/Users/username/git/appname"
DEBUG determined user home directory: "/Users/username"
DEBUG determined config directory: "/Users/username/.fly"
DEBUG ensured config directory exists.
DEBUG ensured config directory perms.
DEBUG cache loaded.
DEBUG config initialized.
DEBUG initialized task manager.
DEBUG skipped querying for new release
Update available 0.0.328 -> v0.0.330.
Run "flyctl version update" to upgrade.
DEBUG client initialized.
DEBUG app config loaded from /Users/username/git/appname/fly.toml
DEBUG --> POST https://api.fly.io/graphql {{"query":"query ($appName: String!) { app(name: $appName) { id name hostname deployed status version appUrl platformVersion currentRelease { evaluationId status inProgress version } config { definition } organization { id slug } services { description protocol internalPort ports { port handlers } } ipAddresses { nodes { id address type createdAt } } imageDetails { repository version } machines{ nodes { id name config state region createdAt app { name } ips { nodes { family kind ip maskSize } } host { id } } } postgresAppRole: role { name } } }","variables":{"appName":"appname"}}
}
DEBUG <-- 200 https://api.fly.io/graphql (357.84ms) {"data":{"app":{"id":"appname","name":"appname","hostname":"appname.fly.dev","deployed":true,"version":37,"appUrl":"https://xxx.xxx.xxx.xxx","platformVersion":"nomad","currentRelease":{"evaluationId":null,"status":"succeeded","inProgress":false,"version":37},"config":{"definition":{"kill_timeout":5,"kill_signal":"SIGINT","processes":[],"experimental":{"allowed_public_ports":[],"entrypoint":[],"cmd":[],"exec":[]},"services":[{"processes":[],"protocol":"tcp","internal_port":8081,"concurrency":{"soft_limit":20,"hard_limit":25,"type":"connections"},"ports":[{"port":443,"handlers":["tls","http"]}],"tcp_checks":[{"interval":"15s","timeout":"2s","grace_period":"1s","restart_limit":6}],"http_checks":[],"script_checks":[]}],"env":{}}},"organization":{"id":"xxxx","slug":"orgname"},"services":[{"description":"TCP 443 ⇢ 8081","protocol":"TCP","internalPort":8081,"ports":[{"port":443,"handlers":["TLS","HTTP"]}]}],"imageDetails":{"repository":"appname","version":null},"postgresAppRole":null,"ipAddresses":{"nodes":[{"id":"ip_xxx","address":"xxx.xxx.xxx.xxx","type":"v4","createdAt":"2021-08-04T12:09:41Z"},{"id":"ip_xxx","address":"x:x:x::x","type":"v6","createdAt":"2021-08-04T12:09:42Z"}]},"machines":{"nodes":[]},"status":"dead"}}}
DEBUG --> POST https://api.fly.io/graphql {{"query":"query ($appName: String!) { app(name: $appName) { id name hostname deployed status version appUrl platformVersion currentRelease { evaluationId status inProgress version } config { definition } organization { id slug } services { description protocol internalPort ports { port handlers } } ipAddresses { nodes { id address type createdAt } } imageDetails { repository version } machines{ nodes { id name config state region createdAt app { name } ips { nodes { family kind ip maskSize } } host { id } } } postgresAppRole: role { name } } }","variables":{"appName":"appname"}}
}
DEBUG <-- 200 https://api.fly.io/graphql (343.22ms) {"data":{"app":{"id":"appname","name":"appname","hostname":"appname.fly.dev","deployed":true,"version":37,"appUrl":"https://xxx.xxx.xxx.xxx","platformVersion":"nomad","currentRelease":{"evaluationId":null,"status":"succeeded","inProgress":false,"version":37},"config":{"definition":{"kill_timeout":5,"kill_signal":"SIGINT","processes":[],"experimental":{"allowed_public_ports":[],"entrypoint":[],"cmd":[],"exec":[]},"services":[{"processes":[],"protocol":"tcp","internal_port":8081,"concurrency":{"soft_limit":20,"hard_limit":25,"type":"connections"},"ports":[{"port":443,"handlers":["tls","http"]}],"tcp_checks":[{"interval":"15s","timeout":"2s","grace_period":"1s","restart_limit":6}],"http_checks":[],"script_checks":[]}],"env":{}}},"organization":{"id":"xxx","slug":"orgname"},"services":[{"description":"TCP 443 ⇢ 8081","protocol":"TCP","internalPort":8081,"ports":[{"port":443,"handlers":["TLS","HTTP"]}]}],"imageDetails":{"repository":"appname","version":null},"postgresAppRole":null,"ipAddresses":{"nodes":[{"id":"ip_xxx","address":"xxx.xxx.xxx.xxx","type":"v4","createdAt":"2021-08-04T12:09:41Z"},{"id":"ip_xxx","address":"x:x:x::x","type":"v6","createdAt":"2021-08-04T12:09:42Z"}]},"machines":{"nodes":[]},"status":"dead"}}}
DEBUG --> POST https://api.fly.io/graphql {{"query":"mutation($input: ValidateWireGuardPeersInput!) { validateWireGuardPeers(input: $input) { invalidPeerIps } }","variables":{"input":{"peerIps":["x:x:x::x"]}}}
}
DEBUG <-- 200 https://api.fly.io/graphql (109.59ms) {"data":{"validateWireGuardPeers":{"invalidPeerIps":[]}}}
Definitely still useful! The output of fly status --all tells us that there haven’t been any VMs active for the past few days (which is why the Instances table is blank).
So in this case you’d probably want to redeploy (or up the scale count) to create new instances.
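A minimal sketch of both options, assuming it is run from the app's directory (the one containing fly.toml). The helper name bring_back_instances and its two modes are made up for illustration; fly deploy, fly scale count and fly status --all are the only flyctl commands it calls.

```shell
# Hypothetical helper; the name and the "deploy"/"scale" modes are illustrative only.
bring_back_instances() {
    case "$1" in
        deploy)
            # Redeploy the app from the current directory.
            fly deploy
            ;;
        scale)
            # Ask the platform for N instances (defaults to 1 if not given).
            fly scale count "${2:-1}"
            ;;
        *)
            echo "usage: bring_back_instances deploy | scale [count]" >&2
            return 1
            ;;
    esac
    # Check whether instances now show up.
    fly status --all
}
```

For example, bring_back_instances scale 2 would request two instances and then print the status table.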
OK, sure, we can do that, and I do want to thank you for your quick responses and for being helpful. But more important to us than getting this particular app running again is to try to get some insight into a) why it happened, b) what we can do to help prevent this from happening in the future, and c) why flyctl restart didn’t work.
Unfortunately, on our end, the lack of logs or any trace of a previous instance doesn’t provide us with any actionable information, so I was hoping someone from Fly.io could get to the bottom of it, or at least explain why this might have happened.
completely understandable!
So we do only keep app logs for two days. If you need to keep them for longer, you can ship them somewhere else (perhaps something like the fly-log-shipper might fit into your stack).
We just now rolled out a change that should allow you to see failed VMs up to 7 days afterward.
This was previously only 2 days, so fly status --all might provide you with a little more insight now! You can then run fly vm status <id> on those instances and look for non-zero exit codes, restarts, etc.
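If several failed instances show up, something like the following could save some typing. It is only a sketch: check_instances is a made-up helper name, and the IDs are whatever you copy out of the fly status --all table.

```shell
# Hypothetical helper; "check_instances" is an illustrative name, not a flyctl command.
# Runs fly vm status for every instance ID passed as an argument.
check_instances() {
    if [ "$#" -eq 0 ]; then
        echo "usage: check_instances <id> [<id> ...]" >&2
        return 1
    fi
    for id in "$@"; do
        # Print a separator so each instance's report is easy to find.
        echo "== $id =="
        fly vm status "$id"
    done
}
```

Then scan each report for non-zero exit codes and restart counts.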
On a related note, while it’s not at all a universal solution, running multiple instances of an app (i.e. fly scale count) can greatly improve reliability.
Thank you again for bringing this up!