Something not right on Fly.io

afterburner · February 4, 2023, 2:51pm

I’m also getting some issues with my WireGuard peer, can’t SSH into my application at all now

mkozak · February 4, 2023, 2:59pm

I have the same issue, cannot deploy any change for the past few hours

klucass · February 4, 2023, 3:00pm

Me too (Django app).

miharekar · February 4, 2023, 4:31pm

Yeah, I seem to be stuck too with a Rails app.

This is with the debug log:

Running release task (pending)... 🌍DEBUG --> POST https://api.fly.io/graphql

{
  "query": "query ($id: ID!) { releaseCommandNode: node(id: $id) { id ... on ReleaseCommand { id instanceId command status exitCode inProgress succeeded failed } } }",
  "variables": {
    "id": "rcmd_v0or2w9dg18y9gxk"
  }
}

DEBUG {}
Running release task (pending)... 🌎DEBUG <-- 200 https://api.fly.io/graphql (214.83ms)

{
  "data": {
    "releaseCommandNode": {
      "id": "rcmd_v0or2w9dg18y9gxk",
      "instanceId": null,
      "command": "bin/rails fly:release",
      "status": "pending",
      "exitCode": null,
      "inProgress": true,
      "succeeded": false,
      "failed": false
    }
  }
}

and just keeps going on and on like this.
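For anyone wanting to check a captured response like the one above without eyeballing it, here is a minimal Python sketch. It only reads the fields visible in the debug log (status, instanceId, succeeded, failed). The function name and the "pending-unscheduled" label are made up for illustration, and treating a null instanceId as "no VM has picked up the release command yet" is an assumption, not something confirmed by Fly.

```python
import json

def classify_release(payload):
    """Classify a releaseCommandNode response (hypothetical helper)."""
    node = payload["data"]["releaseCommandNode"]
    if node["failed"]:
        return "failed"
    if node["succeeded"]:
        return "succeeded"
    # Assumption: status "pending" with no instanceId means the release
    # command has not been placed on any instance yet.
    if node["status"] == "pending" and node["instanceId"] is None:
        return "pending-unscheduled"
    return "running"

# The response body copied from the debug log above.
SAMPLE = json.loads("""
{
  "data": {
    "releaseCommandNode": {
      "id": "rcmd_v0or2w9dg18y9gxk",
      "instanceId": null,
      "command": "bin/rails fly:release",
      "status": "pending",
      "exitCode": null,
      "inProgress": true,
      "succeeded": false,
      "failed": false
    }
  }
}
""")

print(classify_release(SAMPLE))  # prints "pending-unscheduled"
```

If every poll keeps returning the same classification, the deploy is waiting on the platform rather than on the release command itself.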

Everything is green on https://status.flyio.net/ so I’m not sure if anyone is looking into this cc @michael

notif · February 4, 2023, 4:53pm

I am also not able to deploy at the moment. Stuck on: Running release task (pending)

nerdyworm · February 4, 2023, 5:03pm

Also unable to deploy. Started last night for me.

App is still up, so that’s good

bram-dingelstad · February 4, 2023, 5:15pm

yeah, same for me. Started around 3 Feb 19:00:00 GMT+1

r4z4 · February 4, 2023, 7:55pm

Yep same thing here. Pretty frustrating but thanks for keeping an eye out

bram-dingelstad · February 4, 2023, 9:55pm

Update on my situation: turns out my app was actually fine but had a different underlying problem.

My app relies on Litestream (the author also works for Fly, I think) and it seems that the backup process was somehow corrupted by an earlier Fly outage (proxy issue or otherwise). The backup was consequently poisoned and unrecoverable. I had to dig into my other backups in order to recover it fully, losing about 5 days of database activity in the process.

I’m gonna look deeper into the original cause of the issue, but it was at the same outage window as the earlier reports of this issue. If you’re experiencing this issue still, make sure that your data source isn’t corrupted and causing your application to be unrecoverable.

odabajo · February 5, 2023, 8:59pm

Hello,

I’m also seeing intermittent errors (with long downtime) in several apps:

3rd of February

Rails and Elixir apps down at 20:10h UTC. Downtime of ~15 minutes. Came back by themselves.

4th of February

Rails app went down at 06:05h UTC. Downtime of 38 minutes.

5th of February

Elixir app went down at 22:28 UTC. Downtime of 13 minutes.
Elixir app went down again at 22:49 UTC. Downtime of 3 minutes.

This weekend we accumulated a total of 1 hour and 24 minutes of downtime across two apps. And of course we cannot really get a response (we couldn’t get one the last time we wrote about problems here).

The errors vary: sometimes the PostgreSQL instance is down, sometimes I cannot even connect to the HTTP service.

What can we do to improve this? My apps are in the paid tier but not generating expenses (yet). Is this the same for paying apps?

The Elixir site is quite popular and receives 1k+ visits per day, so it would be nice to find a solution.

Update

The error I’m currently seeing as I write this is:

(DBConnection.ConnectionError) tcp connect (top2.nearest.of.MY-DB-NAME.internal:5432): non-existing domain - :nxdomain

so it looks like a DNS resolution issue on the instance.

Also, it is an intermittent one: some requests go through, and others don’t. The ones going through are probably using persistent connections in the pool.
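To tell an intermittent lookup failure from a permanent one, the name can be resolved a few times in a row from inside the instance. This is a rough Python sketch using only the standard library; the function name, retry count, and delay are arbitrary choices for illustration, and the hostname in the usage note is the redacted placeholder from the error above.

```python
import socket
import time

def resolve_with_retries(host, port=5432, attempts=3, delay=1.0,
                         resolver=socket.getaddrinfo):
    """Return the sorted unique addresses for host, retrying on DNS errors.

    Re-raises the last socket.gaierror if every attempt fails.
    The resolver argument exists only so the lookup can be swapped out.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay)
    raise last_error
```

Calling resolve_with_retries("top2.nearest.of.MY-DB-NAME.internal") with the real database name substituted in either returns the addresses or raises socket.gaierror. If it fails on some calls and succeeds on others, that matches the "some requests go through" behaviour described above.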

hikita · February 6, 2023, 1:39am

Same issue here

kurt · February 6, 2023, 2:52am

This topic was primarily about the outage on Friday. If you’re having issues since then, it’s likely unrelated.

If your app is having reliability issues, please ensure you’re running 2+ database nodes and 2+ application instances.

Also run fly status --all -a <pg-app> and fly status --all -a <app> and make sure you’re not getting unexpected restarts or vm failures.
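If you want to run both checks in one go, a small Python wrapper along these lines might help. It only uses the fly status --all -a invocation quoted above. The helper names are made up, and it assumes the fly binary is on your PATH and already authenticated.

```python
import subprocess

def status_command(app_name):
    """Build the argument list for `fly status --all -a <app>`."""
    return ["fly", "status", "--all", "-a", app_name]

def run_status(app_name):
    """Run the status command and return its combined output as text.

    Assumes `fly` is installed and logged in; raises FileNotFoundError
    if the binary is missing.
    """
    result = subprocess.run(
        status_command(app_name),
        capture_output=True,
        text=True,
        check=False,  # keep the output even when fly exits non-zero
    )
    return result.stdout + result.stderr
```

Call run_status once with the Postgres app name and once with the application name, then look through the output for unexpected restarts or VM failures.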

If you’re getting delays deploying, this is likely due to intermittent capacity issues in European regions. We’re prioritizing deploys on paid plans. If you are having issues and are on a paid plan, please email the premium support address in your profile and we’ll look into it.

notif · February 6, 2023, 3:41am

Hi Kurt,

I have a paid account (albeit a pay as you go hobby plan pending launch) but I do not see an email support address in my profile. Where should I look to find that?

Thanks

odabajo · February 6, 2023, 7:20am

I don’t know if it is related or not, because I’ve been having issues since Friday and my apps have had no new releases for weeks. The latest and most painful one (the DB :nxdomain error) is especially hard, as it is not related to the app.

No restarts at all. A third app of mine (also an Elixir one) has been hitting the same DNS issue when connecting to the database for the last ~10h. I tried scaling down to zero, starting the DB again, and restarting the app. Nothing fixes it.

I’m on the pay-as-you-go plan, so I suppose I have no support available. Is there anything I can do aside from backing it up and starting from scratch?

bcomnes · March 4, 2023, 5:56pm

Seeing this again today.

Rob_Cole · March 4, 2023, 6:02pm

Also seeing this today: Could not proxy HTTP request. Retrying in 1000 ms

sea region, but I’m guessing it’s a broader issue. The status page shows an outage around state propagation ("An ongoing upgrade is causing delayed app instances state propagation"), but it’s unclear whether that’s the same source as the errors I’m seeing, which seem to have to do with edge routing.
