dfw not working as expected

Hey there!

Just wanted to throw this out there in case it helps. We have been deployed to dfw for quite some time. Yesterday we started noticing our servers were no longer responding to requests after deploying. Lots and lots of painful hours later, deploying to another region like mia fixed everything.

Weird things we noticed:

  • sometimes it would work, sometimes it would not
  • when it was working, we could not ssh into the box
  • when it was not working, we could ssh into the box and curling localhost:port revealed that our server was responding as expected, but could not be reached from the outside world

I’m having a hard time believing it’s our application code at this point. LMK if you need any more info :man_shrugging:
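The third symptom above (the server answers on localhost but not from outside) can be checked with a small script. This is only a sketch: the function names `can_connect` and `interpret` are made up for illustration, and it only tests whether a TCP connection can be opened, not whether the HTTP response is correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def interpret(local_ok: bool, external_ok: bool) -> str:
    """Turn the two probe results into a rough hint (hypothetical helper)."""
    if local_ok and external_ok:
        return "reachable everywhere"
    if local_ok:
        return "app is listening, but traffic from outside is not arriving"
    if external_ok:
        return "reachable from outside, but not via this local address"
    return "app is not accepting connections at all"
```

One way to use it: run `can_connect("localhost", 8080)` on the machine itself, run `can_connect` against the app's public hostname from a laptop, then pass both results to `interpret`. A "listening but not arriving" result does not prove a platform problem on its own, since an app bound only to 127.0.0.1 would look the same.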

Hi, thank you for reporting this! Is everything working for you now?

It’s hard to say for certain, but it’s possible that you were running into the API issues which people were reporting a few hours ago.

If you are still having problems, it’d be great if you could provide a little more detail so we can help you dig into it. For example, this might include things like:

  • LOG_LEVEL=debug output for your failing commands
  • a description of affected components (app name, the type of image you’re running, your fly.toml config, etc)
  • what specific flyctl troubleshooting steps you’ve already done

Just let us know! Always happy to take a closer look at any surprising behavior you find on our platform.
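For anyone gathering the debug output requested above, here is one possible way to capture it to a file so it can be pasted into a reply. It is a sketch, not an official workflow: `run_with_debug` is a hypothetical helper, and the command and app name in the usage comment are placeholders to swap for whichever command is failing.

```python
import os
import subprocess


def run_with_debug(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run `cmd` with LOG_LEVEL=debug set, capturing stdout and stderr as text."""
    env = {**os.environ, "LOG_LEVEL": "debug"}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


def save_output(result: subprocess.CompletedProcess, path: str) -> None:
    """Write the exit code and both output streams to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(f"exit code: {result.returncode}\n")
        fh.write("--- stdout ---\n" + result.stdout)
        fh.write("--- stderr ---\n" + result.stderr)


# Example usage (assumes flyctl is installed; "my-app" is a placeholder):
# result = run_with_debug(["fly", "status", "--app", "my-app"])
# save_output(result, "fly-debug.txt")
```

Debug logs can contain tokens or other sensitive values, so it is worth skimming the file before posting it publicly.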

I was coming here to post the same thing.

We were on a single region (dfw) and I started experiencing this last night. It’s been a lot better this morning, but I just experienced the same issue again after a recent deploy. I thought maybe the API unresponsive issue was it, but since it just happened again, perhaps that is not fully resolved or it is something else?

Switching regions as Cade suggested just worked for me.

“since it just happened again, perhaps that is not fully resolved or it is something else?”

That’s always a possibility that we want to explore as much as we can! Do you have any more details about this most recent failure? (timestamps or a rough time estimate, flyctl output, app name, etc)

The problem is, all the logs look normal. All the server statuses are healthy, and everything deploys fine. I was able to recreate it pretty easily by making a new server following the Elixir guide here. I followed the steps exactly, deployed to dfw, and the app does not respond after a deploy.

You might have to try a few times, because as I mentioned above, it happens on some deploys, but not all.
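Because the failure is intermittent, a single request after a deploy does not say much. A rough way to quantify it is to sample the app several times and look at the failure rate. The sketch below assumes the app serves plain HTTP(S) at some URL; `http_ok`, `sample`, and `failure_rate` are illustrative names, not part of any Fly.io tooling.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx or 3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # HTTP errors, refused connections, and timeouts all count as failures.
        return False


def sample(check: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> list[bool]:
    """Call `check` repeatedly, pausing `delay` seconds between attempts."""
    results = []
    for i in range(attempts):
        results.append(check())
        if i < attempts - 1:
            time.sleep(delay)
    return results


def failure_rate(results: list[bool]) -> float:
    """Fraction of attempts that failed (0.0 if there were no attempts)."""
    return results.count(False) / len(results) if results else 0.0


# Example usage after each deploy (the URL is a placeholder):
# results = sample(lambda: http_ok("https://my-app.example/"), attempts=10)
# print(f"{failure_rate(results):.0%} of requests failed")
```

Recording the timestamp of each deploy alongside the failure rate gives support staff something concrete to correlate with host-side logs.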

Ah, okay, thank you for explaining this to me! Could you give us a specific app name? We’re currently trying to see if we can observe this behavior on our end, so that would definitely help us narrow things down a bit.

@eli I booted up a fresh app, test-for-eli, which is currently deployed to dfw but not responding to requests. All I did was boot up a fresh Elixir project and run fly launch.

On the machine, I verified that the service is responding on the given port:

# curl localhost:8080
<!DOCTYPE html>
<html lang="en">
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">

Awesome, thanks for the assist! We were able to identify and fix some behavior on hosts in dfw, which should resolve the earlier problems y’all were seeing with some deploys.

Please let us know if you notice this in the future!

Thanks for checking into it! We’ll deploy back to dfw some time in the future.
