Unable to deploy / scale occasionally...

Hello,

I’m using Remix and deploying to my production server, which was set to scale count 3 with 1GB of RAM. Over the past few days, pushes to production have sometimes failed and sometimes worked, and I’m not sure why. Today, however, I can’t get a push to production to succeed at all. When I push to my staging server, it works just fine. So I set my scale count back down from 3 to 1 and was able to deploy just fine. However, when I set the scale count back to 3, it failed to scale. Then I tried again and it completed. Why is this happening? Do I not have something set up correctly? My site doesn’t have any traffic yet as I’m still in dev, so I don’t think it’s related to that.

Here is an output of my status:

Thanks!

I also just wanted to chime in and say that I was having deployment issues last night as well, just tried deploying this morning (same image) and it deployed just fine…

In my case I was receiving a Task Not Running By Deadline error:

2022-02-17T09:22:46Z Received        Task received by client                         
2022-02-17T09:22:46Z Task Setup      Building Task Directory                         
2022-02-17T09:27:46Z Alloc Unhealthy Task not running by deadline                    
2022-02-17T09:27:48Z Killing         Sent interrupt. Waiting 5s before force killing 

Wish I knew how I could debug it a little better.

Also seeing intermittent deploy issues this morning. Just received:

Error failed to fetch an image or build from source: error connecting to docker: failed building options: failed probing "personal": context deadline exceeded

Using the --remote-only flag.

Tried again and received:

Error 1 error occurred:
	* Post "https://api.fly.io/graphql": read tcp 192.168.40.233:65442->77.83.143.220:443: read: connection reset by peer

I also received this same error earlier.
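If failures like these are transient, one option is to wrap the deploy in a small retry loop instead of re-running it by hand. A minimal sketch, assuming a bash shell: `retry_cmd` and its arguments are hypothetical names for illustration, not part of flyctl, and retrying will not help if the underlying problem persists.

```shell
#!/usr/bin/env bash
# Sketch: retry a command a few times, doubling the pause after each failure.
# retry_cmd is a hypothetical helper, not something flyctl provides.

retry_cmd() {
  local max_attempts="$1"   # total number of attempts allowed
  local delay="$2"          # seconds to wait after the first failure
  shift 2                   # the remaining arguments are the command to run
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Example invocation (not executed here):
#   retry_cmd 3 10 flyctl deploy --remote-only
```

With the example invocation, the deploy would be attempted up to three times, waiting 10s and then 20s between attempts.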

Yet another new error message:

--> v199 failed - Failed due to unhealthy allocations - not rolling back to stable job version 199 as current job has same specification and deploying as v200

For what it’s worth — though I don’t know if it actually led to a fix — I pulled my scale count way down, on the theory I had a box’s state out of sync. Deploy is now working.

I’ll add another update to this:
I normally do a git commit / git push from my code base and then have a GitHub Action deploy to Fly automatically. I did that and had yet another 20+ min wait for a failed deployment. Just before that, I was able to deploy using NPM in my terminal and had a successful deployment. So I went back to my terminal and manually deployed using NPM instead of going through GitHub. Currently my terminal is sitting on “3 desired, 3 placed, 2 healthy, 0 unhealthy [health checks: 2 total, 2 passing]”. Normally my deployments take at most 10 min. Now I wait 20+ min.

What worked earlier was going back down to scale count 1, deploying, and, if that succeeded, trying to scale back up. I’m wondering if my GitHub Action is messing with my build somehow…
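The scale-down, deploy, scale-up sequence could be scripted roughly as follows. This is only a sketch of the workaround, not a fix for the underlying problem: `deploy_then_rescale` and the `FLY_BIN` variable are hypothetical names, and it assumes `flyctl` is installed and run from the app directory.

```shell
#!/usr/bin/env bash
# Sketch: drop to a single instance, deploy, then scale back up.
# FLY_BIN and deploy_then_rescale are hypothetical names introduced here.

FLY_BIN="${FLY_BIN:-flyctl}"

deploy_then_rescale() {
  local target_count="$1"                  # instance count to restore afterwards
  "$FLY_BIN" scale count 1 || return 1     # stop if scaling down fails
  "$FLY_BIN" deploy || return 1            # stop if the deploy fails
  "$FLY_BIN" scale count "$target_count"   # only reached after a good deploy
}

# Example invocation (not executed here):
#   deploy_then_rescale 3
```

Because each step returns early on failure, the app stays at one instance if the deploy does not go through, which matches the "if successful, scale back up" order.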

Here is my GitHub Action, for what it’s worth. All of this had been working for the last 3 months or so with no issues:


name: 🚀 Deploy

on:
  push:
    branches: [ main ]

env:
  FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
  PUBLIC_WP_API_URL: "PRODUCTION_URL"

jobs:
  build:

    runs-on: ubuntu-latest

    strategy:
      matrix:
        node-version: [16.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/

    steps:
    - uses: actions/checkout@v2
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v2
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    - run: npm ci
    - run: npm run build --if-present
    - uses: superfly/flyctl-actions@1.1
      with:
        args: "deploy"

OK, hopefully the final update: in my GitHub Actions file I changed this line:

- uses: superfly/flyctl-actions@1.1

to the newer version I saw in the docs:

- uses: superfly/flyctl-actions@1.3

and I’m able to deploy again.

More context for folks if it’s helpful: I am not using GitHub actions, only a generated fly.toml and a pretty lightweight Dockerfile that hasn’t changed recently. Deploys still fail frequently today, even after my fly scale count 1 “trick.”

We’re investigating networking issues in the lax region that might actually be the source of the problem.

If you’re having deploy issues, will you see if lax appears in fly regions list? If it does, you can try removing that region to see if it improves things.
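A rough way to script that check, assuming a bash shell with `flyctl` installed: `remove_region_if_listed` and `FLY_BIN` are hypothetical names, and the sketch assumes the region code shows up as a whole word somewhere in the `regions list` output, without relying on any particular output layout.

```shell
#!/usr/bin/env bash
# Sketch: remove a region from the app's pool only if it is currently listed.
# FLY_BIN and remove_region_if_listed are hypothetical names introduced here.

FLY_BIN="${FLY_BIN:-flyctl}"

remove_region_if_listed() {
  local region="$1"
  # Assumption: the region code appears as a standalone word in the listing.
  if "$FLY_BIN" regions list | grep -qiw -- "$region"; then
    "$FLY_BIN" regions remove "$region"
  else
    echo "$region is not in the region list; nothing to remove"
  fi
}

# Example invocation (not executed here):
#   remove_region_if_listed lax
```

If the region is not listed, the function leaves the app untouched and just prints a note.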

Yeah, I was able to deploy, but everything in LAX is failing like you said, so I switched to SEA. Thanks for the update!