I’m using Remix and deploying to my production server, which is set to scale count 3 with 1GB of RAM. Over the past few days, pushes to production have sometimes failed and sometimes worked, and I’m not sure why. Today, however, I can’t get a push to production to succeed at all. When I push to my staging server, it works just fine. So I set my scale count down from 3 to 1 and was able to deploy just fine. However, when I set the scale count back to 3, it failed to scale. I then tried again and it completed. Why is this happening? Do I not have something set up correctly? My site doesn’t have any traffic yet as I’m still in dev, so I don’t think it’s related to that.
I also just wanted to chime in and say that I was having deployment issues last night as well, just tried deploying this morning (same image) and it deployed just fine…
In my case I was receiving a Task Not Running By Deadline error:
2022-02-17T09:22:46Z Received Task received by client
2022-02-17T09:22:46Z Task Setup Building Task Directory
2022-02-17T09:27:46Z Alloc Unhealthy Task not running by deadline
2022-02-17T09:27:48Z Killing Sent interrupt. Waiting 5s before force killing
Also seeing intermittent deploy issues this morning. Just received:
Error failed to fetch an image or build from source: error connecting to docker: failed building options: failed probing "personal": context deadline exceeded
Using the --remote-only flag.
Tried again and received:
Error 1 error occurred:
* Post "https://api.fly.io/graphql": read tcp 192.168.40.233:65442->77.83.143.220:443: read: connection reset by peer
--> v199 failed - Failed due to unhealthy allocations - not rolling back to stable job version 199 as current job has same specification and deploying as v200
For what it’s worth — though I don’t know if it actually led to a fix — I pulled my scale count way down, on the theory I had a box’s state out of sync. Deploy is now working.
I’ll add another update to this:
I normally do a git commit / git push from my codebase and then have a GitHub Action deploy to Fly automatically. I did that and had yet another 20+ min wait for a failed deployment. Just before that I was able to deploy using npm in my terminal and had a successful deployment. So I went back to my terminal and manually deployed using npm instead of going through GitHub. Currently my terminal is sitting on “3 desired, 3 placed, 2 healthy, 0 unhealthy [health checks: 2 total, 2 passing]”. Normally my deployments have taken at most 10 min. Now I wait 20+ min.
What worked earlier was going back down to scale count 1, deploying, and, if that succeeded, trying to scale back up. I’m wondering if my GitHub Action is messing with my build somehow…
Here is my GitHub Action for what it’s worth. All of this was working for the last 3 months or so with no issues:
name: 🚀 Deploy
on:
  push:
    branches: [ main ]
env:
  FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
  PUBLIC_WP_API_URL: "PRODUCTION_URL"
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
      - uses: actions/checkout@v2
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v2
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm run build --if-present
      - uses: superfly/flyctl-actions@1.1
        with:
          args: "deploy"
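For anyone who wants to try the same workaround from a terminal, the scale-down, deploy, scale-up sequence described above could be sketched roughly like this. It is only an illustration, not a confirmed fix: the function name `redeploy_with_rescale` and the overridable `FLY` variable are made up for this sketch, and it assumes the `fly` CLI is installed and already authenticated.

```shell
#!/bin/sh
# Sketch of the workaround from this thread: drop to one instance,
# deploy, then scale back up. Not an official procedure.

# FLY is a hypothetical override so the CLI binary can be swapped out
# (for example `flyctl`, or a stub when testing).
FLY="${FLY:-fly}"

redeploy_with_rescale() {
  target="$1"   # scale count to restore after the deploy, e.g. 3

  # Stop at the first failing step so we never scale up a broken deploy.
  "$FLY" scale count 1 || return 1
  "$FLY" deploy || return 1
  "$FLY" scale count "$target"
}
```

Calling `redeploy_with_rescale 3` would then run the three steps in order and stop early if the scale-down or the deploy fails.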
More context for folks if it’s helpful: I am not using GitHub actions, only a generated fly.toml and a pretty lightweight Dockerfile that hasn’t changed recently. Deploys still fail frequently today, even after my fly scale count 1 “trick.”
We’re investigating networking issues in the lax region that might actually be the source of the problem.
If you’re having deploy issues, will you see if lax appears in fly regions list? If it does, you can try removing that region to see if it improves things.
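As a rough sketch of that check: the snippet below looks for a region code in the output of `fly regions list` and, if it is there, removes it with `fly regions remove`. The helper names and the `FLY` override are hypothetical, it assumes the `fly` CLI is installed and pointed at the right app, and it matches the code anywhere in the output, so it does not distinguish the main region pool from backup regions.

```shell
#!/bin/sh
# Sketch: remove a region from the app only if it is currently listed.

# FLY is a hypothetical override so the CLI binary can be swapped out.
FLY="${FLY:-fly}"

# Succeeds if the region code appears as a whole word in the list output.
app_uses_region() {
  "$FLY" regions list | grep -qw "$1"
}

remove_region_if_present() {
  if app_uses_region "$1"; then
    "$FLY" regions remove "$1"
  else
    echo "region $1 not in region list; nothing to do"
  fi
}
```

For the case in this thread the call would be `remove_region_if_present lax`, followed by another deploy attempt to see whether things improve.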