I have looked through the related issues, but the solutions didn’t appear to apply. I get the message:
“Failed due to unhealthy allocations - no stable job version to auto revert to”
Despite this, the app appears to be running, but I want to figure out what is going wrong.
When you say “the app appears to be running” … do you mean you are able to access /healthcheck in a browser, and get a successful response (200)?
Only when a deploy has reported [1 passing, 1 critical]. In that case it’s generally the TCP healthcheck that passes and the HTTP healthcheck that fails.
You can check that by running fly logs. Do you see the Fly system attempt to call /healthcheck? What response code is shown? If you see a non-200 code (like a 500), the healthcheck is failing and hence the deploy is not completing. Often your app will log a message explaining why, such as an exception (assuming you have some kind of logging).
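To see for yourself what an HTTP check would get back, a small probe can request the path and report the status code. This is only a sketch using the Python standard library; the `probe` helper is hypothetical and not part of Fly's tooling, and the URL you pass it is whatever address your app is reachable on.

```python
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no connection could be made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 or 500).
        return err.code
    except (urllib.error.URLError, OSError):
        # Nothing answered at all (e.g. connection refused or timed out).
        return None
```

Calling `probe` on your app's /healthcheck URL and getting anything other than 200 would match a failing HTTP check; `None` would match a "connection refused" style failure.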
The other thing to double-check would be whether using e.g. 10_000 in the fly.toml is valid. I assume it is (given the lack of an error saying it’s not valid) but the example has e.g. 10000:
Ah, well that would explain the error on deploy then. The Fly system would also get a 404 when it tries to access /healthcheck. Since that is a non-200 response, the check would fail.
So you could either edit the path in the fly.toml so the healthcheck is done on /. You say you can access that, so the Fly system could too; the check would pass and the deploy would complete.
Or you could leave the fly.toml as-is and add a route in your app for /healthcheck. Then it (and you) would get a successful response from a request to that path, and the deploy would complete.
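As a rough illustration of the second option, here is a minimal sketch of an app that answers /healthcheck with a 200. The thread doesn't say which framework is in use, so this assumes a plain Python WSGI app (which gunicorn and similar servers can serve); the response body "ok" is an arbitrary choice. In a real framework you would add the equivalent route instead.

```python
def app(environ, start_response):
    """Minimal WSGI app: 200 on / and /healthcheck, 404 on anything else."""
    path = environ.get("PATH_INFO", "/")
    if path in ("/", "/healthcheck"):
        status, body = "200 OK", b"ok"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

The check only looks at the status code, so the route can stay this small as long as it returns a 200 once the app is actually able to serve requests.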
I keep getting the unhealthy checks error on my GitHub Actions build.
The VM status:
Instance
ID = 4caa034a
Process = app
Version = 92
Region = ord
Desired = stop
Status = complete
Health Checks = 2 total, 2 critical
Restarts = 0
Created = 5m50s ago
Events
TIMESTAMP TYPE MESSAGE
2022-10-12T13:24:56Z Received Task received by client
2022-10-12T13:24:56Z Task Setup Building Task Directory
2022-10-12T13:24:59Z Started Task started by client
2022-10-12T13:29:56Z Alloc Unhealthy Task not running for min_healthy_time of 10s by deadline
2022-10-12T13:29:57Z Killing Sent interrupt. Waiting 5s before force killing
2022-10-12T13:30:15Z Terminated Exit Code: 0
2022-10-12T13:30:15Z Killed Task successfully killed
Checks
ID SERVICE STATE OUTPUT
3df2415693844068640885b45074b954 tcp-8080 critical dial tcp <ip>:8080: connect: connection refused
03833b6def760b24d9962af66e7ec077 tcp-8080 critical Get "<ip>:8080/healthcheck": dial tcp 172.19.1.50:8080: connect: connection refused
Recent Logs
2022-10-12T13:30:12Z [info]Shutting down virtual machine
2022-10-12T13:30:12Z [info][2022-10-12 13:30:12 +0000] [520] [INFO] Handling signal: int
2022-10-12T13:30:12Z [info]Sending signal SIGINT to main child process w/ PID 520
2022-10-12T13:30:12Z [info][2022-10-12 13:30:12 +0000] [525] [INFO] Worker exiting (pid: 525)
2022-10-12T13:30:12Z [info][2022-10-12 13:30:12 +0000] [520] [INFO] Shutting down: Master
2022-10-12T13:30:13Z [info]Starting clean up.
And the Dockerfile:
FROM python:latest
WORKDIR /app
EXPOSE 8080
COPY ./requirements.txt /app
RUN pip install -r requirements.txt
COPY . /app
CMD ["gunicorn","-b","127.0.0.1:8080","app:app"]
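The "connection refused" output on both checks is consistent with gunicorn listening only on 127.0.0.1, which is not reachable from outside the VM. One possible fix, sketched here as a gunicorn.conf.py (gunicorn config files are plain Python), is to bind to all interfaces. This assumes a recent gunicorn that picks up gunicorn.conf.py from the working directory (/app here), and that the -b 127.0.0.1:8080 argument is dropped from the CMD, since command-line options take precedence over the config file.

```python
# gunicorn.conf.py (assumed to live in /app, next to the application module)

# Listen on all interfaces so that connections from outside the VM,
# including health checks, can reach the server. Port 8080 matches the
# EXPOSE line in the Dockerfile and the tcp-8080 check shown above.
bind = "0.0.0.0:8080"
```

Changing the existing flag to -b 0.0.0.0:8080 in the CMD should have the same effect without a config file.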
So the previous suggestion resolved my issue, but then I encountered an HTTPS issue with my Flask app, for which I used the werkzeug proxy workaround. Now my app is not deploying, and I get the same unhealthy allocation error.
v143 is being deployed
232b495f: ord pending
232b495f: ord pending
232b495f: ord running unhealthy [health checks: 2 total, 1 passing]
232b495f: ord running unhealthy [health checks: 2 total, 1 passing, 1 critical]
Failed Instances
Instance
Failure #1
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
232b495f app 143 ord run running 2 total, 1 passing, 1 critical 0 4m56s ago
Recent Events
TIMESTAMP TYPE MESSAGE
--> v143 failed - Failed due to unhealthy allocations - not rolling back to stable job version 143 as current job has same specification and deploying as v144
I tried editing the healthcheck section of my fly.toml.