Failed due to unhealthy allocations - no stable job version to auto revert to

I’ve gone through the related issues, but the solutions didn’t appear to apply. I get the message:
“Failed due to unhealthy allocations - no stable job version to auto revert to”

Despite this, the app appears to be running, but I want to figure this out.

Here is my fly.toml:

app = "hellotrava"
kill_signal = "SIGINT"
kill_timeout = 5
processes = [ ]

[env]
PORT = "8080"

[deploy]
release_command = "npx prisma migrate deploy"

[experimental]
allowed_public_ports = [ ]
auto_rollback = true

[[services]]
internal_port = 8080
processes = [ "app" ]
protocol = "tcp"
script_checks = [ ]

  [services.concurrency]
  hard_limit = 25
  soft_limit = 20
  type = "connections"

  [[services.ports]]
  handlers = [ "http" ]
  port = "80"
  force_https = true

  [[services.ports]]
  handlers = [ "tls", "http" ]
  port = "443"

  [[services.tcp_checks]]
  grace_period = "1s"
  interval = "15s"
  restart_limit = 0
  timeout = "2s"

  [[services.http_checks]]
  interval = 10_000
  grace_period = "5s"
  method = "get"
  path = "/healthcheck"
  protocol = "http"
  timeout = 2_000
  tls_skip_verify = false
  headers = { }

And here are the relevant logs:

2022-04-07T14:38:41.7053633Z 	 Starting instance
2022-04-07T14:38:41.7054698Z 	 Configuring virtual machine
2022-04-07T14:38:41.7055890Z 	 Pulling container image
2022-04-07T14:38:41.7056460Z 	 Unpacking image
2022-04-07T14:38:41.7056950Z 	 Preparing kernel init
2022-04-07T14:38:41.7057437Z 	 Configuring firecracker
2022-04-07T14:38:41.7057714Z 	 Starting virtual machine
2022-04-07T14:38:41.7057988Z 	 Starting init (commit: 6f9865f)...
2022-04-07T14:38:41.7058780Z 	 Preparing to run: `docker-entrypoint.sh npx prisma migrate deploy` as root
2022-04-07T14:38:41.7078978Z 	 2022/04/07 14:38:34 listening on [fdaa:0:5938:a7b:a9e:a07c:b56c:2]:22 (DNS: [fdaa::3]:53)
2022-04-07T14:38:41.7079455Z 	 Prisma schema loaded from prisma/schema.prisma
2022-04-07T14:38:41.7080313Z 	 Datasource "db": PostgreSQL database "hellotrava", schema "public" at "top2.nearest.of.hellotrava-db.internal:5432"
2022-04-07T14:38:41.7080789Z 	 2 migrations found in prisma/migrations
2022-04-07T14:38:41.7081113Z 	 No pending migrations to apply.
2022-04-07T14:38:41.7081386Z 	 npm notice
2022-04-07T14:38:41.7081824Z 	 npm notice New minor version of npm available! 8.5.0 -> 8.6.0
2022-04-07T14:38:41.7082248Z 	 npm notice Changelog: <https://github.com/npm/cli/releases/tag/v8.6.0>
2022-04-07T14:38:41.7082752Z 	 npm notice Run `npm install -g npm@8.6.0` to update!
2022-04-07T14:38:41.7083050Z 	 npm notice
2022-04-07T14:38:41.7083327Z 	 Main child exited normally with code: 0
2022-04-07T14:38:41.7083956Z 	 Starting clean up.
2022-04-07T14:38:41.8373047Z ==> Monitoring deployment
2022-04-07T14:38:42.0006103Z 
2022-04-07T14:38:42.0006583Z v3 is being deployed
2022-04-07T14:38:50.1227130Z 376f2ec9: vin running healthy
2022-04-07T14:38:51.0162950Z 376f2ec9: vin running unhealthy [health checks: 2 total, 1 critical]
2022-04-07T14:39:01.2375537Z 376f2ec9: vin running unhealthy [health checks: 2 total, 1 passing, 1 critical]
2022-04-07T14:43:42.6283822Z Failed Instances
2022-04-07T14:43:42.8652108Z 
2022-04-07T14:43:42.8652901Z Instance
2022-04-07T14:43:42.8654695Z Failure #1
2022-04-07T14:43:42.8655347Z ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS                 	RESTARTS	CREATED   
2022-04-07T14:43:42.8660315Z 
2022-04-07T14:43:42.8660797Z 376f2ec9	       	3      	vin   	run    	running	2 total, 1 passing, 1 critical	0       	4m53s ago	
2022-04-07T14:43:42.8690375Z --> v3 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v4

When you say “the app appears to be running” … do you mean you are able to access /healthcheck in a browser, and get a successful response (200)?

Whenever I’ve had a deploy say [1 passing, 1 critical], it’s generally been the TCP healthcheck that passes and the HTTP healthcheck that fails.

You can check on that if you run fly logs. Do you see the Fly system attempt to call /healthcheck? What response code is shown? If you see a non-200 code (like a 500), the healthcheck is failing and hence the deploy is not completing. Often your app will also log a message saying why, like an exception (assuming you have some kind of logging).

The other thing to double-check would be whether using e.g. 10_000 in the fly.toml is valid. I assume it is (given the lack of an error saying it’s not valid), but the example has e.g. 10000.

No, I am unable to access /healthcheck in the browser. Also, when I run fly logs I see a bunch of 404s on the /healthcheck GET request.

By “appears to be running”, I meant that I could access the home page of the app in the browser.

In a previous attempt, I removed the underscores, and still received the same errors.

Appreciate your help.


Ah, well that would explain the error on deploy then. The Fly system would also get a 404 when it tries to access /healthcheck. Being a non-200 response, it would fail the check.

So … you could either edit the path in the fly.toml so the healthcheck is done on /. You say you can access that, which means the Fly system could too, so the check would pass and the deploy would complete.
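
For that first option, here is a rough sketch of the change. It assumes the rest of your fly.toml stays exactly as you posted it, and that / answers with a 200 directly (no login or redirect in the way). I haven't tried it against your app:

```toml
  [[services.http_checks]]
  interval = 10_000
  grace_period = "5s"
  method = "get"
  # Was "/healthcheck", which currently returns a 404.
  # "/" is the home page you said already loads in the browser.
  path = "/"
  protocol = "http"
  timeout = 2_000
  tls_skip_verify = false
  headers = { }
```

Only the path line differs from what you posted. After the next deploy, fly logs should show the check requesting / instead of /healthcheck.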

Or you could leave the fly.toml as-is and add a route in your app for /healthcheck. Then, again, it (and you) would get a successful response from a request to that path, and the deploy would complete.
