Fly machines were working fine until yesterday, but today they are not working. I cleaned out the machines and recreated them (2 machines); they get created but do not start, and the logs are stuck on the following:
[PR03] could not find a good candidate within 40 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the ‘immediate’ strategy? have your app’s instances all reached their hard limit?)
I suspected a code change, so I reverted to the last working commit, destroyed the machines, and recreated them. Still the same failure.
Hi… It generally helps to post the full logs (from boot) for one of the Machines; there really should be another error message in there. You can use the </> button in the toolbar to get an area suitable for pasting code, output, etc.
It would also be good to provide the entire fly.toml and the output of fly m list.
Fly logs from the moment I deployed the machine:
06:44:53
Pulling container image registry.fly.io/sesra-backend@sha256:b7c82825a742f3c0a26fb0195a78f402e28d22bb5f6b79155836f6f9bdb3fd05
06:49:54
[PR03] could not find a good candidate within 40 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
06:49:55
[PR03] could not find a good candidate within 40 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
06:50:21
[PR03] could not find a good candidate within 40 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)
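The timestamps in this excerpt already narrow things down a little: nothing is logged between the image pull starting and the first proxy error. A minimal sketch of that check in Python, assuming the log is pasted as alternating timestamp and message lines as above. The `parse_entries` and `summarize` helpers are hypothetical (not part of flyctl), the `[PR03]` messages are trimmed before the hint text, and all timestamps are assumed to fall on the same day.

```python
import re
from datetime import datetime

# Timestamps and messages copied from the log excerpt above
# (the [PR03] messages are trimmed before the "(hint: ...)" part).
PROXY_ERROR = (
    "[PR03] could not find a good candidate within 40 attempts at load "
    "balancing. last error: [PR01] no known healthy instances found for "
    "route tcp/443."
)
LOG_LINES = [
    "06:44:53",
    "Pulling container image registry.fly.io/sesra-backend@sha256:"
    "b7c82825a742f3c0a26fb0195a78f402e28d22bb5f6b79155836f6f9bdb3fd05",
    "06:49:54",
    PROXY_ERROR,
    "06:49:55",
    PROXY_ERROR,
    "06:50:21",
    PROXY_ERROR,
]


def parse_entries(lines):
    """Pair each HH:MM:SS line with the message line that follows it."""
    return [
        (datetime.strptime(ts, "%H:%M:%S"), msg)
        for ts, msg in zip(lines[0::2], lines[1::2])
    ]


def summarize(entries):
    """Summarize what happened between the image pull and the proxy errors."""
    pulls = [(t, m) for t, m in entries if m.startswith("Pulling container image")]
    errors = [(t, m) for t, m in entries if m.startswith("[PR")]
    others = [e for e in entries if e not in pulls and e not in errors]
    codes = {c for _, m in errors for c in re.findall(r"\[(PR\d+)\]", m)}
    gap = errors[0][0] - pulls[0][0]
    return {
        "seconds_from_pull_to_first_proxy_error": int(gap.total_seconds()),
        "proxy_error_count": len(errors),
        "error_codes": sorted(codes),
        "other_messages": len(others),
    }


summary = summarize(parse_entries(LOG_LINES))
print(summary)
```

About five minutes pass after the pull starts with no boot output at all, so the proxy errors look like a symptom of the Machine never starting rather than the cause.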
fly.toml file:
app = "sesra-backend"
primary_region = "bom"
[build]
[env]
  NODE_ENV = "production"
  NODE_OPTIONS = "--max-old-space-size=768"
[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = false
  auto_start_machines = true
  min_machines_running = 1
  [http_service.concurrency]
    type = "requests"
    soft_limit = 25
    hard_limit = 50
[[vm]]
  cpu_kind = "shared"
  cpus = 2
  memory_mb = 1024
[checks]
  [checks.health]
    grace_period = "60s"
    interval = "45s"
    method = "get"
    path = "/health"
    timeout = "5s"
    type = "http"
I am also experiencing the same issue. Is it due to the SYD incident? My app is now in suspended status and the machines are in created status.
What region(s) are you in, @equbit_dev?
I would try deploying to nrt or sin at this point, just to see if it works…
(Another user was describing problems with bom registry pulls a couple days ago.)
I tried the sin region and it worked, but I will need to go back to BOM very soon. How can I find out what exactly is wrong with the BOM machines and when the region will be available again? I am new to fly.io and need guidance on where to find up-to-date information. Thanks in advance.
Glad to hear that it worked…
Generally, the community forum here is the slowest and least reliable way to get help. The best option is one of the Support plans, which have their own ticketing system, etc.
Going without a Support plan is really only for hobbyist dabblers like myself, basically.
You can also watch the global status page (which it looks like you already know about) as well as the personalized one on your dashboard (which will show any additional notes about specific physical host machines that you’re on).
Yes, it seems to be a BOM region problem; the SIN region worked. Why is this not reflected on the Fly status page? The BOM region shows as operational.
The status page is updated manually, so it doesn’t always reflect problems immediately. They’ve talked a bit in the past about publishing some of the automated statistics that they already have, but that seems to have fallen by the wayside…