Health check failing and no app

j_c · May 20, 2024, 10:19am

When I run fly deploy, everything looks fine until it waits for the app to become healthy, and then I can't get my app working. Health checks show 0/3 passed, all critical:

servicecheck-02-http-8080	critical	connect: connection refused	2024-05-20 09:40:03
servicecheck-01-http-8080	critical	connect: connection refused	2024-05-20 09:01:55
servicecheck-00-tcp-8080	critical	i/o timeout	2024-05-20 09:40:19

This is my .toml from the app configuration screen

app = "hmsu-staging"
kill_signal = "SIGINT"
kill_timeout = "5s"
primary_region = "lhr"
swap_size_mb = 512

[experimental]
auto_rollback = true

[[mounts]]
destination = "/data"
source = "data"

[[services]]
internal_port = 8_080
processes = [ "app" ]
protocol = "tcp"

  [services.concurrency]
  hard_limit = 100
  soft_limit = 80
  type = "requests"

  [[services.http_checks]]
  grace_period = "5s"
  interval = "10s"
  method = "get"
  path = "/resources/healthcheck"
  protocol = "http"
  timeout = "2s"
  tls_skip_verify = false

  [[services.http_checks]]
  grace_period = "10s"
  interval = "30s"
  method = "GET"
  path = "/litefs/health"
  timeout = "5s"

  [[services.ports]]
  force_https = true
  handlers = [ "http" ]
  port = 80

  [[services.ports]]
  handlers = [ "tls", "http" ]
  port = 443

  [[services.tcp_checks]]
  grace_period = "1s"
  interval = "15s"
  timeout = "2s"

Live logs:

2024-05-20T10:02:09.584 app[32871445bd4dd8] lhr [info] 🚀 We have liftoff!

2024-05-20T10:02:09.585 app[32871445bd4dd8] lhr [info] Local: http://localhost:3001

2024-05-20T10:02:09.585 app[32871445bd4dd8] lhr [info] On Your Network: http://172.19.144.202:3001

2024-05-20T10:02:09.585 app[32871445bd4dd8] lhr [info] Press Ctrl+C to stop

[PR03] could not find a good candidate within 90 attempts at load balancing. last error: [PR01] no known healthy instances found for route tcp/443. (hint: is your app shut down? is there an ongoing deployment with a volume or are you using the 'immediate' strategy? have your app's instances all reached their hard limit?)

Local logs:

--> Pushing image done
image: registry.fly.io/hmsu-staging:deployment-01HYAQ2SXVGFJDXYSEPB2QV106
image size: 456 MB

Watch your deployment at https://fly.io/apps/hmsu-staging/monitoring

-------
Updating existing machines in 'hmsu-staging' with rolling strategy

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
  PROCESS                                                       | ADDRESSES                             
----------------------------------------------------------------*---------------------------------------
  litefs mount -- /app/other/docker-entrypoint.js npm run start | [::]:20202, [::]:34291                
  /.fly/hallpass                                                | [fdaa:5:1cc2:a7b:1ef:95e7:f5df:2]:22  
  node .                                                        | [::]:3001                             


-------
 ✖ Machine 32871445bd4dd8 [app] update failed: timeout reached waiting for health checks to pass for machine 32871445bd4dd8: failed to g…
-------
Checking DNS configuration for hmsu-staging.fly.dev
Error: timeout reached waiting for health checks to pass for machine 32871445bd4dd8: failed to get VM 32871445bd4dd8: Get "https://api.machines.dev/v1/apps/hmsu-staging/machines/32871445bd4dd8": net/http: request canceled
Your machine never reached the state "%s".

You can try increasing the timeout with the --wait-timeout flag

sturpin · May 20, 2024, 2:09pm

Hi @j_c

Try this change:

app = "hmsu-staging"
kill_signal = "SIGINT"
kill_timeout = "5s"
primary_region = "lhr"
swap_size_mb = 512

[experimental]
auto_rollback = true

[[mounts]]
destination = "/data"
source = "data"

[[services]]
internal_port = 8080
processes = [ "app" ]
protocol = "tcp"

  [services.concurrency]
  hard_limit = 100
  soft_limit = 80
  type = "requests"

  [[services.http_checks]]
  grace_period = "5s"
  interval = "10s"
  method = "GET"
  path = "/resources/healthcheck"
  protocol = "http"
  timeout = "2s"
  tls_skip_verify = false

  [[services.http_checks]]
  grace_period = "10s"
  interval = "30s"
  method = "GET"
  path = "/litefs/health"
  timeout = "5s"
  tls_skip_verify = false

  [[services.ports]]
  force_https = true
  handlers = [ "http" ]
  port = 80

  [[services.ports]]
  handlers = [ "tls", "http" ]
  port = 443

  [[services.tcp_checks]]
  grace_period = "1s"
  interval = "15s"
  timeout = "2s"

Deploy your app again:
$ fly deploy

Check your app status:
$ fly status

Let me know how it goes,
Sergio Turpín

j_c · May 20, 2024, 2:46pm

Thanks Sergio.

On fly deploy I get the following:

Updating existing machines in 'hmsu-staging' with rolling strategy

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
  PROCESS                                                       | ADDRESSES                             
----------------------------------------------------------------*---------------------------------------
  litefs mount -- /app/other/docker-entrypoint.js npm run start | [::]:20202, [::]:44833                
  /.fly/hallpass                                                | [fdaa:5:1cc2:a7b:1ef:95e7:f5df:2]:22  
  node .                                                        | [::]:3001                             


-------
 ✖ Machine 32871445bd4dd8 [app] update failed: timeout reached waiting for health checks to pass for machine 328…
-------
Checking DNS configuration for hmsu-staging.fly.dev
Error: timeout reached waiting for health checks to pass for machine 32871445bd4dd8: failed to get VM 32871445bd4dd8: Get "https://api.machines.dev/v1/apps/hmsu-staging/machines/32871445bd4dd8": net/http: request canceled
Your machine never reached the state "%s".

You can try increasing the timeout with the --wait-timeout flag

fly status gives me:

App
  Name     = hmsu-staging                                        
  Owner    = personal                                            
  Hostname = hmsu-staging.fly.dev                                
  Image    = hmsu-staging:deployment-01HYB71RGE5MMMQRG2Q8YYZPR7  

Machines
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS                  LAST UPDATED         
app     32871445bd4dd8  8       lhr     started primary 3 total, 3 critical     2024-05-20T14:39:13Z

sturpin · May 20, 2024, 3:03pm

Oops, I think your service declaration is incorrect. Try changing [[services]] to [http_service].

Let me know how it goes,
Sergio Turpín

j_c · May 20, 2024, 5:18pm

Did the update and got the following:

--> Pushing image done
image: registry.fly.io/hmsu-staging:deployment-01HYBEHFZYW443N83ZDFV5MM3Q
image size: 464 MB

Watch your deployment at https://fly.io/apps/hmsu-staging/monitoring

-------
Updating existing machines in 'hmsu-staging' with rolling strategy

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
  PROCESS                                                       | ADDRESSES                             
----------------------------------------------------------------*---------------------------------------
  litefs mount -- /app/other/docker-entrypoint.js npm run start | [::]:20202, [::]:44991                
  /.fly/hallpass                                                | [fdaa:5:1cc2:a7b:1ef:95e7:f5df:2]:22  
  node .                                                        | [::]:3001                             


-------
 ✖ Machine 32871445bd4dd8 [app] update failed: timeout reached waiting for health checks to pass for machine 328…
-------
Checking DNS configuration for hmsu-staging.fly.dev
Error: timeout reached waiting for health checks to pass for machine 32871445bd4dd8: failed to get VM 32871445bd4dd8: Get "https://api.machines.dev/v1/apps/hmsu-staging/machines/32871445bd4dd8": net/http: request canceled
Your machine never reached the state "%s".

You can try increasing the timeout with the --wait-timeout flag

fly status

App
  Name     = hmsu-staging                                        
  Owner    = personal                                            
  Hostname = hmsu-staging.fly.dev                                
  Image    = hmsu-staging:deployment-01HYBEHFZYW443N83ZDFV5MM3Q  

Machines
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS                  LAST UPDATED         
app     32871445bd4dd8  11      lhr     started primary 3 total, 3 warning      2024-05-20T16:50:07Z

When trying to access the URL I get:

2024-05-20T17:18:27.892 proxy[32871445bd4dd8] lhr [error] [PC01] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)

2024-05-20T17:18:38.353 proxy[32871445bd4dd8] lhr [error] [PC01] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
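The PC01 message and the deploy warning describe the same mismatch: fly-proxy connects to internal_port (8080), while the socket table in the deploy output shows `node .` bound to [::]:3001. One way to reconcile the two without changing internal_port is to tell the app which port to bind. The fragment below is a minimal sketch of that idea. It assumes the app reads a PORT environment variable at startup, which this thread does not confirm; if it does not, pointing internal_port at the port the app really uses is the other option.

```toml
[env]
  # Assumption: the app's server picks its listen port from PORT.
  # Not confirmed anywhere in this thread.
  PORT = "8080"

[http_service]
  # Must equal the port the app ends up listening on.
  internal_port = 8080
```

After a deploy, the "Found these processes inside the machine with open listening sockets" table should list the app on port 8080 if the variable was picked up.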

sturpin · May 20, 2024, 6:05pm

Yeah! We are on the right path.

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

Temporarily remove the other sections that refer to these settings.

Let me know how it goes,
Sergio Turpín

j_c · May 20, 2024, 7:53pm

Hey Sergio, not sure what you mean. Do you mean replace

[http_service]
internal_port = 8080
processes = [ "app" ]
protocol = "tcp"

with

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

Thank you.

sturpin · May 20, 2024, 8:32pm

I’m sorry, I didn’t explain it well. Use this as the whole file:

# fly.toml 

app = "hmsu-staging"
primary_region = "lhr"

[http_service]
  internal_port = 8080
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]
  [http_service.concurrency]
    type = "requests"
    soft_limit = 80
    hard_limit = 100

Show me the output.

Let me know how it goes,
Sergio Turpín

j_c · May 21, 2024, 8:19am

Hopefully I'm on the right path…

My toml file from the configuration screen is:

app = "hmsu-staging"
kill_signal = "SIGINT"
kill_timeout = "5s"
primary_region = "lhr"
swap_size_mb = 512

[experimental]
auto_rollback = true

[http_service]
auto_start_machines = true
auto_stop_machines = true
force_https = true
internal_port = 8_080
min_machines_running = 0
processes = [ "app" ]

  [http_service.concurrency]
  hard_limit = 100
  soft_limit = 80
  type = "requests"

[[mounts]]
destination = "/data"
source = "data"

[[services]]
[[services.http_checks]]
grace_period = "5s"
interval = "10s"
method = "GET"
path = "/resources/healthcheck"
protocol = "http"
timeout = "2s"
tls_skip_verify = false

[[services.http_checks]]
grace_period = "10s"
interval = "30s"
method = "GET"
path = "/litefs/health"
timeout = "5s"
tls_skip_verify = false

[[services.ports]]
force_https = true
handlers = [ "http" ]
port = 80

[[services.ports]]
handlers = [ "tls", "http" ]
port = 443

[[services.tcp_checks]]
grace_period = "1s"
interval = "15s"
timeout = "2s"

fly deploy

--> Pushing image done
image: registry.fly.io/hmsu-staging:deployment-01HYD35DJKB8AEKFNKZC4FZBBE
image size: 464 MB

Watch your deployment at https://fly.io/apps/hmsu-staging/monitoring

-------
Updating existing machines in 'hmsu-staging' with rolling strategy

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:8080
Found these processes inside the machine with open listening sockets:
  PROCESS                                                       | ADDRESSES                             
----------------------------------------------------------------*---------------------------------------
  litefs mount -- /app/other/docker-entrypoint.js npm run start | [::]:20202, [::]:45267                
  /.fly/hallpass                                                | [fdaa:5:1cc2:a7b:1ef:95e7:f5df:2]:22  
  node .                                                        | [::]:3001                             


-------
 ✖ Machine 32871445bd4dd8 [app] update failed: timeout reached waiting for health checks to pass for machine 328…
-------
Checking DNS configuration for hmsu-staging.fly.dev
Error: timeout reached waiting for health checks to pass for machine 32871445bd4dd8: failed to get VM 32871445bd4dd8: Get "https://api.machines.dev/v1/apps/hmsu-staging/machines/32871445bd4dd8": net/http: request canceled
Your machine never reached the state "%s".

You can try increasing the timeout with the --wait-timeout flag

fly status

App
  Name     = hmsu-staging                                        
  Owner    = personal                                            
  Hostname = hmsu-staging.fly.dev                                
  Image    = hmsu-staging:deployment-01HYD35DJKB8AEKFNKZC4FZBBE  

Machines
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS                  LAST UPDATED         
app     32871445bd4dd8  15      lhr     stopped primary 3 total, 3 warning      2024-05-21T08:15:17Z

sturpin · May 21, 2024, 9:16am

Hi @j_c

I see you are having some issues with getting your app to listen on the correct port/address. I have found some documentation that lists out steps on what to do when experiencing this issue:

Troubleshooting your deployment

Hope this helps!
Sergio Turpín

j_c · May 28, 2024, 9:08am

Hi Sergio, I tried the solutions in the docs above but still no luck.

Update: I destroyed and rebuilt my app and managed to get it running on port 3001. Looks like it's working for now.
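For readers who hit the same warning: every deploy output in this thread shows `node .` listening on [::]:3001 while fly.toml sets internal_port = 8080, so fly-proxy and the health checks were probing a port nothing was bound to. Below is a minimal sketch of a fly.toml that points at the port seen in the logs. It assumes the app keeps listening on 3001 and still serves the /resources/healthcheck route from the original config. The check values are copied from the earlier [[services.http_checks]] entry; moving them under [[http_service.checks]] is an assumption about how they would be carried over, not j_c's confirmed final file.

```toml
app = "hmsu-staging"
primary_region = "lhr"

[[mounts]]
  source = "data"
  destination = "/data"

[http_service]
  # Must match the port the app actually binds to (3001 in the deploy logs).
  internal_port = 3001
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 0
  processes = ["app"]

  [http_service.concurrency]
    type = "requests"
    soft_limit = 80
    hard_limit = 100

  # Health check carried over from the earlier [[services.http_checks]] entry.
  [[http_service.checks]]
    grace_period = "5s"
    interval = "10s"
    method = "GET"
    path = "/resources/healthcheck"
    timeout = "2s"
```

Keeping a single [http_service] block, rather than mixing it with a leftover [[services]] block that has no internal_port, leaves only one place where the port can drift out of sync with the app.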

system · June 4, 2024, 9:09am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
