Deployment fails in the CI workflow even though it actually succeeds

Hello,
I've been dealing with this problem on GitHub Actions. My workflow has a deployment step that deploys the app with $ flyctl deploy --remote-only. It worked until some point, but now it just hangs until the workflow fails because the timeout is reached:

Waiting for d8d7155b215768 [app] to become healthy: 0/2
✖ Machine d8d7155b215768 [app] update failed: timeout reached waiting for health checks to pass for machine d8d7155b215768: failed to get VM d8d7155b215768: Get "https://api.machines.dev/v1/apps/fso-cicd-pokedex/machines/d8d7155b215768": net/http: request canceled
Error: timeout reached waiting for health checks to pass for machine d8d7155b215768: failed to get VM d8d7155b215768: Get "https://api.machines.dev/v1/apps/fso-cicd-pokedex/machines/d8d7155b215768": net/http: request canceled
Your machine never reached the state "%s".

You can try increasing the timeout with the --wait-timeout flag

Error: Process completed with exit code 1.

This happens in the workflow, but when I check the Fly dashboard, the deployment has actually completed and looks fine. So what's the problem? I'm not an expert; I'm a student who has been learning CI/CD over the last few days, and I don't know how to solve this one without deleting the health check.

EDIT: When I click the links, I get this:

{ "error": "You must be authenticated to view this." }

Is it due to an invalid token? I can't think of anything else auth-related.

When I deleted the health checks from fly.toml, the problem went away.
But I want to keep the health checks, so I'm really looking for a proper solution.
By the way, these are the health checks I wrote in fly.toml:

[[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

[[services.http_checks]]
    interval = 10000
    grace_period = "4s"
    method = "get"
    path = "/dummy/health"
    protocol = "http"
    timeout = 2000
    tls_skip_verify = false
    [services.http_checks.headers]

Anyone? I still have the issue. Having health checks in my fly.toml file makes the deploy step of the CI workflow fail, but when I remove the health checks everything works fine.

It's trying to hit the endpoint that you defined for the health check, {YOUR_URL}/dummy/health

Can you check that this exists and is accessible?

Thanks for the response.
Yes, that endpoint exists and is accessible through a web browser.

Another thing to check is whether that endpoint returns status code 200, which is what the health check is looking for.

Of course it returns HTTP 200. Before I opened this topic, I worked through the details to figure out why the health check is failing, and it seems to be something more than a mistake on my part, unless there is some very specific Fly.io configuration that I don't know about and haven't implemented, and that is what's causing the problem.

Try extending the grace period to 10 seconds and see if that makes a difference.

Otherwise, it might be an issue on Fly's side.
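
For reference, a minimal sketch of that change, assuming the http check posted earlier in the thread is otherwise left as it was. Only grace_period differs from the original; nothing here is confirmed to fix the timeout.

```toml
# Sketch only: the original http check with a longer grace period.
[[services.http_checks]]
    interval = 10000
    grace_period = "10s"   # was "4s" in the original config
    method = "get"
    path = "/dummy/health"
    protocol = "http"
    timeout = 2000
    tls_skip_verify = false
```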

The issue still persists. I just removed the health checks and am using my own custom checks in the workflow (plus a health check action).

It could also be that it's not hitting the right path. Do you have logs to check? You could try writing some middleware in your app to log incoming requests.

But yeah, not many options left.

Hi @crawwwler, I was able to reproduce your error locally. With a health check in place, my deploy failed with the same error message. Without a health check, my deploy succeeded.

I was finally able to get the deploy to pass by adding an X-Forwarded-Proto header to the health check. In your case, that would be the following:

[[services.http_checks]]
    interval = 10000
    grace_period = "4s"
    method = "get"
    path = "/dummy/health"
    protocol = "http"
    timeout = 2000
    tls_skip_verify = false
    [services.http_checks.headers]
      X-Forwarded-Proto = "https"

Can you give that a try?

Alternately, maybe you could use checks instead of services.http_checks:

[checks]
  [checks.status]
    port = 3000
    type = "http"
    interval = "10s"
    timeout = "2s"
    grace_period = "5s"
    method = "GET"
    path = "/dummy/health"
    protocol = "http"
    tls_skip_verify = false
    [checks.status.headers]
      X-Forwarded-Proto = "https"

If you’re still encountering errors, can you post the output of fly logs after you try to deploy with CI?

One other possible gotcha here is that you can use services.http_checks only if you also have a services section defined in your fly.toml.
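
As a rough sketch of what that looks like, the check is nested under a services entry. The internal port 3000 is taken from the checks example above; the ports and handlers shown are assumptions for illustration, not the original poster's actual configuration.

```toml
# Sketch only: services.http_checks needs an enclosing [[services]] entry.
[[services]]
    internal_port = 3000    # assumed from the [checks] example above
    protocol = "tcp"

    [[services.ports]]
        handlers = ["http"]
        port = 80           # assumed public port

    [[services.ports]]
        handlers = ["tls", "http"]
        port = 443          # assumed public port

    [[services.http_checks]]
        interval = "10s"
        grace_period = "5s"
        method = "get"
        path = "/dummy/health"
        protocol = "http"
        timeout = "2s"
        tls_skip_verify = false
        [services.http_checks.headers]
            X-Forwarded-Proto = "https"
```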

If instead you’re using an http_service section, you can implement a health check with http_service.checks.
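
A comparable sketch for the http_service form, again assuming internal port 3000 and reusing the path and X-Forwarded-Proto header from the examples above. The force_https setting is an assumption added for illustration.

```toml
# Sketch only: the same health check expressed under http_service.
[http_service]
    internal_port = 3000    # assumed from the [checks] example above
    force_https = true      # assumption, not from the original config

    [[http_service.checks]]
        interval = "10s"
        grace_period = "5s"
        method = "GET"
        path = "/dummy/health"
        timeout = "2s"
        [http_service.checks.headers]
            X-Forwarded-Proto = "https"
```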
