Fly deploy seems to have failed connection at Proxy

I am building a Go / KeyDB app.

Of 5 deploys today, only 1 worked. None of them fail the deployment or health check, and all worked locally. I am not convinced yet it’s my code. But they won’t receive requests.

https://showdown-labs.fly.dev

A few things -

  1. The first thing my server does on a request is log the path. The logs don’t show any incoming requests.
  2. The [statics] section works fine when the deploy works (a response is sent back). When the deployment finishes and is healthy but there is no response, [statics] breaks too, even though those requests should never hit my app.
  3. The only errors I am seeing are from the proxy.

After the request has been failing for a while, minutes later, the log shows this error:

2022-02-14T18:29:19Z proxy[bc9c1c18] chi [error]Error 2: Internal problem

Notice the proxy in the message, as opposed to runner or app.

Can my deployment break the proxy? I am really confused.

I have also seen this error, but I am unsure if it was during a period of failure.

2022-02-14T11:59:34Z proxy[ffe08d52] mia [error]error.code=1 error.message="Undocumented" request.method="POST" request.url="/HNAP1/" request.id="01FVW1M3RM7ADXBC563MYCNT20" response.status=502

Could you post the fly.toml file to see if anything stands out in that? Probably not but doesn’t hurt.

I’m not very familiar with Go but I assume the app listens on port 8080. That needs to match the internal_port in the fly.toml for requests to get to the app.
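To make the port point concrete, here is a small sketch of a Go server that takes its port from the PORT environment variable and falls back to 8080. The `listenAddr` helper is a made-up name for illustration. Binding to 0.0.0.0 rather than localhost is an assumption on my part about what the proxy needs in order to connect to the VM.

```go
package main

import (
	"log"
	"net"
	"net/http"
	"os"
)

// listenAddr builds the bind address from a port string, defaulting to
// 8080 when the port is empty. It binds to all interfaces (0.0.0.0) on
// the assumption that the proxy connects from outside the loopback device.
func listenAddr(port string) string {
	if port == "" {
		port = "8080"
	}
	return net.JoinHostPort("0.0.0.0", port)
}

func main() {
	addr := listenAddr(os.Getenv("PORT"))
	log.Printf("listening on %s", addr)

	// A nil handler means http.DefaultServeMux; register real routes there.
	log.Fatal(http.ListenAndServe(addr, nil))
}
```

Whatever port this resolves to has to be the same number as internal_port under [[services]].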

Given you say the first thing the server does is log a request, it sounds like the request is not getting from the proxy to your app, hence the internal problem showing in the logs. The small delay before that line appears shouldn’t be an issue: logs for me are near realtime but can be slightly delayed. I assume the Fly proxy receives a request, can’t connect to your app for some reason to pass it along, and so reports a problem, which is that line.

I can post the fly.toml, but that has not changed.

Yeah, the requests do not appear to be getting to my application. The logs are delayed, but when I do a fresh deploy, I can see those logs, or any messages from the server starting.

# fly.toml file generated for showdown-labs on 2022-01-29T17:11:59-05:00

app = "showdown-labs"

kill_signal = "SIGINT"
kill_timeout = 5
processes = []

[build]
  dockerfile = "./Dockerfile"

  [build.args]
    BP_KEEP_FILES = "./goapp/public/*"

[deploy]
  strategy = "rolling"

[env]
  PORT = "8080"
  KEYDB_HOST = "showdown-labs-keydb-fly.internal"
  KEYDB_PAGE_CACHE_DB = 0
  KEYDB_PAGE_DATA_DB = 1
  KEYDB_STALE_TTL = 3600
  KEYDB_STORE_TTL = 86400

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[statics]]
  guest_path = "/goapp/public"
  url_prefix = "/"

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

This is probably related to our slow state propagation. Does this app have one instance?

The unknown errors are requests trying to hit VMs that have been shut down. The error could be better, but these will go away as we roll out better service discovery (which is happening this week, unless something goes wrong).

Yeah, it does only have one instance. But it never recovers. An hour later, it still won’t receive traffic.

I never had this issue before today.

Even if it’s not routing to my app, should [statics] be impacted by this service discovery issue?

Oh I misunderstood your post, sorry about that. Let me have a look at what might be happening here.

[statics] are impacted by the service discovery issue. Each new deploy registers statics the same way it registers app instances, and those also need to propagate.

Ok, we identified the issue. The physical host your app was landing on wasn’t accepting connections for new VMs. Your app should be working now. This was a failure mode we haven’t seen before, so we’re instrumenting it now so we can catch it next time.

Yeah it totally is.

I spent HOURS trying to understand what I did wrong in my app to cause that. Happy it wasn’t me.

What red flags can I look for in the future to tell whether a problem is my app or Fly?

Also, are there health checks that run that should have identified that this was not reachable? Or are the health checks lower level and internal, so proxy issues won’t be flagged by them?

The health checks happen out of band from the proxy (right now), so the proxy wouldn’t have seen it.

If you have an app successfully deploy and it’s not accepting traffic, that’s a giant red flag. I didn’t read your initial post right or I’d have told you that immediately. :smiley:

I am just new to Go, so worried I screwed something up. =)

But fair enough, I just didn’t know if there was a way to check service discovery or anything.