Of 5 deploys today, only 1 worked. None of them failed the deployment or its health checks, and all worked locally. I am not yet convinced it’s my code, but the deployed apps won’t receive requests.
The first thing my server does on a request is log the path. The logs don’t show any incoming requests.
The [statics] section works fine when the deploy works (a response is sent back). When the deployment finishes and is healthy but the app does not respond, [statics] also breaks, even though those requests should not be hitting my app.
The only errors I am seeing are from the proxy.
Minutes after a failed request, the log shows this error:
2022-02-14T18:29:19Z proxy[bc9c1c18] chi [error]Error 2: Internal problem
Notice the proxy in the message, as opposed to runner or app.
Can my deployment break the proxy? I am really confused.
I have also seen this error, but I am unsure if it was during a period of failure.
error.message="Undocumented" 2022-02-14T11:59:34Z proxy[ffe08d52] mia [error]error.code=1 request.method="POST" request.url="/HNAP1/" request.id="01FVW1M3RM7ADXBC563MYCNT20" response.status=502
Could you post the fly.toml file to see if anything stands out in that? Probably not but doesn’t hurt.
I’m not very familiar with Go, but I assume the server listens on port 8080. That port would need to match what is in the fly.toml for requests to get to the app.
Given you say the first thing the server does is log a request, it sounds like the request is not getting from the proxy to your app, hence the internal problem showing in the logs. The small delay before that line appears shouldn’t be an issue; logs for me are near realtime but can be slightly delayed. I assume that if the Fly proxy receives a request but for some reason can’t connect to your app to pass it along, it reports a problem, which would explain that line.
I can post the fly.toml, but that has not changed.
Yeah, the requests do not appear to be getting to my application. The logs are delayed, but when I do a fresh deploy, I can see those logs, as well as any messages from the server starting.
This is probably related to our slow state propagation. Does this app have one instance?
The unknown errors are requests trying to hit VMs that have been shut down. The error could be better, but these will go away as we roll out better service discovery (which is happening this week, unless something goes wrong).
Oh I misunderstood your post, sorry about that. Let me have a look at what might be happening here.
[statics] are impacted by the service discovery issue. Each new deploy registers statics the same way it does app instances, and those also need to propagate.
Ok, we identified the issue. The physical host your app was landing on wasn’t accepting connections for new VMs. Your app should be working now. This was a failure mode we haven’t seen before, so we’re instrumenting it now so we can catch it next time.
I spent HOURS trying to understand what I did wrong in my app to cause that. Happy it wasn’t me.
What red flags can I look for in the future to tell whether a problem is in my app or in Fly?
Also, are there health checks that should have identified that the app was not reachable? Or are the health checks lower level and internal, so proxy issues won’t be flagged by them?
The health checks happen out of band from the proxy (right now), so the proxy wouldn’t have seen it.
If an app deploys successfully and it’s not accepting traffic, that’s a giant red flag. I didn’t read your initial post right or I’d have told you that immediately.