I have a simple Node.js app listening for incoming requests. It was working well for a week or so, and today these errors started:
2023-03-06T18:31:44.155 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 60)
2023-03-06T18:31:44.707 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 20)
2023-03-06T18:31:54.158 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 70)
2023-03-06T18:31:54.629 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 30)
2023-03-06T18:32:04.137 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 80)
2023-03-06T18:32:04.514 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 40)
2023-03-06T18:32:14.065 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 90)
2023-03-06T18:32:14.671 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 50)
2023-03-06T18:32:24.749 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 60)
2023-03-06T18:32:34.807 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 70)
2023-03-06T18:32:44.716 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 80)
2023-03-06T18:32:54.839 proxy[b77cb7ec] fra [warn] Could not proxy HTTP request. Retrying in 1000 ms (attempt 90)
Code snippet:
const express = require('express');
const app = express();
const PORT = process.env.PORT || 8080;
const axios = require('axios');
app.listen(PORT, function () {
  console.log(`App listening on port ${PORT}!`);
});

app.post('/music', function (req, res) {
  // do something, then always send a response; a handler that never
  // responds leaves the proxy waiting for HTTP headers
  res.sendStatus(200);
});
I’m also seeing this error and I’ve emailed support. My app is currently scaled to 2 instances. I’ve tried restarting and scaling the count up and down, but that’s not fixing the problem. This is really serious.
Yes, it was the same timeout. The app was working well and suddenly started failing with that error. After about three hours of downtime we were finally able to fix it by scaling down our VM; this is really weird.
To be clear, we scaled down the resources, not the number of VM instances.
@foocux Fly support responded via email earlier saying, “We had some state cleanup to perform on our end which caused the proxy to have intermittent issues in a few regions.” It’s fixed for me now. This was the first time I’ve seen this specific issue.
When the errors started, I deployed the app many times and restarted it at least twice, and that didn’t fix it. But when I scaled down the app, it started working again. Very odd.
I’ve had time to look at your app, here are some observations:
For a while our proxy was seeing “connection closed before message completed” errors from your instances. That usually indicates the app is erroring out and not responding with HTTP headers.
Your concurrency limits are 99999,99999, so you’re not getting any limit enforcement or load balancing from us.
Your app is slow to respond with HTTP headers, which usually indicates something is wrong within the app. I see response headers taking 3–4 s under normal load, and during the problematic period it sometimes took over a minute.
I think scaling down fixing the issue was only a coincidence: your app simply received less traffic at that time.
These metrics are all available to you: https://fly-metrics.net (the Fly Instance dashboard should tell you more about your app’s performance)
Some things you can try:
Enable requests concurrency (explained here). Without it, our proxy establishes a new connection for every request; this is to avoid race conditions with connection pooling in general. Opening many connections per second might not suit your app, and it seems to stop working at around 20–30 new connections per second (this isn’t the same as concurrent connections).
Find the right concurrency your app can handle, either by benchmarking or by trial and error. I expect not all routes have the same resource cost. My guess is that each instance can handle 10–20 concurrent requests.
Troubleshoot your app’s performance: what’s taking so much time in these requests? It could be slow DB queries, slow network calls, or almost anything.
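For reference, concurrency limits are configured in the `[services.concurrency]` section of fly.toml. A hedged sketch; the limit values below are illustrative starting points to tune by benchmarking, not recommendations:

```toml
# Illustrative fly.toml excerpt -- tune the limits to what your app handles.
[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    type = "requests"  # count requests rather than raw TCP connections
    soft_limit = 15    # proxy starts preferring other instances above this
    hard_limit = 20    # proxy stops sending new work to the instance above this
```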
I’m happy to help some more with figuring out our metrics and logs.
Hmm, what you just said makes a lot of sense. I actually didn’t know we had those concurrency limits, which explains a lot about why our iad load was always high while our other instance was the exact opposite.
Thank you for your analysis. I’m going to take a closer look at the metrics you sent me; they’re going to be really helpful right now.