Experiencing very slow response times

I’ve been running my app on fly.io for the past couple of months, and response times have always been great. This morning, however, all my responses are very slow, on the order of 15-30 seconds. If I hit an invalid path, I get an immediate response; if I hit a path that queries my Postgres database, the response takes a long time. When I proxy into the database with flyctl proxy, I can access it just fine from a database client with no noticeable lag. I have also noticed a few things in my Postgres metrics that may or may not be normal. The app isn’t deployed to any users yet, so I’m the only user, yet even when I’m not doing anything to access it, “Queries per second” sits consistently at about 8, “Transactions” at 8 commits, “Tuples” at 400 fetched and 500 returned, and “Cache hit ratio” at 100%. Any ideas what’s going on? FLY_REGION is lax, which seems reasonable.

FYI: I just created a similar thread, but it isn’t specific to PostgreSQL.

Same. Since yesterday, requests that normally take ~2 seconds have been taking over 20 seconds. This morning I can’t get a response at all.

@jsphc @containerops Are you both also accessing a Postgres hosted on Fly?

Yes, DB is postgres hosted on fly. Rails app is also hosted on fly.

We have a passive health check system: our edges avoid sending connections to certain hosts if those hosts look unhealthy from the edge’s perspective. This lets us temporarily route around nodes during network failures or any other condition that causes connection failures.

If you only have 1 instance, and it happens to be on a node experiencing difficulties, there’s nowhere better our edge can route to.
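To make the routing behavior above concrete, here is a minimal sketch of a generic passive health check. It is a hypothetical illustration, not Fly’s actual proxy: the class name PassiveHealthRouter, the failure threshold, and the cooldown period are all assumptions. A host is skipped after repeated connection failures, but when every candidate is flagged (for example, a single instance), traffic still goes to it because there is nowhere better to send it.

```python
import time
from typing import Callable, Dict, List, Optional


class PassiveHealthRouter:
    """Hypothetical edge-side router: marks a host unhealthy after repeated
    connection failures and skips it until a cooldown period expires."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold  # assumed value, illustrative only
        self.cooldown_s = cooldown_s                # assumed value, illustrative only
        self.clock = clock
        self.failures: Dict[str, int] = {}
        self.unhealthy_until: Dict[str, float] = {}

    def record_success(self, host: str) -> None:
        # A successful connection clears any bad status for the host.
        self.failures[host] = 0
        self.unhealthy_until.pop(host, None)

    def record_failure(self, host: str) -> None:
        count = self.failures.get(host, 0) + 1
        if count >= self.failure_threshold:
            # Flag the host for a fixed cooldown, then reset its counter so the
            # bad status expires instead of being kept indefinitely.
            self.unhealthy_until[host] = self.clock() + self.cooldown_s
            count = 0
        self.failures[host] = count

    def is_healthy(self, host: str) -> bool:
        until = self.unhealthy_until.get(host)
        return until is None or self.clock() >= until

    def pick_host(self, hosts: List[str]) -> Optional[str]:
        if not hosts:
            return None
        healthy = [h for h in hosts if self.is_healthy(h)]
        # With a single instance (or every instance flagged) there is nowhere
        # better to route, so fall back to the original candidates.
        candidates = healthy or hosts
        return candidates[0]
```

Usage: after two failures on host "a" with failure_threshold=2, pick_host(["a", "b"]) returns "b", while pick_host(["a"]) still returns "a". The cooldown expiry is what keeps a stale bad status from lingering.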

That said, these problems shouldn’t persist without us opening a status page incident and investigating. This could be a bug in our proxy where it holds on to a host’s bad status. I’ve now deployed the latest version of our proxy to all our edges and hope this clears it up.

Also, I don’t think all of these issues have the same cause!

@davidfro and @jsphc: we’ve detected very high latency between the two Denver nodes hosting your app and your DB. Looking into that now.

@containerops: this might be the issue I outlined at the start of my message. Can you tell from your synthetic monitoring whether your metrics have improved?

@davidfro @jsphc the issue should now be resolved for Denver.


Much better, thanks!

Yep, can confirm everything is back to normal. Thanks!

A post was merged into an existing topic: Fly Network: Slow response times since yesterday