Some requests are not reaching fly.io servers

I noticed that some requests are not reaching the fly.io servers.

I tried changing the region or scaling the machines, and the behavior is the same: some requests are lost before reaching the fly.io servers, and CloudFlare, which sits on top of my fly.io servers, returns a 523 status code (unreachable origin).

I also wanted to open this topic to find out whether other users are experiencing the issue.
I’ve already verified that it happens intermittently, especially when I perform parallel requests. I also believe this is a recently introduced bug, since nothing has changed in my setup. The last time this worked fine was one week ago.

Is there any way I can debug this? I can’t see logs using fly logs since the request is lost before reaching the servers. Maybe fly proxy is having trouble establishing the TLS handshake?
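One way to narrow this down from the client side is to fire the same request in parallel and count the outcomes. The following is only a minimal sketch using the Python standard library; the function names `fetch_status` and `tally_statuses` are made up for illustration, and the request count and concurrency are arbitrary defaults.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or the exception name on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (including a 523 from Cloudflare) are raised as HTTPError.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
        # DNS failures, timeouts, resets and similar: record the error type instead.
        return type(err).__name__


def tally_statuses(url, total=50, workers=10, fetch=fetch_status):
    """Send `total` requests to `url`, up to `workers` at a time, and count the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * total))
```

Running `tally_statuses("https://{DOMAIN}/")` against the proxied hostname and then against `https://your-app.fly.dev/` directly would show whether the 523 responses only appear when Cloudflare is in the path.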

Hi,

That is odd. I recall having issues with Cloudflare as a proxy. Not that issue (I was getting 525, which was a different issue). I haven’t come across a 523. Perhaps others have, though the lack of replies suggests perhaps not.

I’m not sure how you could debug since if the request is not even getting to Fly, it wouldn’t be in their log :thinking:

I assume Cloudflare’s raw logs are Enterprise-only but that could be worth investigating to approach it from their end.

If it is something to do with TLS … another option to look at could be TCP pass through, where your app handles the SSL. It’s a bit of extra work but you would probably add the cert to the app (origin, in Cloudflare-land) …

… and could then securely connect to it.

In your fly.toml you would remove the http/tls handler. That would take the proxy to some extent out of the way (at least that part of it):

It seems I need to set up a certificate:

fly certs show {DOMAIN} -a {APP}
The certificate for {DOMAIN} has not been issued yet.

Hostname                  = {DOMAIN}
DNS Provider              = cloudflare
Certificate Authority     = Let's Encrypt
Issued                    =
Added to App              = 46 minutes ago
Source                    = fly

You're using Cloudflare's proxying feature (orange cloud active) for this hostname.
If you do not need Cloudflare-specific features, it's best to turn off proxying.
The only way to create certificates for proxied hostnames is to use the DNS challenge.
You can validate your ownership of {DOMAIN} by:

1: Adding an CNAME record to your DNS service which reads:
    CNAME _acme-challenge.{DOMAIN} => {DOMAIN}.l2g0r.flydns.net.

If you've already set this up, your certificate should be issued soon.
For much more information, check our docs at: https://fly.io/docs/app-guides/custom-domains-with-fly/

It seems to me certificates are manually created by the Fly.io team, so I’m waiting for it.

Hmm, that’s odd. It was working intermittently w/o a valid cert?

Indeed, seems odd to me too.

It seems to me certificates are manually created by the Fly.io team

Er … not sure about that :thinking: I think the Let’s Encrypt bot is in charge.

You should get a certificate in seconds unless there is an issue validating you own that domain. In your case I think you will run into that problem as you are using an orange-cloud (aka proxied) record. That CLI response suggests that. There are a bunch of threads on here about that very issue, if you search for Cloudflare and SSL.

Not sure this is what’s causing the 523. SSL issues tend to be 525. But it’s certainly worth checking.

I don’t see this CNAME. Did you make sure to add it? I’m looking at a .io domain whose certificate was generated today.

Oops, fixed. I requested two certificates for two of my apps. Can you take a look now?

Are your SSL/TLS settings on Cloudflare set to Full?
As for the output regarding Cloudflare proxying:

  1. Delete the DNS record to your app on Cloudflare
  2. Remove fly certs: fly certs remove example.com
  3. Recreate app cert: fly certs add example.com
  4. Add DNS record on CF: Type: CNAME, Name: @ for root or whatever you use for the subdomain, Target: your-app.fly.dev. IMPORTANT: make sure you disable the proxy; it should read "DNS only".
  5. Wait a few minutes then go to your site. Once the TLS connection is good, you can switch to Proxy

You don’t need to set up the ACME challenge.

That was my issue. Thanks! Now the application has the certificate properly set.

However, I did that hoping it would fix my issue, and it didn’t, but I found something.

I’m still experiencing a bizarre issue that is related to the way Cloudflare and the fly.io servers interact. Let me explain.

I have a Cloudflare Page Rule to cache every response from the fly.io servers:

When this is enabled, requests tend to fail, especially if they are performed in parallel.

However, if I change the cache behavior back to the standard level (it only caches assets like images or videos):

Everything works fine. I’m sure it’s a bug in the fly <> Cloudflare interaction and not just on the Cloudflare side, because I have exactly the same setup using AWS <> Cloudflare and it works with no issues.

It seems that something is happening in the TLS handshake. Is there a way I can see what’s happening there in the fly proxy logs?

If a fly.io engineer contacts me, I’ll be happy to show the behavior; it’s easy to reproduce.
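This does not expose anything from the fly proxy logs, but the handshake can at least be checked from outside by connecting to the origin directly and presenting the custom hostname as SNI. The sketch below uses Python's `ssl` module; `probe_tls` and `cert_dns_names` are illustrative names, and it assumes the origin answers on port 443.

```python
import socket
import ssl


def cert_dns_names(cert):
    """Extract the DNS names from the dict returned by SSLSocket.getpeercert()."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def probe_tls(origin_host, sni_host, port=443, timeout=10.0):
    """Handshake with origin_host while presenting sni_host.

    Returns the negotiated TLS version and the DNS names on the certificate.
    Raises ssl.SSLError (or a subclass) if the handshake or verification fails.
    """
    context = ssl.create_default_context()
    with socket.create_connection((origin_host, port), timeout=timeout) as sock:
        # server_hostname sets SNI and is also the name the certificate is checked against.
        with context.wrap_socket(sock, server_hostname=sni_host) as tls:
            return tls.version(), cert_dns_names(tls.getpeercert())
```

For example, `probe_tls("your-app.fly.dev", "{DOMAIN}")` called repeatedly (or in parallel) would raise if the origin cannot complete a handshake for that hostname, which would point at the certificate rather than at caching.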

Interesting; what’s the use case for caching everything? Usually you’d want finer control via the cache-control headers. I just enabled this on my app, haven’t seen any issues yet but I’ll let you know.

Essentially for caching JSON payloads in the network

I’m still experiencing the issue. It seems Fly <> CF networking issues have been a thing over time (related: "Fly apps proxied through Cloudflare may route to incorrect regions").

I haven’t had any issues myself so far :person_shrugging:
Can’t you just add the cache-control headers to your JSON payload?

That is exactly what I’m doing.

Rather than the write into the network cache, I believe the issue is in retrieving the value from the CF cache when fly.io performs the request.
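For reference, "adding the cache-control headers to the JSON payload" can look roughly like the sketch below. It is framework-agnostic and purely illustrative: `json_response` is a made-up helper, the TTL values are arbitrary, and whether Cloudflare actually stores the response still depends on the cache or page rules configured for the zone.

```python
import json


def json_response(payload, max_age=60, shared_max_age=300):
    """Serialize payload and build headers that ask caches to store the response."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
        # max-age applies to browsers; s-maxage overrides it for shared caches such as a CDN.
        "Cache-Control": f"public, max-age={max_age}, s-maxage={shared_max_age}",
    }
    return body, headers
```

The returned body and headers can be handed to whatever HTTP framework the app uses.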

I just noticed that setting the Page Rule to cache everything broke pages that were streaming. IMO adding a Cache Rule would be better than the Page Rule (there are too many options).
I think setting example.com/api/* on Page Rule would work too but you get 10 Cache Rules vs 3 Page Rules on the free tier.

Hmm, I’m not seeing how that fixes this; it’s just a different way to have the same setup.

Yea I was following up w/ my experiment w/ what you were doing.
