I noticed that some requests are not reaching the fly.io servers.
I tried changing the region and scaling the machines, and the behavior is the same: some requests are lost before reaching the fly.io servers, and Cloudflare, which sits in front of my fly.io servers, returns a 523 status code (origin unreachable).
I also wanted to open this topic to find out whether other users are experiencing the issue.
I’ve already verified that it happens intermittently, especially when I send parallel requests. Also, I believe this behavior is a recently introduced bug, since nothing has changed in my setup. The last time this worked fine was a week ago.
Is there any way I can debug this? I can’t see anything with fly logs, since the request is lost before reaching the servers. Maybe the Fly proxy is having trouble establishing the TLS handshake?
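Since fly logs only shows requests that actually reach the app, one way to gather evidence is from the client side: reproduce the parallel-request pattern and count the status codes. Below is a minimal Python sketch, assuming the URL you pass is your Cloudflare-proxied hostname and that 100 requests over 20 threads is enough to trigger the issue; `fetch_status`, `summarize` and `run` are hypothetical helper names, not part of any Fly or Cloudflare tooling.

```python
# Hypothetical probe: fire parallel GETs at a Cloudflare-proxied URL and tally
# the status codes, to see how often Cloudflare answers 523.
import collections
import concurrent.futures
import sys
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 15.0) -> int:
    """Return the HTTP status code, or -1 if no HTTP response came back at all."""
    # Custom User-Agent so the probe traffic is easy to spot in any logs.
    req = urllib.request.Request(url, headers={"User-Agent": "523-probe"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers, including Cloudflare's 523, are raised as HTTPError.
        return err.code
    except OSError:
        # DNS failures, refused connections and timeouts (URLError is an OSError).
        return -1


def summarize(codes: list[int]) -> tuple[collections.Counter, float]:
    """Count each status code and compute the fraction of 523 responses."""
    counts = collections.Counter(codes)
    rate = counts[523] / len(codes) if codes else 0.0
    return counts, rate


def run(url: str, total: int = 100, workers: int = 20):
    """Send `total` requests using `workers` threads and summarize the results."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(fetch_status, [url] * total))
    return summarize(codes)


if __name__ == "__main__" and len(sys.argv) == 2:
    counts, rate = run(sys.argv[1])
    print(dict(counts))
    print(f"523 rate: {rate:.1%}")
```

Running it as `python probe.py https://example.com/` (placeholder URL) prints the status-code counts and the share of 523s; a -1 means the request failed without any HTTP response at all. Timestamps of failing runs may also help if you end up comparing notes with Fly or Cloudflare.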
That is odd. I recall having issues with Cloudflare as a proxy, though not this one (I was getting 525s, which is a different issue). I haven’t come across a 523. Perhaps others have, though the lack of replies suggests perhaps not.
I’m not sure how you could debug it, since if the request is not even getting to Fly, it wouldn’t be in their logs.
I assume Cloudflare’s raw logs are Enterprise-only, but they could be worth investigating to approach it from their end.
If it is something to do with TLS … another option to look at could be TCP pass-through, where your app handles TLS itself. It’s a bit of extra work, but you would probably add the cert to the app (the origin, in Cloudflare-land) …
… and could then securely connect to it.
In your fly.toml you would remove the http and tls handlers. That would take the proxy out of the way to some extent (at least that part of it).
fly certs show {DOMAIN} -a {APP}
The certificate for {DOMAIN} has not been issued yet.
Hostname = {DOMAIN}
DNS Provider = cloudflare
Certificate Authority = Let's Encrypt
Issued =
Added to App = 46 minutes ago
Source = fly
You're using Cloudflare's proxying feature (orange cloud active) for this hostname.
If you do not need Cloudflare-specific features, it's best to turn off proxying.
The only way to create certificates for proxied hostnames is to use the DNS challenge.
You can validate your ownership of {DOMAIN} by:
1: Adding an CNAME record to your DNS service which reads:
CNAME _acme-challenge.{DOMAIN} => {DOMAIN}.l2g0r.flydns.net.
If you've already set this up, your certificate should be issued soon.
For much more information, check our docs at: https://fly.io/docs/app-guides/custom-domains-with-fly/
It seems to me certificates are manually created by the Fly.io team, so I’m waiting for it.
Er … not sure about that. I think the Let’s Encrypt bot is in charge.
You should get a certificate in seconds unless there is an issue validating that you own the domain. In your case I think you will run into that problem, as you are using an orange-cloud (aka proxied) record; that CLI response suggests as much. There are a bunch of threads on here about that very issue if you search for Cloudflare and SSL.
Not sure this is what’s causing the 523. SSL issues tend to be 525. But it’s certainly worth checking.
Is your SSL/TLS encryption mode on Cloudflare set to Full?
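If you’d rather check the mode without clicking through the dashboard, the Cloudflare API exposes it as the zone’s `ssl` setting. A sketch assuming an API token allowed to read zone settings, supplied together with the zone ID through the hypothetical environment variables `CF_API_TOKEN` and `CF_ZONE_ID`:

```python
# Hypothetical check of the zone's SSL/TLS encryption mode via the Cloudflare API.
# CF_ZONE_ID and CF_API_TOKEN are placeholder environment variable names.
import json
import os
import urllib.request

API = "https://api.cloudflare.com/client/v4"

MODES = {
    "off": "no encryption at all",
    "flexible": "HTTPS to Cloudflare, plain HTTP from Cloudflare to the origin",
    "full": "HTTPS to the origin, origin certificate not validated",
    "strict": "Full (strict): HTTPS to the origin with a valid certificate",
}


def fetch_ssl_mode(zone_id: str, token: str) -> str:
    """Return the zone's "ssl" setting value, e.g. "full" or "strict"."""
    req = urllib.request.Request(
        f"{API}/zones/{zone_id}/settings/ssl",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]["value"]


def describe(mode: str) -> str:
    """Map a mode value to a short human-readable explanation."""
    return MODES.get(mode, f"unrecognised mode {mode!r}")


if __name__ == "__main__" and all(k in os.environ for k in ("CF_ZONE_ID", "CF_API_TOKEN")):
    mode = fetch_ssl_mode(os.environ["CF_ZONE_ID"], os.environ["CF_API_TOKEN"])
    print(mode, "->", describe(mode))
```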
As for the output regarding Cloudflare proxying:
1. Delete the DNS record for your app on Cloudflare.
2. Remove the Fly certificate: fly certs remove example.com
3. Recreate the app certificate: fly certs add example.com
4. Add a DNS record on Cloudflare: Type CNAME, Name @ for the root domain (or the subdomain name), Target your-app.fly.dev. IMPORTANT: make sure you disable the proxy; it should read "DNS only".
5. Wait a few minutes, then go to your site. Once the TLS connection is good, you can switch the record back to Proxied.
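To check the last step without going through Cloudflare, you can open a TLS connection straight to the Fly hostname while sending your own domain as SNI, which is roughly what Cloudflare does in Full (strict) mode. A minimal Python sketch, with your-app.fly.dev and example.com as placeholders and `check_origin`/`dns_names` as hypothetical helpers: the default SSL context verifies both the chain and the hostname, so a True result means Fly is serving a valid certificate for your domain.

```python
# Hypothetical check: open TLS straight to the Fly hostname, sending your own
# domain as SNI, and report which names the returned certificate covers.
import socket
import ssl
import sys


def dns_names(cert: dict) -> list[str]:
    """Extract DNS subjectAltName entries from an ssl.getpeercert() dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def check_origin(origin: str, hostname: str, port: int = 443, timeout: float = 10.0):
    """Return (True, SAN list) if the handshake verifies, else (False, error)."""
    ctx = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((origin, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
                return True, dns_names(tls.getpeercert())
    except OSError as err:  # ssl.SSLError, incl. verification failures, is an OSError
        return False, f"{type(err).__name__}: {err}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python check_origin.py your-app.fly.dev example.com
    print(check_origin(sys.argv[1], sys.argv[2]))
```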
Everything works fine. I’m sure it’s buggy behavior in the Fly <> Cloudflare interaction and not on the Cloudflare side alone, because I have exactly the same setup using AWS <> Cloudflare and it works with no issues.
It seems something is happening in the TLS handshake. Is there a way I can see what’s happening there in the Fly proxy logs?
If a Fly.io engineer contacts me, I’ll be happy to show the behavior; it’s easy to reproduce.
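Independent of the proxy’s own logs, the handshake can be exercised from the client side. The sketch below opens many parallel TLS connections straight to the Fly origin (bypassing Cloudflare) and tallies failures and the slowest handshake; if it comes back clean while proxied requests still return 523, that would suggest the problem lies in the Cloudflare-to-Fly path rather than in the app. Hostnames, counts and helper names are placeholders, not Fly tooling.

```python
# Hypothetical stress test: many parallel TLS handshakes directly against the
# Fly origin, recording failures and timings.
import concurrent.futures
import socket
import ssl
import sys
import time


def handshake(origin: str, hostname: str, port: int = 443, timeout: float = 10.0):
    """Attempt one TLS handshake; return (ok, seconds, error message)."""
    ctx = ssl.create_default_context()
    start = time.monotonic()
    try:
        with socket.create_connection((origin, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=hostname):
                pass  # wrap_socket completes the handshake before returning
        return True, time.monotonic() - start, ""
    except OSError as err:  # ssl.SSLError and timeouts are OSError subclasses
        return False, time.monotonic() - start, f"{type(err).__name__}: {err}"


def summarize(results: list[tuple[bool, float, str]]) -> dict:
    """Aggregate handshake results into counts, slowest time and distinct errors."""
    failures = [r for r in results if not r[0]]
    return {
        "total": len(results),
        "failed": len(failures),
        "slowest_s": max((r[1] for r in results), default=0.0),
        "errors": sorted({r[2] for r in failures}),
    }


def probe(origin: str, hostname: str, total: int = 100, workers: int = 20) -> dict:
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handshake, origin, hostname) for _ in range(total)]
        return summarize([f.result() for f in futures])


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python tls_probe.py your-app.fly.dev example.com
    print(probe(sys.argv[1], sys.argv[2]))
```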
Interesting; what’s the use case for caching everything? Usually you’d want finer control via Cache-Control headers. I just enabled this on my app and haven’t seen any issues yet, but I’ll let you know.
I just noticed that setting the Page Rule to Cache Everything broke pages that were streaming. IMO adding a Cache Rule would be better than the Page Rule (there are too many options).
I think scoping the Page Rule to example.com/api/* would work too, but you get 10 Cache Rules vs. 3 Page Rules on the free tier.
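Whichever rule type you pick, the `cf-cache-status` response header shows how Cloudflare treated each request, so it is a quick way to confirm that streaming or /api/ routes are no longer being cached. A sketch with placeholder URLs; values such as `HIT`, `MISS` and `EXPIRED` mean Cloudflare considered the response cacheable, while `DYNAMIC` or `BYPASS` mean it was not cached.

```python
# Hypothetical check: print Cloudflare's cf-cache-status header for each URL,
# e.g. a normal page and a streaming /api/ route.
import sys
import urllib.error
import urllib.request

# Statuses where Cloudflare treated the response as cacheable.
CACHEABLE = {"HIT", "MISS", "EXPIRED", "STALE", "REVALIDATED", "UPDATING"}


def cache_status(url: str, timeout: float = 15.0) -> str:
    """Return the cf-cache-status header ("" if absent) without reading the body."""
    req = urllib.request.Request(url, headers={"User-Agent": "cache-check"})
    try:
        # urlopen returns once headers arrive, so a streaming body is not consumed.
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.headers.get("cf-cache-status", "")
    except urllib.error.HTTPError as err:
        return err.headers.get("cf-cache-status", "")


def is_cacheable(status: str) -> bool:
    return status.upper() in CACHEABLE


if __name__ == "__main__":
    for url in sys.argv[1:]:
        status = cache_status(url)
        print(f"{url}: {status or '(no header)'} cacheable={is_cacheable(status)}")
```

For example, `python cache_check.py https://example.com/ https://example.com/api/stream` should show the API route as DYNAMIC or BYPASS once it is excluded from caching.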