Cloudflare 525 error randomly occurs

I guess that after I set SSL_KEY in secrets, my app needs a function that writes SSL_KEY to a file and places that file in the root? How do you accomplish this?

I don’t think you need to save it to a file in order to use it.
You could directly read the env variable in your app code and use it from there.
Steps 2 & 3 from "Use Cloudflare Certs to FlyIO - #8 by ignoramous" show you how.

Or will the SSL_KEY env be automatically detected by fly.io as the cert key, so that I do not need to do anything on my side?

I’m afraid you have to handle TLS in your app :slight_smile:
Fly won’t auto-detect that env as a cert and do anything.


Got it! I will try that. Thanks FrequentSolver again :gift_heart:

How you read the certs and enable HTTPS will depend on your application, framework, runtime, etc.

For example, in Fastify and other Node servers you simply pass the certs when configuring the server.

https://nodejs.org/dist/latest-v14.x/docs/api/https.html#https_https_createserver_options_requestlistener
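As a rough sketch of that idea with Node's built-in https module: the certificate and key are read straight from environment variables and passed to createServer, with no file written to disk. SSL_KEY is the secret name used earlier in this thread; SSL_CERT, the port 8443, and the helper name tlsOptionsFromEnv are assumptions for illustration only.

```typescript
import * as https from "https";
import type { IncomingMessage, ServerResponse } from "http";

// Build TLS options from environment variables instead of files on disk.
// SSL_KEY is the secret from this thread; SSL_CERT is an assumed companion
// secret holding the PEM certificate.
function tlsOptionsFromEnv(
  env: Record<string, string | undefined>,
): { key: string; cert: string } {
  const key = env.SSL_KEY;
  const cert = env.SSL_CERT;
  if (!key || !cert) {
    throw new Error("SSL_KEY and SSL_CERT must both be set");
  }
  return { key, cert };
}

function handler(_req: IncomingMessage, res: ServerResponse): void {
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("hello over TLS\n");
}

// Only start listening when both secrets are present.
// Port 8443 is an arbitrary choice for this sketch.
if (process.env.SSL_KEY && process.env.SSL_CERT) {
  https.createServer(tlsOptionsFromEnv(process.env), handler).listen(8443);
}
```

Frameworks built on top of Node's server generally accept the same key/cert pair in their own server options, so the helper should carry over with little change.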

I don’t use Cloudflare, but is it really not possible to have them communicate with https://<app>.fly.dev? Because that’s kind of silly.

You can probably set up the CDN with a CNAME record pointing to a Fly subdomain instead of an A record.

My issue was about making requests from Cloudflare Workers to the Fly app while using an A record to the IPv4 of the Fly app.

Yep, if you set that app.fly.dev as a CNAME record, they will pull from that using https.

My issue was getting random 525s (when behind Cloudflare, which I wanted to do to get its geo-ip header). But it did work 99% of the time.

Not sure whether @Loonb is getting a 525 consistently or randomly, like me. I solved it by handling TLS in-app, but given all the changes to the proxy/routing since then, that may no longer be needed; I haven’t tried reverting.


Yes, randomly and rarely. For now I’m leaving it as it is, because I have no idea how to do TLS in-app with Django.

This was really helpful, thank you!

I’ve set up my app to handle SSL and it all works, but I’m still getting random 525 errors that last 5 to 10 minutes.

I’m clutching at straws trying to figure out what I’m missing to stop these errors.

This might be a n00b question, but if I’ve added the origin certs to my app, do I still need the Fly-created SSL certificate and the _acme-challenge DNS record? Could having those in place be causing my random 525 errors?

Thank you so much! :pray:

In step 5 I detailed how to tell Fly that your app will be handling SSL (instead of Fly).

If Cloudflare is trying to communicate with your app, and it receives the Fly certs instead of the Cloudflare issued cert, then you will get the 525 errors.

AFAIK the DNS records are just for verification purposes and shouldn’t interfere with SSL.
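To see which certificate the origin actually presents, a small probe along these lines may help. It is only a sketch using Node's tls module: inspectOrigin is a hypothetical helper name, the host is whatever you pass in, and rejectUnauthorized is turned off on the assumption that a Cloudflare origin certificate is not trusted by the default CA store.

```typescript
import * as tls from "tls";

interface OriginTlsInfo {
  protocol: string | null; // negotiated TLS version, e.g. "TLSv1.3"
  subject: string; // certificate subject fields, JSON-encoded
  issuer: string; // certificate issuer fields, JSON-encoded
  validTo: string; // expiry date as reported by Node
}

// Open a TLS connection and report what the server presented.
// Verification is disabled so that certificates from a private CA
// can still be inspected; do not reuse this setting for real traffic.
function inspectOrigin(
  host: string,
  port = 443,
  servername = host,
): Promise<OriginTlsInfo> {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(
      { host, port, servername, rejectUnauthorized: false },
      () => {
        const cert = socket.getPeerCertificate();
        resolve({
          protocol: socket.getProtocol(),
          subject: JSON.stringify(cert.subject ?? {}),
          issuer: JSON.stringify(cert.issuer ?? {}),
          validTo: cert.valid_to ?? "",
        });
        socket.end();
      },
    );
    socket.once("error", reject);
  });
}
```

Calling it with your app's hostname and logging the result should show whether the issuer is the one you expect or the Fly-managed certificate.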


Perfect, thank you! I’ll delete the Fly SSL certificate and hopefully the 525 errors will stop! :crossed_fingers:

Is this happening randomly from the same browser for the same user?

It looks like we can’t fulfill #5 here: Community Tip - Fixing Error 525: SSL handshake failed - Tutorial - Cloudflare Community

(I know this was pointed out before)

Our ciphers are more limited than Cloudflare’s. This is because we’re using a different TLS library that only supports TLSv1.2 and TLSv1.3 with modern ciphers. Even with a shorter list of ciphers, we still support the vast majority of clients. Clients that don’t support any of these are considered insecure.

No, it was happening for two separate uptime checks and I checked via my laptop and phone.

For the record (know this is an old comment), this would be a dealbreaker for us since we need broader SSL termination support than Fly offers (ref).

There are also a ton of other CF features (eg WAF stuff) that we can’t give up, are happy with, and which in any case I wouldn’t expect Fly to start offering.

There’s another thread which feels modestly related, with a commenter and me looking for static count-per-region scaling.

Putting these together, Fly could support a “fewer bells-and-whistles” style of deploy, which feels like it would be a subset of the features already built in to fly. e.g.:

  • Want magic anycast routing? A & AAAA your domain to your fly anycast ip, we’ll handle SSL termination and do the rest.
  • Using Cloudflare or don’t want us doing geographic load balancing? CNAME your domain to yourapp.<region>.fly.dev and it’ll always route to your instances in that region.

(I would certainly respect that offering & maintaining more deployment styles would increase your support costs, but I have to think this kind of ‘operational parity’ would make it easier to conquest existing e.g. Heroku customers. If that’s a goal…)


“Unsupported” for us just means we can’t do much to help. Cloudflare’s error pages seem more designed to deflect blame than to help people debug these types of problems. We need a lot more information than they provide to prevent these errors. We wouldn’t do anything to intentionally disable it, though.

You can do what you’re asking though! fly ips allocate-v4 --region <region> will route the way you want. You should be able to point anything you want to those.


One other thought about Cloudflare 525s: that code can be a bit misleading. It may not actually be caused by an SSL issue. I recently got a random 525 error page, after not touching anything, and noticed a VM in the app had recently been replaced by Fly (maybe capacity, region load, hardware, etc.; you have to allow for those kinds of things). So Cloudflare would have been briefly unable to connect, and it reported that as a 525 rather than, say, a 500 or some other 50x code.


Yeah that makes sense! My best guess is there’s some timeout mismatch between their proxies and our proxies. If we time a connection out and they don’t, it might just be that they serve an error.

When a VM gets replaced in your app you shouldn’t get a connection error these days, but it’s possible!


There are also a ton of other CF features (eg WAF stuff) that we can’t give up, are happy with, and which in any case I wouldn’t expect Fly to start offering.

Why not? I expect Fly to offer WAF, particularly because they already terminate HTTP/S with Nginx (edit: or is it something else based on Rust?), and permissively licensed open source WAF for Nginx is a thing.

CF has an enormous set of products and features which would take a lot of resources to develop. I would expect Fly to focus on being a rock solid PaaS first, before going after more of these ‘higher level’ products. (You’re definitely right that Fly’s position as an SSL-terminating reverse proxy makes it possible, tho…)

Just one outsider’s conjecture, of course…


We get enough demand for WAF that we’ll ultimately end up doing it. But you’re right, we have no interest in building another CDN. Our big bet is that people don’t really need CDNs if you give them enough flexibility about where their app runs.

It might still make sense for some apps to run behind third party DDoS protection and similar features. At least for the near future. :smiley:


I’m getting some 522 errors (not 525) when using my Fly app with a CF Worker.

It’s very weird.

The Worker serves an MP3 file from a subdomain of my main domain (media.domain.com/audio.mp3). When calling the URL directly, say from the browser address bar, it works fine. But if an HTML page on another subdomain like app.domain.com requests the MP3, I get the 522 error.

The app is hosted on Fly and proxied by CF using A and AAAA records.

If, instead of using my subdomain for the Worker, I use the one provided by CF (like something.something.workers.dev), it works fine.

Since both media.domain.com and app.domain.com are on the same domain, it shouldn’t be a referrer policy issue. Right?

I’ve tried not using CF to proxy the Fly app but that didn’t solve it.

I’m going to try to use the certs provided by CF on my Fly app (as I detailed in a previous post in this thread) and see if this solves it.

Edit:

Dumb me.

The solution was to add this header to the worker response:

'Cross-Origin-Resource-Policy', 'same-site'
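For illustration, setting that header on a Worker response could look like the sketch below. It relies only on the standard Response and Headers APIs; the helper name withCorp is hypothetical, and as the later edits in this post note, the header did not end up resolving the 522s.

```typescript
// Return a copy of the upstream response with the
// Cross-Origin-Resource-Policy header set to "same-site".
// All other headers, the status, and the body are passed through.
function withCorp(upstream: Response): Response {
  const headers = new Headers(upstream.headers);
  headers.set("Cross-Origin-Resource-Policy", "same-site");
  return new Response(upstream.body, {
    status: upstream.status,
    statusText: upstream.statusText,
    headers,
  });
}
```

Copying into a fresh Headers object avoids mutating the upstream response, whose headers may be immutable when it comes from fetch.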

Edit:

No that didn’t fix it. I’m getting the error again.

Edit:

The only thing that consistently worked was using the workers.dev subdomain CF provides.

So I ended up adding a new domain to CF and using it to trigger the worker.

I never tried to change the cert on the Fly app since the error also happened when trying to call the MP3 from any other domain.
