525 errors are back

Hello,

I started getting reports of a 525 error with an app from a Cloudflare healthcheck, and sure enough when I visit the domain, I get a Cloudflare error page :frowning:

It’s been like this for about 5 minutes so far, and a refresh still shows the error.

I checked the app and it says in flyctl status, info etc that all is well. The logs don’t show any errors. I haven’t made any changes to it. It just started happening.

Are there any more routing issues you are aware of? You thought that may have been the case a few days ago when this last happened. I’m using the standard port 443:

Services
PROTOCOL PORTS
TCP 443 => 8080 [TLS, HTTP]

Does your app work if you visit <app>.fly.dev directly? We’re not having any issues at the moment.

Also, if you visit debug.fly.dev, do you see Fly-Region: lhr?

Yes, I see Fly-Region: lhr.

Interestingly it does work when I visit the app directly, using name.fly.dev.

But … I’ve just tried running flyctl certs list, and the certificate’s status has dropped to ‘Awaiting configuration’. Ah ha. That’s not right, it should be ready.

So that suggests that in the past few minutes it tried to revalidate with Let’s Encrypt (or whoever the CA is) and failed, which would explain why I can’t use my custom domain but can use the name.fly.dev domain.
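For anyone hitting the same thing, the reasoning so far can be sketched as a small triage function. This is only an illustration of the logic in this thread, not a Fly or Cloudflare tool: the function name and return strings are made up, and the inputs are values you would collect by hand (the HTTP status on the custom domain, the status on name.fly.dev, and the status shown by flyctl certs list).

```python
def triage_525(custom_domain_status: int, fly_dev_status: int, cert_status: str) -> str:
    """Rough triage for a Cloudflare 525 (SSL handshake with the origin failed).

    Hypothetical helper: all three inputs are gathered manually.
    """
    if custom_domain_status != 525:
        return "no 525 on the custom domain"
    if fly_dev_status >= 500:
        # The app is unreachable even without Cloudflare in front of it.
        return "app or routing problem: the fly.dev hostname fails too"
    if cert_status.strip().lower() != "ready":
        # The app is up, but there is no valid cert for the custom hostname.
        return "certificate problem: the custom-domain cert is not ready"
    return "cert is ready and the app is up: look at ports/routing"


if __name__ == "__main__":
    # The situation described in this thread.
    print(triage_525(525, 200, "Awaiting configuration"))
```

With the values from this thread (525 on the custom domain, a working name.fly.dev, and ‘Awaiting configuration’ for the cert) it points at the certificate rather than at routing.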

I’ll see if I need to make an extra acme record for that

(This is different to the issue a few days ago: then I definitely had ‘ready’ for the cert status throughout, and already had acme DNS records. So perhaps this 525 is actually due to a lost cert, not a port/route error …)

Ah! You will definitely need an _acme-challenge record so we can issue a certificate behind cloudflare. You can probably just set <app>.fly.dev directly in cloudflare, though.

Ah … yes. Ok, so the 525 error before was with an app that did have an acme-challenge record already set up, and a ‘ready’ cert status. This time it is the same error code but for a different reason: there actually is no cert. There was one until a few minutes ago, but now it’s gone, which suggests the validation was rechecked and what worked before no longer does.

So the question is: what is the acme-challenge value to use? It’s something.flydns.net. What is that something value?

I tried flyctl certs check hostname but it tells me to make a CNAME record to name.fly.dev. Which I have done already. That’s the problem it seems: because it is an orange cloud CNAME, it is failing the IP check now. Because it goes to Cloudflare’s IP. Hmm. But I can’t just grey cloud it … else this’ll happen again in X months :slight_smile: So I need to get that flydns.net record …
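To make the orange-cloud problem concrete: when the record is proxied, the hostname resolves to a Cloudflare address rather than to the app's own IP, so an IP-based check sees the wrong address. Below is a minimal sketch of that comparison using only Python's standard library. The three ranges are a small sample of Cloudflare's published IPv4 ranges, hard-coded here as an assumption (the real list is longer and can change), and the function names are invented for this example.

```python
import ipaddress
import socket

# Assumption: a small, possibly outdated sample of Cloudflare's IPv4 ranges.
CLOUDFLARE_SAMPLE_RANGES = ["104.16.0.0/13", "172.64.0.0/13", "173.245.48.0/20"]


def is_behind_proxy(ip: str, ranges=CLOUDFLARE_SAMPLE_RANGES) -> bool:
    """Return True if the address falls inside one of the given proxy ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(r) for r in ranges)


def resolve_ipv4(hostname: str) -> list:
    """Resolve a hostname to its IPv4 addresses (needs network access)."""
    infos = socket.getaddrinfo(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # 203.0.113.10 is a documentation-only address, standing in for an origin IP.
    for ip in ["104.16.1.1", "203.0.113.10"]:
        print(ip, "proxied" if is_behind_proxy(ip) else "not proxied")
```

Running resolve_ipv4 on the custom hostname and passing each result to is_behind_proxy gives a quick hint as to whether the CNAME is currently orange-clouded.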

If you turn it to the gray cloud, you don’t need that extra DNS entry for us to issue certs (since Let’s Encrypt then connects to our IPs instead of Cloudflare’s).

You can get the _acme-challenge target with flyctl certs show <hostname>.

Ah, I didn’t explain that very well!

I meant if I turn it to a grey cloud temporarily (to get the cert to issue now), and then turn it back to an orange cloud (which I want to do) it’ll happen again in X months. As the validation will fail then (when it checks it at some random time, like is happening now).

Hence needing that acme-challenge dns record, which can be grey clouded.

Alas I did try flyctl certs show hostname already, and it doesn’t show that flydns.net record. It just says to make a CNAME …

It just gives a list of one option …

1: Adding an CNAME record to your DNS service which reads:

(it’s like there should be a 2, perhaps for this flydns, but it’s not there)

Ok, I can’t see a way in the CLI to get the DNS entry needed. Using the CNAME it suggests (which I already have) would mean this happens again at some random time once I turn that record back to an orange cloud.

However while I was waiting I wondered if your dashboard showed the DNS value to use, and handily, it does. Its certificates page shows the something CNAME to use with the acme-challenge to verify the request. And having done so, sure enough after a few minutes the certificate was issued, and no more 525 error.
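For reference, the resulting DNS setup can be sketched as two CNAME records: the proxied (orange cloud) record for the site itself, and a DNS-only (grey cloud) _acme-challenge record for validation. The helper and dictionary layout below are invented for illustration, and something.flydns.net is a placeholder: the real target has to be copied from the dashboard's certificates page.

```python
def dns_records(hostname: str, app_target: str, acme_target: str) -> list:
    """Describe the two CNAME records discussed in this thread.

    Hypothetical helper: it only builds a description, it creates nothing.
    """
    return [
        # Site traffic: may stay proxied (orange cloud).
        {"type": "CNAME", "name": hostname, "target": app_target, "proxied": True},
        # Certificate validation: must be DNS-only (grey cloud).
        {
            "type": "CNAME",
            "name": f"_acme-challenge.{hostname}",
            "target": acme_target,
            "proxied": False,
        },
    ]


if __name__ == "__main__":
    # All three values are placeholders, not real records.
    for rec in dns_records("app.example.com", "name.fly.dev", "something.flydns.net"):
        cloud = "orange" if rec["proxied"] else "grey"
        print(f'{rec["type"]} {rec["name"]} -> {rec["target"]} ({cloud} cloud)')
```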

So it’s fixed. The cert had expired and could not be revalidated, since the DNS record was now orange-clouded and there was no acme-challenge record for this particular app. It was a combination of timing (I didn’t know when certs get revalidated) and, coming on the heels of the prior 525, I didn’t think to check for that. Ah well, all good now.

It would be handy if the CLI showed the flydns acme-challenge target, but at least the console does.