We’re having an issue where clients are unable to access some custom domains, because of SSL issues. Looking at nginx and nft logs, traffic isn’t reaching our proxy. At this time we’re telling our visitors to visit sites via HTTP, which works.
I understand a lot is going on right now, but keep up the good work.
What are the errors they’re seeing? Can you reproduce it? (Any chance you can check the certs that the browser sees? Sounds like maybe an expired cert?)
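If it helps, here is a minimal Python sketch for checking the certificate a server actually presents, without going through a browser. It only uses the standard library `ssl` and `socket` modules. The function names (`fetch_peer_cert`, `summarize`, `days_until_expiry`) and the `example.com` placeholder are made up for illustration and are not part of anyone's tooling here.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter string."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Return the leaf certificate presented for `host` (sent as SNI).

    With the default context, an expired or mismatched certificate raises
    ssl.SSLCertVerificationError, which is itself a useful diagnostic.
    """
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def summarize(cert: dict, now: datetime) -> dict:
    """Pull out the fields that matter for an 'is it expired?' check."""
    # getpeercert() returns the issuer as a tuple of single-pair RDN tuples.
    issuer = dict(rdn[0] for rdn in cert.get("issuer", ()))
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return {
        "issuer_org": issuer.get("organizationName"),
        "dns_names": names,
        "days_left": days_until_expiry(cert["notAfter"], now),
    }


# Hypothetical usage (replace the placeholder host with the failing domain):
# cert = fetch_peer_cert("example.com")
# print(summarize(cert, datetime.now(timezone.utc)))
```

A negative `days_left` would point at an expired certificate, and `dns_names` shows whether the certificate being served is the apex one, the wildcard, or something else entirely.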
Unfortunately, we’re unable to see any issues on our end. We can use postman or a browser to show the SSL connection failing, but the packets aren’t actually reaching Nginx. Here’s an example:
Nginx access and error logs don’t show anything from my IP, nor does nft (which is wide open, but set to log ingress traffic).
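When nothing shows up in the Nginx or nft logs, one way to narrow things down from the client side is to see which stage fails first: DNS lookup, TCP connect, or the TLS handshake. The sketch below is a rough illustration using only the Python standard library; `probe` is a hypothetical helper name, and it assumes that a failure before the handshake means the traffic never got as far as the proxy.

```python
import socket
import ssl


def probe(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the first stage that fails: 'dns', 'tcp', 'tls', or 'ok'."""
    try:
        # Take the first resolved address; its first two fields are (ip, port).
        addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
    except socket.gaierror:
        return "dns"

    try:
        sock = socket.create_connection(addr[:2], timeout=timeout)
    except OSError:
        # Refused, unreachable, or timed out before any TLS bytes were sent.
        return "tcp"

    with sock:
        ctx = ssl.create_default_context()
        try:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "ok"
        except OSError:
            # ssl.SSLError and socket timeouts are both OSError subclasses.
            return "tls"


# Hypothetical usage: print(probe("example.com"))
```

A result of `"tls"` with empty backend logs would suggest the handshake is being terminated (and failing) somewhere in front of Nginx rather than in Nginx itself.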
Regarding the certificate, that didn’t even dawn on me. The app isn’t showing any errors for the certificate, and others we’ve had on Fly even longer are still working. Is this a certbot issue that we can handle ourselves somehow?
Well, I’m not totally sure about your setup just yet. It looks like the NS records on that domain use Cloudflare.
Are you using Cloudflare just for DNS records (pass-thru), or is Cloudflare acting as a proxy as well (maybe providing SSL + caching)?
Yes, only DNS. Both our certificates for cgcookie.com and *.cgcookie.com have been created in the last 6 months, and show issued by Let’s Encrypt.
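For anyone wondering why both an apex and a wildcard certificate are involved: a wildcard name such as `*.cgcookie.com` stands in for exactly one left-most label, so it covers `www.cgcookie.com` but not the bare `cgcookie.com`. The sketch below illustrates that rule in simplified form. `covers` is a hypothetical helper, and it deliberately ignores partial-label wildcards and internationalized names.

```python
def covers(pattern: str, hostname: str) -> bool:
    """True if a certificate name (optionally '*.'-prefixed) covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        return pattern == hostname
    # A leading wildcard matches exactly one left-most label, never zero or two.
    head, _, rest = hostname.partition(".")
    return bool(head) and rest == pattern[2:]


for name in ("cgcookie.com", "www.cgcookie.com", "a.b.cgcookie.com"):
    print(name, covers("*.cgcookie.com", name))
```

This is why a setup that serves both the apex and its subdomains needs either two certificates or one certificate listing both names.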
I went ahead and removed the certificates manually and recreated them. We are still experiencing SSL errors on that particular domain. To help diagnose: nothing has changed with the domain or proxy for over 6 days, and it only started showing issues in the past few hours.
@fideloper-fly Thank you again for all of your help!
Just to add a conclusion to this thread: we had both a wildcard and an apex certificate on the domain experiencing the SSL failures. While that worked for many months, our recent SSL issue was only resolved by recreating the apex certificate* and removing the wildcard completely.
EDIT: *I forgot that you guys had to do this manually, but you also mentioned plans to ship a fix that would let users handle it on their end.
I think we’re all in agreement in theorizing that it was a change at LE. Still, I think it would be good to know where the issue lies specifically, and what caused our dual-certificate setup to fail after months of success. On that note, we still have our staging domain set up this way (one apex and one wildcard), so we’ll leave it that way to try to track down the issue. If we experience the same issue when the certificate renews, we can surely chalk it up as an LE issue.