We’re having an issue where clients are unable to access some custom domains, because of SSL issues. Looking at nginx and nft logs, traffic isn’t reaching our proxy. At this time we’re telling our visitors to visit sites via HTTP, which works.
I understand a lot is going on right now, but keep up the good work.
What are the errors they’re seeing? Can you reproduce it? (Any chance you can check the certs that the browser sees? Sounds like maybe an expired cert?)
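One way to check the certificate from outside a browser is a short script like the one below. It is only a sketch using Python's standard `ssl` module. The helper names `fetch_cert` and `days_until_expiry` and the `example.com` default are made up for illustration. With the default context, an expired or mismatched certificate raises an error during the handshake, which is itself a useful data point.

```python
import socket
import ssl
import sys
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(cert: dict, now: Optional[datetime] = None) -> float:
    """Return the days remaining before the certificate's notAfter date."""
    now = now or datetime.now(timezone.utc)
    # notAfter looks like "Jan  1 00:00:00 2030 GMT"
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return (expires - now.timestamp()) / 86400


def fetch_cert(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect with SNI set to `hostname` and return the verified leaf cert."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


if __name__ == "__main__":
    # Hypothetical default; pass the failing custom domain as the first argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "example.com"
    try:
        cert = fetch_cert(host)
    except (ssl.SSLError, OSError) as exc:
        # Verification failures (expired, wrong name) and network errors land here.
        print(f"TLS handshake with {host} failed: {exc}")
    else:
        names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
        print(f"names on cert: {names}")
        print(f"days until expiry: {days_until_expiry(cert):.1f}")
```

Running it against both a working and a failing domain makes it easier to compare which certificate, if any, is actually being served.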
Unfortunately, we’re unable to see any issues on our end. We can use postman or a browser to show the SSL connection failing, but the packets aren’t actually reaching Nginx. Here’s an example:
Regarding the certificate, that hadn’t even occurred to me. We’re not seeing any certificate errors in the app, and other certificates we’ve had on Fly for even longer are still working. Is this an issue from certbot that we can handle ourselves somehow?
I went ahead and removed the certificates manually and recreated them. We are still experiencing SSL errors on that particular domain. To help diagnose: nothing has changed with the domain or proxy for over 6 days, and it only started showing issues in the past few hours.
Just to add a conclusion to this thread: we had both a wildcard and an apex certificate on the domain experiencing the SSL failures. While that setup worked for many months, our recent SSL issue was only resolved once the apex certificate was recreated* and the wildcard was removed completely.
EDIT: *I forgot that you guys had to do this manually, but you also mentioned plans to ship a fix that would let users handle it on their end.
I think we’re all in agreement in theorizing that it was a change in LE. Still, I think it would be good to know where the issue lies specifically, and what caused our dual-certificate setup to fail after months of success. On that note, we still have our staging domain set up this way (one apex and one wildcard), so we’ll leave it that way to try to track down the issue. If we experience the same issue when the certificate renews, we can surely chalk it up as an LE issue.
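For anyone comparing a similar apex-plus-wildcard setup, here is a small sketch of the standard wildcard matching rule, under which `*.example.com` covers one extra label but not the apex itself. It does not explain the failure above, whose root cause is still unconfirmed. It only helps work out which of the two certificates should cover a given hostname. The function names and `example.com` hostnames are illustrative.

```python
from typing import Iterable, List


def san_covers(san: str, hostname: str) -> bool:
    """Return True if a certificate name (exact or wildcard) covers hostname."""
    san = san.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if san.startswith("*."):
        suffix = san[1:]  # "*.example.com" -> ".example.com"
        if not hostname.endswith(suffix):
            return False
        # The wildcard stands for exactly one non-empty left-most label.
        label = hostname[: -len(suffix)]
        return bool(label) and "." not in label
    return san == hostname


def covering_sans(sans: Iterable[str], hostname: str) -> List[str]:
    """List the certificate names that cover hostname."""
    return [san for san in sans if san_covers(san, hostname)]


if __name__ == "__main__":
    names = ["example.com", "*.example.com"]
    for host in ("example.com", "www.example.com", "a.b.example.com"):
        print(host, "->", covering_sans(names, host))
```

With separate apex and wildcard certificates, the apex hostname is covered only by the apex certificate, so a problem with that one certificate would affect the bare domain even while subdomains keep working.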