We have an internal tool that we recently migrated from DigitalOcean to Fly.io. Among other things, it fetches assets from many different websites.
Recently, I’ve noticed that it can’t fetch assets from a subset of websites. Instead, the requests fail with "Connection reset by peer" or timeout errors, even though the same websites are accessible locally and from our previous DigitalOcean servers (where we never had any issues).
So I started digging:
1. Request with curl
I started by running fly ssh console and making a simple curl request. It failed with the following error:
curl -v https://www.ascentvictorypark.com/
* Trying 198.190.14.13:443...
* Connected to www.ascentvictorypark.com (198.190.14.13) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=ascentvictorypark.com
* start date: Mar 29 09:00:12 2022 GMT
* expire date: Jun 27 09:00:11 2022 GMT
* subjectAltName: host "www.ascentvictorypark.com" matched cert's "www.ascentvictorypark.com"
* issuer: C=US; O=Let's Encrypt; CN=R3
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: www.ascentvictorypark.com
> User-Agent: curl/7.79.1
> Accept: */*
>
* OpenSSL SSL_read: Connection reset by peer, errno 104
* Closing connection 0
curl: (56) OpenSSL SSL_read: Connection reset by peer, errno 104
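In the log above, the TCP connection and the TLS handshake both complete, and the reset only arrives when reading the response after the GET is sent. To make that stage explicit when testing from different machines, the same request can be reproduced with a minimal Python sketch using only the standard library. The function name `probe`, its stage labels, and the `probe/0.1` User-Agent are hypothetical choices for illustration, not part of any tool mentioned in the post.

```python
import socket
import ssl


def probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Reproduce a simple HTTPS GET and report which stage failed, if any."""
    context = ssl.create_default_context()
    try:
        # Stage 1: TCP connect (curl's "Connected to ..." line).
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return f"tcp-connect failed: {exc!r}"
    with sock:
        try:
            # Stage 2: TLS handshake with certificate and hostname verification.
            tls = context.wrap_socket(sock, server_hostname=host)
        except OSError as exc:  # ssl.SSLError is a subclass of OSError
            return f"tls-handshake failed: {exc!r}"
        with tls:
            request = (
                f"GET / HTTP/1.1\r\nHost: {host}\r\n"
                "User-Agent: probe/0.1\r\nAccept: */*\r\nConnection: close\r\n\r\n"
            )
            try:
                # Stage 3: send the request and read the first bytes of the reply.
                tls.sendall(request.encode("ascii"))
                first = tls.recv(1024)
            except ConnectionResetError as exc:
                return f"read failed (reset by peer): {exc!r}"
            except socket.timeout as exc:
                return f"read failed (timeout): {exc!r}"
            except OSError as exc:
                return f"read failed: {exc!r}"
    if not first:
        return "read failed: connection closed without a response"
    # The first line of the response is the HTTP status line.
    status_line = first.split(b"\r\n", 1)[0].decode("latin-1")
    return f"ok: {status_line}"


if __name__ == "__main__":
    print(probe("www.ascentvictorypark.com"))
```

If the behavior matches curl, running this inside the failing app should print a "read failed (reset by peer)" result, while a working host should print the status line.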
2. Accessible by other Services (DigitalOcean)
I ran the same command locally and it succeeded, and the website also loads when visited directly.
At this point I wasn’t sure what was going on and wanted to isolate the issue. My first thought was that the problem might be limited to Fly.io’s networking, so to confirm, I SSH’ed back into the app’s old DigitalOcean instance and made the same curl request, which succeeded without issues.
3. Accessible by other Apps
Next step: isolate the issue further. I have several other apps deployed to Fly.io in regions around the world. I SSH’ed into a few of them and ran the same command, and it succeeded on all of them as well.
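Comparing hosts is easier when each one checks the same list of sites and reports outcomes in the same format, so the summaries from the local machine, the DigitalOcean droplet, and each Fly.io app can be diffed directly. Below is a minimal sketch using `urllib` from the Python standard library; the names `classify`, `fetch`, and `check_sites` and the outcome labels are hypothetical, and only the ascentvictorypark.com URL comes from this post. It distinguishes resets and timeouts from HTTP error statuses, since a site that deliberately blocks a client might answer with something like a 403 instead of dropping the connection.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def classify(exc: BaseException) -> str:
    """Map a request exception to a coarse outcome label."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"http-{exc.code}"
    # urllib wraps connect/handshake errors in URLError; unwrap to the cause.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ConnectionResetError):
        return "reset"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"
    return f"error: {type(exc).__name__}"


def fetch(url: str, timeout: float = 10.0) -> int:
    """GET `url` and return the HTTP status code."""
    req = urllib.request.Request(url, headers={"User-Agent": "site-check/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def check_sites(urls: Iterable[str], get: Callable[[str], int] = fetch) -> Dict[str, str]:
    """Check every URL and record an outcome label instead of stopping on errors."""
    results: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = f"ok-{get(url)}"
        except Exception as exc:  # record every failure mode for comparison
            results[url] = classify(exc)
    return results


if __name__ == "__main__":
    for url, outcome in check_sites(["https://www.ascentvictorypark.com/"]).items():
        print(f"{outcome:12} {url}")
```

Note that a reset while reading the response is raised directly by `urlopen` rather than wrapped in `URLError`, which is why `classify` handles both forms.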
4. Changing IPs & Regions
This led me to believe the issue was limited to this one app instance. Maybe its server or IP had somehow landed on a blocklist or firewall rule at every one of these websites at the same time. It could happen.
So changing the IPs and regions would solve the problem, right? Wrong.
I released the old IPs and allocated new ones (both IPv4 and IPv6), changed the region several times, and restarted the app, but the curl command always returned the same error.
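One thing worth checking here: the addresses allocated and released with flyctl are the app’s public addresses for incoming traffic, and outbound requests may leave the machine from a different address. A quick way to see the address remote websites actually observe is to ask an IP echo service from inside the machine. This sketch assumes the third-party service https://api.ipify.org, which returns the caller’s IP as plain text; any similar service would do, and the helper names `parse_ip` and `egress_ip` are hypothetical.

```python
import ipaddress
import urllib.request

# Assumption: a third-party echo service that returns the caller's IP as plain text.
ECHO_URL = "https://api.ipify.org"


def parse_ip(body: bytes) -> str:
    """Validate and normalize an IP address returned as plain text."""
    # ip_address raises ValueError if the body is not a valid IPv4/IPv6 address.
    return str(ipaddress.ip_address(body.decode("ascii").strip()))


def egress_ip(url: str = ECHO_URL, timeout: float = 10.0) -> str:
    """Return the source address that the echo service sees for this machine."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_ip(resp.read())


if __name__ == "__main__":
    print(egress_ip())
```

Running this (assuming the image includes Python) from the failing app and from one of the working apps would show whether their outbound addresses differ, independently of which IPs are allocated to each app.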
5. Deployed a new App
I then launched a brand-new app on Fly.io in a completely different region (but with the same code and Dockerfile) and retried the curl command. It failed again with the same error.
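Since the fresh app was built from the same code and Dockerfile while the other apps succeed, one way to test whether the image matters is to collect the TLS-related details of a failing image and a working image and compare them. The sketch below is a hypothetical helper (`tls_stack_report`) built on Python’s `ssl` and `platform` modules. Python reports the OpenSSL it was built against, which is usually, but not necessarily, the library curl uses, so `curl --version` is worth comparing as well.

```python
import platform
import ssl


def tls_stack_report() -> dict:
    """Collect TLS-related details of this image for side-by-side comparison."""
    context = ssl.create_default_context()
    return {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "openssl": ssl.OPENSSL_VERSION,
        "has_tlsv1_3": ssl.HAS_TLSv1_3,
        # Protocol bounds that Python's default client context will negotiate.
        "default_min_tls": context.minimum_version.name,
        "default_max_tls": context.maximum_version.name,
    }


if __name__ == "__main__":
    for key, value in tls_stack_report().items():
        print(f"{key}: {value}")
```

If the failing and working images print the same values, the image is a less likely cause and the difference is more likely in the network path.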
So something very weird is going on here. Could this be an issue with my Dockerfile? That seems unlikely, though.
Why do requests to some websites succeed from some apps but not from others?