Response body timing out in IAD & EWR (all good in 13 other regions)

I have been tracking down some gnarly response body timeouts in IAD & EWR. The app is deployed in 15 regions, and IAD + EWR are the only regions where this issue happens consistently. I dug deep into the app that sits behind the Fly.io Proxy, and everything looks correct.

To reproduce the issue (it may take a few requests to hit it):

curl --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: ewr' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'

* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
*   Trying 137.66.16.250:443...
* Connected to changelog.com (137.66.16.250) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://changelog.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nightly.changelog.com]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [flyio-debug: doit]
* [HTTP/2] [1] [fly-force-region: ewr]
> GET / HTTP/2
> Host: nightly.changelog.com
> User-Agent: curl/8.7.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: ewr
>
* Request completely sent off
< HTTP/2 200
< date: Fri, 02 Jan 2026 19:10:01 GMT
< fly-request-id: 01KE01ZTQ5D40PRR7CRECPXQ77-lhr
< last-modified: Fri, 02 Jan 2026 03:59:21 GMT
< server: Fly/fbde0e6c3 (2025-12-17)
< content-type: text/html
< vary: Accept-Encoding
< x-varnish: 295985 624063
< age: 132
< via: 2 fly.io, 2 fly.io, 1.1 080eee0c702098 (Varnish/7.7), 2 fly.io, 2 fly.io
< etag: W/"69574299-caa3c"
< accept-ranges: bytes
< x-request-id: 01KE01ZTQ5D40PRR7CRECPXQ77-lhr
< access-control-allow-origin: *
< cache-status: region=ewr; origin=nightly(localhost:5030),changelog-nightly-2023-10-10.fly.dev; ttl=-72.593; grace=86400.000; keep=604800.000; storage=storage.memory; hit; stale; hits=1
< content-length: 830012
< flyio-debug: {"n":"edge-cf-lon1-a6eb","nr":"lhr","ra":"195.144.8.28","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":8,"bn":"worker-cf-ewr1-7a70","mhn":"edge-cf-ewr1-b989","mrtt":69}
<
* Operation timed out after 20006 milliseconds with 0 out of 830012 bytes received
* Connection #0 to host changelog.com left intact
curl: (28) Operation timed out after 20006 milliseconds with 0 out of 830012 bytes received

Notice that the response headers are received, but the body remains stuck at 0 out of 830012 bytes.

I am able to reproduce the issue if I run the command on the LHR instance:

root@2873321a446d58:/# curl --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: ewr' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'
* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 137.66.16.250:443...
* ALPN: curl offers h2,http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [1568 bytes data]
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2039 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
* Connected to changelog.com (137.66.16.250) port 443
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://changelog.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nightly.changelog.com]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.14.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [flyio-debug: doit]
* [HTTP/2] [1] [fly-force-region: ewr]
} [5 bytes data]
> GET / HTTP/2
> Host: nightly.changelog.com
> User-Agent: curl/8.14.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: ewr
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* Request completely sent off
{ [5 bytes data]
< HTTP/2 200
< date: Fri, 02 Jan 2026 19:29:11 GMT
< fly-request-id: 01KE02YY42T1SCBVQAGZEC2PXG-lhr
< last-modified: Fri, 02 Jan 2026 03:59:21 GMT
< server: Fly/fbde0e6c3 (2025-12-17)
< content-type: text/html
< vary: Accept-Encoding
< x-varnish: 2359342 1966767
< age: 2
< via: 2 fly.io, 2 fly.io, 1.1 080eee0c702098 (Varnish/7.7), 2 fly.io, 2 fly.io
< etag: W/"69574299-caa3c"
< accept-ranges: bytes
< x-request-id: 01KE02YY42T1SCBVQAGZEC2PXG-lhr
< access-control-allow-origin: *
< cache-status: region=ewr; origin=nightly(localhost:5030),changelog-nightly-2023-10-10.fly.dev; ttl=57.830; grace=86400.000; keep=604800.000; storage=storage.memory; hit; hits=1
< content-length: 830012
< flyio-debug: {"n":"worker-cf-lon1-6b6a","nr":"lhr","ra":"172.19.29.50","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":"worker-cf-ewr1-7a70","mhn":"edge-cf-ewr1-7292","mrtt":69}
<
} [5 bytes data]
  0  810k    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0* Operation timed out after 20002 milliseconds with 0 out of 830012 bytes received
  0  810k    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
* Connection #0 to host changelog.com left intact
curl: (28) Operation timed out after 20002 milliseconds with 0 out of 830012 bytes received

If I run the command directly on the EWR instance I am NOT able to reproduce:

curl --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: ewr' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'
* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 137.66.16.250:443...
* ALPN: curl offers h2,http/1.1
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [1568 bytes data]
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.3 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2039 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / x25519 / id-ecPublicKey
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
*   Certificate level 0: Public key type EC/prime256v1 (256/128 Bits/secBits), signed using ecdsa-with-SHA384
*   Certificate level 1: Public key type EC/secp384r1 (384/192 Bits/secBits), signed using sha256WithRSAEncryption
*   Certificate level 2: Public key type RSA (4096/152 Bits/secBits), signed using sha256WithRSAEncryption
* Connected to changelog.com (137.66.16.250) port 443
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://changelog.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nightly.changelog.com]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.14.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [flyio-debug: doit]
* [HTTP/2] [1] [fly-force-region: ewr]
} [5 bytes data]
> GET / HTTP/2
> Host: nightly.changelog.com
> User-Agent: curl/8.14.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: ewr
>
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* Request completely sent off
{ [5 bytes data]
< HTTP/2 200
< date: Fri, 02 Jan 2026 19:26:56 GMT
< fly-request-id: 01KE02V1QRPBDDWNDZAT5NZ7J3-ewr
< last-modified: Fri, 02 Jan 2026 03:59:21 GMT
< server: Fly/fbde0e6c3 (2025-12-17)
< content-type: text/html
< vary: Accept-Encoding
< x-varnish: 624196 1966746
< age: 9
< via: 2 fly.io, 2 fly.io, 1.1 080eee0c702098 (Varnish/7.7), 2 fly.io
< etag: W/"69574299-caa3c"
< accept-ranges: bytes
< x-request-id: 01KE02V1QRPBDDWNDZAT5NZ7J3-ewr
< access-control-allow-origin: *
< cache-status: region=ewr; origin=nightly(localhost:5030),changelog-nightly-2023-10-10.fly.dev; ttl=50.966; grace=86400.000; keep=604800.000; storage=storage.memory; hit; hits=12
< content-length: 830012
< flyio-debug: {"n":"worker-cf-ewr1-7a70","nr":"ewr","ra":"172.19.2.162","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":null,"mhn":null,"mrtt":null}
<
{ [441 bytes data]
100  810k  100  810k    0     0  29.6M      0 --:--:-- --:--:-- --:--:-- 30.4M
* Connection #0 to host changelog.com left intact
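
One way to see what differs between the failing request (made from the LHR instance) and the working one (made from the EWR instance) is to diff the two flyio-debug headers shown above. The following is a minimal Python sketch: the header values are copied verbatim from the two outputs, and the helper name diff_debug is only illustrative. The meaning of the individual fields is not documented in this thread, so the sketch only reports which ones differ.

```python
import json

# flyio-debug header from the failing request made on the LHR instance
FAILING = '{"n":"worker-cf-lon1-6b6a","nr":"lhr","ra":"172.19.29.50","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":"worker-cf-ewr1-7a70","mhn":"edge-cf-ewr1-7292","mrtt":69}'
# flyio-debug header from the working request made on the EWR instance
WORKING = '{"n":"worker-cf-ewr1-7a70","nr":"ewr","ra":"172.19.2.162","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":null,"mhn":null,"mrtt":null}'


def diff_debug(a: str, b: str) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    left, right = json.loads(a), json.loads(b)
    return {
        key: (left.get(key), right.get(key))
        for key in sorted(left.keys() | right.keys())
        if left.get(key) != right.get(key)
    }


for key, (failing, working) in diff_debug(FAILING, WORKING).items():
    print(f"{key}: failing={failing!r} working={working!r}")
```

For these two headers the differing fields are bn, mhn, mrtt, n, nr and ra; the first three are null in the working request.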

I have similar issues with the IAD instance, but this one is failing even before returning any headers:

curl --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: iad' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'
* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 137.66.16.250:443...
* Connected to changelog.com (137.66.16.250) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [318 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [19 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [2039 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [80 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://changelog.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nightly.changelog.com]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [flyio-debug: doit]
* [HTTP/2] [1] [fly-force-region: iad]
> GET / HTTP/2
> Host: nightly.changelog.com
> User-Agent: curl/8.7.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: iad
>
* Request completely sent off
  0     0    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0* Operation timed out after 20006 milliseconds with 0 bytes received
  0     0    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
* Connection #0 to host changelog.com left intact
curl: (28) Operation timed out after 20006 milliseconds with 0 bytes received

I can see that 2700 connections are currently open, so maybe that has something to do with it.

I am not sure why the Fly.io Proxy is not closing these connections (idle timeout is configured to 60 & concurrency is set to requests, not connections). FTR, here is the fly.toml.

Also, these checks are running every hour, and currently failing due to timeouts in these two regions - IAD & EWR. As a stop-gap, I am going to restart the instances, but the fix is temporary (the issue comes back after a few days at most, usually a few hours).

Is there anything else that I can provide to help debug this?

There is another significant piece of info that I missed: forcing the request to HTTP/1.1 always works. In other words, HTTP/2 triggers this issue:

curl --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: ewr' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'

* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 137.66.16.250:443...
* Connected to changelog.com (137.66.16.250) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [318 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [19 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [2039 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [79 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://changelog.com/
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: nightly.changelog.com]
* [HTTP/2] [1] [:path: /]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [flyio-debug: doit]
* [HTTP/2] [1] [fly-force-region: ewr]
> GET / HTTP/2
> Host: nightly.changelog.com
> User-Agent: curl/8.7.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: ewr
>
* Request completely sent off
< HTTP/2 200
< date: Fri, 02 Jan 2026 20:50:39 GMT
< fly-request-id: 01KE07M4EGWYHYH81HHJHEK4H7-lhr
< last-modified: Fri, 02 Jan 2026 03:59:21 GMT
< server: Fly/fbde0e6c3 (2025-12-17)
< content-type: text/html
< vary: Accept-Encoding
< x-varnish: 557308 557306
< age: 2
< via: 2 fly.io, 2 fly.io, 1.1 080eee0c702098 (Varnish/7.7), 2 fly.io, 2 fly.io
< etag: W/"69574299-caa3c"
< accept-ranges: bytes
< x-request-id: 01KE07M4EGWYHYH81HHJHEK4H7-lhr
< access-control-allow-origin: *
< cache-status: region=ewr; origin=nightly(localhost:5030),changelog-nightly-2023-10-10.fly.dev; ttl=57.198; grace=86400.000; keep=604800.000; storage=storage.memory; hit; hits=1
< content-length: 830012
< flyio-debug: {"n":"edge-cf-lon1-8e62","nr":"lhr","ra":"31.222.222.66","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":"worker-cf-ewr1-7a70","mhn":"edge-cf-ewr1-b989","mrtt":69}
<
  0  810k    0     0    0     0      0      0 --:--:--  0:00:19 --:--:--     0* Operation timed out after 20006 milliseconds with 0 out of 830012 bytes received
  0  810k    0     0    0     0      0      0 --:--:--  0:00:20 --:--:--     0
* Connection #0 to host changelog.com left intact
curl: (28) Operation timed out after 20006 milliseconds with 0 out of 830012 bytes received

I cannot reproduce the failure when forcing HTTP/1.1:

curl --http1.1 --verbose --output /dev/null --header 'Host: nightly.changelog.com' --header 'Flyio-Debug: doit' --header 'Fly-Force-Region: ewr' --connect-timeout 10 --max-time 20 --resolve changelog.com:443:137.66.16.250 'https://changelog.com/'
* Added changelog.com:443:137.66.16.250 to DNS cache
* Hostname changelog.com was found in DNS cache
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 137.66.16.250:443...
* Connected to changelog.com (137.66.16.250) port 443
* ALPN: curl offers http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
} [315 bytes data]
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* (304) (IN), TLS handshake, Unknown (8):
{ [25 bytes data]
* (304) (IN), TLS handshake, Certificate (11):
{ [2039 bytes data]
* (304) (IN), TLS handshake, CERT verify (15):
{ [78 bytes data]
* (304) (IN), TLS handshake, Finished (20):
{ [36 bytes data]
* (304) (OUT), TLS handshake, Finished (20):
} [36 bytes data]
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted http/1.1
* Server certificate:
*  subject: CN=changelog.com
*  start date: Dec 31 16:52:47 2025 GMT
*  expire date: Mar 31 16:52:46 2026 GMT
*  subjectAltName: host "changelog.com" matched cert's "changelog.com"
*  issuer: C=US; O=Let's Encrypt; CN=E8
*  SSL certificate verify ok.
* using HTTP/1.x
> GET / HTTP/1.1
> Host: nightly.changelog.com
> User-Agent: curl/8.7.1
> Accept: */*
> Flyio-Debug: doit
> Fly-Force-Region: ewr
>
* Request completely sent off
< HTTP/1.1 200 OK
< date: Fri, 02 Jan 2026 20:50:39 GMT
< fly-request-id: 01KE07PM559RXKGZ2ERA98XVYG-lhr
< last-modified: Fri, 02 Jan 2026 03:59:21 GMT
< server: Fly/fbde0e6c3 (2025-12-17)
< content-type: text/html
< vary: Accept-Encoding
< x-varnish: 360493 557306
< age: 84
< via: 2 fly.io, 2 fly.io, 1.1 080eee0c702098 (Varnish/7.7), 1.1 fly.io, 1.1 fly.io
< etag: W/"69574299-caa3c"
< accept-ranges: bytes
< x-request-id: 01KE07PM559RXKGZ2ERA98XVYG-lhr
< access-control-allow-origin: *
< cache-status: region=ewr; origin=nightly(localhost:5030),changelog-nightly-2023-10-10.fly.dev; ttl=-24.516; grace=86400.000; keep=604800.000; storage=storage.memory; hit; stale; hits=2
< content-length: 830012
< connection: keep-alive
< flyio-debug: {"n":"edge-cf-lon1-a6eb","nr":"lhr","ra":"31.222.222.66","rf":"Verbatim","sr":"ewr","sdc":"ewr1","sid":"080eee0c702098","st":0,"nrtt":0,"bn":null,"mhn":"worker-cf-ewr1-7a70","mrtt":69}
<
{ [1450 bytes data]
100  810k  100  810k    0     0  1188k      0 --:--:-- --:--:-- --:--:-- 1186k
* Connection #0 to host changelog.com left intact
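
Because the failure is intermittent, a single run of each command is weak evidence. A minimal Python sketch for repeating the same request over HTTP/2 and over HTTP/1.1 and tallying curl's exit codes could look like the following. It assumes curl is on the PATH; the function names (curl_command, run_curl, tally) are illustrative and not part of any tool mentioned here. Exit code 28 is the timeout code seen in the output above.

```python
import subprocess
from collections import Counter
from typing import Callable

CURL_TIMEOUT = 28  # "curl: (28) Operation timed out", as in the output above


def curl_command(region: str, http1: bool = False) -> list[str]:
    """Build the same curl invocation as above for a given Fly-Force-Region."""
    cmd = [
        "curl", "--silent", "--output", "/dev/null",
        "--header", "Host: nightly.changelog.com",
        "--header", "Flyio-Debug: doit",
        "--header", f"Fly-Force-Region: {region}",
        "--connect-timeout", "10", "--max-time", "20",
        "--resolve", "changelog.com:443:137.66.16.250",
        "https://changelog.com/",
    ]
    if http1:
        cmd.insert(1, "--http1.1")  # otherwise curl negotiates h2 via ALPN
    return cmd


def run_curl(cmd: list[str]) -> int:
    """Run curl once and return its exit code (0 means success)."""
    return subprocess.run(cmd, check=False).returncode


def tally(cmd: list[str], attempts: int,
          run: Callable[[list[str]], int] = run_curl) -> Counter:
    """Run the command `attempts` times and count each exit code."""
    return Counter(run(cmd) for _ in range(attempts))


# Example usage (makes real requests, so it is left commented out):
# for label, http1 in (("HTTP/2", False), ("HTTP/1.1", True)):
#     codes = tally(curl_command("ewr", http1=http1), attempts=10)
#     print(label, "ok:", codes[0], "timeouts:", codes[CURL_TIMEOUT])
```

The `run` parameter exists so the tallying logic can be exercised with a stub instead of real network calls.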

Hm… I don’t know whether this is the source of all your problems, but the configuration of the idle timeout is actually malformed:

[http_service.http_options]
  idle_timeout = 60

[[services]]
  internal_port = 9000

# ...

[[services.ports]]
  handlers = ["tls", "http"]
  port = 443

There is no [http_service] block above the [http_service.http_options] one, so I believe this is getting interpreted as an HTTP service declaration that’s missing almost all of its required keys.

(TOML syntax is rather strange about this kind of thing.)

Moreover, the Fly Proxy is generally unhappy when you have both [http_service] and [[services]] simultaneously. It’s really common to see strange behavior in that situation 🐉.

Hope this helps a little!

That was a good catch!

I made the change in thechangelog/pipely Pull Request #49 (Right-size cdn-2025-12-06 + cleanup, by gerhard), captured all the context (especially the machine configs before & after), and rolled it out.

Even before the manifest fix got applied, the issue fixed itself:

All instances have been healthy for the last 6 hours, since Jan 3, 5:33 AM GMT (see the "Check all instances" workflow runs in thechangelog/pipely on GitHub).

Will keep an eye on this & ensure that everything remains healthy for the next 48 hours.

Thank you all 💪

Apart from a few slower regions (NRT → IAD & GRU → IAD), everything has been working well since.

That gives us an 82% success ratio across all regions, which is not great but is a significant improvement over what we had before (0%).

Will check in again in a week’s time. FWIW: Kaizen 22 - Let it Crash - January, 2026