Unable to push to Fly registry

We use docker push to push directly to the Fly registry from GitHub Actions. We’ve seen the odd TLS handshake timeout error over the last couple of weeks, with something like:

Run docker push registry.fly.io/prereview:7b3789b2092fd241c132b3a306265ea1785da482
The push refers to repository [registry.fly.io/prereview]
Get "https://registry.fly.io/v2/": net/http: TLS handshake timeout
Error: Process completed with exit code 1.

For the last few hours, however, every deploy has seen the problem. Rerunning the step has usually got them working, but our latest deploy has now failed 3 times.

Are there any known issues with the Registry? I can’t see anything on status pages at the moment.
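Since rerunning the step usually works, one stopgap is to retry the push inside the step itself. The following is only a sketch: the `retry` helper and its arguments are made up for illustration and are not part of Docker or flyctl. It hides the symptom rather than fixing the underlying timeout.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, doubling the delay after each failure. Returns the exit status
# of the last attempt.
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1 status=0
  while true; do
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts (exit $status)" >&2
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}
```

In a workflow step this could be called as `retry 4 5 docker push "registry.fly.io/prereview:$GITHUB_SHA"`, assuming the image tag is the commit SHA as in the log above.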

Just tried again, and it’s still failing.

Tried a fresh commit, it got some of the way before failing:

Run docker push registry.fly.io/prereview:157fb2a000f60e83322d0926516879d162ce1c62
The push refers to repository [registry.fly.io/prereview]
06628f6b63b4: Preparing
5e32844f06a1: Preparing
7c88f77ab0aa: Preparing
3d4459d3294d: Preparing
1709f4dd23fa: Preparing
92f7edb51836: Preparing
af0fd4ac5053: Preparing
aedc3bda2944: Preparing
af0fd4ac5053: Waiting
aedc3bda2944: Waiting
92f7edb51836: Waiting
1709f4dd23fa: Pushed
06628f6b63b4: Pushed
7c88f77ab0aa: Pushed
3d4459d3294d: Pushed
aedc3bda2944: Layer already exists
af0fd4ac5053: Retrying in 5 seconds
af0fd4ac5053: Retrying in 4 seconds
92f7edb51836: Retrying in 5 seconds
5e32844f06a1: Retrying in 5 seconds
af0fd4ac5053: Retrying in 3 seconds
92f7edb51836: Retrying in 4 seconds
5e32844f06a1: Retrying in 4 seconds
af0fd4ac5053: Retrying in 2 seconds
92f7edb51836: Retrying in 3 seconds
5e32844f06a1: Retrying in 3 seconds
af0fd4ac5053: Retrying in 1 second
92f7edb51836: Retrying in 2 seconds
5e32844f06a1: Retrying in 2 seconds
92f7edb51836: Retrying in 1 second
5e32844f06a1: Retrying in 1 second
92f7edb51836: Pushed
5e32844f06a1: Pushed
af0fd4ac5053: Retrying in 10 seconds
af0fd4ac5053: Retrying in 9 seconds
af0fd4ac5053: Retrying in 8 seconds
af0fd4ac5053: Retrying in 7 seconds
af0fd4ac5053: Retrying in 6 seconds
af0fd4ac5053: Retrying in 5 seconds
af0fd4ac5053: Retrying in 4 seconds
af0fd4ac5053: Retrying in 3 seconds
af0fd4ac5053: Retrying in 2 seconds
af0fd4ac5053: Retrying in 1 second
af0fd4ac5053: Pushed
Head "https://registry.fly.io/v2/prereview/blobs/sha256:21d047f6aca6dba52dc1de79e1ae557dc20e418806dfb3a2d0fb8c7db5852472": net/http: TLS handshake timeout
Error: Process completed with exit code 1.

Hey @thewilkybarkid

Could you please run the following command and post the output here?

curl -I -H 'flyio-debug: doit' -w '%header{flyio-debug}' -s -o /dev/null https://registry.fly.io

Thanks @pavel, build underway.

Edit: Now running, but is hanging without any output. (The command works locally.)

Edit: Finished, but failed:

Run curl -I -H 'flyio-debug: doit' -w '%header{flyio-debug}' -s -o /dev/null https://registry.fly.io
  
%header{flyio-debug}
Error: Process completed with exit code 28.
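A note on the output above: exit code 28 is curl's timeout error, so no response headers arrived at all. Separately, the literal `%header{flyio-debug}` suggests the runner's curl does not understand that write-out variable; as far as I know `%header{...}` was only added in curl 7.84.0. On an older curl, a small filter over the dumped headers does the same job. This is a sketch, and `header_value` is a made-up helper name:

```shell
# header_value NAME
# Hypothetical helper: reads raw HTTP response headers on stdin and prints
# the value of the first header whose name matches NAME (case-insensitive).
# Prints nothing if the header is absent.
header_value() {
  awk -v name="$1" '
    BEGIN { name = tolower(name) }
    {
      sub(/\r$/, "")                      # strip the CR from CRLF line endings
      i = index($0, ":")
      if (i > 0 && tolower(substr($0, 1, i - 1)) == name) {
        value = substr($0, i + 1)
        sub(/^[ \t]+/, "", value)         # drop whitespace after the colon
        print value
        exit
      }
    }'
}
```

Usage would look like `curl -sS -I --max-time 30 -H 'flyio-debug: doit' https://registry.fly.io | header_value flyio-debug`, where `--max-time 30` is an arbitrary cap so the step does not hang.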

This is happening to us right now as well, from a GitHub action. Unable to push the image to the registry.

Hmm, interesting. Can you please do another attempt like this?

One step to hit debug.fly.dev with verbose output:

curl -H 'flyio-debug: doit' -v https://debug.fly.dev

And another one is to hit registry, but also with verbose output:

curl -H 'flyio-debug: doit' -v https://registry.fly.io

The debug.fly.dev request is fine:

Run curl -H 'flyio-debug: doit' -v https://debug.fly.dev
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 77.83.140.164:443...
* Connected to debug.fly.dev (77.83.140.164) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Server hello (2):
{ [122 bytes data]
* TLSv1.2 (IN), TLS header, Finished (20):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
{ [19 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Certificate (11):
{ [2379 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
{ [79 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Finished (20):
{ [52 bytes data]
* TLSv1.2 (OUT), TLS header, Finished (20):
} [5 bytes data]
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=debug.fly.dev
*  start date: May 19 11:19:13 2024 GMT
*  expire date: Aug 17 11:19:12 2024 GMT
*  subjectAltName: host "debug.fly.dev" matched cert's "debug.fly.dev"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x55caeb963e20)
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
> GET / HTTP/2
> Host: debug.fly.dev
> user-agent: curl/7.81.0
> accept: */*
> flyio-debug: doit
> 
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
{ [81 bytes data]
* old SSL session ID is stale, removing
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* Connection state changed (MAX_CONCURRENT_STREAMS == 32)!
* TLSv1.2 (OUT), TLS header, Supplemental data (23):
} [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
* TLSv1.2 (IN), TLS header, Supplemental data (23):
{ [5 bytes data]
< HTTP/2 200 
< fly-region: iad
< remote-addr: 172.16.129.202:54062
< date: Fri, 31 May 2024 09:22:32 GMT
< content-length: 553
< content-type: text/plain; charset=utf-8
< server: Fly/63a16321 (2024-05-30)
< via: 2 fly.io
< fly-request-id: 01HZ6ZAFHK0GFNRH9S1FG3Z4XF-iad
< flyio-debug: {"n":"edge-cf-iad2-bf81","nr":"iad","ra":"74.249.14.241","rf":"Verbatim","sr":"iad","sdc":"iad2","sid":"6e82957cee5287","st":0,"nrtt":0,"bn":"worker-cf-iad2-da22"}
< 
{ [553 bytes data]
100   553  100   553    0     0   7672      0 --:--:-- --:--:-- --:--:--  7788
* Connection #0 to host debug.fly.dev left intact
=== Headers ===
Host: debug.fly.dev
Accept: */*
X-Request-Start: t=1717147352627248
X-Forwarded-Ssl: on
Fly-Forwarded-Port: 443
User-Agent: curl/7.81.0
X-Forwarded-For: 74.249.14.241, 77.83.140.164
Fly-Forwarded-Ssl: on
Fly-Client-Ip: 74.249.14.241
Fly-Forwarded-Proto: https
X-Forwarded-Proto: https
X-Forwarded-Port: 443
Fly-Region: iad
Fly-Request-Id: 01HZ6ZAFHK0GFNRH9S1FG3Z4XF-iad
Via: 2 fly.io
Fly-Traceparent: 00-34b756527bf574b11266d94aef025f9a-718671e5e479695c-00
Fly-Tracestate: 
2024-05-31 09:22:32.632686246 +0000 UTC m=+11183247.914793281

The registry.fly.io request is hanging:

Run curl -H 'flyio-debug: doit' -v https://registry.fly.io
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 77.83.143.221:443...
* Connected to registry.fly.io (77.83.143.221) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:03 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:04 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:05 --:--:--     0
[...]
  0     0    0     0    0     0      0      0 --:--:--  0:04:53 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:54 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:55 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:56 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:57 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:58 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:04:59 --:--:--     0* SSL connection timeout
  0     0    0     0    0     0      0      0 --:--:--  0:05:00 --:--:--     0
* Closing connection 0
curl: (28) SSL connection timeout
Error: Process completed with exit code 28.

I just tried doing it from a GitHub Actions worker via SSH (using the "Debugging with ssh" action from the GitHub Marketplace) and it worked fine:

runner@fv-az1249-52:~/work/actions-ssh/actions-ssh$ curl -I -H "flyio-debug: doit" https://registry.fly.io
HTTP/2 307
content-type: text/html; charset=utf-8
location: https://fly.io
date: Fri, 31 May 2024 10:41:11 GMT
server: Fly/63a16321 (2024-05-30)
via: 2 fly.io
fly-request-id: 01HZ73TG21K6F6YQCW4T04FRXH-iad
flyio-debug: {"n":"edge-cf-iad2-69b2","nr":"iad","ra":"20.55.118.209","rf":"Verbatim","sr":"iad","sdc":"iad2","sid":"1857445b109068","st":0,"nrtt":0,"bn":"worker-cf-iad2-74c4"}

So maybe there are some networking/routing issues between 74.249.14.241 (IP that your worker got) and 77.83.143.221 (registry), but not between 20.55.118.209 (IP that my worker got) and 77.83.143.221. I’ll see if I can debug it further.
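To compare runners more quickly, a probe can report whether a failure happened before or after the TCP connect, which is the distinction the verbose logs above show ("Connected to registry.fly.io" followed by a hang after the Client hello). This is a sketch under assumptions: `probe` and `classify_probe` are made-up names, and it relies on my understanding that curl's `%{time_connect}` and `%{time_appconnect}` stay at zero when the corresponding phase never completes. The 10 and 20 second limits are arbitrary; without them curl appears to wait about five minutes, as in the log above.

```shell
# classify_probe EXIT_CODE TIME_CONNECT TIME_APPCONNECT
# Hypothetical helper: maps curl's exit code and timing values to a label.
classify_probe() {
  local code=$1 tcp=${2:-0} tls=${3:-0}
  if [ "$code" -eq 0 ]; then
    echo "ok"
    return
  fi
  # awk handles the fractional seconds that curl reports
  if awk -v t="$tcp" 'BEGIN { exit !(t > 0) }'; then
    if awk -v t="$tls" 'BEGIN { exit !(t > 0) }'; then
      echo "failed-after-tls"          # handshake finished, request failed later
    else
      echo "tls-handshake-failed"      # TCP connected, TLS never completed
    fi
  else
    echo "tcp-connect-failed"          # never got a TCP connection
  fi
}

# probe URL
# Runs curl with short timeouts and classifies the result.
probe() {
  local url=$1 timings code
  timings=$(curl -sS -o /dev/null --connect-timeout 10 --max-time 20 \
    -w '%{time_connect} %{time_appconnect}' "$url" 2>/dev/null)
  code=$?
  # $timings is intentionally unquoted so it splits into two arguments
  classify_probe "$code" $timings
}
```

Running `probe https://registry.fly.io` and `probe https://debug.fly.dev` in the same job would then give one label per host to set against the runner's IP.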

@pavel In case it’s useful, I’ve triggered another deploy which failed (from 20.109.38.196).

Here’s the debug.fly.dev response:

< HTTP/2 200 
< fly-region: iad
< remote-addr: 172.16.128.2:49408
< date: Mon, 03 Jun 2024 08:28:26 GMT
< content-length: 553
< content-type: text/plain; charset=utf-8
< server: Fly/63a16321 (2024-05-30)
< via: 2 fly.io
< fly-request-id: 01HZEKDJ920X1RMQN7K1GDEWYC-iad
< flyio-debug: {"n":"edge-cf-iad2-69b2","nr":"iad","ra":"20.109.38.196","rf":"Verbatim","sr":"iad","sdc":"iad2","sid":"73d8d7eeeae891","st":0,"nrtt":0,"bn":"gpu-cf-iad2-69ef"}
< 
{ [553 bytes data]

100   553  100   553    0     0   4460      0 --:--:-- --:--:-- --:--:--  4495
* Connection #0 to host debug.fly.dev left intact
=== Headers ===
Host: debug.fly.dev
Accept: */*
Fly-Forwarded-Ssl: on
X-Forwarded-Ssl: on
Fly-Region: iad
Via: 2 fly.io
Fly-Traceparent: 00-e3368abd216dc1318326b3f2bcf81b72-354dde82fe8391f2-00
Fly-Tracestate: 
X-Forwarded-Proto: https
Fly-Request-Id: 01HZEKDJ920X1RMQN7K1GDEWYC-iad
User-Agent: curl/7.81.0
X-Request-Start: t=1717403306274319
Fly-Client-Ip: 20.109.38.196
X-Forwarded-For: 20.109.38.196, 77.83.140.164
Fly-Forwarded-Proto: https
Fly-Forwarded-Port: 443
X-Forwarded-Port: 443

Hey @thewilkybarkid

I was able to reproduce this by hitting a particular URL multiple times in a row (it doesn’t matter which one). So it looks like a firewall on the GitHub/Azure side of things starts blocking connections to this URL for a short period of time.

We will see if we can do something about this.

Hi folks, we really appreciate the report here. This one was tricky! We ultimately tracked this down to GitHub Actions re-using source ports in their TCP connection to Fly.io, which triggered SYN cookie mitigation on our end. We’re currently adjusting our connection handling to account for this quite unexpected behavior. Initial tests show that this is working. Please allow us a few more minutes to confirm and deploy the changes.

Just to follow up. The changes we made to better handle GH Actions’ NAT implementation have been deployed. Please let us know if you run into any more snags!

Thanks @jssjr and @pavel for your help and patience; that’s great news. Sounds like one of those issues that are pretty frustrating to uncover, but satisfying when you do. :slightly_smiling_face:

I’ve just triggered a deploy, and it failed to set up flyctl:

Run superfly/flyctl-actions/setup-flyctl@1.5
Error: Request timeout: /app/flyctl_releases/linux/amd64/latest

But a rerun saw it succeed and deploy successfully.

I’ll treat that as unrelated network weirdness. :smiley:

Hmm, may have spoken too soon. After a few successful builds, one has just failed on:

Run flyctl auth docker
Error: failed authenticating with registry.fly.io: Error response from daemon: Get "https://registry.fly.io/v2/": net/http: TLS handshake timeout

Error: Process completed with exit code 1.

Edit: I retried the job, and it worked.

We’ve just had an alert from Grafana Cloud that it’s had the same problem when querying our Prometheus:

[sse.dataQueryError] failed to execute query [A]: Post "https://api.fly.io/prometheus/prereview/api/v1/query": net/http: TLS handshake timeout
