I’m experiencing something similar in the GRU region as well.
Yesterday one app would not show logs on the monitoring page, and fly logs would get stuck. Deploys would sometimes work (even without any logs) and serve traffic. This was a test app, and after deleting it and creating another one with the same name, the problem seems to be resolved.
Today I’ve seen other issues with other apps related to serving content, again in the GRU region.
One app stopped serving content or responded very slowly, with no new deploys or changes. Timings (the times below are in America/Sao_Paulo):
❯ curl --trace-time -v https://APP_DOMAIN
12:38:50.458725 * Trying APP_IP:443...
12:38:50.466943 * Connected to APP_DOMAIN (APP_IP) port 443 (#0)
12:38:50.469024 * ALPN, offering h2
12:38:50.469044 * ALPN, offering http/1.1
12:38:50.476464 * successfully set certificate verify locations:
12:38:50.476485 * CAfile: /etc/ssl/cert.pem
12:38:50.476498 * CApath: none
12:38:50.477730 * (304) (OUT), TLS handshake, Client hello (1):
12:38:50.486743 * (304) (IN), TLS handshake, Server hello (2):
12:38:50.487570 * (304) (IN), TLS handshake, Unknown (8):
12:38:50.487617 * (304) (IN), TLS handshake, Certificate (11):
12:38:50.489504 * (304) (IN), TLS handshake, CERT verify (15):
12:38:50.489728 * (304) (IN), TLS handshake, Finished (20):
12:38:50.489786 * (304) (OUT), TLS handshake, Finished (20):
12:38:50.489808 * SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384
12:38:50.489822 * ALPN, server accepted to use h2
12:38:50.489837 * Server certificate:
12:38:50.489853 * subject: CN=APP_DOMAIN
12:38:50.489897 * start date: Aug 28 13:02:58 2022 GMT
12:38:50.489911 * expire date: Nov 26 13:02:57 2022 GMT
12:38:50.489929 * subjectAltName: host "APP_DOMAIN" matched cert's "APP_DOMAIN"
12:38:50.489946 * issuer: C=US; O=Let's Encrypt; CN=R3
12:38:50.489960 * SSL certificate verify ok.
12:38:50.489992 * Using HTTP2, server supports multiplexing
12:38:50.490006 * Connection state changed (HTTP/2 confirmed)
12:38:50.490020 * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
12:38:50.490098 * Using Stream ID: 1 (easy handle 0x7f895f810a00)
12:38:50.490127 > GET / HTTP/2
12:38:50.490127 > Host: APP_DOMAIN
12:38:50.490127 > user-agent: curl/7.79.1
12:38:50.490127 > accept: */*
12:38:50.490127 >
12:38:50.498867 * Connection state changed (MAX_CONCURRENT_STREAMS == 32)!
12:39:07.345726 < HTTP/2 200
12:39:07.345767 < accept-ranges: bytes
12:39:07.345787 < content-length: 1331
12:39:07.345807 < content-type: text/html; charset=utf-8
12:39:07.345826 < request-id: cdam06s45ebs315fcj20
12:39:07.345845 < date: Sun, 23 Oct 2022 15:39:07 GMT
12:39:07.345866 < server: Fly/51c45b355 (2022-10-19)
12:39:07.345887 < via: 2 fly.io
12:39:07.345908 < fly-request-id: 01GG2QYAFSSNVQV5ZK2J2WSZ7C-gru
12:39:07.345923 {CONTENT}
12:39:07.346378 * Connection #0 to host APP_DOMAIN left intact
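To put a number on the delay in the trace above: the request headers went out at 12:38:50.490127 and the first response line arrived at 12:39:07.345726. A quick Python sketch of that subtraction (the names are mine; only the two timestamps come from the trace):

```python
from datetime import datetime

# Timestamps copied from the curl trace above (same day, America/Sao_Paulo).
REQUEST_SENT = "12:38:50.490127"    # line "> GET / HTTP/2"
FIRST_RESPONSE = "12:39:07.345726"  # line "< HTTP/2 200"


def parse(ts: str) -> datetime:
    """Parse an HH:MM:SS.ffffff timestamp as printed by curl --trace-time."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


# Time spent waiting between sending the request and the first response byte.
wait = parse(FIRST_RESPONSE) - parse(REQUEST_SENT)
print(f"waited {wait.total_seconds():.6f}s for the first response byte")
```

That works out to roughly 16.9 seconds of waiting, while the TCP connect and TLS handshake before it took only a few milliseconds.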
Timing for another request, using a curl -w format file:
❯ curl -w "@curl-format.txt" -o /dev/null -s "https://APP_DOMAIN/api/healthz"
time_namelookup: 0.125110s
time_connect: 0.133622s
time_appconnect: 0.148274s
time_pretransfer: 0.148331s
time_redirect: 0.000000s
time_starttransfer: 214.727177s
----------
time_total: 214.727239s
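Since curl's -w timings are cumulative from the start of the request, subtracting one from another shows where the time went. A rough Python sketch, with the output above pasted in as a string (the function and variable names are just for illustration):

```python
from decimal import Decimal

# Output of the curl -w run above, pasted verbatim.
OUTPUT = """\
time_namelookup: 0.125110s
time_connect: 0.133622s
time_appconnect: 0.148274s
time_pretransfer: 0.148331s
time_redirect: 0.000000s
time_starttransfer: 214.727177s
----------
time_total: 214.727239s
"""


def parse_timings(text: str) -> dict:
    """Turn 'name: 1.234s' lines into {name: Decimal seconds}."""
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skips the '----------' separator line
        timings[name.strip()] = Decimal(value.strip().rstrip("s"))
    return timings


t = parse_timings(OUTPUT)
# TLS handshake: from TCP connected to TLS established.
tls = t["time_appconnect"] - t["time_connect"]
# Waiting on the server: from request ready to send until the first byte back.
server_wait = t["time_starttransfer"] - t["time_pretransfer"]
print(f"TLS handshake: {tls}s, waiting for first byte: {server_wait}s")
```

DNS, connect and TLS together finish in under 0.15 seconds, so almost all of the 214 seconds is spent waiting for the first byte of the response.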
The app showed this behavior for some time and then suddenly restarted.
I don’t know the reason, but it seems to have solved the problem for now (for this app).
At 17:27 (UTC) another app stopped serving content or started responding very slowly.
I tried to restart it using fly restart, but the app doesn’t restart: the new instance is stuck in the pending state and doesn’t show any logs.
Using fly scale count 0 -a APP_NAME doesn’t seem to work either.
Everything seems to be stuck:
Using fly scale count 1 -a APP_NAME seems to change things a little: it starts a new instance, which is serving traffic.
Metrics for the stuck instance (v6) seem odd as well (time in UTC-3):
Yet another app is starting to show problems: random slow response times, and it seems it tried to restart but got stuck as well.
Other apps seem to work without problems (all in the GRU region). The apps use different images and do different things. I tried to explain this as best I could, but this time I’m having trouble understanding it myself.