I’m in the process of finally migrating over from our old infrastructure hosted on DigitalOcean to our new setup on Fly.io, but I’ve noticed that a basic health endpoint that used to take ~50ms on average is now up to ~300ms. (Both measurements from our setup on Fly.io)
Is the influx of customers from Heroku affecting things?
Some more info. My sentry.io tracing config seems to be indicating that the added latency isn’t coming from the app itself (slowest transaction it’s seen is under 100ms).
Additionally, I spun up a basic express server (no DB, no users) in the TOR region and I was seeing response times of 250-300ms on a “hello world” endpoint.
If you are connecting through EWR, you’re basically making two large-ish round trips. 300ms is a little high for that journey, but not shockingly high.
If those ipv4 versions are also slow, please run a traceroute <yourapp>.ipv4.fly.dev and share it with us. We should figure out why you’re getting routed to the wrong city.
Wild, the ipv4 versions are all in the 40-50ms response time range, while the regular ones are in the ~250ms range. Beyond fixing my app, I’m also really curious as to what’s happening here on a technical level.
Probably just means we’re routing IPv6 to ewr for you, but IPv4 is still going to Toronto. debug.ipv4.fly.dev probably shows a different region in your browser, too.
Will you run traceroute6 fly-api.usepastel.com and share the output? There’s nothing sensitive in it, but it may help us fix the routing.
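For anyone who wants to compare the two address families directly rather than through a browser, here is a minimal Python sketch that times a TCP connect over IPv4 or IPv6 separately. The helper name `connect_ms` is made up for illustration, and connect time is only a rough stand-in for one round trip to whichever edge answers; it is not the full request latency.

```python
import socket
import time


def connect_ms(host, port, family, timeout=5.0):
    """Return TCP connect time in ms over one address family, or None on failure."""
    try:
        # Resolve only addresses of the requested family (AF_INET or AF_INET6).
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        fam, kind, proto, _, sockaddr = infos[0]
        with socket.socket(fam, kind, proto) as sock:
            sock.settimeout(timeout)
            start = time.perf_counter()
            sock.connect(sockaddr)
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers resolution errors, timeouts, and unreachable networks.
        return None
```

Calling it once with `socket.AF_INET` and once with `socket.AF_INET6` against the same hostname on port 443 should show whether the two families land on edges at noticeably different distances.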
Funny enough, debug.ipv4.fly.dev actually shows EWR too, so maybe something more funky is happening.
=== Headers ===
Host: debug.ipv4.fly.dev
Fly-Request-Id: 01GBP5V8BHDQKAQW9M1XJCQCR3-lga
Via: 2 fly.io
Sec-Ch-Ua-Platform: "macOS"
Sec-Fetch-User: ?1
Fly-Forwarded-Proto: https
X-Forwarded-Port: 443
Fly-Forwarded-Ssl: on
Sec-Ch-Ua: "Chromium";v="104", " Not A;Brand";v="99", "Google Chrome";v="104"
Sec-Fetch-Mode: navigate
Sec-Fetch-Dest: document
Accept-Language: en-US,en;q=0.9,de;q=0.8
X-Forwarded-Proto: https
X-Forwarded-Ssl: on
Sec-Ch-Ua-Mobile: ?0
X-Request-Start: t=1661822935410000
Sec-Fetch-Site: cross-site
Accept-Encoding: gzip, deflate, br
Fly-Forwarded-Port: 443
Fly-Region: lga
Cache-Control: max-age=0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.101 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
=== ENV ===
2022-08-30 01:28:55.413020162 +0000 UTC m=+227496.713843230
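The region that handled the request can be read off the dump above: `Fly-Region` names it, and the `Fly-Request-Id` value ends in the same three-letter code. A minimal Python sketch for pulling both out of a pasted header dump follows. The helper name `edge_region` is hypothetical, and treating the request ID suffix as a region code is an assumption based on this one sample.

```python
def edge_region(header_dump: str) -> dict:
    """Extract region hints from 'Name: value' header lines."""
    headers = {}
    for line in header_dump.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    request_id = headers.get("fly-request-id", "")
    # Assumption: the text after the last '-' in the request ID is a region code.
    suffix = request_id.rpartition("-")[2] if "-" in request_id else None
    return {"fly_region": headers.get("fly-region"), "request_id_suffix": suffix}


sample = """Host: debug.ipv4.fly.dev
Fly-Request-Id: 01GBP5V8BHDQKAQW9M1XJCQCR3-lga
Fly-Region: lga"""

print(edge_region(sample))
```

For the dump above, both fields come back as `lga`.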
Here’s the output from traceroute6 fly-api.usepastel.com:
traceroute6 to fly-api.usepastel.com (2a09:8280:1::5770) from 2607:fea8:4e20:9200:bce9:9ed7:b592:a4eb, 64 hops max, 12 byte packets
1 2607:fea8:4e20:9200:e2db:d1ff:fe4d:d60c 4.766 ms 4.636 ms 4.452 ms
Also if it’s helpful, here’s traceroute fly-api.usepastel.com:
traceroute to fly-api.usepastel.com (, 64 hops max, 52 byte packets
1 ( 4.930 ms 4.412 ms 4.494 ms
2 ( 24.736 ms 15.384 ms 13.067 ms
3 8081-dgw02.wlfdle.rmgt.net.rogers.com ( 14.610 ms 16.373 ms 17.377 ms
4 3132-cgw01.bloor.rmgt.net.rogers.com ( 21.639 ms
3032-cgw01.bloor.rmgt.net.rogers.com ( 16.830 ms ( 20.195 ms
5 ( 85.649 ms 25.118 ms 15.983 ms
6 ix-ae-13-0.tcore1.tnk-toronto.as6453.net ( 18.630 ms 18.815 ms 26.182 ms
7 ae-6.a00.toroon02.ca.bb.gin.ntt.net ( 25.922 ms 21.858 ms 20.761 ms
8 ae-8.r21.nwrknj03.us.bb.gin.ntt.net ( 43.148 ms 37.678 ms 32.937 ms
9 ae-1.a01.nycmny17.us.bb.gin.ntt.net ( 40.542 ms 34.209 ms 37.050 ms
Try letting that IPv6 traceroute run for a few minutes?
You can use IPv4 only with your domains, but we won’t generate certificate renewals if you do. You’ll need to set up DNS verification for your certificates.
We only automatically generate certs for domains pointed at IPv6 addresses. IPv6 addresses are unique for all time, so this prevents certificate hijacking.
traceroute6 to fly-api.usepastel.com (2a09:8280:1::5770) from 2607:fea8:4e20:9200:6810:da3:78d6:3e02, 64 hops max, 12 byte packets
1 2607:fea8:4e20:9200:e2db:d1ff:fe4d:d60c 4.849 ms 3.929 ms 5.317 ms
2 * * *
3 2607:f798:10:10b9:0:672:3122:2237 22.796 ms 15.598 ms 15.565 ms
4 2607:f798:10:10e0:0:690:6324:9082 17.652 ms
2607:f798:10:31f:0:2091:4823:3185 18.444 ms
2607:f798:10:ea45:0:721:3913:6086 13.716 ms
5 2607:f798:10:359:0:2091:4823:5210 18.275 ms 18.078 ms 20.329 ms
6 xe-11-0-1.edge2.washington1.level3.net 19.126 ms 37.411 ms 18.369 ms
7 ntt-level3-toronto1.level3.net 70.798 ms 67.149 ms 76.347 ms
8 ae-8.r21.nwrknj03.us.bb.gin.ntt.net 73.541 ms 99.623 ms 79.841 ms
9 ae-1.a01.nycmny17.us.bb.gin.ntt.net 81.012 ms 79.088 ms 83.888 ms
10 2001:418:0:5000::1e13 88.381 ms 121.179 ms 89.290 ms
11 2607:f740:70:101::6 84.390 ms 92.023 ms 84.476 ms
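To spot where the latency is added in a trace like the one above, one option is to take the lowest RTT at each hop and look for the biggest increase between consecutive hops. This is a rough Python sketch, not a definitive analysis: the function names are made up for illustration, and per-hop traceroute timings are noisy because routers may deprioritize these replies, so treat the result as a hint only.

```python
import re

# An RTT like "22.796 ms", not preceded by address or hostname characters.
RTT = re.compile(r"(?<![\w.:])(\d+\.\d+) ms")
# A hop line starts with the hop number followed by whitespace.
HOP = re.compile(r"^\s*(\d+)\s+(.*)$")


def min_rtt_per_hop(output: str) -> dict:
    """Map hop number -> lowest RTT seen, folding in continuation lines."""
    hops = {}
    current = None
    for line in output.splitlines():
        m = HOP.match(line)
        if m:
            current, rest = int(m.group(1)), m.group(2)
        else:
            rest = line  # extra responders for the current hop
        if current is None:
            continue  # still in the header line
        rtts = [float(x) for x in RTT.findall(rest)]
        if rtts:
            hops[current] = min(rtts + [hops.get(current, rtts[0])])
    return hops


def largest_jump(hops: dict):
    """Return (previous_hop, hop, delta_ms) for the biggest RTT increase."""
    ordered = sorted(hops.items())
    best = None
    for (prev_hop, prev_rtt), (hop, rtt) in zip(ordered, ordered[1:]):
        delta = rtt - prev_rtt
        if best is None or delta > best[2]:
            best = (prev_hop, hop, delta)
    return best


trace = """traceroute6 to fly-api.usepastel.com (2a09:8280:1::5770), 64 hops max, 12 byte packets
1 2607:fea8:4e20:9200:e2db:d1ff:fe4d:d60c 4.849 ms 3.929 ms 5.317 ms
2 * * *
3 2607:f798:10:10b9:0:672:3122:2237 22.796 ms 15.598 ms 15.565 ms
4 2607:f798:10:10e0:0:690:6324:9082 17.652 ms
2607:f798:10:31f:0:2091:4823:3185 18.444 ms
2607:f798:10:ea45:0:721:3913:6086 13.716 ms
5 2607:f798:10:359:0:2091:4823:5210 18.275 ms 18.078 ms 20.329 ms
6 xe-11-0-1.edge2.washington1.level3.net 19.126 ms 37.411 ms 18.369 ms
7 ntt-level3-toronto1.level3.net 70.798 ms 67.149 ms 76.347 ms
8 ae-8.r21.nwrknj03.us.bb.gin.ntt.net 73.541 ms 99.623 ms 79.841 ms
9 ae-1.a01.nycmny17.us.bb.gin.ntt.net 81.012 ms 79.088 ms 83.888 ms
10 2001:418:0:5000::1e13 88.381 ms 121.179 ms 89.290 ms
11 2607:f740:70:101::6 84.390 ms 92.023 ms 84.476 ms"""

print(largest_jump(min_rtt_per_hop(trace)))
```

On this trace the biggest step is between hops 6 and 7, where the lowest RTT goes from about 18 ms to about 67 ms.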
To clarify re: certificates, I was just reading through the docs here (Custom Domains and SSL Certificates · Fly Docs). If I’ve already set up DNS verification, and then I remove the AAAA record, will certificate renewals still happen automatically?
If I’ve already set up DNS verification, and then I remove the AAAA record, will certificate renewals still happen automatically?
Yup, that’ll work! The only thing that would change would be the challenge type. One thing to keep in mind with DNS-01 at the moment, however, is that you’ll want to create certs a few days apart for potentially overlapping _acme-challenge records (like a wildcard and an apex domain).
This will make sure that our DNS has plenty of time to use the correct TXT record for each challenge.
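To illustrate why a wildcard and an apex domain overlap: under DNS-01, the wildcard label is dropped when forming the validation record name, so both certificates are validated through the same TXT name. A small Python sketch, where `dns01_record_name` is a hypothetical helper and `example.com` is a placeholder domain:

```python
def dns01_record_name(domain: str) -> str:
    """Return the TXT record name used to validate `domain` with DNS-01."""
    # The leading "*." of a wildcard is stripped before prefixing.
    base = domain[2:] if domain.startswith("*.") else domain
    return "_acme-challenge." + base.rstrip(".")


for d in ("example.com", "*.example.com"):
    print(d, "->", dns01_record_name(d))
```

Both inputs map to `_acme-challenge.example.com`, which is why the two challenges can collide if they are issued close together.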
Potentially a related issue, potentially not. For context, I’m looking into sources of latency for my app and I’m taking a look at the database I have hosted on DigitalOcean in their TOR1 region. The latency going from Fly.io YYZ region → DigitalOcean TOR1 region seems to be ~15ms, but going the other way from DO TOR1 → Fly.io YYZ it’s ~0.5ms.
Any idea why that would be?
Source here is SSH-ing into my corresponding services on each platform and then ping-ing its equivalent on the other platform.
Hey folks, I just discovered that one of my fly.io VMs in the YYZ region has ~1ms latency to my Postgres DB (and other services like Redis) hosted in the DigitalOcean TOR1 region, but every other VM has ~15ms latency to the same services.
Some questions:
What’s going on here and how do I ensure that all my VMs (or at least the ones for my backend service) have ~1ms latency rather than ~15ms?
Are there separate YYZ regions/zones?
Would love some info here as this really has an effect on my app’s overall latency as the cache & DB requests stack up.
VM with ~1ms latency
❯ fly ssh console -a pastel-frontend -s
Update available 0.0.382 -> v0.0.385.
Run "fly version update" to upgrade.
? Select instance: yyz (fdaa:0:756b:a7b:aa2:6019:cf16:2)
Connecting to [fdaa:0:756b:a7b:aa2:6019:cf16:2]... complete
/ # ping api.usepastel.com
PING api.usepastel.com ( 56 data bytes
64 bytes from seq=0 ttl=58 time=1.482 ms
64 bytes from seq=1 ttl=58 time=0.782 ms
64 bytes from seq=2 ttl=58 time=0.716 ms
64 bytes from seq=3 ttl=58 time=0.716 ms
64 bytes from seq=4 ttl=58 time=0.721 ms
64 bytes from seq=5 ttl=58 time=0.696 ms
64 bytes from seq=6 ttl=58 time=0.637 ms
--- api.usepastel.com ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 0.637/0.821/1.482 ms
VM with ~15ms latency
❯ fly ssh console -a pastel-frontend -s
Update available 0.0.382 -> v0.0.385.
Run "fly version update" to upgrade.
? Select instance: yyz (fdaa:0:756b:a7b:88dc:5b8f:28a:2)
Connecting to [fdaa:0:756b:a7b:88dc:5b8f:28a:2]... complete
/ # ping api.usepastel.com
PING api.usepastel.com ( 56 data bytes
64 bytes from seq=0 ttl=52 time=15.486 ms
64 bytes from seq=1 ttl=52 time=14.930 ms
64 bytes from seq=2 ttl=52 time=14.926 ms
64 bytes from seq=3 ttl=52 time=15.012 ms
--- api.usepastel.com ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 14.926/15.088/15.486 ms
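One detail in the two ping outputs above is the TTL: 58 on the fast VM and 52 on the slow one. Since both target the same host, the replies to the slow VM appear to cross six more hops. The Python sketch below turns the ping lines into a hop estimate and an average. The function name is made up, and the starting TTL of 64 is an assumption (a common default); if the remote host starts from a different value, the absolute hop counts change but the difference of six between the two VMs does not.

```python
import re

REPLY = re.compile(r"ttl=(\d+) time=([\d.]+) ms")


def summarize_ping(output: str, initial_ttl: int = 64) -> dict:
    """Estimate return-path hop count and average RTT from ping reply lines."""
    replies = [(int(ttl), float(ms)) for ttl, ms in REPLY.findall(output)]
    times = [ms for _, ms in replies]
    return {
        # Assumption: the remote host sends replies with TTL == initial_ttl.
        "hops": sorted({initial_ttl - ttl for ttl, _ in replies}),
        "avg_ms": sum(times) / len(times) if times else None,
    }


fast = """64 bytes from seq=0 ttl=58 time=1.482 ms
64 bytes from seq=1 ttl=58 time=0.782 ms"""
slow = """64 bytes from seq=0 ttl=52 time=15.486 ms
64 bytes from seq=1 ttl=52 time=14.930 ms"""

print(summarize_ping(fast))
print(summarize_ping(slow))
```

Under that assumption the fast VM's replies travel about 6 hops and the slow VM's about 12, which would be consistent with the two VMs reaching the same destination over different network paths.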