Fly Dashboard Constantly Refreshing

Since yesterday, I haven’t been able to use my Fly Dashboard at all: it’s constantly refreshing and stuck on skeleton loaders. The same seems true for my app overviews, from time to time.

I know that these kinds of issues can occur with WebSockets / LiveView :see_no_evil:, and indeed my browser console suggests such an issue:

I also repeatedly restarted my browser and tried different ones, to no avail.

Plus, maybe unrelated, fly ssh console into some of my apps fails with

Error host unavailable: read tcp [fdaa:...]:39401->[fdaa:...]:53: i/o timeout

I’m located in Vienna/Austria, if that’s of any relevance.

Thanks!

Hey there

Sorry to hear about this. Would you perhaps be using a VPN?

Nope,

I actually even tried different WiFi networks / Internet providers.

Are you able to run liveSocket.enableDebug() in the JavaScript console and see if it reports other failures? It looks like the longpoll connection is working properly, so it sounds like LiveView is falling back to a failsafe refresh because the dashboard is failing to mount several times in a row. Can you verify? Thanks!

Hey Chris,

thanks for helping out!

Sure, here you go (I’ve filtered out the warning from above):

Can you provide the full output (pruning data is fine)? There should have been something prior to the “child has been removed from the parent” about the join failing. I’m specifically looking for something like this:

[Log] phx-Fv-RSg error: unable to join -  – {reason: "timeout"}

Ah yes, I saw a timeout, if that helps. Will follow up with a more detailed trace later.

It turns out there was a bug in phoenix.js and phoenix_live_view.js when it comes to replacing the transport (such as when falling back to long polling). This only happens when the 101 WebSocket upgrade succeeds but subsequent WebSocket frames fail, which can occur in some weird proxy scenarios, or so I’ve heard. We pushed some fixes to address this. Can you try again and report back? Thanks!

Here you go, I don’t know if that’s helpful:

app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425 WebSocket connection to 'wss://fly.io/phx/live/websocket?_csrf_token=AWkPHixhRyAhJjcpB1sHWFkSf205JBwe0DXw_8ulBmsDCmK44X64AfM-&_track_static%5B0%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-d5ef3b5d3b4446a62bf8f300348aa423.css%3Fvsn%3Dd&_track_static%5B1%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-085cf5b70198b4c4d49b6907e9fb1bd7.js%3Fvsn%3Dd&_mounts=0&vsn=2.0.0' failed: WebSocket is closed before the connection is established.
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425
waitForBufferDone @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1440
teardown @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1420
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1328
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:4807
fallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10828
setTimeout (async)
doConnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10836
connectWithLongPollFallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10853
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10858
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10885

app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10846 established longpoll fallback
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh error: unable to join -  {reason: 'timeout'}
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh join: encountered 1 consecutive reloads -  undefined
Navigated to https://fly.io/dashboard

Sometimes it works, but then I get
Multiple IDs detected: listbox-option-0. Ensure unique element ids.

Sadly, I was trying it and putting together the message above right when you wrote this… so I’m still seeing it :frowning_with_open_mouth:

The multiple ID thing is fine, so sometimes it works now whereas before it always failed?

I’m not sure TBH. First I thought it was only the dashboard, then it appeared on other pages as well.

It seems that if I wait, after some reconnections it works…

Thanks for the follow up! It looks like there is still a bug in LiveView when we swap the transport connection over to longpoll. I should have a fix up in the morning and hopefully we’ll have smooth sailing from there.

Of course the engineer in me wants to know why it falls back to long polling in the first place, but that’s beyond this discussion I suppose :nerd_face:

Okay, fixes have been pushed. Please give it a final go and let us know how it looks!

Since you asked :), WebSocket fallback can be tricky. It’s easy in the straightforward “websocket failed to connect” mode. However, some proxies will complete the 101 Switching Protocols WebSocket upgrade, but then fail any WebSocket frames after that. On the client, the connection looks healthy, and the only way to detect this failure mode is to ping the server and check that it responds within a reasonable time. We do this for the dashboard to fall back to long polling, which can happen in two ways:

  • there’s a proxy/network mechanism that upgrades the websocket but drops frames
  • the user’s connection is slow enough that we can’t distinguish a very slow connection from the above

So in your case, it must have been one or the other: a weird proxy/firewall or a high-latency connection. All of this should have been transparent to you, except for the bugs in Phoenix that caused the bad reconnection cycle.
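
For anyone following along, the ping check described above can be sketched roughly like this. It is only an illustration of the idea, not the actual phoenix.js code: the names `Ping`, `respondsWithin`, and `chooseTransport` are made up for this example, and the timeout values are arbitrary.

```typescript
type Transport = "websocket" | "longpoll";

// Hypothetical: sends a ping over the already-upgraded socket and
// resolves once the server replies. It may never settle if frames are dropped.
type Ping = () => Promise<void>;

// Race the ping against a timer. Resolves true only if the server
// answered before the deadline; a rejected ping counts as a failure.
function respondsWithin(ping: Ping, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    ping().then(
      () => {
        clearTimeout(timer);
        resolve(true);
      },
      () => {
        clearTimeout(timer);
        resolve(false);
      },
    );
  });
}

// Keep the WebSocket only if the ping came back in time;
// otherwise switch to long polling.
async function chooseTransport(ping: Ping, timeoutMs: number): Promise<Transport> {
  const healthy = await respondsWithin(ping, timeoutMs);
  return healthy ? "websocket" : "longpoll";
}
```

Note that a ping that never settles and a ping that settles after the deadline both end up on long polling, which is why a frame-dropping proxy and a very slow connection look the same from the client.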

Okay, thanks for the explanation. I’m happy that there are savvy folks like you to abstract that stuff away from us :slight_smile:

Now, as for the fix, it does seem like the fallback to long poll is working now, but everything just seems veeeeeery slow still.

This is still kind of a mystery to me because my connection is pretty decent (ping to the nearest backbone is 8ms) and I’m not behind any proxy that I know of (I’ve even tried different WiFis, went to a coffee shop, tethering to my phone etc.)

I made a video I could shoot you as a DM if you like. I’m just offering this feedback because it’s probably not the UX you’re after :slight_smile: