Fly Dashboard Constantly Refreshing

julianrubisch · July 7, 2022, 10:32am

Since yesterday, I can’t use my Fly Dashboard at all: it’s constantly refreshing and stuck on skeleton loaders. The same seems to happen on my app overviews from time to time.

I know that this kind of issue can occur with WebSockets / LiveView, and indeed my browser console suggests such an issue:

I also repeatedly restarted my browser and tried different ones, to no avail.

Plus, maybe unrelated, running fly ssh console against some of my apps fails with:

Error host unavailable: read tcp [fdaa:...]:39401->[fdaa:...]:53: i/o timeout

I’m located in Vienna/Austria, if that’s of any relevance.

Thanks!

lubien · July 7, 2022, 11:42am

Hey there

Sorry to hear about this. Would you perhaps be using a VPN?

julianrubisch · July 7, 2022, 11:45am

Nope,

I actually even tried different WiFi networks / Internet providers.

chrismccord · July 7, 2022, 1:41pm

Are you able to run liveSocket.enableDebug() in the JavaScript console and see if it reports other failures? It looks like the longpoll connection is working properly, so it sounds like LiveView is falling back to a failsafe refresh because the dashboard is failing to mount consecutively. Can you verify? Thanks!

julianrubisch · July 7, 2022, 2:03pm

Hey Chris,

thanks for helping out!

Sure, here you go (I’ve filtered out the warning from above):

chrismccord · July 7, 2022, 2:09pm

Can you provide the full output (pruning data is fine)? There should have been something prior to the “child has been removed from the parent” message about the join failing. I’m specifically looking for something like this:

[Log] phx-Fv-RSg error: unable to join -  – {reason: "timeout"}

julianrubisch · July 7, 2022, 3:29pm

Ah yes, I saw a timeout, if that helps. Will follow up with a more detailed trace later.

chrismccord · July 7, 2022, 5:24pm

It turns out there was a bug in phoenix.js and phoenix_live_view.js when replacing the transport (such as when falling back to long polling). It only surfaces when the 101 WebSocket upgrade succeeds but subsequent WebSocket frames fail, which can happen in some weird proxy scenarios, or so I’ve heard. We pushed some fixes to address this. Can you try again and report back? Thanks!

julianrubisch · July 7, 2022, 5:25pm

Here you go, I don’t know if that’s helpful:

app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425 WebSocket connection to 'wss://fly.io/phx/live/websocket?_csrf_token=AWkPHixhRyAhJjcpB1sHWFkSf205JBwe0DXw_8ulBmsDCmK44X64AfM-&_track_static%5B0%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-d5ef3b5d3b4446a62bf8f300348aa423.css%3Fvsn%3Dd&_track_static%5B1%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-085cf5b70198b4c4d49b6907e9fb1bd7.js%3Fvsn%3Dd&_mounts=0&vsn=2.0.0' failed: WebSocket is closed before the connection is established.
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425
waitForBufferDone @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1440
teardown @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1420
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1328
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:4807
fallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10828
setTimeout (async)
doConnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10836
connectWithLongPollFallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10853
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10858
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10885

app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10846 established longpoll fallback
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh error: unable to join -  {reason: 'timeout'}
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent -  undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh join: encountered 1 consecutive reloads -  undefined
Navigated to https://fly.io/dashboard

Sometimes it works, but then I get
Multiple IDs detected: listbox-option-0. Ensure unique element ids.

julianrubisch · July 7, 2022, 5:27pm

Sadly, I had just tried it and was compiling the message above when you wrote this… so I’m still seeing it.

chrismccord · July 7, 2022, 5:33pm

The multiple ID thing is fine. So it sometimes works now, whereas before it always failed?

julianrubisch · July 7, 2022, 6:55pm

I’m not sure TBH. First I thought it was only the dashboard, then it appeared on other pages as well.

It seems that if I wait, after some reconnections it works…

chrismccord · July 8, 2022, 2:13am

Thanks for the follow up! It looks like there is still a bug in LiveView when we swap the transport connection over to longpoll. I should have a fix up in the morning and hopefully we’ll have smooth sailing from there.

julianrubisch · July 8, 2022, 10:53am

Of course the engineer in me wants to know why it falls back to long polling in the first place, but that’s beyond this discussion, I suppose.

chrismccord · July 8, 2022, 1:29pm

Okay, fixes have been pushed. Please give it a final go and let us know how it looks!

Since you asked :), WebSocket fallback can be tricky. It’s easy in the straightforward “websocket failed to connect” mode. However, some proxies will complete the 101 Switching Protocols WebSocket upgrade, but then fail any WebSocket frames after that. On the client, the connection looks healthy, and the only way to detect this failure mode is to ping the server and check that it responds within a reasonable time. We do this for the dashboard to fall back to long polling, which can happen in two ways:

there’s a proxy/network mechanism that upgrades the websocket but drops frames
the user’s connection is slow enough that we can’t distinguish a very slow connection from the above

So in your case, it must have been one or the other: a weird proxy/firewall or a very high-latency connection. All of this should have been transparent for you, except for the bugs in Phoenix that caused the bad reconnection cycle.
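To make the health check described above concrete, here is a minimal sketch of the idea. It is not the actual phoenix.js implementation: the function name connectWithPingFallback, the transport shape (open, ping, close) and the 2500 ms default deadline are all assumptions made for illustration. The timer functions are injectable only so the demo can run without waiting.

```javascript
// Hypothetical sketch, not phoenix.js code. A "transport" here is any object
// with open(onOpen), ping(onPong), and close().
function connectWithPingFallback(primary, fallback, options = {}) {
  const {
    pingTimeoutMs = 2500,      // assumed deadline, not a value from this thread
    setTimer = setTimeout,     // injectable so the demo can run without waiting
    clearTimer = clearTimeout,
  } = options;

  let active = primary;
  let settled = false;

  // If no pong arrives before the deadline, give up on the primary transport.
  const timer = setTimer(() => {
    if (settled) return;
    settled = true;
    primary.close();
    active = fallback;
    fallback.open(() => {});
  }, pingTimeoutMs);

  // A successful open (the 101 upgrade) is not enough: wait for a real reply.
  primary.open(() => {
    primary.ping(() => {
      if (settled) return;
      settled = true;
      clearTimer(timer);
    });
  });

  return { activeTransport: () => active };
}

// Demo with fake transports and a manually fired deadline.
function fakeTransport(name, answersPing) {
  return {
    name,
    closed: false,
    open(onOpen) { onOpen(); },
    ping(onPong) { if (answersPing) onPong(); },
    close() { this.closed = true; },
  };
}

function runDemo(primaryAnswersPing) {
  let fireDeadline = () => {};
  const primary = fakeTransport("websocket", primaryAnswersPing);
  const fallback = fakeTransport("longpoll", true);
  const conn = connectWithPingFallback(primary, fallback, {
    setTimer: (fn) => { fireDeadline = fn; return 1; },
    clearTimer: () => { fireDeadline = () => {}; },
  });
  fireDeadline(); // simulate the deadline passing
  return conn.activeTransport().name;
}

console.log(runDemo(true));  // healthy socket: prints "websocket"
console.log(runDemo(false)); // upgrade succeeded, frames dropped: prints "longpoll"
```

Note that a very slow but otherwise working connection takes the same path as the frame-dropping proxy in this sketch, because the client only ever observes "no pong before the deadline". That is the ambiguity between the two cases listed above.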

julianrubisch · July 8, 2022, 1:47pm

Okay, thanks for the explanation. I’m happy that there are savvy folks like you to abstract that stuff away from us

Now, as for the fix, it does seem like the fallback to long poll is working now, but everything just seems veeeeeery slow still.

This is still kind of a mystery to me because my connection is pretty decent (ping to the nearest backbone is 8ms) and I’m not behind any proxy that I know of (I’ve even tried different WiFis, went to a coffee shop, tethering to my phone etc.)

I made a video I could shoot you as a DM if you like. I’m just offering this feedback because it’s probably not the UX you’re after
