Since yesterday, I haven’t been able to use my Fly Dashboard at all: it’s constantly refreshing and stuck on skeleton loaders. The same seems to be true for my app overviews from time to time.
I know that these kinds of issues can occur with WebSockets / LiveView, and indeed my browser console suggests such an issue:
Are you able to run liveSocket.enableDebug() in the JavaScript console and see if it reports other failures? It looks like the longpoll connection is working properly, so it sounds like LiveView is falling back to a failsafe refresh because the dashboard is failing to mount several times in a row. Can you verify? Thanks!
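For anyone following along, here is a minimal sketch of that debugging step. It assumes the page exposes its LiveSocket instance as a global named `liveSocket`, which the request above implies but which is not guaranteed on every Phoenix app. The `enableLiveViewDebug` wrapper is a hypothetical helper for illustration; only `enableDebug()` itself is a LiveView client call.

```javascript
// Hypothetical helper, not part of LiveView: turns on debug logging if the
// page exposes a LiveSocket instance as a global named `liveSocket`.
function enableLiveViewDebug(scope = globalThis) {
  const socket = scope.liveSocket;
  if (!socket || typeof socket.enableDebug !== "function") {
    return false; // no LiveSocket exposed on this page
  }
  socket.enableDebug(); // LiveView client call: logs lifecycle events to the console
  return true;
}
```

In practice you can simply type `liveSocket.enableDebug()` into the console, reload the page, and copy whatever is logged. `liveSocket.disableDebug()` should turn the logging off again.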
Can you provide the full output (pruning data is fine)? There should have been something prior to the “child has been removed from the parent” about the join failing. I’m specifically looking for something like this:
[Log] phx-Fv-RSg error: unable to join - – {reason: "timeout"}
It turns out there was a bug in phoenix.js and phoenix_live_view.js when replacing the transport (such as when falling back to long polling). It only occurs when the 101 WebSocket upgrade succeeds but subsequent WebSocket frames fail, which can happen in some weird proxy scenarios, or so I’ve heard. We pushed some fixes to address this. Can you try again and report back? Thanks!
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425 WebSocket connection to 'wss://fly.io/phx/live/websocket?_csrf_token=AWkPHixhRyAhJjcpB1sHWFkSf205JBwe0DXw_8ulBmsDCmK44X64AfM-&_track_static%5B0%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-d5ef3b5d3b4446a62bf8f300348aa423.css%3Fvsn%3Dd&_track_static%5B1%5D=https%3A%2F%2Ffly.io%2Fphx%2Fassets%2Fapp-085cf5b70198b4c4d49b6907e9fb1bd7.js%3Fvsn%3Dd&_mounts=0&vsn=2.0.0' failed: WebSocket is closed before the connection is established.
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1425
waitForBufferDone @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1440
teardown @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1420
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1328
disconnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:4807
fallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10828
setTimeout (async)
doConnect @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10836
connectWithLongPollFallback @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10853
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10858
(anonymous) @ app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10885
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent - undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent - undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:10846 established longpoll fallback
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh error: unable to join - {reason: 'timeout'}
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent - undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh destroyed: the child has been removed from the parent - undefined
app-085cf5b70198b4c4d49b6907e9fb1bd7.js?vsn=d:1738 phx-Fv-bw3PpUMDXlBKh join: encountered 1 consecutive reloads - undefined
Navigated to https://fly.io/dashboard
Sometimes it works, but then I get Multiple IDs detected: listbox-option-0. Ensure unique element ids.
Thanks for the follow up! It looks like there is still a bug in LiveView when we swap the transport connection over to longpoll. I should have a fix up in the morning and hopefully we’ll have smooth sailing from there.
Okay, fixes have been pushed. Please give it a final go and let us know how it looks!
Since you asked :), WebSocket fallback can be tricky. It’s easy in the straightforward “websocket failed to connect” mode. However, some proxies will complete the 101 Switching Protocols WebSocket upgrade, but then fail any websocket frames after that. On the client, the connection looks healthy, and the only way to detect this failure mode is to ping the server and ensure it responds within some reasonable time. We do this for the dashboard to fall back to long polling, which can happen in two ways:
there’s a proxy/network mechanism that upgrades the websocket but drops frames
the user’s connection is slow enough that we can’t distinguish a very slow connection from the above
So in your case, it must have been one or the other: a weird proxy/firewall or a very high-latency connection. All of this should have been transparent for you, except for the bugs in Phoenix that caused the bad reconnection cycle.
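To make the ping-based check concrete, here is a rough sketch of the idea. It is not the code in phoenix.js or the dashboard: `pingWithin`, `chooseTransport`, and the `sendPing` callback are hypothetical names, and `sendPing` is assumed to return a promise that settles when the server replies.

```javascript
// Sketch only: none of these names come from phoenix.js or the Fly dashboard.

// Resolves true if `sendPing()` succeeds within `timeoutMs`,
// false if it rejects or the timer fires first.
function pingWithin(sendPing, timeoutMs) {
  return new Promise((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    sendPing().then(
      () => { clearTimeout(timer); resolve(true); },
      () => { clearTimeout(timer); resolve(false); }
    );
  });
}

// Keeps the websocket if the ping came back in time,
// otherwise reports that long polling should be used.
async function chooseTransport(sendPing, timeoutMs) {
  const healthy = await pingWithin(sendPing, timeoutMs);
  return healthy ? "websocket" : "longpoll";
}
```

Note that a reply arriving after `timeoutMs` is treated exactly like a reply that never arrives. That is why a proxy dropping frames and a very slow connection both end up on long polling: from the client's side the two cases look the same.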
Okay, thanks for the explanation. I’m happy that there are savvy folks like you to abstract that stuff away from us.
Now, as for the fix, it does seem like the fallback to long poll is working now, but everything just seems veeeeeery slow still.
This is still kind of a mystery to me because my connection is pretty decent (ping to the nearest backbone is 8ms) and I’m not behind any proxy that I know of (I’ve even tried different WiFis, went to a coffee shop, tethering to my phone etc.)
I made a video that I could send you as a DM if you like. I’m just offering this feedback because it’s probably not the UX you’re after.