Laravel Reverb websocket messages getting dropped

This is quite strange. I have an event that implements ShouldBroadcastNow:

<?php

namespace App\Events;

use App\Models\Run;
use App\Models\Input;
use Illuminate\Broadcasting\InteractsWithSockets;
use Illuminate\Broadcasting\PrivateChannel;
use Illuminate\Contracts\Broadcasting\ShouldBroadcastNow;
use Illuminate\Foundation\Events\Dispatchable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Support\Facades\Log;

class RunStreamed implements ShouldBroadcastNow
{
    use Dispatchable, InteractsWithSockets, SerializesModels;

    // ...constructor and channels cut...

    public function broadcastWith(): array
    {
        $data = [
            'thread_id' => $this->run->thread->id,
            'output_id' => $this->run->output->id,
            'input_id' => $this->input->id,
            'word' => $this->word,
        ];

        Log::info('RunStreamed preparing broadcast data', [
            'data' => $data,
            'timestamp' => microtime(true)
        ]);

        return $data;
    }
}

When I look at my production logs, the RunStreamed preparing broadcast data entries accurately show each word. I’d expect each of those words to be broadcast by Laravel Reverb.

I’m running php artisan reverb:start --debug on production, and its output shows that every few events never reach Reverb. In other words, every few words aren’t broadcast down the websocket.

I am basically running broadcast() in a loop and maybe that’s the problem? I’m not sure.

        // In some worker...
        foreach ($response as $chunk) {
            $word = $chunk->choices[0]->delta->content ?? '';

            // The ?? '' above means $word is never null, so check for an empty string instead.
            if ($word !== '') {
                $event = new RunStreamed($this->run, $this->input, $word);
                broadcast($event);
            }
        }

Are there some Reverb limits that aren’t well documented? I feel like running a broadcast in a loop, regardless of how fast the loop is, shouldn’t cause issues.

I’ve tried:

  1. Changing from ShouldBroadcastNow to just ShouldBroadcast
  2. Chunking words, up to 10 at a time; Reverb still skips broadcasting some events
  3. Have removed any DB calls from RunStreamed so DB isn’t the bottleneck
  4. There are virtually no users, so the load is 0 and it is on a beefy machine
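
The chunking from item 2 could look roughly like the sketch below. It is only an illustration: batchWords is a hypothetical helper, not part of Laravel or Reverb, and as item 2 notes, batching alone did not stop the drops. Each yielded string would become one RunStreamed broadcast instead of one broadcast per word.

```php
<?php

declare(strict_types=1);

/**
 * Group streamed fragments into batches of at most $size items.
 * Empty fragments are skipped so they never trigger a broadcast.
 * (Hypothetical helper for illustration only.)
 *
 * @param iterable<string> $words
 * @return Generator<int, string>
 */
function batchWords(iterable $words, int $size = 10): Generator
{
    $buffer = [];

    foreach ($words as $word) {
        if ($word === '') {
            continue; // nothing to send for an empty delta
        }

        $buffer[] = $word;

        if (count($buffer) >= $size) {
            yield implode('', $buffer);
            $buffer = [];
        }
    }

    // Flush whatever is left once the stream ends.
    if ($buffer !== []) {
        yield implode('', $buffer);
    }
}

// Usage: with a batch size of 2, five fragments (one empty) become two strings.
foreach (batchWords(['Hel', 'lo', '', ' wor', 'ld'], 2) as $text) {
    echo $text, PHP_EOL;
}
```

In the worker loop, the body of that foreach would call broadcast(new RunStreamed($this->run, $this->input, $text)) rather than echo.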

Quite stumped because I’m unable to repro this on localhost.

I don’t have an answer for you, but maybe some ideas.

Do any browsers have (occasional) difficulty making a websocket connection?

Can you do anything in Laravel to list the connected clients, to see if this agrees with the number of clients you think you have?

Have you tried other browsers, in case you are bumping into a browser-specific websocket oddity?

If you were to put a short usleep in the loop, does that change the behaviour?

I assume that the local test is connecting over IPv4. Is the Fly test also connecting over IPv4?

I tried this too yesterday, and it had no change.

OK this is really bizarre. I spent some time at a park getting away from this problem. I came back. Saw your comment, and said heck let me test the same thing out in the Arc browser. And boom it all works fine. And now even Chrome is working fine?
I’m really not sure. My last deploy was 6 hours ago, when it wasn’t working, and now it is working fine.

So for now, thank you. I think this is Schrödinger’s websocket. :man_shrugging:

When it’s happening again, and I have more insight, I’ll share 'em here.
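
One cheap way to gather that kind of insight is to tag every log entry with the machine that wrote it. The sketch below assumes the app runs on Fly.io, where the FLY_MACHINE_ID environment variable is set at runtime, and falls back to the hostname elsewhere. withMachineContext is a hypothetical helper name, not a Laravel function.

```php
<?php

declare(strict_types=1);

/**
 * Add the identity of the current machine to a log context array.
 * Assumes FLY_MACHINE_ID is present on Fly.io; uses the hostname otherwise.
 * (Hypothetical helper for illustration only.)
 */
function withMachineContext(array $context): array
{
    $machineId = getenv('FLY_MACHINE_ID');

    $context['machine'] = ($machineId !== false && $machineId !== '')
        ? $machineId
        : (gethostname() ?: 'unknown');

    return $context;
}

// Usage inside broadcastWith(), replacing the plain context array:
// Log::info('RunStreamed preparing broadcast data', withMachineContext([
//     'data' => $data,
//     'timestamp' => microtime(true),
// ]));
```

If the same stream of words shows more than one machine value across the app and Reverb logs, the traffic is being spread over several machines.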

I figured out the root cause! Because Fly brings up two machines by default (one primary and one as a backup), my stress test was actually waking up both machines.

So the user connects to ws machine A, but the responses were being streamed through both ws machine A and ws machine B, and this doesn’t work for websockets: anything published through machine B never reaches a client connected to machine A. We need a “sticky” setup in which messages are both sent and received through ws machine A if that’s the one the user is connected to. Out of the box, Reverb isn’t meant to run as multiple machines bound to the same host and port.
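
To make the failure mode concrete, here is a toy simulation. It is a sketch under assumptions, not Reverb’s or Fly’s actual behaviour: ToyWsServer and broadcastRoundRobin are hypothetical names, and strict round-robin distribution is a simplification of however the proxy really spreads requests. The point is only that a server which keeps its connections in its own memory can deliver to its own clients and nobody else’s.

```php
<?php

declare(strict_types=1);

/**
 * Toy stand-in for one websocket server process. It only knows about
 * the clients connected to it. (Hypothetical class, not Reverb's API.)
 */
final class ToyWsServer
{
    /** @var array<string, list<string>> messages received, keyed by client id */
    private array $inboxes = [];

    public function connect(string $clientId): void
    {
        $this->inboxes[$clientId] = [];
    }

    public function publish(string $message): void
    {
        // Deliver to every client connected to THIS server only.
        foreach (array_keys($this->inboxes) as $clientId) {
            $this->inboxes[$clientId][] = $message;
        }
    }

    /** @return list<string> */
    public function received(string $clientId): array
    {
        return $this->inboxes[$clientId] ?? [];
    }
}

/**
 * Send each word to the next server in turn, mimicking traffic that is
 * spread over every running machine (assumed distribution, for illustration).
 *
 * @param list<ToyWsServer> $servers
 * @param list<string> $words
 */
function broadcastRoundRobin(array $servers, array $words): void
{
    $count = count($servers);

    foreach (array_values($words) as $i => $word) {
        $servers[$i % $count]->publish($word);
    }
}

$a = new ToyWsServer();
$b = new ToyWsServer();
$a->connect('browser'); // the browser only ever connects to machine A

broadcastRoundRobin([$a, $b], ['The', 'quick', 'brown', 'fox']);

print_r($a->received('browser')); // only the words that happened to go through A
```

In this toy run the browser sees "The" and "brown" while "quick" and "fox" go to machine B, which has no clients, matching the every-few-words pattern in the Reverb debug logs.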

Without over-engineering a solution, is there a way to tell Fly not to create a backup machine? I literally just want 1 machine for my Reverb server that binds to the host and port.

For now, I’ve set min_machines_running = 0:

[[services]]
  internal_port = 2053
  protocol = 'tcp'
  processes = ['reverb']
  auto_stop_machines = 'stop'
  auto_start_machines = true
  min_machines_running = 0

  [[services.ports]]
    handlers = ['tls']
    port = 2053

Single-machine apps on Fly.io have major caveats, including the potential for total data loss, so it’s not really overengineering to have ≥2, :sweat_smile:.

Having said that, if you understand the risks in full, you can use fly m destroy to remove the unwanted Machine, and use --ha=false in future fly deploy and fly launch invocations.

Hope this helps a little!

Yep, I think you can just do flyctl scale count 1, and it will only ever have one machine in the app.