We have an application deployed on a Fly instance that is intermittently not receiving broadcasts from Phoenix PubSub. On our development machines nothing is missed. On Fly the failures are random, but at least 50% of the broadcast messages are not picked up by the subscription. It also isn’t specific subscriptions that fail; that is random too.
As far as I can tell we have a vanilla PubSub implementation, and our Fly install is standard other than that we are connecting to an external database on CrunchyBridge. Our Fly install runs on two “shared-cpu-1x” machines.
I’m posting here as it feels like a network issue, and I’m hoping someone has some ideas about where we can look to troubleshoot this further.
Thanks in advance.
EDIT: I should add that no errors are being shown.
The 1/2 error rate with two Machines suggests that they may be behaving as two separate clusters (i.e., not really talking to each other). If so, a broadcast would only reach subscribers connected to the Machine that published it.
Could you perhaps post the section of your code that defines the clustering—particularly config :libcluster and env.sh.eex?
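For reference, here is roughly the shape of libcluster config I would expect to see for a Fly app. This is only a sketch, not your actual setup: the topology label `fly6pn` is arbitrary, and it assumes libcluster is a dependency, that `Cluster.Supervisor` is started in your supervision tree, and that the node names set in env.sh.eex start with the same app name used for `node_basename`.

```elixir
# config/runtime.exs (sketch; assumes libcluster is in your deps)
import Config

if config_env() == :prod do
  # FLY_APP_NAME is set by Fly on each Machine
  app_name = System.get_env("FLY_APP_NAME") || raise "FLY_APP_NAME not available"

  config :libcluster,
    topologies: [
      # :fly6pn is just a label for this topology (any atom works)
      fly6pn: [
        strategy: Cluster.Strategy.DNSPoll,
        config: [
          # how often to re-query DNS for peers, in milliseconds
          polling_interval: 5_000,
          # Fly's private DNS name that resolves to the app's Machines
          query: "#{app_name}.internal",
          # must match the part of the node name before the "@"
          node_basename: app_name
        ]
      ]
    ]
end
```

If nothing like this (or an equivalent) exists in the project, the Machines have no way of finding each other.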
Just to avoid ambiguity… [] is what I was expecting you to get (based on your original problem description); it confirms the hypothesis that there are no other nodes in that cluster.
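For anyone following along, a check along these lines is one way to see whether a node has any peers. It is a sketch, and it assumes the `[]` above came from something like `Node.list()` run in a remote IEx session on one of the Machines.

```elixir
# Run on one of the Machines, e.g. from a remote IEx session.
self_node = Node.self()
peers = Node.list()

IO.inspect(self_node, label: "this node")
IO.inspect(peers, label: "connected nodes")

# An empty list means this node is not connected to any other node,
# so PubSub broadcasts published here are not forwarded anywhere else.
if peers == [] do
  IO.puts("no other nodes connected")
else
  IO.puts("connected to #{length(peers)} other node(s)")
end
```

On a working two-Machine cluster you would expect one entry in the list on each Machine.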
OK, more broadly, what code do you have that does cluster discovery?
Possibilities include libcluster, dns_cluster, and peerage.
Or someone else in your group might have implemented something specific to the details of your local environment.
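As an illustration of what to search for, a dns_cluster setup typically shows up as a child in the application's supervision tree. The sketch below is hypothetical: `MyApp`, the `:my_app` config key `:dns_cluster_query`, and the PubSub name are placeholders, and it assumes the dns_cluster and phoenix_pubsub packages are dependencies.

```elixir
# lib/my_app/application.ex (sketch; all MyApp names are placeholders)
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    children = [
      # Discovers peers by DNS lookup; :ignore disables discovery
      # when no query is configured (e.g. in development).
      {DNSCluster, query: Application.get_env(:my_app, :dns_cluster_query) || :ignore},
      # PubSub only spans whichever nodes are actually connected.
      {Phoenix.PubSub, name: MyApp.PubSub}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: MyApp.Supervisor)
  end
end
```

If a grep for `DNSCluster`, `Cluster.Supervisor`, or `Peerage` turns up nothing, there is most likely no discovery configured at all.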
Thanks for your help. We didn’t have clustering set up; we had, rather naively, assumed that two machines would be clustered by default. Lesson learned.