I work at Fly.io and I’m looking for input on a feature that’s been requested a few times on our community forum: receiving machine/application events through webhooks.
It’s on our feature radar & we generally believe it’s a good feature to have. Before we allocate resources towards it, we’re trying to gather more information on how it’s actually useful for users and we need your input.
You can help by answering these questions:
What are your current use-cases for receiving application & machine events through webhooks?
What are you unable to do due to the lack of this feature?
What are you unable to do easily due to the lack of application & machine notifications?
How are you currently working around this limitation of the platform to get these events in real time?
Feel free to be as descriptive as possible; it helps inform our decisions.
One use case that we were looking at is custom internal DNS functionality on Fly.
We wanted SRV records for our apps that we could query from inside our org, but Fly doesn’t support that.
We looked into setting up a custom DNS server that would forward to Fly’s internal DNS server. Using some sort of webhook or NATS subscription, we would then listen for app/machine updates and update the DNS records accordingly.
We didn’t go ahead with it, and we’re currently using other configuration methods to get around not having SRV DNS records.
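To make the idea a bit more concrete, here is a rough sketch of the record-keeping side: a small in-memory table that a custom DNS server could answer SRV queries from, kept current by machine events. Everything about the event shape is an assumption, since the webhook feature doesn't exist yet: the field names (`type`, `app`, `service`, `machine_id`, `host`, `port`), the event type strings, and the record naming scheme are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvTarget:
    """One SRV answer: where a single machine serves a given service."""
    host: str
    port: int
    priority: int = 10
    weight: int = 10


class SrvRegistry:
    """In-memory SRV table updated from (hypothetical) machine events."""

    def __init__(self) -> None:
        # record name -> {machine_id -> target}
        self._records: dict[str, dict[str, SrvTarget]] = {}

    @staticmethod
    def record_name(app: str, service: str) -> str:
        # Naming scheme is an assumption, loosely following RFC 2782.
        return f"_{service}._tcp.{app}.internal"

    def apply_event(self, event: dict) -> None:
        # Hypothetical payload; not a real Fly.io event format.
        name = self.record_name(event["app"], event["service"])
        targets = self._records.setdefault(name, {})
        if event["type"] == "machine.started":
            targets[event["machine_id"]] = SrvTarget(event["host"], event["port"])
        elif event["type"] in ("machine.stopped", "machine.destroyed"):
            targets.pop(event["machine_id"], None)
        if not targets:
            # Drop empty records so lookups for dead services return nothing.
            del self._records[name]

    def lookup(self, name: str) -> list[SrvTarget]:
        targets = self._records.get(name, {}).values()
        return sorted(targets, key=lambda t: (t.priority, t.host))
```

A webhook handler (or NATS subscriber) would just call `apply_event` for each incoming message, and the DNS server would call `lookup` when answering an SRV query, forwarding everything else to the internal resolver.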
My use case would be for an app that handles the orchestration of individual machines.
Each machine connects to X number of chat channels via websocket connections. Because the number of channels that can be connected per minute is rate-limited, I have to distribute those connections across many machines so that when a machine restarts (or all machines restart), it doesn’t take more than a few minutes to reconnect all the channels.
Right now, I just assume if the machine goes down that it will eventually start up again and then re-establish those connections it was responsible for.
With the real-time events, I could know if a machine has failed to start up again after a set amount of time, then start up a new machine and redirect the load to that new one.
It may be overkill, though, if the machine orchestration is sturdy enough to always bring up a replacement.
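For what it's worth, the timeout check I have in mind would look roughly like the sketch below. It only tracks which machines have been down for too long; actually starting a replacement and moving the channels over is left out. The event names (`"stopped"`, `"started"`) and the `RestartWatchdog` class are hypothetical, since there's no real event feed to code against yet.

```python
import time
from typing import Callable


class RestartWatchdog:
    """Tracks machines that stopped and have not come back within a timeout."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._down_since: dict[str, float] = {}

    def on_event(self, machine_id: str, event_type: str) -> None:
        # Event type strings are assumptions, not a real platform format.
        if event_type == "stopped":
            # Keep the earliest stop time if duplicate events arrive.
            self._down_since.setdefault(machine_id, self._clock())
        elif event_type == "started":
            self._down_since.pop(machine_id, None)

    def overdue(self) -> list[str]:
        """Machine IDs that have been down for at least timeout_s seconds."""
        now = self._clock()
        return sorted(
            machine_id
            for machine_id, since in self._down_since.items()
            if now - since >= self.timeout_s
        )
```

The orchestrator would call `overdue()` on a timer and, for each ID returned, start a new machine and hand it the channels the missing one was responsible for.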