is there a way to configure log-shipper to restart when it dies?
is there any built-in alerting to detect dead services like this?
does anyone have any tips for diagnosing an outage like this? The logs stored on fly (e.g. flyctl logs) do not go back far enough to cover the incident.
That’s suspicious. I wonder if backblaze had an issue. The machine API incident wouldn’t have affected already running machines.
You can run fly machine status <id> to get a look at the events. You might find non zero exit codes, which would point to Vector crashing (and suggest a backblaze issue).
One thing to check is the restart policy on the log shipper machines. You may want to set those to always.
You can run fly machine status <id> to get a look at the events. You might find non zero exit codes, which would point to Vector crashing (and suggest a backblaze issue).
I don’t have any machines. My log shippers are deployed as apps. Could this be part of the issue?
App
Name = my-app-name
Owner = my-org
Version = 9
Status = running
Hostname = my-app-name.fly.dev
Platform = nomad
Instances
ID PROCESS VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
abcd1234 app 9 xxx run running 0 2023-04-03T01:23:45Z
does anyone have any tips for diagnosing an outage like this? The logs stored on fly (e.g. flyctl logs) do not go back far enough to cover the incident.
If you share the name of your log shipper app, maybe i can check and see if i’ll see anything weird. I can’t promise i’ll find anything, but i can look
Weird. Your app has 0 restarts and it says it’s running but it’s “dead” on the dashboard?
I restarted manually around the time that I started this thread.
If you share the name of your log shipper app, maybe i can check and see if i’ll see anything weird. I can’t promise i’ll find anything, but i can look