Occasionally, my database provider needs a maintenance window or has to restart the Postgres instance. When this happens, my scripts sometimes raise an unexpected disconnect error:
```
2023-12-12T21:43:01.305 app[] dfw [info] return super().execute(*args, **kwargs)
2023-12-12T21:43:01.305 app[] dfw [info] psycopg2.OperationalError: server closed the connection unexpectedly
2023-12-12T21:43:01.305 app[] dfw [info] This probably means the server terminated abnormally
2023-12-12T21:43:01.305 app[] dfw [info] before or while processing the request.
2023-12-12T21:43:01.305 app[] dfw [info] The above exception was the direct cause of the following exception:
2023-12-12T21:43:01.305 app[] dfw [info] Traceback (most recent call last):
```
The script catches this exception and marks the job as failed; the next time the script starts up, I do my best to recover.
The problem is that my script sometimes seems to get "stuck" after such an error and hangs indefinitely.
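One plausible cause of a hang like this is a client blocked on a TCP socket whose server has gone away without closing it cleanly; without keepalives, such a read can wait for a very long time. A minimal sketch, assuming psycopg2 (which forwards extra keyword arguments to libpq as connection parameters), of bounding that wait with libpq's timeout and keepalive settings. The helper name `db_connect_kwargs` is hypothetical and the default values are illustrative, not tuned.

```python
def db_connect_kwargs(
    connect_timeout_s: int = 10,
    keepalive_idle_s: int = 30,
    keepalive_interval_s: int = 10,
    keepalive_count: int = 3,
    statement_timeout_ms: int = 60_000,
) -> dict:
    """Build libpq connection parameters that limit how long a dead
    connection can block. Hypothetical helper; values are illustrative."""
    return {
        # Fail if establishing the connection takes longer than this.
        "connect_timeout": connect_timeout_s,
        # Enable TCP keepalives so a silently vanished server is detected:
        # probe after keepalives_idle seconds of silence, then every
        # keepalives_interval seconds, giving up after keepalives_count misses.
        "keepalives": 1,
        "keepalives_idle": keepalive_idle_s,
        "keepalives_interval": keepalive_interval_s,
        "keepalives_count": keepalive_count,
        # Ask the server to cancel any single statement that runs too long.
        "options": f"-c statement_timeout={statement_timeout_ms}",
    }
```

Usage would look like `psycopg2.connect(DATABASE_URL, **db_connect_kwargs())`, where `DATABASE_URL` stands in for however the script gets its connection string. With these defaults a dropped connection should surface as an `OperationalError` after roughly a minute instead of hanging, so the existing error handling gets a chance to run.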
Is there any way to detect this state in Fly and restart the process?
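One way to turn an indefinite hang into something a supervisor can act on is an in-process watchdog: the main loop calls `beat()` after each unit of progress, and a background thread exits the process if no beat arrives within a timeout. This is a minimal standard-library sketch; the `Watchdog` class and its parameters are hypothetical, and it assumes the Fly Machine is configured to restart when its main process exits with a non-zero status (worth confirming against the app's restart policy).

```python
import os
import threading
import time
from typing import Callable, Optional


class Watchdog:
    """Hypothetical watchdog: calls on_timeout if beat() is not called
    within timeout_s seconds."""

    def __init__(self, timeout_s: float,
                 on_timeout: Optional[Callable[[], None]] = None,
                 check_interval_s: float = 1.0) -> None:
        self.timeout_s = timeout_s
        self.check_interval_s = check_interval_s
        # Default action: exit immediately with status 1. os._exit skips
        # normal cleanup, which is intended here because the main thread
        # may be stuck on a socket and never return.
        self.on_timeout = on_timeout or (lambda: os._exit(1))
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def beat(self) -> None:
        """Record that the main loop made progress."""
        with self._lock:
            self._last_beat = time.monotonic()

    def start(self) -> None:
        self.beat()
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def _run(self) -> None:
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self.check_interval_s):
            with self._lock:
                idle = time.monotonic() - self._last_beat
            if idle > self.timeout_s:
                self.on_timeout()
                return
```

In the script, call `start()` once and `beat()` after each job. A job that legitimately runs longer than `timeout_s` would also trigger an exit, so the timeout should comfortably exceed the slowest expected job.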