@nickanthony machines only get “killed” if you have auto_stop_machines = true. What you can do is set that to false so they don’t auto-stop, and instead have your application exit gracefully (exit code 0), in which case the machine will be shut down waiting for a request to wake it up.
That said - auto_stop_machines does NOT kill machines at 2 minutes or less; it takes more like 5-10 minutes to consider a machine idle and stop it. Can you give more detail as to how yours are getting killed? Maybe there’s something else at play.
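As a rough illustration of the "exit gracefully with code 0" idea, here is a minimal TypeScript sketch that counts in-flight jobs and only exits once nothing is running. The names `InFlightTracker`, `installGracefulShutdown` and `ProcessLike` are hypothetical and not part of any Fly or Node API; `ProcessLike` is a stand-in for the bits of Node's `process` the sketch needs.

```typescript
// Minimal stand-in for the parts of Node's `process` used here (assumption:
// the real `process` object is passed in when this runs in the app).
interface ProcessLike {
  on(signal: string, handler: () => void): unknown;
  exit(code: number): void;
}

// Counts jobs that are currently running (hypothetical helper).
class InFlightTracker {
  private active = 0;
  private onIdle: (() => void) | null = null;

  get count(): number {
    return this.active;
  }

  // Call when a job (e.g. one OpenAI request plus its DB writes) starts.
  begin(): void {
    this.active += 1;
  }

  // Call when that job finishes, whether it succeeded or failed.
  end(): void {
    if (this.active === 0) return; // ignore an unbalanced end()
    this.active -= 1;
    if (this.active === 0 && this.onIdle) {
      const callback = this.onIdle;
      this.onIdle = null;
      callback();
    }
  }

  // Run `callback` now if idle, otherwise as soon as the last job ends.
  drain(callback: () => void): void {
    if (this.active === 0) callback();
    else this.onIdle = callback;
  }
}

// On the first stop signal, wait for running jobs, then exit with code 0.
function installGracefulShutdown(
  tracker: InFlightTracker,
  proc: ProcessLike,
  signals: string[] = ["SIGINT", "SIGTERM"],
): void {
  let shuttingDown = false;
  for (const signal of signals) {
    proc.on(signal, () => {
      if (shuttingDown) return; // a repeated signal changes nothing
      shuttingDown = true;
      tracker.drain(() => proc.exit(0));
    });
  }
}
```

In a Node app this would presumably be wired up as `installGracefulShutdown(tracker, process)`, with `tracker.begin()` and `tracker.end()` wrapped around each job. The drain still has to finish inside whatever SIGINT-to-SIGKILL delay is configured, so it only helps if jobs are shorter than that window.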
These are the logs of the most recent incident, newest entries first. I delayed the time from the SIGINT to the SIGKILL to see if anything popped up. You can see at 08:48:06 we have a log from the app, then at 08:48:20 we have a SIGINT.
At 08:49:31 we do have a failure, but that is over a minute after the SIGINT, so it seems implausible that the SIGINT was a reaction to a failure that had not yet appeared in our logs. Maybe I’m wrong there.
2024-04-12 08:49:31.000 Invalid `prisma.quote.findUnique()` invocation:
2024-04-12 08:49:31.000 prisma:error
2024-04-12 08:49:31.000 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-04-12 08:49:31.000 called `Option::unwrap()` on a `None` value
2024-04-12 08:49:31.000 thread 'tokio-runtime-worker' panicked at query-engine/core/src/query_document/selection.rs:150:51:
2024-04-12 08:49:15.000 id#5addae93-5c70-47ee item count: 15
2024-04-12 08:49:15.000 id#5addae93-5c70-47ee openai timing:: 1:55.557 (m:ss.mmm)
2024-04-12 08:49:15.000 id#5addae93-5c70-47ee tokens: {"prompt_tokens":11239,"completion_tokens":1947,"total_tokens":13186}
2024-04-12 08:48:42.000 id#6e13b095-b341-4c17 openai timing:: 1:09.559 (m:ss.mmm)
2024-04-12 08:48:42.000 id#6e13b095-b341-4c17 tokens: {"prompt_tokens":4376,"completion_tokens":1099,"total_tokens":5475}
2024-04-12 08:48:34.000 id#0a3fce07-1a67-4c00 openai timing:: 1:02.195 (m:ss.mmm)
2024-04-12 08:48:34.000 id#0a3fce07-1a67-4c00 tokens: {"prompt_tokens":4513,"completion_tokens":1097,"total_tokens":5610}
2024-04-12 08:48:26.000 id#468059f2-25bf-4cfd openai timing:: 53.962s
2024-04-12 08:48:26.000 id#468059f2-25bf-4cfd tokens: {"prompt_tokens":3860,"completion_tokens":1015,"total_tokens":4875}
2024-04-12 08:48:20.000 INFO Sending signal SIGINT to main child process w/ PID 306
2024-04-12 08:48:20.000 Downscaling app flybyrd-api from 1 machines to 0 machines, stopping machine 6e82403da15108 (region=ewr, process group=app)
2024-04-12 08:48:06.000 id#6cedc7ab-bbd5-4944 item count: 23
2024-04-12 08:48:06.000 id#6cedc7ab-bbd5-4944 openai timing:: 23.668s
2024-04-12 08:48:06.000 id#6cedc7ab-bbd5-4944 tokens: {"prompt_tokens":4876,"completion_tokens":1926,"total_tokens":6802}
2024-04-12 08:47:50.000 id#dac3ffaf-8a22-4bd9 openai timing:: 18.269s
2024-04-12 08:47:50.000 id#dac3ffaf-8a22-4bd9 tokens: {"prompt_tokens":1545,"completion_tokens":357,"total_tokens":1902}
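To make the timeline easier to read, here is a small TypeScript sketch that works out which requests were still running when the SIGINT arrived. It assumes each "openai timing" line is logged at the moment the request finishes, so start time is roughly the log timestamp minus the printed duration. The helper names and the shortened ids are made up for illustration; the timestamps and durations are copied from the log above.

```typescript
// Parse the two duration formats seen in the log:
// "53.962s" and "1:55.557 (m:ss.mmm)".
function parseDurationMs(text: string): number {
  const minSec = text.match(/^(\d+):(\d+\.\d+)/);
  if (minSec) {
    return Math.round((Number(minSec[1]) * 60 + Number(minSec[2])) * 1000);
  }
  const sec = text.match(/^(\d+(?:\.\d+)?)s$/);
  if (sec) return Math.round(Number(sec[1]) * 1000);
  throw new Error(`unrecognized duration: ${text}`);
}

// Parse "YYYY-MM-DD HH:MM:SS.mmm", treating it as UTC so results do not
// depend on the local time zone (only differences matter here).
function parseLogTimeMs(stamp: string): number {
  const ms = Date.parse(stamp.replace(" ", "T") + "Z");
  if (isNaN(ms)) throw new Error(`unrecognized timestamp: ${stamp}`);
  return ms;
}

interface CompletedRequest {
  id: string;         // first segment of the id# in the log
  finishedAt: string; // timestamp of the "openai timing" line
  duration: string;   // duration printed on that line
}

// Ids of requests that started before the signal and finished after it.
function inFlightAt(signalAt: string, requests: CompletedRequest[]): string[] {
  const signalMs = parseLogTimeMs(signalAt);
  return requests
    .filter((r) => {
      const endMs = parseLogTimeMs(r.finishedAt);
      const startMs = endMs - parseDurationMs(r.duration);
      return startMs < signalMs && signalMs < endMs;
    })
    .map((r) => r.id);
}

const SIGINT_AT = "2024-04-12 08:48:20.000";
const PANIC_AT = "2024-04-12 08:49:31.000";

const loggedRequests: CompletedRequest[] = [
  { id: "5addae93", finishedAt: "2024-04-12 08:49:15.000", duration: "1:55.557 (m:ss.mmm)" },
  { id: "6e13b095", finishedAt: "2024-04-12 08:48:42.000", duration: "1:09.559 (m:ss.mmm)" },
  { id: "0a3fce07", finishedAt: "2024-04-12 08:48:34.000", duration: "1:02.195 (m:ss.mmm)" },
  { id: "468059f2", finishedAt: "2024-04-12 08:48:26.000", duration: "53.962s" },
  { id: "6cedc7ab", finishedAt: "2024-04-12 08:48:06.000", duration: "23.668s" },
  { id: "dac3ffaf", finishedAt: "2024-04-12 08:47:50.000", duration: "18.269s" },
];

// Requests still running when the SIGINT was sent.
console.log(inFlightAt(SIGINT_AT, loggedRequests));
// → [ '5addae93', '6e13b095', '0a3fce07', '468059f2' ]

// Seconds between the SIGINT and the Prisma panic.
console.log((parseLogTimeMs(PANIC_AT) - parseLogTimeMs(SIGINT_AT)) / 1000);
// → 71
```

Under that assumption, four OpenAI requests were mid-flight when the machine was told to stop, and the Prisma panic came 71 seconds after the SIGINT rather than before it.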
Does auto_stop_machines imply some sort of maximum run time before stop is called? Or is there a flag that stops the machine if there is an error?
We’ve added API streaming from OpenAI and chunked our database calls so that the app does something at least every ~10 seconds. We’re still finding that auto-stop kills the app while it’s running.
We’re switching auto_stop_machines to false for now, but we don’t know which autostop rules we’re violating.