How to avoid machines getting killed during long OpenAI calls

We have an Express app that uses a BullMQ job queue to make calls to OpenAI.

We have to use GPT-4 because our input contexts are more than 20k tokens.

Each call can take up to 2 minutes, which means that our machine gets killed while waiting for OpenAI’s API to respond.

AFAICT, OpenAI does not offer streaming responses for their completion API.

Any recommendations on how to fix this? Is this possible with

@nickanthony machines only get “killed” if you have auto_stop_machines = true. What you can do is set that to false so they don’t auto-stop, and instead have your application exit gracefully (exit code 0), in which case the machine will shut down and wait for a request to wake it up.

That said, auto_stop_machines does NOT kill machines at 2 minutes or less; it takes more like 5-10 minutes to consider a machine idle and stop it. Can you give more detail as to how yours are getting killed? Maybe there’s something else at play.
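For reference, this is roughly what the setting looks like in fly.toml (assuming a standard `[http_service]` section; the rest of the config is just a placeholder sketch):

```toml
# fly.toml (fragment) — keep machines running until the app exits on its own
[http_service]
  internal_port = 8080          # hypothetical port for this example
  auto_stop_machines = false    # proxy will not stop idle machines
  auto_start_machines = true    # proxy can still wake a stopped machine
  min_machines_running = 0
```

With auto-stop disabled, a machine only stops when the app exits (exit code 0 for a clean stop), so long-running OpenAI calls won’t be interrupted by the proxy.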


Update: OpenAI does in fact offer streaming via the nodejs library! How to stream completions | OpenAI Cookbook
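To illustrate the streaming approach: with the openai Node SDK you can pass `stream: true` to `chat.completions.create(...)` and iterate the chunks as they arrive. Below is a hedged, self-contained sketch of the chunk-accumulation side; the actual SDK call is shown only in comments, and the `ChatChunk` type is a simplified stand-in for the SDK’s chunk shape.

```typescript
// Sketch: consuming a streamed chat completion chunk by chunk.
// The real call (openai v4 SDK) would look roughly like:
//   const stream = await client.chat.completions.create({
//     model: "gpt-4",
//     messages,
//     stream: true,
//   });
// Each chunk carries an incremental delta; accumulating the deltas
// yields the full completion while activity keeps flowing the whole time.

type ChatChunk = { choices: { delta: { content?: string } }[] };

async function collectStream(
  stream: AsyncIterable<ChatChunk>,
  onPiece?: (piece: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const piece = chunk.choices[0]?.delta?.content ?? "";
    full += piece;
    if (piece && onPiece) onPiece(piece); // e.g. log progress, touch a heartbeat
  }
  return full;
}
```

The `onPiece` callback is where you could log progress or update job state so there’s visible activity during a long generation, instead of one silent 2-minute request.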

I’m digging up some logs that show what we’re seeing. I’ll update in the next reply.

These are the logs from the most recent incident. I delayed the time from the SIGINT to the SIGKILL to see if anything popped up. You can see at 08:48:06 we have a log from the app, then at 08:48:20 we have a SIGINT.

At 08:49:31, we do have a failure, although it seems implausible that the SIGINT would anticipate a failure that only appears in our logs over a minute later. Maybe I’m wrong there.

2024-04-12 08:49:31.000	Invalid `prisma.quote.findUnique()` invocation:
2024-04-12 08:49:31.000	prisma:error
2024-04-12 08:49:31.000	note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-04-12 08:49:31.000	called `Option::unwrap()` on a `None` value
2024-04-12 08:49:31.000	thread 'tokio-runtime-worker' panicked at query-engine/core/src/query_document/
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee item count: 15
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee openai timing:: 1:55.557 (m:ss.mmm)
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee tokens: {"prompt_tokens":11239,"completion_tokens":1947,"total_tokens":13186}
2024-04-12 08:48:42.000	id#6e13b095-b341-4c17 openai timing:: 1:09.559 (m:ss.mmm)
2024-04-12 08:48:42.000	id#6e13b095-b341-4c17 tokens: {"prompt_tokens":4376,"completion_tokens":1099,"total_tokens":5475}
2024-04-12 08:48:34.000	id#0a3fce07-1a67-4c00 openai timing:: 1:02.195 (m:ss.mmm)
2024-04-12 08:48:34.000	id#0a3fce07-1a67-4c00 tokens: {"prompt_tokens":4513,"completion_tokens":1097,"total_tokens":5610}
2024-04-12 08:48:26.000	id#468059f2-25bf-4cfd openai timing:: 53.962s
2024-04-12 08:48:26.000	id#468059f2-25bf-4cfd tokens: {"prompt_tokens":3860,"completion_tokens":1015,"total_tokens":4875}
2024-04-12 08:48:20.000	INFO Sending signal SIGINT to main child process w/ PID 306
2024-04-12 08:48:20.000	Downscaling app flybyrd-api from 1 machines to 0 machines, stopping machine 6e82403da15108 (region=ewr, process group=app)
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 item count: 23
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 openai timing:: 23.668s
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 tokens: {"prompt_tokens":4876,"completion_tokens":1926,"total_tokens":6802}
2024-04-12 08:47:50.000	id#dac3ffaf-8a22-4bd9 openai timing:: 18.269s
2024-04-12 08:47:50.000	id#dac3ffaf-8a22-4bd9 tokens: {"prompt_tokens":1545,"completion_tokens":357,"total_tokens":1902}

Does auto_stop_machines imply some sort of maximum run time before stop is called? Or is there a flag that stops the machine if there is an error?

We’ve added API streaming from OpenAI and chunked our database calls so that something runs at most every ~10 seconds. We’re still finding that auto-stop kills the app while it’s running.

We’re switching auto_stop_machines to false for now, but we don’t know which auto-stop rules we’re violating.

These docs might be helpful. They explain when the proxy decides to stop machines when auto_stop_machines is enabled.

If that doesn’t reveal anything new, you might consider handling shutdown of machines more manually. There’s some information on that lower down.
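One manual-shutdown pattern is to drain the worker on SIGINT and then exit 0 yourself, so the stop is always clean. A hedged sketch below: `Closable` is a stand-in for a BullMQ `Worker`, whose real `close()` also waits for in-flight jobs to finish before resolving.

```typescript
// Sketch: drain a job worker on SIGINT, then exit 0 so the stop is clean.
// `Closable` is a minimal stand-in for a BullMQ Worker here.
interface Closable {
  close(): Promise<void>;
}

async function shutdown(worker: Closable, exit: (code: number) => void) {
  await worker.close(); // stop accepting new jobs, finish in-flight ones
  exit(0);              // clean exit: the machine stays stopped until woken
}

// Wire-up (assumes a real BullMQ `worker` in scope):
// process.on("SIGINT", () => shutdown(worker, (c) => process.exit(c)));
```

Injecting the `exit` function keeps the drain logic testable; in production it’s just `process.exit`.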


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.