How to avoid machines getting killed during long OpenAI calls

We have an Express app that uses a BullMQ job queue to make calls to OpenAI.

We have to use GPT-4 because our input contexts are more than 20k tokens.

Each call can take up to 2 minutes, which means that our machine gets killed while waiting for OpenAI’s API to respond.

AFAICT, OpenAI does not offer streaming responses for their completion API.

Any recommendations on how to fix this? Is this possible with

@nickanthony machines only get “killed” if you have auto_stop_machines = true. What you can do is set that to false so they don’t auto-stop, and instead have your application exit gracefully (exit code 0), in which case the machine will shut down and wait for a request to wake it up.

That said, auto_stop_machines does NOT kill machines at 2 minutes or less; it takes more like 5-10 minutes to consider a machine idle and stop it. Can you give more detail as to how yours are getting killed? Maybe there’s something else at play.
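For reference, this is roughly what the setting looks like in fly.toml (assuming a standard `[http_service]` section; the rest of the config is just a placeholder sketch):

```toml
# fly.toml (fragment) — keep machines running until the app exits on its own
[http_service]
  internal_port = 8080          # hypothetical port for this example
  auto_stop_machines = false    # proxy will not stop idle machines
  auto_start_machines = true    # proxy can still wake a stopped machine
  min_machines_running = 0
```

With auto-stop disabled, a machine only stops when the app exits (exit code 0 for a clean stop), so long-running OpenAI calls won’t be interrupted by the proxy.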


Update: OpenAI does in fact offer streaming via the nodejs library! How to stream completions | OpenAI Cookbook
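To illustrate the streaming approach: with the openai Node SDK you can pass `stream: true` to `chat.completions.create(...)` and iterate the chunks as they arrive. Below is a hedged, self-contained sketch of the chunk-accumulation side; the actual SDK call is shown only in comments, and the `ChatChunk` type is a simplified stand-in for the SDK’s chunk shape.

```typescript
// Sketch: consuming a streamed chat completion chunk by chunk.
// The real call (openai v4 SDK) would look roughly like:
//   const stream = await client.chat.completions.create({
//     model: "gpt-4",
//     messages,
//     stream: true,
//   });
// Each chunk carries an incremental delta; accumulating the deltas
// yields the full completion while activity keeps flowing the whole time.

type ChatChunk = { choices: { delta: { content?: string } }[] };

async function collectStream(
  stream: AsyncIterable<ChatChunk>,
  onPiece?: (piece: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    const piece = chunk.choices[0]?.delta?.content ?? "";
    full += piece;
    if (piece && onPiece) onPiece(piece); // e.g. log progress, touch a heartbeat
  }
  return full;
}
```

The `onPiece` callback is where you could log progress or update job state so there’s visible activity during a long generation, instead of one silent 2-minute request.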

I’m digging up some logs that show what we’re seeing. I’ll update in the next reply.

These are the logs from the most recent incident. I delayed the time from the SIGINT to the SIGKILL to see if anything popped up. You can see at 08:48:06 we have a log from the app, then at 08:48:20 we have a SIGINT.

At 08:49:31, we do have a failure, although it seems implausible that the SIGINT would anticipate a failure that only appears in our logs over a minute later. Maybe I’m wrong there.

2024-04-12 08:49:31.000	Invalid `prisma.quote.findUnique()` invocation:
2024-04-12 08:49:31.000	prisma:error
2024-04-12 08:49:31.000	note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
2024-04-12 08:49:31.000	called `Option::unwrap()` on a `None` value
2024-04-12 08:49:31.000	thread 'tokio-runtime-worker' panicked at query-engine/core/src/query_document/
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee item count: 15
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee openai timing:: 1:55.557 (m:ss.mmm)
2024-04-12 08:49:15.000	id#5addae93-5c70-47ee tokens: {"prompt_tokens":11239,"completion_tokens":1947,"total_tokens":13186}
2024-04-12 08:48:42.000	id#6e13b095-b341-4c17 openai timing:: 1:09.559 (m:ss.mmm)
2024-04-12 08:48:42.000	id#6e13b095-b341-4c17 tokens: {"prompt_tokens":4376,"completion_tokens":1099,"total_tokens":5475}
2024-04-12 08:48:34.000	id#0a3fce07-1a67-4c00 openai timing:: 1:02.195 (m:ss.mmm)
2024-04-12 08:48:34.000	id#0a3fce07-1a67-4c00 tokens: {"prompt_tokens":4513,"completion_tokens":1097,"total_tokens":5610}
2024-04-12 08:48:26.000	id#468059f2-25bf-4cfd openai timing:: 53.962s
2024-04-12 08:48:26.000	id#468059f2-25bf-4cfd tokens: {"prompt_tokens":3860,"completion_tokens":1015,"total_tokens":4875}
2024-04-12 08:48:20.000	INFO Sending signal SIGINT to main child process w/ PID 306
2024-04-12 08:48:20.000	Downscaling app flybyrd-api from 1 machines to 0 machines, stopping machine 6e82403da15108 (region=ewr, process group=app)
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 item count: 23
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 openai timing:: 23.668s
2024-04-12 08:48:06.000	id#6cedc7ab-bbd5-4944 tokens: {"prompt_tokens":4876,"completion_tokens":1926,"total_tokens":6802}
2024-04-12 08:47:50.000	id#dac3ffaf-8a22-4bd9 openai timing:: 18.269s
2024-04-12 08:47:50.000	id#dac3ffaf-8a22-4bd9 tokens: {"prompt_tokens":1545,"completion_tokens":357,"total_tokens":1902}

Does auto_stop_machines imply some sort of maximum run time before stop is called? Or is there a flag that stops the machine if there is an error?

We’ve added API streaming from OpenAI and chunked our database calls so that something runs at most every ~10 seconds. We’re still finding that auto-stop kills the app while it’s running.

We’re switching auto_stop_machines to false for now, but we don’t know which auto-stop rules we’re violating.

These docs might be helpful. They explain when the proxy decides to stop machines when auto_stop_machines is enabled.

If that doesn’t reveal anything new, you might consider handling shutdown of machines more manually. There’s some information on that lower down.
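One manual-shutdown pattern is to drain the worker on SIGINT and then exit 0 yourself, so the stop is always clean. A hedged sketch below: `Closable` is a stand-in for a BullMQ `Worker`, whose real `close()` also waits for in-flight jobs to finish before resolving.

```typescript
// Sketch: drain a job worker on SIGINT, then exit 0 so the stop is clean.
// `Closable` is a minimal stand-in for a BullMQ Worker here.
interface Closable {
  close(): Promise<void>;
}

async function shutdown(worker: Closable, exit: (code: number) => void) {
  await worker.close(); // stop accepting new jobs, finish in-flight ones
  exit(0);              // clean exit: the machine stays stopped until woken
}

// Wire-up (assumes a real BullMQ `worker` in scope):
// process.on("SIGINT", () => shutdown(worker, (c) => process.exit(c)));
```

Injecting the `exit` function keeps the drain logic testable; in production it’s just `process.exit`.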


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.