I’m working on a custom inference server and I’d like the server to be ready to serve requests as quickly as possible. I’m seeing behavior I don’t see on local GPUs during testing: the first request that hits the GPU takes noticeably longer to complete than subsequent ones.
Does something happen at the hypervisor level the first time the GPU is touched? If so, is there a way to avoid it, or to warm/cache it ahead of time?
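To make the symptom concrete, here is a minimal timing sketch of what I mean (this assumes a PyTorch-based stack purely for illustration; my actual server may differ, and `time_first_vs_warm` is just a name I made up for this repro):

```python
import time

import torch


def time_first_vs_warm(device: str = "cuda") -> None:
    """Compare the first GPU operation, which pays one-time costs
    (CUDA context creation, driver/module loading, allocator warm-up),
    against a subsequent warm call."""
    if not torch.cuda.is_available():
        print("no GPU available; skipping")
        return

    x = torch.randn(1024, 1024)

    # First touch of the GPU: includes context init and kernel loading.
    t0 = time.perf_counter()
    y = x.to(device) @ x.to(device)
    torch.cuda.synchronize()
    first = time.perf_counter() - t0

    # Warm call: same work, one-time costs already paid.
    t0 = time.perf_counter()
    y = y @ y
    torch.cuda.synchronize()
    warm = time.perf_counter() - t0

    print(f"first call: {first:.3f}s, warm call: {warm:.4f}s")


if __name__ == "__main__":
    time_first_vs_warm()
```

On my local GPUs the gap between the first and warm call is small, but on the virtualized machines it is much larger, which is why I suspect something at the hypervisor level.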