GPU L40S machines crash mid-validation with "Possible seccomp violation", yolo11m specific

ilia1 · March 26, 2026, 4:04pm

Hi,

We’re running PyTorch training jobs on L40S GPU machines (ord region) and hitting a consistent crash during the validation step of epoch 1:

==== Possible seccomp violation ====

Virtual machine exited abruptly

What we’ve ruled out:

Not memory: happens at both 24GB and 32GB RAM
Not batch size: tried 8, 16, and 32
Not dataset: happens on two different datasets

What we’ve observed:

Model-specific: only yolo11m crashes. yolo11n, yolo11s, yolo11l, and yolo11x all run fine on the same L40S machines with identical config
The crash always happens mid-validation (around batch 79/157), never during training
Moving to A100 fixes it, which points to something L40S-specific
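
Since the crash only shows up during validation, one way to narrow it down might be to run validation on its own for each model variant, with no training step in front of it. The following is a rough sketch rather than a tested reproduction. It assumes the Ultralytics `YOLO(...).val(...)` API, and the dataset YAML (`coco8.yaml`), device index, and log path (`val_probe.jsonl`) are placeholders. Because the whole VM exits, each progress record is flushed and fsynced before the validation call, so the last line in the log shows which run was in flight.

```python
import json
import os
import time

# Variants from the report; only yolo11m is said to crash.
VARIANTS = ["yolo11n.pt", "yolo11s.pt", "yolo11m.pt", "yolo11l.pt", "yolo11x.pt"]


def log_line(path, record):
    """Append one JSON record and force it to disk before returning."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def validate_only(weights, data, batch, device, log_path):
    """Run a standalone validation pass, logging before and after it."""
    # Imported lazily so the helpers above work without Ultralytics installed.
    from ultralytics import YOLO

    log_line(log_path, {"event": "start", "weights": weights, "batch": batch, "t": time.time()})
    metrics = YOLO(weights).val(data=data, batch=batch, device=device)
    log_line(log_path, {"event": "done", "weights": weights, "batch": batch, "t": time.time()})
    return metrics


if __name__ == "__main__":
    # Placeholder values: swap in the dataset YAML and device that crashed.
    for weights in VARIANTS:
        validate_only(weights, data="coco8.yaml", batch=16, device=0, log_path="val_probe.jsonl")
```

If a "start" record for yolo11m.pt has no matching "done" record after the machine comes back, that would suggest the validation pass alone is enough to trigger the crash. The log file would need to live on storage that survives the VM exit.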

Setup: PyTorch 2.5.1, CUDA 12.4, Ultralytics YOLO, region ord
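
For anyone comparing setups, here is a small sketch of how the environment details above could be captured on the L40S and A100 machines and diffed. The function name `collect_env` and the output keys are made up for illustration; the `torch` calls are standard PyTorch ones, and the script falls back gracefully if PyTorch is not installed.

```python
import json
import platform


def collect_env():
    """Gather the version and GPU details that matter for a bug report."""
    info = {"python": platform.python_version(), "platform": platform.platform()}
    try:
        import torch
    except ImportError:
        # No PyTorch in this environment; report that explicitly.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built against
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["capability"] = list(torch.cuda.get_device_capability(0))
        info["cudnn"] = torch.backends.cudnn.version()
    return info


if __name__ == "__main__":
    print(json.dumps(collect_env(), indent=2))
```

PyTorch also ships `python -m torch.utils.collect_env`, which prints a fuller report in the format its issue tracker asks for.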

Has anyone seen this or found a workaround?

halfer · March 26, 2026, 5:11pm

I don’t have GPU experience, but from your description, I’d say this is not Fly-specific, and could/should be reported to PyTorch or one of the downstream libraries.