I’ve got a project where I’d like to process some large files that are stored on Tigris (large = 1–10 GB).
My process (sketched below) is:

1. On an HTTP trigger, copy the file from Tigris to the Fly Machine (currently using the Node AWS S3 SDK v3).
2. Do some processing to make derivative files (primarily at the command line with tiffsplit and vips).
3. Send all the derivative files back to Tigris.
4. Delete everything locally.
Or, at least, that’s the plan.
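For concreteness, here’s a minimal sketch of that loop with the v3 SDK. The bucket name, key layout, and the tiffsplit step are placeholders for what the real job does (I’ve left vips out for brevity); the endpoint is Tigris’s standard S3 endpoint.

```ts
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { Upload } from "@aws-sdk/lib-storage";
import { execFile } from "node:child_process";
import { createReadStream, createWriteStream } from "node:fs";
import { mkdtemp, readdir, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { pipeline } from "node:stream/promises";
import { promisify } from "node:util";

const run = promisify(execFile);

// Tigris speaks the S3 API; only the endpoint and region differ from AWS.
const s3 = new S3Client({
  region: "auto",
  endpoint: "https://fly.storage.tigris.dev",
});
const BUCKET = "my-bucket"; // placeholder

export async function processObject(key: string) {
  const work = await mkdtemp(join(tmpdir(), "derivatives-"));
  try {
    // 1. Copy the source object from Tigris to local disk.
    const src = join(work, "source.tif");
    const obj = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: key }));
    await pipeline(obj.Body as NodeJS.ReadableStream, createWriteStream(src));

    // 2. Shell out to the CLI tooling (tiffsplit shown; vips works the same way).
    await run("tiffsplit", [src, join(work, "page_")]);

    // 3. Send each derivative back up; Upload handles multipart for large bodies.
    for (const name of await readdir(work)) {
      if (name === "source.tif") continue;
      await new Upload({
        client: s3,
        params: {
          Bucket: BUCKET,
          Key: `derivatives/${key}/${name}`, // placeholder layout
          Body: createReadStream(join(work, name)),
        },
      }).done();
    }
  } finally {
    // 4. Delete everything locally, success or failure.
    await rm(work, { recursive: true, force: true });
  }
}
```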
However, download performance from Tigris feels painfully slow, and that’s before I’ve even got to working on the files. I’ve tried various combinations of CPU count, RAM, and shared/performance Machines, and… it just feels like a network constraint.
Is there something I’m missing?
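One thing I still want to rule out is whether a single GET stream is the ceiling: one GetObject is one HTTP connection, and a lone TCP stream often tops out well below what the NIC can do. The usual trick with any S3-compatible store is to issue ranged GETs in parallel and stitch the file back together locally. Something like this sketch (the chunk size and concurrency are guesses to tune, and each in-flight chunk is buffered, so chunkSize × concurrency also bounds RAM use):

```ts
import { S3Client, GetObjectCommand, HeadObjectCommand } from "@aws-sdk/client-s3";
import { open } from "node:fs/promises";

// Download one object as several concurrent byte-range GETs instead of
// a single stream.
export async function rangedDownload(
  s3: S3Client,
  bucket: string,
  key: string,
  dest: string,
  chunkSize = 64 * 1024 * 1024, // 64 MiB per part (assumption)
  concurrency = 8,              // parallel connections (assumption)
) {
  const head = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
  const size = head.ContentLength!;
  const fh = await open(dest, "w");
  try {
    // Queue of chunk start offsets; workers pull from it until empty.
    const offsets: number[] = [];
    for (let o = 0; o < size; o += chunkSize) offsets.push(o);

    const worker = async () => {
      let offset: number | undefined;
      while ((offset = offsets.shift()) !== undefined) {
        const end = Math.min(offset + chunkSize, size) - 1;
        const part = await s3.send(new GetObjectCommand({
          Bucket: bucket,
          Key: key,
          Range: `bytes=${offset}-${end}`,
        }));
        const body = await part.Body!.transformToByteArray();
        // Write the chunk at its own offset so parts can land out of order.
        await fh.write(body, 0, body.byteLength, offset);
      }
    };
    await Promise.all(Array.from({ length: concurrency }, worker));
  } finally {
    await fh.close();
  }
}
```

If that runs dramatically faster than a single GetObject stream, the constraint is per-connection throughput rather than the Machine itself.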
The best performance I’ve had so far is from mounting Tigris via geesefs and just using cp, but I’m trying to work out whether there’s something else I should be doing here.
It looks like the issue was related to Tigris configuration following issues in the LHR region: I was inadvertently moving data further than planned. A quick prod suggests data is coming down a lot faster now, to the extent that the task I’m working on feels viable again.
Just to add to what you said: we had temporarily deactivated the LHR region on our side, as Fly.io had some capacity issues there. Once we verified that the capacity issues were resolved, we reactivated the LHR region on our end.