I need to run scripts that make multiple ML/AI calls, most of them to third-party services. Is it possible to use Fly to choose the execution location so as to minimize the latency between the script and the ML/AI services, rather than between the user and the machine?
Yes, of course you can place your Fly app in a region close to the servers you're talking to.
Most of the time, though, it's better to place your app close to your users.
Fly Machines run in datacenters with very good internet connections, and these AI services live in well-connected datacenters too.
The AI use case means these requests take a long time by networking standards, on the order of seconds, so placing your app closer to your users will improve usability more than shaving 10-100 ms off API calls that already travel between well-connected servers.
What is the typical latency of your ML/AI request? Does it make sense to optimize regional placement to save 100 ms when the expected latency of an LLM API call is measured in seconds?
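If you want to check this for your own workload, here's a minimal sketch for timing a call from inside a Fly Machine. It assumes the `requests` package is installed and uses a hypothetical provider URL; `FLY_REGION` is an environment variable Fly sets on Machines, so you can compare numbers across regions.

```python
import os
import time

import requests  # assumes the requests package is installed

# Hypothetical endpoint; substitute the ML/AI API you actually call.
PROVIDER_URL = "https://api.example-llm-provider.com/v1/health"


def time_requests(url: str, n: int = 5) -> None:
    """Time a few requests so network latency can be compared to total latency."""
    region = os.environ.get("FLY_REGION", "unknown")  # set automatically on Fly Machines
    for i in range(n):
        start = time.perf_counter()
        resp = requests.get(url, timeout=30)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"[{region}] request {i + 1}: {resp.status_code} in {elapsed_ms:.0f} ms")


if __name__ == "__main__":
    time_requests(PROVIDER_URL)
```

If the calls themselves run for seconds, a 10-100 ms regional difference disappears into the noise; if they're short and frequent, co-locating with the provider starts to pay off.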
This may be of use: Dynamic Request Routing with fly-replay · Fly Docs
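As a rough sketch of the fly-replay idea (Flask and the `/inference` route are my own choices here, and the target region is an assumption), an app can ask Fly's proxy to replay just the ML-bound requests in a region near the provider while everything else is served close to the user:

```python
import os

from flask import Flask, Response

app = Flask(__name__)

# Assumed region that is close to the third-party ML/AI provider.
ML_REGION = "iad"


@app.route("/inference", methods=["POST"])
def inference():
    # If this Machine is not in the ML-adjacent region, return the
    # fly-replay header so Fly's proxy replays the request on a Machine there.
    if os.environ.get("FLY_REGION") != ML_REGION:
        return Response(status=204, headers={"fly-replay": f"region={ML_REGION}"})
    # Otherwise handle the request here, close to the provider.
    return {"status": f"handled in {ML_REGION}"}
```

See the fly-replay docs linked above for the exact header semantics and limits; this is just one way to split user-facing routes from provider-facing ones.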