Deploying fasttrackr.ai-like AI Assistants on Fly.io - Any Gotchas?

I’m experimenting with deploying a lightweight AI assistant platform (similar to Fasttrackr.AI, an AI stack for RIAs and financial advisors) on Fly.io to reduce latency for a global user base.

The setup involves FastAPI + queue workers, and some endpoints call external LLM APIs.
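For context, here’s a stripped-down sketch of the kind of endpoint I mean. The route, env var names, and the LLM request shape are placeholders, not the real app:

```python
import os

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Placeholder config: the real app reads these from Fly secrets.
LLM_API_URL = os.environ.get("LLM_API_URL", "https://example.com/v1/complete")
LLM_API_KEY = os.environ.get("LLM_API_KEY", "")


class AssistRequest(BaseModel):
    prompt: str


@app.post("/assist")
async def assist(req: AssistRequest):
    # Latency is dominated by the upstream LLM call rather than local
    # compute, which is why cold starts on rarely-hit routes hurt so much.
    async with httpx.AsyncClient(timeout=30.0) as client:
        resp = await client.post(
            LLM_API_URL,
            headers={"Authorization": f"Bearer {LLM_API_KEY}"},
            json={"prompt": req.prompt},
        )
    resp.raise_for_status()
    return resp.json()
```

A few questions: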

  1. Has anyone had success minimizing cold starts for sporadically-used AI endpoints?
  2. Are there patterns for region-specific routing when users log in from multiple countries? (I’ve sketched the fly-replay pattern I’m considering just below the list.)
  3. For usage that fluctuates during the workday (like a productivity tool), is there a cost-efficient autoscaling strategy?
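On question 2, what I’ve been eyeing is Fly’s fly-replay response header to pin a logged-in user to a "home" region. Rough sketch of what I mean below; the home-region lookup is a made-up placeholder, so treat this as a starting point rather than something I know works well:

```python
import os

from fastapi import FastAPI, Response

app = FastAPI()

# Fly sets FLY_REGION on every Machine; "local" is just a dev fallback.
CURRENT_REGION = os.environ.get("FLY_REGION", "local")


def home_region_for(user_id: str) -> str:
    # Placeholder: in the real app this would come from the user's profile,
    # e.g. a region stored at signup based on their country.
    return "fra"


@app.get("/api/session/{user_id}")
async def session(user_id: str):
    target = home_region_for(user_id)
    if target != CURRENT_REGION:
        # Ask Fly's proxy to replay the original request on a Machine
        # in the user's home region instead of serving it here.
        return Response(headers={"fly-replay": f"region={target}"})
    return {"served_from": CURRENT_REGION, "user": user_id}
```

No idea whether that holds up under real traffic, hence the question.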

Appreciate any real-world advice - especially if you’ve worked on a Fasttrackr.AI-style app or anything AI-heavy on Fly.io.