I recently deployed a Rust Axum web service to fly.io, and while testing it (alone, so definitely not under heavy load) I noticed that response times intermittently become extremely long (p99 > 15s). Some more context on the service:
- Memory and CPU utilization look fine when the response time spikes, so the bottleneck is probably I/O.
- The service connects to an external Postgres database, but I have verified that the database connection and queries are not causing the spikes.
- My main suspect is the internal HTTP requests that some API handlers make to the service itself (bad design, I know, but it is what it is). A single API call can make multiple internal requests to `http://localhost:9000/xxx`.
- This is the configuration I deployed with, in `fly.toml`:
  ```toml
  [http_service]
  internal_port = 9000
  force_https = true
  auto_stop_machines = true
  auto_start_machines = true
  min_machines_running = 1
  processes = ["app"]

  [http_service.concurrency]
  type = "requests"
  hard_limit = 1000
  soft_limit = 800

  [[http_service.checks]]
  interval = "1m0s"
  timeout = "5s"
  grace_period = "10s"
  method = "GET"
  path = "/health"
  ```
- I am quite clueless when it comes to networking. Could the issue be that the service calls `localhost` instead of `127.0.0.1`/`0.0.0.0`, or that it makes plain HTTP calls while `force_https` is set to `true`?
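One way to check the `localhost` question directly is to resolve `localhost:9000` inside the machine and time a raw TCP connect to each address it returns, alongside `127.0.0.1:9000`. This is only a rough diagnostic sketch using the Rust standard library; the `probe` function, the `Probe` struct and the 5-second timeout are illustrative choices of mine, not part of axum or Fly, and it assumes the server listens on port 9000 as in `fly.toml`.

```rust
use std::io;
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::{Duration, Instant};

/// Outcome of one TCP connect attempt to a single resolved address.
#[derive(Debug)]
struct Probe {
    addr: SocketAddr,
    elapsed: Duration,
    result: Result<(), String>,
}

/// Resolve `target` (e.g. "localhost:9000") with the system resolver, then try a
/// plain TCP connect to every candidate address in the order they were returned,
/// recording how long each attempt took.
fn probe(target: &str, timeout: Duration) -> io::Result<Vec<Probe>> {
    let mut probes = Vec::new();
    for addr in target.to_socket_addrs()? {
        let start = Instant::now();
        let result = TcpStream::connect_timeout(&addr, timeout)
            // Drop the stream right away; only the TCP handshake is being timed.
            .map(|_stream| ())
            .map_err(|e| e.to_string());
        probes.push(Probe { addr, elapsed: start.elapsed(), result });
    }
    Ok(probes)
}

fn main() -> io::Result<()> {
    // Port 9000 is the internal_port from fly.toml; run this inside the machine.
    for target in ["localhost:9000", "127.0.0.1:9000"] {
        for p in probe(target, Duration::from_secs(5))? {
            println!("{target}: {} -> {:?} in {:?}", p.addr, p.result, p.elapsed);
        }
    }
    Ok(())
}
```

If `localhost` resolves to `::1` as well as `127.0.0.1`, and the `::1` attempt fails or stalls while `127.0.0.1` connects right away, pointing the internal calls at `http://127.0.0.1:9000` might be a cheap thing to try. Note that this only measures TCP connection setup, not the HTTP request or the handler itself.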
Thank you for your time.