Does Prometheus keep scraping after a kill signal? How to avoid losing custom metrics during shutdown?

Hi all,

I’m running a service on Fly.io that exposes custom Prometheus metrics at :9091/metrics. I have a question about how Fly handles shutdown and how that affects Prometheus scraping:

When Fly sends a kill signal (SIGINT) to a Machine during a deploy or scale-down event, does Prometheus still attempt to scrape metrics after that point? Or is the instance removed from routing immediately, making it unreachable?

If it’s the latter (no more scrapes after the kill signal), what’s the recommended way to avoid losing custom metrics that were recorded since the last scrape? For example, should I delay the shutdown with a setTimeout to give Prometheus time to perform a final scrape? Is there a best practice for this on Fly?
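To make the setTimeout idea concrete, here is a rough sketch of what I have in mind for a Node.js process. The numbers are assumptions on my side (a 15-second scrape interval and a kill_timeout of 30 seconds in fly.toml), and the helper names shutdownDelayMs and delayShutdownOn are made up for illustration. It would only help if the Machine is still reachable by the scraper during the delay, which is exactly the part I’m unsure about.

```javascript
// Assumed values, not confirmed Fly.io behavior: adjust to your own setup.
const SCRAPE_INTERVAL_MS = 15_000; // assumed Prometheus scrape interval
const KILL_TIMEOUT_MS = 30_000;    // assumed kill_timeout from fly.toml
const SAFETY_MARGIN_MS = 2_000;    // time reserved for cleanup before the hard kill

// Wait one full scrape interval if it fits, but never longer than the
// kill timeout minus the safety margin, and never a negative amount.
function shutdownDelayMs(scrapeIntervalMs, killTimeoutMs, marginMs) {
  const budget = Math.max(0, killTimeoutMs - marginMs);
  return Math.min(scrapeIntervalMs, budget);
}

// Hypothetical helper: on the first occurrence of `signal`, keep the process
// (and therefore the /metrics endpoint) alive for `delayMs`, then run
// `cleanup` and exit.
function delayShutdownOn(signal, delayMs, cleanup) {
  process.once(signal, () => {
    setTimeout(async () => {
      try {
        await cleanup();
      } finally {
        process.exit(0);
      }
    }, delayMs);
  });
}

// Example wiring (commented out so the snippet has no side effects):
// const delay = shutdownDelayMs(SCRAPE_INTERVAL_MS, KILL_TIMEOUT_MS, SAFETY_MARGIN_MS);
// delayShutdownOn("SIGINT", delay, () => new Promise((resolve) => server.close(resolve)));
```

The delay is capped because waiting longer than kill_timeout would just get the process killed before cleanup runs. With the assumed numbers above, the process would wait the full 15 seconds; with a 5-second kill_timeout it would only wait 3.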

Thanks in advance!

Hi… Last I heard, logs and metrics were in a kind of limbo state, and I would avoid relying on any specific behavior there until that fog has fully cleared up.

(For logs, there’s a branch in the flyctl repository which seems to revolve around them being stored on S3 or Tigris—which would be a neat compromise! Not all branches get merged into the final product, though.)