Nothing in our infrastructure serves 401s, so if people were getting 401s it was almost definitely from the app process.
Concurrency can go to zero if requests get really fast. 401s seem like they might happen very fast? If that’s true, you might’ve seen a change in response times during that interval as well.
Deploying will often put VMs on new hosts with new IPs. If you’re relying on an upstream API or service, it’s possible it rate limited your existing VMs and the deploy just worked around the rate limit.
Concurrency is exactly the number of requests that are in progress when we scrape the metric (every 15s). When requests are coming back very fast, that gauge usually hangs out at 0. It’s a little counterintuitive.
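To make the gauge behaviour concrete, here is a minimal simulation sketch. The traffic pattern (10 requests per second), the request durations (2 ms and 500 ms), and all function names are made up for illustration; only the 15s scrape interval comes from the explanation above. It is not how the metric is actually collected.

```python
def in_flight_at(t_ms, arrivals_ms, duration_ms):
    """Count requests that have started but not yet finished at time t_ms."""
    return sum(1 for start in arrivals_ms if start <= t_ms < start + duration_ms)


def scrape_gauge(arrivals_ms, duration_ms, interval_ms=15_000, window_ms=60_000):
    """Sample the in-flight count once per scrape interval, like a gauge."""
    samples = []
    t = interval_ms
    while t <= window_ms:
        samples.append(in_flight_at(t, arrivals_ms, duration_ms))
        t += interval_ms
    return samples


# Hypothetical traffic: one request every 100 ms for 60 seconds.
arrivals = [i * 100 + 50 for i in range(600)]

fast = scrape_gauge(arrivals, duration_ms=2)    # e.g. quick 401 responses
slow = scrape_gauge(arrivals, duration_ms=500)  # e.g. normal responses

print("fast:", fast)
print("slow:", slow)
```

With the same request rate, the fast responses are almost never in flight at the instant of a scrape, so the sampled gauge sits at 0 even though the app is handling plenty of traffic.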
@kurt We dug deeper and we still think there is an issue with Fly, but it happened earlier in our pipeline. We think that the app name we sent via the CLI to deploy was somehow not honoured.
Take a look at two job outputs for the same workflow in our GH CI. The app names in the env are preview ones, and we’ve been using this bash script to drive flyctl for months without issue until now. We don’t think there is a bug in our CI code. We see in the flyctl output that it pushes to our production app image registry, not the preview one.
Given that this happened around the same time Fly was having deployment issues on the platform, we are inclined to believe those issues affected our deployment pipelines. Could you confirm this is possible?
The “No deployment available to monitor” error is the issue we were having. Deploys are working fine in the background, but the data we sync to update flyctl is lagging.
Those mismatched app names make me think there’s an app = "pdp-zebra" in the fly.toml. The env var is pdp-3118-zebra, but the deploy is definitely happening against pdp-zebra, note the line about the config with fly.toml.
We can look up previous release changes for you. You all are paying, I think, so if you just choose the appropriate plan here, you’ll get a paid support email. Those plans are basically a minimum commitment, they won’t cost you anything else: Plan Pricing · Fly
Yep that’s true, that’s what you see on line 238 for example where it says:
“An existing fly.toml file was found for app pdp-mammoth”.
But once we run flyctl we are always passing the --app flag, or, in the create case, using flyctl launch, the --name flag.
We are wondering if it’s possible that the flags we passed were not respected, and so did not override the fly.toml file.
We only had problems with flyctl during the noted Fly platform issues. It’s bash/CI code we’ve not touched, and we’ve never had issues like this with it, for 90+ days now.
We can re-run these jobs and see different results now in CI than they emitted during said problem period on Dec 19.
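For anyone wanting a guard against this kind of mismatch, a pre-deploy check along these lines could fail the job before flyctl runs. This is only a sketch: the function names are hypothetical, the parsing is a simplified line scan rather than a real TOML parser, and it does not settle whether the --app flag was honoured. It just flags the case where the fly.toml in the checkout names a different app than the one the job expects.

```python
import re

# Matches a top-level line such as: app = "pdp-zebra"
APP_LINE = re.compile(r'^\s*app\s*=\s*["\']([^"\']+)["\']')


def app_from_fly_toml(text):
    """Return the top-level app name from fly.toml text, or None if absent.

    Simplified scan (assumption): stops at the first [table] header so that
    an `app` key inside a table is not mistaken for the top-level one.
    """
    for line in text.splitlines():
        if line.lstrip().startswith("["):
            break
        match = APP_LINE.match(line)
        if match:
            return match.group(1)
    return None


def check_deploy_target(expected_app, fly_toml_text):
    """Return a problem description if fly.toml disagrees with the expected app."""
    configured = app_from_fly_toml(fly_toml_text)
    if configured is not None and configured != expected_app:
        return f"fly.toml names {configured!r} but this job expects {expected_app!r}"
    return None


# Example using the app names mentioned in this thread.
sample = 'app = "pdp-zebra"\n\n[env]\n  PORT = "8080"\n'
problem = check_deploy_target("pdp-3118-zebra", sample)
print(problem)
```

In CI, the job would read the real fly.toml, pass in the app name from the environment, and exit non-zero when a problem string comes back.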
Yep we’re paying so we can go through that channel.
Here is an example of re-running the CI job on GH (left is old, bad; right is new re-run working), it shows the output diff we appear to have gotten from Fly:
From our point of view, we’re still pretty sure the fault wasn’t on our side. At the same time, we’ve moved on since this and no longer use the same token/account for production and preview, which makes this kind of cross-stage deployment impossible for us going forward.