Failed to create volume: no capacity available in eze

Since the problem in eze I had to rebuild my Postgres cluster, but now I can’t because there is no capacity available in eze?
So I am two weeks from launch, after more than 4 months of testing (and paying for the resources here), only to be unable to launch because of a lack of capacity?
I don’t know what I’m supposed to do now…
I need to be in eze because that is where my users are. I don’t want extra latency, and the performance is really great when servers are close to your users, so another region is no good for me.
Is there any information regarding this issue?
Is it going to be solved, or should I ask for a refund and find another provider?
It’s not a rant, I’m genuinely asking because this is giving me an ulcer…

Does that also mean that we cannot scale our current apps to provision more machines?
I think this is horrible…

Hi, Jerónimo… Sorry to hear this is giving you so much trouble :adhesive_bandage:. I was able to create a small Machine with a volume in eze just now, and the new capacity API shows a fair amount of free space in that part of the world. Consequently, I don’t think the region as a whole is blocked off.

Perhaps you’re running into the confusing error message mentioned in this other recent thread?

https://community.fly.io/t/vm-won-t-launch-volume-stuck-after-host-incident-in-eze-2-weeks-ago/25508

To my understanding, this issue is because OP is trying to create a 3-node cluster and we only have two nodes eligible for volumes in eze. Our upstream provider in the region is hoping to get replacement hardware soon.

I agree it’s unacceptable to advertise regions and then not have capacity for a basic 3-node cluster; I don’t have any resolution right now but will be looking into how we can prevent this kind of issue in the future (whether it’s procuring hardware differently, or not advertising this region at all if we can’t procure hardware in an acceptable manner).

Thanks, Lillian. I’ll wait a couple of days for an answer, or else I’ll politely ask for a refund, because I’m starting to feel like we wasted our time here. We were happy with the platform, but this is a big no-no if it doesn’t get solved properly…

Hi Jeronimo,
With two hosts eligible in eze, and if you’re okay with a host failure possibly requiring manual action to recover from, you can create a Postgres cluster with two machines (run fly postgres create, select “custom configuration”, and choose only two nodes).

If your primary fails, you will need to manually promote the remaining replica to primary.

You can add a third machine with fly machine clone existing_machine_id --volume-requires-unique-zone=false, but this will inevitably place two machines on the same host; if the host with two machines fails and the primary was there, you’ll still need to manually fix things.

It’s definitely a workaround kind of thing and the resiliency implications require careful consideration as to whether this is usable for you (definitely put in place several levels of backup and a disaster recovery plan and rehearse them thoroughly), but it might enable you to move forward while we get more hardware in place to allow a fully-HA cluster to be created.

  • Daniel

Hi Daniel, I thought of that too, but I’m not sure it will fit, because my backend is also in that zone and we scale it to 20-30 machines. Now it seems there is no confidence in that either; until the problem is resolved, we cannot be sure that scale operations will go as expected.
Also, the thing with extra steps and workarounds is that it is more work to be done for something that seemed to be taken care of.
We were also about to deliver a third service that runs with a minimum of 5 machines, and that can’t be done either, so I’ll wait a week or so for a solution while we try to find alternatives.
I don’t say this lightly: we have been working on the platform for the last 5 months and were about to move off our current cloud servers, and we are starting to feel like we lost these months…

It shouldn’t be a problem at all to scale your backend to 30 machines or 5 machines. The database only has this issue because it tries to place each machine on a separate host, as @roadmr mentioned; app instances without volumes won’t have the same issue as the database (there’s quite a bit of spare capacity in general).
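
For anyone finding this thread later, here is a rough Python sketch of the placement rule described above. It is only a toy model, not Fly.io’s actual scheduler: the function name place_volumes, the require_unique_host parameter and the host names are all made up for illustration. It just shows why three volumes cannot fit on two eligible hosts when each one needs its own host, and what happens once that rule is relaxed.

```python
def place_volumes(hosts, replicas, require_unique_host=True):
    """Toy placement model (hypothetical, not Fly.io's real scheduler).

    Assigns each replica's volume to a host and returns the list of
    chosen host names. Raises RuntimeError when every volume must sit
    on its own host and there are not enough eligible hosts.
    """
    if not hosts:
        raise RuntimeError("no capacity: no eligible hosts")
    if require_unique_host and replicas > len(hosts):
        raise RuntimeError(
            f"no capacity: {replicas} volumes need {replicas} distinct "
            f"hosts, but only {len(hosts)} are eligible"
        )
    # Round-robin over the eligible hosts. With the uniqueness rule
    # relaxed, hosts get reused once every host already has a volume.
    return [hosts[i % len(hosts)] for i in range(replicas)]


if __name__ == "__main__":
    eligible = ["host-a", "host-b"]  # assumed: two eligible hosts, as in eze

    # A two-node cluster fits: one volume per host.
    print(place_volumes(eligible, 2))

    # A three-node cluster fails while each volume needs its own host.
    try:
        place_volumes(eligible, 3)
    except RuntimeError as err:
        print(err)

    # Relaxing the rule lets the third volume share a host with the first.
    print(place_volumes(eligible, 3, require_unique_host=False))
```

In this model the third volume lands on the same host as the first one, which is the trade-off mentioned earlier in the thread: if that shared host fails, two of the three nodes go down with it.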

I understand, and we will discuss this a bit further with the team, but the Postgres cluster is a must, and having to manually switch to a replica in another region is a showstopper for a small team like ours.
Nonetheless, we will discuss it and try the suggestion. Thanks.
