Lack of capacity? No problem, we have you covered

dangra · July 10, 2024, 7:33pm

Apologies for the catchy title; the real story behind it is much more boring, but it makes a real difference in difficult times.

Since last week we have been testing a new capability. You may have already heard about machine migrations and capacity rebalances, where we move machines to a new host to ensure they can start successfully. These are triggered under extreme conditions, such as when a host is degraded due to hardware failure, or when load reaches the point where it starts affecting other tenants.

But sometimes those conditions aren’t met, and yet dormant machines waiting for incoming requests fail to start due to lack of capacity. For example, a stopped GPU machine may not start because no GPU cards are available on its host at that point in time.

To overcome this limitation, we’re enabling auto-migration (moving a machine, when it starts, from its current host to another host with idle resources) for GPU machines and for all machines without volumes attached. Non-GPU machines with volumes attached won’t be automatically migrated by this capability, to avoid potential issues with Postgres.
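As we read the rule above, eligibility comes down to one check: a GPU machine is eligible, and a non-GPU machine is eligible only if it has no volume attached. The sketch below just restates that rule; the `Machine` class and `auto_migration_eligible` function are hypothetical illustrations, not Fly.io’s actual implementation or API. It also assumes that GPU machines are eligible even when they have a volume, which is our reading of the wording.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    # Hypothetical model of a machine; field names are illustrative only.
    name: str
    has_gpu: bool
    has_volume: bool


def auto_migration_eligible(m: Machine) -> bool:
    """GPU machines are eligible; non-GPU machines only when no volume is attached."""
    return m.has_gpu or not m.has_volume


if __name__ == "__main__":
    for m in [
        Machine("gpu-with-volume", has_gpu=True, has_volume=True),
        Machine("web", has_gpu=False, has_volume=False),
        Machine("postgres", has_gpu=False, has_volume=True),
    ]:
        print(m.name, auto_migration_eligible(m))
```

Only the third case, a non-GPU machine with a volume (the typical Postgres setup), is excluded.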

That’s it. You don’t have to do anything; it works behind the scenes to ensure your app is up when needed.

amo · July 10, 2024, 8:21pm

Amazing! Thank you for the update.

Two questions come to my mind:

Do auto-migrated machines stay in the same region, or can they be auto-migrated to another region?
Is there or will there be a way to auto-migrate non-GPU machines with volumes attached that are not Postgres machines?

dangra · July 10, 2024, 9:09pm

@amo glad you liked it.

Yes, auto-migrations are always within the same region.
Not yet, but we plan to revisit this decision at some point.
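Putting the announcement and this answer together, the start path can be pictured roughly as follows: start in place if the host has capacity; otherwise, if the machine is eligible, pick another host in the same region with idle resources. This is a minimal sketch under stated assumptions, not how the platform actually schedules machines. The `Host` class, `pick_host` function, and the tie-break of preferring the host with the most free GPUs are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    # Hypothetical host model: name, region, and number of idle GPU cards.
    name: str
    region: str
    free_gpus: int


def pick_host(current: Host, hosts: List[Host], gpus_needed: int,
              eligible: bool) -> Optional[Host]:
    """Return the host to start on, or None if the start fails for lack of capacity."""
    if current.free_gpus >= gpus_needed:
        return current  # enough capacity on the current host: start in place
    if not eligible:
        return None  # not eligible for auto-migration: the start fails
    # Consider only other hosts in the SAME region with enough idle GPUs.
    candidates = [
        h for h in hosts
        if h is not current and h.region == current.region
        and h.free_gpus >= gpus_needed
    ]
    # Assumed tie-break for this sketch: prefer the host with the most idle GPUs.
    return max(candidates, key=lambda h: h.free_gpus, default=None)


if __name__ == "__main__":
    hosts = [Host("a", "ord", 0), Host("b", "ord", 2), Host("c", "ams", 4)]
    print(pick_host(hosts[0], hosts, gpus_needed=1, eligible=True))
    print(pick_host(hosts[0], hosts, gpus_needed=3, eligible=True))
```

In the second call, host `c` has enough GPUs but sits in another region, so no host qualifies and the start fails, matching the "always within the same region" answer.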
