Just to add to what @joshua-fly wrote, here is what the migration process looks like with the shadow bucket feature.
When creating a new bucket, you can specify the bucket from which the data should be migrated. We call this the shadow bucket.
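For example, with an S3 client like boto3, the shadow bucket settings can be supplied when the bucket is created. This is only a rough sketch, assuming the settings ride along as custom headers on the CreateBucket call; the `X-Tigris-Shadow-*` header names and all credentials below are placeholders, so check our docs for the exact names:

```python
import boto3

# Tigris credentials and endpoint; keys are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://fly.storage.tigris.dev",
    aws_access_key_id="<tigris access key>",
    aws_secret_access_key="<tigris secret key>",
)

def add_shadow_headers(request, **kwargs):
    # Where the existing data lives and how to reach it.
    request.headers["X-Tigris-Shadow-Bucket-Name"] = "my-old-bucket"
    request.headers["X-Tigris-Shadow-Endpoint"] = "https://s3.amazonaws.com"
    request.headers["X-Tigris-Shadow-Access-Key"] = "<shadow access key>"
    request.headers["X-Tigris-Shadow-Secret-Key"] = "<shadow secret key>"

# Attach the headers only to CreateBucket requests.
s3.meta.events.register("before-send.s3.CreateBucket", add_shadow_headers)
s3.create_bucket(Bucket="my-new-tigris-bucket")
```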
This is how the process works:
- **Writes:** When you upload a new object to the Tigris bucket, it is first uploaded to the shadow bucket and then copied to the Tigris bucket. This ensures that the data is always available in the shadow bucket and is eventually consistent in the Tigris bucket.
- **Reads:** When you request an object, we first check if the object exists in the Tigris bucket. If it does, we return it. If it doesn't, we check the shadow bucket and return the object from there. Then we copy the object to the Tigris bucket to make it available for future requests.
- **Placement:** The objects in the Tigris bucket are automatically stored in the region closest to the user.
- **Deletes:** When you delete an object, we delete it from both the Tigris bucket and the shadow bucket.
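Put another way, here is the same flow as a conceptual sketch. This is not our actual implementation, just the behavior described above; bucket names and endpoints are placeholders:

```python
import boto3

TIGRIS_BUCKET, SHADOW_BUCKET = "my-new-tigris-bucket", "my-old-bucket"
tigris = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
shadow = boto3.client("s3", endpoint_url="https://s3.amazonaws.com")

def put_object(key, body):
    # Write path: the shadow bucket is written first, so it always has the
    # complete data set; the Tigris copy follows (eventually consistent).
    shadow.put_object(Bucket=SHADOW_BUCKET, Key=key, Body=body)
    tigris.put_object(Bucket=TIGRIS_BUCKET, Key=key, Body=body)

def get_object(key):
    # Read path: serve from Tigris if present, otherwise fall back to the
    # shadow bucket and backfill Tigris for future requests.
    try:
        return tigris.get_object(Bucket=TIGRIS_BUCKET, Key=key)["Body"].read()
    except tigris.exceptions.NoSuchKey:
        body = shadow.get_object(Bucket=SHADOW_BUCKET, Key=key)["Body"].read()
        tigris.put_object(Bucket=TIGRIS_BUCKET, Key=key, Body=body)
        return body

def delete_object(key):
    # Deletes are applied to both buckets.
    tigris.delete_object(Bucket=TIGRIS_BUCKET, Key=key)
    shadow.delete_object(Bucket=SHADOW_BUCKET, Key=key)
```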
Hi @atridad, yes, the issue is resolved. If you tried recently and are running into the same issue, please send the bucket names to the support email and I'll look into it.
If any parameter, such as the shadow region, is wrong, an internal error occurs; it would be nice to have a friendlier error message for that.
We are working on adding stats to the dashboard. This information should also be available in the `flyctl storage list` command; I'll work on that. Where else do you think this information should be available?
I have asked the team to document some bulk migration strategies.
Got it. The hostname I am providing is an instance of minio, running inside my Fly organization. Is it not possible to shadow a bucket hosted inside my org?
@indirect Every org in Fly has its own private network. The .internal domain points to the private addresses of the application, which are not accessible outside of the org network. That is why the minio instance is not accessible to us. If you can make the minio instance publicly accessible, you can use the shadow bucket feature to transparently move the objects over to Tigris.
Hey. Would it be possible to point the shadow bucket to a public bucket? Right now I have to provide access and secret keys as well as the endpoint instead of a custom domain. That's not a problem in and of itself, but my objects in R2 do not specify cache-control; the Cloudflare cache rules for the public bucket do. In other words, when Tigris copies data from the shadow bucket, it doesn't find cache-control headers, because those are set on the custom domain rather than on the objects themselves. I then end up with `max-age=3600` instead of `max-age=31536000` when requesting objects from the Tigris bucket.
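I could backfill cache-control onto the R2 objects myself so it travels with them during the shadow copy, with something like this sketch (untested; endpoint and bucket name are placeholders, and it assumes R2's standard S3 CopyObject support), but I'd rather not rewrite every object:

```python
import boto3

r2 = boto3.client("s3", endpoint_url="https://<account-id>.r2.cloudflarestorage.com")

paginator = r2.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-old-bucket"):
    for obj in page.get("Contents", []):
        # An in-place copy with MetadataDirective=REPLACE rewrites the
        # object's metadata. Note it replaces *all* metadata, so anything
        # else you rely on (e.g. ContentType) has to be re-specified.
        r2.copy_object(
            Bucket="my-old-bucket",
            Key=obj["Key"],
            CopySource={"Bucket": "my-old-bucket", "Key": obj["Key"]},
            MetadataDirective="REPLACE",
            CacheControl="public, max-age=31536000, immutable",
        )
```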
@nph I apologize for the delayed response. Currently, we do not support using a shadow bucket with a public bucket. To be able to support the use case you have mentioned we would need a way to access the content in your public bucket through the Cloudflare cache so we have access to the headers it sets. This is not in the roadmap right now. Is it problematic that the max-age gets set to 3600 and what issues do you see with that?
The cache in front of R2 is completely transparent. I just give you the public URL of a bucket. Say a request comes in for `https://bucket.fly.storage.tigris.dev/key`; then you copy the object from the shadow bucket at `https://bucket.domain.com/key`. It's much simpler than the status quo, as no authentication is involved.
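Concretely, the mapping I'm describing would be something like this sketch (URLs are the placeholders from my example): derive the public URL from the requested key and fetch it anonymously. The response comes through Cloudflare's cache, so the Cache-Control header set by my cache rules is present on it:

```python
import requests

PUBLIC_SHADOW_BASE = "https://bucket.domain.com"

def fetch_from_public_shadow(key):
    resp = requests.get(f"{PUBLIC_SHADOW_BASE}/{key}", timeout=30)
    resp.raise_for_status()
    # e.g. "public, max-age=31536000" from the Cloudflare cache rule
    print(resp.headers.get("Cache-Control"))
    return resp.content
```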
My objects are immutable, so I try to cache them for as long as possible. However, if successful ETag validations (i.e. 304 responses) don't count as billable GET requests, then 3600 is a non-issue.
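For reference, by a successful ETag validation I mean a conditional GET that returns 304 Not Modified with no body when the cached copy is still current (URL is a placeholder):

```python
import requests

url = "https://bucket.fly.storage.tigris.dev/key"

first = requests.get(url)
etag = first.headers["ETag"]

# The client presents the ETag; the server answers 304 if it still matches.
revalidated = requests.get(url, headers={"If-None-Match": etag})
print(revalidated.status_code)  # 304 when the object is unchanged
```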
That would work for migrating objects as they are read. But what about object writes? The way the shadow bucket works today, it can migrate objects both as they are written and as they are read.