Tigris global storage released a migration tool: shadow buckets

Tigris global object storage, now in private beta, released a new migration tool called shadow buckets.

Shadow buckets enable transparent copying and writing of objects as they are requested or uploaded. This is helpful in a few scenarios, such as:

  • Avoiding egress fees from large data migrations
  • Avoiding downtime by serving requests from Tigris while running a data migration
  • Testing Tigris global cache performance
  • Testing Tigris functionality with the option to switch back to your current provider

Learn more about this in our documentation, or check out usage with flyctl storage create --help.


This is pretty neat.

Also FYI, you linked the wrong private beta. It should be "Global caching object storage on Fly.io in private beta".


Just to add to what @joshua-fly wrote, here is what the migration process looks like with the shadow bucket feature.

When creating a new bucket, you can specify the bucket from which the data should be migrated. We call this the shadow bucket.

This is how the process works:

  • When you upload a new object to the Tigris bucket, it is first uploaded to the shadow bucket and then copied to the Tigris bucket. This ensures that the data is always available in the shadow bucket and is eventually consistent in the Tigris bucket.
  • When you request an object, we first check if the object exists in the Tigris bucket. If it does, we return it. If it doesn’t, we check the shadow bucket and return the object from there. Then, we copy the object to the Tigris bucket to make it available for future requests.
  • The objects in the Tigris bucket are automatically stored in the region closest to the user.
  • When you delete an object, we delete it from the Tigris bucket and the shadow bucket.
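The four rules above can be modelled with a toy in-memory sketch. This is an illustration under stated assumptions, not Tigris's implementation: both buckets are plain dicts, the class name `ShadowedBucket` is hypothetical, the copy into Tigris happens synchronously (the real service is only eventually consistent), and region placement is left out entirely.

```python
from typing import Dict, Optional


class ShadowedBucket:
    """Hypothetical toy model of shadow-bucket migration semantics."""

    def __init__(self, shadow: Dict[str, bytes]) -> None:
        self.tigris: Dict[str, bytes] = {}  # the new Tigris bucket, initially empty
        self.shadow = shadow                # the existing bucket being migrated from

    def put(self, key: str, data: bytes) -> None:
        # Write to the shadow bucket first so it always holds every object...
        self.shadow[key] = data
        # ...then copy into Tigris (synchronous here, eventually consistent in reality).
        self.tigris[key] = data

    def get(self, key: str) -> Optional[bytes]:
        # Serve from Tigris when the object has already been migrated.
        if key in self.tigris:
            return self.tigris[key]
        # Otherwise fall back to the shadow bucket...
        data = self.shadow.get(key)
        if data is not None:
            # ...and copy the object so later requests are served from Tigris.
            self.tigris[key] = data
        return data

    def delete(self, key: str) -> None:
        # Deletes are applied to both buckets.
        self.tigris.pop(key, None)
        self.shadow.pop(key, None)
```

For example, `ShadowedBucket({"logo.png": b"..."}).get("logo.png")` returns the object from the shadow dict and leaves a copy in `tigris`, so objects migrate lazily as they are read, with no bulk copy up front.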

Has the issue with deleted buckets not being removed from flyctl been resolved?

Hi @atridad, yes, the issue is resolved. If you tried recently and are still running into the same issue, please send the bucket names to the support email and I’ll look into it.

If any parameter, such as the shadow region, is wrong, an internal error occurs. It would be nice to have a friendlier error message for that.

Thanks for the feedback. I’ll take care of it.


A few other things would also be good improvements:

  • Allow users to see the storage space used by an entire bucket
  • Document how to do a bulk migration, not just an incremental one

We are working on adding stats to the dashboard. This information should also be available in the flyctl storage list command; I’ll work on that. Where else do you think this information should be available?

I have asked the team to document some bulk migration strategies.

For now you can use the AWS CLI to calculate the total storage used in a bucket:

  1. Configure the AWS CLI by following AWS CLI | Tigris Object Storage Documentation
  2. Run aws s3 ls --summarize --human-readable --recursive s3://my-bucket
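If you would rather compute the same totals from a script, here is a minimal sketch using boto3's `list_objects_v2` paginator. It assumes boto3 is installed, that credentials come from the standard AWS environment variables or config files, and that the endpoint is `https://fly.storage.tigris.dev` (adjust if your setup differs). The function names are hypothetical.

```python
from typing import Iterable, Mapping, Tuple


def summarize_pages(pages: Iterable[Mapping]) -> Tuple[int, int]:
    """Return (object_count, total_bytes) from ListObjectsV2 response pages."""
    count = 0
    total = 0
    for page in pages:
        # "Contents" is omitted from a page that holds no objects.
        for obj in page.get("Contents", []):
            count += 1
            total += obj["Size"]
    return count, total


def bucket_usage(bucket: str,
                 endpoint_url: str = "https://fly.storage.tigris.dev") -> Tuple[int, int]:
    """List every object in the bucket and sum the sizes (assumed endpoint)."""
    # Imported lazily so summarize_pages() works without boto3 installed.
    import boto3

    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    paginator = s3.get_paginator("list_objects_v2")
    return summarize_pages(paginator.paginate(Bucket=bucket))
```

Calling `bucket_usage("my-bucket")` walks the listing 1,000 keys per page, so like `aws s3 ls --recursive` it can take a while on very large buckets.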

The storage and object count information is now available on the Tigris console. See the following announcement:

I will keep you posted on any further enhancements around this.


Am I missing something about how this is supposed to work? I created a bucket, and then ran:

fly --verbose --debug storage update <storage-name> --shadow-endpoint <endpoint> --shadow-name <name> --shadow-region <region> --shadow-access-key <id> --shadow-secret-key <key>

The results I got:

Error: input:3: updateAddOn We encountered an internal errors, please try again.

Stacktrace:
goroutine 1 [running]:
runtime/debug.Stack()
        /opt/hostedtoolcache/go/1.21.7/x64/src/runtime/debug/stack.go:24 +0x64
github.com/superfly/flyctl/internal/cli.printError(0x14000065360, 0x14000b65b26, 0x1041ca100?, {0x1045acca0, 0x1400000d098})
        /home/runner/work/flyctl/flyctl/internal/cli/cli.go:162 +0x3c8
github.com/superfly/flyctl/internal/cli.Run({0x1045c74f0?, 0x1400085f5c0?}, 0x14000065360, {0x140001b6010?, 0xf, 0xf})
        /home/runner/work/flyctl/flyctl/internal/cli/cli.go:110 +0x7bc
main.run()
        /home/runner/work/flyctl/flyctl/main.go:47 +0x174
main.main()
        /home/runner/work/flyctl/flyctl/main.go:26 +0x20

Hi, @indirect,

It looks like the endpoint parameter you are providing is not a publicly accessible hostname, so our service cannot resolve its DNS name and connect to it.

We are also working on a better error response for such cases.

Got it. The hostname I am providing points to a MinIO instance running inside my Fly organization. Is it not possible to shadow a bucket hosted inside my org?

As long as the endpoint is accessible over the internet, it can be hosted inside your Fly organization.

@indirect Every organization on Fly.io has its own private network. The .internal domain resolves to the private addresses of your applications, which are not reachable from outside the organization's network. That is why your MinIO instance is not accessible to us. If you make the MinIO instance publicly accessible, you can use the shadow bucket feature to transparently move the objects over to Tigris.
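Before passing a hostname as the shadow endpoint, a rough self-check is whether it resolves, from where you run it, only to globally routable addresses. This is a hypothetical helper, not how Tigris validates endpoints. Fly's private network uses IPv6 unique-local addresses (fdaa:...), which Python's `ipaddress` reports as non-global, and from outside Fly a .internal name will typically not resolve at all, so `resolved_addresses` raises `socket.gaierror`.

```python
import ipaddress
import socket
from typing import List


def resolved_addresses(host: str, port: int = 443) -> List[str]:
    """Resolve host the way a client would and return the unique IP strings."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # info[4][0] is the address; drop any IPv6 zone suffix such as "%eth0".
    return sorted({info[4][0].split("%")[0] for info in infos})


def all_public(addresses: List[str]) -> bool:
    """True only if there is at least one address and every one is globally routable."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_global for a in addresses)
```

For example, `all_public(resolved_addresses("minio.example.com"))` (a placeholder hostname) returns False if any resolved address is private, loopback or unique-local.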


Hey. Would it be possible to point the shadow bucket to a public bucket? Right now I have to provide access and secret keys as well as the endpoint, instead of a custom domain. That is not a problem in and of itself, but my objects in R2 do not specify Cache-Control; the Cloudflare cache rules for the public bucket do. In other words, when Tigris copies data from the shadow bucket, it doesn’t find Cache-Control headers, because those are only set when accessing the objects through the custom domain. I then end up with max-age=3600 instead of max-age=31536000 when requesting objects from the Tigris bucket.

@nph I apologize for the delayed response. Currently, we do not support using a public bucket as a shadow bucket. To support the use case you mention, we would need a way to access the content of your public bucket through the Cloudflare cache so that we can see the headers it sets. This is not on the roadmap right now. Is it a problem that max-age gets set to 3600, and if so, what issues do you see?

The cache in front of R2 is completely transparent. I would just give you the public URL of the bucket. Say a request comes in for https://bucket.fly.storage.tigris.dev/key; you then copy the object from the shadow bucket at https://bucket.domain.com/key. It’s much simpler than the status quo, as no authentication is involved.
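The proposed read path could look roughly like the sketch below. This is nph's suggestion, not an existing Tigris feature. `bucket.domain.com` comes from the post, while the function names and the choice of which headers to keep are hypothetical. The point is that an unauthenticated GET through the custom domain returns the Cache-Control header added by the Cloudflare cache rules, so it could be stored alongside the copied object.

```python
from typing import Dict, Mapping, Tuple
from urllib.parse import quote
from urllib.request import urlopen

# Headers worth carrying over from the public origin (hypothetical choice).
PRESERVED_HEADERS = ("Cache-Control", "Content-Type", "ETag", "Last-Modified")


def origin_url(public_base: str, key: str) -> str:
    """Map an object key to its URL on the public, cache-fronted domain."""
    # Keep "/" so nested keys map to nested paths; percent-encode everything else.
    return public_base.rstrip("/") + "/" + quote(key, safe="/")


def preserved_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Select the headers to store with the copied object (case-insensitive)."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {h: lowered[h.lower()] for h in PRESERVED_HEADERS if h.lower() in lowered}


def fetch_from_public_origin(public_base: str, key: str) -> Tuple[bytes, Dict[str, str]]:
    """Unauthenticated GET against the public domain, as proposed in the post."""
    with urlopen(origin_url(public_base, key)) as resp:
        return resp.read(), preserved_headers(dict(resp.headers.items()))
```

This only covers objects being read; uploads would still need authenticated write access to the original bucket.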

My objects are immutable, so I try to cache them for as long as possible. However, if successful ETag validations (i.e., 304 responses) don’t count as GET requests (that is, they are not billable), then 3600 is a non-issue.
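For context, this is what an ETag revalidation amounts to once the one-hour max-age expires: the client re-requests the object with `If-None-Match` set to its cached ETag, and the server answers 304 with no body if nothing changed. The sketch below shows that decision using the weak comparison RFC 9110 prescribes for `If-None-Match`. It is illustrative only; the function name is hypothetical, and the thread does not say how Tigris bills 304 responses.

```python
from typing import Optional


def _opaque(tag: str) -> str:
    # If-None-Match uses weak comparison, so a W/ prefix is ignored.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidation_status(current_etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_none_match is None:
        return 200  # unconditional request: send the full object
    if if_none_match.strip() == "*":
        return 304  # "*" matches any existing representation
    # Simplified: splits on commas, so ETags that contain a comma are not handled.
    candidates = {_opaque(t) for t in if_none_match.split(",")}
    return 304 if _opaque(current_etag) in candidates else 200
```

Because the objects are immutable, every revalidation after the first hour should take the 304 path, so the shorter max-age costs a round trip but not a re-download.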


That would work for migrating objects as they are read, but what about object writes? Today, shadow buckets migrate objects both as they are written and as they are read.