Just to add to what @joshua-fly wrote, here is what the migration process looks like with the shadow bucket feature.
When creating a new bucket, you can specify the bucket from which the data should be migrated. We call this the shadow bucket.
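For example, with an S3 client like boto3, the shadow bucket settings can be supplied when the bucket is created. This is only a rough sketch, assuming the settings ride along as custom headers on the CreateBucket call; the `X-Tigris-Shadow-*` header names and all credentials below are placeholders, so check our docs for the exact names:

```python
import boto3

# Tigris credentials and endpoint; keys are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://fly.storage.tigris.dev",
    aws_access_key_id="<tigris access key>",
    aws_secret_access_key="<tigris secret key>",
)

def add_shadow_headers(request, **kwargs):
    # Where the existing data lives and how to reach it.
    request.headers["X-Tigris-Shadow-Bucket-Name"] = "my-old-bucket"
    request.headers["X-Tigris-Shadow-Endpoint"] = "https://s3.amazonaws.com"
    request.headers["X-Tigris-Shadow-Access-Key"] = "<shadow access key>"
    request.headers["X-Tigris-Shadow-Secret-Key"] = "<shadow secret key>"

# Attach the headers only to CreateBucket requests.
s3.meta.events.register("before-send.s3.CreateBucket", add_shadow_headers)
s3.create_bucket(Bucket="my-new-tigris-bucket")
```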
This is how the process works:
- **Writes:** When you upload a new object to the Tigris bucket, it is first uploaded to the shadow bucket and then copied to the Tigris bucket. This ensures that the data is always available in the shadow bucket and is eventually consistent in the Tigris bucket.
- **Reads:** When you request an object, we first check if the object exists in the Tigris bucket. If it does, we return it. If it doesn't, we check the shadow bucket and return the object from there. Then we copy the object to the Tigris bucket to make it available for future requests.
- **Placement:** The objects in the Tigris bucket are automatically stored in the region closest to the user.
- **Deletes:** When you delete an object, we delete it from both the Tigris bucket and the shadow bucket.
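Put another way, here is the same flow as a conceptual sketch. This is not our actual implementation, just the behavior described above; bucket names and endpoints are placeholders:

```python
import boto3

TIGRIS_BUCKET, SHADOW_BUCKET = "my-new-tigris-bucket", "my-old-bucket"
tigris = boto3.client("s3", endpoint_url="https://fly.storage.tigris.dev")
shadow = boto3.client("s3", endpoint_url="https://s3.amazonaws.com")

def put_object(key, body):
    # Write path: the shadow bucket is written first, so it always has the
    # complete data set; the Tigris copy follows (eventually consistent).
    shadow.put_object(Bucket=SHADOW_BUCKET, Key=key, Body=body)
    tigris.put_object(Bucket=TIGRIS_BUCKET, Key=key, Body=body)

def get_object(key):
    # Read path: serve from Tigris if present, otherwise fall back to the
    # shadow bucket and backfill Tigris for future requests.
    try:
        return tigris.get_object(Bucket=TIGRIS_BUCKET, Key=key)["Body"].read()
    except tigris.exceptions.NoSuchKey:
        body = shadow.get_object(Bucket=SHADOW_BUCKET, Key=key)["Body"].read()
        tigris.put_object(Bucket=TIGRIS_BUCKET, Key=key, Body=body)
        return body

def delete_object(key):
    # Deletes are applied to both buckets.
    tigris.delete_object(Bucket=TIGRIS_BUCKET, Key=key)
    shadow.delete_object(Bucket=SHADOW_BUCKET, Key=key)
```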
Hi @atridad, yes, the issue is resolved. If you tried recently and are running into the same issue, please send the bucket names to the support email and I'll look into it.
If any parameter, such as the shadow region, is wrong, an internal error occurs; it would be nice to have a friendlier error message for that.
We are working on adding stats to the dashboard. This information should also be available in the `flyctl storage list` command; I'll work on that. Where else do you think this information should be available?
I have asked the team to document some bulk migration strategies.
Got it. The hostname I am providing is an instance of minio, running inside my Fly organization. Is it not possible to shadow a bucket hosted inside my org?
@indirect Every org in Fly has its own private network. The .internal domain points to the private addresses of the application, which are not accessible outside of the org network. That is why the minio instance is not accessible to us. If you can make the minio instance publicly accessible, you can use the shadow bucket feature to transparently move the objects over to Tigris.
Hey. Would it be possible to point the shadow bucket to a public bucket? Right now I have to provide access and secret keys as well as the endpoint instead of a custom domain. That's not a problem in and of itself, but my objects in R2 do not specify cache-control; the Cloudflare cache rules for the public bucket do. In other words, when Tigris copies data from the shadow bucket, it doesn't find cache-control headers, because those are set on the custom domain rather than on the objects themselves. I then end up with `max-age=3600` instead of `max-age=31536000` when requesting objects from the Tigris bucket.
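I could backfill cache-control onto the R2 objects myself so it travels with them during the shadow copy, with something like this sketch (untested; endpoint and bucket name are placeholders, and it assumes R2's standard S3 CopyObject support), but I'd rather not rewrite every object:

```python
import boto3

r2 = boto3.client("s3", endpoint_url="https://<account-id>.r2.cloudflarestorage.com")

paginator = r2.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my-old-bucket"):
    for obj in page.get("Contents", []):
        # An in-place copy with MetadataDirective=REPLACE rewrites the
        # object's metadata. Note it replaces *all* metadata, so anything
        # else you rely on (e.g. ContentType) has to be re-specified.
        r2.copy_object(
            Bucket="my-old-bucket",
            Key=obj["Key"],
            CopySource={"Bucket": "my-old-bucket", "Key": obj["Key"]},
            MetadataDirective="REPLACE",
            CacheControl="public, max-age=31536000, immutable",
        )
```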
@nph I apologize for the delayed response. Currently, we do not support using a shadow bucket with a public bucket. To be able to support the use case you have mentioned we would need a way to access the content in your public bucket through the Cloudflare cache so we have access to the headers it sets. This is not in the roadmap right now. Is it problematic that the max-age gets set to 3600 and what issues do you see with that?
The cache in front of R2 is completely transparent. I just give you the public URL of a bucket. Say a request comes in for `https://bucket.fly.storage.tigris.dev/key`; then you copy the object from the shadow bucket at `https://bucket.domain.com/key`. It's much simpler than the status quo, as no authentication is involved.
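Concretely, the mapping I'm describing would be something like this sketch (URLs are the placeholders from my example): derive the public URL from the requested key and fetch it anonymously. The response comes through Cloudflare's cache, so the Cache-Control header set by my cache rules is present on it:

```python
import requests

PUBLIC_SHADOW_BASE = "https://bucket.domain.com"

def fetch_from_public_shadow(key):
    resp = requests.get(f"{PUBLIC_SHADOW_BASE}/{key}", timeout=30)
    resp.raise_for_status()
    # e.g. "public, max-age=31536000" from the Cloudflare cache rule
    print(resp.headers.get("Cache-Control"))
    return resp.content
```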
My objects are immutable, so I try to cache them for as long as possible. However, if successful ETag validations (i.e. 304 responses) don't count as billable GET requests, then 3600 is a non-issue.
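For reference, by a successful ETag validation I mean a conditional GET that returns 304 Not Modified with no body when the cached copy is still current (URL is a placeholder):

```python
import requests

url = "https://bucket.fly.storage.tigris.dev/key"

first = requests.get(url)
etag = first.headers["ETag"]

# The client presents the ETag; the server answers 304 if it still matches.
revalidated = requests.get(url, headers={"If-None-Match": etag})
print(revalidated.status_code)  # 304 when the object is unchanged
```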
That would work for migrating objects as they are read. But what about object writes? The way the shadow bucket works today, it can migrate objects both as they are written and as they are read.