Global distribution of files for low latency reads

Are there any plans to support distributing files to specific datacenters, i.e., I want everything in Bucket X to automatically be distributed to Datacenters X, Y and Z? We need quick reads everywhere, but writes can be slow (i.e., written to only one datacenter).

If not, is the FDB metadata distributed everywhere? Can I query a specific file’s last update time? I could then compare it to my local data and either use the local file or pull down a new version if the last update is newer than what I have. Or is the metadata also going to be centralized and not distributed to every datacenter?

Tigris smartly distributes data to ensure it is close to the users. This is done by default. Both metadata and data blocks are globally distributed, making them accessible from anywhere in the world. For your use case, you can read an object from any region and apply the logic of comparing the timestamp. For example, you can write to region ‘X’ and read from region ‘Y’.
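The timestamp comparison described above could be sketched roughly as follows. This is a minimal illustration, not Tigris's API: `refresh_if_stale`, `head`, and `get` are hypothetical names, and the two callables stand in for whatever client you use. Assuming an S3-compatible client such as boto3, `head` might wrap `head_object(...)["LastModified"]` and `get` might wrap `get_object(...)["Body"].read()`.

```python
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


def refresh_if_stale(
    key: str,
    local_modified: Optional[datetime],
    head: Callable[[str], datetime],
    get: Callable[[str], bytes],
) -> Tuple[bool, Optional[bytes]]:
    """Fetch the object only when the remote copy is newer than the local one.

    `head` and `get` are hypothetical stand-ins for a metadata lookup and a
    full download. Returns (fetched, body); body is None when the local copy
    is still current.
    """
    remote_modified = head(key)
    # No local copy at all, or the remote copy is strictly newer: download it.
    if local_modified is None or remote_modified > local_modified:
        return True, get(key)
    return False, None


if __name__ == "__main__":
    # In-memory stand-in for a bucket, used only to exercise the logic.
    remote = {"config.json": (datetime(2024, 1, 2, tzinfo=timezone.utc), b"v2")}
    local_time = datetime(2024, 1, 1, tzinfo=timezone.utc)

    fetched, body = refresh_if_stale(
        "config.json",
        local_time,
        head=lambda k: remote[k][0],
        get=lambda k: remote[k][1],
    )
    print(fetched, body)
```

Passing the lookups in as callables keeps the comparison logic independent of any particular SDK, so it can be checked without network access.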

Regarding distributing files to specific datacenters: if you want to restrict an object to specific regions or a single region, we offer a feature that allows you to control this. Take a look at this doc

How exactly does Caching on PUT (Eager Caching) work? Let’s say I update a copy of a file in Virginia, where it was originally PUT. There is a cached copy in Europe. When I use Eager Caching, will the copy in Europe be a) invalidated on the PUT, b) replaced by the PUT, c) replaced when another read occurs in Europe, since it has a different timestamp, or d) left alone, so the cached copy remains until it expires?

Eager Caching and Cache-on-read (the default caching semantics) both invalidate the cache on PUT, i.e., the cached entry in every region is invalidated when the object is written.

Cool, that works. One other question: the docs state that an object will be cached in a datacenter if there is enough usage. If the object is written in Virginia and a user requests it from Seattle, will it be retrieved from Virginia and then immediately cached in Seattle after the 1st request? Or is there a gating algorithm that determines how many requests need to be made before it’s cached in Seattle?

I’ll explain what the use case is. We currently have a database cache that is spread among 6 locations. There is a single read-write cluster in Chicago, and we have read-only replicas in all the other regions. Tail latency between a write in CHI and replication to the replicas is under 100ms, which is fine for our needs. When a new value is written into the cache in Chicago, it winds up available locally very quickly. I am trying to research whether Tigris would be an appropriate replacement for this cache. Using the example above, if Tigris writes an object in one datacenter and it’s requested in another datacenter, will it immediately be cached in the other datacenter, or is there a gating count of requests before it serves the object from the requesting datacenter?

Thank you for explaining your use case. Tigris fits nicely here: you can replace your distributed cache and simply rely on Tigris for global distribution.

Regarding your question, there is no gating count. When a request is made from a remote region, the object is cached in that region to speed up all future requests, and the cached copy is automatically invalidated when the object is modified.
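To make the behavior described in this thread concrete, here is a toy in-memory model of it: the first read from a region caches the object there with no request-count gate, and any write invalidates the cached copy in every region. The class and method names are hypothetical and this is only an illustration of the semantics, not how Tigris is implemented.

```python
from typing import Dict, Tuple


class RegionalCacheModel:
    """Toy model: cache on first read per region, invalidate everywhere on put."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}              # authoritative copy
        self.caches: Dict[str, Dict[str, bytes]] = {}  # region -> cached objects

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value
        # A write drops the cached entry in every region (no in-place replace).
        for cache in self.caches.values():
            cache.pop(key, None)

    def get(self, region: str, key: str) -> Tuple[bytes, str]:
        cache = self.caches.setdefault(region, {})
        if key in cache:
            return cache[key], "hit"
        # First request from this region: fetch, then cache immediately.
        value = self.store[key]
        cache[key] = value
        return value, "miss"


if __name__ == "__main__":
    model = RegionalCacheModel()
    model.put("report.json", b"v1")
    print(model.get("sea", "report.json"))  # first read from Seattle
    print(model.get("sea", "report.json"))  # second read is served locally
    model.put("report.json", b"v2")         # write invalidates Seattle's copy
    print(model.get("sea", "report.json"))
```

In this model a reader never sees the old value after a write completes, at the cost of one slower read per region after each update.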

Wow, impressive. I did some simple benchmarking, and in the datacenters where Tigris is already colocated with Fly, request times are in the 3-5ms range for a 1.2k file after the 1st request. That is a little higher than a call to our existing cache, but not by much, and is perfect for our use case. We had set up something similar a couple of years ago with AWS Lambdas and cached files sitting on EFS and saw times in the 5-7ms range.

The other datacenters are not as good for our particular use case, but still not bad: in the 10-30ms range. Your docs say that you expect to be in all fly.io datacenters in the next several months, so I would guess that as you roll out to the other datacenters, I’d see the times come down to what I see in Chicago, VA and SGC.
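For anyone who wants to run a similar comparison, a measurement along these lines could be sketched as below. This is not the script used for the numbers above; `measure_reads` is a hypothetical helper, and `fetch` stands in for whatever call retrieves the object in your setup. It reports the first request separately because that one is expected to miss the regional cache.

```python
import statistics
import time
from typing import Callable, Dict


def measure_reads(fetch: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Time repeated reads, separating the first (cold) request from the rest."""
    if repeats < 2:
        raise ValueError("repeats must be at least 2 to have warm samples")

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    warm = samples_ms[1:]
    return {
        "first_ms": samples_ms[0],
        "warm_median_ms": statistics.median(warm),
        "warm_max_ms": max(warm),
    }


if __name__ == "__main__":
    # Placeholder workload; replace the lambda with a real object read.
    print(measure_reads(lambda: time.sleep(0.001), repeats=5))
```

Running it from a machine in each region gives one set of numbers per datacenter, which is enough to see which regions already serve reads locally.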

That’s great to hear that you’re seeing good performance. Here is the list of all the regions where Tigris is currently deployed. As we expand to more Fly regions, we will keep adding them to the doc.

Looking forward to seeing you use Tigris for your use case.
