Global distribution of files for low latency reads

agad · August 7, 2024, 1:49pm

Are there any plans to support distributing files to specific datacenters? For example, I want everything in Bucket X to be automatically distributed to Datacenters X, Y and Z. We need quick reads everywhere, but writes can be slow (i.e., written to only one datacenter).

If not, is the FDB metadata distributed everywhere? Can I query a specific file’s last update time? I could then compare it to my local data and either use the local file or pull down a new version if the last update is newer than what I have. Or is the metadata also going to be centralized and not distributed to every datacenter?

himank · August 7, 2024, 5:33pm

Tigris smartly distributes data to ensure it is close to the users. This is done by default. Both metadata and data blocks are globally distributed, making them accessible from anywhere in the world. For your use case, you can read an object from any region and apply the logic of comparing the timestamp. For example, you can write to region ‘X’ and read from region ‘Y’.
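The timestamp comparison described above could be sketched roughly as follows. This is a minimal sketch, not an official Tigris example: it assumes a boto3-style S3 client (Tigris is S3-compatible) whose `head_object` response carries a timezone-aware `LastModified`, and the helper names `local_is_stale` and `refresh_if_stale` are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def local_is_stale(local_path, remote_last_modified):
    """Return True when the local copy is missing or older than the remote object."""
    path = Path(local_path)
    if not path.exists():
        return True
    local_mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    return remote_last_modified > local_mtime


def refresh_if_stale(client, bucket, key, local_path):
    """Download the object only if its LastModified is newer than the local file.

    `client` is assumed to expose boto3-style head_object/get_object calls.
    Returns True when a new copy was written, False when the local copy was kept.
    """
    head = client.head_object(Bucket=bucket, Key=key)
    if not local_is_stale(local_path, head["LastModified"]):
        return False
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    Path(local_path).write_bytes(body)
    return True
```

The HEAD request only fetches metadata, so the full object is transferred only when the remote copy is actually newer than what is on disk.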

Regarding distributing files to specific datacenters: if you want to restrict an object to specific regions or a single region, we offer a feature that allows you to control this. Take a look at this doc.

agad · August 7, 2024, 11:02pm

How exactly does Caching on PUT (Eager Caching) work? Let’s say I update a copy of a file in Virginia, where it was originally PUT, and there is a cached copy in Europe. When I use Eager Caching, will the copy in Europe be a) invalidated on the PUT, b) replaced by the PUT, c) replaced when another read occurs in Europe, since it’s got a different timestamp, or d) left alone, so the cached copy remains until it expires?

himank · August 8, 2024, 4:34am

Both Eager Caching and cache-on-read (the default caching semantics) invalidate the cache on PUT, i.e., the cached entry in every region is invalidated when the object is written.

agad · August 8, 2024, 2:42pm

Cool, that works. One other question: the docs state that an object will be cached in a datacenter if there is enough usage. If the object is written in Virginia and a user requests it from Seattle, will the object be retrieved from Virginia and then immediately cached in Seattle after the 1st request? Or is there a gating algorithm that determines how many requests need to be made before it’s cached in Seattle?

I’ll explain the use case. We currently have a database cache that is spread across 6 locations. There is a single read-write cluster in Chicago, and we have read-only replicas in all the other regions. Tail latency between a write in CHI and its replication to the replicas is under 100ms, which is fine for our needs. When a new value is written into the cache in Chicago, it winds up available locally very quickly. I am trying to research whether Tigris would be an appropriate replacement for this cache. Using the example above, if Tigris writes an object in one datacenter and it’s requested in another datacenter, will it immediately be cached in the other datacenter, or is there a gating count of requests before it serves the object from the requesting datacenter?

himank · August 8, 2024, 6:53pm

Thank you for explaining your use case. Tigris fits nicely here: you can replace your distributed cache and simply rely on Tigris for global distribution.

Regarding your question, there is no gating count. When a request is made from a remote region, the object is cached in that region to optimize access for all future requests, and the cached copy is automatically invalidated when the object is modified.
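To make the semantics described in this thread concrete, here is a toy in-memory model. It is only an illustration of the stated behaviour (cache in the requesting region on the first read, invalidate every region on PUT), not how Tigris is implemented; the class name `RegionalCacheModel` and the region codes are hypothetical.

```python
class RegionalCacheModel:
    """Toy model: one origin store plus an independent cache per region."""

    def __init__(self):
        self.origin = {}   # key -> value (the authoritative copy)
        self.caches = {}   # region -> {key: value}

    def put(self, key, value):
        self.origin[key] = value
        # A write invalidates the cached entry in every region.
        for cache in self.caches.values():
            cache.pop(key, None)

    def get(self, region, key):
        cache = self.caches.setdefault(region, {})
        if key in cache:
            return cache[key], "hit"
        value = self.origin[key]
        # Cached after the very first remote read; there is no gating count.
        cache[key] = value
        return value, "miss"


model = RegionalCacheModel()
model.put("config.json", "v1")
print(model.get("sea", "config.json"))  # ('v1', 'miss')
print(model.get("sea", "config.json"))  # ('v1', 'hit')
model.put("config.json", "v2")
print(model.get("sea", "config.json"))  # ('v2', 'miss')
```

In this model a reader in a remote region pays the cross-region cost once per object version, which matches the "fast after the 1st request" pattern one would expect from the answer above.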

agad · August 11, 2024, 3:57am

Wow, impressive. I did some simple benchmarking, and in the datacenters where Tigris is already colocated with Fly, request times are in the 3-5ms range for a 1.2k file after the 1st request. That is a little higher than a call to our existing cache, but not by much, and it is perfect for our use case. We had set up something similar a couple of years ago with AWS Lambdas and cached files sitting on EFS and saw times in the 5-7ms range.

The other datacenters are not as good for our particular use case, but still not bad: in the 10-30ms range. Your docs say that you expect to be in all fly.io datacenters in the next several months, so I would guess that as you roll out to the other datacenters, I’d see the times come down to what I see in Chicago, VA and SGC.
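For anyone wanting to repeat this kind of measurement, a simple timing harness might look like the sketch below. It is not the benchmark used in this post; `time_requests` is a hypothetical helper, and `fetch` stands for any zero-argument callable that performs one read (for example, a wrapper around an S3-compatible GET).

```python
import statistics
import time


def time_requests(fetch, count=20):
    """Call `fetch` `count` times; return (first_ms, median_of_rest_ms).

    The first call is reported separately because it may be served from a
    remote region before the object is cached locally.
    """
    if count < 2:
        raise ValueError("count must be at least 2")
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples[0], statistics.median(samples[1:])
```

Reporting the first request separately from the median of the rest keeps the one-off cache-fill cost from skewing the steady-state number.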

himank · August 13, 2024, 4:22pm

It’s great to hear that you’re seeing good performance. Here is the list of all the regions where Tigris is currently deployed. As we expand to more Fly regions, we will keep the doc updated with the new regions.

Looking forward to seeing you use Tigris for your use case.

system · August 20, 2024, 4:22pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
