I’ve got a small app that consumes data from a Tigris-backed LSVD disk. Recently, the majority of deploys/machine starts have failed with the following error:
[info] lsvd: Failed to initialize device: failed to recover from S3: operation error S3: GetObject, failed to get rate limit token, retry quota exceeded, 9 available, 10 requested
The specific numbers cited are never exactly the same, but the outcdome generally is. The result is a dead instance of the app (the app has a hard dependency on the volume). I’ve gotten lucky with one or two deploys in the last few weeks, but that’s about it. Now, for this particular app, I don’t really have any hard requirements or expectations – the LSVD volume can take as long as it wants to come online, and I am happy to modify the app to wait for the volume to be consistent.
Is there any way to tune the retry behavior to try to eventually get the volume initialized, or otherwise influence (or even monitor) this process? The /.fly/lsvd
binary doesn’t seem to take any arguments that I’ve been able to discern.