We have been following this article to get a new Rails 7.2 app deployed on Fly with both SQLite and LiteFS.
We ran this: bin/rails generate dockerfile --litefs. On deploy, we see:
INFO Preparing to run: `litefs mount bundle exec rake solid_queue:start` as root
INFO [fly api proxy] listening at /.fly/api
2024/08/12 22:46:21 INFO SSH listening listen_address=[fdaa:0:9547:a7b:b0bb:dbc:d2b7:2]:22 dns_server=[fdaa::3]:53
Machine created and started in 5.111s
ERROR: too many arguments, specify a '--' to specify an exec command
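Our best guess from that error (unverified) is that litefs mount only treats arguments after a literal -- as the command to exec, so the worker command needs a -- in it. Something like this fly.toml sketch, depending on whether the generated Dockerfile's ENTRYPOINT already invokes litefs mount:
# fly.toml (sketch only, not our actual file)
[processes]
  app = "litefs mount -- ./bin/rails server"
  worker = "litefs mount -- bundle exec rake solid_queue:start"
# or, if the image's ENTRYPOINT already runs `litefs mount`, perhaps just:
#   worker = "-- bundle exec rake solid_queue:start"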
Something that is confusing: I am seeing a db:prepare step in both config/litefs.yml and the bin/docker-entrypoint file. From config/litefs.yml:
exec:
# Only run migrations on candidate nodes.
- cmd: "./bin/rails db:prepare"
if-candidate: true
# Then run the application server on all nodes.
- cmd: "./bin/rails server"
And from bin/docker-entrypoint:
#!/bin/bash -e
# mount litefs
sudo -E litefs mount &
# If running the rails server then create or migrate existing database
if [ "${1}" == "./bin/rails" ] && [ "${2}" == "server" ] && [ "$FLY_REGION" == "$PRIMARY_REGION" ]; then
./bin/rails db:prepare
fi
exec "${@}"
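If the exec section in config/litefs.yml really does run its commands only after the mount is ready (and db:prepare only on candidates), then as far as we can tell the db:prepare in the entrypoint is redundant. A minimal entrypoint sketch under that assumption (our guess, not the generator's layout) would let litefs.yml own everything:
#!/bin/bash -e
# Sketch (assumption): config/litefs.yml runs db:prepare on candidate nodes
# and then starts the server, so the entrypoint only mounts LiteFS in the
# foreground and never runs the container command itself.
exec sudo -E litefs mount
Though if that is right, it is still unclear to us how the separate worker process command in fly.toml is supposed to fit in.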
Any help would be greatly appreciated - excited to get this new app deployed on Fly!
It seems we are having issues getting a simple Rails app deployed with multiple processes (app, solid_queue worker).
There are a lot of LiteFS docs on Fly, but they all seem a bit out of sync and out of date. For being such a large part of the site, it feels very difficult to get set up. Is there an updated article or doc that I am missing somewhere?
Would we be able to get an updated resource or article on deploying a simple Rails app with multiple processes using SQLite/LiteFS?
Do workers need volumes as well? Should workers be able to be LiteFS primaries? We have tried the following, but our workers are still being assigned as the primary (which we are not even sure is a problem).
(Not sure this is the correct syntax to only allow machines in the app process group to be selected as primaries.)
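Roughly, the idea we are after is something like this (a sketch only; FLY_PROCESS_GROUP is set on Fly Machines, but we have not verified that LiteFS's ${...} expression expansion accepts a comparison against a literal string like this):
# config/litefs.yml (sketch)
lease:
  type: "consul"
  # Idea: only machines in the "app" process group are eligible to become
  # the LiteFS primary; worker machines stay read-only replicas.
  candidate: ${FLY_PROCESS_GROUP == "app"}
  promote: true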
Another question with SQLite and LiteFS: why does the ${FLY_REGION == PRIMARY_REGION} condition ever matter?
Also, we had issues with Consul. To correct them without simply appending characters to the key, like key: "litefs/${FLY_APP_NAME}-v2", we had to SSH into a random machine, manually install the Consul CLI, and then manually destroy the key. This seems like something that should be possible via the Fly CLI? Or is there a way to have Consul better handle zombie primary cluster IDs?
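For anyone else hitting this, what we ran from inside the machine was roughly the following (from memory; the Consul address, token, and key prefix all come from the FLY_CONSUL_URL value on the machine, so treat this as a sketch rather than an exact recipe):
# With CONSUL_HTTP_ADDR and CONSUL_HTTP_TOKEN set from the values embedded
# in FLY_CONSUL_URL, delete the stale LiteFS lease key so a fresh cluster ID
# can be created on the next deploy.
consul kv delete "litefs/${FLY_APP_NAME}"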
Hi… These are many good observations, and generally they are classic initial pain points with LiteFS (which, broadly speaking, seems stuck in pre-v1.0). Without claiming to address them comprehensively…
I would like to see this, as well. You can vote for it and/or chime in with your own perspective in the following docs feedback thread:
So workers can execute code that would cause writes. In this case, are you saying we should not be using multiple Fly process groups? If so, are we losing out on the horizontal scaling of adding more workers, etc.?
I guess this is all just very confusing as to how it should be set up for a standard application with a simple Solid Queue worker using SQLite. The idea of using SQLite to simplify things is beginning to feel more complex, fragile, and difficult to manage than Postgres…
What doesn't make sense to me is that everything is moving more towards SQLite (i.e. Solid Queue and Solid Cache in Rails), so why is this so hard to deploy on Fly?
For a simple app, we should be able to have multiple app processes with LiteFS; it is just very unclear how to do this on Fly at the moment. We are not trying to do anything complex, simply to use SQLite. Is there a better alternative to LiteFS?
This has to be the most annoying part when trying to get this stuff set up:
2024-08-13T18:38:41.379 app[6e825496a57568] iad [info] level=INFO msg="cannot find primary, retrying: no primary"
2024-08-13T18:38:42.384 app[6e825496a57568] iad [info] level=INFO msg="cannot become primary, local node has no cluster ID and \"consul\" lease already initialized with cluster ID LFSCEB6A8B19657B8393"
There is no easy way to solve this problem; why can things get stuck in this bad state?
Here is the config/litefs.yml exec section we have been trying:
exec:
# Only run migrations on candidate nodes.
- cmd: "./bin/rails db:prepare"
if-candidate: true
# Then run the application server on all nodes.
- cmd: "./bin/rails server"
# Only run workers on candidate nodes.
- cmd: "./bin/bundle exec rake solid_queue:start"
if-candidate: true
This doesn't work…
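One thing we are considering instead (we have not tried it yet, so take it as a sketch) is Solid Queue's Puma plugin, which runs the workers inside the web server process so there is only a single process touching the database:
# config/puma.rb (sketch; the plugin ships with the solid_queue gem)
plugin :solid_queue
That would mean giving up the separate worker process group, but it would at least keep all writes on the one machine running the server.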
It also makes no sense that a candidate is defined just by being in a "primary region"… What if there are multiple machines in that same region? Can there be multiple "candidates"? If so, how would it only run on read/write machines?
All this LiteFS stuff makes zero sense, and it simply doesn't work.
LiteFS seems to be advertised as giving you read/write replicas anywhere you want, running on the local app machine, but it simply doesn't do this.
I am at a loss as to why it even exists, to be completely honest.
This is all compounded by tons of inconsistent and unclear documentation on how to use LiteFS with multiple Fly process groups that both need to read and write (Puma server and solid_queue workers).
The tl;dr is that using LiteFS with worker instances is not a good setup. LiteFS works well when you can fly-replay HTTP requests that do writes; it's less useful for running a bunch of processes that need to write to SQLite.
If you want to run workers that also use LiteFS, you'll need to come up with some mechanism for sending writes to the primary instance. I probably wouldn't bother doing this, though; it's not the sweet spot.
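To make the fly-replay idea concrete, here is a rough Rack middleware sketch (illustrative only; the fly-replay header and the region env vars are Fly's, everything else here is made up for the example):
# Sketch: bounce write requests to the primary region via Fly's fly-replay
# response header, so only the LiteFS primary ever handles writes.
class ReplayWritesToPrimary
  WRITE_METHODS = %w[POST PUT PATCH DELETE].freeze

  def initialize(app)
    @app = app
  end

  def call(env)
    if WRITE_METHODS.include?(env["REQUEST_METHOD"]) && !primary?
      # Fly's proxy sees this header and replays the request on a machine
      # in the primary region.
      return [409, { "fly-replay" => "region=#{ENV["PRIMARY_REGION"]}" }, []]
    end
    @app.call(env)
  end

  private

  def primary?
    ENV["FLY_REGION"] == ENV["PRIMARY_REGION"]
  end
end
You'd insert it early in the Rails middleware stack, but note it only covers HTTP requests; background jobs would still need to run on the primary.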
This is just a limitation of LiteFS and of using SQLite directly. For what you're doing, I expect you'll have much more success with https://turso.tech/
Thanks so much for your reply. I think the main confusion and frustration here comes from it being difficult to completely understand how LiteFS works.
If I am being honest, I pretty much thought that LiteFS basically just kept a local copy of the database file on all machines: if a write happened anywhere, it would make sure that write was persisted on all other nodes, meaning every machine was read/write capable and every machine had a local copy of the DB. Meaning that we could basically scale infinitely, horizontally as well as geographically.
We have looked a bit into Turso, but they have little to no Ruby/Rails documentation, plus it just feels like a database-as-a-service similar to Supabase, and there is no local copy of the database?
We have migrated this simple project back to Postgres to move things along.
There's no RoR SDK yet, but it's on their todo list. For now, you'll need to interface via HTTP.
They have embedded replicas, which sync the remote DB to your local FS for sub-ms reads.