I’ve been using KeyDB for a low-traffic site for about a year, IIRC. For the past week or two (I can’t be exactly sure when it started), the KeyDB servers have been failing. There’s nothing in the log output to show why, and they aren’t hitting any memory limits, according to the metrics I can see in the dashboard.
I’ve just executed fly vm status --app {my_app} {instance_id} and see this:
Instance
ID = e28bXXXX
Process =
Version = 1
Region = ams
Desired = run
Status = running
Health Checks = 2 total, 2 critical
Restarts = 1
Created = 12h34m ago
Recent Events
TIMESTAMP TYPE MESSAGE
2021-10-11T18:30:00Z Received Task received by client
2021-10-11T18:30:00Z Task Setup Building Task Directory
2021-10-11T18:30:08Z Started Task started by client
2021-10-12T05:20:40Z Restart Signaled User requested restart
2021-10-12T05:20:44Z Terminated Exit Code: 0
2021-10-12T05:20:44Z Restarting Task restarting in 0s
2021-10-12T05:20:49Z Started Task started by client
Checks
ID SERVICE STATE OUTPUT
27913efa8328a926c127fdb9XXXXXXXX tcp-6379 critical rpc error: code = Unknown desc = Post "http://unix/v1/exec": EOF
8d0f2f31129f763f17b124b7XXXXXXXX tcp-6379 critical rpc error: code = Unknown desc = Post "http://unix/v1/exec": EOF
I’ve tried recreating the cluster using the launcher, but the problem persists.
I’ve also tried creating a Redis instance following the GitHub example, but that also seems to crash as soon as a client connects.
I don’t think I’m doing anything out of the ordinary. If the cache (either Redis or KeyDB) doesn’t crash on connection, then it crashes on the first command, such as MEMORY STATS. I’ve tried connecting with different versions of Redis clients; it doesn’t help.
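For anyone debugging something similar: a minimal stdlib-only probe can separate client-library problems from networking ones. This is just a sketch (the hostname you pass in would be your own `.internal` name); it opens a raw TCP connection and sends a RESP inline PING, so if this fails too, the problem is DNS/routing or the server, not the Redis client library.

```python
import socket

def redis_ping(host, port=6379, timeout=5):
    """Send a raw RESP PING and return the server's reply as text.

    Stdlib only, so any failure here is DNS, routing, or server-side,
    not the Redis client library.
    """
    # getaddrinfo returns AAAA results as well as A records, which
    # matters on Fly's IPv6-only .internal network.
    for family, socktype, proto, _, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
            sock.sendall(b"PING\r\n")      # RESP "inline command" form
            return sock.recv(64).decode()  # a healthy server replies "+PONG\r\n"
```

Against a healthy server this should return `+PONG\r\n`; against my broken setup I’d expect a timeout or connection refusal instead.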
I need KeyDB (or Redis) as an LRU cache that persists between app restarts/updates, so moving the cache into my app would be counterproductive.
I’ve also tried following the newer Redis cluster example, but no joy. It’s as if the .internal routing for my app isn’t working. I’m using the lhr and ams regions, if that helps.
Ah, that would explain it! The launcher uses our old ‘script’ health checks, which run scripts on the VM to check its health. We’ve deprecated that behavior. So the best way to get things working would be to use the updated GitHub example: fly-apps/keydb (KeyDB server on Fly).
You could grab this and change the values in fly.toml to match your app. The important part is making sure the volume names are the same. You can check those with fly volumes list -a keydb-app-name.
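The relevant part of fly.toml looks roughly like this (app and volume names below are placeholders; check the example repo for the exact file, which may differ):

```toml
app = "keydb-app-name"    # your app's name

[mounts]
  source = "keydb_data"   # must match an existing volume name exactly
  destination = "/data"   # where KeyDB persists its data
```

If `source` here doesn’t match a volume listed by fly volumes list, the VM won’t get its persistent disk.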
We’d like to allow flyctl to run launchers like this for services that aren’t as fully formed as fly postgres. That would likely pull config from our up-to-date example repo instead of hiding the behavior behind a UI.
That said, we want to keep the UI and have it work in a similar way as flyctl. We’re working on an Elixir/Phoenix launcher now which will serve as a template for this. Stay tuned!
Sorry to say, this also hasn’t worked. Same as the Redis issue: the .internal URL doesn’t seem to route successfully. The Redis client can’t connect, and the KeyDB server doesn’t show a client connection.
What Redis client are you using? Can you double-check that it (a) can connect over IPv6 and (b) will do an AAAA lookup on a hostname? Neither worked by default with libcluster or Ecto, so it wouldn’t surprise me if a Redis client on Elixir has the same issue.
It would be worth posting an issue on the Redis client’s repo to ask about IPv6 support. I would bet money (a small amount) that it requires a configuration option to work properly over IPv6.
Will do, though I did check docs/issues/ddg and couldn’t find a single mention of IPv6. I’ll also do some more testing locally against the public IPv6 of my working cache to confirm it is an issue with Redix.