Issues connecting to Consul with LiteFS

Hi,

We did a new deploy this morning on one of our applications that uses LiteFS with a consul lease setup. On startup, the app halts with:

2023-12-07T14:14:29.445 app[4d891155f62728] yyz [info] config file read from /etc/litefs.yml
2023-12-07T14:14:29.445 app[4d891155f62728] yyz [info] LiteFS v0.5.9, commit=d861e9176990746c64a804d831ded9793f06ab9d
2023-12-07T14:14:29.445 app[4d891155f62728] yyz [info] level=INFO msg="host environment detected" type=fly.io
2023-12-07T14:14:29.446 app[4d891155f62728] yyz [info] level=INFO msg="litefs cloud backup client configured: https://litefs.fly.io"
2023-12-07T14:14:29.446 app[4d891155f62728] yyz [info] level=INFO msg="Using Consul to determine primary"
ERROR: cannot init consul: cannot connect to consul: register node "appname-****/litefs": Put "https://consul-iad-5.fly-shared.net/v1/catalog/register": EOF

We’ve been running for months and this is the first time I’ve seen this particular issue. Is there a service issue with Consul in iad at the moment, or is anyone else experiencing this?

Thanks!
Dave

We just started seeing this issue this morning as well. Machines that have been running for months are unable to boot and connect to a consul URL. One of our database servers is in an endless reboot cycle.

The server behind the Consul URL appears to be closing the connection without sending any response. Chrome reports “consul-iad-5.fly-shared.net unexpectedly closed the connection.” The logs show “EOF” for us, too.

In our case, we’re seeing it on https://consul-iad-5.fly-shared.net/… same as you.
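
For anyone who wants to check this from their own machine rather than a browser, here is a minimal sketch of a probe that tells a connection closed without a response apart from a timeout or an ordinary HTTP error. The function name `probe` and the result strings are my own (hypothetical), and it only sends a plain GET; it does not reproduce the PUT that LiteFS makes when it registers.

```python
import http.client
import socket
import ssl


def probe(host, port=443, path="/", use_tls=True, timeout=5.0):
    """Send one GET and classify how the server answered (or failed to).

    The returned labels are illustrative names, not anything Consul defines.
    """
    conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return f"http_{resp.status}"
    except (ConnectionResetError, BrokenPipeError, ssl.SSLEOFError):
        # The peer dropped the connection before sending a response.
        # http.client.RemoteDisconnected is a ConnectionResetError subclass.
        return "closed_without_response"
    except socket.timeout:
        return "timeout"
    except OSError:
        # DNS failure, connection refused, other TLS errors, and so on.
        return "connect_failed"
    except http.client.HTTPException:
        return "protocol_error"
    finally:
        conn.close()
```

Calling `probe("consul-iad-5.fly-shared.net")` and getting `closed_without_response` would match the “EOF” in the logs, whereas any `http_…` result means the server is at least answering requests.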


Well, glad it’s not just us then. Hopefully some fly.io folks notice the thread and can chime in. I’m reluctant to do any deploys at the moment since I don’t know if it’ll break our production systems.

Thanks,
Dave

Yeah, I wouldn’t. Right now I don’t believe any new servers can start (or restart) successfully.

The server that is not responding is (presumably) a HashiCorp Consul instance, which organizes networking configuration. Apps/servers register with it when they come online so that it can route traffic to them. If it’s offline… no registration, no routing.

We’re looking into this now, thanks for bringing it up.

Daniel

I believe we found and resolved the issue with consul-iad-5. Let me know if you’re continuing to have problems with it.

I restarted our dev servers and they came back up without any issues. We’ll do a few more test deploys before doing a prod release, but so far so good. Thanks!
