Struggling to get CouchDB running

Hi, I’m trying to get CouchDB running on Fly.io, but haven’t gotten very far. I’m using this for my fly.toml.

app = "kitty-couchdb"
primary_region = "iad"

[build]
  image = "couchdb:3"

[mounts]
  source = "couchdb_data"
  destination = "/opt/couchdb/data"

[[services]]
  internal_port = 5984
  protocol = "tcp"

I don’t have a public IP assigned to this app because I don’t want it publicly accessible, but I have assigned a private IP for Flycast. I can’t figure out any way to connect to it from my local machine. I’ve set up a WireGuard tunnel to Fly.io. I can query DNS and find the IP addresses of the 3 machines I have running CouchDB. But going to any of those IPs on port 5984 in Arc (a Chromium fork) or Safari doesn’t work. Using kitty-couchdb.internal:5984 also doesn’t work, and it’s the same for Flycast. I tried using curl in the terminal and that didn’t work either.
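
For reference, the checks I’m running from my local machine over the tunnel look roughly like this (the DNS server comes from my WireGuard config, and the hostnames assume the app name above):

dig AAAA kitty-couchdb.internal +short
curl -v http://kitty-couchdb.internal:5984/_up
curl -v http://kitty-couchdb.flycast:5984/_up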

Looking at the logs, everything seems normal. There are some repeated errors about a missing _users database, but from what I can tell that’s expected. If I ssh into a machine and run curl http://localhost:5984/_up, everything is OK.

I did look at Can't get a CouchDB Cluster working (connection_closed), but it didn’t help.

Since you don’t want the app publicly accessible, you will want to delete the [[services]] section in your fly.toml.

I suspect you will need a Dockerfile. I tried loading just the base image:

% fly console --image couchdb:3 -C /opt/couchdb/bin/couchdb
Searching for image 'couchdb:3' remotely...
image found: img_8rlxp26qe3yv3jqo
Image: registry-1.docker.io/library/couchdb:3
Image size: 90 MB

Created an ephemeral machine 1857703c445d78 to run the console.
Connecting to fdaa:0:cfd4:a7b:db:c2c0:798c:2... complete
[info] 2023-10-08T02:23:44.716475Z nonode@nohost <0.248.0> -------- Preflight check: Checking For Monsters

[info] 2023-10-08T02:23:44.718172Z nonode@nohost <0.248.0> -------- Preflight check: Asserting Admin Account

[info] 2023-10-08T02:23:44.718229Z nonode@nohost <0.248.0> -------- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  No Admin Account Found, aborting startup.                  
  Please configure an admin account in your local.ini file.  
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Waiting for ephemeral machine 1857703c445d78 to be destroyed ... done.
Error: ssh shell: Process exited with status 1

Experimenting with:

% fly console --image couchdb:3 -C /bin/bash

I installed vim, copied default.ini to local.ini in /opt/couchdb/etc and then added:

[admins]
admin = password

With this in place, running /opt/couchdb/bin/couchdb got a bit further but complained about a _users database not being present. If you create a Dockerfile, you can run it using:

fly console --dockerfile Dockerfile -C /bin/bash
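
As a rough sketch, the Dockerfile could be as small as this (untested; local.ini here is the file with the [admins] section from above, and local.d is where the official image picks up extra config):

FROM couchdb:3
COPY local.ini /opt/couchdb/etc/local.d/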

Once you get to the point where CouchDB launches, you can remove the [build] section in your fly.toml and switch back to fly deploy.

Once your instances deploy successfully, try:

fly console --image debian:bullseye-slim -C /bin/bash

From there, install and run curl to verify that you can access your database.
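
For example (the hostname assumes the app name from the fly.toml above):

apt-get update && apt-get install -y curl
curl http://kitty-couchdb.internal:5984/_up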

Once that works, try WireGuard.

Forgot to mention: I solved the first issue by setting the admin credentials with fly secrets. The image takes COUCHDB_USER and COUCHDB_PASSWORD environment variables.
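
For anyone else hitting this, the command was roughly (values are placeholders):

fly secrets set COUCHDB_USER=admin COUCHDB_PASSWORD=<a-real-password> -a kitty-couchdb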

I was able to get almost everything working. I’m able to connect to all 3 nodes I have running over WireGuard and look at them in Fauxton. When I try to set up replication between them, though, it always fails with what looks like a DNS error.


Replicating from 5683d554a54dd8 to e784936a2e2768:

2023-10-10T06:08:43.758 app[5683d554a54dd8] iad [info] [error] 2023-10-10T06:08:43.757754Z nonode@nohost <0.2431.0> --------
couch_replicator_httpc: auth plugin initialization failed "http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/"
{session_request_failed,"http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/_session","admin",{conn_failed,{error,nxdomain}}}

2023-10-10T06:08:43.758 app[5683d554a54dd8] iad [info] [error] 2023-10-10T06:08:43.758155Z nonode@nohost <0.2431.0> --------
throw:{replication_auth_error,{session_request_failed,"http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/_session","admin",{conn_failed,{error,nxdomain}}}}:
Replication 23d38dc9a492f332fe6f5f3c383818e6+continuous failed to start "http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/" -> "http://e784936a2e2768.vm.kitty-couchdb.internal:5984/kitty/"
doc <<"shards/80000000-ffffffff/_replicator.1696915972">>:<<"3b1b63525691fbb102f7063bee00027c">>
stack:[{couch_replicator_httpc,setup,1,[{file,"src/couch_replicator_httpc.erl"},{line,62}]},
{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,68}]}]

(added some line breaks for readability)

Based on the output I’d think it’s something about auth, but I’ve re-entered the password for the database being replicated from several times and I’m pretty sure it’s correct. I don’t really have any other ideas.
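
For reference, the replication I’m setting up in Fauxton should be equivalent to a _replicator document roughly like this (credentials redacted; URLs are the ones in the log above):

{
  "source": "http://admin:****@5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/",
  "target": "http://e784936a2e2768.vm.kitty-couchdb.internal:5984/kitty/",
  "continuous": true
}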

Some new info. I found this GitHub issue and switched away from using session-based auth with the config change mentioned in it.
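
If I’m reading the issue right, the change amounts to dropping the session auth plugin from the replicator’s plugin list, e.g. in local.ini:

[replicator]
auth_plugins = couch_replicator_auth_noop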

When using this, I now get a different error.

2023-10-10T15:37:03.329 app[5683d554a54dd8] iad [info] [error] 2023-10-10T15:37:03.329220Z nonode@nohost <0.2198.1> --------
Replicator, request GET to "http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/" failed due to error
{error,{conn_failed,{error,nxdomain}}}

2023-10-10T15:37:03.330 app[5683d554a54dd8] iad [info] [error] 2023-10-10T15:37:03.329678Z nonode@nohost <0.2198.1> --------
exit:{nxdomain,<<"could not resolve http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/">>}: Replication c69137545c9630b0fc5f26941816e28b+continuous+create_target failed to start
"http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/kitty/" -> "http://e784936a2e2768.vm.kitty-couchdb.internal:5984/kitty/" doc <<"shards/80000000-ffffffff/_replicator.1696915972">>:<<"f0283eeea4248b169847e01f30001169">>
stack:[{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,122}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,634}]}]

(added some line breaks for readability)

If I fly console into that machine (5683d554a54dd8), I’m able to curl it just fine. Here it is using the /_up endpoint:

root@5683d554a54dd8:/# curl http://5683d554a54dd8.vm.kitty-couchdb.internal:5984/_up
{"status":"ok","seeds":{}}

Are you still getting these nxdomain errors?

If so, it might be interesting to use Wireshark to inspect the full request packets:

apt-get install --no-install-recommends tshark dnsutils
tshark -i eth0 -f "port 53"

Then, in a separate ssh session:

dig AAAA fly.io

You should see lines like the following:

59 498.862588529 fdaa:3:3169:a7b:21a1:56cf:c280:2 → fdaa::3 DNS 109 Standard query 0x3019 AAAA fly.io OPT
60 498.864110074 fdaa::3 → fdaa:3:3169:a7b:21a1:56cf:c280:2 DNS 125 Standard query response 0x3019 AAAA fly.io AAAA 2a09:8280:1::a:791 OPT

The first is the request, and the second the response—with addresses in <source> → <destination> format. [It always surprises me initially to see those two get swapped in the second line, but it does make sense; for responses, it’s the server—fdaa::3—that’s sending.]

The part at the end of the first line, after the AAAA, might be illuminating in the case of the nxdomaining query. Perhaps there are lingering glitches in how IPv6-related things are being parsed/reconstructed.

Another possibility is a bypass of the fdaa::3 resolver, generally similar to what was mentioned in the following:

“Some apps do their own DNS resolution and know nothing of the wireguard names”

In that case, it would be → 1.1.1.1 or the like in the request.
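
A quick way to check for that from inside the machine is to look at the resolver config (on Fly Machines it normally points at the internal DNS):

cat /etc/resolv.conf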

Anyway, hope this helps!

Hi! Sorry, I forgot to reply to this. I was able to fix it with the help of some CouchDB contributors here.

Apparently the latest CouchDB release doesn’t really support IPv6-only networking, but building from source with a config option fixed it, and from there everything just worked.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.