Wow! Just went through the entire guide and, much to my surprise, everything just worked! Thanks for a superb guide with great explanations. It was fairly easy to follow, and I really liked that you took the time to explain the different steps.
It would be great to see an example/tutorial of how to configure libcluster as well and deploy the app as a distributed Elixir cluster.
Thanks @mscno! I’m glad to hear it worked for you! I love the idea of going into clustering, but that felt like it would overwhelm the guide. I should write that up separately and link to it.
@Mark, thanks for the guide. Surprisingly easy to get clustering set up!
A couple of minor things:
1. Although likely obvious to many, might it be helpful to state when the second fly deploy is required, after the libcluster config has been added?
2. When deploying or re-deploying, although the nodes are connected (I checked by accessing the interactive shell as per the guide), I get warnings in the logs (datacentre name made bold here for ease of reading):
2021-06-22T13:04:55.765712833Z app[47501f25] ams [info] 13:04:55.762 [warn] [libcluster:fly6pn] unable to connect to :"{app name}@{ip address}"
2021-06-22T13:04:55.816219860Z app[47501f25] ams [info] 13:04:55.813 [warn] [libcluster:fly6pn] unable to connect to :"{app name}@{ip address}"
Is this expected? If so, perhaps it’s worth mentioning them so that they don’t cause undue concern.
It seems like the connection “failure” warnings and errors are all in regard to connection from one of the two nodes to the other, and not in the other direction, so I’m thinking these warnings may occur because a connection has already been made.
Edit: Erlang 24.0.2 is now available, in case you want to update the guide to use that.
Thanks @DavidO for the feedback! Your #1 is an easy one to do and a good idea.
As for #2, there are a couple things that can be at play here. If the clustering cookie isn’t explicitly given, then it is re-generated during the build/deploy process. This means the new nodes can’t connect to the old nodes because the cookies don’t match. It might also be that a shutting down node is rejecting connections because it is shutting down. When deploying a new release, it is common to see some log churn about connections being broken and rejected.
Hi @Mark I have some feedback on the static cookie and observer guides, hope this is the right place for them.
First of all, thanks a lot for writing them! I learned a lot getting my Elixir app running on Fly and even found some bugs in how I was handling configuration for my Gigalixir setup.
On to the feedback:
You suggest setting the cookie in the releases function in mix.exs. I tried to avoid this as I didn’t want the shared secret to be in my code repository. I know that we’re relying on the WireGuard tunnels for security more than on this static cookie, but it still felt like bad form to me.
In the docs I saw:
At runtime, we will first attempt to fetch the cookie from the RELEASE_COOKIE environment variable and then we’ll read the releases/COOKIE file.
I set the RELEASE_COOKIE using flyctl secrets set and everything worked perfectly!
I think this would make the process a little easier, and can be done in the same step in the guide where we set SECRET_KEY_BASE.
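As a rough sketch of that approach, the following shell snippet generates a random value and stores it as a Fly secret. The helper names generate_cookie and set_release_cookie are hypothetical, and using 32 random bytes rendered as hex is an assumption; the only parts taken from this thread are the RELEASE_COOKIE variable and the flyctl secrets set command.

```shell
#!/bin/sh

# Hypothetical helper: print 64 hex characters read from /dev/urandom.
# Assumption: any sufficiently random alphanumeric string is acceptable as a cookie.
generate_cookie() {
  od -An -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: store the given value as the RELEASE_COOKIE secret.
# Requires flyctl to be installed and the current directory to contain fly.toml.
set_release_cookie() {
  flyctl secrets set RELEASE_COOKIE="$1"
}
```

Running set_release_cookie "$(generate_cookie)" once, alongside the step that sets SECRET_KEY_BASE, should keep the cookie out of the repository while keeping it stable across deploys.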
Secondly, this is very minor, but the observer script should start with #!/usr/bin/env bash for portability (I’m a NixOS user and have been trained to always do this).
Finally a question: I can connect to IEx by either SSHing in or connecting using distributed Elixir like we do for observer.
Hi @Mark, thanks for the guide. I’m unsure why, but after cloning the sample repo (changing nothing at all) and following the guide, the app won’t deploy.
I have tested in both the Sydney and Virginia (the default) regions.
Here are the final lines of the deploy:
Running: `/app/bin/hello_elixir eval HelloElixir.Release.migrate` as nobody
2021/07/08 00:38:42 listening on [fdaa:0:2f24:a7b:ad6:f8da:1834:2]:22 (DNS: [fdaa::3]:53)
: not foundpp/bin/hello_elixir: /app/releases/0.1.0/env.sh: line 2:
Main child exited normally with code: 127
Starting clean up.
Error Release command failed, deployment aborted
Any thoughts on how to get this deployed, or whether I need to make changes to the repo before it’ll work?
Hey @brainlid, generated name is crimson-forest-7023. Happy to supply more logs too if that’d help.
I also tried building the docker image and that’s possibly where the problem is? It built successfully but wouldn’t let me run the container (although could start the container in the ash shell per your comment here).
The main difference I see in your logs is this line:
/app/bin/hello_elixir: /app/releases/0.1.0/env.sh: line 2:
: not found
The env.sh file is generated when building a release from the env.sh.eex file.
Going by the error “not found”, I can’t tell if the file wasn’t found or there was an error performing a command from that file.
One thing you can try is to SSH into the app (if that works given its state) and see whether the file /app/releases/0.1.0/env.sh exists and, if so, cat it to check that it was generated correctly.
I assume you changed the fly.toml file. It needs to have your app’s name.
@Mark the fly.toml has app = "crimson-forest-7023"; that’s the generated name it gave. Tested changing it to app = "hello_elixir" but that gives Error not possible to validate configuration: server returned Could not resolve App so I don’t expect that’s what you meant. Just did flyctl launch again and it’s now named broken-snow-4053.
It must be failing too early in the deploy to let me SSH into it; returns Error connect to SSH server: dial: lookup broken-snow-4053.internal. on fdaa:0:2f24::3: no such host
If I build the Docker container locally and try to run it directly, it exits with a similar error, returning : not foundlixir: /app/releases/0.1.0/env.sh: line 2: (looks like it overwrites part of the message somehow).
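One possible explanation for the overwritten message, not confirmed anywhere in this thread, is that env.sh contains carriage returns (CRLF line endings). A line holding only a carriage return is run as a command named "\r", and printing that name moves the cursor back to the start of the line, so ": not found" lands on top of the path. A minimal sketch for checking a file follows; the helper name has_crlf and the demo file contents are made up for illustration.

```shell
#!/bin/sh

# Hypothetical helper: succeed if the file contains at least one carriage return.
has_crlf() {
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Demo on a throwaway file written with CRLF line endings.
demo=$(mktemp)
printf '#!/bin/sh\r\n\r\nexport X=1\r\n' > "$demo"

if has_crlf "$demo"; then
  echo "carriage returns found"
fi

# Strip the carriage returns into a cleaned copy and check again.
tr -d '\r' < "$demo" > "$demo.fixed"
if ! has_crlf "$demo.fixed"; then
  echo "cleaned copy has none"
fi
```

If the check is positive on rel/env.sh.eex in the cloned repo, the checkout's line-ending settings would be the place to look.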
Connecting directly with docker run -it --rm hello_elixir /bin/ash and using cat on the env.sh file as requested, looks normal to me:
/app $ cat releases/0.1.0/env.sh
#!/bin/sh
# Sets and enables heart (recommended only in daemon mode)
# case $RELEASE_COMMAND in
# daemon*)
# HEART_COMMAND="$RELEASE_ROOT/bin/$RELEASE_NAME $RELEASE_COMMAND"
# export HEART_COMMAND
# export ELIXIR_ERL_OPTIONS="-heart"
# ;;
# *)
# ;;
# esac
# Set the release to work across nodes. If using the long name format like
# the one below (my_app@127.0.0.1), you need to also uncomment the
# RELEASE_DISTRIBUTION variable below. Must be "sname", "name" or "none".
ip=$(grep fly-local-6pn /etc/hosts | cut -f 1)
export RELEASE_DISTRIBUTION=name
export RELEASE_NODE=$FLY_APP_NAME@$ip
export ELIXIR_ERL_OPTIONS="-proto_dist inet6_tcp"
Not sure if it’s relevant, but I couldn’t run the env.sh file due to missing permissions (could well just be a Docker quirk that doesn’t affect this, though).
@samw Locally running the Docker container will fail because ENV settings like $FLY_APP_NAME won’t exist.
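To illustrate, here is a small sketch that mirrors the ip= and RELEASE_NODE lines from the env.sh shown earlier. The function name release_node and the sample hosts file are hypothetical; the address is the one from the deploy log above, and the tab-separated layout of /etc/hosts is an assumption implied by the use of cut -f 1.

```shell
#!/bin/sh

# Hypothetical helper: build the node name the way env.sh does.
# $1 = path to a hosts file, $2 = app name (what $FLY_APP_NAME would hold).
release_node() {
  ip=$(grep fly-local-6pn "$1" | cut -f 1)
  printf '%s@%s\n' "$2" "$ip"
}

# Sample hosts file standing in for /etc/hosts on a Fly instance (assumed format).
sample_hosts=$(mktemp)
printf 'fdaa:0:2f24:a7b:ad6:f8da:1834:2\tfly-local-6pn\n' > "$sample_hosts"

# On Fly: both pieces are present.
release_node "$sample_hosts" "crimson-forest-7023"

# Locally: no fly-local-6pn entry and no app name, so only "@" is left.
release_node /dev/null ""
```

The second call shows why a plain local docker run ends up with an unusable node name unless those values are supplied by hand.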
When you run flyctl launch, it generates the app name for you, but it also regenerates the fly.toml file. My point about that file is that it needs to be different: it needs your app name, but it gets overwritten when you generate an app name, so you have to edit some of the settings back in.
When working with a cloned repo, that’s the one file where it becomes a little sticky.
This should work with no changes? That’s the impression I get from the guide; I’ve double-checked and the generated fly.toml uses the existing settings so the only change that flyctl launch makes is to app at the top (confirmed in git diff).
Do I have to make any changes or should those 4 commands in a row just work?
1. flyctl launch. Don’t deploy yet. Keep the generated app name in fly.toml but revert the other changes.
2. Go through the Preparing to Deploy section. You need a database to connect to and the ENV values set for your app, both for the secret generated by mix phx.gen.secret and for the database connection.
3. flyctl deploy
There is a bit of a dance the first time you deploy an app that uses a database, because the database app must exist before the app can start. The step that sets the output of mix phx.gen.secret as a secret will appear to fail when you first run it because the app can’t be deployed yet. But you just want the ENV set, and that works.
I just went through the process and cloned, launched and deployed so otherwise it should all work.