FLAME + DNS Clustering

Hello everyone,

I am currently trying to implement FLAME in my application.

When I deploy my application on Fly and execute the process, the runner node cannot connect to my parent node for some reason.

I get the following error:

2024-03-08T08:47:52Z runner[683ddd1bd44998] fra [info]Pulling container image registry.fly.io/nexus-staging-86d65fe3:deployment-01HREKS084A53NRA1SS94CD7N3
2024-03-08T08:47:57Z runner[683ddd1bd44998] fra [info]Successfully prepared image registry.fly.io/nexus-staging-86d65fe3:deployment-01HREKS084A53NRA1SS94CD7N3 (4.394211345s)
2024-03-08T08:47:59Z runner[683ddd1bd44998] fra [info]Configuring firecracker
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info][    0.146717] Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info][    0.156593] PCI: Fatal: No config space access function found
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info] INFO Starting init (commit: 913ad9c)...
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info] INFO Preparing to run: `/app/bin/server` as nobody
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info] INFO [fly api proxy] listening at /.fly/api
2024-03-08T08:47:59Z app[683ddd1bd44998] fra [info]2024/03/08 08:47:59 listening on [fdaa:2:be18:a7b:b5:c76f:b221:2]:22 (DNS: [fdaa::3]:53)
2024-03-08T08:47:59Z runner[683ddd1bd44998] fra [info]Machine created and started in 6.867s
2024-03-08T08:48:04Z app[683ddd1bd44998] fra [info]08:48:04.398 [info] starting with parent %FLAME.Parent{pid: #PID<72860.4510.0>, ref: #Reference<72860.717919452.1950613505.19621>, backend: FLAME.FlyBackend}
2024-03-08T08:48:04Z app[683ddd1bd44998] fra [info]08:48:04.403 [info] connect (1) :"nexus-staging-86d65fe3-01HREKS084A53NRA1SS94CD7N3@fdaa:2:be18:a7b:250:6dec:7af5:2": false
2024-03-08T08:48:04Z app[d89d971b62d008] fra [info]08:48:04.402 [error] ** Connection attempt from node :"nexus-staging-86d65fe3-01HREKS084A53NRA1SS94CD7N3@fdaa:2:be18:a7b:b5:c76f:b221:2" rejected. Invalid challenge reply. **

I managed to make it work by removing RELEASE_COOKIE from my fly.toml configuration.

Somehow the DNS clustering conflicts with FLAME and I am not sure why.

DNS Clustering has been disabled in my application.ex as well.

def start(_type, _args) do
  flame_parent = FLAME.Parent.get()

  children = [
    ....
    !flame_parent && {DNSCluster, query: Application.get_env(:nexus, :dns_cluster_query) || :ignore}
    ...
  ]
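The conditional-child pattern above can be sketched in isolation as follows. This is a minimal sketch, not the poster's actual supervision tree: `flame_parent` is hard-coded as a stand-in for `FLAME.Parent.get()` (which returns `nil` when the node is not a FLAME runner), `Nexus.ExampleRegistry` is a hypothetical name, and the `Enum.filter/2` step is an assumption about what the elided part of the snippet does. Without it, a runner would end up with a literal `false` in its children list.

```elixir
# Stand-in for FLAME.Parent.get/0: nil on the parent application,
# a %FLAME.Parent{} struct on a FLAME runner.
flame_parent = nil

children =
  [
    # Hypothetical child that should run on both parents and runners.
    {Registry, keys: :unique, name: Nexus.ExampleRegistry},
    # Evaluates to `false` on a runner, so it is filtered out below.
    !flame_parent && {DNSCluster, query: :ignore}
  ]
  |> Enum.filter(& &1)

# Only builds and prints the list; nothing is started here.
IO.inspect(children, label: "children")
```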

How were you setting your release cookie in the fly.toml? It sounds like the release cookie didn’t match, which would be why the connection was refused. This issue is unrelated to DNSCluster. Note that you don’t need DNSCluster running on the FLAME children, because they connect directly to their parent, which provides mesh clustering, so you rightfully have it disabled.

By default, the release cookie is baked into the release: your build process (mix release) writes the _build/prod/rel/app/releases/COOKIE file, which ends up in the image, so the cookie matches for FLAME parents and children without configuring anything. If you override the cookie, you’ll need to ensure the FLAME children receive it via :env options to the pool, but using the one baked into the image is usually what you want.

I am setting the release cookie in fly.toml.

I have followed the instructions here:

For instance, if I connect to the deployed application and call Node.get_cookie(), it shows the correct value (set in fly.toml).
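To compare the cookies involved, something like the following could be pasted into a remote IEx session on the parent. It is only a sketch: it assumes the default release layout, where the start script exports RELEASE_ROOT and the baked-in cookie lives in releases/COOKIE under that root. If the cookie in use matches RELEASE_COOKIE but differs from the baked-in one, a runner that does not receive RELEASE_COOKIE would presumably fall back to the baked-in value and be rejected.

```elixir
# Cookie the running node actually uses for distribution.
node_cookie = Node.get_cookie()

# Only set when the cookie is overridden, e.g. via fly.toml.
env_cookie = System.get_env("RELEASE_COOKIE")

# Assumption: default release layout with the generated cookie
# stored in $RELEASE_ROOT/releases/COOKIE.
baked_cookie =
  case System.get_env("RELEASE_ROOT") do
    nil ->
      nil

    root ->
      case File.read(Path.join([root, "releases", "COOKIE"])) do
        {:ok, contents} -> String.trim(contents)
        {:error, _reason} -> nil
      end
  end

IO.inspect(node_cookie, label: "cookie in use")
IO.inspect(env_cookie, label: "RELEASE_COOKIE env")
IO.inspect(baked_cookie, label: "cookie baked into the release")
```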

I did not quite understand how to ensure that FLAME children receive it via the :env options.

This is my pool configuration in application.ex:

    {FLAME.Pool,
       name: Nexus.HTMLToPDFRunner,
       min: 0,
       max: 10,
       max_concurrency: 5,
       idle_shutdown_after: 10_000,
       log: :debug}

And this is my runtime.exs:

  config :flame, :backend, FLAME.FlyBackend

  config :flame, FLAME.FlyBackend,
    token: System.get_env("FLY_API_TOKEN"),
    boot_timeout: 60_000

You don’t need to set the cookie yourself unless you need to conveniently know it externally, for example to connect from your local iex. Even in such cases you can still run fly ssh console and look up the generated cookie in the release. If you want to keep your current cookie setup via env, you can pass it along to the FLAME runners using the :env backend option:


      if config_env() == :prod do
        config :flame, :backend, FLAME.FlyBackend
        config :flame, FLAME.FlyBackend, 
          token: System.fetch_env!("FLY_API_TOKEN"),
          env: %{"RELEASE_COOKIE" => System.fetch_env!("RELEASE_COOKIE")}
      end
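If only one pool should carry the cookie, the same env could probably be set per pool instead of globally. The sketch below assumes that FLAME.Pool accepts a `:backend` option in the form `{module, opts}` and that FLAME.FlyBackend merges those opts with the application config, so the token would still come from runtime.exs. Please check both against the FLAME docs for your version. `Nexus.FlameExample` and `pool_spec/1` are hypothetical names used only for illustration.

```elixir
defmodule Nexus.FlameExample do
  # Hypothetical helper, not part of FLAME or the original application.
  # Builds the pool child spec with the cookie forwarded to runners.
  def pool_spec(cookie) when is_binary(cookie) do
    {FLAME.Pool,
     name: Nexus.HTMLToPDFRunner,
     min: 0,
     max: 10,
     max_concurrency: 5,
     idle_shutdown_after: 10_000,
     log: :debug,
     # Assumption: per-pool backend opts in {module, opts} form.
     backend: {FLAME.FlyBackend, env: %{"RELEASE_COOKIE" => cookie}}}
  end
end
```

In the children list this would be used as `Nexus.FlameExample.pool_spec(System.fetch_env!("RELEASE_COOKIE"))`, which fails fast at boot if the variable is missing.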

It worked.

Thank you so much.

I have overridden RELEASE_COOKIE because I need to connect to my application from Livebook once in a while.

I have tried your solution and it worked, indeed.

Thanks a lot!
