IPv4 application internal network

Is it possible to use IPv4 within an application, mainly for horizontal scaling?

How do I obtain the IPv4 addresses of the new instance and of the other instances in the application?

I believe it’s IPv6 only. You can get the IPs of individual instances by using the internal DNS resolver.
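For anyone landing here later, this is a rough sketch of what "use the internal DNS resolver" could look like from Python. It assumes the `<app>.internal` name seen later in this thread resolves to one AAAA record per instance, and that the code runs somewhere that can reach the private DNS (inside an instance, or on a machine with a WireGuard tunnel up). The function name and the injectable `resolve` parameter are made up for illustration.

```python
import socket


def instance_addresses(app, resolve=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv6 addresses behind <app>.internal."""
    # Assumption: <app>.internal answers with the private IPv6 address of
    # every running instance of the app.
    infos = resolve(f"{app}.internal", None, socket.AF_INET6, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address as a string.
    return sorted({info[4][0] for info in infos})
```

Calling `instance_addresses("huerta-db")` from inside the private network should then give the list of peers a clustering tool needs, provided that tool accepts IPv6 addresses.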

@claudio_biale are you trying to get public IPv4 addresses for each instance, or something else? If you can outline what you want the app to do I can give you details.

Right now, instances don’t get public IPv4 addresses. The only addresses available are the anycast addresses you see in flyctl ips list. There are apps that need to pin users to specific instances (like WebRTC / coturn servers); we have plumbing for this, but it’s not self-service yet.

I want to build a scalable VerneMQ cluster, but from what I’ve read it only supports searching the other instances with IPv4.

Ah I see! It looks like clustering might actually work over IPv6, but possibly not their admin tool.

Erlang apps are a little persnickety about IPv6. They are mostly doable, though. I can imagine ways to make IPv4 work over private networking, but it’s all so complex I think you’d lose a lot of what makes us interesting.

Thanks, I am going to try to work out why the other nodes are not seen when scaling.

Separately, what could cause flyctl ssh console to fail with a timeout error?

flyctl ssh console shouldn’t generate a timeout? You can try explicitly passing it a region: flyctl ssh console -r ord, for instance, to see if it’s automatic region selection that’s tripping it up.

Are you running it from a directory with a fly.toml in it, with a deployed application?

The first time you log in with flyctl ssh console it’ll take a couple of seconds (maybe like 10) to propagate both the new WireGuard session and the SSH key for your organization. But I haven’t seen it hang. If that’s still happening, let me know, and I’ll investigate further.

flyctl ssh console -r ord
Connecting to huerta-db.internal...⢿ Error connect to SSH server: dial: lookup huerta-db.internal. on fdaa:0:e71::3: read udp [fdaa:0:e71:a7b:ce2:0:a:100]:23375: i/o timeout

The fly.toml file is:

# fly.toml file generated for huerta-db on 2021-02-01T19:02:41-03:00

app = "huerta-db"

kill_signal = "SIGINT"
kill_timeout = 5

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 500
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = "80"

  [[services.ports]]
    handlers = ["tls", "http"]
    port = "443"

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "10s"
    port = "8080"
    restart_limit = 5
    timeout = "2s"

I don’t have a persistent volume (at the moment I’m only using this app to test scaling the database).

Weird. Checking this out now.

Later

I can reproduce this. Working on it.

Quick followup:

I’ve added some debugging and changed some timeout stuff, but without updating your flyctl there’s one quick thing you can check for me: can you try ssh console -s, to pop up the instance selector? Does that give you instances, or does it hang before you get there?

I can’t reproduce “DNS doesn’t work” right now; for me, in multiple regions, on fresh Linux machines, it seems to always successfully connect and resolve DNS names. It’d be good to know if you’re getting at least that far.

One thing that has happened is that we’ve had stale AAAA records for allocs (this was due to an orchestrator bug), so one reason it might hang is that it might have picked an instance that no longer exists, but is still in the DNS.
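If you want to check for stale records yourself, a minimal sketch is to try a short TCP connect to every address the DNS returns and see which ones answer. This is only a heuristic, not how flyctl picks instances. The function name and the `connect` parameter are hypothetical, and the port is an assumption (8080 is the internal_port in the fly.toml above).

```python
import socket


def split_reachable(addresses, port, timeout=2.0, connect=socket.create_connection):
    """Split addresses into (answering, not_answering) for a TCP port."""
    answering, not_answering = [], []
    for address in addresses:
        try:
            conn = connect((address, port), timeout=timeout)
        except OSError:
            # Timeouts, unreachable hosts and refused connections all land
            # here, so this list is "suspect", not proof of a stale record.
            not_answering.append(address)
        else:
            conn.close()
            answering.append(address)
    return answering, not_answering
```

An address that keeps showing up in DNS but never answers on the app's port is a reasonable candidate for one of these stale entries.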

Just narrowing this down!

No, it doesn’t give me the instances:

flyctl ssh console -s
Looking up regions in DNS...⢿ Error look up huerta-db: look up regions for huerta-db: lookup regions.huerta-db.internal on 127.0.0.53:53: read udp [fdaa:0:e71:a7b:ce2:0:a:100]:61831: i/o timeout

I don’t know if it’s related, but:

flyctl ssh log returns multiple certificates.

Weird!

(The multiple certificates thing is fine and normal; you aren’t getting that far).

I’ll cut a release with more debugging shortly. Thanks for sticking with this! If you’re seeing it, other people are too.

I’m getting this error when I try fly ssh console too, and the same when I try with -s, and with -r ord or -r yyz. Did you have any luck tracking it down?

We think it might be related to stale entries in our private DNS.

Will you try setting up a WireGuard connection using fly wireguard create, then get the connection going with a WireGuard client?

Once that’s set, you can run fly ips private and then try ping -6 <ip> to see what happens.
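If the app has more than a couple of instances, the ping step can be scripted. The sketch below parses the three-column ID / REGION / IP table that fly ips private prints (the layout is taken from the output pasted further down this thread, so treat it as an assumption) and builds one ping command per instance. The function names are made up, and on macOS you may need ping6 instead of ping -6.

```python
def parse_private_ips(table):
    """Parse the ID / REGION / IP table into a list of dicts."""
    rows = []
    for line in table.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not
        # exactly three whitespace-separated columns.
        if len(fields) != 3 or fields[0] == "ID":
            continue
        rows.append({"id": fields[0], "region": fields[1], "ip": fields[2]})
    return rows


def ping_commands(rows, count=3):
    """Build one argv list per instance, ready for subprocess.run()."""
    return [["ping", "-6", "-c", str(count), row["ip"]] for row in rows]
```

Each returned list can be passed straight to `subprocess.run(...)` while the WireGuard tunnel is up; a non-zero return code means that instance did not answer.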

I know this is a lot of work! But it will help us narrow down this bug.

Wanted to see if you guys had an update on this. I’m still getting an error running fly ssh console, but ping6 <my-private-ip> works fine when I have the WireGuard client set up.

I’m not sure what the nature of these stale DNS entries is, but I just launched a brand new app using the hello-rust example, and I’m still getting the timeout error when running fly ssh console -s. I’m running macOS 11.2.1 on an M1 MacBook Air. Steps to reproduce:

% fly version
fly v0.0.211 darwin/arm64 Commit: babb333 BuildDate: 2021-04-26T19:53:31Z
% git clone https://github.com/fly-apps/hello-rust
% cd hello-rust
% fly init --import fly.toml
% fly deploy

# substitute the name of the app here
% curl late-sky-9562.fly.dev
Hello World!%

# now we know the app is running, try to ssh
% fly ssh console -s
Looking up regions in DNS...⣾ Error look up late-sky-9562: look up regions for late-sky-9562: lookup regions.late-sky-9562.internal on 192.168.0.1:53: read udp [fdaa:0:2088:a7b:e23:0:a:0]:45540: i/o timeout
% fly ssh console
Connecting to late-sky-9562.internal...⣾ Error connect to SSH server: dial: lookup late-sky-9562.internal. on fdaa:0:2088::3: read udp [fdaa:0:2088:a7b:e23:0:a:0]:27544: i/o timeout

# let's hook up our wireguard tunnel and try to ping the app's internal IP
% fly ips private
ID       REGION IP
0b1a4383 ord    fdaa:0:2088:a7b:7d:b1a:4383:2
% ping6 fdaa:0:2088:a7b:7d:b1a:4383:2
PING6(56=40+8+8 bytes) fdaa:0:2088:a7b:bea:0:a:2 --> fdaa:0:2088:a7b:7d:b1a:4383:2
16 bytes from fdaa:0:2088:a7b:7d:b1a:4383:2, icmp_seq=0 hlim=62 time=68.995 ms
16 bytes from fdaa:0:2088:a7b:7d:b1a:4383:2, icmp_seq=1 hlim=62 time=57.795 ms
16 bytes from fdaa:0:2088:a7b:7d:b1a:4383:2, icmp_seq=2 hlim=62 time=53.595 ms
^C
--- fdaa:0:2088:a7b:7d:b1a:4383:2 ping6 statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 53.595/60.128/68.995/6.500 ms

I’m a bit out of my depth here, but I figured I’d try a DNS lookup against what I think is the correct DNS server, with the WireGuard tunnel enabled:

% dig '@fdaa:0:2088::3' late-sky-9562.internal AAAA

; <<>> DiG 9.10.6 <<>> @fdaa:0:2088::3 late-sky-9562.internal AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49390
;; flags: qr rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;late-sky-9562.internal.		IN	AAAA

;; ANSWER SECTION:
late-sky-9562.internal.	300	IN	AAAA	fdaa:0:2088:a7b:7d:b1a:4383:2

;; Query time: 55 msec
;; SERVER: fdaa:0:2088::3#53(fdaa:0:2088::3)
;; WHEN: Wed Apr 28 00:54:28 EDT 2021
;; MSG SIZE  rcvd: 68

I was able to ping the returned IP, and curl the actual app running on it on port 8080, so it seems like the DNS entry is correct.
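A note on where that DNS server address comes from: in both error messages in this thread, the resolver is the first three groups of the private address followed by ::3 (fdaa:0:e71::3 for fdaa:0:e71:a7b:..., fdaa:0:2088::3 for fdaa:0:2088:a7b:...). Assuming that pattern holds in general, which is a guess from these two examples rather than something documented here, the address to pass to dig can be derived like this (the function name is hypothetical):

```python
import ipaddress


def private_dns_server(private_ip):
    """Guess the private DNS resolver address for a private IPv6 address."""
    # Assumption: the resolver sits at <first 48 bits of the address>::3,
    # as seen in the error messages quoted in this thread.
    network = ipaddress.ip_network(f"{private_ip}/48", strict=False)
    return str(network.network_address + 3)
```

The result can be used as the `@server` argument to dig, as in the lookup above.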

We haven’t been able to replicate this problem but we have some guesses.

Since you do have WireGuard set up, you’re probably better off SSHing directly to your VMs. You can set that up with fly ssh issue; it’ll put a key on your system that works when you SSH to the private IP addresses.

Thanks, I was able to connect by ssh’ing directly. Let me know if I can give you a hand in diagnosing the problem with fly ssh console.