How to Enable Private Networking Between Microservices on Fly.io?

Hi everyone,

I’m working on a project where I expect around 200 requests per day across three microservices:

  • identity
  • product1 (called “connect”)
  • product2

Even though the average traffic is low, I want to build my system in a professional way using a microservices architecture. Some days may see traffic spikes, so I need the services to scale dynamically. Cost is also important to me—ideally, I want to stay within a monthly budget of $15–25. That means I’d prefer my services to auto-start when requests come in and sleep when idle to save on cost.

The services should be able to communicate with each other efficiently, as if they are on the same private network. For example, I want identity to call connect using a URL like:
http://connect.internal:8002

However, I get the following error:
Cannot connect to host connect.internal:8002 ssl:False [Connect call failed ('fdaa:14:f019:a1b:3fc:1297:f1:2', 8002, 0, 0)]

Note: I can access connect via connect.fly.dev with no problem. Only the internal address isn’t working as expected.

Here’s my Dockerfile for the connect service:

FROM python:3.11-slim AS backend

WORKDIR /app
COPY ./requirements.txt /app/requirements.txt
RUN pip install --no-cache-dir --upgrade -r /app/requirements.txt
COPY . /app

EXPOSE 8002

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8002"]

And here’s the fly.toml for connect:

app = 'connect'
primary_region = 'fra'

[build]

[http_service]
  internal_port = 8002
  force_https = true
  auto_stop_machines = 'suspend'
  auto_start_machines = true
  min_machines_running = 0
  processes = ['app']

[[vm]]
  memory = '1gb'
  cpu_kind = 'shared'
  cpus = 1

What am I missing to get the internal .internal domain communication working properly between my services?

All of my microservices are defined under the same organization.

Would you recommend using Kubernetes as an alternative? What would the cost look like for products with such low traffic?
I’m also open to other alternatives if you have any suggestions. I’m new to the world of microservices.

Thanks in advance for any help!

I would not recommend Kubernetes for this. Your app feels very simple, and K8S is an ecosystem that is phenomenally complicated.

I’d build this as three apps, or maybe even one app with separately declared services. I probably wouldn’t add auto-scaling to start with, because it is good to get something working quickly. I suspect that with 2x redundancy on each app and 256 MB per machine, you could run all six machines within budget without putting anything to sleep.

Yes, this is available by default. Try shelling into each of those apps, and pinging the other. That’s a better test, since in your case you’re relying on your listener being operational.
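
If ping is not installed in the image, the same check can be done from Python, and it also separates the two failure modes that matter here: the name not resolving versus the listener not accepting connections. This is only a minimal sketch using the standard library; the function name probe and its result strings are made up for illustration.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return "ok", "dns_failed", or "connect_failed" for a TCP endpoint."""
    try:
        # Step 1: name resolution (this is what ping also exercises).
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns_failed"

    # Step 2: try a TCP connection to every resolved address.
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            continue  # refused, timed out, or unsupported family; try the next one
    return "connect_failed"


if __name__ == "__main__":
    # Host and port are the ones from the original post.
    print(probe("connect.internal", 8002))
```

A "connect_failed" result while ping succeeds would point at the listener (wrong port or wrong bind address) rather than at the network.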

Thank you for your quick response!

By “separate apps,” do you mean something like running services on a single machine using something like supervisord? That doesn’t sound bad at all. But isn’t Fly.io’s feature of putting unused machines to sleep/shutting them down better? That way, services can be separate from each other, and the ones that don’t receive traffic can sleep, which helps reduce costs.

What you’re saying makes sense too, but I feel like the price difference isn’t that big, so running separate services on separate machines might be the better option.

I didn’t mean that, though it is indeed an option, and I do it myself. What I was referring to was process groups, which as I understand it sort of divides a Fly app into sub-apps, so different processes get different compute resources just from the one config file.

But you can also have three separate apps with a config file each.

As a quick side note, you may not be listening on IPv6 here…

When I run the following commands:

fly ssh console -a identity
apt-get update
apt-get install iputils-ping
ping connect.internal

I get this output:

PING connect.internal(6e112234c27e11.vm.connect.internal (fdaa:14:f019:a1b:3fc:1297:f1:2)) 56 data bytes  
64 bytes from 6e112234c27e11.vm.connect.internal (fdaa:14:f019:a1b:3fc:1297:f1:2): icmp_seq=1 ttl=63 time=0.238 ms  
64 bytes from 6e112234c27e11.vm.connect.internal (fdaa:14:f019:a1b:3fc:1297:f1:2): icmp_seq=2 ttl=63 time=0.304 ms  
64 bytes from 6e112234c27e11.vm.connect.internal (fdaa:14:f019:a1b:3fc:1297:f1:2): icmp_seq=3 ttl=63 time=0.282 ms  

So apparently the connection exists.

But how am I supposed to access it from my Python code? I tried this, but it didn’t work:

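import aiohttp

# `app` is the web application object defined elsewhere in main.py.
@app.get("/test-connect1")
async def test_connect_1():
    # Call the connect service over the private network and report the outcome.
    try:
        async with aiohttp.ClientSession() as session:
            async with session.get("http://connect.internal:8002") as response:
                return "Response OK: " + await response.text()
    except Exception as e:
        return "Error: " + str(e)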

The response I get is:

"Error: Cannot connect to host connect.internal:8002 ssl:False [Connect call failed ('fdaa:14:f019:a1b:3fc:1297:f1:2', 8002, 0, 0)]"

(I think the process groups approach relies on each group sharing a parent container, so if each of your microservices has a different Dockerfile definition, splitting into three Fly apps may be the best approach.)

Actually, as I mentioned, I’m very new to the world of microservices, so I don’t really know what’s right or wrong yet. Right now, this is how I’ve set up my system: I have 3 different projects on GitHub, each in a separate repo. I was planning to deploy all of them on Fly.io. They’re independent from each other, but since they’ll be under the same Organization, they’ll be able to communicate with each other through the private network.

I suspect your fetch code is fine; have a look at the listener in connect. Make sure it is bound to the right port, and that it is bound to the machine’s address on the private network, which is IPv6 (or to all addresses), rather than to localhost.

Ahhh, I’ve been trying to solve this for hours, thank you so much! The issue is resolved now, but another problem has come up. When my Product1 (connect) microservice is suspended and a request to identity triggers a call to connect, the call fails with an error instead of waking the connect machine.

Shouldn’t the Connect service wake up and handle the request even if it’s in a suspended state?

How can I solve this?

The error when the service is suspended and a request comes in:
“Error: Cannot connect to host connect.internal:8002 ssl:False [No address associated with hostname]”

Ooh, good spot 🦅

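The .internal addresses are rather low-level: they resolve directly to the private IPv6 addresses of running machines and bypass the Fly proxy, so they do not trigger auto-start. That is also why the hostname stops resolving once the machine is suspended.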

You would need Flycast for that:

https://fly.io/docs/networking/flycast/

That sounds reasonable. Just don’t fall into the trap of over-engineering everything to be its own microservice. Separating services by resource requirement is also a good idea, e.g. a web service doesn’t need nearly as much as a worker that processes images.

Thanks to everyone who replied! I looked into Flycast and wanted to share the issues I faced and how I fixed them, in case it helps others later.

First, I changed uvicorn’s --host 0.0.0.0 to --host :: so it could accept IPv6 requests. That worked, but it introduced a new issue: it no longer accepted IPv4 connections, so I couldn’t reach the app at connect.fly.dev, only over the private network.

When I switched it back to 0.0.0.0, the reverse happened: IPv4 worked (I could access it from the domain), but IPv6 stopped working.

To solve this, I switched from uvicorn to gunicorn. Here’s the final command I used:

CMD ["gunicorn", "-k", "uvicorn.workers.UvicornWorker", "main:app", "--bind", "[::]:8002"]

With this setup, I can now access the app both via domain (IPv4) and over IPv6.
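
For anyone wondering what "listening on both" means at the socket level, here is a minimal standard-library sketch of a dual-stack listener. It only illustrates the concept and is not a description of how uvicorn or gunicorn bind internally; the function name open_listener is made up for this example.

```python
import socket


def open_listener(port: int = 0) -> socket.socket:
    """Return a listening TCP socket, dual-stack (IPv4 + IPv6) where possible."""
    if socket.has_dualstack_ipv6():
        try:
            # Bind to "::" and clear IPV6_V6ONLY so IPv4 clients are accepted too.
            return socket.create_server(
                ("::", port), family=socket.AF_INET6, dualstack_ipv6=True
            )
        except OSError:
            pass  # IPv6 exists but cannot be bound here; fall back below
    # Fallback: IPv4 only. This is the situation that "--host 0.0.0.0" produces.
    return socket.create_server(("0.0.0.0", port))


if __name__ == "__main__":
    server = open_listener(8002)
    print("listening on", server.getsockname(), "family", server.family.name)
    server.close()
```

A socket bound to 0.0.0.0 is IPv4 only, so a client arriving over the IPv6 private network cannot reach it, which matches the first error in this thread.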

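Next issue: I was trying to send requests to http://connect.flycast:8002, but it kept failing. It turns out you should not include the internal port. As far as I can tell, Flycast traffic goes through the Fly proxy, which listens on the service’s external HTTP port (80) and forwards to internal_port.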

When I used just http://connect.flycast, it worked — even if the machine was suspended or stopped. Flycast automatically starts it up and responds.
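
To keep the two addressing styles from getting mixed up in code, something like the following helper pair could be used. It is a small sketch under the assumptions from this thread (plain HTTP, internal port 8002 for connect); the function names internal_url and flycast_url are hypothetical.

```python
from urllib.parse import urlunsplit


def internal_url(app: str, port: int, path: str = "/") -> str:
    """Direct machine address: needs the internal port, does not wake machines."""
    return urlunsplit(("http", f"{app}.internal:{port}", path, "", ""))


def flycast_url(app: str, path: str = "/") -> str:
    """Address routed via the Fly proxy: no port, can auto-start machines."""
    return urlunsplit(("http", f"{app}.flycast", path, "", ""))


if __name__ == "__main__":
    print(internal_url("connect", 8002))  # http://connect.internal:8002/
    print(flycast_url("connect"))         # http://connect.flycast/
```

In the test endpoint shown earlier in the thread, session.get(flycast_url("connect")) would take the place of the hard-coded .internal URL.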

Now everything works just the way I want.

One last note: to make Flycast work across all my services, I ran this command for each app, which allocates it a private IPv6 address that Flycast uses:
fly ips allocate-v6 --private
In the fly.toml file, you also need to set force_https = false under [http_service].

Again, big thanks to everyone who helped out!
