WireGuard not working? (+ Unable to create Postgres)

Hello :wave:

While trying to attach Postgres to my app, fly cannot look up the database's private network (.internal) address.

❯ fly postgres attach --app cakeinvite-api-dev cakeinvite-db-dev
Error Get "http://.../commands/databases/cakeinvite_api_dev": dial: lookup cakeinvite-db-dev.internal. on (...) read udp [...]:28517: i/o timeout

Here is status:

❯ fly wireguard status personal
? Select peer: <PICKED AGENT PEER>
Error upstream service is unavailable
❯ fly doctor
Testing authentication token... PASSED
Testing flyctl agent... PASSED
Testing local Docker instance... Nope
Pinging WireGuard gateway (give us a sec)... FAILED
(Error: ping gateway: no response from gateway received)

We can't establish connectivity with WireGuard for your personal organization.

WireGuard runs on 51820/udp, which your local network may block.

If this is the first time you've ever used 'flyctl' on this machine, you
can try running 'flyctl doctor' again.

If this was working before, you can ask 'flyctl' to create a new peer for
you by running 'flyctl wireguard reset'.

If your network might be blocking UDP, you can run 'flyctl wireguard websockets enable',
followed by 'flyctl agent restart', and we'll run WireGuard over HTTPS.

What should I do in this situation?

There also seems to be a problem with creating Postgres.
When I delete the existing cluster and try to create it again, an error occurs during the provisioning step.

❯ fly postgres create
? Choose an app name (leave blank to generate one): cakeinvite-db-dev
automatically selected personal organization: cakeinvite
? Select regions: Ashburn, Virginia (US) (iad)
For pricing information visit: https://fly.io/docs/about/pricing/#postgresql-cluste
? Select configuration: Development - Single node, 1x shared CPU, 256MB RAM, 1GB disk
Creating postgres cluster cakeinvite-db-dev in organization personal
Creating app...
Setting secrets...
Provisioning 1 of 1 machines with image flyio/postgres:14.4
Error failed to launch VM: Post "http://.../v1/apps/cakeinvite-db-dev/machines": connect tcp (...): operation timed out

I think there are some issues with my region (South Korea). I tried another account, but the same operation timeout error occurs while provisioning.

I’m having the same issue. I tried the following, but none of them resolved it:

  • Switching my local network (or using a local proxy);
  • Running flyctl wireguard reset;
  • Running flyctl wireguard websockets enable && flyctl agent restart;
  • Switching accounts or organizations;
  • Running flyctl wireguard status, which shows Error upstream service is unavailable.

Unrelated, but by the way: when I run flyctl wireguard list, I see that more than one peer has been created. Do these need to be cleaned up?

If WireGuard cannot connect, does that mean the user can't control any application via flyctl?

For example: deploying new releases to fix errors in production, or running emergency database backups and restores. For users running production workloads on the Fly platform, this is a serious concern.

Is this an issue with fly.io? I think it worked a few days ago… but it hasn’t been working since yesterday.

Or is it my network? I looked at other similar reports, and it seems that the network may be blocking it.

Hey, we’ve been noticing a few users reporting WireGuard-related failures, and we’re trying to investigate the issue. What gateway/region is your WireGuard peer attempting to connect to?

@shortdiv For me, Hong Kong (hkg) and Taiwan are not available, but when I set up a springboard (a server on another platform that simply forwards the traffic to a new IP address) to connect through Singapore (sin), WireGuard works fine.

I also tried a few other regions in Asia through the springboard, and they seem to be unavailable as well. I’m not sure of the exact IP addresses of those servers; all I can confirm right now is that sin works in Asia. (Maybe Fly should add a global node status view to help debug and monitor this.)
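The springboard described above is essentially a UDP relay: the WireGuard client sends its datagrams to the relay, which forwards them to a gateway it can reach and passes the replies back. Below is a minimal single-client sketch in Python, assuming IPv4 and a numeric (ip, port) upstream address. The function names, ports, and addresses are hypothetical placeholders; this only illustrates the forwarding idea and is not how the poster actually built theirs, nor anything Fly provides.

```python
import socket
import threading


def open_relay_socket(host: str, port: int) -> socket.socket:
    """Bind the UDP socket that the WireGuard client will use as its endpoint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def serve_relay(sock: socket.socket, upstream: tuple, stop: threading.Event) -> None:
    """Forward datagrams between one client and a fixed upstream gateway.

    Hypothetical sketch: single client, IPv4 only, no access control.
    `upstream` must be a numeric (ip, port) tuple so it compares equal to the
    source address returned by recvfrom(); hostnames would never match.
    """
    sock.settimeout(0.2)  # wake up periodically to check the stop flag
    client = None  # address of the most recent non-upstream sender
    try:
        while not stop.is_set():
            try:
                data, src = sock.recvfrom(65535)
            except socket.timeout:
                continue
            if src == upstream:
                # Reply from the gateway: hand it back to the client, if known.
                if client is not None:
                    sock.sendto(data, client)
            else:
                # Anything else is treated as the client and sent upstream.
                client = src
                sock.sendto(data, upstream)
    finally:
        sock.close()
```

To try it, you would run serve_relay on a host that can reach a working gateway (for example, open_relay_socket("0.0.0.0", 51820) with a placeholder upstream such as ("203.0.113.10", 51820)) and point your WireGuard client's endpoint at that host. A single socket handles both directions, so the gateway sees the relay's address as the peer's endpoint.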

cc @seokjume: if you need immediate access to Fly services, a proxy or VPN might be a workaround.

Cool, thanks for the clarification, @witt-dev. It looks like the app @seokjume was having issues with was deleted, so we can’t find the instance to investigate. We’ll look into the other apps y’all were working on to see if we find anything.

You don’t have to clean up old WireGuard peers if you don’t want to; your old peers probably aren’t confusing our system. It’s easy to remove them if you want to keep the list clean though.

I’ve tried several colleagues’ accounts to access Fly WireGuard in the hkg region, and all are unavailable. I suspect this has something to do with the availability of Fly services in different locations, but users have no way of knowing which regions are having WireGuard problems.
The issue should be easy to reproduce or monitor, or users should at least be able to see the true state of the service; for a cloud provider, availability is critical.

For me, it was Hong Kong (hkg) too!

When I check the status, hkg, lhr, sjc, and ewr all come back with either “Error upstream service is unavailable” or “Error You hit a Fly API error with request ID: 01GFNHAYQVAABQSPPT99PRYA7J-hkg”. The only exception is a new lax peer I made, which works and reports alive.

Correction: sin works as well!

@shortdiv
Oh, I deleted the app… so I just created a new instance. (The operation timed out error is still happening :sob:)

Just a personal idea: if it's difficult to investigate the gateway issues quickly, it would be nice to be able to select the gateway/region to connect to in flyctl (e.g. fly postgres create --peer some-peer-in-iad).

But, as @witt-dev said, I agree that monitoring and investigating issues quickly is the best approach.

Hi @witt-dev, I reported the same issue, and it’s been happening for days. I had to delay my schedule by a week. @shortdiv, it would be great if fly.io could keep users updated on the investigation and its status. Thanks!

We hear you and will work on better monitoring tools for apps. We found the bug and are troubleshooting: there seems to be an issue preventing DNS lookups over some WireGuard peers. It seems to be happening on and off, and we’re actively working on it. Thanks for your patience!

We updated the status page to account for the WireGuard connectivity issues → https://status.flyio.net/
