OOM issue even with swap ram added

My Node.js app keeps running out of memory and getting killed, even though I added 1 GB of swap. The app had been working fine until today, when I hit this for the first time.

The error message is something like this

[706:0x4e42690] 20180 ms: Mark-sweep 244.1 (258.7) -> 242.9 (258.5) MB, 81.5 / 0.1 ms (average mu = 0.234, current mu = 0.151) allocation failure scavenge might not succeed

[706:0x4e42690] 20272 ms: Mark-sweep 243.9 (258.5) -> 243.0 (258.5) MB, 79.1 / 0.0 ms (average mu = 0.191, current mu = 0.140) allocation failure scavenge might not succeed

[info] <--- JS stacktrace --->

I also can’t debug this, because I have never been able to connect over SSH.

fly ssh console always gives this:

Error: tunnel unavailable: failed probing "personal": context deadline exceeded

fly doctor gives this
Testing authentication token… PASSED
Testing flyctl agent… PASSED
Testing local Docker instance… Nope
Pinging WireGuard gateway (give us a sec)… FAILED
(Error: ping gateway: no response from gateway received

I’ve tried running
flyctl wireguard reset
flyctl wireguard websockets enable
and
flyctl agent restart
but I still get the same error message when I run fly ssh console.

I’m currently stuck, any help (especially with the OOM issue) is highly appreciated :pray::pray:

All I had to do was set the environment variable NODE_OPTIONS="--max-old-space-size=1024". The default value in my case was the total RAM, excluding the swap, which is really low.
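
In case it helps anyone else, here is a rough sketch of how that number could be derived from what the machine actually has instead of being hard-coded. It assumes a Linux guest with /proc/meminfo; heap_limit_mb is a made-up helper name, and the 75% default is an arbitrary choice, not a Node.js or Fly recommendation.

```shell
#!/bin/sh
# Sketch: derive a V8 old-space limit (in MB) from RAM + swap.
# heap_limit_mb is a hypothetical helper, not part of Node.js or flyctl.
#   $1: path to a meminfo-style file (defaults to /proc/meminfo)
#   $2: percentage of RAM + swap to give to V8 (defaults to 75, an assumption)
heap_limit_mb() {
    meminfo="${1:-/proc/meminfo}"
    percent="${2:-75}"
    awk -v pct="$percent" '
        /^MemTotal:/  { ram = $2 }     # value is in kB
        /^SwapTotal:/ { swap = $2 }    # value is in kB; stays 0 if the line is missing
        END { printf "%d\n", (ram + swap) * pct / 100 / 1024 }
    ' "$meminfo"
}

# Export the computed limit only when /proc/meminfo is readable.
if [ -r /proc/meminfo ]; then
    NODE_OPTIONS="--max-old-space-size=$(heap_limit_mb)"
    export NODE_OPTIONS
    echo "$NODE_OPTIONS"
fi
```

To check what Node actually picked up, node -p "require('v8').getHeapStatistics().heap_size_limit / 1024 / 1024" prints the effective heap limit in MB, which should come out roughly at the value set above.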

I’d still appreciate any help with the SSH connection, though.

If there are no active machines, you can’t run fly ssh console.

Also, we had an issue with some gateways yesterday; that should be resolved now.
Can you try again?

We see NodeJS OOMs even with max-old-space-size set to 75% of total RAM (+swap) available.

Either it is a memory leak somewhere (we haven’t been able to catch it so far), which I think is triggered by something very specific happening in prod, such as too many incoming requests in a short time, or, if the issue is on Fly’s side, OOMs are being triggered despite enough free swap?

I’ve never been able to run that command successfully. The issue didn’t just start yesterday.

What do you mean by machines? Does it have something to do with v2 apps? I migrated my app to v2 yesterday but still can’t connect.

Also, running flyctl machines list shows one machine.

Are you using some kind of a proxy to access the Internet?

No. I’m not using a proxy.

Could you share the agent’s logs, please?

# stop the agent
$ fly agent stop
# remove the old logs
$ rm ~/.fly/agent-logs/*.log
# try to run SSH
$ fly ssh console

After that, there should be a log file in ~/.fly/agent-logs

This is the content of the log file.

2023/05/19 12:38:35.502415 srv OK 31980
2023/05/19 12:38:35.506095 #1 connected …
2023/05/19 12:38:35.507672 #1 ← ( 4) “ping”
2023/05/19 12:38:35.508810 #1 → ( 55) “5\x00ok {"PID":31980,"Version":"0.1.8","Background":true}\n”
2023/05/19 12:38:35.508888 #1 dropped.
2023/05/19 12:38:35.512508 #2 connected …
2023/05/19 12:38:35.513038 #2 ← ( 18) “establish personal”
2023/05/19 12:38:37.357138 srv returning port: 35531
2023/05/19 12:38:37.357445 srv (re-)connecting to wss://ams1.gateway.6pn.dev:443/
2023/05/19 12:38:38.012109 #2 → ( 753) “\xef\x02ok {"WireGuardState":{"org":"personal","name":"interactive-agent-localhost-researchgainscopy-gmail-com-01H0R0DMZCFNQNRZX2HHTR5SW9","region":"ams","localPrivateKey":"[redacted]","localpublic":"4IFTTndoMC+jC7/FvuY6hCA2X6WKbLM5ylA7/BIchdU=","dns":"","peer":{"peerip":"fdaa:1:7310:a7b:16a9:0:a:702","endpointip":"ams1.gateway.6pn.dev","pubkey":"yUyg63j5+17YeJ7gRhxoQuF6rvdX0JF59M6skytJFTQ="}},"TunnelConfig":{"LocalPrivateKey":"[redacted]","LocalNetwork":"fdaa:1:7310:a7b:16a9:0:a:700/120","RemotePublicKey":"yUyg63j5+17YeJ7gRhxoQuF6rvdX0JF59M6skytJFTQ=","RemoteNetwork":"fdaa:1:7310::/48","Endpoint":"ams1.gateway.6pn.dev:51820","DNS":"fdaa:1:7310::3","KeepAlive":0,"MTU":0,"LogLevel":0}}\n”
2023/05/19 12:38:38.012274 #2 dropped.
2023/05/19 12:38:38.017944 #3 connected …
2023/05/19 12:38:38.020970 #3 ← ( 14) “probe personal”
2023/05/19 12:38:38.021024 srv probing “personal” …
2023/05/19 12:38:43.023561 #3 → ( 58) “8\x00err failed probing "personal": context deadline exceeded”
2023/05/19 12:38:43.023864 #3 dropped.
2023/05/19 12:40:36.365471 srv validated wireguard peers
2023/05/19 12:42:37.155469 srv validated wireguard peers
2023/05/19 12:44:37.944973 srv validated wireguard peers
2023/05/19 12:46:38.807437 srv validated wireguard peers

Thanks! Sadly, not much info in the logs. Could you try again with the debug log level, please?

$ fly wg reset
$ fly agent stop
$ LOG_LEVEL=debug fly agent run

This will start the agent in the foreground with a freshly created WG peer and debug logs. In another console:

$ fly ssh console

Hopefully, this will have some useful info. If not, we might need to add some additional logging.

Thanks for the help and sorry for the late reply. This is the console output.

automatically selected personal organization: Research Gains
New WireGuard peer for organization ‘personal’: ‘interactive-agent-localhost-researchgainscopy-gmail-com-01H1ETNHZ7ZMMWNGYARMP5JTZA’
DEBUG Loaded flyctl config from/root/.fly/config.yml
DEBUG determined hostname: “localhost”
DEBUG determined working directory: “/home/researchgains”
DEBUG determined user home directory: “/root”
DEBUG determined config directory: “/root/.fly”
DEBUG ensured config directory exists.
DEBUG ensured config directory perms.
DEBUG cache loaded.
DEBUG config initialized.
DEBUG initialized task manager.
DEBUG started querying for new release
DEBUG client initialized.
DEBUG Config has metrics token
2023/05/27 14:44:16.686732 srv OK 19783
2023/05/27 14:45:48.315753 #1 connected …
2023/05/27 14:45:48.319578 srv config change at: 2023-05-27 14:45:48.305983599 +0000 UTC
2023/05/27 14:45:48.319760 #1 ← ( 4) “ping”
2023/05/27 14:45:48.320181 #1 → ( 56) “6\x00ok {"PID":19783,"Version":"0.1.8","Background":false}\n”
2023/05/27 14:45:48.320288 #1 dropped.
2023/05/27 14:45:48.327299 #2 connected …
2023/05/27 14:45:48.328296 #2 ← ( 18) “establish personal”
DEBUG → POST GraphQL Playground

{
“query”: “query($admin: Boolean!) { organizations(admin: $admin) { nodes { id slug name type paidPlan } } }”,
“variables”: {
“admin”: false
}
}

DEBUG {}
DEBUG ← 200 GraphQL Playground (780.1ms)

{
“data”: {
“organizations”: {
“nodes”: [
{
“id”: “RVLZQkpYjPL7yTyD6ne82l5gq8TvbxlMe”,
“slug”: “personal”,
“name”: “Research Gains”,
“type”: “PERSONAL”,
“paidPlan”: false
}
]
}
}
}
wg connect fdaa:1:7310::3 ams1.gateway.6pn.dev:51820 fdaa:1:7310:a7b:16a9:0:a:900 fdaa:1:7310::
2023/05/27 14:45:49.292240 srv returning port: 54165
2023/05/27 14:45:49.292309 srv (re-)connecting to wss://ams1.gateway.6pn.dev:443/
DEBUG: (fly-ssh) 2023/05/27 14:45:49 UAPI: Updating private key
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 4 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 1 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 1 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 1 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 2 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 2 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 2 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 3 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 3 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 3 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 5 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 4 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 5 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 6 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 4 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 5 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 6 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 6 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 7 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 7 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: encryption worker 8 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 7 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: handshake worker 8 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: decryption worker 8 - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: event worker - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Interface up requested
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - UAPI: Created
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - UAPI: Updating endpoint
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - UAPI: Adding allowedip
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - UAPI: Updating persistent keepalive interval
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Starting
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Routine: sequential receiver - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Routine: sequential sender - started
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Routine: TUN reader - started
ERROR: (fly-ssh) 2023/05/27 14:45:49 Unable to update bind: permission denied
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Stopping
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Routine: sequential receiver - stopped
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Interface state was Down, requested Up, now Down
DEBUG: (fly-ssh) 2023/05/27 14:45:49 peer(yUyg…JFTQ) - Routine: sequential sender - stopped
ERROR: (fly-ssh) 2023/05/27 14:45:49 Unable to update bind: permission denied
DEBUG: (fly-ssh) 2023/05/27 14:45:49 Interface state was Down, requested Up, now Down
2023/05/27 14:45:49.985830 #2 → ( 753) “\xef\x02ok {"WireGuardState":{"org":"personal","name":"interactive-agent-localhost-researchgainscopy-gmail-com-01H1ETNHZ7ZMMWNGYARMP5JTZA","region":"ams","localPrivateKey":"[redacted]","localpublic":"wJASCuJfNb9SZz+5EZXTDKn7XY/OHDwxOmvG6AngmgY=","dns":"","peer":{"peerip":"fdaa:1:7310:a7b:16a9:0:a:902","endpointip":"ams1.gateway.6pn.dev","pubkey":"yUyg63j5+17YeJ7gRhxoQuF6rvdX0JF59M6skytJFTQ="}},"TunnelConfig":{"LocalPrivateKey":"[redacted]","LocalNetwork":"fdaa:1:7310:a7b:16a9:0:a:900/120","RemotePublicKey":"yUyg63j5+17YeJ7gRhxoQuF6rvdX0JF59M6skytJFTQ=","RemoteNetwork":"fdaa:1:7310::/48","Endpoint":"ams1.gateway.6pn.dev:51820","DNS":"fdaa:1:7310::3","KeepAlive":0,"MTU":0,"LogLevel":2}}\n”
2023/05/27 14:45:49.986167 #2 dropped.
2023/05/27 14:45:49.991151 #3 connected …
2023/05/27 14:45:49.993385 #3 ← ( 14) “probe personal”
2023/05/27 14:45:49.993444 srv probing “personal” …
2023/05/27 14:45:54.995852 #3 → ( 58) “8\x00err failed probing "personal": context deadline exceeded”
2023/05/27 14:45:54.995930 #3 dropped.
DEBUG → POST GraphQL Playground

{
“query”: “mutation($input: ValidateWireGuardPeersInput!) { validateWireGuardPeers(input: $input) { invalidPeerIps } }”,
“variables”: {
“input”: {
“peerIps”: [
“fdaa:1:7310:a7b:16a9:0:a:902”
]
}
}
}

DEBUG {}
DEBUG ← 200 GraphQL Playground (271.23ms)

{
“data”: {
“validateWireGuardPeers”: {
“invalidPeerIps”:
}
}
}
2023/05/27 14:46:16.968687 srv validated wireguard peers

Seems like a permission issue.

@researchgains Thanks for the log.

This error is pretty interesting. It means that the WireGuard code failed to create a listening socket or bind it to an address. You are on Linux, right? Are you running some kind of security software (seccomp, SELinux or anything like this) that may intercept system calls?

@pavel Thank you, I’m running this in an Ubuntu proot jail (with Termux) on an actual Android device :sweat_smile:

I live in a third world country with a very erratic power supply, so I do more than 90% of my software development on my phone, since power banks can keep it running through long power outages (sometimes weeks).

Do you think running the command as the superuser might work? My phone is actually rooted, but I don’t know how to run a proot Linux distro as the actual root user. It was intentionally designed not to support that.

Also, what port does WireGuard attempt to bind to? Is it possible to change this with some config, perhaps?

I’m able to create servers and bind to ports on localhost and even the public address 0.0.0.0, so I think the port WireGuard is trying to bind to is a restricted one.

Thanks once again for the help :pray:

That’s a pretty unusual setup :slight_smile:

The port is random (whatever the kernel allocates). I don’t think the problem is with the port.

I don’t know for sure. It might be a problem with proot (as it intercepts some syscalls), or it might be some Android security subsystem that prevents WireGuard from creating the socket. I suspect that Android doesn’t allow binding to the NETLINK_ROUTE socket that wireguard-go needs (see device/sticky_linux.go in wireguard-go, the Go implementation of WireGuard).

You can probably confirm this by running flyctl agent run under strace like this:

$ strace -t -f -o /tmp/agent.log fly agent run

It should be possible to find which syscall is failing in /tmp/agent.log.
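
The strace output will be large, so it may help to filter it down to permission failures first. A minimal sketch, assuming the log was written with the -t -f -o flags above (so each line starts with a PID and a timestamp); failed_syscalls and failed_syscall_counts are hypothetical helper names, not part of strace or flyctl:

```shell
#!/bin/sh
# Print only the syscalls that failed with a permission error.
#   $1: path to the strace log (defaults to /tmp/agent.log)
failed_syscalls() {
    grep -E ' = -1 (EACCES|EPERM) ' "${1:-/tmp/agent.log}"
}

# Count those failures per syscall name, most frequent first.
# Lines that strace split into "<unfinished ...>" / "<... resumed>" pairs
# do not match the pattern and are passed through unchanged.
failed_syscall_counts() {
    failed_syscalls "$1" |
        sed -E 's/^[0-9]+ +[0-9:]+ +([a-z0-9_]+)\(.*/\1/' |
        sort | uniq -c | sort -rn
}
```

After sourcing this, failed_syscalls /tmp/agent.log should leave only a handful of lines to read. If the suspicion above is right, a bind on a netlink socket would be the one to look for.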

Yeah, I think you’re right. I did try running strace, but the log file had too much data and I couldn’t really make sense of it.
I guess I’ll just give up then. Thank you for your time.

You might be interested in https://fly.io/terminal (announcement); not sure how well it works on mobile though.

[Edit: see simpler approach below] Another idea is to deploy an app running a full SSH server, and connect to it from your Ubuntu proot using the native SSH client; that way you get to use Termux, and you can use volumes for persistent storage. Unfortunately, non-HTTP apps require a dedicated IPv4 address or IPv6, and dedicated IPv4 addresses will cost $2/month. If this is an issue, you can tunnel TCP over HTTP; this will involve running a small script on both the client and the app. Let me know if you want details!

Edit: @lubien suggested a simpler approach: Install a native WireGuard app instead of using flyctl’s userspace WireGuard implementation (which doesn’t seem to work with Ubuntu on Android). Once you create a WireGuard tunnel, you’ll be able to connect using fly ssh issue --agent and ssh root@<app>.internal.

I’m late to the party, and let me know if I’m talking gibberish, but let me help you not pay us: you can set up a WireGuard tunnel from your machine and use addresses such as appname.internal.

Wow, thanks, I think I’ll find that handy.

@lubien @tom93 I really appreciate the assistance :pray:

I’ll try the simpler “terminal in browser” solution the next time I need to SSH.
