Fly Postgres: suddenly bloated disk usage on 1GB volume (/usr/lib, /usr/share)

Recently, disk usage suddenly skyrocketed until the machine shut down for hitting its resource limits (1GB volume).

> du -hcs /*
6.1M    /bin
4.0K    /boot
80M     /data # this is fine - postgresql usage
1.1M    /dev
# ...
0       /sys
8.0K    /tmp
705M    /usr # WHATTT?????
37M     /var # i guess logs and stuff
845M    total # !!!!!!!!!!!!!
> du -hcs /usr/*
36M     /usr/bin
4.0K    /usr/games
12K     /usr/include
409M    /usr/lib # possible to reduce / cleanup?
24K     /usr/libexec
74M     /usr/local # ?
6.4M    /usr/sbin
181M    /usr/share # possible to reduce / cleanup?
4.0K    /usr/src
705M    total # !!!!!!!!!!!

This is a Fly.io-provisioned Postgres deployment; I haven’t downloaded packages or installed anything beyond what comes by default with the Fly.io Postgres deployment.

Why was the system footprint so much smaller before, and why did it suddenly shoot up on July 17?
Inefficient distro / image?
Can anything be done to reduce its size? :pray:
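For anyone reproducing this, a sorted drill-down makes the biggest directories stand out. (A sketch; run it inside the machine, e.g. via fly ssh console.)

```shell
# Sort /usr's immediate subdirectories by size and show the largest five.
# -x stays on one filesystem, so a separately mounted volume like /data
# would be excluded from the totals.
du -xh -d1 /usr 2>/dev/null | sort -h | tail -n 5
```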

Hi… I think this might be a misinterpretation of that original resource-limits error message. I see 882M for /usr/ on a freshly created Postgres Flex instance, and such a figure isn’t out of line for a Debian-based VM—even a fairly stripped-down one.

Overall, /usr/lib/ and /usr/share/ are on the root partition, which is roughly 8GB.

The only thing that should be on the 1G persistent volume is /data/.
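You can confirm that split from inside the machine (via fly ssh console, say), since df reports the filesystem backing each path:

```shell
# The root filesystem holds /usr, /var, etc.
df -h /
# The persistent volume is a separate device and shows up on its own line.
df -h /data 2>/dev/null || echo "/data is not mounted on this system"
```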

Do you have the exact error message at hand?


OK, thanks. Good to know that a fresh/default Postgres Flex Debian image starts out with high disk usage.

After running fly image update and fly pg restart, it seems to be back online for now.

Unfortunately I might have missed the log history from when it crashed / went down. Here are a few lines I found, though I'm not sure if they're from before or after recovering.
Don’t miss the line about “Your instance has hit resource limits”:

2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres database is warning. Your database might be malfunctioning.
2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres vm is warning. Your instance might be hitting resource limits.
2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres role is warning. Your cluster's membership might be affected.
2024-08-15T19:57:15.326 health[<REDACTED APPID>] ewr [error] Health check for your postgres database has failed. Your database is malfunctioning.
2024-08-15T19:57:21.793 health[<REDACTED APPID>] ewr [error] Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
2024-08-15T19:57:23.079 health[<REDACTED APPID>] ewr [error] Health check for your postgres role has failed. Your cluster's membership is inconsistent.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [WARNING] (316) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5002ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [WARNING] (316) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [ALERT] (316) : backend 'bk_db' has no server available!

Glad to hear it’s back up… That log line is admittedly a little generic, but any disk it was referring to would almost certainly be the /data/ persistent volume. Here’s what I see from my own freshly created DB’s current health-check dials, via fly checks list -a db-app-name:

Name  Status   Output
pg    passing  disk-capacity: 11.0% - readonly mode will be enabled at 90.0%
vm    passing  checkDisk: 877.51 MB (89.0%) free space on /data/

(Many things elided.)

It might be prudent to keep an eye on the disk-related ones, and see if they start inching upward…
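If you wanted to script that, something like the following could pull the percentage out for alerting. (The parsing pattern is an assumption based on the sample output above, not a documented interface, and could break if the check format changes.)

```shell
# Hypothetical helper: extract the disk-capacity percentage from
# `fly checks list` output (line format assumed from the sample above).
extract_disk_pct() {
  grep -o 'disk-capacity: [0-9.]*%' | head -n 1 | grep -o '[0-9.]*'
}

# Feeding it the sample line from the checks table:
echo 'pg passing disk-capacity: 11.0% - readonly mode will be enabled at 90.0%' \
  | extract_disk_pct
# -> 11.0
```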

https://community.fly.io/t/runaway-pg-wal-disk-usage/17438/2

Found this in the metrics logs:

2024-08-15 18:40:27.811	 INFO Main child exited normally with code: 0
2024-08-15 18:40:27.811	 INFO Starting clean up.
2024-08-15 18:40:27.811	 INFO Umounting /dev/vdb from /data
2024-08-15 18:40:27.818	 WARN hallpass exited, pid: 240, status: signal: 15 (SIGTERM)
2024-08-15 18:40:27.826	2024/08/15 22:40:27 listening on [REDACTED IPV6]:22 (DNS: [fdaa::3]:53)
2024-08-15 18:40:28.816	[ 9821.633798] reboot: Restarting system
2024-08-15 18:40:30.969	 INFO Starting init (commit: db101a53)...
2024-08-15 18:40:30.990	 INFO Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-08-15 18:40:31.007	 INFO Resized /data to 2143289344 bytes

I don’t think I did anything to resize that filesystem; is it automatic?

Also, how come df shows disk sizes of 7.8GB (/) and 2.0GB (/data) when the volume provisioned for the Postgres Flex was set to 1GB? I must be missing something about how this all works.

Hm… It can be, but I didn’t think that was enabled on Fly Postgres.

fly m status -d -a db-app-name would detect it (the auto-extend setting), if I understand correctly. Consult the "mounts" section.

Also, does fly vol list -a db-app-name concur with the 2GB?
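Incidentally, the byte count in your resize log is consistent with the 2.0GB that df reports, assuming df -h is rounding binary units; it's just shy of 2 GiB, with the filesystem reserving a little overhead:

```shell
# The "Resized /data to 2143289344 bytes" figure from the log, in MiB:
bytes=2143289344
echo "$(( bytes / 1024 / 1024 )) MiB"   # 2044 MiB, i.e. just under 2 GiB
```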

Don’t see any mention of auto-extend… see below.
I did manually extend the volume myself, from 1GB to 2GB, just today. I'm not sure how that translates to df on the Fly Postgres machine, though: even after that log line said it had resized /data, my account was still showing a 1GB volume, right up until I manually extended it to 2. So I don't think the auto-extend (if that's what it was) changed anything in my account usage, and neither figure obviously matches the 8GB / and 2GB /data mounts I'm seeing in df.

Still feel like I'm missing something about how volumes vs. Fly Postgres machine disks work.

"mounts": [
    {
      "encrypted": true,
      "path": "/data",
      "size_gb": 2,
      "volume": "vol_xxxxxxxxxxxxxxxx",
      "name": "pg_data"
    }
  ],

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.