Fly Postgres: suddenly bloated disk usage on 1GB volume (/usr/lib, /usr/share)

Recently, disk usage suddenly skyrocketed until the machine shut down for hitting its resource limits (1GB volume).

> du -hcs /*
6.1M    /bin
4.0K    /boot
80M     /data # this is fine - postgresql usage
1.1M    /dev
# ...
0       /sys
8.0K    /tmp
705M    /usr # WHATTT?????
37M     /var # i guess logs and stuff
845M    total # !!!!!!!!!!!!!
> du -hcs /usr/*
36M     /usr/bin
4.0K    /usr/games
12K     /usr/include
409M    /usr/lib # possible to reduce / cleanup?
24K     /usr/libexec
74M     /usr/local # ?
6.4M    /usr/sbin
181M    /usr/share # possible to reduce / cleanup?
4.0K    /usr/src
705M    total # !!!!!!!!!!!

This is a Fly.io-provisioned Postgres deployment; I haven’t downloaded packages or installed anything beyond what comes by default with the Fly.io Postgres deployment.

Why was the system footprint so much smaller before, and why did it suddenly shoot up on July 17?
Inefficient distro / image?
Can anything be done to reduce its size? :pray:
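For anyone reproducing this, a sorted drill-down makes the biggest directories stand out. (A sketch; run it inside the machine, e.g. via fly ssh console.)

```shell
# Sort /usr's immediate subdirectories by size and show the largest five.
# -x stays on one filesystem, so a separately mounted volume like /data
# would be excluded from the totals.
du -xh -d1 /usr 2>/dev/null | sort -h | tail -n 5
```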

Hi… I think this might be a misinterpretation of that original resource-limits error message. I see 882M for /usr/ on a freshly created Postgres Flex instance, and such a figure isn’t out of line for a Debian-based VM—even a fairly stripped-down one.

Overall, /usr/lib/ and /usr/share/ are on the root partition, which is roughly 8GB.

The only thing that should be on the 1G persistent volume is /data/.
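You can confirm that split from inside the machine (via fly ssh console, say), since df reports the filesystem backing each path:

```shell
# The root filesystem holds /usr, /var, etc.
df -h /
# The persistent volume is a separate device and shows up on its own line.
df -h /data 2>/dev/null || echo "/data is not mounted on this system"
```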

Do you have the exact error message at hand?


OK, thanks. Good to know that a fresh/default Postgres Flex Debian image starts out with high disk usage.

After running fly image update and fly pg restart, it seems to be back online for now.

Unfortunately I might have missed the log history from when it crashed / went down. Here are a few lines I found, though I'm not sure if they're from before or after recovering.
Don’t miss the line about “Your instance has hit resource limits”:

2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres database is warning. Your database might be malfunctioning.
2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres vm is warning. Your instance might be hitting resource limits.
2024-08-15T19:57:00.884 health[<REDACTED APPID>] ewr [warn] Health check for your postgres role is warning. Your cluster's membership might be affected.
2024-08-15T19:57:15.326 health[<REDACTED APPID>] ewr [error] Health check for your postgres database has failed. Your database is malfunctioning.
2024-08-15T19:57:21.793 health[<REDACTED APPID>] ewr [error] Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
2024-08-15T19:57:23.079 health[<REDACTED APPID>] ewr [error] Health check for your postgres role has failed. Your cluster's membership is inconsistent.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [WARNING] (316) : Backup Server bk_db/pg is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5002ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [WARNING] (316) : Server bk_db/pg1 is DOWN, reason: Layer7 invalid response, info: "HTTP content check did not match", check duration: 5001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
2024-08-15T19:57:28.762 app[<REDACTED APPID>] ewr [info] proxy | [ALERT] (316) : backend 'bk_db' has no server available!

Glad to hear it’s back up… That log line is admittedly a little generic, but any disk it was referring to would almost certainly be the /data/ persistent volume. Here’s what I see from my own freshly created DB’s current health-check dials, via fly checks list -a db-app-name:

Name  Status   Output
pg    passing  disk-capacity: 11.0% - readonly mode will be enabled at 90.0%
vm    passing  checkDisk: 877.51 MB (89.0%) free space on /data/

(Many things elided.)

It might be prudent to keep an eye on the disk-related ones, and see if they start inching upward…
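If you wanted to script that, something like the following could pull the percentage out for alerting. (The parsing pattern is an assumption based on the sample output above, not a documented interface, and could break if the check format changes.)

```shell
# Hypothetical helper: extract the disk-capacity percentage from
# `fly checks list` output (line format assumed from the sample above).
extract_disk_pct() {
  grep -o 'disk-capacity: [0-9.]*%' | head -n 1 | grep -o '[0-9.]*'
}

# Feeding it the sample line from the checks table:
echo 'pg passing disk-capacity: 11.0% - readonly mode will be enabled at 90.0%' \
  | extract_disk_pct
# -> 11.0
```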

https://community.fly.io/t/runaway-pg-wal-disk-usage/17438/2

Found this in the metrics logs:

2024-08-15 18:40:27.811	 INFO Main child exited normally with code: 0
2024-08-15 18:40:27.811	 INFO Starting clean up.
2024-08-15 18:40:27.811	 INFO Umounting /dev/vdb from /data
2024-08-15 18:40:27.818	 WARN hallpass exited, pid: 240, status: signal: 15 (SIGTERM)
2024-08-15 18:40:27.826	2024/08/15 22:40:27 listening on [REDACTED IPV6]:22 (DNS: [fdaa::3]:53)
2024-08-15 18:40:28.816	[ 9821.633798] reboot: Restarting system
2024-08-15 18:40:30.969	 INFO Starting init (commit: db101a53)...
2024-08-15 18:40:30.990	 INFO Mounting /dev/vdb at /data w/ uid: 0, gid: 0 and chmod 0755
2024-08-15 18:40:31.007	 INFO Resized /data to 2143289344 bytes

I don’t think I did anything to resize that filesystem; is it automatic?

Also, how come df shows disk sizes of 7.8GB (/) and 2.0GB (/data) when the volume provisioned for the Postgres Flex was set to 1GB? I must be missing something about how this all works.

Hm… It can be, but I didn’t think that was enabled on Fly Postgres.

fly m status -d -a db-app-name would detect it (the auto-extend setting), if I understand correctly. Consult the "mounts" section.

Also, does fly vol list -a db-app-name concur with the 2GB?
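Incidentally, the byte count in your resize log is consistent with the 2.0GB that df reports, assuming df -h is rounding binary units; it's just shy of 2 GiB, with the filesystem reserving a little overhead:

```shell
# The "Resized /data to 2143289344 bytes" figure from the log, in MiB:
bytes=2143289344
echo "$(( bytes / 1024 / 1024 )) MiB"   # 2044 MiB, i.e. just under 2 GiB
```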

Don’t see any mention of auto-extend… see below.
I did manually extend the volume myself, from 1GB to 2GB, just today. I'm not sure how that translates to df on the Fly Postgres machine, though: even after that log line said it had resized /data, my account was still showing a 1GB volume, right up until I manually extended it to 2. So I don't think the auto-extend (if that's what it was) changed anything in my account usage, and neither figure obviously matches the 8GB / and 2GB /data mounts I'm seeing in df.

Still feel like I'm missing something about how volumes vs. Fly Postgres machine disks work.

"mounts": [
    {
      "encrypted": true,
      "path": "/data",
      "size_gb": 2,
      "volume": "vol_xxxxxxxxxxxxxxxx",
      "name": "pg_data"
    }
  ],

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.