Postgres VM going OOM after some time of bulk inserting

I’ve got a Phoenix app that uses Oban for background processing. I’m inserting Oban jobs into the DB in batches of 5K, but the Postgres VM goes OOM after a while of inserting batches.

I couldn’t figure out why. The VM started with 1 GB of RAM and I’ve scaled it all the way up to 4 GB, but I still get OOMs after a while. Any idea why?

An example log line from an OOM kill:

 2023-07-09T11:11:03.154 app[17811694bd5068] ams [info] [ 4398.099844] Out of memory: Killed process 31237 (postgres) total-vm:1285664kB, anon-rss:9044kB, file-rss:0kB, shmem-rss:854424kB, UID:999 pgtables:2376kB oom_score_adj:0 
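
To make the numbers in that line easier to read, here is a minimal Python sketch that pulls out every `name:<number>kB` field and converts it to MiB. The log text is copied from the line above; the helper name `parse_oom_kb` is just something made up for this illustration.

```python
import re

# Kernel OOM message copied verbatim from the log line above.
LOG_LINE = (
    "Out of memory: Killed process 31237 (postgres) total-vm:1285664kB, "
    "anon-rss:9044kB, file-rss:0kB, shmem-rss:854424kB, UID:999 "
    "pgtables:2376kB oom_score_adj:0"
)


def parse_oom_kb(line: str) -> dict[str, int]:
    """Return every `name:<number>kB` field of an OOM line, in kB.

    Hypothetical helper for this post, not part of any tool.
    """
    return {name: int(kb) for name, kb in re.findall(r"([\w-]+):(\d+)kB", line)}


fields = parse_oom_kb(LOG_LINE)
for name, kb in fields.items():
    # 1 MiB = 1024 kB
    print(f"{name:>10}: {kb / 1024:8.1f} MiB")
```

Read this way, almost all of the killed process's resident memory is shmem-rss (roughly 834 MiB), while anon-rss is under 10 MiB. My understanding is that shmem-rss counts shared-memory pages the backend has touched, which for Postgres would mostly be shared_buffers, but I may be misreading it.
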

Current Postgres memory settings:

show work_mem;
 work_mem
----------
 4MB
(1 row)

show shared_buffers ;
 shared_buffers
----------------
 1GB
(1 row)

show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 64MB
(1 row)
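
For a rough sense of scale, here is a back-of-the-envelope sketch in Python that adds these settings up. It assumes, very loosely, that each connection uses at most one `work_mem` allocation and that a single maintenance operation runs at a time; real usage can be higher (a query can use `work_mem` several times, and each backend has its own overhead), so treat it as a heuristic and not how Postgres actually accounts for memory. The function name is made up, and the connection counts (73 used, 300 max) come from the health check output further down.

```python
def rough_memory_mb(shared_buffers_mb: int,
                    work_mem_mb: int,
                    maintenance_work_mem_mb: int,
                    connections: int) -> int:
    """Heuristic estimate only (an assumption, not a Postgres formula):
    shared_buffers + one work_mem per connection + one maintenance operation.
    """
    return shared_buffers_mb + connections * work_mem_mb + maintenance_work_mem_mb


# Settings from the SHOW output above: 1GB, 4MB, 64MB.
for connections in (73, 300):
    estimate = rough_memory_mb(1024, 4, 64, connections)
    print(f"{connections} connections: ~{estimate} MB")
```

Even under these optimistic assumptions, shared_buffers alone equals the original 1 GB VM's entire RAM, and the estimate at 300 connections is over half of the 4 GB VM.
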

Below you can see all the peaks after which it went OOM.

Health checks (not sure why there’s a 500 Internal Server Error in the vm check):

pg is passing (2023-07-09 11:11:19)
[✓] connections: 73 used, 3 reserved, 300 max (7.4ms)
[✓] cluster-locks: No active locks detected (22.63µs)
[✓] disk-capacity: 52.7% - readonly mode will be enabled at 90.0% (11.45µs)

vm is critical (2023-07-09 11:44:44)
500 Internal Server Error
[✓] checkDisk: 8.9 GB (45.3%) free space on /data/ (810.48µs)
[✓] checkLoad: load averages: 0.04 0.13 0.57 (1.5ms)
[✓] memory: system spent 72ms of the last 60s waiting on memory (95.71µs)
[✗] cpu: system spent 1.5s of the last 10 seconds waiting on cpu (30.33µs)
[✗] io: system spent 1.99s of the last 10 seconds waiting on io (32.78µs)

role is passing (2023-07-09 11:11:19)
primary

I also tried upgrading the primary VM to 4 CPUs, because I got a message like this:

Health check for your postgres vm has failed. Your instance has hit resource limits. Upgrading your instance / volume size or reducing your usage might help.
[✗] cpu: system spent 1.5s of the last 10 seconds waiting on cpu (30.33µs)
[✗] io: system spent 1.99s of the last 10 seconds waiting on io (32.78µs)

That said, I don’t understand what’s wrong with waiting on CPU or IO as long as the wait is acceptable, and I’m not sure what the IO message is supposed to mean.

Upgrading to 4 CPUs didn’t help either: I still can’t bulk insert in a loop, and Postgres eventually gets OOM-killed.
