TL;DR
If there’s a problem importing the data, then pg import
should offer some kind of indication that there was an error, especially if you turn on additional logging.
Also, it would be nice for pg import
to include timestamps, so when doing a test run we can get a sense of how long the migration is likely to take.
DETAILS
I used pg import
to import my data from the source Postgres (on Heroku) to the fly.io cluster I just stood up. The import command appeared to succeed. Here’s what I saw, with potentially sensitive fields snipped out:
% echo "Start: $(date)" && fly pg import <my_db_url> --app <my_db> --region <my_region> && echo "Finish: $(date)"
Start: Thu Sep 5 11:31:03 PDT 2024
? Select VM size: shared-cpu-1x - CPU Kind: Shared, vCPUs: 1 Memory: 256MB
Created an ephemeral machine <machine_id> to run the import process.
Connecting to <location>... complete
[info] Running pre-checks...
[info] Source Postgres version: 13.15 (Ubuntu 13.15-1.pgdg20.04+1)
[info] Target Postgres version: 16.4 (Ubuntu 16.4-1.pgdg24.04+1)
[info] Pre-checks completed without issue
[info] Starting import process... (This could take a while)
[info] Import complete!
Waiting for ephemeral machine <machine_id> to be destroyed ... done.
Finish: Thu Sep 5 11:37:06 PDT 2024
From this output, it looks like everything succeeded. When I poked deeper, I discovered that actually there was a problem which caused the import to fail ~half way through, such that the first several tables (alphabetically) were fully populated and the remaining tables all ended up with zero rows.
Specifically, the logs showed Out of memory: Killed process 391 (pg_dump)
followed by ERROR: invalid input syntax for type json
. It appears that the ephemeral machine used for doing the import had too little memory (it had 256MB, which is the default). This caused the pg_dump to fail, which interrupted the data in the middle of a JSON block for one of the records, and this caused the corresponding COPY
command to fail.
It’s certainly possible to discover all this, but it would be nice if pg import
gave some kind of indication that there was a problem along the way.
BTW, I tried running import separately with both --debug
and --verbose
, but this made no difference at all. The import failed in exactly the same way, but the output of pg import
was identical.