We’re working towards dedicated per-region Corrosion hosts to make our distributed database more robust. This means Corrosion will handle way more concurrency, and instead of flyd and corrosion colocation, we’ll have actual network latency between them. To keep things fast under these new conditions, we needed a better way to handle bulk operations.
Enter UNNEST — a powerful concept from Postgres for expanding arrays into rows, enabling very efficient bulk operations. Loops that issue the same query N times on different data can usually get rewritten into a single query using unnest. The best part? You only pay the network latency once, not N times.
SQLite has carray but it’s limited to expanding at most one array at a time, which doesn’t get you very far. We’ve managed to avoid that limitation by writing a custom carray-inspired vtab and exposing it through our Postgres emulation layer.
Example queries
Now you can use the Corrosion API to make bulk insertions:
INSERT INTO foo (col1, col2, col3, col4)
SELECT value0, value1, value2, value3
FROM unnest($1, $2, $3, $4)
Or filter by a list of IDs in a single statement:
SELECT * FROM foo WHERE foo.id IN (SELECT value0 FROM unnest($1))
Most patterns of “loop over params and run the same query” can now be converted to a single statement.
Using UNNEST via Postgres
To use UNNEST through our Postgres emulation, you’ll need explicit type casts. We support text[], blob[], int[], and float[], with up to 32 array parameters per unnest invocation. Postgres :: type casts and {} array literals aren’t currently supported.
INSERT INTO foo (col1, col2, col3, col4)
SELECT value0, value1, value2, value3
FROM unnest(
CAST($1 AS int[]),
CAST($2 AS float[]),
CAST($3 AS text[]),
CAST($4 AS blob[])
)
You can use any Postgres-compatible client to send over arrays of data. Be mindful though—we’re a bit pedantic about types right now. Use i64 (not u32 or others) for int[], f64 (not f32 or others) for float[], and so on.
Real-world impact
We’ve made Corrosion use UNNEST internally where possible, the idea being that it would improve sync performance by issuing fewer SQLite queries. The results were mixed: while these changes improved performance on our benchmarks for large transactions, they didn’t affect our sync latency for small transactions in any observable way.
An unexpected bonus: our sync loop now contains a query that uses unnest, crsqlite, and a returning clause together. By doing so we unearthed an SQLite codegen bug.
We managed to debug it, report it to the SQLite forum, and it’s now fixed upstream.