The way crsqlite (the SQLite extension used by Corrosion) works is by creating a metadata row for each non–primary key column in a table. Each of these metadata rows represents a single column value of a row in the main table. Corrosion syncs data by sending out these “metadata rows” to other nodes.
So for a table with this row,
┌──────────────────┬────────────────────────────┬──────────────┬───────┬──────┐
│ id │ title │ completed_at │ done │ user │
├──────────────────┼────────────────────────────┼──────────────┼───────┼──────┤
│ 'fresh produces' │ 'tomatoes, carrot, greens' │ 1 │ '0' │ 'fly'│
└──────────────────┴────────────────────────────┴──────────────┴───────┴──────┘
Crsqlite will create metadata rows with this data:
select * from crsql_changes where pk = X'010b0e66726573682070726f6475636573';
┌─────────┬───────────────────────────────────────┬─────────────────────────────────────┬────────────┬───────────────────────┬────────────────┬────────────────────────────┐
│ table │ pk │ site_id │ db_version │ ts │ cid │ val │
├─────────┼───────────────────────────────────────┼─────────────────────────────────────┼────────────┼───────────────────────┼────────────────┼────────────────────────────┤
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2 │ '7561109084173720240' │ 'title' │ 'tomatoes, carrot, greens' │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2 │ '7561109084173720240' │ 'completed_at' │ 1 │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2 │ '7561109084173720240' │ 'done' │ '0' │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2 │ '7561109084173720240' │ 'user' │ 'fly' │
└─────────┴───────────────────────────────────────┴─────────────────────────────────────┴────────────┴───────────────────────┴────────────────┴────────────────────────────┘
Notice that some information is duplicated, metadata rows for columns of a particular primary key updated within the same transaction will have the same site_id, db_version, table, pk (which is a compact binary representation of the primary keys of the row) and timestamp columns.
Previously, the broadcast and sync messages contained an array of row changes. We’ve switched to a hash map keyed by the primary key, where each value is a struct with the row’s updated fields. This lets us deduplicate columns for changes with the same primary key and reduces over-the-wire data. With this update, broadcast and sync bandwidth dropped by ~30%.
