Smaller broadcast and sync payloads in Corrosion

The way crsqlite (the SQLite extension used by Corrosion) works is by creating a metadata row for each non–primary key column in a table. Each of these metadata rows represents a single column value of a row in the main table. Corrosion syncs data by sending out these “metadata rows” to other nodes.

So for a table with this row,

┌──────────────────┬────────────────────────────┬──────────────┬───────┬──────┐
│        id        │           title            │ completed_at │ done  │ user │
├──────────────────┼────────────────────────────┼──────────────┼───────┼──────┤
│ 'fresh produces' │ 'tomatoes, carrot, greens' │ 1            │ '0'   │ 'fly'│
└──────────────────┴────────────────────────────┴──────────────┴───────┴──────┘

Crsqlite will create metadata rows with this data:

select * from crsql_changes where pk = X'010b0e66726573682070726f6475636573';

┌─────────┬───────────────────────────────────────┬─────────────────────────────────────┬────────────┬───────────────────────┬────────────────┬────────────────────────────┐
│  table  │                  pk                   │               site_id               │ db_version │          ts           │      cid       │            val             │
├─────────┼───────────────────────────────────────┼─────────────────────────────────────┼────────────┼───────────────────────┼────────────────┼────────────────────────────┤
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2          │ '7561109084173720240' │ 'title'        │ 'tomatoes, carrot, greens' │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2          │ '7561109084173720240' │ 'completed_at' │ 1                          │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2          │ '7561109084173720240' │ 'done'         │ '0'                        │
│ 'todos' │ x'010b0e66726573682070726f6475636573' │ x'c560acdfe248432b995777dbc6994fa2' │ 2          │ '7561109084173720240' │ 'user'         │ 'fly'                      │
└─────────┴───────────────────────────────────────┴─────────────────────────────────────┴────────────┴───────────────────────┴────────────────┴────────────────────────────┘

Notice that some information is duplicated, metadata rows for columns of a particular primary key updated within the same transaction will have the same site_id, db_version, table, pk (which is a compact binary representation of the primary keys of the row) and timestamp columns.

Previously, the broadcast and sync messages contained an array of row changes. We’ve switched to a hash map keyed by the primary key, where each value is a struct with the row’s updated fields. This lets us deduplicate columns for changes with the same primary key and reduces over-the-wire data. With this update, broadcast and sync bandwidth dropped by ~30%.

5 Likes