Thanks all for the feedback and questions! A few responses I can give you off the top of my head:
We didn’t release any optimizations, so this sounds like natural variation in resume time. For example, how fast a Machine can get back up and running might depend on how long it takes to configure its network interface on the host side and how much of the Machine’s memory snapshot is still cached. We are thinking about how to reduce the resume time further, or at least make it more consistent, though!
This is a known issue (see also “Current limitations and caveats” in the original Fresh Produce post): after it’s resumed, the Machine has to reconnect to a vsock before it can send logs again. Unfortunately, this means that a few log lines may be dropped immediately after resumption, but it should otherwise be benign, and improving this is on the to-do list.
We’d definitely like to raise the limit! We started with 2 GiB to be safe, and because we currently write the entire contents of the Machine’s memory to disk each time it is suspended. Firecracker also supports “diff” snapshots, which include only the memory that has been written since the last snapshot was taken, and that should help make snapshots of large Machines less expensive.
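To make the full-versus-diff distinction a bit more concrete, here’s a toy Python model of dirty-page tracking. To be clear, this is only a sketch and not how Firecracker (or our platform) actually implements snapshots: the class, its method names, and the 4 KiB page size are all made up for illustration.

```python
from __future__ import annotations

PAGE_SIZE = 4096  # bytes; assumed page size, for illustration only


class ToyMachineMemory:
    """Hypothetical model of guest memory that tracks which pages were written."""

    def __init__(self, num_pages: int) -> None:
        zero_page = bytes(PAGE_SIZE)
        self.pages = [zero_page] * num_pages
        self.dirty = set()  # indices of pages written since the last snapshot

    def write(self, page_index: int, data: bytes) -> None:
        # Pad or truncate to exactly one page, then mark the page dirty.
        self.pages[page_index] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]
        self.dirty.add(page_index)

    def full_snapshot(self) -> dict[int, bytes]:
        # A full snapshot contains every page, written to or not.
        snapshot = dict(enumerate(self.pages))
        self.dirty.clear()
        return snapshot

    def diff_snapshot(self) -> dict[int, bytes]:
        # A diff snapshot contains only pages written since the last snapshot.
        snapshot = {i: self.pages[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return snapshot


def snapshot_bytes(snapshot: dict[int, bytes]) -> int:
    """Size of the memory payload a snapshot would write to disk."""
    return len(snapshot) * PAGE_SIZE


if __name__ == "__main__":
    mem = ToyMachineMemory(num_pages=1024)      # a 4 MiB toy "Machine"
    print(snapshot_bytes(mem.full_snapshot()))  # 4194304
    mem.write(3, b"hello")
    mem.write(7, b"world")
    print(snapshot_bytes(mem.diff_snapshot()))  # 8192
```

In this toy version, the cost of a full snapshot scales with the Machine’s total memory, while the cost of a diff snapshot scales with how much memory was touched since the previous snapshot, which is why the latter looks attractive for larger Machines.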