Outdated System Time after Suspension

In one of my applications I ran into a JWT decoding issue and discovered that the system time was outdated after the machine woke up from suspension (similar to the issue described here: VMs booting with outdated system time after suspension - best practices?).

My logs showed the following:

2025-04-18T18:20:32Z [info] Virtual machine has been suspended
2025-04-18T18:54:07Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:26.366985+00:00

At 18:20:32Z the machine was suspended, and at 18:54:07Z I logged a system time of 18:20:26, i.e. a few seconds before the suspension.

Repeating the log call 2 and 4 seconds later still reported the wrong system time, just 2 and 4 seconds further along, respectively:

2025-04-18T18:54:09Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:28.368798+00:00
2025-04-18T18:54:11Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:30.369548+00:00
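
To make the pattern explicit, here is a small sketch that computes the gap between the log timestamp and the time the guest reported, using only the three samples quoted above. The names `samples` and `offset_seconds` are just illustrative, and the log timestamps only have one-second resolution, so the result is approximate.

```python
from datetime import datetime, timezone

# (platform log timestamp, time reported by the guest), copied from the logs above
samples = [
    ("2025-04-18T18:54:07Z", "2025-04-18T18:20:26.366985+00:00"),
    ("2025-04-18T18:54:09Z", "2025-04-18T18:20:28.368798+00:00"),
    ("2025-04-18T18:54:11Z", "2025-04-18T18:20:30.369548+00:00"),
]


def offset_seconds(log_stamp: str, guest_stamp: str) -> float:
    """Seconds by which the guest clock lags behind the log timestamp."""
    logged = datetime.strptime(log_stamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    guest = datetime.fromisoformat(guest_stamp)
    return (logged - guest).total_seconds()


offsets = [offset_seconds(log, guest) for log, guest in samples]
for value in offsets:
    print(f"guest clock is behind by {value:.3f} s")
```

All three samples come out at about 33 minutes 40 seconds behind, so the guest clock is ticking normally after resume but was never stepped forward to account for the time spent suspended.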

If anyone could help me figure out how to get the correct time (waiting N seconds, or some other solution), that would be awesome. I would also really like to keep the option
auto_stop_machines = "suspend"

Thank you in advance and have a great day

This was in the original Suspense thread - it’s a known issue but I’m not sure where Fly is at in terms of addressing it. For clock sensitive systems, I don’t think you can properly use “suspend” here.

In your JWT decoder, perhaps you can make an NTP call to an external time service. This will add latency, but you may decide that is a reasonable trade-off for keeping this machine suspendable.
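
For reference, a minimal sketch of what such a call could look like in Python, using only the standard library. The host `pool.ntp.org`, the 2-second timeout, and the function names are assumptions for illustration. It reads only the server's transmit timestamp and ignores network round-trip delay, which is usually fine for JWT expiry checks but is not a full NTP implementation.

```python
import socket
import struct
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server transmit timestamp (bytes 40..47) as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP response is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_ntp(host: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    """Send one SNTP client request and return the server time as Unix seconds."""
    # First byte 0x23: leap indicator 0, version 4, mode 3 (client).
    request = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)


def trusted_utc_now() -> datetime:
    """Current UTC time taken from the NTP server instead of the guest clock."""
    return datetime.fromtimestamp(query_ntp(), tz=timezone.utc)
```

The decoder would then compare the token's `exp` and `nbf` claims against `trusted_utc_now()` rather than the local clock. Caching the measured offset for a short while, instead of querying on every request, would cut the added latency.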

Interesting… It looks like there is actually an efficient paravirtualized time service available, the one mentioned in the Firecracker FAQ…

$ fly ssh console
# ls -l /dev/ptp0
crw------- 1 root root 253, 0 Dec 24 02:11 /dev/ptp0
# apt-get update
# apt-get install --no-install-recommends linuxptp
# phc_ctl /dev/ptp0 get
phc_ctl[1088.776]: clock time is 1745121618.964522192 or Sun Apr 20 04:00:18 2025

https://en.wikipedia.org/wiki/Precision_Time_Protocol

https://github.com/firecracker-microvm/firecracker/blob/main/FAQ.md

[…] the guests will constantly update time
to stay in sync with host wall-clock. They do so using cheap para-virtualized
calls into kvm ptp instead of actual network NTP traffic.

In my experiments, phc_ctl /dev/ptp0 get is pretty accurate even when date (the Linux guest’s internal time) is 5+ minutes off.

Sun Apr 20 03:45:46 UTC 2025   # output of `date`

phc_ctl[521.947]: clock time is 1745121052.124684276 or Sun Apr 20 03:50:52 2025

(I didn’t test really long suspends, however. ❄)

Glancing at the Linux kernel docs, it looks like there might be a POSIX API call a person could make, instead of shelling out to a subprocess. (Admittedly that’s not going to be convenient from Node, and the like.)
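
As a rough sketch of that idea in Python: the kernel exposes a PTP device as a "dynamic" POSIX clock whose clock id is derived from an open file descriptor, and `time.clock_gettime` accepts that id. I have not verified this on a Fly Machine, so treat the device path, the read-only open, and the helper names as assumptions. It is Linux-only and typically needs root, given the permissions on /dev/ptp0 shown above.

```python
import os
import time

# Low bits that mark a clock id as fd-based (CLOCKFD in the kernel's PTP docs).
CLOCKFD = 3


def fd_to_clockid(fd: int) -> int:
    """Mirror the kernel's FD_TO_CLOCKID macro: ((~fd) << 3) | CLOCKFD."""
    return (~fd << 3) | CLOCKFD


def read_phc_time(device: str = "/dev/ptp0") -> float:
    """Read the PTP hardware clock and return its time as Unix seconds."""
    fd = os.open(device, os.O_RDONLY)
    try:
        return time.clock_gettime(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

Comparing `read_phc_time()` with `time.time()` right after a resume would show how far the guest clock has fallen behind, without shelling out to phc_ctl.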


Having said that, I agree with @khuezy overall, that this all still seems a bit fragile for security-sensitive tasks… (It will be super-nice once Fly.io does declare suspend ready for production, though.)

Thank you @khuezy, @halfer and @mayailurus very much for your helpful remarks and suggestions!! Yesterday I implemented and tested the NTP call in the decoding process, and I am quite happy with how it turned out.

But also your suggestion @mayailurus is definitely something I will have a closer look at.

So thank you all very much once again and have a great day

This should be reliable 100% of the time.

The guest clock is better now too.