Outdated System Time after Suspension

In one of my applications I ran into a JWT decoding issue and discovered that the system time was outdated after the machine woke up from suspension (similar to this thread: VMs booting with outdated system time after suspension - best practices?).

My logs showed the following:

2025-04-18T18:20:32Z [info] Virtual machine has been suspended
2025-04-18T18:54:07Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:26.366985+00:00

At 18:20:32Z the machine was suspended, and at 18:54:07Z I logged a system time of 18:20:26, i.e. a few seconds before the suspension.

Logging again 2 and 4 seconds later still gave the wrong system time, just 2 and 4 seconds further along:

2025-04-18T18:54:09Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:28.368798+00:00
2025-04-18T18:54:11Z [info] | DEBUG | Server current UTC time: 2025-04-18T18:20:30.369548+00:00

If anyone could help me figure out how to get the right time (waiting N seconds, or some other solution), that would be awesome. I would also really like to keep the option
auto_stop_machines = "suspend"

Thank you in advance and have a great day!

This was in the original Suspense thread - it’s a known issue but I’m not sure where Fly is at in terms of addressing it. For clock sensitive systems, I don’t think you can properly use “suspend” here.

In your JWT decoder, perhaps you can make an NTP call to an external time service. This will add latency, but you may decide that is a reasonable trade-off for keeping this machine suspendable.
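
A minimal sketch of such a lookup in Python, using a bare SNTP request over UDP. The server name `pool.ntp.org`, the 2-second timeout, and the function names are assumptions for illustration, not something taken from the application in question; a production version would want retries and a fallback when the request times out.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def parse_ntp_transmit_time(packet: bytes) -> float:
    """Return the server transmit timestamp of an NTP reply as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("NTP reply too short")
    # Bytes 40..47 hold the transmit timestamp: 32-bit seconds + 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_ntp_time(server: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    """Ask an NTP server for the current time (hypothetical defaults)."""
    # First byte 0x23: leap indicator 0, version 4, mode 3 (client).
    request = b"\x23" + 47 * b"\0"
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_ntp_transmit_time(reply)
```

The value returned by `query_ntp_time()` can then stand in for "now" when checking a token's expiry, instead of the guest's own clock. It ignores network round-trip delay, which should be small compared with the multi-minute drift seen after a suspend.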

Interesting… It looks like there is actually an efficient paravirtualized time service available, the one mentioned in the Firecracker FAQ…

$ fly ssh console
# ls -l /dev/ptp0
crw------- 1 root root 253, 0 Dec 24 02:11 /dev/ptp0
# apt-get update
# apt-get install --no-install-recommends linuxptp
# phc_ctl /dev/ptp0 get
phc_ctl[1088.776]: clock time is 1745121618.964522192 or Sun Apr 20 04:00:18 2025

https://en.wikipedia.org/wiki/Precision_Time_Protocol

https://github.com/firecracker-microvm/firecracker/blob/main/FAQ.md

[…] the guests will constantly update time
to stay in sync with host wall-clock. They do so using cheap para-virtualized
calls into kvm ptp instead of actual network NTP traffic.

In my experiments, phc_ctl /dev/ptp0 get is pretty accurate even when date (the Linux guest’s internal time) is 5+ minutes off.

Sun Apr 20 03:45:46 UTC 2025   # output of `date`

phc_ctl[521.947]: clock time is 1745121052.124684276 or Sun Apr 20 03:50:52 2025

(I didn’t test really long suspends, however, :snowflake:.)
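
For anyone who wants to use this from application code, here is a rough sketch that shells out to `phc_ctl` and parses the line format shown above. The function names are made up for illustration, it assumes `linuxptp` is installed and the process may read `/dev/ptp0`, and it checks both stdout and stderr because I haven't verified which stream `phc_ctl` writes to.

```python
import re
import subprocess
import time

# Matches e.g. "clock time is 1745121052.124684276 or Sun Apr 20 03:50:52 2025".
_PHC_RE = re.compile(r"clock time is (\d+)\.(\d+)")


def parse_phc_ctl(output: str) -> float:
    """Extract the Unix timestamp from the output of `phc_ctl <dev> get`."""
    match = _PHC_RE.search(output)
    if match is None:
        raise ValueError("unexpected phc_ctl output: %r" % output)
    return float(match.group(1) + "." + match.group(2))


def read_phc_time(device: str = "/dev/ptp0") -> float:
    """Run phc_ctl and return the PTP hardware clock time as Unix seconds."""
    result = subprocess.run(
        ["phc_ctl", device, "get"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_phc_ctl(result.stdout + result.stderr)


def clock_offset(device: str = "/dev/ptp0") -> float:
    """Seconds by which the guest's system clock lags the PTP clock."""
    return read_phc_time(device) - time.time()
```

With the two readings above, the offset works out to roughly 306 seconds, i.e. the 5+ minutes mentioned.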

Glancing at the Linux kernel docs, it looks like there might be a POSIX API call a person could make, instead of shelling out to a subprocess. (Admittedly that’s not going to be convenient from Node, and the like.)
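
As a sketch of what that call could look like: the kernel's PTP documentation derives a dynamic clock ID from an open file descriptor (the `FD_TO_CLOCKID` macro), and the result can be passed to `clock_gettime`. Python happens to expose that call as `time.clock_gettime`. I haven't run this on a Fly machine, so treat it as an assumption-laden example; the helper names are mine, and the device node above is readable by root only.

```python
import os
import time

# Low bits used by the kernel's FD_TO_CLOCKID macro for dynamic clocks.
CLOCKFD = 3


def fd_to_clockid(fd: int) -> int:
    """Python equivalent of the kernel's FD_TO_CLOCKID(fd) macro."""
    return ((~fd) << 3) | CLOCKFD


def read_ptp_clock(device: str = "/dev/ptp0") -> float:
    """Read the PTP hardware clock directly, without a subprocess."""
    fd = os.open(device, os.O_RDONLY)
    try:
        # The clock ID is only valid while the descriptor stays open.
        return time.clock_gettime(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

The difference `read_ptp_clock() - time.time()` would then give the same offset as comparing `phc_ctl` with `date`, just without installing `linuxptp`.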


Having said that, I agree with @khuezy overall, that this all still seems a bit fragile for security-sensitive tasks… (It will be super-nice once Fly.io does declare suspend ready for production, though.)

Thank you @khuezy, @halfer and @mayailurus very much for your helpful remarks and suggestions! Yesterday I implemented and tested the NTP call in the decoding process, and I am quite happy with how it turned out.

But also your suggestion @mayailurus is definitely something I will have a closer look at.

So thank you all very much once again and have a great day
