Hi, I implemented challenge 2 using a unique prefix plus a local counter variable, the way Snowflake does it (MachineID + timestamp (NTP) + local counter in process memory). My colleague said that doing it this way makes the service “almost” stateless. The word “almost” made me think about the weakest part of this design: the clock. I have a couple of questions about the clock in an actual production system:
Does this only work under some assumptions about the clock? One obvious assumption I can think of: skew should stay below X milliseconds, where X could be fixed by a “safe” process restart interval. If the skew is more than X milliseconds and the process restarts during that window, the reset of the in-memory counter could generate duplicates. Is that the right way to think about the problem?
If clock skew exceeds X, the application layer can detect it by comparing the current time to the last time value it used, and it can throw an error, which is fine.
Is this how it works in production? Any other details I am missing?
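To make the detection idea in the questions above concrete, here is a minimal sketch of a generator that remembers the last timestamp it used and refuses to issue IDs if the clock reads earlier than that. The bit widths (10 for the machine ID, 12 for the counter), the field order, the class name `IdGenerator`, and the exception `ClockMovedBackwards` are all assumptions made for illustration, not a description of any particular production system. The clock is injected as a function so the behaviour can be exercised without touching the real system clock. The sketch is not thread-safe.

```python
import time
from typing import Callable

MACHINE_BITS = 10  # assumed width of the machine ID field
COUNTER_BITS = 12  # assumed width of the per-millisecond counter


class ClockMovedBackwards(Exception):
    """Raised when the clock reads earlier than the last timestamp used."""


def wall_clock_ms() -> int:
    """Current wall-clock time in milliseconds (subject to NTP adjustments)."""
    return time.time_ns() // 1_000_000


class IdGenerator:
    def __init__(self, machine_id: int,
                 now_ms: Callable[[], int] = wall_clock_ms) -> None:
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id out of range")
        self.machine_id = machine_id
        self.now_ms = now_ms
        self.last_ms = -1   # lives only in process memory
        self.counter = 0    # lives only in process memory

    def next_id(self) -> int:
        now = self.now_ms()
        if now < self.last_ms:
            # The clock went backwards relative to the last issued ID.
            raise ClockMovedBackwards(
                f"clock moved back by {self.last_ms - now} ms")
        if now == self.last_ms:
            # Same millisecond: bump the counter, fail if it would overflow.
            if self.counter + 1 >= (1 << COUNTER_BITS):
                raise OverflowError("counter exhausted for this millisecond")
            self.counter += 1
        else:
            # New millisecond: remember it and restart the counter.
            self.last_ms = now
            self.counter = 0
        return ((now << (MACHINE_BITS + COUNTER_BITS))
                | (self.machine_id << COUNTER_BITS)
                | self.counter)
```

Note that `last_ms` is held only in memory, so this check catches a backwards jump while the process is running but not one that happens across a restart. That gap is exactly the restart case the skew bound X is meant to cover.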
PS: When asked, ChatGPT said there is also a “clock ID” (referencing the SoByte article “Multi-clock solves the time redirection problem of snowflake algorithm”) that can be used to overcome clock skew. That sounds reasonable, but it seems to make the service more stateful, since the clock IDs need to be persisted.
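To show where the extra state would come from, here is a rough sketch of one possible reading of the clock ID idea: a small number stored on disk that is bumped whenever the generator cannot trust its clock (for example on restart or after a detected backwards jump) and then mixed into the ID as an extra field. This is an assumption for illustration, not the scheme from the linked article; the function name `bump_clock_id`, the file-based storage, and the 2-bit width are all hypothetical.

```python
import os


def bump_clock_id(path: str, clock_bits: int = 2) -> int:
    """Read the persisted clock ID, advance it, write it back, return it.

    Hypothetical helper: the ID wraps around after 2**clock_bits bumps.
    """
    try:
        with open(path) as f:
            current = int(f.read().strip())
    except FileNotFoundError:
        current = -1  # first run: the next value becomes 0

    new_id = (current + 1) % (1 << clock_bits)

    # Write to a temporary file and rename it over the old one, so a crash
    # mid-write does not leave a half-written clock ID behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(str(new_id))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)
    return new_id
```

Even in this small form the service now depends on durable local storage, and because the clock ID wraps around it only distinguishes a limited number of bumps before values repeat.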