There is a known issue with slow service propagation, for which a fix is being developed. As a result, it can take up to a minute after a deploy before requests are routed to new VMs and handled as expected.
As @zee says, what you need is for an old VM to hang around long enough for your new VM's route, service, etc. to fully propagate. If you only have one VM, this issue happens at random; as you say, it's transient. It can be mitigated by having three:
… since it’s likely that by the time the third old VM is replaced, the first new VM is ready to go, routing-wise. Once this issue is fixed that won’t be necessary, but for now I’ve found that it helps (in my case with the rolling strategy). So if you hit it again, give that a try.
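
To illustrate why the VM count matters, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not how the platform actually schedules anything: I'm assuming a rolling deploy swaps one VM at a time at a fixed interval (`replace_seconds`, a made-up number) and that a new VM becomes routable a fixed `propagation_seconds` after it starts (60 here, taken from the "up to a minute" above). The function name and parameters are hypothetical.

```python
def coverage_gap(vm_count, replace_seconds, propagation_seconds):
    """Seconds during a rolling deploy when no VM is routable (toy model).

    Assumptions (illustrative only):
    - old VM i is stopped, and new VM i started, at i * replace_seconds
    - a new VM becomes routable propagation_seconds after it starts
    """
    # The last old VM stops serving when the final replacement begins.
    last_old_gone = (vm_count - 1) * replace_seconds
    # The first new VM starts at t=0 and is routable after the propagation delay.
    first_new_ready = propagation_seconds
    # Any time between those two points is a window with nothing routable.
    return max(0, first_new_ready - last_old_gone)


for count in (1, 2, 3):
    gap = coverage_gap(count, replace_seconds=30, propagation_seconds=60)
    print(count, gap)
# prints:
# 1 60
# 2 30
# 3 0
```

With these assumed numbers, one VM leaves a window of up to a minute, while three VMs close it, because the first new VM has had time to propagate before the last old one goes away. Real timings will differ, so treat it as intuition rather than a guarantee.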