Yeah, I think that’s the issue.
Our application service has two machines attached (let’s call them A and B), and one of them (let’s say B) has trouble connecting via SSH.
I tried forcing each specific machine to serve the response by stopping the other: when only A was active the application could connect to the database, and when only B was active it could not.
This is the error from the logs when I try to connect:
2024-09-20T16:57:43Z app[<snip>] cdg [info]2024/09/20 16:57:43 ERROR unexpected error fetching cert error="transient SSH server error: can't resolve _orgcert.internal"
2024-09-20T16:57:43Z app[<snip>] cdg [info]2024/09/20 16:57:43 ERROR unexpected error error="[ssh: no auth passed yet, transient SSH server error: can't resolve _orgcert.internal]"
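Both lines seem to come down to the same underlying failure. Here is a minimal sketch that pulls the unresolved name out of the log lines quoted above, assuming the log format stays as shown. The helper name unresolved_names and the regex are my own, not part of any tooling.

```python
import re

# The two log lines quoted above, copied verbatim.
LOG_LINES = [
    '2024-09-20T16:57:43Z app[<snip>] cdg [info]2024/09/20 16:57:43 ERROR '
    'unexpected error fetching cert error="transient SSH server error: '
    'can\'t resolve _orgcert.internal"',
    '2024-09-20T16:57:43Z app[<snip>] cdg [info]2024/09/20 16:57:43 ERROR '
    'unexpected error error="[ssh: no auth passed yet, transient SSH server '
    'error: can\'t resolve _orgcert.internal]"',
]

# Assumption: the failing name always follows the literal text "can't resolve ".
UNRESOLVED = re.compile(r"can't resolve ([\w.\-]+)")


def unresolved_names(lines):
    """Hypothetical helper: collect every name the logs say could not be resolved."""
    names = set()
    for line in lines:
        match = UNRESOLVED.search(line)
        if match:
            names.add(match.group(1))
    return names


print(unresolved_names(LOG_LINES))
```

If this returns a single name for both lines, the "no auth passed yet" error is most likely just a consequence of the failed cert lookup rather than a separate problem.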
I tried re-creating the machine, but the new one has the same issue. I could disable auto-scaling and keep just the one working machine (traffic won’t be high), but I fear it could break at any time and go offline.
Any tips on how to solve this aside from re-creating the organization like the other poster did?