I seem to now be able to run fly deploy again after the outage, but my app does NOT seem to be able to communicate with the database anymore. Any suggestions on troubleshooting? I can connect successfully to the database on my own.
This is the error I get from Phoenix when deploying:
14:45:56.432 [error] Could not create schema migrations table. This error usually happens due to the following:
* The database does not exist
* The "schema_migrations" table, which Ecto uses for managing
migrations, was defined by another library
* There is a deadlock while migrating (such as using concurrent
indexes with a migration_lock)
Yeah, never mind about the deployments working. My deployments are still getting stuck at pending. This is brutal: I have had over 18 hours of downtime this month due to hosting issues.
I’ve been running into a similar issue for a couple of days now when running fly deploy with the primary region set to bos (edited for region typo, sorry). Not sure if it has to do with the flycast network, since the db is listed at <db-name>.flycast:5432. This is the log:
[ 0.152832] PCI: Fatal: No config space access function found
INFO Starting init (commit: 15238e9)...
INFO Preparing to run: `/app/bin/migrate` as nobody
INFO [fly api proxy] listening at /.fly/api
2023/10/23 17:02:51 listening on [fdaa:0:85f4:a7b:1ed:7be1:e30c:2]:22 (DNS: [fdaa::3]:53)
17:02:53.323 [error] Postgrex.Protocol (#PID<0.167.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (<db-name>.flycast:5432): non-existing domain - :nxdomain
...
** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2987ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
1. Ensuring your database is available and that you can connect to it
2. Tracking down slow queries and making sure they are running fast enough
3. Increasing the pool_size (although this increases resource consumption)
4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
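For anyone trying to narrow this down: the `:nxdomain` in the log above points at name resolution rather than at Postgres itself, so one thing worth checking is whether the hostname resolves at all from where the app runs. Below is a minimal sketch in Python, assuming an interpreter is available in that environment (a typical Elixir release image will not have one, so treat it as an illustration of the check rather than a drop-in tool). The function name `resolves` is made up for this example.

```python
import socket


def resolves(hostname: str, port: int = 5432) -> bool:
    """Return True if the system resolver finds at least one address for hostname."""
    try:
        results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        return len(results) > 0
    except socket.gaierror:
        # The name lookup itself failed. This roughly corresponds to the
        # ":nxdomain" that Postgrex reports; no TCP connection was attempted.
        return False


# Usage (hypothetical): call resolves() with the host part of your DATABASE_URL.
# A numeric address needs no DNS lookup, so it should always report True:
#   resolves("127.0.0.1")
```

If this reports False for the database host while the same check succeeds from another machine, that would be consistent with a DNS problem on one host rather than a problem with the database. Note that it has to run inside the Fly private network to say anything useful about a .flycast name.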
Could you double-check your primary region? If it turns out to be bos, we do have a single host with DNS issues there. If a release_command ephemeral machine lands on that host, it will have this kind of trouble. So if the error you’re seeing is related to a release_command, you can try forcing the ephemeral machine to be created in another region (ewr is a good choice, as it’s close to bos):
PRIMARY_REGION=ewr fly deploy
Let me know if that works. It should only be needed as a temporary workaround while we get that pesky bos host’s DNS working.