Deployment fails here and there

Thanks for fixing it, the cluster does seem more stable now. I felt bad posting this support ticket; I knew you were swamped with the recent DB fixes, and I have been posting a lot of them throughout the week (this was my company’s launch week). Fly was the smoothest release experience I have ever had with any platform, which is why I’d love to stay even though you explicitly said Postgres is in beta. You have done a great job with the platform :rocket:

This issue has surfaced again. My Postgres database is indiepaper-production-db.

Command: /app/bin/indie_paper eval IndiePaper.Release.migrate
	 Starting instance
	 Configuring virtual machine
	 Pulling container image
	 Unpacking image
	 Preparing kernel init
	 Configuring firecracker
	 Starting virtual machine
	 Starting init (commit: 50ffe20)...
	 Preparing to run: `/app/bin/indie_paper eval IndiePaper.Release.migrate` as nobody
	 2021/10/26 08:41:14 listening on [fdaa:0:3565:a7b:21a1:7179:7b9d:2]:22 (DNS: [fdaa::3]:53)
	 Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
Error: :18.295 [error] Could not create schema migrations table. This error usually happens due to the following:
	   * The database does not exist
	   * The "schema_migrations" table, which Ecto uses for managing
	     migrations, was defined by another library
	   * There is a deadlock while migrating (such as using concurrent
	     indexes with a migration_lock)
	 To fix the first issue, run "mix ecto.create".
	 To address the second, you can run "mix ecto.drop" followed by
	 "mix ecto.create". Alternatively you may configure Ecto to use
	 another table and/or repository for managing migrations:
	     config :indie_paper, IndiePaper.Repo,
	       migration_source: "some_other_table_for_schema_migrations",
	       migration_repo: AnotherRepoForSchemaMigrations
	 The full error report is shown below.
	 ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
	   1. Ensuring your database is available and that you can connect to it
	   2. Tracking down slow queries and making sure they are running fast enough
	   3. Increasing the pool_size (albeit it increases resource consumption)
	   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
	 See DBConnection.start_link/2 for more information
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
	     (elixir 1.12.1) lib/enum.ex:1553: Enum."-map/2-lists^map/1-0-"/2
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:844: Ecto.Adapters.SQL.execute_ddl/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:645: Ecto.Migrator.verbose_schema_migration/3
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:473: Ecto.Migrator.lock_for_migrations/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:388: Ecto.Migrator.run/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
	     (indie_paper 0.1.0) lib/indie_paper/release.ex:12: anonymous fn/2 in IndiePaper.Release.migrate/0
	 Main child exited normally with code: 1
	 Reaped child process with pid: 563 and signal: SIGUSR1, core dumped? false
	 Starting clean up.

Error Release command failed, deployment aborted
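
For reference, the pool options named in points 3 and 4 of that DBConnection message are set on the repo config. A minimal sketch, assuming the repo is IndiePaper.Repo under the :indie_paper app (names taken from the log above). The numbers are illustrative assumptions, not recommendations, and none of this helps if the database is simply unreachable:

```elixir
# config/runtime.exs (sketch; all values below are illustrative assumptions)
import Config

config :indie_paper, IndiePaper.Repo,
  # Number of connections in the pool; more connections use more DB resources.
  pool_size: 10,
  # Target time in ms a request may wait for a connection (DBConnection default: 50).
  queue_target: 5_000,
  # Window in ms over which queue_target is evaluated (DBConnection default: 1000).
  queue_interval: 10_000
```

As far as I know, Ecto.Migrator.with_repo/3 starts the repo with its own small pool for the migration run, so pool_size here mainly affects the running app, while the queue settings should still apply.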

I have separated my development and production environments into two different accounts so I don’t mess up the production DB by running commands locally. All my deployments are triggered automatically by a push to the master branch. Failed deployments erode the trust in that flow of pushing via GitHub: I have to manually check and verify that each deployment went through.

The development version, indiepaper-development, actually went through without errors, and that is the same code that fails on indiepaper-production-db. Please fix it fast.

Looking into it… we’re working on improving reliability here, will post an update once I have more info.

exactly the same problem I’ve been having all day, had to deploy a version for a presentation and couldn’t :grimacing:

Yikes. That sucks, sorry to hear that. Inside Fly these problems are just some package or the other acting up under load, but to our customers they’re real-life problems that are often personal and irritating. I don’t have a quick answer, but the good news is that this should get better the more normal and edge cases we fix.

All I can say is that I look forward to your business support plans :slight_smile:. (And also to the platform becoming more stable; I’d prefer to use you guys instead of the de facto choice, which always ends up being AWS.)

Has there been any movement with this? Went to deploy this morning and still having it complain about there being a DB missing etc.

Command: /app/bin/app eval App.Release.migrate_all
	Starting instance
	Configuring virtual machine
	Pulling container image
	Unpacking image
	Preparing kernel init
	Configuring firecracker
	Starting virtual machine
	Starting init (commit: 50ffe20)...
	Preparing to run: `/app/bin/app eval App.Release.migrate_all` as nobody
	2021/10/26 19:59:01 listening on [fdaa:0:35d4:a7b:2984:821:7a9b:2]:22 (DNS: [fdaa::3]:53)
	Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
	19:59:05.678 [error] Could not create schema migrations table. This error usually happens due to the following:
	  * The database does not exist
	  * The "schema_migrations" table, which Ecto uses for managing
	    migrations, was defined by another library
   ....

We handled the original issue that caused this error a few weeks ago, so it’s odd that it’s still showing up on your app. Could you confirm that the connection limits are not being hit on your DB instance, and whether there are any other errors in the DB logs?

Also, is this the same application where you noticed this error in the other thread, “deployment issues in SYD region”?

correct.

As for connection limits nope, there is only one machine connected to the DB, I’ve logged into the DB to check for locks etc etc and it’s all fine. So I’m confused.

Have any changes or overrides been made to the DNS resolvers in the app?

nope, it’s a standard Elixir app using the Dockerfile posted in your guides.

I’m replying in the other thread, “deployment issues in SYD region”, to limit the spread of answers if that is okay?

Yeah, let’s move this there, I’ll link the DB errors.

Will you try this again and also make sure your database hasn’t reached a connection limit and the DB logs aren’t showing any errors? That particular Elixir error is not super helpful, but it’s probably not an issue on our end (this time).

Hey there! This started to happen to me as well today, around 3 hours ago.
The issue seems to be the same: migrations cannot be run due to a connection error, while the deployed app is working fine :thinking:

Hey @flyio3! Sorry to see you’ve been struggling here. I’d love to help get this resolved if I can so we can improve the docs or whatever else is needed to make it better for you and others.

So let me restate what I understand the setup to be… please correct where I’m wrong.

  • You are hosting a single application instance
  • You are hosting it in syd
  • You have a single postgres database in syd (not using read-replicas)
  • Are you using Phoenix 1.6 with esbuild? I ask because the application is generated differently more recently.
  • You aren’t using the fly_postgres hex package (just making sure)
  • The DATABASE_URL is set (fly secrets list)

Sometimes the logs can provide more information. When a deploy fails like this, just run fly logs to see if there’s any more info there.

When I’ve seen this problem, it’s generally because of one of the following:

  • a missing inet6 config for IPv6 support (app can’t see the database)
  • It’s multi-region and the app doesn’t know what the primary region is supposed to be
  • It’s multi-region and the app is being started in a backup region which doesn’t have the database

Does any of that apply?
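
For the first bullet, here is a minimal sketch of what the IPv6 setting usually looks like in config/runtime.exs, assuming an Ecto repo on the Postgres adapter. The app and repo names (:indie_paper, IndiePaper.Repo) are borrowed from the error output earlier in the thread, and the POOL_SIZE variable and its default are assumptions; substitute your own:

```elixir
# config/runtime.exs (sketch; names and POOL_SIZE default are assumptions)
import Config

if config_env() == :prod do
  # DATABASE_URL is expected to be set as a secret (check with `fly secrets list`).
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing."

  config :indie_paper, IndiePaper.Repo,
    url: database_url,
    # Open the database socket over IPv6, which Fly's private network uses.
    socket_options: [:inet6],
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
end
```

Without the socket_options line the app may try IPv4 only and never reach the database, which can surface as the same "connection not available" error during migrations.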

Sorry, this has all been sorted. There was downtime on one of your services while this was happening, which compounded some issues I was having from using podman to build the images. I’ve switched to debian:slim as the base image and no longer have any DNS issues, so I assume the alpine image had DNS problems when built with podman.

@paolo.marino does any of this apply to the problem you’re having? (See post #21 by brainlid in this thread, “Deployment fails here and there”.)

Hey! some of those. The app has been deployed there for a week or so with no issues and I haven’t made any big changes to the config.

I tried to deploy it again today and the deploy step worked, so the migrations ran, but now the deployed app cannot connect to the database anymore, even though that was working before the deploy.