Deployment fails here and there

Deploying via GitHub or via the terminal sometimes fails and sometimes succeeds. The steps that fail are shown below:

deployment-1633625254: digest: sha256:c69318ec44fbd899eae561ae448fdbff009f6ce33b2dff12e06b62c92f41f2d7 size: 1364
--> Pushing image done
Image: registry.fly.io/indiepaper-development:deployment-1633625254
Image size: 49 MB
==> Creating release
Release v16 created
Release command detected: this new release will not be available until the command succeeds.

You can detach the terminal anytime without stopping the deployment
==> Release command
Command: /app/bin/indie_paper eval IndiePaper.Release.migrate
	 Configuring virtual machine
	 Pulling container image
	 Unpacking image
	 Preparing kernel init
	 Configuring firecracker
	 Starting virtual machine
	 Starting init (commit: 50ffe20)...
	 Preparing to run: `/app/bin/indie_paper eval IndiePaper.Release.migrate` as nobody
	 2021/10/07 16:50:43 listening on [fdaa:0:3567:a7b:1449:d5e5:d11e:2]:22 (DNS: [fdaa::3]:53)
	 Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
Error: :46.891 [error] Could not create schema migrations table. This error usually happens due to the following:
	   * The database does not exist
	   * The "schema_migrations" table, which Ecto uses for managing
	     migrations, was defined by another library
	   * There is a deadlock while migrating (such as using concurrent
	     indexes with a migration_lock)
	 To fix the first issue, run "mix ecto.create".
	 To address the second, you can run "mix ecto.drop" followed by
	 "mix ecto.create". Alternatively you may configure Ecto to use
	 another table and/or repository for managing migrations:
	     config :indie_paper, IndiePaper.Repo,
	       migration_source: "some_other_table_for_schema_migrations",
	       migration_repo: AnotherRepoForSchemaMigrations
	 The full error report is shown below.
	 ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2976ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
	   1. Ensuring your database is available and that you can connect to it
	   2. Tracking down slow queries and making sure they are running fast enough
	   3. Increasing the pool_size (albeit it increases resource consumption)
	   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
	 See DBConnection.start_link/2 for more information
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
	     (elixir 1.12.1) lib/enum.ex:1553: Enum."-map/2-lists^map/1-0-"/2
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:844: Ecto.Adapters.SQL.execute_ddl/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:645: Ecto.Migrator.verbose_schema_migration/3
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:473: Ecto.Migrator.lock_for_migrations/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:388: Ecto.Migrator.run/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
	     (indie_paper 0.1.0) lib/indie_paper/release.ex:12: anonymous fn/2 in IndiePaper.Release.migrate/0
	 Main child exited normally with code: 1
	 Reaped child process with pid: 563 and signal: SIGUSR1, core dumped? false
	 Starting clean up.

Error Release command failed, deployment aborted

Here is my fly.toml

# fly.toml file generated for indiepaper-dev on 2021-09-02T14:43:54+05:30
kill_signal = "SIGTERM"
kill_timeout = 5

[env]

[deploy]
  release_command = "/app/bin/indie_paper eval IndiePaper.Release.migrate"

[[services]]
  internal_port = 4000
  protocol = "tcp"

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20

  [[services.ports]]
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "30s" # allow some time for startup
    interval = "15s"
    restart_limit = 6
    timeout = "2s"

Here is my development GitHub Action

name: Deploy develop branch to indiepaper-development on fly
on:
  push:
    branches:
      - develop
env:
  FLY_API_TOKEN: ${{ secrets.FLY_API_TOKEN }}
jobs:
  deploy:
    name: Deploy Development App
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: superfly/flyctl-actions@master
        with:
          args: "deploy -a indiepaper-development --remote-only"

Here is a screenshot of the intermittently failing deploys. Every failure was caused by the issue above, though some deploys do succeed.

This issue affects both my development and production apps.

These were release command failures connecting to the DB. What is your production DB name? I checked the development one and updated it to a new nearby Consul, so it should be more stable now.
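
For readers hitting the same intermittent failure: one way to make a release command tolerate a brief connection outage is to retry the migration a few times before giving up. The following is a minimal sketch, not the poster's code. `ReleaseRetry` and `with_retries/3` are hypothetical names, and the default attempt count and delay are arbitrary assumptions.

```elixir
defmodule ReleaseRetry do
  @moduledoc """
  Hypothetical helper (not part of the original app): retries a function
  that may raise, for example while the database is briefly unreachable.
  """

  # Calls `fun` up to `attempts` times, sleeping `delay_ms` between tries.
  # Returns {:ok, result} on the first success, or {:error, exception}
  # once the final attempt has raised.
  def with_retries(fun, attempts \\ 5, delay_ms \\ 2_000)
      when is_function(fun, 0) and attempts > 0 do
    try do
      {:ok, fun.()}
    rescue
      error ->
        if attempts > 1 do
          Process.sleep(delay_ms)
          with_retries(fun, attempts - 1, delay_ms)
        else
          {:error, error}
        end
    end
  end
end
```

In a release module that follows the usual Phoenix release pattern (an assumption, since the contents of lib/indie_paper/release.ex are not shown in this thread), the migration call could be wrapped as `{:ok, _} = ReleaseRetry.with_retries(fn -> Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true)) end)`. Matching on `{:ok, _}` keeps the release command failing when every attempt fails, so the deploy is still aborted. Retrying only masks short outages; it does not help if the database stays unreachable.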

The production db name is indiepaper-production-db.

Ok that DB is updated with a new Consul service. It should be back and happy now. We have some fixes coming tomorrow and Monday that will also help with stability, but this might just fix you up.

Thanks for fixing it; the cluster does seem more stable now. I felt bad posting this support ticket because I knew you were swamped with the recent DB fixes, and I have been posting a lot of tickets this week (it was my company’s launch week). Fly has been the smoothest release experience I have had on any platform, which is why I am happy to stay even though you explicitly said Postgres is in beta. You have done a great job with the platform :rocket:

This issue has surfaced again. My Postgres database is indiepaper-production-db.

Command: /app/bin/indie_paper eval IndiePaper.Release.migrate
	 Starting instance
	 Configuring virtual machine
	 Pulling container image
	 Unpacking image
	 Preparing kernel init
	 Configuring firecracker
	 Starting virtual machine
	 Starting init (commit: 50ffe20)...
	 Preparing to run: `/app/bin/indie_paper eval IndiePaper.Release.migrate` as nobody
	 2021/10/26 08:41:14 listening on [fdaa:0:3565:a7b:21a1:7179:7b9d:2]:22 (DNS: [fdaa::3]:53)
	 Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
Error: :18.295 [error] Could not create schema migrations table. This error usually happens due to the following:
	   * The database does not exist
	   * The "schema_migrations" table, which Ecto uses for managing
	     migrations, was defined by another library
	   * There is a deadlock while migrating (such as using concurrent
	     indexes with a migration_lock)
	 To fix the first issue, run "mix ecto.create".
	 To address the second, you can run "mix ecto.drop" followed by
	 "mix ecto.create". Alternatively you may configure Ecto to use
	 another table and/or repository for managing migrations:
	     config :indie_paper, IndiePaper.Repo,
	       migration_source: "some_other_table_for_schema_migrations",
	       migration_repo: AnotherRepoForSchemaMigrations
	 The full error report is shown below.
	 ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
	   1. Ensuring your database is available and that you can connect to it
	   2. Tracking down slow queries and making sure they are running fast enough
	   3. Increasing the pool_size (albeit it increases resource consumption)
	   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
	 See DBConnection.start_link/2 for more information
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
	     (elixir 1.12.1) lib/enum.ex:1553: Enum."-map/2-lists^map/1-0-"/2
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:844: Ecto.Adapters.SQL.execute_ddl/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:645: Ecto.Migrator.verbose_schema_migration/3
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:473: Ecto.Migrator.lock_for_migrations/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:388: Ecto.Migrator.run/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
	     (indie_paper 0.1.0) lib/indie_paper/release.ex:12: anonymous fn/2 in IndiePaper.Release.migrate/0
	 Main child exited normally with code: 1
	 Reaped child process with pid: 563 and signal: SIGUSR1, core dumped? false
	 Starting clean up.

Error Release command failed, deployment aborted

I have separated my development and production environments into two different accounts so that I don’t damage the production DB by running commands locally. All my deployments are therefore triggered automatically by a push to the master branch. Failed deployments erode the trust and flow of pushing via GitHub, because I have to manually check whether each deployment went through correctly.

The development version, indiepaper-development, actually went through without errors, and that is the same code that fails on indiepaper-production-db. Please fix it fast.

Looking into it… we’re working on improving reliability here, will post an update once I have more info.

I’ve been having exactly the same problem all day. I had to deploy a version for a presentation and couldn’t :grimacing:

Yikes. That sucks, sorry to hear that. Inside Fly these problems are just some package or the other acting up under load, but to our customers they’re real-life problems that are often personal and irritating. I don’t have a quick answer, but the good news is that this should get better the more normal and edge cases we fix.

All I can say is that I look forward to your business support plans :slight_smile: (and to the platform becoming more stable; I’d prefer to use you guys instead of the de facto choice, which always ends up being AWS).

Has there been any movement on this? I went to deploy this morning and it still complains about a missing DB, etc.

Command: /app/bin/app eval App.Release.migrate_all
	Starting instance
	Configuring virtual machine
	Pulling container image
	Unpacking image
	Preparing kernel init
	Configuring firecracker
	Starting virtual machine
	Starting init (commit: 50ffe20)...
	Preparing to run: `/app/bin/app eval App.Release.migrate_all` as nobody
	2021/10/26 19:59:01 listening on [fdaa:0:35d4:a7b:2984:821:7a9b:2]:22 (DNS: [fdaa::3]:53)
	Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
	19:59:05.678 [error] Could not create schema migrations table. This error usually happens due to the following:
	  * The database does not exist
	  * The "schema_migrations" table, which Ecto uses for managing
	    migrations, was defined by another library
   ....

We handled the original issue that caused this error a few weeks ago, so it’s odd that it’s still showing up on your app. Could you confirm that the connection limits are not being hit on your DB instance, and whether there are any other errors in the DB logs?

Also, is this the same application where you noticed this error in the thread "deployment issues in SYD region"?

Correct.

As for connection limits, no: there is only one machine connected to the DB. I’ve logged into the DB to check for locks and so on, and it all looks fine, so I’m confused.

Have any changes or overrides been made to the DNS resolvers in the app?

Nope, it’s a standard Elixir app using the Dockerfile posted in your guides.

I’m replying in the other thread, "deployment issues in SYD region", to limit the spread of answers, if that is okay?

Yeah, let’s move this there, I’ll link the DB errors.

Will you try this again and also make sure your database hasn’t reached a connection limit and the DB logs aren’t showing any errors? That particular Elixir error is not super helpful, but it’s probably not an issue on our end (this time).
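
The remedies listed in the Ecto error (pool size, `:queue_target`, `:queue_interval`) are set in the repo configuration. A minimal sketch, assuming the app reads its settings in config/runtime.exs and takes the connection string from a `DATABASE_URL` environment variable; the file location, the `POOL_SIZE` variable, and all numeric values are assumptions, not taken from the poster's app:

```elixir
import Config

# Assumed location: config/runtime.exs. All values below are illustrative only.
config :indie_paper, IndiePaper.Repo,
  # Assumption: the connection string is provided through DATABASE_URL.
  url: System.get_env("DATABASE_URL"),
  # Remedy 3 from the error message: number of pooled connections.
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
  # Remedy 4: let requests wait longer in the queue (both in milliseconds).
  queue_target: 5_000,
  queue_interval: 10_000
```

Longer queue times only let the migration wait out a short interruption; they do not help if the database itself cannot be reached.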

Hey there! This started happening to me as well today, around 3 hours ago.
The issue seems to be the same: migrations cannot be run due to a connection error, while the deployed app is working fine :thinking: