deployment issues in SYD region

I’m having issues updating an existing application in the SYD region. My deployments keep failing with 503 error codes, and on the odd chance I don’t get a 503, my migrations fail to run. It seems as though the DATABASE_URL isn’t being injected, but I’m unable to verify this properly because of the 503s. I’ve also been getting 500s when trying to attach/detach the pg cluster from my application.

UPDATE: I’m also getting this error on deployments that make it through the process.
File operation error: eacces. Target: /etc/resolv.conf.
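
For context, the app reads the connection string at boot from the environment, roughly like this (a minimal sketch of a typical config/runtime.exs; the app and repo names here are placeholders, not my real ones):

    # config/runtime.exs (sketch; :my_app and MyApp.Repo are placeholder names)
    import Config

    if config_env() == :prod do
      database_url =
        System.get_env("DATABASE_URL") ||
          raise "DATABASE_URL is missing; it should be injected into the VM as a secret"

      config :my_app, MyApp.Repo,
        url: database_url,
        pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
    end

If the secret really isn’t being injected, I’d expect the release to fail at boot with that raise.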

Thanks for letting us know; we’ll check this out. If you could post the app name, we might be able to help more quickly as well.

I’d rather not share my app name publicly :tinfoil_hat:; it’s the only app hosted in my personal organisation, if you’re able to see that.

What kind of app is this? This error seems like it’s coming from the app itself:

File operation error: eacces. Target: /etc/resolv.conf.

The Registry / API 503s should be much improved since last night. Can you post more error log context?

A standard Elixir app. I think the /etc/resolv.conf error may actually be coming from Alpine itself, so it could be unrelated. I’m essentially using the same Dockerfile as posted in your guides, and it was working when I last deployed 12 days ago.

The release phase also seems to have trouble connecting to the DB, as described here: Deployment fails here and there - #12 by flyio3

@flyio3 has also confirmed that connection counts are within limits, and there are no errors on the DB side.

and there are no errors on the DB side :laughing:


@flyio3 During the release phase, when the migration fails, the bottom portion of the output has a little more info. Could you post that as well? It looks a bit like this:

The full error report is shown below.
	 ** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
	   1. Ensuring your database is available and that you can connect to it
	   2. Tracking down slow queries and making sure they are running fast enough
	   3. Increasing the pool_size (albeit it increases resource consumption)
	   4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
	 See DBConnection.start_link/2 for more information
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:756: Ecto.Adapters.SQL.raise_sql_call_error/1
	     (elixir 1.12.1) lib/enum.ex:1553: Enum."-map/2-lists^map/1-0-"/2
	     (ecto_sql 3.7.0) lib/ecto/adapters/sql.ex:844: Ecto.Adapters.SQL.execute_ddl/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:645: Ecto.Migrator.verbose_schema_migration/3
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:473: Ecto.Migrator.lock_for_migrations/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:388: Ecto.Migrator.run/4
	     (ecto_sql 3.7.0) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
	     (indie_paper 0.1.0) lib/indie_paper/release.ex:12: anonymous fn/2 in IndiePaper.Release.migrate/0
	 Main child exited normally with code: 1
	 Reaped child process with pid: 563 and signal: SIGUSR1, core dumped? false
	 Starting clean up.

Error Release command failed, deployment aborted

You can detach the terminal anytime without stopping the deployment
==> Release command
Command: /app/bin/foo eval foo.Release.migrate_all
	Starting instance
	Configuring virtual machine
	Pulling container image
	Unpacking image
	Configuring firecracker
	Starting virtual machine
	Preparing to run: `/app/bin/foo eval foo.Release.migrate_all` as nobody
	2021/10/26 20:37:58 listening on [fdaa:0:35d4:a7b:2983:64fb:c1fd:2]:22 (DNS: [fdaa::3]:53)
	Reaped child process with pid: 561 and signal: SIGUSR1, core dumped? false
	20:38:02.310 [error] Could not create schema migrations table. This error usually happens due to the following:
	  * The database does not exist
	  * The "schema_migrations" table, which Ecto uses for managing
	    migrations, was defined by another library
	  * There is a deadlock while migrating (such as using concurrent
	    indexes with a migration_lock)
	To fix the first issue, run "mix ecto.create".
	To address the second, you can run "mix ecto.drop" followed by
	"mix ecto.create". Alternatively you may configure Ecto to use
	another table and/or repository for managing migrations:
	    config :foo, foo.Repo,
	      migration_source: "some_other_table_for_schema_migrations",
	      migration_repo: AnotherRepoForSchemaMigrations
	The full error report is shown below.
	** (DBConnection.ConnectionError) connection not available and request was dropped from queue after 2977ms. This means requests are coming in and your connection pool cannot serve them fast enough. You can address this by:
	  1. Ensuring your database is available and that you can connect to it
	  2. Tracking down slow queries and making sure they are running fast enough
	  3. Increasing the pool_size (although this increases resource consumption)
	  4. Allowing requests to wait longer by increasing :queue_target and :queue_interval
	See DBConnection.start_link/2 for more information
	    (ecto_sql 3.7.1) lib/ecto/adapters/sql.ex:760: Ecto.Adapters.SQL.raise_sql_call_error/1
	    (elixir 1.12.1) lib/enum.ex:1553: Enum."-map/2-lists^map/1-0-"/2
	    (ecto_sql 3.7.1) lib/ecto/adapters/sql.ex:852: Ecto.Adapters.SQL.execute_ddl/4
	    (ecto_sql 3.7.1) lib/ecto/migrator.ex:678: Ecto.Migrator.verbose_schema_migration/3
	    (ecto_sql 3.7.1) lib/ecto/migrator.ex:504: Ecto.Migrator.lock_for_migrations/4
	    (ecto_sql 3.7.1) lib/ecto/migrator.ex:419: Ecto.Migrator.run/4
	    (ecto_sql 3.7.1) lib/ecto/migrator.ex:146: Ecto.Migrator.with_repo/3
	    (foo 0.1.0) lib/release.ex:10: anonymous fn/2 in foo.Release.migrate/0
	Main child exited normally with code: 1
	Reaped child process with pid: 563 and signal: SIGUSR1, core dumped? false
	Starting clean up.

Error Release command failed, deployment aborted
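
For context, foo.Release is essentially the stock generated release-migration module, something along these lines (a sketch; module and app names are placeholders):

    # lib/foo/release.ex (sketch with placeholder names)
    defmodule Foo.Release do
      @app :foo

      def migrate do
        load_app()

        for repo <- repos() do
          # This is the Ecto.Migrator.with_repo/3 call that shows up in the stack trace above
          {:ok, _fun_return, _apps} =
            Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
        end
      end

      defp repos, do: Application.fetch_env!(@app, :ecto_repos)

      defp load_app, do: Application.load(@app)
    end

It just loads the app, starts the repo, and runs any pending migrations, so the failure really does look like the release VM can’t reach the database at all.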

Thanks. The error seems specific to this application, so based on this and the /etc/resolv.conf error, I’m thinking DNS is somehow malfunctioning.

If you do a console session on the app, are you able to access the database from there? Can you echo $DATABASE_URL and see if you can connect to it directly or via psql?
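
Something along these lines (a sketch; <app-name> is a placeholder, and this assumes psql is available in the image):

    # open a console session on the running VM
    fly ssh console -a <app-name>

    # inside the VM: check the secret is present, then try connecting with it
    echo $DATABASE_URL
    psql "$DATABASE_URL" -c "SELECT 1;"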

I can (I tested that yesterday), but I can only open a console on the currently running application, right? Should I use the remote builder and try logging in there?

I assumed there was an application already running? The console session will use whatever image is currently deployed and running.

yes there is, and it’s working correctly, i.e. I can console in and connect to the DB from there.

doing a deploy using the remote builder now

Thanks, let me think about this and talk to the team, then. The only possibility I can think of is that the new image you’re trying to deploy is somehow unable to resolve DNS, but I can’t think of why.

Ok, it worked using the remote builder, so the capacity issues from last night are resolved. Strangely, last night I was getting the /etc/resolv.conf error on the remote builder only, while when building locally I skipped that error and went straight to the error I posted above.
The only difference I can think of between the two is that the remote builder uses Docker and my local builder uses podman (which is a big difference in itself, now that I say it here). I guess I had bad timing yesterday, as the scaling issues Kurt mentioned were also thrown in there.


Now to figure out why my local builds don’t work; they’re half the size of the remote builds and probably cost me less too :laughing:.

:smiley: let me know if you figure out what’s different. Why would they be half the size, though? Does podman do something special with compression or something?

probably some OCI image format thing.
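
If it really is the image format, one thing I might try is telling podman to build Docker-format images instead of its default OCI format and see whether that changes anything (just a guess on my part; my-app is a placeholder tag):

    # build a Docker-format image rather than podman's default OCI format
    podman build --format docker -t my-app .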

Are you guys bind-mounting /etc/resolv.conf onto the containers?

Yeah, our init creates /etc/resolv.conf.