Improving flyctl reliability

Today we released flyctl v0.0.416, and it includes several reliability improvements. :tada:

Update 1: fly postgres ... commands no longer rely solely on .internal DNS name lookups.

Before today, running a command like fly postgres attach ... would connect to the Postgres cluster using its app_name.internal domain name and then send commands to provision a database and user (among other things). Our .internal DNS system has been experiencing increasing lag, which became much more noticeable this week and resulted in issues such as "fly postgres attach - no such host".

Now, flyctl concurrently queries our GraphQL API for the 6PN IP addresses associated with an app. For fly pg ... commands, it then selects the IP of the leader and connects directly to that address instead of using the app_name.internal name.

Note that we are continuing to improve our internal DNS system and reduce its latency; that’s still important! While we keep shipping improvements to internal DNS, this change makes flyctl a bit more reliable in the meantime.

Update 2: flyctl sends a heartbeat to remote builders to avoid timeouts during longer builds.

Builds that involve large Docker images, in particular some of the buildpack images, could in some cases take more than 10 minutes to download and untar the image. Previously, when that ran past 10 minutes, the remote builder would incorrectly consider the build stalled and automatically suspend itself. Now, flyctl sends a heartbeat to the remote builder every 30 seconds while the build is ongoing, and each heartbeat extends the remote builder's deadline by another 10 minutes.

It’s incredibly frustrating when a 10+ minute build fails and needs to be restarted, and now that won’t happen!

This does require the latest version of the remote builder. The easiest way to make sure the latest remote builder software is running is to destroy your current remote builder app (for example, with fly apps destroy <name-of-your-remote-builder-app>). A new remote builder running the latest software will be created automatically the next time one is needed.

Update 3: fly launch generates a .dockerignore if one does not exist.

Fly uses Docker to build images, which are then downloaded and transmogrified into Firecracker VMs. Building a Docker image requires a Dockerfile and a context. The context is a set of files and folders that are archived from the local system and streamed, as an archive, to the Docker builder. When using a Fly remote builder, the context is streamed over the network from the system running flyctl to the builder.

When a .dockerignore file is present, Docker does not send the files listed in .dockerignore as part of the build context. This can dramatically reduce the size of build contexts. For example, if the Docker build is going to download and install Composer and Node.js dependencies, we can add the vendor/ and node_modules/ directories to .dockerignore to avoid sending them. That makes build contexts smaller and builds faster!
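
The effect of .dockerignore entries like vendor/ and node_modules/ can be sketched with a simplified matcher. This hypothetical Go function follows Docker's behavior for simple cases: each pattern is matched against the path relative to the context root, and if it matches a parent directory, everything beneath that directory is excluded. It deliberately omits "**", "!" exceptions, and comments, so it is not a substitute for Docker's own matcher.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// excluded (hypothetical, simplified) reports whether a slash-separated path,
// relative to the context root, is excluded by any pattern. A pattern that
// matches a parent directory excludes everything below it.
func excluded(p string, patterns []string) bool {
	for _, pat := range patterns {
		pat = path.Clean(strings.TrimSpace(pat)) // "node_modules/" -> "node_modules"
		// Try the path itself, then each parent: a/b/c, a/b, a.
		for candidate := p; candidate != "." && candidate != "/"; candidate = path.Dir(candidate) {
			if ok, _ := path.Match(pat, candidate); ok {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := []string{"vendor/", "node_modules/"}
	files := []string{
		"app/main.php",
		"vendor/autoload.php",
		"node_modules/left-pad/index.js",
		"web/node_modules/x/index.js",
	}
	for _, f := range files {
		fmt.Printf("%-32s excluded=%v\n", f, excluded(f, patterns))
	}
}
```

The web/node_modules entry stays in the context: .dockerignore patterns are anchored at the context root, so node_modules/ only excludes the top-level directory.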

Now, when you run fly launch, flyctl creates a .dockerignore file if one does not exist. flyctl builds the .dockerignore file from existing .gitignore files, which makes it a safe starting point for most projects. On mediocre or slow internet connections, this can save minutes of sending build context to the remote builder. On networks with high latency and packet loss, a smaller build context can be the difference between a remote build succeeding or failing.

The .dockerignore file is only generated during fly launch today. We may add the functionality to fly deploy in the future. Let us know what you think!


A number of scanners already produce .dockerignore files:

% find scanner/templates -name .dockerignore

This leads to a confusing message:

% fly launch
Creating app in /Users/rubys/tmp/foo
Scanning source code
Detected a Rails app
Found .dockerignore file. Will use when deploying to Fly.
? App Name (leave blank to use an auto-generated name): 

Perhaps the message that a .dockerignore file was found could be omitted if no such file existed before the scanner ran but one exists now?

Oh yeah, that is confusing and not useful. I’ll turn that into a debug msg so it only shows up when folks are running with LOG_LEVEL=debug.

flyctl 0.0.417 is released and fixes that message.

Just chiming in to say it’s delightful to see a focus on reliability, especially around deploys. Keep it up!