Swap commands on /swapfile suddenly failing

27 minutes ago I pushed a commit whose deploy crashed on a seemingly random error. I have not changed anything but the application code in weeks, so I was pretty surprised to see this in the GH Action that runs the deploys for me:

Run superfly/flyctl-actions@1.4
/usr/bin/docker run --name fdc5b24ee5ab08d25c4ecc9f44f1ba5d069d19_4c73c6 --label fdc5b2 --workdir /github/workspace --rm -e "FLY_API_TOKEN" -e "INPUT_ARGS" -e "HOME" -e "GITHUB_JOB" -e "GITHUB_REF" -e "GITHUB_SHA" -e "GITHUB_REPOSITORY" -e "GITHUB_REPOSITORY_OWNER" -e "GITHUB_REPOSITORY_OWNER_ID" -e "GITHUB_RUN_ID" -e "GITHUB_RUN_NUMBER" -e "GITHUB_RETENTION_DAYS" -e "GITHUB_RUN_ATTEMPT" -e "GITHUB_REPOSITORY_ID" -e "GITHUB_ACTOR_ID" -e "GITHUB_ACTOR" -e "GITHUB_TRIGGERING_ACTOR" -e "GITHUB_WORKFLOW" -e "GITHUB_HEAD_REF" -e "GITHUB_BASE_REF" -e "GITHUB_EVENT_NAME" -e "GITHUB_SERVER_URL" -e "GITHUB_API_URL" -e "GITHUB_GRAPHQL_URL" -e "GITHUB_REF_NAME" -e "GITHUB_REF_PROTECTED" -e "GITHUB_REF_TYPE" -e "GITHUB_WORKFLOW_REF" -e "GITHUB_WORKFLOW_SHA" -e "GITHUB_WORKSPACE" -e "GITHUB_ACTION" -e "GITHUB_EVENT_PATH" -e "GITHUB_ACTION_REPOSITORY" -e "GITHUB_ACTION_REF" -e "GITHUB_PATH" -e "GITHUB_ENV" -e "GITHUB_STEP_SUMMARY" -e "GITHUB_STATE" -e "GITHUB_OUTPUT" -e "RUNNER_OS" -e "RUNNER_ARCH" -e "RUNNER_NAME" -e "RUNNER_ENVIRONMENT" -e "RUNNER_TOOL_CACHE" -e "RUNNER_TEMP" -e "RUNNER_WORKSPACE" -e "ACTIONS_RUNTIME_URL" -e "ACTIONS_RUNTIME_TOKEN" -e "ACTIONS_CACHE_URL" -e "ACTIONS_RESULTS_URL" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/redacted/redacted":"/github/workspace" fdc5b2:4ee5ab08d25c4ecc9f44f1ba5d069d19 deploy --app redacted-d54e-staging --image registry.fly.io/redacted-d54e:dev-redacted --wait-timeout=300
==> Verifying app config
--> Verified app config
Validating /github/workspace/fly.toml
✓ Configuration is valid
==> Building image
Searching for image 'registry.fly.io/redacted-d54e:dev-redacted' remotely...
image found: img_wd57v55o8k78v38o

Watch your deployment at https://fly.io/apps/redacted-d54e-staging/monitoring

Updating existing machines in 'redacted-d54e-staging' with rolling strategy
> [1/2] Updating 328744e1c32328 [app]
> [1/2] Updating 328744e1c32328 [app]
✔ [1/2] Machine 328744e1c32328 [app] update succeeded
> [2/2] Updating 6e82dd74f04d38 [app]
> [2/2] Updating 6e82dd74f04d38 [app]
> [2/2] Waiting for 6e82dd74f04d38 [app] to have state: started
> [2/2] Machine 6e82dd74f04d38 [app] has state: started
> [2/2] Checking that 6e82dd74f04d38 [app] is up and running
Smoke checks for 6e82dd74f04d38 failed: the app appears to be crashing
Check its logs: here's the last lines below, or run 'fly logs -i 6e82dd74f04d38':
  HEAD / 200 - - 3.677 ms
✖ [2/2] Machine 6e82dd74f04d38 [app] update failed: smoke checks for 6e82dd74f04d38 failed: the app appears to be crashing
  GET /healthcheck 200 - - 14.075 ms
  Successfully prepared image registry.fly.io/redacted-d54e:dev-redacted (11.132264311s)
  HEAD / 200 - - 4.140 ms
  GET /healthcheck 200 - - 7.489 ms
  Configuring firecracker
   INFO Sending signal SIGINT to main child process w/ PID 305
   INFO Sending signal SIGTERM to main child process w/ PID 305
  HEAD / 200 - - 4.352 ms
  GET /healthcheck 200 - - 7.656 ms
   INFO Main child exited with signal (with signal 'SIGTERM', core dumped? false)
   INFO Starting clean up.
  [1703934.416666] reboot: Restarting system
  2024-07-09T08:27:55.634935721 [01J2B9NWC1C9EW8E3CWE7885WZ:main] Running Firecracker v1.7.0
  [    0.265559] PCI: Fatal: No config space access function found
   INFO Starting init (commit: ad092ccf)...
   INFO Preparing to run: `sh start.sh` as root
   INFO [fly api proxy] listening at /.fly/api
  2024/07/09 08:27:56 INFO SSH listening listen_address=[fdaa:0:6fee:a7b:e7:7c6d:ddde:2]:22 dns_server=[fdaa::3]:53
  + fallocate -l 512M /swapfile
  + chmod 0600 /swapfile
  + mkswap /swapfile
  Setting up swapspace version 1, size = 512 MiB (536866816 bytes)
  no label, UUID=763fc49c-58af-4772-bf94-1a82920c96a8
  + echo 10
  + swapon /swapfile
  Machine created and started in 20.981s
  WARNING: Setting up swap manually on your rootfs (not a volume) is not recommended. OverlayFS does not support swap files. Please consider using the Fly-provided swap configuration. Replacing swap file path with a location on the writeable drive in your machine: /.fly-upper-layer/swapfile
  + swapoff /swapfile
  swapoff: /swapfile: swapoff failed: Invalid argument
   INFO Main child exited normally with code: 4
   INFO Starting clean up.
   WARN could not unmount /rootfs: EINVAL: Invalid argument
  [    1.603321] reboot: Restarting system
  machine did not have a restart policy, defaulting to restart
  2024-07-09T08:27:57.897621966 [01J2B9NWC1C9EW8E3CWE7885WZ:main] Running Firecracker v1.7.0
  Starting machine
  [PM01] machines API returned an error: "machine ID 6e82dd74f04d38 lease currently held by info@cateandcompany.dev, expires at 2024-07-09T08:28:07Z"
  [    0.266220] PCI: Fatal: No config space access function found
   INFO Starting init (commit: ad092ccf)...
   INFO Preparing to run: `sh start.sh` as root
   INFO [fly api proxy] listening at /.fly/api
  2024/07/09 08:27:58 INFO SSH listening listen_address=[fdaa:0:6fee:a7b:e7:7c6d:ddde:2]:22 dns_server=[fdaa::3]:53
  + fallocate -l 512M /swapfile
  + chmod 0600 /swapfile
  + mkswap /swapfile
  mkswap: /swapfile: warning: wiping old swap signature.
  Setting up swapspace version 1, size = 512 MiB (536866816 bytes)
  no label, UUID=52dfe4f0-714c-4ee3-9e42-1aeae169310c
  + echo 10
  + swapon /swapfile
  WARNING: Setting up swap manually on your rootfs (not a volume) is not recommended. OverlayFS does not support swap files. Please consider using the Fly-provided swap configuration. Replacing swap file path with a location on the writeable drive in your machine: /.fly-upper-layer/swapfile
  + swapoff /swapfile
  swapoff: /swapfile: swapoff failed: Invalid argument
  Machine started in 896ms
  [PC01] instance refused connection. is your app listening on 0.0.0.0:8080? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)
   INFO Main child exited normally with code: 4
   INFO Starting clean up.
   WARN could not unmount /rootfs: EINVAL: Invalid argument
  [    1.601988] reboot: Restarting system
  machine did not have a restart policy, defaulting to restart
  2024-07-09T08:28:00.136383723 [01J2B9NWC1C9EW8E3CWE7885WZ:main] Running Firecracker v1.7.0
   INFO Starting init (commit: ad092ccf)...
  + fallocate -l 512M /swapfile
  WARNING: Setting up swap manually on your rootfs (not a volume) is not recommended. OverlayFS does not support swap files. Please consider using the Fly-provided swap configuration. Replacing swap file path with a location on the writeable drive in your machine: /.fly-upper-layer/swapfile
  + swapoff /swapfile
  swapoff: /swapfile: swapoff failed: Invalid argument
Error: smoke checks for 6e82dd74f04d38 failed: the app appears to be crashing

I then changed nothing and deployed twice, successfully. But I’m still “Suspended”.

The live logs section shows similar output.

What can I do?

I guess we’ve gone from “not recommended” to “not working”, as suggested by this warning in the output:

WARNING: Setting up swap manually on your rootfs (not a volume) is not recommended. OverlayFS does not support swap files. Please consider using the Fly-provided swap configuration. Replacing swap file path with a location on the writeable drive in your machine: /.fly-upper-layer/swapfile

The root cause is that we “inherited” this config from a project boilerplate over a year ago and made no changes to the infra since. Replacing all /swapfile references with /.fly-upper-layer/swapfile did the trick.
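
For anyone stuck with the same boilerplate, here is roughly what the swap part of a start.sh could look like after that replacement. This is a sketch, not my actual script: the `run` and `setup_swap` helpers and the `SWAP_FILE`, `SWAP_SIZE` and `DRY_RUN` variables are names I made up for illustration. The commands, the 512M size and the /.fly-upper-layer/swapfile path are the ones visible in the log above. I left out the `echo 10` step because the trace does not show where its output is redirected.

```shell
#!/bin/sh
# Hypothetical sketch of the swap setup from start.sh, pointed at the
# writable layer instead of /swapfile. Variable and function names are
# illustrative assumptions, not part of any Fly.io tooling.

SWAP_FILE="${SWAP_FILE:-/.fly-upper-layer/swapfile}"
SWAP_SIZE="${SWAP_SIZE:-512M}"

# Run a command, or only print it when DRY_RUN=1 (handy for checking
# the script on a machine where you are not root).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same steps as in the trace: allocate, restrict permissions,
# format as swap, enable.
setup_swap() {
  run fallocate -l "$SWAP_SIZE" "$SWAP_FILE"
  run chmod 0600 "$SWAP_FILE"
  run mkswap "$SWAP_FILE"
  run swapon "$SWAP_FILE"
}
```

Calling `setup_swap` near the top of start.sh, before the app starts, would perform the setup. With `DRY_RUN=1` set it only prints the four commands, so the path can be checked without touching the machine.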

If it hadn’t, I’d probably have just removed all swap commands from my start.sh and used this option instead: Fly Launch configuration (fly.toml) · Fly Docs