fly machine run is no longer executing CMD and machine stays stuck

My work item machines are no longer executing their command and exiting. They were working yesterday afternoon. However, even the simplest container setup now has the same problem.

I set up a minimal reproduction here:

FROM node:20-alpine AS base
CMD echo hello world

Script to build and run the container:

#!/bin/bash
set -e

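APP_NAME="machinebug"
IMAGE="registry.fly.io/${APP_NAME}"

# Create the app, then build the image and push it to the Fly registry
fly apps create --machines --name "${APP_NAME}"

docker build --no-cache -t "${IMAGE}" -f ./Dockerfile .

docker push "${IMAGE}"

# Start a one-off machine that should run CMD, exit, and be removed
fly machine run \
    --detach \
    --region iad \
    --vm-memory 512 \
    --rm \
    --restart no \
    --app "${APP_NAME}" \
    "${IMAGE}"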

The above should run, print hello world, and exit. Instead, the machine hangs forever and “hello world” is never printed. The logs show the following:

$ fly logs -a machinebug
Waiting for logs...
2023-10-31T12:40:32.542 runner[48ed591c7e2598] iad [info] Pulling container image registry.fly.io/machinebug:latest
2023-10-31T12:40:37.098 runner[48ed591c7e2598] iad [info] Successfully prepared image registry.fly.io/machinebug:latest (4.556266841s)
2023-10-31T12:40:37.574 runner[48ed591c7e2598] iad [info] Configuring firecracker
2023-10-31T12:40:37.746 app[48ed591c7e2598] iad [info] [ 0.040330] PCI: Fatal: No config space access function found
2023-10-31T12:40:37.975 app[48ed591c7e2598] iad [info] INFO Starting init (commit: 15238e9)...
2023-10-31T12:40:37.993 app[48ed591c7e2598] iad [info] INFO Preparing to run: `/bin/sleep inf` as root
2023-10-31T12:40:37.998 app[48ed591c7e2598] iad [info] INFO [fly api proxy] listening at /.fly/api
2023-10-31T12:40:38.004 app[48ed591c7e2598] iad [info] 2023/10/31 12:40:38 listening on [fdaa:3:6b8e:a7b:fd:13ae:ea93:2]:22 (DNS: [fdaa::3]:53)

Hi, does this work if you try a different region?

  • Daniel

Just tried in ord and no luck. Logs are at the bottom.

Just as an FYI, I’m going off scattered docs here. The --machines command line option isn’t even normally listed.

fly apps create --machines

However, I have not found any other way to run one-off stateless tasks using only apps with toml.

2023-10-31T14:29:28.323 runner[4d896d2fe04348] ord [info] Pulling container image registry.fly.io/machinebug:latest
2023-10-31T14:29:29.019 runner[4d896d2fe04348] ord [info] Successfully prepared image registry.fly.io/machinebug:latest (696.705697ms)
2023-10-31T14:29:29.857 app[4d896d2fe04348] ord [info] [ 0.045004] PCI: Fatal: No config space access function found
2023-10-31T14:29:30.111 app[4d896d2fe04348] ord [info] INFO Starting init (commit: 15238e9)...
2023-10-31T14:29:30.141 app[4d896d2fe04348] ord [info] INFO Preparing to run: `/bin/sleep inf` as root
2023-10-31T14:29:30.153 app[4d896d2fe04348] ord [info] INFO [fly api proxy] listening at /.fly/api
2023-10-31T14:29:30.161 app[4d896d2fe04348] ord [info] 2023/10/31 14:29:30 listening on [fdaa:3:6b8e:a7b:9ada:dac6:f0c5:2]:22 (DNS: [fdaa::3]:53)

This works well for me with your scripts:

2023-10-31T14:51:14Z runner[d8d9765ae94128] iad [info]Pulling container image registry.fly.io/mabu459:latest
2023-10-31T14:51:14Z runner[d8d9765ae94128] iad [info]Successfully prepared image registry.fly.io/mabu459:latest (103.993375ms)
2023-10-31T14:51:15Z runner[d8d9765ae94128] iad [info]Configuring firecracker
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info][    0.043079] PCI: Fatal: No config space access function found
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info] INFO Starting init (commit: 15238e9)...
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info] INFO Preparing to run: `docker-entrypoint.sh /bin/sh -c echo hello world` as root
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info] INFO [fly api proxy] listening at /.fly/api
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info]2023/10/31 14:51:15 listening on [fdaa:2:7d1e:a7b:fd:d81e:576b:2]:22 (DNS: [fdaa::3]:53)
2023-10-31T14:51:15Z app[d8d9765ae94128] iad [info]hello world
2023-10-31T14:51:16Z app[d8d9765ae94128] iad [info] INFO Main child exited normally with code: 0
2023-10-31T14:51:16Z app[d8d9765ae94128] iad [info] INFO Starting clean up.
2023-10-31T14:51:16Z app[d8d9765ae94128] iad [info] WARN hallpass exited, pid: 306, status: signal: 15 (SIGTERM)
2023-10-31T14:51:16Z app[d8d9765ae94128] iad [info]2023/10/31 14:51:16 listening on [fdaa:2:7d1e:a7b:fd:d81e:576b:2]:22 (DNS: [fdaa::3]:53)
2023-10-31T14:51:17Z app[d8d9765ae94128] iad [info][    2.323358] reboot: Restarting system
2023-10-31T14:51:17Z runner[d8d9765ae94128] iad [info]machine restart policy set to 'no', not restarting

Can you run fly machines list -a machinebug while the machine seems to be stuck? (i.e. start mashing on that command once you run your script in another terminal).

Thanks!
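If mashing the command by hand is awkward, a small loop can do the polling. This is only a sketch: poll_command is a hypothetical helper (not part of flyctl), and the count and delay values in the example are arbitrary.

```shell
# Sketch: run a command repeatedly to watch a machine's state change over time.
# poll_command is a hypothetical helper, not a flyctl feature.
# Usage: poll_command <count> <delay-seconds> <command> [args...]
poll_command() (
    count="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        "$@"               # e.g. fly machines list -a machinebug
        sleep "$delay"
        i=$((i + 1))
    done
)

# Example (not run here): poll once per second, 20 times, appending to out.txt
# poll_command 20 1 fly machines list -a machinebug >> out.txt
```

Start the loop in a second terminal right after launching the repro script so the created and started transitions are captured.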

Okay, perhaps there is something wrong with my account? I have tried deleting and recreating the app as well. Perhaps try slightly changing the Dockerfile and running it multiple times?

Here’s the output of me running fly machines list -a machinebug >> out.txt in quick succession:

1 machines have been retrieved from app machinebug.
View them in the UI here (https://fly.io/apps/machinebug/machines/)

machinebug
ID            	NAME         	STATE  	REGION	IMAGE            	IP ADDRESS                      	VOLUME	CREATED             	LAST UPDATED        	APP PLATFORM	PROCESS GROUP	SIZE                
148e450f063528	cool-sun-8060	created	ord   	machinebug:latest	fdaa:3:6b8e:a7b:9ad9:a1ac:4e43:2	      	2023-10-31T14:57:37Z	2023-10-31T14:57:37Z	            	             	shared-cpu-1x:512MB	

1 machines have been retrieved from app machinebug.
View them in the UI here (https://fly.io/apps/machinebug/machines/)

machinebug
ID            	NAME         	STATE  	REGION	IMAGE            	IP ADDRESS                      	VOLUME	CREATED             	LAST UPDATED        	APP PLATFORM	PROCESS GROUP	SIZE                
148e450f063528	cool-sun-8060	created	ord   	machinebug:latest	fdaa:3:6b8e:a7b:9ad9:a1ac:4e43:2	      	2023-10-31T14:57:37Z	2023-10-31T14:57:37Z	            	             	shared-cpu-1x:512MB	

1 machines have been retrieved from app machinebug.
View them in the UI here (https://fly.io/apps/machinebug/machines/)

machinebug
ID            	NAME         	STATE  	REGION	IMAGE            	IP ADDRESS                      	VOLUME	CREATED             	LAST UPDATED        	APP PLATFORM	PROCESS GROUP	SIZE                
148e450f063528	cool-sun-8060	started	ord   	machinebug:latest	fdaa:3:6b8e:a7b:9ad9:a1ac:4e43:2	      	2023-10-31T14:57:37Z	2023-10-31T14:57:45Z	            	             	shared-cpu-1x:512MB	

1 machines have been retrieved from app machinebug.
View them in the UI here (https://fly.io/apps/machinebug/machines/)

machinebug
ID            	NAME         	STATE  	REGION	IMAGE            	IP ADDRESS                      	VOLUME	CREATED             	LAST UPDATED        	APP PLATFORM	PROCESS GROUP	SIZE                
148e450f063528	cool-sun-8060	started	ord   	machinebug:latest	fdaa:3:6b8e:a7b:9ad9:a1ac:4e43:2	      	2023-10-31T14:57:37Z	2023-10-31T14:57:45Z	            	             	shared-cpu-1x:512MB	

1 machines have been retrieved from app machinebug.
View them in the UI here (https://fly.io/apps/machinebug/machines/)

machinebug
ID            	NAME         	STATE  	REGION	IMAGE            	IP ADDRESS                      	VOLUME	CREATED             	LAST UPDATED        	APP PLATFORM	PROCESS GROUP	SIZE                
148e450f063528	cool-sun-8060	started	ord   	machinebug:latest	fdaa:3:6b8e:a7b:9ad9:a1ac:4e43:2	      	2023-10-31T14:57:37Z	2023-10-31T14:57:45Z	            	             	shared-cpu-1x:512MB	


Note: the machines stay stuck forever. The only way I can get rid of them is with fly machine stop.

I also tried in my other org and it is the same behavior.
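For anyone cleaning up several stuck machines, the IDs can be pulled out of the list output instead of copied by hand. A rough sketch follows, assuming the column order shown in the listing above (ID, NAME, STATE, ...) and 14-character hex machine IDs as seen in this thread. machine_ids_in_state is a hypothetical helper, not a flyctl command.

```shell
# Sketch: print the IDs of machines in a given state, reading the table
# printed by `fly machines list` from stdin.
# Assumptions: ID is column 1, STATE is column 3, IDs are 14 hex characters.
machine_ids_in_state() (
    state="$1"
    awk -v want="$state" '
        # Header and banner lines do not start with a 14-character hex ID.
        length($1) == 14 && $1 ~ /^[0-9a-f]+$/ && $3 == want { print $1 }
    '
)

# Example (not run here): stop every machine still reported as started
# fly machines list -a machinebug | machine_ids_in_state started |
#     while read -r id; do fly machine stop "$id" -a machinebug; done
```

Filtering on the state column keeps the loop from touching machines that have already stopped.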

Hi @akutruff, I’m able to reproduce this. I think it may be a bug introduced in flyctl v0.1.113. I’m looking into it now.

In the meantime, if you’d like, then you can try temporarily downgrading to v0.1.112. If you installed flyctl via the online install.sh script, then I believe that the following command should do it:

curl -L https://fly.io/install.sh | sh -s -- v0.1.112

(If you try downgrading, then please let me know if it works!)

That shell command to downgrade does not appear to work. Running fly auth login and then fly auth docker appears to upgrade the command line silently behind the scenes.

As you can see below, fly version reported v0.1.112 at the start, but v0.1.114 at the end.

node ➜ /workspaces/gadget (main) $ fly version
flyctl v0.1.112 linux/amd64 Commit: fb160d4fd5fd653cd1afe725121230b6cba46cff BuildDate: 2023-10-23T13:52:25Z
node ➜ /workspaces/gadget (main) $ cd app/repro/
node ➜ /workspaces/gadget/app/repro (main) $ ls
Dockerfile  out.txt  show-bug.sh
node ➜ /workspaces/gadget/app/repro (main) $ fly machine list -a manchinebug-again
Error: No access token available. Please login with 'flyctl auth login'
node ➜ /workspaces/gadget/app/repro (main) $ fly auth login
failed opening browser. Copy the url (https://fly.io/app/auth/cli/3c84b0d9c91db0f539358c8a7a6e83ec) into a browser and continue
Opening https://fly.io/app/auth/cli/3c84b0d9c91db0f539358c8a7a6e83ec ...

Waiting for session... Done
successfully logged in as andy.kutruff@gmail.com
node ➜ /workspaces/gadget/app/repro (main) $ fly auth docker
Authentication successful. You can now tag and push images to registry.fly.io/{your-app}
node ➜ /workspaces/gadget/app/repro (main) $ fly version
flyctl v0.1.114 linux/amd64 Commit: 93001806ee467ed760df247337b31fa0d585d5b6 BuildDate: 2023-10-30T21:40:39Z
node ➜ /workspaces/gadget/app/repro (main) $ 

This is happening to me as well. The machine is running /bin/sleep inf instead of the command passed to flyctl.
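One way to check which command a machine was actually started with is to pull the "Preparing to run" line out of the logs. The sketch below assumes the log line format seen earlier in this thread. prepared_command is a hypothetical helper, and saved-logs.txt is a placeholder file name.

```shell
# Sketch: print the command that init reports it is about to run, reading
# `fly logs` output from stdin.
# Assumes lines of the form: ... INFO Preparing to run: `<command>` as <user>
prepared_command() {
    sed -n 's/.*Preparing to run: `\(.*\)` as .*/\1/p'
}

# Example (not run here), with logs previously saved to a file:
# prepared_command < saved-logs.txt
```

In the failing runs above this would print /bin/sleep inf, while the working run shows docker-entrypoint.sh /bin/sh -c echo hello world.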

@akutruff sorry about that. I suspect that it auto-updated itself right back to v0.1.114.

In any case, we just released v0.1.115, which should address this issue. If flyctl does not auto-update itself, then fly version upgrade should get it for you. If that doesn’t fix it, then please let us know!
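Since flyctl can change versions without an obvious prompt, a script that depends on fly machine run could check the installed version first. This is a minimal sketch: both function names are hypothetical, and the list of affected versions (v0.1.113 and v0.1.114) is taken from this thread rather than from any official changelog.

```shell
# Sketch: extract the version number from `fly version` output on stdin.
# Input looks like: "flyctl v0.1.114 linux/amd64 Commit: ... BuildDate: ..."
flyctl_version() {
    sed -n 's/^flyctl v\([0-9][0-9.]*\).*/\1/p'
}

# Returns 0 if the version is one this thread associates with the bug.
# Assumption: only v0.1.113 and v0.1.114 are affected, per the replies above.
is_affected_version() {
    case "$1" in
        0.1.113 | 0.1.114) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here):
# v="$(fly version | flyctl_version)"
# if is_affected_version "$v"; then
#     echo "flyctl $v may start machines with the wrong command" >&2
# fi
```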


The command is now running. However, twice now the run command has hung when trying to start a new machine.
