Creating more than one machine on an app fails

I’m seeing this spamming my logs now:

runner[06e82d55cde987] iad [info] Pulling container image
runner[06e82d55cde987] iad [info] Unpacking image
runner[39080e11fd0587] iad [info] Pulling container image
runner[39080e11fd0587] iad [info] Unpacking image
runner[73287411ce1085] iad [info] Pulling container image
runner[73287411ce1085] iad [info] Unpacking image
runner[9e784e33f3d483] iad [info] Pulling container image
runner[9e784e33f3d483] iad [info] Unpacking image
runner[73d8d9ddb71589] iad [info] Pulling container image
runner[73d8d9ddb71589] iad [info] Unpacking image
runner[06e82d55cde987] iad [info] Pulling container image
runner[06e82d55cde987] iad [info] Unpacking image
runner[39080e11fd0587] iad [info] Pulling container image
runner[39080e11fd0587] iad [info] Unpacking image
runner[73287411ce1085] iad [info] Pulling container image
runner[73287411ce1085] iad [info] Unpacking image
runner[73d8d9ddb71589] iad [info] Pulling container image
runner[73d8d9ddb71589] iad [info] Unpacking image
runner[06e82d55cde987] iad [info] Pulling container image
runner[06e82d55cde987] iad [info] Unpacking image
runner[9e784e33f3d483] iad [info] Pulling container image
runner[9e784e33f3d483] iad [info] Unpacking image

06e82d55cde987 doesn’t even seem to… exist??

Are those timestamps identical? 06e82d55cde987 is a machine on a now-deleted app with the same name. This shouldn’t affect how machines run, but I’m curious how it ended up attached to a deleted app.

The repeated image pulls might be Docker Hub rate limiting us. Is this a public image you’re using, or did you push it to our repository? Our repository is going to be way more reliable.
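
If you do want to try that route, pushing to the Fly registry is roughly this (the app name is a placeholder; the app has to already exist in your org):

fly auth docker
docker tag your-image registry.fly.io/<app-name>:v1
docker push registry.fly.io/<app-name>:v1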

It’s meant to be a public image, so I’m hosting it on Docker Hub.
Is there a way to use the Fly registry to host images that anyone on Fly can pull?

Some logs with timestamps:

2022-07-05T21:13:10.181 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:10.242 runner[39080e11fd0587] iad [info] Unpacking image
2022-07-05T21:13:11.024 runner[06e82d55cde987] iad [info] Pulling container image
2022-07-05T21:13:11.030 runner[73d8d9ddb71589] iad [info] Pulling container image
2022-07-05T21:13:11.114 runner[06e82d55cde987] iad [info] Unpacking image
2022-07-05T21:13:11.116 runner[73d8d9ddb71589] iad [info] Unpacking image
2022-07-05T21:13:11.943 runner[9e784e33f3d483] iad [info] Pulling container image
2022-07-05T21:13:12.028 runner[9e784e33f3d483] iad [info] Unpacking image
2022-07-05T21:13:12.365 runner[73287411ce1085] iad [info] Pulling container image
2022-07-05T21:13:12.390 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:12.430 runner[73287411ce1085] iad [info] Unpacking image
2022-07-05T21:13:12.475 runner[39080e11fd0587] iad [info] Unpacking image
2022-07-05T21:13:12.714 runner[73d8d9ddb71589] iad [info] Pulling container image
2022-07-05T21:13:12.799 runner[73d8d9ddb71589] iad [info] Unpacking image
2022-07-05T21:13:13.771 runner[06e82d55cde987] iad [info] Pulling container image
2022-07-05T21:13:13.878 runner[06e82d55cde987] iad [info] Unpacking image
2022-07-05T21:13:14.451 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:14.537 runner[39080e11fd0587] iad [info] Unpacking image
2022-07-05T21:13:14.973 runner[9e784e33f3d483] iad [info] Pulling container image
2022-07-05T21:13:15.032 runner[9e784e33f3d483] iad [info] Unpacking image
2022-07-05T21:13:15.316 runner[73287411ce1085] iad [info] Pulling container image
2022-07-05T21:13:15.336 runner[73d8d9ddb71589] iad [info] Pulling container image
2022-07-05T21:13:15.412 runner[73d8d9ddb71589] iad [info] Unpacking image
2022-07-05T21:13:15.426 runner[73287411ce1085] iad [info] Unpacking image
2022-07-05T21:13:16.053 runner[06e82d55cde987] iad [info] Pulling container image
2022-07-05T21:13:16.128 runner[06e82d55cde987] iad [info] Unpacking image
2022-07-05T21:13:16.923 runner[9e784e33f3d483] iad [info] Pulling container image
2022-07-05T21:13:17.237 runner[9e784e33f3d483] iad [info] Unpacking image
2022-07-05T21:13:17.364 runner[73287411ce1085] iad [info] Pulling container image
2022-07-05T21:13:17.442 runner[73287411ce1085] iad [info] Unpacking image
2022-07-05T21:13:17.451 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:17.577 runner[39080e11fd0587] iad [info] Unpacking image
2022-07-05T21:13:18.557 runner[73d8d9ddb71589] iad [info] Pulling container image
2022-07-05T21:13:18.618 runner[9e784e33f3d483] iad [info] Pulling container image
2022-07-05T21:13:18.622 runner[73d8d9ddb71589] iad [info] Unpacking image
2022-07-05T21:13:18.680 runner[9e784e33f3d483] iad [info] Unpacking image
2022-07-05T21:13:19.115 app[24d896d1a76587] iad [info] 2022/07/05 21:13:19 v1.26.1-tb3687b707-gbcd70ce32 peers: 27568/42536
2022-07-05T21:13:19.115 app[24d896d1a76587] iad [info] 2022/07/05 21:13:19 [sspZy] - Receiving keepalive packet
2022-07-05T21:13:19.125 runner[06e82d55cde987] iad [info] Pulling container image
2022-07-05T21:13:19.191 runner[06e82d55cde987] iad [info] Unpacking image
2022-07-05T21:13:19.491 runner[73287411ce1085] iad [info] Pulling container image
2022-07-05T21:13:19.567 runner[73287411ce1085] iad [info] Unpacking image
2022-07-05T21:13:19.675 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:19.752 runner[39080e11fd0587] iad [info] Unpacking image
2022-07-05T21:13:20.655 runner[9e784e33f3d483] iad [info] Pulling container image
2022-07-05T21:13:20.729 runner[9e784e33f3d483] iad [info] Unpacking image
2022-07-05T21:13:21.021 runner[73287411ce1085] iad [info] Pulling container image
2022-07-05T21:13:21.119 runner[73287411ce1085] iad [info] Unpacking image
2022-07-05T21:13:21.579 runner[73d8d9ddb71589] iad [info] Pulling container image
2022-07-05T21:13:21.774 runner[73d8d9ddb71589] iad [info] Unpacking image
2022-07-05T21:13:21.978 runner[06e82d55cde987] iad [info] Pulling container image
2022-07-05T21:13:22.061 runner[06e82d55cde987] iad [info] Unpacking image
2022-07-05T21:13:22.465 runner[39080e11fd0587] iad [info] Pulling container image
2022-07-05T21:13:22.532 runner[39080e11fd0587] iad [info] Unpacking image

Huh, it was stuck in a loop, repeatedly trying to pull. I think it might’ve just worked, though?

Using Docker Hub is fine for something you need to share. It’s a problem if something goes awry and we have to pull a bunch, but I think you’ll notice that more during development than your users will.

How would I go about killing the machine on the deleted app? I thought machines were cleaned up automatically when their app was deleted.

It should have been; we’ll get it cleaned up and figure out what went wrong.

I changed my ENTRYPOINT to launch || echo $? && sleep 10, but nothing is logged and the machine just exits again :confused:

machine id: 73d8d9ddb70589
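
(For anyone following along: a machine’s exit code shows up in its event history, roughly like this; exact flags and output vary by flyctl version:)

fly machine status 73d8d9ddb70589 -a <app-name>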

That also exited with code 139, which is interesting. An exec-form ENTRYPOINT can’t evaluate shell expressions; it has to run a binary directly. That makes me think the “segfault” (139 = 128 + signal 11, SIGSEGV) might actually be an error finding the executable it’s meant to run.
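
For context, the difference between the two ENTRYPOINT forms looks like this (the binary path is illustrative):

# exec form: the array is passed straight to the binary; no shell, so || and && are not interpreted
ENTRYPOINT ["/app/launch"]

# shell form: Docker wraps the command in /bin/sh -c, so shell operators work
ENTRYPOINT /app/launch || echo $?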

It’s also possible it didn’t pick up the new ENTRYPOINT if you used the same image tag. Maybe give it another try with a unique image tag and a real binary for the ENTRYPOINT?

The full ENTRYPOINT was ENTRYPOINT [ "/bin/bash", "-c", "launch || echo $? && sleep 10" ]
I’ll try using a different tag.
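
Worth noting about that command: in bash, || and && have equal precedence and group left to right, so it parses as (launch || echo $?) && sleep 10. That still prints the code when launch fails, but a form that always records it is a little clearer. A sketch of the same wrapper (launch being the binary under test):

ENTRYPOINT ["/bin/bash", "-c", "launch; code=$?; echo launch exited with $code; sleep 10"]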

New tag, same result

2022-07-05T21:32:59.144 runner[06e82d55cd3987] iad [info] Pulling container image
2022-07-05T21:32:59.161 runner[06e82d55cd3987] iad [info] Pulling image failed
2022-07-05T21:32:59.223 runner[06e82d55cd3987] iad [info] Pulling container image
2022-07-05T21:32:59.342 runner[06e82d55cd3987] iad [info] Unpacking image
2022-07-05T21:32:59.554 runner[06e82d55cd3987] iad [info] Configuring firecracker

Are these safe to mess with? I think we may be able to get some more interesting logs out of them.

Yeah, defo. There’s nothing important running on these; they’re just test machines for a tool I’m writing.
What’s on your mind?

I have a feeling there’s a bug in how we parse / use ENTRYPOINTs. I know for sure we’re not showing you all the logs you need, either.

What is the base for this image? I think it might have been failing because it was sometimes landing on hardware we run that has an older host kernel (5.6). It was likely working on 5.11. More context here: Segmentation fault no matter what I have in my Dockerfile

We’re probably a week away from getting the remaining host kernels upgraded. For now, you can try an older base image and see if it runs. We’re also going to get the error logs delivered more reliably. With those, this would have taken about 10 minutes to diagnose. :slight_smile:


The base image is ubuntu:latest. I’ll try ubuntu:20.04 instead and see how that goes.
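
The change is a one-line swap at the top of the Dockerfile (ubuntu:latest was likely resolving to 22.04 at the time, with a newer glibc):

# was: FROM ubuntu:latest
FROM ubuntu:20.04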

That would definitely cause the issue. Did it work with 20.04?

Yeah, that seems to have resolved it; the machines are booting up reliably now.

I’m confused as to how the host kernel affects whether a newer image can boot. I thought Firecracker was KVM-based, so the host kernel isn’t shared with the guest?

Right. We are as confused as you are :confused: but this issue only happens on our older kernel hosts.

The Linux KVM module changes a lot as new kernel versions are released. There’s likely an obscure bug breaking a more recent libc. At least, that’s my current theory.

@nebula We’ve upgraded our hosts, and it should now be possible to run the latest Ubuntu images!
