Volume not yet available/mounted while executing entrypoint.sh

I’m trying to install my Python packages on a volume because they’re large (nearly 8GB, which appears to be the current image size limit). To do this, I’m using an entrypoint.sh file in my Docker image. The goal is to install the packages at startup, before running the app; since the volume persists, they’ll already be cached there on future boots:

#!/bin/bash

whoami  # root at this point
mkdir /python_packages/cache
mkdir /python_packages/packages
/usr/bin/python3 -m pip install -U -r requirements.txt --target /python_packages/packages --cache-dir /python_packages/cache

chown nobody -R /python_packages/

export PYTHONPATH="${PYTHONPATH}:/python_packages/packages"

# run app as nobody
runuser -u nobody /app/bin/server

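In hindsight, a guard at the top of the script would have surfaced the mount problem described below right away. A minimal sketch, using the standard mountpoint utility:

# fail fast if nothing is actually mounted at /python_packages
if ! mountpoint -q /python_packages; then
  echo "volume not mounted at /python_packages" >&2
  exit 1
fi
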
I’ve also created a volume with fly volumes create python_packages --region ewr --size 10. Output of fly volumes list:

ID                      STATE   NAME            SIZE    REGION  ZONE    ENCRYPTED       ATTACHED VM     CREATED AT   
vol_p4mwmpze39321jdr    created python_packages 10GB    ewr     732d    true            148eddd4f471e8  20 hours ago

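For completeness: as I understand it, creating the volume alone isn’t enough; fly.toml also needs a mounts section mapping the volume into the machine, something along these lines (paths matching my setup):

[mounts]
  source = "python_packages"
  destination = "/python_packages"
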
However! During the deploy it looks like the volume is not yet mounted when entrypoint.sh runs. In the logs I see this:

bos [info]mkdir: cannot create directory ‘/python_packages/cache’: No such file or directory
bos [info]mkdir: cannot create directory ‘/python_packages/packages’: No such file or directory

Strangely, the logs then show the Python packages being installed anyway:

...
bos [info]  Downloading confection-0.1.3-py3-none-any.whl (34 kB)
bos [info]Collecting click<9.0.0,>=7.1.1
bos [info]  Downloading click-8.1.7-py3-none-any.whl (97 kB)
bos [info]Collecting MarkupSafe>=2.1.1
bos [info]  Downloading MarkupSafe-2.1.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
bos [info]Building wheels for collected packages: spacy-universal-sentence-encoder, jax
bos [info]  Building wheel for spacy-universal-sentence-encoder (setup.py): started
bos [info]  Building wheel for spacy-universal-sentence-encoder (setup.py): finished with status 'done'
bos [info]  Created wheel for spacy-universal-sentence-encoder: filename=spacy_universal_sentence_encoder-0.4.6-py3-none-any.whl size=16551 sha256=41107463de037206e30a42ddb25d51a51eb589373a2febefeb309351e9db99cb
bos [info]  Stored in directory: /python_packages/cache/wheels/23/cf/0b/162118b8e7dac277d8bd91f17dec299c6210e25bdff1a53264

However, when I fly ssh console into the machine and run ls -l /python_packages, all I see is:

root@148eddd4f471e8:/app# ls -l /python_packages/
total 16
drwx------ 2 root root 16384 Sep 10 00:02 lost+found

So, while I’m not sure why pip is happily installing packages anyway (presumably it creates the missing target and cache directories itself, so everything lands on the machine’s ephemeral root filesystem), clearly the volume is not available/mounted, because A) mkdir fails with “No such file or directory”, B) there are no Python packages on the volume, and C) it’s still owned by root at the end.

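One quick way to confirm from inside a machine is to ask the mount table directly; for example:

# prints the backing device and filesystem only if something is mounted there
findmnt /python_packages

# shows which filesystem actually contains the path
df -h /python_packages
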
One thing I noticed is that my app is in ewr, but the deploy ran from a bos machine. I’m not really sure what, if anything, that means, but it seemed a bit suspicious.

Any help would be greatly appreciated!

Your app has 2 machines running, one in ewr and one in bos, but you only have 1 volume. Volumes can only be attached to one machine at a time. Can you try:

fly scale count 1

Yeah, I see those two machines running in the web console.

But when I run fly scale count 1 I’m told:

App already scaled to desired state. No need for changes

And fly status shows:

Machines
PROCESS ID              VERSION REGION  STATE   ROLE    CHECKS                  LAST UPDATED         
app     148eddd4f471e8  156     ewr     started         1 total, 1 passing      2023-09-11T17:55:29Z

I don’t fully understand how Fly’s deploy process works, but I see that a new machine is spun up during each deploy. Is that not expected? I destroyed that bos machine with fly machine destroy 148edd65bd5238 --force and tried another deploy. Again a new machine was spun up (this time in ewr), and again I ran into the same issue with the volume not seeming to be there.

Oh, the EWR machine is for running your release_command. Not sure why it’s still running, though. If you do a new deploy, does fly machine list show it as stopped? Is your release command somehow running a long-running process?

Yeah, the deploy spends a while here:

Running dreaming release_command: /app/bin/migrate
  Waiting for 5683dd43b49d18 to have state: destroyed

FWIW, I also ran the deploy with flyctl deploy --release-command-timeout 240.

While the deploy hangs there, the logs show the Python packages being installed (per entrypoint.sh). What’s strange is that the install finishes and I see the server boot, but the deploy still hangs at the step above. It hangs there for a couple more minutes before eventually failing with this:

Error: release command failed - aborting deployment. error waiting for release_command machine 5683dd43b49d18 to finish running: timeout reached waiting for machine to destroyed failed to wait for VM 5683dd43b49d18 in destroyed state: Get "https://api.machines.dev/v1/apps/dreaming/machines/5683dd43b49d18/wait?instance_id=01HA3JJM0EQG44EJ4G8SVYTZFA&state=destroyed&timeout=60": net/http: request canceled
You can increase the timeout with the --release-command-timeout flag

So the fact that the deploy fails is also strange to me, though I’m not sure if it’s related to the volume not being accessible.

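One theory, just a guess on my part: since entrypoint.sh unconditionally ends by launching the server, the ephemeral release-command machine presumably boots the server instead of running /app/bin/migrate and exiting, so it never reaches the destroyed state the deploy is waiting for (and if that machine also doesn’t get the volume mounted, that would explain the mkdir failures too). An entrypoint that passes through whatever command the machine was given would avoid the hang; a sketch, assuming the Dockerfile sets CMD to /app/bin/server:

#!/bin/bash
set -e

mkdir -p /python_packages/{cache,packages}
/usr/bin/python3 -m pip install -U -r requirements.txt --target /python_packages/packages --cache-dir /python_packages/cache
chown -R nobody /python_packages/

export PYTHONPATH="${PYTHONPATH}:/python_packages/packages"

# exec whatever this machine was told to run: the server normally,
# /app/bin/migrate on the release-command machine
exec runuser -u nobody -- "$@"
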
FWIW, I’ve deployed this app successfully before by doing the Python package install during the Docker build phase instead.

I would also be happy to do the install in Docker and just deploy the image the “normal” way. Is there any plan to increase the max Docker image size? Say, to 16GB?
