Two questions about starting a non-public app/machine

paulrudy · September 15, 2023, 9:21am

I’d love some help on this (I think) simple pair of questions/goals.

I have a scraper app on Fly (running typesense/docsearch-scraper) that populates a separate, always-on typesense app (also on Fly). The scraper app does its job, finishes a few seconds later, and shuts down.

My goals are:

1. Whenever a GitHub workflow is run, it should start the scraper app, but without the scraper being publicly available.

I think I have this one figured out: use flyctl with a Fly app deploy token saved as a secret, and have the GitHub Action run fly app restart. Am I missing anything there?

2. Whenever a VM running the typesense app restarts or is redeployed, it should use private networking to start the scraper.

This is what I’ve found so far: I can use curl <scraper-app>.flycast to start the scraper app from within the typesense app. But is there a better way?

The curl command results in [error]failed to connect to machine: gave up after 15 attempts (in 11.311288483s) in the scraper app’s logs. Moreover, since there are two clones of the scraper app, they take turns starting up and trying to handle the curl connection before shutting down, so curl needs --connect-timeout to cope with that.

Surely there has to be a more elegant way to get the scraper app to start up via private networking?
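The curl-plus-timeout approach can be wrapped in a small retry loop. A minimal Python sketch, assuming the scraper serves plain HTTP on port 8080 behind its Flycast address (the URL http://scraper-app.flycast:8080/ is a hypothetical placeholder). It treats any non-5xx answer as "the machine is up"; if nothing is listening inside the machine, every attempt fails even though the proxy may still have started it.

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any HTTP proxy from the environment; .flycast names only resolve on Fly's private network.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wake_via_flycast(url: str, attempts: int = 5, timeout: float = 5.0, delay: float = 2.0) -> bool:
    """Request `url` through the Fly proxy, retrying while a stopped machine boots.

    Returns True once something on the machine answers with a status below 500,
    False if every attempt times out, is refused, or gets a 5xx.
    """
    for attempt in range(attempts):
        try:
            with _OPENER.open(url, timeout=timeout) as resp:
                status = resp.status
        except urllib.error.HTTPError as err:  # an HTTP answer, just not a 2xx/3xx one
            status = err.code
            err.close()
        except (OSError, http.client.HTTPException):  # timeouts, refusals, dropped connections
            status = None
        if status is not None and status < 500:
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)
    return False
```

Usage from the typesense machine would be along the lines of wake_via_flycast("http://scraper-app.flycast:8080/").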

For reference, as of now, the scraper app has a private IPv6 address, no public IPv6 or IPv4, and:

```toml
[[services]]
protocol = "tcp"
internal_port = 8080

[[services.ports]]
port = 80
handlers = ["http"]
```

Grateful in advance for any help!

Zane_Milakovic · September 15, 2023, 12:15pm

Those are clever solutions. The other thing you could do is expose the scraper on an internal port and have an endpoint on your public app send it a private request, routed over .internal (no proxy) or Flycast (proxy).

So: a webhook endpoint that then triggers that action.
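A minimal sketch of the core logic of such a webhook endpoint, in Python, assuming the caller signs the request body the way GitHub repository webhooks do (an X-Hub-Signature-256 header of the form sha256=<hex HMAC-SHA256 of the body>). SCRAPER_URL and the shared secret are hypothetical, and wiring these functions into an actual HTTP framework is left out.

```python
import hashlib
import hmac
import urllib.request
from typing import Callable, Optional

# Hypothetical private address of the scraper app (only resolvable on Fly's private network).
SCRAPER_URL = "http://scraper-app.flycast:8080/"


def signature_ok(secret: bytes, body: bytes, header: Optional[str]) -> bool:
    """Check a GitHub-style 'X-Hub-Signature-256: sha256=<hex>' header against the body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest does a constant-time comparison of the two byte strings.
    return hmac.compare_digest(expected.encode(), header.encode())


def _poke_scraper() -> None:
    """Default forwarder: one request through Flycast so the proxy starts the scraper."""
    urllib.request.urlopen(SCRAPER_URL, timeout=10).close()


def handle_webhook(secret: bytes, body: bytes, header: Optional[str],
                   forward: Callable[[], None] = _poke_scraper) -> int:
    """Return the HTTP status to send back; forward only if the signature checks out."""
    if not signature_ok(secret, body, header):
        return 401
    forward()
    return 202
```

A GitHub Action could compute the same HMAC over its request body with the shared secret and send it in that header.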

As for your fly.toml, I’m not sure you’re locked down right now. It seems like plain HTTP (not HTTPS) could still reach it from the public internet. I think you only need to specify an internal_port, not an external one, for internal routing to work.

Either way, you should then be able to connect; Fly has robust internal routing.

Check out the .internal routing and Flycast sections of the docs.

paulrudy · September 15, 2023, 4:47pm

Thanks for your suggestions! I spent hours with the private networking portion of Fly docs last evening, and what I outlined in the OP was the best I could come up with, with my novice level of understanding.

With the webhook endpoint suggestion, I think you’re talking about item 1 of my OP: getting a GitHub Action to run the scraper. That makes sense; I’ll look into implementing it.

But for item 2, triggering the scraper app via private networking: I’m already using private networking through Flycast, and curling the private flycast address does get the scraper app to start up. However, I’m unsure why I get the messages I quoted, and I’d like to know if there’s a better way to start the app over private networking.

Also, from the ssh console of the typesense app, curl <scraper-app>.flycast works (with the issues I mentioned), but curl <scraper-app>.internal gives me Could not resolve host. I feel like I’m missing something basic here.

paulrudy · September 15, 2023, 5:45pm

Ok, I think I learned a bit more. I’m still lost, but getting somewhere:

The scraper app is locked down, because its only IP address is a private IPv6 one.

I changed the services section of the scraper’s fly.toml to

```toml
[[services]]
protocol = "tcp"
internal_port = 8080

[[services.ports]]
port = 8080
handlers = ["http"]
```

Then in my scraper app’s entrypoint, start.sh, I added 6tunnel -6 -l :: 8080 127.0.0.1 8080.
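6tunnel is used here to make an IPv4-bound listener reachable over Fly’s IPv6 private network by relaying each TCP connection. To illustrate what such a relay does (a Python asyncio sketch, not 6tunnel’s implementation), the following accepts connections on one address and copies bytes both ways to another. Hosts and ports are parameters; relaying to 127.0.0.1:8080 only helps if some process inside the VM is actually listening there.

```python
import asyncio


async def _pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then half-close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
        if writer.can_write_eof():
            writer.write_eof()
    except OSError:
        pass  # the other side went away; the caller closes both ends


async def start_relay(listen_host: str, listen_port: int,
                      target_host: str, target_port: int) -> asyncio.AbstractServer:
    """Accept TCP connections on listen_host:listen_port and relay each to target_host:target_port."""

    async def handle(client_reader: asyncio.StreamReader,
                     client_writer: asyncio.StreamWriter) -> None:
        try:
            up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
        except OSError:
            client_writer.close()  # target not reachable: drop the client connection
            return
        try:
            # Copy in both directions at the same time until both sides are done.
            await asyncio.gather(_pipe(client_reader, up_writer), _pipe(up_reader, client_writer))
        finally:
            up_writer.close()
            client_writer.close()

    return await asyncio.start_server(handle, listen_host, listen_port)
```

The equivalent of the command above would be start_relay("::", 8080, "127.0.0.1", 8080) followed by serve_forever() on the returned server, inside asyncio.run(...).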

With these changes, curl <scraper-app>.internal:8080 connects, but only when the scraper VM is running, so .internal doesn’t help me start the machine via Fly private networking.

If the scraper VM is stopped: curl <scraper-app>.flycast:8080 does start the VM.

And if the VM is already running: curl <scraper-app>.flycast:8080 produces an error in the scraper VM logs [error]could not complete HTTP request to instance: connection closed before message completed.

So I suppose now my question is this: How can I set up the scraper app so that if another VM privately connects to it via flycast, it will run its entrypoint command (which is a bash script)?

Zane_Milakovic · September 15, 2023, 7:12pm

To answer your earlier question: I know .internal does not go through the proxy, and therefore can’t send wake signals, since I believe the proxy is what controls scaling and wakeup.
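Following from that: a .internal name resolves straight to machine addresses, and (consistent with the “Could not resolve host” seen while the VM was stopped) it appears to return nothing when no machine is running. A minimal Python sketch, assuming that behavior, which checks .internal first and only goes through the proxy when nothing is running. The hostname and the wake callback are hypothetical.

```python
import socket
from typing import Callable, List


def running_machine_addrs(host: str, port: int = 8080) -> List[str]:
    """Return the IPv6 addresses that a .internal name resolves to, or [] if it does not resolve."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


def ensure_started(internal_host: str, wake: Callable[[], None]) -> bool:
    """Return True if a machine is already resolvable via .internal; otherwise call wake() and return False."""
    if running_machine_addrs(internal_host):
        return True
    wake()  # e.g. an HTTP request to the app's .flycast address, which goes through the proxy
    return False
```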

If I had to guess, <scraper-app>.flycast:8080 is doing exactly the same thing on both of your calls; the difference is that the first one woke the machine up and didn’t have time to resolve yet.

Your script runs as the entrypoint, but I assume it does not have an HTTP server, so nothing inside the image is actually listening on :8080.

So while Flycast will route the request and wake the machine up, which triggers the entrypoint, nothing on the machine receives the connection, so nothing can trigger the same script the entrypoint runs.

I’m not sure what language your script is in, but you want to write a simple server. Make your entrypoint something like this:

```
ENTRYPOINT run script && serve -a :8080
```

Now your script runs on ENTRYPOINT, and it starts a server.


```
app = new Server()
app.route('/', handler {
    os.runCmd('run script');
})
app.start(':8080')
```

Just some pseudocode, since I don’t know the language you are using.

And actually, if you don’t need the script to run on deploy/restart, only when a request comes in, you can just have that small single-route server trigger the script.
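A runnable take on the pseudocode above, as a Python standard-library sketch. SCRIPT is a hypothetical path to the scrape script, and binding 0.0.0.0 assumes the Fly proxy reaches internal_port over IPv4. The server optionally runs the script once at startup (the ENTRYPOINT behavior) and again for each request to /, in a background thread so the HTTP response returns immediately.

```python
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical command for the scrape script; replace with the real one.
SCRIPT = ["/app/start.sh"]
_running = threading.Lock()


def run_script(cmd=None) -> None:
    """Run the scrape script once, skipping the call if a run is already in progress."""
    if not _running.acquire(blocking=False):
        return
    try:
        subprocess.run(cmd or SCRIPT, check=False)
    except OSError as err:  # e.g. the script path does not exist
        print(f"could not start script: {err}")
    finally:
        _running.release()


class TriggerHandler(BaseHTTPRequestHandler):
    """GET or POST / starts the script in the background and returns immediately."""

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)
            return
        busy = _running.locked()
        if not busy:
            threading.Thread(target=run_script, daemon=True).start()
        # 202 Accepted: a run was started; 409 Conflict: one is already running.
        self.send_response(409 if busy else 202)
        self.end_headers()

    do_POST = do_GET


def serve(host: str = "0.0.0.0", port: int = 8080, run_on_start: bool = True) -> None:
    """ENTRYPOINT-style behavior: optionally run once at boot, then serve triggers forever."""
    if run_on_start:
        threading.Thread(target=run_script, daemon=True).start()
    ThreadingHTTPServer((host, port), TriggerHandler).serve_forever()
```

Calling serve(run_on_start=False) gives the request-only variant described above.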

paulrudy · September 16, 2023, 12:42am

Thank you for your patient explanation! That fills in all of the blanks for me.

The entrypoint script is in bash, and simply runs a couple of 6tunnel commands to open ports to connect to the typesense VM instances. Then it runs the typesense-docsearch-scraper app (which is outgoing only and does not have a server), and updates the typesense instances.

I’ve made a minimal server in bash (with the help of phind.com) using socat, which runs only as long as the scraper runs (to allow the VM to shut down when finished), and that’s solved that particular issue. Thank you!
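The approach described here used socat in bash; as a sketch of the same idea in Python (not that script), the helper below starts the scraper as a child process, answers HTTP requests only while it runs, then stops listening and returns so the entrypoint can exit and the VM can shut down. The scraper command and the 0.0.0.0:8080 default are assumptions.

```python
import subprocess
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class BusyHandler(BaseHTTPRequestHandler):
    """Answer any request while the scraper is running, so the proxy sees a live server."""

    def do_GET(self):
        body = b"scraper running\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):  # keep the logs quiet
        pass


def run_with_temporary_server(cmd, host: str = "0.0.0.0", port: int = 8080) -> int:
    """Run cmd while serving HTTP on host:port; shut the server down when cmd exits."""
    server = HTTPServer((host, port), BusyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return subprocess.run(cmd).returncode
    finally:
        server.shutdown()      # stop the serve_forever() loop
        server.server_close()  # release the listening socket
```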

I’ll look into the GitHub Action portion this weekend. Really appreciate the help!

Zane_Milakovic · September 16, 2023, 1:25am

Of course, I am glad you got it working!

system · September 23, 2023, 1:25am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
