fly-force-instance-id header not working

It seems like fly-replay: instance=<instance_id> isn’t respected.

I checked the headers in the dev tools and I see it:
fly-replay: instance=01J3QT2MNX6MSTV5Q6GTWEMTGA

Here is the result of the status command:

fly machine status 4d89662b6e1ed8
Machine ID: 4d89662b6e1ed8
Instance ID: 01J3QT2MNX6MSTV5Q6GTWEMTGA
State: started

VM
  ID            = 4d89662b6e1ed8                    
  Instance ID   = 01J3QT2MNX6MSTV5Q6GTWEMTGA        
  State         = started                           

But in the log I clearly see another machine getting the request:

app[328733df039668] sjc [info] GET / 200 in 58ms

And indeed the response I receive is from the other machine (I have set them up to return different things).

I tried Fly-Instance-Id, which I’ve seen mentioned once in the forum, with the same result.

I also tried the fly-force-instance-id header but it is simply not working, it hangs and eventually fails:

Error: socket hang up
    at TLSSocket.socketCloseListener (node:_http_client:473:25)
    at TLSSocket.emit (node:events:531:35)
    at node:net:337:12
    at TCP.done (node:_tls_wrap:657:7)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'ECONNRESET'
}

Attempting to answer my own question here: looking at docs like “Dynamic Request Routing” and “Multi-region databases and fly-replay” (Fly Docs), it seems like you cannot really put these headers on the request from the client? If that’s the case, then we really need a way to route to a specific machine without having to write a whole server to act as a router. Ideally a special URL like <machineId>.myapp.fly.dev, accessible from the outside.

fly-force-instance-id is what you are looking for:

Thank you very much for the response, @rubys! As I mentioned, it doesn’t seem to work?

Sorry, I guess I read too fast. Can you share the actual header you sent? The value of the header is the desired instance id.

I just tried putting the machine’s ID instead of the instance_id and I think it works?
If so, please update the docs, because I just spent many days trying to make this work.

EDIT: nope, doesn’t seem to work with machine.id

@rubys: here is the header I am sending:

fly-force-instance-id: 01J3QT2MNX6MSTV5Q6GTWEMTGA

And it just hangs then eventually fails:

Error: socket hang up
    at TLSSocket.socketCloseListener (node:_http_client:473:25)
    at TLSSocket.emit (node:events:531:35)
    at node:net:337:12
    at TCP.done (node:_tls_wrap:657:7)
    at TCP.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'ECONNRESET'
}

Here is the relevant machine status:

fly machine status 4d89662b6e1ed8
Machine ID: 4d89662b6e1ed8
Instance ID: 01J3QT2MNX6MSTV5Q6GTWEMTGA
State: started

VM
  ID            = 4d89662b6e1ed8                    
  Instance ID   = 01J3QT2MNX6MSTV5Q6GTWEMTGA        
  State         = started                           
  Image         = nestor-flyapp:nestor-app-v1       
  Name          = clz2rdte40000m1vu7bg5awbe         
  Private IP    = fdaa:9:9632:a7b:b2e2:7be4:e69d:2  
  Region        = sjc                               
  Process Group =                                   
  CPU Kind      = shared                            
  vCPUs         = 1                                 
  Memory        = 2048                              
  Created       = 2024-07-26T13:49:18Z              
  Updated       = 2024-07-26T15:20:53Z              
  Entrypoint    =                                   
  Command       = ["npm","run","dev"]               

Event Logs
STATE  	EVENT 	SOURCE	TIMESTAMP                    	INFO 
started	start 	flyd  	2024-07-26T08:20:53.883-07:00	
created	launch	user  	2024-07-26T08:20:45.518-07:00	

The machine also clearly responds properly, because without the header I get a response from either this machine or the only other one I have running, at random, as expected.

I just ran some tests, and am seeing similar results as you are so I’ve asked others to look into this.

For what it’s worth, Fly-Prefer-Region is working for me.

@rubys thank you very much! At least I’m not crazy 🙂 Please update this thread whenever you get a response.

Also, as far as I can tell, there’s no other way to route to a specific ID from a client. What I really want is a special URL like <machine_id>.myapp.fly.dev or, even better, <machine_name>.myapp.fly.dev, because that way I don’t need to call the Machines API to route. Consider this a wishlist item.

So this is a blocker.

If you control the code on the server, the machine that receives the request should be able to respond to requests that arrive at the “wrong” server with a Fly-Replay header (as per the original link you sent) and the original request will be replayed to the desired machine.
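A minimal sketch of that replay decision, assuming the app already knows which machine a request belongs on. `replayHeadersFor` is a hypothetical helper, not part of any Fly library. In a deployed app the current machine's ID would typically be read from the `FLY_MACHINE_ID` environment variable, and (as confirmed further down this thread) the `instance=` value is the Machine ID.

```typescript
// Hypothetical helper (not a Fly API): decides whether the machine handling a
// request should ask the Fly proxy to replay it on another machine.
// Assumption: both arguments are Machine IDs such as "4d89662b6e1ed8".
function replayHeadersFor(
  targetMachineId: string,
  currentMachineId: string
): Record<string, string> | null {
  // Already on the right machine: serve the request normally.
  if (targetMachineId === currentMachineId) {
    return null;
  }
  // Wrong machine: respond with this header so the proxy replays the
  // original request on the target machine.
  return { "fly-replay": "instance=" + targetMachineId };
}

// Example with the two machines from this thread.
const wrongMachine = replayHeadersFor("4d89662b6e1ed8", "328733df039668");
const rightMachine = replayHeadersFor("4d89662b6e1ed8", "4d89662b6e1ed8");
console.log(wrongMachine, rightMachine);
```

A `null` result means "handle the request here"; anything else is the header set to attach to an otherwise empty response.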

That’s what I’m working on, but that’s not ideal 🙂

(My use case is giving each user their own sandboxed instance with their app running on it, so this forces me to inject code into their app. The other option is a router, but that means writing my own router server and dedicating a machine to it.)

So it seems like fly-replay hangs indefinitely.

I use:
fly-replay: instance=01J3VPEGAQVED97J3TSDHJDGR0

This instance does exist and is running:

fly machine status 4d89662b6e1ed8
Machine ID: 4d89662b6e1ed8
Instance ID: 01J3VPEGAQVED97J3TSDHJDGR0
State: started

VM
  ID            = 4d89662b6e1ed8                    
  Instance ID   = 01J3VPEGAQVED97J3TSDHJDGR0        
  State         = started                           
  Image         = nestor-flyapp:nestor-app-v1       
  Name          = clz2rdte40000m1vu7bg5awbe         
  Private IP    = fdaa:9:9632:a7b:b2e2:7be4:e69d:2  
  Region        = sjc                               
  Process Group =                                   
  CPU Kind      = shared                            
  vCPUs         = 1                                 
  Memory        = 2048                              
  Created       = 2024-07-26T13:49:18Z              
  Updated       = 2024-07-28T03:34:25Z              
  Entrypoint    =                                   
  Command       = ["npm","run","dev"]               

Event Logs
STATE  	EVENT 	SOURCE	TIMESTAMP                    	INFO 
started	start 	flyd  	2024-07-27T20:34:25.663-07:00	
created	launch	user  	2024-07-27T20:34:17.707-07:00	

In case it matters, here is the code I’m using (middleware.ts in a Next.js app):

import { NextResponse } from 'next/server'
import type { NextRequest } from 'next/server'

// Replays any request under /route-to-instance/<id>/... to the machine named by <id>.
export async function middleware(request: NextRequest) {
  console.log("    ROUTER MIDDLEWARE: request.nextUrl " + request.nextUrl);
  // startsWith (rather than includes) so that index 2 of the split is always the id
  if (request.nextUrl.pathname.startsWith("/route-to-instance/")) {
    // Extract id from /route-to-instance/id/some/path
    const instanceId = request.nextUrl.pathname.split('/')[2];
    console.log("    ROUTER MIDDLEWARE: instanceId: " + instanceId);
    console.log("    ROUTER MIDDLEWARE: fly-replay: " + `instance=${instanceId}`);

    // The fly-replay response header asks the Fly proxy to replay the request elsewhere
    return NextResponse.json({ message: `Routing to instance ${instanceId} ` }, {
      headers: {
        'fly-replay': `instance=${instanceId}`
      }
    });
  } else {
    // Not a routing path: let the request continue normally
    return NextResponse.next();
  }
}

I’m loading https://nestor-flyapp.fly.dev/route-to-instance/01J3VPEGAQVED97J3TSDHJDGR0 from the browser.

I see the middleware being called in the fly logs:

2024-07-28T03:47:16Z app[328733df039668] sjc [info]    ROUTER MIDDLEWARE: request.nextUrl https://localhost:3001/route-to-instance/01J3VPEGAQVED97J3TSDHJDGR0
2024-07-28T03:47:16Z app[328733df039668] sjc [info]    ROUTER MIDDLEWARE: instanceId: 01J3VPEGAQVED97J3TSDHJDGR0
2024-07-28T03:47:16Z app[328733df039668] sjc [info]    ROUTER MIDDLEWARE: fly-replay: instance=01J3VPEGAQVED97J3TSDHJDGR0

But it hangs.

So first, it looks like fly-replay expects a machine ID; can someone confirm this? That is, fly-replay: instance=<machine-id>. If that’s true, the doc is wrong, and the naming should be machineId, not instance…

Second, it turns out this approach won’t work because I run a full web app which issues its own requests (like localhost:3000/app/_next/static/chunks/app/page.js), which means I have to find a better way (currently thinking of cookies, but that doesn’t feel like a very solid solution…).

Should I dynamically create new apps instead and change the whole approach? If so, my issue is that I’d suddenly have to call the GraphQL API, because I’d have to assign IPs dynamically as well, and this is currently not supported by the REST Machines API.

Am I missing something? It feels like I am trying to do something fairly trivial (spin up machines with a basic Next.js app running on them and access them via the browser), yet there is no simple solution for this.
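For what it's worth, the cookie idea could look roughly like the sketch below: the first request sets a cookie naming the target machine, and every later request from the page (including the `_next/static` ones) carries it, so the middleware can decide from the cookie instead of the path. The cookie name `fly_target_machine` and both helpers are hypothetical, and the value is assumed to be a Machine ID. Since a cookie is client-controlled, a real setup would also need to check that the user is allowed to reach that machine.

```typescript
// Hypothetical cookie name; nothing in Fly reserves or understands it.
const TARGET_COOKIE = "fly_target_machine";

// Reads the target machine ID from a raw Cookie request header, if present.
function targetFromCookie(cookieHeader: string | undefined): string | null {
  if (!cookieHeader) return null;
  const parts = cookieHeader.split(";");
  for (const part of parts) {
    const eq = part.indexOf("=");
    if (eq === -1) continue; // skip malformed pairs
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (name === TARGET_COOKIE && value !== "") return value;
  }
  return null;
}

// Returns the fly-replay header for a request, or null to serve it locally.
function replayHeadersFromCookie(
  cookieHeader: string | undefined,
  currentMachineId: string
): Record<string, string> | null {
  const target = targetFromCookie(cookieHeader);
  if (target === null || target === currentMachineId) return null;
  return { "fly-replay": "instance=" + target };
}
```

Requests without the cookie fall through and are served by whichever machine received them.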

I could have sworn this was working. instance should be the machine ID; the nomenclature can be confusing. The one thing to watch out for: if instance refers to a dead machine, the Fly proxy will hang for a minute.
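One way to avoid that minute-long hang is to check the target before replaying. This is only a sketch: `safeReplayHeaders` is a hypothetical helper, the machine list is assumed to come from somewhere the app already trusts (for example a cached Machines API listing), and treating only `started` as alive is a deliberately strict assumption.

```typescript
// Minimal machine record; `id` and `state` mirror fields of a Machine object,
// but how the list is obtained is left to the caller.
interface MachineInfo {
  id: string;
  state: string;
}

// Returns a fly-replay header only when the target machine is known and
// started. Otherwise returns null so the caller can fail fast instead of
// letting the proxy wait on a dead machine.
function safeReplayHeaders(
  machines: MachineInfo[],
  targetMachineId: string
): Record<string, string> | null {
  for (const m of machines) {
    if (m.id === targetMachineId) {
      // Assumption: only "started" counts as alive.
      if (m.state !== "started") return null;
      return { "fly-replay": "instance=" + m.id };
    }
  }
  // Unknown ID: most likely a destroyed machine or a typo.
  return null;
}
```

On a `null` result the handler could return a 404 or 503 immediately.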

Both fly-force-instance-id and fly-replay expect machine IDs, yes.

The proxy’s naming predates whatever is shown in the CLI’s output by many years. Internally a machine has a unique ID and multiple “versions”. I don’t know why it was named “Instance ID” in the output there. The proxy used the word “instance” because that seemed generic enough for future-proofing features like allowing things that aren’t machines, without being stuck with awkward naming.

Either way, the docs are wrong or confusing and we should fix that.

Thanks! Yes updating the docs would be very helpful! It’s especially confusing given that the Machine object has an instance_id.
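To make the distinction concrete, here is a small sketch that builds the request header from a Machine object's `id` and refuses values that look like an Instance ID. `forceMachineHeaders` is a hypothetical helper, and the 26-character, ULID-shaped pattern is only a heuristic based on the Instance IDs pasted in this thread, not a documented guarantee.

```typescript
// Subset of a Machine object: it carries both identifiers, which is the
// source of the confusion.
interface MachineIds {
  id: string;          // Machine ID, e.g. "4d89662b6e1ed8"
  instance_id: string; // e.g. "01J3QT2MNX6MSTV5Q6GTWEMTGA"
}

// Heuristic only: the Instance IDs seen in this thread are 26 uppercase
// characters from the ULID alphabet.
const ULID_SHAPED = /^[0-9A-HJKMNP-TV-Z]{26}$/;

// Hypothetical helper: builds request headers that pin a request to one machine.
function forceMachineHeaders(machineId: string): Record<string, string> {
  if (ULID_SHAPED.test(machineId)) {
    throw new Error("This looks like an Instance ID; pass the Machine ID instead");
  }
  return { "fly-force-instance-id": machineId };
}

const machine: MachineIds = {
  id: "4d89662b6e1ed8",
  instance_id: "01J3QT2MNX6MSTV5Q6GTWEMTGA",
};

// Use machine.id, never machine.instance_id.
const pinned = forceMachineHeaders(machine.id);
console.log(pinned);
```

The resulting object can be passed as the `headers` option of an HTTP client call.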

Any news on fly-force-instance-id not working? I’m pretty sure I tried with machine id but let me try again.

Never mind, it seems like fly-force-instance-id works, so I must have been doing something wrong in my tests yesterday. Thanks so much for the help, and please update the docs. 🙂
