fly-replay with JSON format returns “Invalid JSON: expected value at line 1 column 1”

Problem: Using fly-replay with JSON format to route requests between apps in the same org. The JSON payload appears to be generated correctly in logs, but Fly’s proxy returns [PA03] 'fly-replay' response returned by app was invalid: replay was malformed: Invalid JSON: expected value at line 1 column 1.

Response Flow

  1. Browser requests https://{main-app}/machines/{project-id}
  2. Main app generates fly-replay JSON response with path transformation
  3. Fly proxy should replay request to target app ({main-app}-{project-id}) at path /
  4. Target machine runs http service on ports 80:8080

Code Implementation

def code(conn, %{"project_id" => project_id}) do
  project = Projects.get_project_by_id!(project_id)
  app_name = Projects.cloud_app_name(project)

  replay_config = %{
    app: app_name, # app-1234abcd (exists in fly)
    instance: project.machine_id, # 12314abcdefg (exists in fly)
    state: project.machine_key, # 1234abcdefghi1234
    transform: %{
      path: "/"
    }
  }

  conn
  |> put_resp_header("content-type", "application/vnd.fly.replay+json")
  |> json(replay_config)
end

Logs

Successful JSON generation:

20:26:37.083 [info] fly-replay config: %{state: "smzv5snfd0z0p", instance: "6839740b6d5ed8", transform: %{path: "/"}, app: "app-8453e84d"}
20:26:37.084 [info] Encoded JSON body: {"state":"smzv5snfd0z0p","instance":"6839740b6d5ed8","transform":{"path":"/"},"app":"app-8453e84d"}
20:26:37.084 [info] Sent 200 in 61ms

Fly proxy error immediately after:

[PA03] 'fly-replay' response returned by app was invalid: replay was malformed: Invalid JSON: expected value at line 1 column 1

Request headers (Elixir formatted)

Request headers: [
  {"pragma", "no-cache"},
  {"cache-control", "no-cache"},
  {"sec-ch-ua", "\"Chromium\";v=\"139\", \"Not;A=Brand\";v=\"99\""},
  {"sec-ch-ua-mobile", "?0"},
  {"sec-ch-ua-platform", "\"macOS\""},
  {"dnt", "1"},
  {"x-request-start", "t=1755989309202546"},
  {"user-agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"},
  {"accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7"},
  {"sec-fetch-site", "same-origin"},
  {"sec-fetch-mode", "navigate"},
  {"sec-fetch-dest", "iframe"},
  {"referer", "https://{main-app}.fly.dev/projects/8453e84d-3b96-4697-a2fb-3afe303f68dc"},
  {"accept-encoding", "gzip, deflate, br, zstd"},
  {"accept-language", "en-US,en;q=0.9"},
  {"cookie", "..."},
  {"priority", "u=0, i"},
  {"host", "{main-app}.fly.dev"},
  {"fly-client-ip", "{redacted-ip}"},
  {"x-forwarded-for", "{redacted-ip}, {redacted-ip}"},
  {"fly-forwarded-proto", "https"},
  {"x-forwarded-proto", "https"},
  {"fly-forwarded-ssl", "on"},
  {"x-forwarded-ssl", "on"},
  {"fly-forwarded-port", "443"},
  {"x-forwarded-port", "443"},
  {"fly-region", "dfw"},
  {"fly-request-id", "01K3CHWXRJTD3TTYR32RJEZMCD-dfw"},
  {"via", "2 fly.io, 2 fly.io"}
]

What Works vs What Doesn’t

Works: Direct curl requests work perfectly and get proper 302 redirects to http service on target machine

curl -v https://{main-app}/machines/{project-id}
# Returns: HTTP/2 302, location: ./

Doesn’t Work: Browser requests, whether navigating directly or embedding the page in an iframe, get a 502 Bad Gateway due to the malformed JSON error.

Questions

  1. Why would the same response work for curl but fail for browser requests?
  2. Could Phoenix middleware be interfering with the response body after json()?
  3. Is there something about the request headers (cookies, user-agent, etc.) that affects fly-replay JSON parsing?

👋 @odin_smash

Can you send a (failing) request to your app with the header flyio-debug set to doit? If you can then give me the fly-request-id from that response I can dig into our logging.

Regarding the curl request that returns a 302: hmm, this doesn’t sound like it is working. The fly-replay should be transparent to the client. The proxy replays the request internally and returns the response to the original request. (Unless your target machine is the one returning a 302.)

@bglw Thanks for the reply! I’ve added the flyio-debug: doit header. Here’s a recent request demonstrating the issue: {"fly-request-id", "01K3HF19SHEJCM552JCPDREMZG-dfw"}. I’m also viewing logs in Fly’s Grafana instance, and there’s this value associated with some server logs: request_id=GF8c1JUIwXwLVIEAAAHC.

Re the 302: I see. It’s possible the HTTP service on the target server is issuing the 302 (need to confirm). I had assumed the proxy would issue redirects.

Thanks! I’ve rolled out a fix today, so you shouldn’t be seeing these errors any more 🙂

Our proxy was not correctly handling responses with content encoding, where the replay JSON is gzipped, for example. All of your browser requests would set an accept-encoding header, which would make it through to your Phoenix app where the compression was applied automatically. A bog standard curl sends no such header, so didn’t break.

All is well now! The proxy handles the replay body being encoded going forward.
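To illustrate the content-encoding explanation above, here is a minimal sketch using the replay body from the earlier logs. It only shows why a gzipped body cannot be parsed as JSON from its first byte; it is not the proxy's actual code, and the variable names are made up for the example.

```elixir
# Replay body taken from the logs earlier in the thread.
body =
  ~s({"state":"smzv5snfd0z0p","instance":"6839740b6d5ed8","transform":{"path":"/"},"app":"app-8453e84d"})

# Roughly what a server may send when the client advertises "accept-encoding: gzip".
gzipped = :zlib.gzip(body)

# A gzip stream starts with the magic bytes 0x1F 0x8B, so a JSON parser fed
# the raw bytes fails on the very first character (line 1, column 1).
<<first, second, _rest::binary>> = gzipped
IO.inspect({first, second}, base: :hex, label: "first two bytes")

# Undoing the content encoding first recovers the original JSON text.
decoded = :zlib.gunzip(gzipped)
IO.inspect(decoded == body, label: "round trip matches")
```

A plain curl sends no accept-encoding header, so the body stays uncompressed and begins with `{`, which is why only the browser requests failed.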

Thanks for the report and all the details!

Amazing, @bglw, glad to hear! Thanks again for the quick follow-up and resolution!

@bglw Just making sure I’m understanding the transform.path parameter correctly. I’m going for rewriting:

https://my-app.fly.dev/machines/80c6931b-9f8d-4f85-89b6-21792785aa2c/code

to:

https://my-app.fly.dev/

However, with my Fly replay config:

{
  "state":"smzv5snfd0z0p",
  "instance":"6839740b6d5ed8",
  "transform": {
    "path":"/"
  },
  "app":"app-80c6931b"
}

The resulting path transform is:

https://my-app.fly.dev/machines/80c6931b-9f8d-4f85-89b6-21792785aa2c/

(notice it’s only rewriting the last path segment)

Is this the expected result? Thanks again for the support!

Hmm, no, I’m unable to reproduce that. The request should be replayed at https://my-app.fly.dev/ in this example, and that is what I see with my apps.

Can you again put a debug request through? e.g. with

curl https://my-app.fly.dev/machines/80c6931b-9f8d-4f85-89b6-21792785aa2c/code -H "flyio-debug: doit"

And share the fly-request-id that it outputs?

Ok, here are a few requests with the debug header present:

curl - {"fly-request-id", "01K3S3ZNZK70RDGVF8TSZ3H0TX-dfw"}

wget - {"fly-request-id", "01K3S40RAC9RP3AXETPBCNSAD2-dfw"}

browser - {"fly-request-id", "01K3S4J0H538WD81K69XYC5FE1-dfw"}

Looking at the first request there:

  • The request hits the proxy as https://my-app.fly.dev/machines/xxxxx/code
    • Your app responds with a replay elsewhere with a path transformation
  • The proxy, internally, replays the request to the new app as /
  • Your target app receives the replay, and responds to the user with a 302 to a relative location
    • e.g. a 302 with a location header of ./?a=b (setting a query parameter).

Ultimately, from the user’s perspective, they sent a request for https://my-app.fly.dev/machines/xxxxx/code and got back a 302 response with a location of ./?a=b. The client resolves that relative location against the original URL, so the redirect lands at https://my-app.fly.dev/machines/xxxxx/?a=b.

If you’re routing based on paths, issuing any sort of user-facing redirect from an app you’re replaying to is likely to cause issues. With fly-replay, the user only ever sees the original URL, so any redirects are relative to this.
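The resolution described above can be reproduced with Elixir's standard `URI.merge/2`, which applies the same relative-reference rules a browser uses. This is only a sketch using the placeholder URL and the example `./?a=b` location from this thread.

```elixir
# The URL the user originally requested; fly-replay never changes it on the client side.
original = "https://my-app.fly.dev/machines/xxxxx/code"

# A relative Location header is resolved against the original URL,
# so "./" replaces only the last path segment.
relative = original |> URI.merge("./?a=b") |> URI.to_string()
IO.puts(relative)
# https://my-app.fly.dev/machines/xxxxx/?a=b

# A root-relative Location header resolves from the host root instead.
root = original |> URI.merge("/?a=b") |> URI.to_string()
IO.puts(root)
# https://my-app.fly.dev/?a=b
```

Either way, the follow-up request goes to the original hostname and is routed by the first app again, so the target app cannot rely on redirects landing back on itself.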

Thanks for confirming, @bglw. The target app is issuing a relative 302 redirect, so it seems fly-replay is working as expected. I will adjust the target app and go from there.
