Unable to reliably get `current_user` in production using POW

Hi everyone :wave:

I’m using Pow for user authentication, and I’m trying to retrieve the current user in my LiveViews.

I created a helper module, UserLiveAuth, that defines an on_mount function to retrieve the user and assign it to the socket (see code below). Everything works perfectly locally.

Sadly, this is not reliable in production (app and DB hosted on Fly): sometimes everything works, sometimes the function returns nil on every other request, and the user is often logged out for no apparent reason. Here is an excerpt from the monitoring logs, taken while I repeatedly refreshed the page, showing the inconsistent responses:

 [info] 18:46:37.986 request_id=F4oQLanygBo_-oMAAAbh [info] Sent 200 in 17ms

 [info] 18:46:38.335 request_id=F4oQLb_EKOAJCSUAAAcB [info] GET /

 [info] 18:46:38.342 request_id=F4oQLb_EKOAJCSUAAAcB [info] Elixir.MyappWeb.UserLiveAuth: No user found in session

 [info] 18:46:38.352 request_id=F4oQLb_EKOAJCSUAAAcB [info] Sent 200 in 17ms

 [info] 18:46:38.653 request_id=F4oQLdK1RR-SfJgAAAch [info] GET /

 [info] 18:46:38.660 request_id=F4oQLdK1RR-SfJgAAAch [info] Elixir.MyappWeb.UserLiveAuth: User goulvenclech found in session

 [info] 18:46:38.669 request_id=F4oQLdK1RR-SfJgAAAch [info] Sent 200 in 16ms

 [info] 18:46:38.967 request_id=F4oQLeVzuYH-qAAAAAdB [info] GET /

 [info] 18:46:38.975 request_id=F4oQLeVzuYH-qAAAAAdB [info] Elixir.MyappWeb.UserLiveAuth: No user found in session 

I tried fetching the current_user in two different ways (see code below), with Pow.Store.CredentialsCache.get/2 and with Pow.Plug.Session.fetch/2. Both give the same inconsistent result.

I can’t prove it, but my impression is that this happens mainly in two cases:

  • In the first minutes after a new deploy
  • After a long period with no users, when this message appears in the logs:
    2023-10-01T18:02:45.443 proxy[2874de7f676678] cdg [info] Downscaling app (our app) in region cdg from 1 machines to 0 machines. Automatically stopping machine 2874de7f676678

Do you think this is a flaw in how my code is structured? Or could it be related to Fly.io?

Thanks for any help!

Source code

Utils module:

defmodule MyappWeb.UserLiveAuth do
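  import Phoenix.Component

  # Logger macros are used by get_user below
  require Logger

  alias Pow.Store.Backend.EtsCache
  alias Pow.Store.CredentialsCache
  alias Surface.Components.Context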

  def on_mount(:default, _params, session, socket) do
    user = get_user(socket, session)

    {:cont,
     socket
     |> assign_new(:current_user, fn -> user end)
     |> Context.put(current_user: user)}
  end
end
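
For context, an on_mount hook like this one is usually attached to LiveViews through a live_session in the router. The following is only a minimal sketch: the session name :authenticated, the :browser pipeline, and MyappWeb.HomeLive are hypothetical placeholders, not taken from the actual app.

```elixir
defmodule MyappWeb.Router do
  use MyappWeb, :router

  # Assumes a :browser pipeline is defined elsewhere in this router,
  # as in a default Phoenix project.
  scope "/", MyappWeb do
    pipe_through :browser

    # Passing only the module makes LiveView call
    # MyappWeb.UserLiveAuth.on_mount(:default, params, session, socket)
    # for every LiveView declared in this live_session.
    live_session :authenticated, on_mount: MyappWeb.UserLiveAuth do
      live "/", HomeLive
    end
  end
end
```

With this wiring, the session argument received by on_mount is the Plug session of the initial HTTP request, which is where the "myapp_auth" key read by get_user comes from.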

First version of get_user:

  defp get_user(socket, session, config \\ [otp_app: :myapp])

  defp get_user(socket, %{"myapp_auth" => signed_token}, config) do
    conn = struct!(Plug.Conn, secret_key_base: socket.endpoint.config(:secret_key_base))
    salt = Atom.to_string(Pow.Plug.Session)

    Logger.info("#{__MODULE__}: get_user/3 called with #{signed_token}")

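    # Verify the signed session token, then look the session up in the
    # credentials cache
    with {:ok, token} <- Pow.Plug.verify_token(conn, salt, signed_token, config),
         {user, _metadata} <- CredentialsCache.get([backend: EtsCache], token) do
      Logger.info("#{__MODULE__}: get_user/3 found user #{user.username}")
      user
    else
      resp ->
        # inspect/1 so that any non-matching term can be logged safely
        Logger.error("#{__MODULE__}: get_user/3 failed with error #{inspect(resp)}")
        nil
    end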
  end

  defp get_user(_, _, _), do: nil

Second version of get_user:

  defp get_user(socket, session, config \\ [otp_app: :myapp]) do
    case %Plug.Conn{
           private: %{
             plug_session_fetch: :done,
             plug_session: session,
             pow_config: config
           },
           secret_key_base: socket.endpoint.config(:secret_key_base),
           owner: self(),
           remote_ip: {0, 0, 0, 0}
         }
         |> Pow.Plug.Session.fetch(config) do
      {_conn, nil} ->
        Logger.error("#{__MODULE__}: No user found in session")
        nil

      {_conn, user} ->
        Logger.info("#{__MODULE__}: User #{user.username} found in session")
        user
    end
  end