gRPC socket closed error

I have a Python gRPC server similar to grpc/examples/python/multiprocessing/server.py in the grpc/grpc GitHub repository.

It hosts an ML model, set up similarly to the Python GPU Dev Machine guide in the Fly Docs.

When I deploy the app, I get this warning:

WARNING The app is not listening on the expected address and will not be reachable by fly-proxy.
You can fix this by configuring your app to listen on the following addresses:
  - 0.0.0.0:50051

fly.toml file


app = 'yral-ml-server'
primary_region = "ams" # If you change this, ensure it's to a region that offers GPUs
# vm.size = "a100-40gb"   # A shorthand for the size preset in the [[vm]] section
swap_size_mb = 32768   # This enables 32GB swap
kill_signal = 'SIGINT'
kill_timeout = '5s'

[build]
[build.args]
NONROOT_USER = "pythonuser" # Access this value in the Dockerfile using `ARG NONROOT_USER`


[env]
PORT = '50051'


[[services]]
internal_port = 50051
protocol = "tcp"
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1

[[services.ports]]
handlers = ["tls"]                                                     # "tls"
tls_options = { "alpn" = ["h2"], "versions" = ["TLSv1.2", "TLSv1.3"] }
port = 443


[services.concurrency]
hard_limit = 25
soft_limit = 20

[[vm]]
size = "a100-80gb"

[mounts]
source = "data"
destination = "/home/pythonuser"
initial_size = "50gb"

The app eventually installs all the libraries (similar to entrypoint.sh and post-initialization.sh) and starts the server at 0.0.0.0:50051, but this takes some time. Meanwhile, the warning above appears and no fly.dev address is provided. flyctl ips list shows the dedicated IPv4 and IPv6 addresses, but calling either of them gives a Socket closed error.
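To see when the port actually starts accepting connections during that startup window, a small polling helper can be useful. The following is only a minimal sketch using the Python standard library; the name wait_for_port and its parameters are hypothetical, not part of flyctl or gRPC. It checks plain TCP reachability, not gRPC health, and it does not by itself make the deploy wait. A proxy in front of the app may accept the TCP connection on its own, so the check is most meaningful when run directly against the app (for example, against 127.0.0.1:50051 from inside the Machine).

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as a connection is accepted, or False if the
    timeout (in seconds) expires first. Hypothetical helper for diagnosis.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError (e.g. connection refused)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Assumed address: the port the server binds to in this thread.
    print(wait_for_port("127.0.0.1", 50051, timeout=5.0))
```

Comparing the time this returns True with the time the warning is printed shows how long the server takes to come up after the Machine starts.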

Python client code:

import contextlib

import grpc

# Modules generated from the project's .proto file
import ml_server_pb2
import ml_server_pb2_grpc


@contextlib.contextmanager
def create_client_channel(addr):
    # Call credential object will be invoked for every single RPC
    call_credentials = grpc.access_token_call_credentials(
        "<TOKEN>"
    )
    # Channel credential will be valid for the entire channel
    channel_credential = grpc.ssl_channel_credentials()
    # Combining channel credentials and call credentials together
    composite_credentials = grpc.composite_channel_credentials(
        channel_credential,
        call_credentials,
    )
    channel = grpc.secure_channel(addr, composite_credentials)
    try:
        yield channel
    finally:
        # Release the connection once the caller is done with it
        channel.close()


def run():
    with create_client_channel("213.188.210.123:443") as channel:
        stub = ml_server_pb2_grpc.MLServerStub(channel)
        response = stub.predict(
            ml_server_pb2.VideoEmbedRequest(video_id="ee1201fc2a6e45d9a981a3e484a7da0a")
        )

    print("Greeter client received: ", response.result)

How do I make the deploy wait until the app is listening on the address? Or am I missing something?

I tried removing the tls handler and implementing custom TLS, similar to grpc/examples/python/auth/tls_server.py in the grpc/grpc GitHub repository.

I now get this error on the client side:

status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses; last error: UNAVAILABLE: ipv4:168.220.88.57:50051: Socket closed"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-05-20T21:54:23.31853+05:30", grpc_status:14, grpc_message:"failed to connect to all addresses; last error: UNAVAILABLE: ipv4:168.220.88.57:50051: Socket closed"}"

Error log from fly logs:

2024-05-20T16:43:51Z app[5683d642fde6d8] ams [info][PID 462] Binding to 0.0.0.0:50051
2024-05-20T16:44:26Z proxy[5683d642fde6d8] ams [error] error.message="instance refused connection. is your app listening on 0.0.0.0:50051? make sure it is not only listening on 127.0.0.1 (hint: look at your startup logs, servers often print the address they are listening on)"

fly.toml after this change:

app = 'yral-ml-server'
primary_region = "ams" # If you change this, ensure it's to a region that offers GPUs
# vm.size = "a100-40gb"   # A shorthand for the size preset in the [[vm]] section
swap_size_mb = 32768   # This enables 32GB swap
kill_signal = 'SIGINT'
kill_timeout = '5s'

[build]
[build.args]
NONROOT_USER = "pythonuser" # Access this value in the Dockerfile using `ARG NONROOT_USER`


[env]
PORT = '50051'


[[services]]
internal_port = 50051
protocol = "tcp"
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1

[[services.ports]]
handlers = [] # "tls"
# tls_options = { "alpn" = ["h2"], "versions" = ["TLSv1.2", "TLSv1.3"] }
port = 50051

[services.concurrency]
hard_limit = 25
soft_limit = 20

[[vm]]
size = "a100-80gb"

[mounts]
source = "data"
destination = "/home/pythonuser"
initial_size = "50gb"
