App concurrency is spiking: has anything changed in the infrastructure?

Since 9/1 our app, which primarily uses gRPC, has started to hog memory and accumulate concurrent connections:

Ports are configured as follows:

[[services]]
  internal_port = 443
  protocol = "tcp"

  [[services.ports]]
    port = 443

[[services]]
  internal_port = 8080
  protocol = "tcp"

  [[services.ports]]
    port = 8080

  [services.concurrency]
    hard_limit = 500
    soft_limit = 200

pprof shows that the application is accumulating goroutines in the gRPC handler:

goroutine profile: total 390
114 @ 0x44136e 0x439af7 0x46f325 0x4f63a7 0x4f769a 0x4f7688 0x5856e5 0x59a1c5 0x802cdb 0x534838 0x802ebe 0x800490 0x806758 0x806761 0x5d75f7 0x4ca010 0x9bf225 0x9bf1f4 0x9bf965 0xa03c85 0xa401a9 0xa3fa85 0x4758c1
#	0x46f324	internal/poll.runtime_pollWait+0x84						/opt/homebrew/Cellar/go/1.21.0/libexec/src/runtime/netpoll.go:343
#	0x4f63a6	internal/poll.(*pollDesc).wait+0x26						/opt/homebrew/Cellar/go/1.21.0/libexec/src/internal/poll/fd_poll_runtime.go:84
#	0x4f7699	internal/poll.(*pollDesc).waitRead+0x279					/opt/homebrew/Cellar/go/1.21.0/libexec/src/internal/poll/fd_poll_runtime.go:89
#	0x4f7687	internal/poll.(*FD).Read+0x267							/opt/homebrew/Cellar/go/1.21.0/libexec/src/internal/poll/fd_unix.go:164
#	0x5856e4	net.(*netFD).Read+0x24								/opt/homebrew/Cellar/go/1.21.0/libexec/src/net/fd_posix.go:55
#	0x59a1c4	net.(*conn).Read+0x44								/opt/homebrew/Cellar/go/1.21.0/libexec/src/net/net.go:179
#	0x802cda	crypto/tls.(*atLeastReader).Read+0x3a						/opt/homebrew/Cellar/go/1.21.0/libexec/src/crypto/tls/conn.go:805
#	0x534837	bytes.(*Buffer).ReadFrom+0x97							/opt/homebrew/Cellar/go/1.21.0/libexec/src/bytes/buffer.go:211
#	0x802ebd	crypto/tls.(*Conn).readFromUntil+0xdd						/opt/homebrew/Cellar/go/1.21.0/libexec/src/crypto/tls/conn.go:827
#	0x80048f	crypto/tls.(*Conn).readRecordOrCCS+0x24f					/opt/homebrew/Cellar/go/1.21.0/libexec/src/crypto/tls/conn.go:625
#	0x806757	crypto/tls.(*Conn).readRecord+0x157						/opt/homebrew/Cellar/go/1.21.0/libexec/src/crypto/tls/conn.go:587
#	0x806760	crypto/tls.(*Conn).Read+0x160							/opt/homebrew/Cellar/go/1.21.0/libexec/src/crypto/tls/conn.go:1369
#	0x5d75f6	bufio.(*Reader).Read+0x196							/opt/homebrew/Cellar/go/1.21.0/libexec/src/bufio/bufio.go:244
#	0x4ca00f	io.ReadAtLeast+0x8f								/opt/homebrew/Cellar/go/1.21.0/libexec/src/io/io.go:335
#	0x9bf224	io.ReadFull+0x64								/opt/homebrew/Cellar/go/1.21.0/libexec/src/io/io.go:354
#	0x9bf1f3	golang.org/x/net/http2.readFrameHeader+0x33					/Users/andig/go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:237
#	0x9bf964	golang.org/x/net/http2.(*Framer).ReadFrame+0x84					/Users/andig/go/pkg/mod/golang.org/x/net@v0.14.0/http2/frame.go:498
#	0xa03c84	google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams+0x164	/Users/andig/go/pkg/mod/google.golang.org/grpc@v1.56.0/internal/transport/http2_server.go:642
#	0xa401a8	google.golang.org/grpc.(*Server).serveStreams+0x148				/Users/andig/go/pkg/mod/google.golang.org/grpc@v1.56.0/server.go:946
#	0xa3fa84	google.golang.org/grpc.(*Server).handleRawConn.func1+0x44			/Users/andig/go/pkg/mod/google.golang.org/grpc@v1.56.0/server.go:889
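For reference, a goroutine profile in this text form is what the standard net/http/pprof endpoint produces at /debug/pprof/goroutine?debug=1. Below is a minimal sketch of exposing it alongside the gRPC server; the local-only debug port 6060 is an assumption and not part of the original setup:

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on http.DefaultServeMux
)

func main() {
	// Serve the pprof handlers on a private debug port alongside the app.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... start the gRPC server as usual; srv.Serve(lis) would block here ...
	select {}
}

With that in place, fetching http://localhost:6060/debug/pprof/goroutine?debug=1 (with curl or go tool pprof) yields a profile like the one above.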

We can’t find any obvious application changes that would lead to this behavior. Could something have changed on the Fly side instead? Maybe TCP keep-alives or similar?

Hi @andig, we recently lifted some idle timeout restrictions on TCP connections in our proxy. There’s a lot more information about that here:


Thank you, that solved it. Lesson learned for Go gRPC: the server does not close idle client connections by default. If you need an idle timeout, as we do here, enforce it on the server:

grpc.KeepaliveParams(keepalive.ServerParameters{
	MaxConnectionIdle: 30 * time.Second,
}),
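For completeness, here is a minimal sketch of how that option fits into server construction. Port 8080 mirrors the fly.toml above; everything else is standard grpc-go and is an illustration rather than our exact code:

package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	lis, err := net.Listen("tcp", ":8080")
	if err != nil {
		log.Fatal(err)
	}

	srv := grpc.NewServer(
		grpc.KeepaliveParams(keepalive.ServerParameters{
			// Close client connections that have been idle for 30 seconds.
			MaxConnectionIdle: 30 * time.Second,
		}),
	)

	// ... register service implementations here ...

	log.Fatal(srv.Serve(lis))
}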
