Java app is killed on startup

I’m a few days into fly.io. I have got my app working, but when I try to deploy now, it gets killed just as it’s starting up, with no hint as to why.

2022-11-01T19:23:29.537 runner[fe5df107] lhr [info] Starting instance
2022-11-01T19:23:32.598 runner[fe5df107] lhr [info] Configuring virtual machine
2022-11-01T19:23:32.603 runner[fe5df107] lhr [info] Pulling container image
2022-11-01T19:33:20.352 runner[fe5df107] lhr [info] Unpacking image
2022-11-01T19:33:30.630 runner[fe5df107] lhr [info] Preparing kernel init
2022-11-01T19:33:31.075 runner[fe5df107] lhr [info] Configuring firecracker
2022-11-01T19:33:31.184 runner[fe5df107] lhr [info] Starting virtual machine
2022-11-01T19:33:31.577 app[fe5df107] lhr [info] Starting init (commit: ce4cf1b)...
2022-11-01T19:33:31.617 app[fe5df107] lhr [info] Preparing to run: `/usr/local/bin/mvn-entrypoint.sh bash ./fly-run.sh` as root
2022-11-01T19:33:31.651 app[fe5df107] lhr [info] 2022/11/01 19:33:31 listening on [fdaa:0:1989:a7b:bada:fe5d:f107:2]:22 (DNS: [fdaa::3]:53)
...
2022-11-01T19:33:35.192 app[fe5df107] lhr [info] :: Spring Boot :: (v2.5.5)
2022-11-01T19:33:35.438 app[fe5df107] lhr [info] 2022-11-01 19:33:35.435 INFO 541 --- [ main] o.c.m.CrossrefManifoldApplicationKt : Starting CrossrefManifoldApplicationKt v0.0.1-SNAPSHOT using Java 11.0.16 on fe5df107 with PID 541 (/target/manifold.jar started by root in /)
...
2022-11-01T19:33:42.082 app[fe5df107] lhr [info] 2022-11-01 19:33:42.081 INFO 541 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
...
2022-11-01T19:33:46.636 app[fe5df107] lhr [info] 2022-11-01 19:33:46.632 INFO 541 --- [ main] (normal application log stuff)
2022-11-01T19:33:46.683 runner[fe5df107] lhr [info] Shutting down virtual machine
2022-11-01T19:33:46.903 app[fe5df107] lhr [info] Sending signal SIGINT to main child process w/ PID 521
2022-11-01T19:33:47.220 app[fe5df107] lhr [info] 2022-11-01 19:33:47.212 INFO 541 --- [ main] (normal application log stuff)
2022-11-01T19:33:47.537 app[fe5df107] lhr [info] 2022-11-01 19:33:47.533 INFO 541 --- [ main] (normal application log stuff)

So my app takes under a minute to boot, there is no report of health check failures, and it is sent a SIGINT by the runner.

I’ve set my health checks to be very liberal for debugging. So I don’t even expect a health check in these first few seconds. But I’ve configured the port correctly.

  [[services.tcp_checks]]
    grace_period = "120s"
    interval = "15s"
    restart_limit = 10
    timeout = "2s"
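
As far as I understand it, a TCP check only needs to open a connection to the internal port within the timeout. Here is a minimal sketch of that kind of probe, useful for confirming from inside the VM that the port is reachable. It assumes the app listens on localhost:8080 (the Tomcat port in the log) and reuses the 2s timeout from the config. `TcpProbe` and `canConnect` are made-up names, and this is not fly.io's actual checker.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class TcpProbe {
    // Returns true if a TCP connection to host:port opens within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed values: port 8080 from the Tomcat log line, 2s from the check config.
        System.out.println("port 8080 reachable: " + canConnect("localhost", 8080, 2000));
    }
}
```

If this prints `false` after Tomcat reports it has started, the port or bind address is worth a second look. If it prints `true`, the check configuration is less likely to be the culprit.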

I noticed there was a ten minute gap between “Pulling container image” and “Unpacking image”, though after that the app appeared to boot happily until it was killed.
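
To put numbers on the timeline, here is a small sketch that computes two gaps from timestamps copied out of the log above: the image pull, and the time between "Starting virtual machine" and "Shutting down virtual machine". `LogGaps` and `between` are illustrative names only.

```java
import java.time.Duration;
import java.time.LocalDateTime;

public class LogGaps {
    // Elapsed time between two ISO-8601 local timestamps, as they appear in the log.
    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start), LocalDateTime.parse(end));
    }

    public static void main(String[] args) {
        // "Pulling container image" -> "Unpacking image"
        Duration pull = between("2022-11-01T19:23:32.603", "2022-11-01T19:33:20.352");
        // "Starting virtual machine" -> "Shutting down virtual machine"
        Duration alive = between("2022-11-01T19:33:31.184", "2022-11-01T19:33:46.683");

        System.out.println("image pull: " + pull.toMillis() + " ms");
        System.out.println("VM start to shutdown: " + alive.toMillis() + " ms");
    }
}
```

This works out to 587749 ms (about 9 min 48 s) for the pull and 15499 ms (about 15.5 s) between VM start and shutdown, so the shutdown arrives well inside the 120s grace period.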

Ideas?

Here are some other questions about SIGINT:

Virtual machine repeatedly shutting down → I’ve configured the grace period
Is SIGINT an issue with my app or an issue with fly.io? → I’ve verified the internal port.

I’ve tried to reboot it with various tweaks a few times now (up to version 30!).

I’ve noticed that the shutdown happens in a specific place each time. Always less than a minute in (just after it succeeds in running database migrations), and just before it’s able to start the HTTP server. And, because the server takes a second or two to respond to the SIGINT, it continues as normal. So it’s not the app crashing.
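
One way to make the moment of the signal more visible in the app's own logs is a JVM shutdown hook, which runs when the JVM receives SIGINT or SIGTERM (and also on normal exit). This is a minimal sketch, not anything specific to Spring Boot or fly.io. `ShutdownLogger`, `describe`, and `install` are hypothetical names.

```java
import java.lang.management.ManagementFactory;

public class ShutdownLogger {
    // Builds the message logged when the JVM starts shutting down.
    static String describe(long uptimeMillis) {
        return "JVM shutdown requested after " + uptimeMillis + " ms of uptime";
    }

    // Registers a hook that logs JVM uptime at shutdown; returns it so callers can remove it.
    static Thread install() {
        Thread hook = new Thread(() ->
                System.err.println(describe(ManagementFactory.getRuntimeMXBean().getUptime())));
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        install();
        // A long-running app would keep going here and the hook would fire on SIGINT/SIGTERM.
        // This demo simply exits normally, which also triggers the hook.
        System.out.println("hook installed");
    }
}
```

Calling `install()` early in the application's `main` would show how long the JVM had been up when the signal arrived, which can be compared against the runner's "Shutting down virtual machine" line.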

Hard to tell what’s going on without code, but is it possible that allocated VM resources aren’t enough to run that Java app (also: 1, 2)? Try scaling up, if you haven’t already, to see if things then run as expected?

# 1G RAM
fly scale vm memory 1024 -a <app-name>

Additionally, you may also want to give existing JVM flags a cursory look.

Thanks! It’s got 4 GB allocated and uses about 300 MB of that. I’ve checked the resource usage graph, and there are no bumps.

The actual code is a relatively simple Spring Boot app. It does have some features, but I don’t think it gets as far as actually doing any work, as it’s killed a few seconds in.

Thanks for the tip, I’ll try with -Xmx (maximum heap) but I don’t believe the runtime will have cause to touch 4GB.
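
To confirm what heap ceiling the JVM actually ends up with after setting -Xmx, a tiny check like the following can help. It is only a sketch; `HeapCheck` and `maxHeapMiB` are illustrative names.

```java
public class HeapCheck {
    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Run it with the same flags as the app, for example `java -Xmx3g HeapCheck`. Depending on the garbage collector, the reported figure may be slightly below the -Xmx value.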

What does fly status --all -a <app-name> tell you about health-checks / statuses of deployed instances (VMs)?

For example, here’s the status of my Node.js app:

➜  fly status --all -a ____                              
App
  Name     = ____  
  Owner    = ____           
  Version  = 435               
  Status   = running           
  Hostname = ____.fly.dev  
  Platform = nomad             

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS 	HEALTH CHECKS      	RESTARTS	CREATED              
d3adbeef	app    	435 ⇡  	aws   	run    	running	2 total, 2 passing 	0       	2022-10-23T07:38:18Z	
deadb33f	app    	425    	aws   	evict  	failed 	2 total, 2 critical	0       	2022-08-18T13:25:20Z	

Btw, folks at codecentric.de wrote quite a nice post about getting Spring up and running on Fly that may have a pointer or two in case you’ve not read it already.

BTW I configured -Xmx to a 3 GB heap (out of 4). It happened again.

App
  Name     = manifold
  Owner    = crossref
  Version  = 33
  Status   = running
  Hostname = manifold.fly.dev
  Platform = nomad

Deployment Status
  ID          = 52df1e8d-246d-3799-4654-7f2d3f3dca5c
  Version     = v33
  Status      = successful
  Description = Deployment completed successfully
  Instances   = 1 desired, 1 placed, 1 healthy, 0 unhealthy

Instances
ID      	PROCESS	VERSION	REGION	DESIRED	STATUS  	HEALTH CHECKS      	RESTARTS	CREATED
d6d7c16c	app    	33 ⇡   	lhr   	run    	running 	1 total, 1 passing 	0       	10m15s ago
0430276e	app    	32     	lhr   	stop   	complete	1 total, 1 passing 	0       	21m19s ago
3e495ee7	app    	31     	lhr   	stop   	complete	1 total, 1 passing 	0       	34m15s ago
701986eb	app    	30     	lhr   	stop   	complete	1 total, 1 passing 	0       	18h54m ago
5b1ad98f	app    	12     	lhr   	run    	failed  	1 total, 1 critical	2       	2022-10-31T20:10:06Z
95cc646f	app    	11     	lhr   	run    	failed  	1 total, 1 critical	2       	2022-10-31T18:00:26Z
3ddd8c06	app    	10     	lhr   	stop   	failed  	1 total            	2       	2022-10-31T17:39:41Z
93d5a92d	app    	9      	lhr   	run    	failed  	1 total            	2       	2022-10-30T16:17:14Z
a2810e03	app    	8      	lhr   	stop   	failed  	1 total, 1 critical	2       	2022-10-30T15:25:37Z
7117ade5	app    	7      	lhr   	stop   	failed  	1 total, 1 critical	2       	2022-10-30T14:48:42Z
b34962a7	app    	6      	lhr   	stop   	failed  	                   	2       	2022-10-30T14:35:42Z
66ef98a9	app    	5      	lhr   	stop   	failed  	1 total            	2       	2022-10-30T14:19:13Z
af0b4a0a	app    	4      	lhr   	stop   	failed  	1 total, 1 critical	2       	2022-10-30T14:00:08Z
be176fd3	app    	3      	lhr   	stop   	failed  	1 total            	2       	2022-10-30T13:52:20Z
5761a68b	app    	2      	lhr   	run    	failed  	                   	0       	2022-10-30T09:27:52Z
6ece49ce	app    	1      	lhr   	run    	failed  	1 total, 1 critical	2       	2022-10-29T21:24:10Z
d6b550bf	app    	0      	lhr   	stop   	failed  	1 total, 1 critical	2       	2022-10-29T21:15:59Z

So I see some critical health checks.