Unhealthy allocations are killing me.

Disclaimer: low skill / newbie, sorry.

I have followed a tutorial and successfully organized an instance which is working well.

I’m trying to get another one up and I’ll be damned if I can figure out why it’s failing.
I’m attempting to run this:
https://hub.docker.com/r/jwigley/smokeping-speedtest/

and endlessly hitting the unhealthy allocations error. Google tells me that this is likely due to port assignment. I’ve edited the fly.toml at least 50 times and tried remapping ports to no avail.

Is there any way to get a more detailed breakdown of the fault?

Second issue, and perhaps this is showing my skill level: my first, nicely working instance is in the wrong time zone, despite being on a server in my region. I’ve googled and googled some more, read some documentation, and I’ll be damned if I can identify what controls the date and time.

I’ve SSH’d in to the instance and typed date - it’s clearly not the date for my region. How does one set this?

Can you share the complete unhealthy alloc error you see? And the fly.toml, Dockerfile (if any) you’re using?


This is the console spitting back the fault to me

2022-09-23T23:06:31.000 [info] Mounting /dev/vdc at /app/smokeping w/ uid: 0, gid: 0 and chmod 0755
2022-09-23T23:06:31.000 [info] Preparing to run: `/init` as root
2022-09-23T23:06:31.000 [info] 2022/09/23 23:06:31 listening on [xxxx:Xxxxxx:xxxxxxxxxx]:22 (DNS: [fdaa::3]:53)
2022-09-23T23:06:32.000 [info] Starting clean up.
2022-09-23T23:06:32.000 [info] Umounting /dev/vdc from /app/smokeping
***v10 failed - Failed due to unhealthy allocations - no stable job version to auto revert to and deploying as v11 

Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/
Error abort

Here’s the toml file:

# fly.toml file generated for jaxjexsmokeping on 2022-09-23T05:35:21Z

app = "jaxjexsmokeping"
kill_signal = "SIGINT"
kill_timeout = 5
processes = []


[build]
  image = "jwigley/smokeping-speedtest:latest"




[env]
PORT = "8080"


[mounts]
  destination = "/app/smokeping"
  source = "jaxjexsmokeping"


[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  http_checks = []
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []
  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "1s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"


It’s very, very likely this is a dumb user error. I’m using basically default settings (as little to no fiddling was required to get kuma up and running), so I figured I could get this one going too, but no luck at all.

Try setting grace_period = "30s" in the services.tcp_checks. It’s possible the VM is taking too long to pass health checks.
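If you would rather script that edit than make it by hand, here is a minimal shell sketch. It assumes fly.toml is in the current directory; the helper name set_grace_period is made up for illustration and is not part of flyctl. It rewrites every grace_period line in the file, so check the result before deploying.

```shell
# Hypothetical helper, not part of flyctl: set every grace_period value in a
# TOML file to the given duration string (for example 30s).
set_grace_period() {
  file=$1
  value=$2
  tmp=$(mktemp) || return 1
  # Keep the indentation and the key, replace only what follows the "=".
  sed "s/^\([[:space:]]*grace_period[[:space:]]*=[[:space:]]*\).*/\1\"$value\"/" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Usage, from the directory holding fly.toml:
#   set_grace_period fly.toml 30s
```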

You can run fly status --all to get a list of failed VMs, and then fly vm status <id> to see more details about what’s happening on one of them.
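With a long list of failed VMs, a small filter can pull out the IDs for you. This is only a sketch: failed_ids is a hypothetical helper, and it assumes the table layout stays as it is today (an 8-character hex ID in the first column and the word "failed" somewhere on the row).

```shell
# Hypothetical helper: read `fly status --all` output on stdin and print the
# ID of every instance row that contains the status word "failed".
failed_ids() {
  awk 'length($1) == 8 && $1 ~ /^[0-9a-f]+$/ {
         for (i = 2; i <= NF; i++) {
           if ($i == "failed") { print $1; break }
         }
       }'
}

# Usage (needs flyctl and must be run from the app directory):
#   fly status --all | failed_ids | while read -r id; do fly vm status "$id"; done
```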

Instances
ID              PROCESS VERSION REGION  DESIRED STATUS  HEALTH CHECKS           RESTARTS        CREATED    
e2521e03        app     11 ⇡    syd     run     failed  1 total                 2               1m9s ago  
46668da8        app     10      syd     stop    failed                          2               11m47s ago
aaea3788        app     9       syd     run     failed                          2               1h29m ago 
1edd2861        app     8       syd     stop    failed  1 total                 2               1h38m ago 
a80d3348        app     7       syd     run     failed                          2               17h19m ago
70388d42        app     6       syd     stop    failed  1 total, 1 critical     2               17h20m ago
4da5a949        app     5       syd     stop    failed  1 total                 2               17h23m ago
45babaf7        app     4       syd     stop    failed                          2               17h25m ago
d003e9b9        app     3       syd     stop    failed                          2               17h26m ago
f073b4d5        app     2       syd     stop    failed  1 total                 2               17h27m ago
0073aab7        app     1       syd     stop    failed  1 total                 2               17h29m ago
262b764e        app     0       syd     stop    failed                          2               17h34m ago

root@CloudManager:~/.fly/bin/smokeping# /root/.fly/bin/flyctl vm status e2521e03
Instance
  ID            = e2521e03   
  Process       = app        
  Version       = 11         
  Region        = syd        
  Desired       = run        
  Status        = failed     
  Health Checks = 1 total    
  Restarts      = 2          
  Created       = 6m46s ago  

Events
TIMESTAMP               TYPE            MESSAGE                                                         
2022-09-23T23:16:40Z    Received        Task received by client                                        
2022-09-23T23:16:40Z    Task Setup      Building Task Directory                                        
2022-09-23T23:16:46Z    Started         Task started by client                                         
2022-09-23T23:16:48Z    Terminated      Exit Code: 100                                                 
2022-09-23T23:16:48Z    Restarting      Task restarting in 1.088518721s                                
2022-09-23T23:16:57Z    Started         Task started by client                                         
2022-09-23T23:16:59Z    Terminated      Exit Code: 100                                                 
2022-09-23T23:16:59Z    Restarting      Task restarting in 1.169924683s                                
2022-09-23T23:17:07Z    Started         Task started by client                                         
2022-09-23T23:17:09Z    Terminated      Exit Code: 100                                                 
2022-09-23T23:17:09Z    Not Restarting  Exceeded allowed attempts 2 in interval 5m0s and mode is "fail"
2022-09-23T23:17:09Z    Alloc Unhealthy Unhealthy because of failed task                               
2022-09-23T23:17:10Z    Killing         Sent interrupt. Waiting 5s before force killing                

Checks
ID                                      SERVICE         STATE   OUTPUT 
230cd37130e7f9c96f951ed2ad499c8f        tcp-8080        warning       

Recent Logs
root@CloudManager:~/.fly/bin/smokeping# 

Is it that final line there, tcp-8080?
(30s wasn’t the fix sadly)

Ah this is what you want to notice:

Exit Code: 100

The process is exiting with code 100. It seems like it’s not writing anything to stdout before it does that, though, so it’s hard to tell why it’s crashing.

Any other way I could diagnose?

What I’d do is override the command for the app with sleep 1000. You can try adding this to your fly.toml:

[experimental]
cmd = ["sleep", "1000"]

[services]

Make sure you delete the [services] block contents, so it doesn’t try to perform network health checks.

Then SSH to the VM and try running your command by hand.
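Once you are in (for example via fly ssh console), something like the sketch below makes sure you see both the output and the exit code. The run_and_report name is hypothetical, and /init is taken from the "Preparing to run" line in your logs; it assumes the image ships a POSIX sh.

```shell
# Hypothetical helper: run a command, show everything it prints, then report
# the exit code so a silent crash still leaves a trace.
run_and_report() {
  # Merge stderr into stdout so error messages are not lost.
  "$@" 2>&1
  status=$?
  echo "exit code: $status"
  return "$status"
}

# Usage, on the VM:
#   run_and_report /init
```

If it prints exit code 100 again, whatever the command wrote just before that line is the place to start looking.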
