`Forbidden` error page when I try to access my app

Hey

So, out of the blue, when I try to access my app, I’m greeted with a page that says Forbidden and nothing else.
I haven’t changed anything, haven’t deployed anything new.
I noticed this error some time ago, ended up restarting the app and it started to work again but now I’m seeing this issue again, on a different app which makes me believe there is an issue on Fly.

Checking the logs, there are no requests reaching my app, so the issue is probably on the load balancer or somewhere between the LB and the app.

Hmm, thank you for letting us know; we’ll be happy to help check things out. A couple of questions that might help us narrow stuff down off the bat:

  • Do you have a rough idea (or even timestamps) of when you first saw the problem?
  • Are/were both of those apps deployed to the same region(s)?

Do you have a rough idea (or even timestamps) of when you first saw the problem?

unfortunately no, I just noticed this morning and it was working fine yesterday…so sometime within the last 24h (huge timespan, I know hahaha sorry)

Are/were both of those apps deployed to the same region(s)?

I have pretty much everything deployed to fra with some read replicas on gru and I’m currently in Berlin so I should be hitting fra

Thanks for that information! Taking a closer look, it seems the gap is mainly in the logs: that 403 response is coming from your app, so traffic is reaching it correctly.

You might be able to investigate further by sshing into your app’s instances (fly ssh console -a $app-name) and using tools like ss and curl to check for errors with any listening services.
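One way to confirm from inside the instance that the 403 really comes from the app is to request it directly on its internal port, bypassing Fly's proxy. Here is a minimal Ruby sketch, assuming Ruby is available in the image (it should be for a Rails app). The helper name `local_status` and the port 8080 in the usage comment are hypothetical; use whatever internal port your app listens on.

```ruby
require "net/http"

# Hypothetical helper: asks the app directly on its internal port and
# returns the HTTP status code as an Integer.
def local_status(port, path: "/", host: "127.0.0.1")
  Net::HTTP.start(host, port, open_timeout: 2, read_timeout: 5) do |http|
    http.get(path).code.to_i
  end
end

# Usage from the instance (8080 is an assumed internal port):
#   puts local_status(8080)
```

If this prints 403 as well, the response is generated by the app (or its middleware) rather than by anything in front of it.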

Restarting the app’s instances may also be a good initial troubleshooting step, since it should give you fresh logs to look at.

I hope this helps point you in the right direction!

I ssh’d into it but I can’t really see anything weird…I don’t really know exactly what to look for, this app has two dependencies: redis and postgres. both are up and running and logs are fine…
ss shows me nothing useful

Netid      State      Recv-Q      Send-Q                              Local Address:Port                                    Peer Address:Port
u_str      ESTAB      0           0                                               * 4349                                               * 0
u_str      ESTAB      0           0                                               * 4350                                               * 0
udp        ESTAB      0           0                                    172.19.4.130:33146                                   172.19.4.131:8125
tcp        ESTAB      0           0                [fdaa:0:6c8d:a7b:66:ec56:b79f:2]:45494                [fdaa:0:6c8d:a7b:23c4:1:4bbb:2]:6379
tcp        ESTAB      0           72               [fdaa:0:6c8d:a7b:66:ec56:b79f:2]:ssh                   [fdaa:0:6c8d:a7b:957f:0:a:300]:32435
v_str      ESTAB      0           0                                               3:1348482693                                         2:2531
v_str      ESTAB      0           0                                               3:10000                                              2:1073741828

to be honest I’m not even sure what to look for. trying to start a new server, I get the message that there is a server running already.

I ended up deploying a new version, couldn’t figure out what happened exactly. I noticed tho that on the new instance, when I run ss I see a postgres line on the Peer Address:Port column that wasn’t there before…maybe the app lost connection to the database and moved to a broken state (running but not working)?

maybe the app lost connection to the database and moved to a broken state (running but not working)?

That sounds plausible to me! The absence of a peer address for postgres does indicate that your app is having trouble finding it. I’d typically expect to see a 500 or 503 error in that case. Perhaps there’s even a typo somewhere?

If you run into this issue again, I’d recommend starting by checking that the db’s hostname resolves via fly dig, to rule out any issues with your app’s DNS server. You could also test for other host-specific issues by manually scaling your app, to see if fresh instances are working.
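The same kind of check can be run from inside the instance, which also covers the "running but not connected" case. This is a rough sketch only: `check_dependency` is a hypothetical helper, and the host and port in the usage comment are placeholders for your own database settings.

```ruby
require "socket"

# Hypothetical helper: reports whether a dependency's hostname resolves
# and whether it accepts TCP connections on the given port.
def check_dependency(host, port, timeout: 2)
  # Raises SocketError when the name cannot be resolved.
  Addrinfo.getaddrinfo(host, port, nil, :STREAM)
  # The block's value is returned once the connection succeeds.
  Socket.tcp(host, port, connect_timeout: timeout) { :ok }
rescue SocketError
  :unresolved
rescue Errno::ECONNREFUSED
  :refused
rescue Errno::ETIMEDOUT, Errno::EHOSTUNREACH, Errno::ENETUNREACH
  :unreachable
end

# Usage (placeholder host and port, replace with your database's values):
#   puts check_dependency(ENV.fetch("DB_HOST"), 5432)
```

A result of `:unresolved` points at DNS, while `:refused` or `:unreachable` points at the database or the network path to it.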

Another longshot, but if you’re running a rails app, you might want to give this recent post a look as it sounds sort of similar to the issue you were describing!

LOL goddamn, that’s precisely it!

I have a fail2ban filter to silence these automated pentester script kiddies, and it looks like rack-attack’s request.ip method uses Fly’s reverse proxy IP, which is always the same, so one ban ends up blocking every visitor. I could reproduce the problem simply by accessing one of the blocklisted paths, and clearing the rack-attack cache solved it…
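A rough sketch of why keying on the proxy address goes wrong, using plain hashes in place of real Rack environments. Treating `REMOTE_ADDR` as the proxy address is an assumption made for illustration, `rate_limit_ip` is a hypothetical helper, and the IP addresses are made-up documentation values. `HTTP_FLY_CLIENT_IP` is how the Fly-Client-IP request header appears in the Rack env.

```ruby
# Hypothetical helper: prefer the client address that Fly's proxy forwards,
# and fall back to the connection's peer address when the header is missing.
def rate_limit_ip(env)
  fly_ip = env["HTTP_FLY_CLIENT_IP"].to_s.strip
  fly_ip.empty? ? env["REMOTE_ADDR"] : fly_ip
end

# Two different visitors arriving through the same proxy (made-up addresses).
visitor_a = { "REMOTE_ADDR" => "172.16.0.1", "HTTP_FLY_CLIENT_IP" => "203.0.113.7" }
visitor_b = { "REMOTE_ADDR" => "172.16.0.1", "HTTP_FLY_CLIENT_IP" => "198.51.100.9" }

# Keyed by the peer address, both visitors share one ban bucket.
SHARED_KEYS = [visitor_a, visitor_b].map { |env| env["REMOTE_ADDR"] }.uniq

# Keyed by the forwarded header, each visitor gets a separate bucket.
SEPARATE_KEYS = [visitor_a, visitor_b].map { |env| rate_limit_ip(env) }.uniq
```

With the first keying, a single scanner tripping the filter bans the shared bucket, which matches the "Forbidden for everyone" symptom.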

interesting problem tho, I will reach out to rack-attack folks

thanks a ton @eli :heart:

I just hit this now and I had no clue what could be happening. Glad you got there before me :sweat_smile:

Do you have a rack attack config that works well with fly?

hey!
sorry for the very late reply, I’ve been really busy lately

Yeah, this one is working fine for me:

# frozen_string_literal: true

# maxretry: 1 bans on the first matching probe; the ban lasts 5 minutes.
PENTEST_CONFIG  = { maxretry: 1, findtime: 10.minutes, bantime: 5.minutes }.freeze
# At most 5 matching requests per 60 seconds for each throttle key.
THROTTLE_CONFIG = { limit: 5, period: 60 }.freeze

Rack::Attack.enabled = Rails.env.production?

Rack::Attack.throttle("limit decrypt attempts", THROTTLE_CONFIG) do |req|
  # Use the client address forwarded by Fly's proxy instead of req.ip.
  ip = req.env["HTTP_FLY_CLIENT_IP"]

  _controller, id, path = req.path.split("/").compact_blank

  # Returning nil leaves the request unthrottled.
  "#{id}-#{ip}" if path == "d" && req.post?
end

Rack::Attack.blocklist("fail2ban pentesters") do |req|
  ip = req.env["HTTP_FLY_CLIENT_IP"]

  # One ban bucket per client IP, so a scanner only bans itself.
  Rack::Attack::Fail2Ban.filter("pentesters-#{ip}", PENTEST_CONFIG) do
    CGI.unescape(req.query_string).include?("/etc/passwd") ||
      req.path.include?("/etc/passwd") || req.path.include?("wp-admin") ||
      req.path.include?("wp-includes") || req.path.include?(".git/config") ||
      req.path.include?("wp-login")
  end
end
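
If you want to sanity-check which requests the blocklist would flag without booting the app, the matching logic can be pulled out into a plain function. This is only a sketch of the same rules: `pentest_probe?` and `PROBE_MARKERS` are hypothetical names, and `URI.decode_www_form_component` from the standard library stands in for `CGI.unescape` here.

```ruby
require "uri"

# Path fragments that the config above treats as pentest probes.
PROBE_MARKERS = ["/etc/passwd", "wp-admin", "wp-includes", ".git/config", "wp-login"].freeze

# Hypothetical helper mirroring the blocklist condition: the decoded query
# string is checked for /etc/passwd, the path is checked for every marker.
def pentest_probe?(path, query_string = "")
  decoded_query = URI.decode_www_form_component(query_string)
  decoded_query.include?("/etc/passwd") ||
    PROBE_MARKERS.any? { |marker| path.include?(marker) }
end

# Usage:
#   pentest_probe?("/wp-login.php")             # scanner-style path
#   pentest_probe?("/", "file=%2Fetc%2Fpasswd") # encoded probe in the query
```

Keeping the markers in one list also makes it easier to add new probe patterns later.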