I am looking for help as I am a bit lost when it comes to the various options for scaling my app. I have an app with a postgres db in the background. Lately, I have been experiencing performance issues, i.e. it takes a long time for pages to load or at times they don’t load at all. I have looked at the documentation for scaling the app but I honestly don’t know which option would be best.
Would love to hear about your experiences on what helped to increase the performance of your app in terms of the various scaling options available through fly.io.
What kind of information/details would be helpful? Apologies, but I am not an expert at all and thus wouldn’t even know what you needed to know to provide some guidance. What I can tell you is my app is a django-based app with a postgres database running in the backend. My frontend is very simple, i.e. only html, bootstrap, a bit of custom css and very little javascript.
Please let me know what else you need to know to provide some guidance on improving the performance through the scaling options that fly.io provides.
Memory usage and how much traffic your site is getting, to start. Are the front and back end running in the same app, or are they 2 separate apps talking to each other?
You said your front end is simple but it’s not loading sometimes. W/o much info it doesn’t sound like an infra scaling problem, but rather a problem with your app’s architecture.
@neptunhiker If you don’t know where to look, you might want to integrate Sentry and enable tracing/performance. This will show you which part of the request handling is slow, and if there are any crashes.
You will be asked for your app’s technology when creating a new project. Select Django to get hints and documentation specific to Django.
Thank you both @ktosiek and @khuezy . I have managed to set up sentry for performance monitoring but will need some time to make sense of the information that I can get out of it.
@khuezy to answer your questions: My app is getting very little traffic, maybe 5 - 10 users per day use it at peak times. And yes, the front and back end are running in the same app.
In the meantime, I have had calls to my app that didn’t work out at all. I have taken a look at the logs and found the following:
Waiting for logs...
2024-11-16T21:57:30.514 app[...] ams [info] [2024-11-16 21:57:30 +0000] [323] [CRITICAL] WORKER TIMEOUT (pid:330)
2024-11-16T21:57:30.953 proxy[...] ams [error] [PU02] could not complete HTTP request to instance: connection closed before message completed
2024-11-16T21:57:30.963 app[...] ams [info] [2024-11-16 22:57:30 +0100] [330] [INFO] Worker exiting (pid: 330)
2024-11-16T21:57:31.526 app[...] ams [info] [2024-11-16 21:57:31 +0000] [323] [ERROR] Worker (pid:330) was sent SIGKILL! Perhaps out of memory?
2024-11-16T21:57:31.528 app[...] ams [info] [2024-11-16 21:57:31 +0000] [348] [INFO] Booting worker with pid: 348
As you can see there does seem to be some kind of memory issue which is the reason why I thought that possibly it would help to scale my app to have greater memory.
Can you make sense of the logs and indicate what I could do to resolve this issue?
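A note on reading these lines: Gunicorn logs `WORKER TIMEOUT` when a worker has not responded within its `timeout` setting (30 seconds by default), and it then stops that worker itself. The "Perhaps out of memory?" hint is added when a worker dies from SIGKILL, so on its own it may not prove a memory shortage. Below is a minimal sketch of a `gunicorn.conf.py`, assuming the app is served by Gunicorn as the log format suggests. The file name is the conventional one and the values are illustrative assumptions, not recommendations for this particular app.

```python
# gunicorn.conf.py (values below are illustrative assumptions)

# Seconds a worker may stay silent before the master process logs
# WORKER TIMEOUT and kills it. Gunicorn's default is 30.
timeout = 120

# Number of worker processes. Each one holds its own copy of the app
# in memory, so more workers means more memory used.
workers = 2

# More verbose logging while investigating.
loglevel = "debug"
```

The file can be passed explicitly with `gunicorn -c gunicorn.conf.py ...`. Raising the timeout only keeps a slow request alive longer; it is still worth finding out why the request needs more than 30 seconds.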
You should think about your app’s architecture. If you couple your db w/ your app, how are you going to scale them independently? (You won’t be able to since they’re coupled)
What is the app doing? A typical web app, with this kind of traffic, should fit easily (database and all) in the smallest fly machines.
Do you see a reason for your app to require a lot of memory?
Of course, you can move to a bigger machine. But if the memory usage scales with your overall data size, then you might run out of bigger machines (or budget) pretty soon.
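To get a rough answer to the memory question, the standard library's `tracemalloc` module can report the peak memory allocated by Python code while a function runs. The sketch below is only an illustration: `build_rows` is a hypothetical stand-in for whatever the slow view does, and `tracemalloc` sees Python allocations only, not the total memory of the process.

```python
import tracemalloc


def peak_python_memory_mib(func, *args, **kwargs):
    """Run func and return (result, peak MiB allocated by Python code)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / (1024 * 1024)


def build_rows(n):
    # Hypothetical stand-in for a view that loads many rows into memory.
    return [{"id": i, "name": f"client-{i}"} for i in range(n)]


rows, peak = peak_python_memory_mib(build_rows, 100_000)
print(f"{len(rows)} rows, peak {peak:.1f} MiB")
```

If the peak grows in step with the number of rows, that would support the concern about memory scaling with data size.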
To answer your questions: The app is a customized CRM system, i.e. the database contains all kinds of information about clients, sessions with those clients, documentation and other typical CRM tasks. So not complex.
“Do you see a reason for your app to require a lot of memory?” The page that is not loading anymore (502 error) requires a lot of db queries as that page is a dashboard-like page that shows various information about the CRM. The dashboard page has around 20 functions that query the database in different ways to be able to present the information on the dashboard. Maybe that is indeed the reason I am getting the below error in the logs:
2024-11-18T06:35:13.320 app[...] ams [info] [2024-11-18 06:35:13 +0000] [323] [INFO] Using worker: sync
2024-11-18T06:35:13.324 app[...] ams [info] [2024-11-18 06:35:13 +0000] [329] [INFO] Booting worker with pid: 329
2024-11-18T06:35:13.348 app[...] ams [info] [2024-11-18 06:35:13 +0000] [330] [INFO] Booting worker with pid: 330
2024-11-18T06:35:14.318 proxy[...] ams [info] machine became reachable in 2.197278361s
2024-11-18T06:35:52.498 app[...] ams [info] [2024-11-18 06:35:52 +0000] [323] [CRITICAL] WORKER TIMEOUT (pid:329)
2024-11-18T06:35:53.526 app[...] ams [info] [2024-11-18 06:35:53 +0000] [323] [ERROR] Worker (pid:329) was sent SIGKILL! Perhaps out of memory?
2024-11-18T06:35:53.523 proxy[...] ams [error] [PU02] could not complete HTTP request to instance: connection closed before message completed
2024-11-18T06:35:53.534 app[...] ams [info] [2024-11-18 06:35:53 +0000] [342] [INFO] Booting worker with pid: 342
However, the database in production has about 100 clients or so. So, it is not an enormous amount of data which is why I am so surprised that it runs out of memory. Also, in my test environment the page loads more or less immediately, i.e. within half a second or so.
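Since the page is fast in the test environment but not in production, one way to see where the time goes is to time each of the dashboard functions separately in production. Here is a minimal sketch using only the standard library. The function names `clients_per_month` and `open_sessions` are hypothetical placeholders for the real dashboard functions, and the `time.sleep` call merely simulates a slow query.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record how long the wrapped block takes, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


def clients_per_month():
    # Hypothetical dashboard function; a real one would query the database.
    time.sleep(0.01)
    return [3, 5, 2]


def open_sessions():
    # Hypothetical dashboard function.
    return 7


with timed("clients_per_month"):
    clients_per_month()
with timed("open_sessions"):
    open_sessions()

# Print the slowest function first.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {seconds * 1000:.1f} ms")
```

Writing these timings to the log would show whether one function dominates or whether all of them are slow, which would hint at the database connection rather than a single query.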
@khuezy to come back to your question whether my db is independent. I am actually not sure whether I gave you the right answer. In fly.io my app appears as an app and my database also appears as an app. So, I guess they are two separate apps that are connected. Is that what you meant to ask?
Can you disable parts of this dashboard page? This would help with narrowing the investigation a bit: can the page load with just the first half of the widgets? Can it load with the other half? If just one half loads, but not the other, then you can continue by disabling parts of the failing half.
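The halving approach above could be sketched like this, so that widgets can be switched on and off without editing the view each time. Everything here is an assumption for illustration: the widget functions, the `WIDGETS` registry and the `DASHBOARD_WIDGETS` environment variable are hypothetical names, not part of Django or fly.io.

```python
import os


# Hypothetical widget functions; real ones would query the database.
def clients_widget():
    return {"clients": 100}


def sessions_widget():
    return {"sessions": 250}


def documents_widget():
    return {"documents": 40}


def revenue_widget():
    return {"revenue": 12345}


WIDGETS = {
    "clients": clients_widget,
    "sessions": sessions_widget,
    "documents": documents_widget,
    "revenue": revenue_widget,
}


def enabled_widget_names(all_names, setting):
    """Return the widget names to render; an empty setting enables all."""
    if not setting:
        return list(all_names)
    wanted = {name.strip() for name in setting.split(",") if name.strip()}
    return [name for name in all_names if name in wanted]


def build_dashboard_context(setting=None):
    """Call only the enabled widgets and collect their results."""
    if setting is None:
        # Hypothetical variable name, e.g. DASHBOARD_WIDGETS="clients,sessions"
        setting = os.environ.get("DASHBOARD_WIDGETS", "")
    names = enabled_widget_names(WIDGETS, setting)
    return {name: WIDGETS[name]() for name in names}


first_half = build_dashboard_context("clients,sessions")
print(sorted(first_half))
```

The variable could be set in the `[env]` section of `fly.toml`, so each half can be tried with a redeploy instead of a code change.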