PostgreSql Performance

Hello Fly Team,

I’m trying to get a postgres app up and running, but I am struggling to get reasonable performance. I am creating a table with the following python command:

df_.to_sql("tweets1", 
           engine, 
           if_exists="replace", 
           dtype={
                'id': BigInteger,
                'created_at': DateTime, 
                'author.username': String(15), 
                'author.id': BigInteger, 
                'referenced_tweets.replied_to.id': BigInteger, 
                'referenced_tweets.retweeted.id': BigInteger,
                'referenced_tweets.quoted.id': BigInteger, 
                'in_reply_to_user_id': BigInteger,
                "entities.mentions": JSON,
                "public_metrics.reply_count": BigInteger, 
                "author.public_metrics.followers_count": BigInteger, 
           }
)

which runs this sql command to create the table:

CREATE TABLE tweets1 (
	index BIGINT, 
	id BIGINT, 
	created_at TIMESTAMP WITHOUT TIME ZONE, 
	"author.username" VARCHAR(15), 
	"author.id" BIGINT, 
	"referenced_tweets.replied_to.id" BIGINT, 
	"referenced_tweets.retweeted.id" BIGINT, 
	"referenced_tweets.quoted.id" BIGINT, 
	in_reply_to_user_id BIGINT, 
	"entities.mentions" JSON, 
	"public_metrics.reply_count" BIGINT, 
	"author.public_metrics.followers_count" BIGINT
)

My table only has ~4500 rows in it, but it takes 30 seconds for select * from tweets1 to complete and return the data. This seems way slower than it should be. I’m not sure where I am going wrong. I wouldn’t think any optimizations are needed to get better performance for this use would be necesssary, which makes me think I am missing a very simple step.

My db is running with 2 dedicated cpu’s and 4096MB.

Could someone give me some basic db pointers to help me figure this out?

Well, unless you have megabytes of data in each row, this should take much less than a second.

I would debug in the following order:

  1. ssh into the db node to run psql with the query “select count(*) from tweets1”
  2. ssh into the db node to run psql with the query “select * from tweets1”
  3. ssh into the app node to run psql with the query “select count(*) from tweets1”
  4. ssh into the app node to run psql with the query “select * from tweets1”

You’ll have to connect with “psql -U youruser -h somehost” etc.

1 Like

@tj1 made good debugging suggestions. If the commands they suggested are fast, make sure your app and database are running in the same regions.

Does the command fly postgres connect tweetscape-follows-views achieve the same thing as ssh’ing into the db node?

When i do this, the count(*) query is fast, but the select * query is still very slow

No, I believe that is connecting from your machine which means you would be counting that network latency.

It’s a little of both. That command runs on the DB node itself, but then it has to send results over SSH to your local terminal.

Is select * from tweets1 limit 10 fast?

0.581 ms when running it with the timer on. (turning on timer with the \timing command at psql console).

That seems slower than it should be correct?

It’s hard to say. I can imagine queries that take 581ms. What are you getting for that on other systems? There’s nothing special about the Postgres you’re using. It should behave the same with the same RAM / CPU / disk as any other server has.

Without logging into the db server and running the query it’s tough to separate the network portion vs the db portion of the performance.

alright is is officially entirely a network issue on my end.

Just ran the query from my python app running in the same region as my db, and it returned in 0.04 seconds…

Is there a way for me to develop my app directly on a fly machine to avoid this network issue in the future?

Also, thank you both for your help!!! @tj1 @kurt

EDIT: just found Jupyter Notebook · Launch on Fly.io, I think this will take care of that for me!!