The index is created in postgres, but I wonder if you do seq scan instead of index scan.

The background of knowing the problem is that when selecting, the index was clearly created in consideration of the corresponding select statement, and in development db (local db), the index scan was well done as intended, so we were careless.

The execution time of the query is about three times longer because production db is doing seq scan...

Here's what the query looks like this.

SELECT * FROM "friends" WHERE "friends"."user_id" = 1234;

This is how the index is created.

CREATE UNIQUE INDEX index_friends_on_key_id_and_user_id ON friends USING btree (key_id, user_id);

This is the result of EXPLAIN.

Seq Scan on friends  (cost=0.00..97182.54 rows=75 width=1) (actual time=248.244..370.122 rows=64 loops=1)
  Filter: (user_id = 1234)
  Rows Removed by Filter: 3289314
Planning time: 0.044 ms
Execution time: 370.145 ms

Index Scan using index_friends_on_key_id_and_user_id on friends  (cost=0.43..117476.69 rows=75 width=1) (actual time=1.280..80.742 rows=64 loops=1)
  Index Cond: (user_id = 1234)
Planning time: 0.039 ms
Execution time: 80.791 ms

When I tried SET enable_seqscan = OFF; to force the index, it did index scan right away.

As shown above, the execution time differs by 3 times.

Why does the optimizer of postgres have to do index scan when SET enable_seqscan = OFF;

In addition, the local db does index scan as intended without SET enable_seqscan = OFF;. I'm doing seq scan on production db.

The local db is PostgreSQL 9.4.4 on x86_64-apple-darwin14.3.0, compiled by Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0 svn), 64-bit

The production db listed in awsrds is PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compressed by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit.

From my shallow knowledge, I think there's a difference in setting... How should we approach it?

postgresql database rdbms

2022-09-21 18:28

2 Answers

Depending on the number of data in the local DB and production DB, the optimizer will choose whether to index scan or sequential scan. If SELECT brings more than 5-10% of the total row count, sequential scan is said to be much faster. (Index scan may be slower because you have to read data blocks after reading index blocks.) Perhaps the optimizer determined that the number of rows selected in production DB is more than 5% of the total number of rows. Run ANALYZE table to update the statistics and check the execution plan for other results.

Also, I have a question. Since the where statement of the query you gave is user_id = 1234, isn't it better to put the index on (key_id, user_id) instead of (key_id, user_id)?

2022-09-21 18:28

In the case of a multicholum B-Tree index, it traverses in the order listed in the index column (i.e., in the order of key_id, user_id), so if the key_id does not have a condition, it may not take the index. If you change the index to (user_id, key_id), there will be a big improvement in performance because you always scan the index.

Please refer to the article below regarding the combination index.

http://databaser.net/moniwiki/wiki.php/%EA%B2%B0%ED%95%A9%EC%9D%B8%EB%8D%B1%EC%8A%A4%EC%9D%98%EC%82%AC%EC%9A%A9

2022-09-21 18:28

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656