How PostgreSQL Chooses a Query Plan (And Why It Gets It Wrong)
Ever run a query that was blazing fast yesterday… and painfully slow today — without changing a single line of SQL?
That isn’t PostgreSQL being random. It’s PostgreSQL doing exactly what it was designed to do.
To understand performance problems, you need to understand how PostgreSQL chooses a query plan — and why that choice is sometimes wrong.
The Planner Is Cost-Based (Not Rule-Based)
PostgreSQL doesn’t follow hard rules like “always use an index.” Instead, it uses a cost-based optimizer.
For every query, PostgreSQL:
- Generates multiple possible execution plans
- Estimates the cost of each plan
- Chooses the cheapest one
That cost is not measured in milliseconds. It’s an internal unit based on:
- Estimated rows processed
- Disk I/O vs memory access
- CPU work for joins, filters, and aggregates
The key word here is estimated.
Why Row Estimates Matter More Than Almost Anything
Nearly every bad plan starts with a bad row estimate.
PostgreSQL doesn’t know your data — it guesses based on statistics. If it thinks a filter returns 10 rows but actually returns 100,000, the chosen plan can be wildly inappropriate.
Example:
EXPLAIN
SELECT *
FROM rental
WHERE return_date IS NULL;
If PostgreSQL believes only a handful of rows match, it may choose an index scan.
If half the table actually matches, a sequential scan would be cheaper — but the planner doesn’t realize that.
PostgreSQL doesn’t adapt mid-query. Once the plan is chosen, it sticks with it.
Planning Time vs Execution Time
When you run EXPLAIN ANALYZE, you’ll see two important numbers:
- Planning Time
- Execution Time
Planning time is usually tiny — microseconds to a few milliseconds. PostgreSQL is optimizing, but not exhaustively.
Execution time is where bad estimates hurt. A plan that looked cheap on paper can be devastating in reality.
This is why a query can:
- Have a fast plan but slow execution
- Use an index and still perform poorly
- Suddenly switch from index scan to seq scan
“It Worked Yesterday” Is Not a Contradiction
PostgreSQL query plans are not guaranteed to be stable forever.
Any of the following can change a plan — without touching the query:
- Data volume growth
- Data distribution changes
- Vacuum or analyze timing
- Statistics updates
- Parameter values
That means:
“The same query with the same indexes can legitimately get a different plan tomorrow.”
PostgreSQL isn’t being inconsistent. It’s responding to a different statistical reality.
Why PostgreSQL Sometimes Chooses a Sequential Scan
Developers often panic when they see a sequential scan. That’s a mistake.
A sequential scan is often the correct choice when:
- A large percentage of the table is read
- The table fits in memory
- The index would require random I/O
Indexes are not free. They’re only useful when they eliminate enough work.
The planner chooses a sequential scan because — based on estimates — it believes that’s cheaper.
EXPLAIN vs EXPLAIN ANALYZE
EXPLAIN shows what PostgreSQL thinks will happen.
EXPLAIN ANALYZE shows what actually happened.
EXPLAIN
SELECT *
FROM rental
WHERE return_date IS NULL;
This is where you spot problems:
- Estimated rows vs actual rows
- Unexpected sequential scans
- Nested loops where hashes would be better
When estimates are off by orders of magnitude, the planner never had a chance.
The Real Enemy: Bad Statistics
PostgreSQL relies heavily on statistics gathered by ANALYZE. If those stats are stale or insufficient, the planner is flying blind.
This is why:
- Freshly loaded tables behave strangely
- Highly skewed data causes bad plans
- Default statistics aren’t always enough
PostgreSQL won’t warn you when statistics are bad — but it will tell you when they were last updated.
SELECT
relname AS table_name,
last_analyze,
last_autoanalyze,
n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'rental';
If last_analyze and last_autoanalyze are NULL, PostgreSQL has never analyzed the table. If they’re old and the table changes frequently, the planner is working with stale information.
Indexes don’t fix bad statistics. Hints don’t exist. PostgreSQL can only choose correctly if it understands your data.
Conclusion
PostgreSQL doesn’t randomly choose bad plans. It makes the best decision it can — based on the information it has.
When performance degrades, don’t blame the planner. Ask:
- Are row estimates accurate?
- Has the data distribution changed?
- Are statistics current and detailed enough?
Once you understand how PostgreSQL thinks, query plans stop being mysterious — and start becoming something you can reason about, predict, and fix!