Blog Post

VACUUM, Autovacuum, and Bloat — The Plain-English Guide

If you work with PostgreSQL long enough, you’ll eventually hear someone say: “Did you run VACUUM?” It sounds mysterious, but VACUUM is actually one of the most important maintenance operations in PostgreSQL. This guide explains what VACUUM does, why autovacuum sometimes fails, what bloat really is, and how to check your database for problems — in simple, plain-English terms.

Why PostgreSQL Needs VACUUM

PostgreSQL uses MVCC (Multi-Version Concurrency Control). Instead of overwriting rows when you run an UPDATE or DELETE, PostgreSQL creates new row versions and marks the old ones as “dead” and no longer visible. This keeps reads fast and avoids blocking — but it leaves behind wasted space.

VACUUM cleans up that space and keeps your database healthy.

  • Removes dead row versions so tables don't grow forever.
  • Updates visibility maps so index-only scans can run efficiently.
  • Prevents transaction ID wraparound — which can shut down the entire cluster if ignored.

What Is Bloat?

Bloat is wasted space caused by a buildup of dead rows. It makes tables larger on disk, slows down sequential scans, and increases I/O. Every busy database has some bloat — but when VACUUM can’t keep up, performance suffers.

Common symptoms of bloat include:

  • Tables or indexes are large on disk but contain relatively few rows.
  • Queries slow down even when proper indexes exist.
  • Backups take longer than expected.
  • Sequential scans read far more data than they should.

Bloat affects indexes as well. Frequent UPDATE and DELETE activity can leave behind dead index entries, causing index files to grow and slowing down index scans even when the underlying table is small.

Autovacuum: PostgreSQL’s Automatic Cleanup Worker

PostgreSQL includes a background process called autovacuum that continuously removes dead rows as tables change. It decides when to run based on thresholds set in postgresql.conf:


autovacuum_vacuum_threshold    -- minimum number of dead rows
autovacuum_vacuum_scale_factor -- % of table that must change

On small tables, autovacuum triggers frequently. On very large tables, the default settings may delay autovacuum for far too long — allowing bloat to build up.

Autovacuum also runs ANALYZE, which updates PostgreSQL’s planner statistics so the optimizer can choose efficient query plans. Disabling autovacuum can cause stale statistics and slower queries. So DON'T disable it!

When Autovacuum Falls Behind

Autovacuum struggles with:

  • Tables receiving heavy UPDATE or DELETE activity.
  • Long-running transactions that block cleanup.
  • Large tables combined with high scale factors.
  • Autovacuum workers being too slow or too few.
  • Frequent bulk loads or large batch operations.

When autovacuum can’t keep up, dead rows accumulate and bloat grows. That’s when you’ll need manual cleanup or more aggressive autovacuum settings.

How to Check for Bloat

You can check your tables and indexes using PostgreSQL’s system views and helpful extensions.

1. Check dead row counts


SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

If n_dead_tup is very high relative to n_live_tup, autovacuum may be falling behind.

2. Use pgstattuple for real bloat estimation


CREATE EXTENSION IF NOT EXISTS pgstattuple;

SELECT *
FROM pgstattuple('public.rental');

This shows the actual bloat statistics for the table. The key columns to pay attention to are:

  • table_len - Total size of the table in bytes.
  • tuple_len - Size taken up by live (active) rows.
  • dead_tuple_len - Size taken up by dead rows that VACUUM can reclaim.
  • dead_tuple_percent - Percentage of the table occupied by dead rows.
  • free_space - Additional unused space inside pages.

The most important indicator of bloat is dead_tuple_percent — higher values mean VACUUM isn’t keeping up and the table may need more aggressive autovacuum settings or manual maintenance.

3. Find your largest tables


SELECT relname, pg_size_pretty(pg_total_relation_size(relid))
FROM pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

Large, heavily updated tables are the most common sources of bloat.

Manual VACUUM Options

When you need immediate cleanup, you can run VACUUM manually:


VACUUM;           -- standard cleanup, non-blocking
VACUUM ANALYZE;   -- also updates planner statistics
VACUUM VERBOSE;   -- prints detailed output

To completely eliminate bloat, you can use:


VACUUM FULL;

This rewrites the table and recovers all wasted space, but it takes an exclusive lock and will stop writes while it runs. Use with care and caution!

A normal VACUUM does not usually shrink the table file on disk — it simply marks space as reusable inside the table. To return disk space to the operating system, you must use VACUUM FULL or a table rewrite.

How to Prevent Bloat

  • Lower autovacuum scale factors on very large tables. Autovacuum will run more often instead of waiting for huge amounts of dead rows to accumulate.
  • Increase autovacuum cost limits so workers run faster.
  • Avoid long-running transactions that block cleanup.
  • Use batch updates instead of row-by-row operations. This reduces the number of row versions PostgreSQL must create, which slows the rate at which dead tuples pile up.
  • Use UNLOGGED tables for temporary or transient data. These tables will skip WAL logging, which reduces write amplification and helps keep heavily churned data from bloating as quickly.

Conclusion

VACUUM is not optional — it’s a core part of PostgreSQL’s MVCC architecture. When autovacuum keeps up, your database stays lean and fast. When it doesn’t, bloat builds up and performance drops.

By understanding how dead rows accumulate, how bloat forms, and how to monitor table health, you’ll be able to prevent issues before they become serious — and keep your PostgreSQL systems running smoothly!