Managing bloat in PostgreSQL using pgstattuple on Azure Database for PostgreSQL flexible server

Plug, Play, and Prey: The security risks of the Model Context Protocol

May 8, 2025

Neon Serverless Postgres is Now Generally Available!

May 8, 2025

Published by azurefeeds on May 8, 2025

Tags

Bloat refers to the unused space within database objects like tables and indexes, caused by accumulated dead tuples that have not been reclaimed by the storage engine. This often results from frequent updates, deletions, or insertions, leading to inefficient storage and performance issues. Addressing bloat is crucial for maintaining optimal database performance, as it can significantly impact storage efficiency, increase I/O operations, reduce cache efficiency, prolong vacuum times, and slow down index scans.

In this blog post, I will walk you through how to use the pgstattuple extension in PostgreSQL to analyze and understand the physical storage of your database objects. By leveraging pgstattuple, you can identify and quantify the unused space within your tables and indexes. We will guide you through analyzing bloat, interpreting the results, and addressing the bloat to optimize your database and improve its performance.

I will be using the pg_repack extension as an alternative to VACUUM FULL and REINDEX. pg_repack is a PostgreSQL extension that removes bloat from tables and indexes and reorganizes them more efficiently. pg_repack works by creating a new copy of the target table or index, applying any changes that occurred during the process, and then swapping the old and new versions atomically. pg_repack doesn’t require any downtime or exclusive access locks on the processed table or index, except for a brief period at the beginning and at the end of the operation. Performing a full table repack requires free disk space about twice as large as the target table(s) and its indexes. For example, if the total size of the tables and indexes to be reorganized is 1GB, an additional 2GB of disk space is required. For pg_repack to run successfully on a table you must have either a PRIMARY KEY or a UNIQUE index on a NOT NULL column.

Let us dive in and see how you can make the most of this powerful tool.

To get more details on the bloat on an Azure Database for PostgreSQL flexible server, you can follow these steps:

1. Installing pgstattuple

Add pgstattuple to the azure.extensions server parameter.

You must install the extension on the database in which you want to analyze the bloat. To do so, connect to the database of interest and run the following command:

CREATE EXTENSION pgstattuple;

2. Analyze the table/index

Once the extension is installed, you can use the pgstattuple function to gather the detailed statistics to analyze the bloat on test table. The function provides information such as the number of live tuples, dead tuples, and the percentage of bloat, free space within the pages.

To showcase the pgstattuple extension features, I have used a 4 Vcore SKU with PG version 16, created a test table with an index, loaded it with 24 Gb data and kept on generating bloat by performing some update/delete commands on the table leading to bloat.

Creating a test table using the below script.

CREATE TABLE test_table (
id bigserial PRIMARY KEY,
column1 text,
column2 text,
column3 text,
column4 text,
column5 text,
column6 text,
column7 text,
column8 text,
column9 text,
column10 text);

Loading the table with 24 GB data using the below script.

INSERT INTO test_table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10)
SELECT
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text)
FROM generate_series(1, 22000000);

Create an index on this table to depict the usage of pgstatindex function to analyze bloat on an index.

CREATE INDEX idx_column1 ON test_table (column1);

Run functions pgstattuple on the above table and pgstatindex on the above index without bloat.

pgstattuple on unbloated table

You can observe the dead_tuple_percent is 0 and free_percent is 4.3 referring to the healthy state of a table.

pgstatindex on unbloated index

You can observe the avg_leaf_density is 89.07% and leaf_fragementation is 8.1 referring to healthy state of the index.

Using the command below, I am generating bloat on the table.

UPDATE test_table
SET column1 = md5(random()::text), column2 = md5(random()::text), column3 = md5(random()::text), column4 = md5(random()::text), column5 = md5(random()::text), column6 = md5(random()::text)
WHERE id % 5 = 0;

You can analyze a table using the below query:

SELECT * FROM pgstattuple(‘your_table_name’);

What it does: It performs a full scan to gather detailed statistics

Performance: On large tables the performance is slower and might take some seconds to minutes depending on the table size due to the full table scans

Use case: To diagnose bloat or planning vacuuming strategies like (performing VACUUM/VACUUM FULL)

To achieve faster estimates you can use,

SELECT * FROM pgstattuple_approx(‘your_table_name’);

What it does: Uses sampling to estimate statistics about the table.

Accuracy: Results are close but not exact

Performance: On large tables the performance is faster as it only considers a sample (subset of pages).

Use case: quick insights

The function provides the following information about the table.

Column	Description
table_len	Table length in bytes (tuple_len+dead_tuple_len+frees_space) and the overhead accounts for the padding (for tuple alignment) and page header (for per page table pointers)
tuple_count	Number of live tuples
tuple_len	Length of live tuples in bytes
tuple_percent	Percentage of live tuples
dead_tuple_count	Number of dead tuples
dead_tuple_len	Length of dead tuples in bytes
dead_tuple_percent	Percentage of dead tuples
free_space	Total free space in bytes within the pages
free_percent	Percentage of free space within the pages

You should mainly concentrate on the below 3 columns to understand the table bloat and the unused space.

dead_tuple_percent column tells us the percentage of dead tuples in the table. It is calculated as below.

dead_tuple_percent = dead_tuple_len / table_len * 100

This can be reduced by running VACUUM on the table. However, VACUUM does not reclaim the space. Hence, free_space and free_percent would increase after the VACUUM.

free_space and free_percent depict the unused/wasted space within the pages. The space can be reclaimed only by performing a VACUUM FULL on the table. If you see high free_percent it depicts the table needs VACUUM FULL (here instead you could use pg_repack) to reclaim the space.

If you observe a dead_tuple_percent anything > 20% you would need to run VACUUM on the table. However, if you observe a free_percent > 50% you would need to run VACUUM FULL on the table.

The below 3 snips depict the pgstattuple function run on a bloated table, vacuumed table and the output after pg_repack run on the table.

Pgstattuple function run on the bloated table.

pgstattuple function run on the table after vacuuming the table.

Note: Here you also see a difference in tuple_count as I have performed some delete statements which are not captured in the document. Hence you see a tuple count difference.

pgstattuple function on after running pg_repack on the table.

Summary of Changes with vacuum and pg_repack run on a bloated table:

dead_tuple_count: Reduced to 0 after VACUUM

dead_tuple_len: Reduced to 0 after VACUUM

dead_tuple_percent: Reduced after VACUUM from 21% to 0

free_space: Increased after VACUUM but significantly reduced after pg_repack

free_percent: Increased after VACUUM but drastically reduced after pg_repack

Similarly, to analyze an index, you can run:

SELECT * FROM pgstatindex(‘your_index_name’);

The function provides the following information about the index.

Column	Description
version	B-tree version number
tree_level	Tree level of the root page
index_size	Total number of pages in index
root_block_no	Location of root block
internal_pages	Number of “internal” (upper-level) pages
leaf_pages	Number of leaf pages
empty_pages	Number of empty pages
deleted_pages	Number of deleted pages
avg_leaf_density	Average density of leaf pages
leaf_fragmentation	Leaf page fragmentation

Low avg_leaf_density implies underutilized pages. It denotes the percentage of good data in index pages. After VACUUM this column value would further go down as cleaning the dead tuples in the indexes reduces the leaf density further pointing us to increase in unused/wasted space. To reclaim the unused/wasted space REINDEX needs to be performed for the index to be performant.

High leaf_fragmentation implies Poor data locality within the pages again a REINDEX would help.

If you see avg_leaf_density anything < 20% you would need perform REINDEX.

The below 3 snips depict the pgstatindex function run on a bloated table index, vacuumed table index and the output after pg_repack run on the table index.

pgstatindex ran on bloated table

pgstatindex run on index after Vacuuming the table

pgstatindex run on index after pg_repack run

Summary of Changes with vacuum and pg_repack run on a bloated index:

index_size: Remained high after VACUUM but significantly reduced after pg_repack as the unused/wasted space is reclaimed

avg_leaf_density: Reduced significantly after VACUUM from 80%-14% depicting a smaller number of good data on the leaf page and increased to 89% after pg_repack

leaf_fragmentation: Remained the same after VACUUM and reduced to 0 after pg_repack depicting no page fragmentation happening

Note:

The index size is a sum of leaf_pages, empty_pages, deleted_pages and internal_pages.

pgstattuple acquires a read lock on the object (table/index). So the results do not reflect an instantaneous snapshot; concurrent updates will affect them.

For more information on pgstatginindex and pgstathasindex functions refer to PostgreSQL documentation here.

For more insights on pgstattuple with respect to TOAST tables, please refer to the relevant documentation here.

3. Addressing the bloat

Once you have identified a bloat, you can address it by taking the following steps. Below are common approaches.

VACUUM: Clears dead tuples without reclaiming the space.

Pg_repack: Performs VACUUM FULL and REINDEX online and efficiently reorganizes the data.

Note: Other unused space, like the one left in heap or index pages due to a configured fill factor lower than 100 or because the remaining available space on a page cannot accommodate a row, given its minimum size, is not considered bloat, while it’s also unused space.

Conclusion

This blog post guided you through understanding and managing bloat in PostgreSQL using the pgstattuple extension. By leveraging this tool, you were able to gain detailed insights into the extent of bloat within their tables and indexes. These insights prove valuable in maintaining efficient storage and ensuring optimal server performance.

Plug, Play, and Prey: The security risks of the Model Context Protocol

Neon Serverless Postgres is Now Generally Available!

Plug, Play, and Prey: The security risks of the Model Context Protocol

Neon Serverless Postgres is Now Generally Available!

Related posts

Understanding the ICL impact [Under the draft, reviewing by the team]

Announcing General Availability of Microsoft Purview SDK and APIs

Deploying macOS FileVault with Microsoft Intune