Plug, Play, and Prey: The security risks of the Model Context Protocol
May 8, 2025Neon Serverless Postgres is Now Generally Available!
May 8, 2025Bloat refers to the unused space within database objects like tables and indexes, caused by accumulated dead tuples that have not been reclaimed by the storage engine. This often results from frequent updates, deletions, or insertions, leading to inefficient storage and performance issues. Addressing bloat is crucial for maintaining optimal database performance, as it can significantly impact storage efficiency, increase I/O operations, reduce cache efficiency, prolong vacuum times, and slow down index scans.
In this blog post, I will walk you through how to use the pgstattuple extension in PostgreSQL to analyze and understand the physical storage of your database objects. By leveraging pgstattuple, you can identify and quantify the unused space within your tables and indexes. We will guide you through analyzing bloat, interpreting the results, and addressing the bloat to optimize your database and improve its performance.
I will be using the pg_repack extension as an alternative to VACUUM FULL and REINDEX. pg_repack is a PostgreSQL extension that removes bloat from tables and indexes and reorganizes them more efficiently. pg_repack works by creating a new copy of the target table or index, applying any changes that occurred during the process, and then swapping the old and new versions atomically. pg_repack doesn’t require any downtime or exclusive access locks on the processed table or index, except for a brief period at the beginning and at the end of the operation. Performing a full table repack requires free disk space about twice as large as the target table(s) and its indexes. For example, if the total size of the tables and indexes to be reorganized is 1GB, an additional 2GB of disk space is required. For pg_repack to run successfully on a table you must have either a PRIMARY KEY or a UNIQUE index on a NOT NULL column.
Let us dive in and see how you can make the most of this powerful tool.
To get more details on the bloat on an Azure Database for PostgreSQL flexible server, you can follow these steps:
1. Installing pgstattuple
Add pgstattuple to the azure.extensions server parameter.
You must install the extension on the database in which you want to analyze the bloat. To do so, connect to the database of interest and run the following command:
CREATE EXTENSION pgstattuple;
2. Analyze the table/index
Once the extension is installed, you can use the pgstattuple function to gather the detailed statistics to analyze the bloat on test table. The function provides information such as the number of live tuples, dead tuples, and the percentage of bloat, free space within the pages.
To showcase the pgstattuple extension features, I have used a 4 Vcore SKU with PG version 16, created a test table with an index, loaded it with 24 Gb data and kept on generating bloat by performing some update/delete commands on the table leading to bloat.
Creating a test table using the below script.
CREATE TABLE test_table (
id bigserial PRIMARY KEY,
column1 text,
column2 text,
column3 text,
column4 text,
column5 text,
column6 text,
column7 text,
column8 text,
column9 text,
column10 text);
Loading the table with 24 GB data using the below script.
INSERT INTO test_table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10)
SELECT
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text),
md5(random()::text)
FROM generate_series(1, 22000000);
Create an index on this table to depict the usage of pgstatindex function to analyze bloat on an index.
CREATE INDEX idx_column1 ON test_table (column1);
Run functions pgstattuple on the above table and pgstatindex on the above index without bloat.
pgstattuple on unbloated table
You can observe the dead_tuple_percent is 0 and free_percent is 4.3 referring to the healthy state of a table.
pgstatindex on unbloated index
You can observe the avg_leaf_density is 89.07% and leaf_fragementation is 8.1 referring to healthy state of the index.
Using the command below, I am generating bloat on the table.
UPDATE test_table
SET column1 = md5(random()::text), column2 = md5(random()::text), column3 = md5(random()::text), column4 = md5(random()::text), column5 = md5(random()::text), column6 = md5(random()::text)
WHERE id % 5 = 0;
You can analyze a table using the below query:
SELECT * FROM pgstattuple(‘your_table_name’);
- What it does: It performs a full scan to gather detailed statistics
- Performance: On large tables the performance is slower and might take some seconds to minutes depending on the table size due to the full table scans
- Use case: To diagnose bloat or planning vacuuming strategies like (performing VACUUM/VACUUM FULL)
To achieve faster estimates you can use,
SELECT * FROM pgstattuple_approx(‘your_table_name’);
- What it does: Uses sampling to estimate statistics about the table.
- Accuracy: Results are close but not exact
- Performance: On large tables the performance is faster as it only considers a sample (subset of pages).
- Use case: quick insights
The function provides the following information about the table.
Column |
Description |
table_len |
Table length in bytes (tuple_len+dead_tuple_len+frees_space) and the overhead accounts for the padding (for tuple alignment) and page header (for per page table pointers) |
tuple_count |
Number of live tuples |
tuple_len |
Length of live tuples in bytes |
tuple_percent |
Percentage of live tuples |
dead_tuple_count |
Number of dead tuples |
dead_tuple_len |
Length of dead tuples in bytes |
dead_tuple_percent |
Percentage of dead tuples |
free_space |
Total free space in bytes within the pages |
free_percent |
Percentage of free space within the pages |
You should mainly concentrate on the below 3 columns to understand the table bloat and the unused space.
- dead_tuple_percent column tells us the percentage of dead tuples in the table. It is calculated as below.
dead_tuple_percent = dead_tuple_len / table_len * 100
This can be reduced by running VACUUM on the table. However, VACUUM does not reclaim the space. Hence, free_space and free_percent would increase after the VACUUM.
- free_space and free_percent depict the unused/wasted space within the pages. The space can be reclaimed only by performing a VACUUM FULL on the table. If you see high free_percent it depicts the table needs VACUUM FULL (here instead you could use pg_repack) to reclaim the space.
If you observe a dead_tuple_percent anything > 20% you would need to run VACUUM on the table. However, if you observe a free_percent > 50% you would need to run VACUUM FULL on the table.
The below 3 snips depict the pgstattuple function run on a bloated table, vacuumed table and the output after pg_repack run on the table.
Pgstattuple function run on the bloated table.
pgstattuple function run on the table after vacuuming the table.
Note: Here you also see a difference in tuple_count as I have performed some delete statements which are not captured in the document. Hence you see a tuple count difference.
pgstattuple function on after running pg_repack on the table.
Summary of Changes with vacuum and pg_repack run on a bloated table:
- dead_tuple_count: Reduced to 0 after VACUUM
- dead_tuple_len: Reduced to 0 after VACUUM
- dead_tuple_percent: Reduced after VACUUM from 21% to 0
- free_space: Increased after VACUUM but significantly reduced after pg_repack
- free_percent: Increased after VACUUM but drastically reduced after pg_repack
Similarly, to analyze an index, you can run:
SELECT * FROM pgstatindex(‘your_index_name’);
The function provides the following information about the index.
Column |
Description |
version |
B-tree version number |
tree_level |
Tree level of the root page |
index_size |
Total number of pages in index |
root_block_no |
Location of root block |
internal_pages |
Number of “internal” (upper-level) pages |
leaf_pages |
Number of leaf pages |
empty_pages |
Number of empty pages |
deleted_pages |
Number of deleted pages |
avg_leaf_density |
Average density of leaf pages |
leaf_fragmentation |
Leaf page fragmentation |
- Low avg_leaf_density implies underutilized pages. It denotes the percentage of good data in index pages. After VACUUM this column value would further go down as cleaning the dead tuples in the indexes reduces the leaf density further pointing us to increase in unused/wasted space. To reclaim the unused/wasted space REINDEX needs to be performed for the index to be performant.
- High leaf_fragmentation implies Poor data locality within the pages again a REINDEX would help.
If you see avg_leaf_density anything < 20% you would need perform REINDEX.
The below 3 snips depict the pgstatindex function run on a bloated table index, vacuumed table index and the output after pg_repack run on the table index.
pgstatindex ran on bloated table
pgstatindex run on index after Vacuuming the table
pgstatindex run on index after pg_repack run
Summary of Changes with vacuum and pg_repack run on a bloated index:
- index_size: Remained high after VACUUM but significantly reduced after pg_repack as the unused/wasted space is reclaimed
- avg_leaf_density: Reduced significantly after VACUUM from 80%-14% depicting a smaller number of good data on the leaf page and increased to 89% after pg_repack
- leaf_fragmentation: Remained the same after VACUUM and reduced to 0 after pg_repack depicting no page fragmentation happening
Note:
- The index size is a sum of leaf_pages, empty_pages, deleted_pages and internal_pages.
- pgstattuple acquires a read lock on the object (table/index). So the results do not reflect an instantaneous snapshot; concurrent updates will affect them.
- For more information on pgstatginindex and pgstathasindex functions refer to PostgreSQL documentation here.
- For more insights on pgstattuple with respect to TOAST tables, please refer to the relevant documentation here.
3. Addressing the bloat
Once you have identified a bloat, you can address it by taking the following steps. Below are common approaches.
- VACUUM: Clears dead tuples without reclaiming the space.
- Pg_repack: Performs VACUUM FULL and REINDEX online and efficiently reorganizes the data.
Note: Other unused space, like the one left in heap or index pages due to a configured fill factor lower than 100 or because the remaining available space on a page cannot accommodate a row, given its minimum size, is not considered bloat, while it’s also unused space.
Conclusion
This blog post guided you through understanding and managing bloat in PostgreSQL using the pgstattuple extension. By leveraging this tool, you were able to gain detailed insights into the extent of bloat within their tables and indexes. These insights prove valuable in maintaining efficient storage and ensuring optimal server performance.