GraphRAG and PostgreSQL integration in docker with Cypher query and AI agents

Published by azurefeeds on June 5, 2025

Why should I care?

⚡️In under 15 minutes, you’ll have a Cypher-powered, semantically rich knowledge graph you can query interactively.

How can GraphRAG help?

GraphRAG extracts structured knowledge from raw, unstructured data like .txt files by building a knowledge graph. This enables more precise and context-aware retrieval, making it easier to surface relevant insights from messy or disconnected content.

What are the challenges?

The standard GraphRAG indexing process reads its input from, and writes its output to, directories. Some users, however, already store their data in a DB (database) and prefer to run GraphRAG directly against the DB for both input and output. This removes the need for intermediate blob storage and simplifies the pipeline. Customers also often ask for Cypher query support and want to build AI agents that use the rich, structured data GraphRAG provides.

Why PostgreSQL?

PostgreSQL, extended with Apache AGE, supports Cypher queries natively, which makes it a powerful query engine for this kind of graph-enhanced retrieval. That raises the question of how to integrate PostgreSQL with AGE, GraphRAG, and AI agents. That's where our solution comes in.

In this blog, we introduce the solution by covering the following:
1. Use case example
2. Solution summary
3. Implementation
4. How to run the services step by step
5. Query examples, including Cypher query, graphRAG query and vector search
6. AI agent example

Use case example

For example, suppose you have product information scattered across many .txt files. You want to extract structured information from them, such as product name, size, color, and manufacturing date, and then query for products with specific attributes. Such queries may be multi-hop, linking one product to another.
In this solution we use part of Kevin Scott's Behind the Tech podcast dataset (https://www.microsoft.com/en-us/behind-the-tech) as input. The podcast features in-depth conversations with innovators, engineers, and tech leaders, offering insights into the people and ideas shaping the future of technology. The source code, data, and queries are in the GitHub repo linked in this blog.

Solution summary

Our solution packages these capabilities into a single Docker image, using the postgres image as the base and adding the other required modules on top. The architecture is shown below. Adding the AGE extension to PostgreSQL gives the container Cypher query capability. GraphRAG, Python, Jupyter, Semantic Kernel, and other modules are installed as well, so the image has everything needed to run the GraphRAG application and Cypher queries, and users can build AI agents on top. Two volumes are created to persist the PostgreSQL data and the application data.

The services can be implemented as Python scripts or Jupyter notebooks. Some components handle data movement between the DB and the Docker container, either pulling raw data into the pipeline or pushing processed results back into the DB. This setup integrates GraphRAG with PostgreSQL for both indexing and querying without relying on intermediate blob storage.
We chose PostgreSQL with AGE over Neo4j for several reasons: running Neo4j requires a Java Virtual Machine to be installed, the Enterprise Edition requires a license, and Neo4j does not support traditional relational database operations to the same extent as PostgreSQL.

Implementation:

The source code is available in the GitHub repo:
https://Github.com/Azure-Samples/PostgreSQL-graphRAG-docker

The source code folder structure is shown below. The root folder for this project is PostgreSQL-graphRAG-docker.

The following items are needed before building the services:
1. Dockerfile
2. The .env file
3. settings.yaml
4. docker-compose.yaml

Sample files are provided in the GitHub repo.

⚡️The Dockerfile uses PostgreSQL 16 and the corresponding AGE package, because at the time of writing AGE supports PostgreSQL versions up to 16.

Below is an example .env file. MY_DB is the DB that holds the input data and where you plan to store or back up the GraphRAG output. This solution provides two DB examples, Azure SQL database and Azure-hosted PostgreSQL; you can add to them or replace them with your own DB. AGE_HOST is the local PostgreSQL server in Docker, which is used as the query engine.
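To make the role of these variables concrete, here is a minimal sketch, assuming a plain KEY=VALUE .env file, that parses the file and checks that the two variables described above are set. Only MY_DB and AGE_HOST come from this blog; the parsing rules (# comments, optional quotes) are simplifying assumptions, and python-dotenv's dotenv_values() is a more complete alternative.

```python
from pathlib import Path

# Variables described in this blog; other entries in your .env are not checked here.
REQUIRED_KEYS = ("MY_DB", "AGE_HOST")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. AGE_HOST="postgres".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [key for key in required if not env.get(key)]


def load_env(path: str = ".env") -> dict[str, str]:
    """Read and validate the .env file at path (hypothetical helper)."""
    env = parse_env(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(env)
    if missing:
        raise ValueError(f"Missing .env variables: {', '.join(missing)}")
    return env
```

Calling load_env() from the project root either returns the parsed values or names the variables that still need to be filled in.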

All services are defined in docker-compose.yaml. There are seven services:

> docker compose up postgres
> docker compose up load-data
> docker compose up graphrag-index
> docker compose up graphrag-writer
> docker compose up build-graph
> docker compose up query-notebook
> docker compose up reconstruct-graph

Each service in the solution has its own source code, input, and configuration requirements (e.g., .env, settings.yaml). The architecture is designed to be operated with docker compose, so you can run individual services as needed. A key advantage of this setup is that you don't need to rebuild the Docker image every time you update the source code, input, or configuration; changes are picked up dynamically, which streamlines development and iteration.
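The one-shot services are meant to run in the order of the QuickStart steps. The sketch below, an illustration rather than a script from the repo, builds that sequence of docker compose commands and optionally runs them. It assumes postgres is started detached (the -d flag) so the following commands don't block, and it leaves out query-notebook (interactive) and reconstruct-graph (only needed for restores).

```python
import shlex
import subprocess

# Order taken from the QuickStart steps in this blog.
PIPELINE = ("load-data", "graphrag-index", "graphrag-writer", "build-graph")


def plan_commands(services=PIPELINE) -> list[list[str]]:
    """Start postgres in the background, then each one-shot service in order."""
    commands = [["docker", "compose", "up", "-d", "postgres"]]
    commands += [["docker", "compose", "up", service] for service in services]
    return commands


def run_pipeline(dry_run: bool = True) -> list[str]:
    """Print (and unless dry_run, execute) each command; return them as strings."""
    executed = []
    for cmd in plan_commands():
        line = shlex.join(cmd)
        print(line)
        executed.append(line)
        if not dry_run:
            # check=True raises only if the docker compose command itself fails;
            # a service that exits with an error still needs its logs checked.
            subprocess.run(cmd, check=True)
    return executed
```

With the default dry_run=True it only prints the plan; run_pipeline(dry_run=False) from the project root executes it once the image is built.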

QuickStart

For full details, read the README and source code in the GitHub repo together with this blog.
https://Github.com/Azure-Samples/PostgreSQL-graphRAG-docker

Step 0 – insert .txt input into the DB (if not already there)

Run insert-table.py to insert the .txt files into the DB. This script does not run inside Docker; it is an optional step for importing .txt input from a local folder into the DB first.

> python insert-table.py

If your input data is already in the DB, you don't need the folder PostgreSQL-graphRAG-docker/data/input.
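For orientation, here is a minimal sketch of what a script like insert-table.py does, written against the Python DB-API so the same function works with sqlite3, psycopg, or pyodbc connections. The table name input_docs, its two columns, and the placeholder parameter are hypothetical, so match them to the repo's actual schema; use ? for sqlite3 and pyodbc and %s for psycopg. The DDL is PostgreSQL/SQLite syntax; Azure SQL database needs its own DDL (for example NVARCHAR(MAX) instead of TEXT, and no CREATE TABLE IF NOT EXISTS).

```python
from pathlib import Path


def insert_txt_files(conn, folder, table: str = "input_docs", placeholder: str = "?") -> int:
    """Insert every .txt file in folder as one (file_name, content) row."""
    rows = [
        (path.name, path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    ]
    cur = conn.cursor()
    # The table name is interpolated into SQL, so it must come from trusted config.
    cur.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (file_name TEXT PRIMARY KEY, content TEXT)"
    )
    cur.executemany(
        f"INSERT INTO {table} (file_name, content) VALUES ({placeholder}, {placeholder})",
        rows,
    )
    conn.commit()
    return len(rows)
```

Only files ending in .txt are picked up; other files in the folder are ignored.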

Step 1 – build docker image

This step builds the Docker image with all the modules needed to run the services later.

> docker build -t graphrag-img .

⚡️After a few minutes, the image is built and you are ready to run the services.

Step 2 – run postgres service

This step starts the postgres service, which the other services depend on.

> docker compose up postgres

⚡️In the screenshot above, you can see a container called ‘postgres’ being spun up.

Step 3.1 – run ‘load-data’ service

This step loads data from the DB into the Docker folder /app/graphrag-folder/input as GraphRAG input. This solution provides examples of loading from Azure SQL database and Azure PostgreSQL DB.

> docker compose up load-data
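Conceptually, load-data reads rows from the DB and writes them as .txt files that GraphRAG's text input can pick up. The sketch below assumes a hypothetical input_docs table with file_name and content columns and any DB-API connection; the repo's load-data.py may use a different schema and driver-specific code for Azure SQL database or Azure PostgreSQL.

```python
from pathlib import Path

INPUT_DIR = "/app/graphrag-folder/input"  # GraphRAG input folder used in this blog


def export_rows_to_input(conn, out_dir=INPUT_DIR, table: str = "input_docs") -> list[str]:
    """Write each (file_name, content) row to out_dir as a UTF-8 .txt file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    # Hypothetical table/columns; the table name must come from trusted config.
    cur.execute(f"SELECT file_name, content FROM {table} ORDER BY file_name")
    written = []
    for file_name, content in cur.fetchall():
        # Keep only the base name so a stored path cannot escape out_dir, and
        # add .txt so the file matches a *.txt input pattern (assumed in settings.yaml).
        name = Path(file_name).name
        if not name.endswith(".txt"):
            name += ".txt"
        (out / name).write_text(content, encoding="utf-8")
        written.append(name)
    return written
```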

Step 3.2 – check that the data is present in the postgres container

This sub-step checks the Docker folder /app/graphrag-folder/input to verify the GraphRAG input. To do that, log in to the container:

> docker exec -it postgres bash

Step 4.1 – build graphrag index

This step runs ‘graphrag index --root /app/graphrag-folder’ to build the index. It runs the indexing workflow, generates parquet files, GraphML, embeddings, and other related output, and stores them in /app/graphrag-folder/output.

Below is the graphrag-index service defined in the docker-compose.yaml.

The configuration file settings.yaml and the prompts are both mounted into the container, so if you change either of them, you don't need to rebuild the image.

How to run:

> docker compose up graphrag-index

⚡️A new container called ‘graphrag-index-app’ is spun up, as defined in docker-compose.yaml.

Step 4.2 – check that the index output is present in the postgres container

This sub-step verifies the ‘graphrag index’ output in /app/graphrag-folder/output. To do that, log in to the container:

> docker exec -it postgres bash
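Instead of listing the folder by hand, you can check for the files you expect with a few lines of Python inside the container. The expected file names below are an assumption: they follow recent GraphRAG releases, while older releases prefix them with create_final_, so adjust the list to the GraphRAG version in your image.

```python
from pathlib import Path

OUTPUT_DIR = "/app/graphrag-folder/output"  # index output folder used in this blog

# Assumed file names; older GraphRAG versions use a "create_final_" prefix.
EXPECTED_OUTPUTS = (
    "entities.parquet",
    "relationships.parquet",
    "communities.parquet",
    "community_reports.parquet",
    "text_units.parquet",
)


def missing_outputs(output_dir=OUTPUT_DIR, expected=EXPECTED_OUTPUTS) -> list[str]:
    """Return the expected output files that are not present in output_dir."""
    folder = Path(output_dir)
    if not folder.is_dir():
        return list(expected)  # nothing has been written yet
    present = {path.name for path in folder.iterdir()}
    return [name for name in expected if name not in present]
```

An empty list from missing_outputs() means every expected file is there.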

Step 5 – write index output to DB

This step stores the index output in the DB as a backup of the Docker data.
The content of /app/graphrag-folder/output is saved to the DB. This solution provides examples of writing the output to Azure SQL database and Azure PostgreSQL DB.

> docker compose up graphrag-writer
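A minimal sketch of this backup step, assuming each parquet file has already been read into a list of row dictionaries (for example with pandas.read_parquet(path).to_dict("records")): it creates one table per file and serializes list-valued cells, such as embeddings or ID lists, to JSON text. Storing every column as TEXT is a simplification, and the function names are hypothetical; the repo's write-to-db.py may map types differently.

```python
import json


def to_db_value(value):
    """Convert a cell to something every DB driver accepts."""
    if hasattr(value, "tolist"):  # numpy arrays/scalars coming from pandas
        value = value.tolist()
    if isinstance(value, (list, tuple, dict)):
        return json.dumps(value)  # e.g. embeddings or text_unit_ids lists
    return value


def write_table(conn, table: str, rows: list[dict], placeholder: str = "?") -> int:
    """Create `table` with one TEXT column per key and insert all rows."""
    if not rows:
        return 0
    columns = list(rows[0])
    quoted = [f'"{column}"' for column in columns]
    cur = conn.cursor()
    # Identifiers are interpolated, so table and column names must be trusted.
    cur.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ({", ".join(q + " TEXT" for q in quoted)})'
    )
    cur.executemany(
        f'INSERT INTO "{table}" ({", ".join(quoted)}) '
        f'VALUES ({", ".join([placeholder] * len(columns))})',
        [tuple(to_db_value(row.get(column)) for column in columns) for row in rows],
    )
    conn.commit()
    return len(rows)
```

Keeping list columns as JSON text means a restore step can turn them back into lists with json.loads.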

Step 6 – build graph, prepare for Cypher query

This step builds a graph with AGE on the PostgreSQL server in Docker, in preparation for Cypher queries.

> docker compose up build-graph

The graph content is stored in the PostgreSQL ag_catalog in Docker, under the name graphRAG. To check that it is present, log in to the postgres container:

> docker exec -it postgres bash

The image below shows the relationship counts: the undirected pattern MATCH ()--() returns 1,380 matches, while the directed pattern MATCH ()-->() returns 690. An undirected pattern matches each edge once from each end, so the graph extracted during GraphRAG indexing contains 690 relationships; the 1,380 figure counts each of them twice rather than indicating extra edges.
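To illustrate what building the graph involves, here is a sketch that generates the AGE SQL for a tiny graph, plus the two count queries discussed above. AGE runs Cypher through its cypher() SQL function after LOAD 'age' and a search_path that includes ag_catalog. The labels Entity and RELATED, the title property, and the helper names are hypothetical and not necessarily what the repo's build-graph service uses.

```python
GRAPH = "graphRAG"  # graph name used in this blog

# Session setup AGE needs before cypher() can be called.
SETUP = ["LOAD 'age';", 'SET search_path = ag_catalog, "$user", public;']


def cypher_literal(value: str) -> str:
    """Quote a string for use inside a Cypher query body."""
    if "$$" in value:
        raise ValueError("value would end the $$-quoted Cypher body early")
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def cypher_sql(query: str, columns: str = "v agtype", graph: str = GRAPH) -> str:
    """Wrap a Cypher query in AGE's cypher() SQL function."""
    return f"SELECT * FROM cypher('{graph}', $$ {query} $$) AS ({columns});"


def create_statements(entities, relationships) -> list[str]:
    """SQL that creates the graph, one node per entity and one edge per pair."""
    statements = [f"SELECT create_graph('{GRAPH}');"]
    for title in entities:
        statements.append(cypher_sql(f"CREATE (:Entity {{title: {cypher_literal(title)}}})"))
    for source, target in relationships:
        statements.append(cypher_sql(
            f"MATCH (a:Entity {{title: {cypher_literal(source)}}}), "
            f"(b:Entity {{title: {cypher_literal(target)}}}) "
            "CREATE (a)-[:RELATED]->(b)"
        ))
    return statements


# A directed pattern matches each edge once; an undirected one matches it from
# both ends, so (self-loops aside) its count is twice as large (690 vs 1,380).
COUNT_DIRECTED = cypher_sql("MATCH ()-[r]->() RETURN count(r)", "cnt agtype")
COUNT_UNDIRECTED = cypher_sql("MATCH ()-[r]-() RETURN count(r)", "cnt agtype")
```

The statements can be executed in order (SETUP first) with any PostgreSQL driver such as psycopg; create_graph raises an error if the graph already exists, so it should run only once.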

Step 7 – run query in Jupyter notebook

This step runs a Jupyter notebook server in Docker.

> docker compose up query-notebook

After clicking the link highlighted in the screenshot above, you can browse all the project files in the container and open query-notebook.ipynb. The notebook kernel is set up automatically because all the required software is in the Docker image.

Then enjoy coding: try Cypher queries, GraphRAG queries, vector search, AI agents, and more in the notebook!

Query examples

query-notebook.ipynb is here:
https://Github.com/Azure-Samples/PostgreSQL-graphRAG-docker/blob/main/query-notebook.ipynb
1. Understand the Nodes and Edges/Relationships in the graph

query-notebook.ipynb provides examples of checking the nodes and relationships in the graph. Cypher queries require exact property names, which are case-sensitive.

The solution also provides a method to visualize the graph as an HTML file.

2. Multi-hop Cypher query

One advantage of graphs is multi-hop querying. With efficient graph traversal, the graph can return richer information by linking entities and documents in ways that regular queries can't.

One multi-hop query used in the example is “Who did KEVIN SCOTT mention that leads a company helping kids learn computer science?”
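To show what "multi-hop" means mechanically, here is a pure-Python sketch over a handful of invented (source, relation, target) triples. The entity names, relation names, and descriptions are hypothetical toy data, not the podcast graph, and the Cypher in the docstring is only an approximation of the kind of pattern the notebook's query expresses; its actual labels and property names may differ.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Directed adjacency list: source -> [(relation, target), ...]."""
    adjacency = defaultdict(list)
    for source, relation, target in triples:
        adjacency[source].append((relation, target))
    return adjacency


def follow(adjacency, start, relations):
    """Return every path from start whose edges carry `relations` in order.

    Roughly what a Cypher pattern such as
      MATCH (k {title: 'KEVIN SCOTT'})-[:MENTIONED]->(p)-[:LEADS]->(c)
    asks the graph engine to do: one hop per relation in the pattern.
    """
    paths = [[start]]
    for wanted in relations:
        paths = [
            path + [target]
            for path in paths
            for relation, target in adjacency.get(path[-1], [])
            if relation == wanted and target not in path  # avoid cycles
        ]
    return paths


# Hypothetical toy data.
TRIPLES = [
    ("KEVIN SCOTT", "MENTIONED", "PERSON A"),
    ("KEVIN SCOTT", "MENTIONED", "PERSON B"),
    ("PERSON A", "LEADS", "COMPANY X"),
    ("PERSON B", "LEADS", "COMPANY Y"),
]
DESCRIPTIONS = {
    "COMPANY X": "nonprofit helping kids learn computer science",
    "COMPANY Y": "cloud storage vendor",
}


def who_leads_cs_education(start="KEVIN SCOTT"):
    """Two hops from start, then filter on the last node like a WHERE clause."""
    paths = follow(build_adjacency(TRIPLES), start, ["MENTIONED", "LEADS"])
    return [p[1] for p in paths if "computer science" in DESCRIPTIONS.get(p[-1], "")]
```

No single triple answers the question; the answer only appears after joining two hops and filtering on the far end, which is what a graph engine does natively.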

3. GraphRAG LocalSearch and GlobalSearch

This solution can also run regular GraphRAG queries, either through the Python API or the CLI, because both the Python and PowerShell modules are included in the Docker image.

The examples in query-notebook.ipynb include variations with include_community_rank turned on or off and with different community_prop values. They show that including community information produces a richer response.
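The two knobs mentioned here belong to GraphRAG's local search context builder. The sketch below generates parameter variants for such a comparison. The key names follow the local_context_params dictionary used in GraphRAG's local search example notebooks, but names and defaults can change between GraphRAG versions, so treat them as assumptions and check them against the version in the image. In that context builder, community_prop and text_unit_prop are shares of the max_tokens context budget, so their sum must not exceed 1.

```python
from itertools import product

# Assumed keys/values modeled on GraphRAG's local search example notebooks.
BASE_CONTEXT_PARAMS = {
    "text_unit_prop": 0.5,
    "community_prop": 0.1,
    "top_k_mapped_entities": 10,
    "top_k_relationships": 10,
    "include_entity_rank": True,
    "include_relationship_weight": True,
    "include_community_rank": False,
    "max_tokens": 12_000,
}


def context_variants(community_props=(0.0, 0.1, 0.3), community_ranks=(False, True)):
    """Return one params dict per (community_prop, include_community_rank) pair."""
    variants = []
    for prop, rank in product(community_props, community_ranks):
        params = {**BASE_CONTEXT_PARAMS, "community_prop": prop, "include_community_rank": rank}
        # Community reports and text units share the max_tokens budget.
        if params["community_prop"] + params["text_unit_prop"] > 1:
            raise ValueError(f"community_prop={prop} + text_unit_prop exceeds 1")
        variants.append(params)
    return variants
```

Running the same question once per variant, with each dict handed to the local search context builder, gives a side-by-side view of how much the community information changes the answer.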

You can re-run the same question that was originally written as a Cypher query, this time using the GraphRAG approach.

4. Vector search for comparison

This solution also provides vector search, since GraphRAG indexing generates output that includes parquet files, embeddings, GraphML, and so on. Vector search is another way of querying.

In the example query, the vector search returns relatively low cosine similarity scores. Vector search works over the embeddings alone and knows nothing about the relationships between entities, so its results are limited to the original text: it typically returns the top-k most similar original chunks. In other words, vector search is retrieval only; it returns what was said, not what it means. GraphRAG, by contrast, combines vector search with a knowledge graph that encodes the relationships between entities. The graph is built from LLM-extracted triples, which supports contextual reasoning, so GraphRAG provides more synthesized and thoughtful responses.
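The retrieval-only behavior described here is easy to see in a bare-bones version of vector search: score each chunk embedding against the query embedding by cosine similarity and keep the top k. This pure-Python sketch uses toy two-dimensional vectors; real embeddings come from the embedding model configured in settings.yaml (the query must be embedded with the same model), and a vector store would normally do the ranking. Nothing in it looks at relationships between entities.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vector, chunks, k: int = 3):
    """chunks: iterable of (text, embedding). Returns [(score, text)] best first."""
    scored = [(cosine(query_vector, embedding), text) for text, embedding in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

For example, top_k([1.0, 0.0], [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])], k=2) ranks "a" (score 1.0) ahead of "c" (about 0.707): the chunks come back verbatim, ranked only by geometric closeness.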

Graph content storage, backup and restore

In this solution, the graph data (nodes and edges) is stored in the PostgreSQL server in Docker and managed by AGE through ag_catalog. To verify the graph content, log in to the postgres container:

> docker exec -it postgres bash

The graph input and output are stored in the Docker folders /app/graphrag-folder/input and /app/graphrag-folder/output, and they are also backed up in the DB. In this example, we experimented with Azure SQL database and Azure-hosted PostgreSQL as the DB.

Log in to the DB to see the content:
• For using Azure PostgreSQL as DB
> psql -h xxxxx.database.azure.com -p 5432 -U username -d postgres

• For using Azure SQL database as DB
Open Azure Data Studio to view the tables:

Graph data in the Docker containers survives restarts. Step 5 shows how to back up the GraphRAG output to your DB. If the Docker folders are ever lost, you can use the ‘reconstruct-graph’ service to pull the data stored in the DB back into Docker. This solution provides examples of running ‘reconstruct-graph’ from Azure SQL database and Azure PostgreSQL DB.

The command is:
> docker compose up reconstruct-graph

To verify the restored content, log in to the postgres container:

> docker exec -it postgres bash

AI agent services

https://Github.com/Azure-Samples/PostgreSQL-graphRAG-docker/blob/main/query-notebook.ipynb

This solution adds AI agent, Semantic Kernel, and Azure OpenAI modules to the Docker image, so users can build and run AI agent services in a Jupyter notebook. Step 10 of the provided notebook builds an AI agent that summarizes the podcasts' content.
This is just one example; many other agents can be built.

Summary:

This solution packages PostgreSQL, graphRAG, AGE, and AI agent capabilities into a single, lightweight Docker image. All necessary components are pre-built, making the image highly self-sufficient and easy to deploy. It offers a streamlined way for users to build AI applications by supporting Cypher queries, graphRAG reasoning, vector search, and AI agent services—all within one image. The setup is minimal, and the system delivers low-latency query performance across services.

What’s next?

This solution establishes a solid foundation for building AI-powered applications. There are several clear paths for enhancement and expansion:
• Plugin extensibility: The Docker image includes a dedicated /app/plugins directory, designed to support future integrations such as Semantic Kernel plugins (https://Github.com/microsoft/semantic-kernel) for orchestration and reasoning.
• AI agent development: Expand the system with additional AI agents tailored to specific tasks, domains, or workflows; multi-agent systems are a growing trend.
• Scaling options: Azure Container Apps is a good direction.
• More DB types for the GraphRAG input: So far, this solution provides examples with Azure SQL database and Azure-hosted PostgreSQL. If you use another type of DB, you may need to add the relevant Python modules to the Dockerfile and requirements.txt, and adjust .env, insert-table.py, load-data.py, write-to-db.py, and reconstruct-graph.py.

Please check out the GitHub repo and try out the solution.
https://Github.com/Azure-Samples/PostgreSQL-graphRAG-docker/

References:

Earlier articles and blogs on PostgreSQL and/or GraphRAG:

https://www.microsoft.com/en-us/behind-the-tech
Introducing the GraphRAG Solution for Azure Database for PostgreSQL | Microsoft Community Hub
Introducing graph database support in Azure Database for PostgreSQL
https://Github.com/Azure-Samples/graphrag-legalcases-postgres
https://learn.microsoft.com/en-us/azure/PostgreSQL/flexible-server/quickstart-create-server?tabs=portal-create-flexible%2Cportal-get-connection%2Cportal-delete-resources

Acknowledgement:

Thanks to Abe Omorogbe and Maxim Lukiyanov for reviewing this blog.
