In April 2025, the White House Office of Science and Technology Policy released over 10,000 public comments in response to its AI Action Plan. These comments, ranging from a few words to over 40,000, offer a rare and powerful snapshot of how Americans feel about the future of artificial intelligence.
But how do you make sense of 4.5 million words of diverse, opinionated, unstructured text? That’s where Gen AI comes in.
This blog series is for data scientists, software developers, and government officials—anyone looking to use AI not just for insight, but for efficiency. Whether you’re analyzing public feedback, internal reports, or customer input, the ability to turn massive volumes of text into actionable insight—fast—is a game-changer.
In the first post of this series, we explored how Gen AI can help us listen at scale—transforming over 10,000 public comments on the White House AI Action Plan into structured, scored summaries using LLMs and Azure Functions. I built a scalable pipeline that converted PDFs to markdown, summarized responses into JSON, and scored them across sentiment, argument quality, and passion.
Now, in Part 2, we move from listening to connecting. This post is about how I constructed and used a knowledge graph—powered by Microsoft’s GraphRAG research—to surface patterns, link ideas, and uncover deeper insights from the data.
Case study and prereleased product disclaimers: This document is for informational purposes only. Some information relates to pre-released product which may be substantially modified before it’s commercially released. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY AND WITH RESPECT TO THE INFORMATION PROVIDED HERE.
Microsoft GraphRAG
Standard RAG (Retrieval-Augmented Generation) excels at retrieving facts but often misses the big picture. It struggles to synthesize insights across documents or understand large text collections holistically.
Microsoft’s GraphRAG solves this by using LLMs and machine learning graph analysis to build a knowledge graph from your data. This graph acts as a map, helping the LLM connect concepts and answer complex questions more effectively than snippet-based search.
With GraphRAG, you can ask broad questions like “What themes emerge across these reports?” and get synthesized, evidence-backed answers. It’s especially powerful for narrative datasets, such as comments, logs, or 10,000 public responses to the proposed AI Action Plan, where understanding trends matters as much as retrieving facts.
A quick comparison of GraphRAG and LazyGraphRAG: one builds a full knowledge graph, the other delivers similar insights with lower cost and setup.
Our latest innovation is LazyGraphRAG, a leaner, more efficient fork of GraphRAG. While full GraphRAG builds and summarizes the entire knowledge graph upfront, LazyGraphRAG combines vector and graph search on the fly and defers heavy LLM analysis until query time. This hybrid strategy delivers GraphRAG-level insights at a fraction of the cost, making it practical even under tight budget or latency constraints.
Note: LazyGraphRAG is still experimental and remains an internal-only fork of our GraphRAG research project. You can achieve similar results today with the open-source GraphRAG library to explore hidden connections and turn massive text into actionable knowledge; I suggest using fast indexing and DRIFT search.
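Both of those pieces are exposed through the library’s CLI in recent releases, e.g. `graphrag index --method fast` for the lighter indexing pass and `graphrag query --method drift` for DRIFT search; flag names can shift between versions, so check the documentation for your installed release.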
Step 1: (More) Data Prep
To prepare the data for GraphRAG, I needed to perform one final preprocessing step. GraphRAG constructs the knowledge graph from unstructured text and thus expects a single text field per document it ingests. Since GraphRAG can embed metadata directly into the graph, I split each JSON document from Blog Post 1 into two parts: the core text used to build the graph, and the metadata I wanted embedded in the graph’s nodes and edges. The table below lists the structured fields I created in Blog Post 1 and how I mapped them for indexing by GraphRAG.
Field | Description | Mapped as |
---|---|---|
responseTitle | Short descriptive title < 8 words. | Metadata |
sentiment | Positive, Negative, Mixed, Neutral. | Metadata |
summary | 2–4 sentence overview of the response. | Core text |
stance | 1–2 sentences describing the stance expressed in the response. | Core text |
topics | List of specific subject matter or domains. | Metadata |
policyCategories | List of policy actions or approaches. | Metadata |
keyRecommendations | List of actionable suggestions with names and descriptions. | Core text |
keyRisks | List of concerns or warnings with names and descriptions. | Core text |
briefExcerpt | 1–2 key sentences quoted from the response. | Core text |
stakeholderInfo | Type, domain or market, and anonymity. | Metadata |
stakeholders | List of response authors including companies and individuals. | Both |
argumentQuality | Rating from 1 to 5 on quality of argument. | Metadata |
factualAccuracy | Rating from 1 to 5 on accuracy of presented facts. | Metadata |
passionScore | Rating from 1 to 5 on emotion of response. | Metadata |
humanReview | Yes/No for interesting or concerning responses. | Metadata |
To make the dataset GraphRAG-ready, I wrote a script that transformed each JSON document into an array of JSON documents, one for each text-based field, and replicated the relevant metadata across each entry so it could be embedded in the graph. This approach allowed each key insight, whether a summary, recommendation, or risk, to become its own cluster of nodes in the graph, improving granularity and enabling more precise traversal and retrieval. Here’s a brief before-and-after example.
Before
{
  "fileName": "AI-RFI-2025-ABCD.pdf",
  "responseTitle": "Concern Over Lack of AI Copyright Protections",
  "sentiment": "Negative",
  "summary": "The submission expresses strong concern that …",
  "stance": "The response strongly opposes any AI Action Plan because …",
  "topics": [ "Copyright", "Labor", "National Security" ],
  "policyCategories": [ "Regulation", "Security" ],
  "keyRecommendations": [
    [ "AI Regulation", "Require AI companies to obtain permission …" ],
    [ "Copyright Protection", "Ensure AI systems and their developers respect …" ]
  ],
  "keyRisks": [
    [ "Job Loss", "AI deployment without regulation could put …" ],
    [ "Cultural Erosion", "Allowing companies to use creative …" ]
  ],
  "briefExcerpt": "\"By giving companies the ability to …\"",
  "stakeholderInfo": { "type": "Individual", "domain": "Tech", "isAnonymous": false },
  "stakeholders": [ "John Doe" ],
  "argumentQuality": { "score": 3, "rationale": "The argument is clear …" },
  "factualAccuracy": { "score": 2, "rationale": "The claims about copyright theft …" },
  "passionScore": { "score": 4, "rationale": "The language is strongly …" },
  "humanReview": { "requiresHumanReview": true, "rationale": "The response …" }
}
After
[
  { # example summary
    "text": "AI-RFI-2025-ABCD.pdf expresses strong concern that …",
    "title": "Concern Over Lack of AI Copyright Protections",
    "metadata": {
      "filename": "AI-RFI-2025-ABCD.pdf",
      "sentiment": "Negative",
      "topics": [ "Copyright", "Labor", "National Security" ],
      "policyCategories": [ "Regulation", "Security" ],
      "stakeholderInfo": { "type": "Individual", "domain": "Tech", "isAnonymous": false },
      "stakeholders": [ "John Doe" ],
      "argumentQuality": { "score": 3 },
      "factualAccuracy": { "score": 2 },
      "passionScore": { "score": 4 },
      "humanReview": { "requiresHumanReview": true }
    }
  },
  { # example recommendation
    "text": "AI-RFI-2025-ABCD.pdf recommends AI Regulation. Require AI companies …",
    "title": "Concern Over Lack of AI Copyright Protections",
    "metadata": {
      "filename": "AI-RFI-2025-ABCD.pdf",
      "sentiment": "Negative",
      "topics": [ "Copyright", "Labor", "National Security" ],
      "policyCategories": [ "Regulation", "Security" ],
      "stakeholderInfo": { "type": "Individual", "domain": "Tech", "isAnonymous": false },
      "stakeholders": [ "John Doe" ],
      "argumentQuality": { "score": 3 },
      "factualAccuracy": { "score": 2 },
      "passionScore": { "score": 4 },
      "humanReview": { "requiresHumanReview": true }
    }
  },
  { # example risk
    "text": "AI-RFI-2025-ABCD.pdf sees a risk of Job Loss. AI deployment without …",
    "title": "Concern Over Lack of AI Copyright Protections",
    "metadata": {
      "filename": "AI-RFI-2025-ABCD.pdf",
      "sentiment": "Negative",
      "topics": [ "Copyright", "Labor", "National Security" ],
      "policyCategories": [ "Regulation", "Security" ],
      "stakeholderInfo": { "type": "Individual", "domain": "Tech", "isAnonymous": false },
      "stakeholders": [ "John Doe" ],
      "argumentQuality": { "score": 3 },
      "factualAccuracy": { "score": 2 },
      "passionScore": { "score": 4 },
      "humanReview": { "requiresHumanReview": true }
    }
  }, …
]
This structure allowed GraphRAG to treat the text as the primary source for entity and relationship extraction, while the metadata remained available for filtering, faceting, and graph enrichment.
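For concreteness, here’s a minimal sketch of the kind of splitting script I used. The field names come from the table above, but the paths are illustrative and the generated phrasing is simplified (the real script rewrote each text a little more carefully):

import json
from pathlib import Path

# Fields replicated across every entry (see the mapping table above).
METADATA_FIELDS = ["sentiment", "topics", "policyCategories", "stakeholderInfo", "stakeholders"]
SCORE_FIELDS = ["argumentQuality", "factualAccuracy", "passionScore", "humanReview"]

def split_document(doc: dict) -> list[dict]:
    """Split one scored JSON response into an array of GraphRAG-ready entries."""
    metadata = {"filename": doc["fileName"]}
    metadata.update({f: doc[f] for f in METADATA_FIELDS if f in doc})
    # Keep only the scores and flags; the rationales stay out of the graph.
    metadata.update({f: {k: v for k, v in doc[f].items() if k != "rationale"}
                     for f in SCORE_FIELDS if f in doc})

    def entry(text: str) -> dict:
        return {"text": text, "title": doc["responseTitle"], "metadata": metadata}

    # One entry per core-text field: summary, stance, excerpt, recommendations, risks.
    entries = [
        entry(f'{doc["fileName"]} {doc["summary"]}'),
        entry(f'{doc["fileName"]} {doc["stance"]}'),
        entry(f'{doc["fileName"]} quotes: {doc["briefExcerpt"]}'),
    ]
    for name, description in doc.get("keyRecommendations", []):
        entries.append(entry(f'{doc["fileName"]} recommends {name}. {description}'))
    for name, description in doc.get("keyRisks", []):
        entries.append(entry(f'{doc["fileName"]} sees a risk of {name}. {description}'))
    return entries

# Transform every scored response from Blog Post 1 into GraphRAG input files.
out_dir = Path("graphrag_input")
out_dir.mkdir(exist_ok=True)
for path in Path("scored_responses").glob("*.json"):
    doc = json.loads(path.read_text())
    (out_dir / path.name).write_text(json.dumps(split_document(doc), indent=2))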
Note: GraphRAG is fully capable of constructing a knowledge graph from the raw response text without any of the preprocessing from Blog Post 1. However, that graph would not be enriched with the additional metadata that helps during knowledge graph traversal, retrieval, and, ultimately, answer generation.
Step 2: Building a metadata-aware knowledge graph
With the data prepped and split into GraphRAG-ready JSON documents, the next step was configuring the indexer to ingest them properly. GraphRAG uses a YAML configuration file to define how it reads and processes input data. For this project, I needed to tell GraphRAG how to: (1) locate the JSON files, (2) identify the title and text fields, (3) embed metadata, and (4) handle chunking behavior. Here’s an excerpt from my updated configuration file:
input:
  file_type: json
  file_pattern: ".*\\.json$$"
  title_column: title
  text_column: text
  metadata: [metadata]
  # additional input configurations… type, base_dir, etc.

chunks:
  prepend_metadata: true
  chunk_size_includes_metadata: false
  # additional chunk configurations… size, overlap, etc.
Within the input block, I told GraphRAG to look for JSON documents and how to map their fields. Within the chunks block, the settings ensure the metadata is prepended to each chunk without inflating the chunk-size calculation.
With this configuration in place, GraphRAG was able to ingest each JSON entry as a standalone document, extract entities and relationships from the text field, and embed the associated metadata directly into the graph. This setup ensured that every node and edge carried not just semantic meaning, but also contextual signals—like sentiment, stakeholder type, and policy relevance—that would later power more nuanced queries and visualizations.
Now that the indexer was configured, I could run it to construct the graph and explore what it had built.
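With the open-source library, that run is a single CLI invocation from the project root, assuming the configuration above lives in settings.yaml there: `graphrag index --root .`. Indexing a corpus this size consumes real time and LLM budget, so it’s worth a dry run on a small sample of documents first.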
Step 3: Generating Global Insights
Once the knowledge graph was constructed, I turned my attention to the real goal: surfacing insights that would be impossible to extract through traditional search or summarization alone.
The final graph contained over 123,000 nodes, 3.1 million edges, and 16,720 clusters—a dense web of ideas, arguments, and concerns. To make sense of this complexity, I used GraphRAG’s built-in query capabilities.
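As a quick sanity check on those numbers, you can inspect the indexer’s parquet outputs directly. Here is a minimal sketch with pandas, assuming a recent GraphRAG release (output file names have varied across versions; older releases prefix them with create_final_):

import pandas as pd

# GraphRAG writes its graph tables to the output folder as parquet files.
entities = pd.read_parquet("output/entities.parquet")             # graph nodes
relationships = pd.read_parquet("output/relationships.parquet")   # graph edges
communities = pd.read_parquet("output/communities.parquet")       # Leiden clusters

print(f"{len(entities):,} nodes, {len(relationships):,} edges, {len(communities):,} clusters")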
How it works
GraphRAG applies the Leiden clustering algorithm to group related ideas based on node connectivity and edge weights. These clusters represent emergent themes—like privacy concerns, regulatory suggestions, or ethical dilemmas—surfaced not by keyword frequency, but by conceptual proximity.
To explore the knowledge graph for response augmentation, GraphRAG uses a query-expansion approach, generally described in this blog post. Starting with a broader prompt like:
“What are the key trends across individual respondents?”
GraphRAG generated five subqueries to probe the graph from different angles, captured in the table below.
Subquery | Prompt |
---|---|
Subquery 1 | What are the recurring themes, concerns, and priorities expressed by individual respondents regarding AI, including ethical, economic, and societal impacts? |
Subquery 2 | What regulatory challenges, opportunities, and specific topics such as privacy and intellectual property were most frequently discussed by individual respondents? |
Subquery 3 | What trends can be observed in the responses based on the demographic, socioeconomic, and professional backgrounds of individual respondents? |
Subquery 4 | What actionable suggestions and recommendations were commonly proposed by individual respondents, including those related to education, workforce development, and environmental sustainability? |
Subquery 5 | What trends in tone, sentiment, and passion can be observed across individual responses to the AI Action Plan? |
Each subquery acted as a lens, identifying entry points into the graph—nodes and clusters most relevant to that line of inquiry. From there, GraphRAG traversed the graph, pulling in related nodes and their associated text chunks. This process was guided by iterative LLM-based relevance checks, ensuring that only the most contextually meaningful content was included in response generation.
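To ask this kind of global question of your own index, the open-source CLI exposes the query methods directly, e.g. `graphrag query --root . --method global --query "What are the key trends across individual respondents?"`; swap in `--method drift` for DRIFT search. As above, the exact flags depend on your installed release.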
What I found
The insights generated from the graph are not fixed—they evolve based on the questions you ask. GraphRAG can surface different patterns depending on the context of your inquiry, so these results are just an example from one line of questioning.
Beginning with “What are the key trends across individual respondents?”, GraphRAG synthesized a nuanced response from over 1,200 text chunks across 800+ documents and 90 ‘idea’ clusters. Representative findings for individual respondents included:
- A strong emphasis on privacy and data protection, particularly among individual respondents with technical or legal backgrounds.
- Recurring concern about AI-driven job displacement, with varying tones depending on respondent domain.
- A surprising number of grassroots policy suggestions related to education reform, environmental sustainability, and AI transparency.
- Distinct patterns in sentiment (generally negative) and passion (high), with emotionally charged responses clustering around topics like surveillance, misinformation, and creative rights.
These examples reflect just one line of inquiry. The real strength of GraphRAG lies in its adaptability: by adjusting your query, you can explore entirely different dimensions of the dataset, whether you’re interested in regulatory gaps, stakeholder-specific concerns, or sectoral and economic impact.
A snapshot of the graph in action
The accompanying image shows a subset of the knowledge graph used to answer the “key trends” query. Each circle represents a community cluster, with sub-communities nested down to the fourth level. Dots are individual ideas (nodes), and the gray lines are the relationships that connect them, used both for clustering and for traversing the graph during response generation.
This visual illustrates how GraphRAG doesn’t just retrieve relevant snippets—it maps the conceptual terrain of the dataset, allowing us to explore it with nuance and depth.
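If you’d like to produce a similar visualization yourself, the open-source library can snapshot the graph for external tools: setting `graphml: true` under the `snapshots` section of settings.yaml writes a GraphML file you can load into a graph viewer such as Gephi.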
With a knowledge graph in place that can surface these insights, the next step is making this capability available to a wider audience through tools in the applications they use every day.
What’s Next: Creating an AI Action Plan Copilot
Across Blog Posts 1 and 2, we’ve taken 10,000+ unstructured public comments, extracted meaningful metadata from them, and created a rich knowledge graph capable of surfacing deep insights across the entire data set.
And while I find it fascinating to get lost in the data and spelunk through the graph, this capability needs to reach a broader audience to be truly useful.
In my final Blog Post, I’ll show how to bring these insights to life for end users—embedding them into Microsoft 365 Copilot as a custom agent that empowers policy analysts to explore, query, and act on public feedback in real time.
Key Takeaways
- Transformed structured JSON into a metadata-rich knowledge graph using GraphRAG.
- Surfaced emergent themes and sentiment patterns across 10,000+ public comments.
- Demonstrated how query expansion and graph traversal yield deeper policy insights.