
Graph RAG

The Query That Flat Vector Search Cannot Answer

An analyst at a pharmaceutical research company asked their RAG system a question that seemed perfectly reasonable: "What are the major research themes across all our clinical trial documentation?"

The system retrieved five chunks. They happened to be about five different trials, each discussing different mechanisms. The LLM synthesized them into a generic statement about the breadth of their research program. It was technically grounded in the retrieved text. It was also completely useless.

The analyst needed to understand the global structure of a 2,000-document corpus - the clusters of related research, the connections between trials, the overarching themes that emerged from the full body of work. Standard RAG retrieves locally relevant passages. It cannot answer questions about the global structure of an entire corpus without retrieving the entire corpus (impossible in a context window).

This is the problem that Microsoft Research's Graph RAG paper (Edge et al., 2024) identified and addressed. When your questions are fundamentally about relationships, themes, and structure rather than specific facts, you need a retrieval system that understands the relational structure of your corpus - not just the semantic proximity of individual passages.

Standard RAG stores documents as independent chunks and retrieves them based on pairwise similarity to a query. This is excellent for "find the most similar passage to this question." It is structurally incapable of:

  1. Global summarization: "What are the main themes across all our documents?" - Requires understanding the whole corpus.

  2. Multi-hop reasoning: "Which researchers collaborated with Alice, and what topics did they publish on?" - Requires traversing entity relationships.

  3. Relationship queries: "What are all the drugs that interact with compound X?" - Requires linking every drug whose documentation mentions compound X and then combining their interaction profiles.

  4. Community-level questions: "What controversies surround topic Y across the literature?" - Requires understanding the discourse structure, not just finding passages about Y.

Vector search is a proximity search in semantic space. It cannot follow graph edges.
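To make the multi-hop case concrete, here is a minimal sketch of the two-hop question from item 2 answered by following edges. The toy triples, relation names, and helper functions are all hypothetical and exist only for illustration.

```python
# Hypothetical toy graph: (source, relation, target) triples.
TRIPLES = [
    ("Alice", "COLLABORATED_WITH", "Bob"),
    ("Alice", "COLLABORATED_WITH", "Carol"),
    ("Bob", "PUBLISHED_ON", "sparse attention"),
    ("Carol", "PUBLISHED_ON", "protein folding"),
    ("Dave", "PUBLISHED_ON", "sparse attention"),
]


def neighbors(node, relation):
    """Return the targets reachable from `node` over edges of type `relation`."""
    return [t for s, r, t in TRIPLES if s == node and r == relation]


def collaborator_topics(person):
    """Two-hop query: person -> collaborators -> topics they published on."""
    return {
        collaborator: neighbors(collaborator, "PUBLISHED_ON")
        for collaborator in neighbors(person, "COLLABORATED_WITH")
    }


print(collaborator_topics("Alice"))
```

Dave publishes on the same topic as Bob but is never returned, because no edge connects him to Alice. A similarity search over passages has no equivalent way to enforce that constraint.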

Historical Context: Knowledge Graphs in NLP

Knowledge graphs - structured representations of entities and relationships - have been central to NLP since the 2000s. Freebase (2007), DBpedia (2007), Wikidata (2012), and Google's Knowledge Graph (2012) demonstrated that structured entity-relationship databases enabled more precise semantic queries than unstructured text search.

The challenge: existing knowledge graphs were hand-curated or Wikipedia-derived. Building domain-specific knowledge graphs for proprietary corpora - clinical trials, legal filings, codebase documentation - was labor-intensive and brittle.

The 2024 Graph RAG paper from Microsoft Research (Edge, Trinh, Cheng et al.) combined LLM-based automatic entity/relationship extraction with existing knowledge graph querying infrastructure. That combination made it far more practical to build a knowledge graph automatically from arbitrary documents instead of curating one by hand.

The Graph RAG Pipeline

Step 1: Entity Extraction

Use an LLM to extract entities and relationships from each document chunk. Entities have types (Person, Organization, Concept, Location, Event), names, and descriptions. Relationships have types, source/target entities, and descriptions.

from openai import OpenAI
import json
from typing import List, Dict

client = OpenAI()

def extract_entities_and_relations(text: str, model: str = "gpt-4o-mini") -> Dict:
    """Extract entities and relationships from a text chunk."""
    prompt = f"""Extract all entities and relationships from the following text.

Text:
{text}

Respond with JSON:
{{
  "entities": [
    {{
      "name": "entity name",
      "type": "PERSON | ORGANIZATION | CONCEPT | LOCATION | EVENT | OTHER",
      "description": "brief description of this entity in context"
    }}
  ],
  "relationships": [
    {{
      "source": "entity name",
      "target": "entity name",
      "type": "WORKS_AT | AUTHORED | RELATED_TO | PART_OF | CAUSES | etc.",
      "description": "brief description of this relationship"
    }}
  ]
}}"""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        response_format={"type": "json_object"},  # ask for a JSON object back
    )
    return json.loads(response.choices[0].message.content)


# Process a document
text = """
Alice Johnson at MIT published a paper on transformer attention mechanisms in 2022.
The paper built on the work of Vaswani et al. (2017) and introduced sparse attention
as a more efficient alternative. This work was later cited by Google DeepMind's
team in their Gemini architecture paper.
"""

extraction = extract_entities_and_relations(text)
print("Entities:")
for e in extraction["entities"]:
    print(f"  [{e['type']}] {e['name']}: {e['description']}")

print("\nRelationships:")
for r in extraction["relationships"]:
    print(f"  {r['source']} --[{r['type']}]--> {r['target']}")
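JSON mode makes the response parseable, but it does not guarantee that every entity has a name or that relationships point at extracted entities. A minimal defensive sketch follows; `clean_extraction`, `ALLOWED_TYPES`, and the fallback values `OTHER` and `RELATED_TO` are assumptions for illustration, not part of any library.

```python
from typing import Any, Dict, List

# Entity types offered in the extraction prompt; anything else is coerced to OTHER.
ALLOWED_TYPES = {"PERSON", "ORGANIZATION", "CONCEPT", "LOCATION", "EVENT", "OTHER"}


def _as_list(value: Any) -> List[Any]:
    """Treat anything that is not a list as an empty list."""
    return value if isinstance(value, list) else []


def clean_extraction(raw: Dict[str, Any]) -> Dict[str, List[Dict[str, str]]]:
    """Keep well-formed entities, plus relationships whose endpoints both survive."""
    entities = []
    for ent in _as_list(raw.get("entities")):
        if not isinstance(ent, dict):
            continue
        name = str(ent.get("name") or "").strip()
        if not name:
            continue  # an entity without a name cannot become a graph node
        ent_type = str(ent.get("type") or "").strip().upper()
        if ent_type not in ALLOWED_TYPES:
            ent_type = "OTHER"
        entities.append({
            "name": name,
            "type": ent_type,
            "description": str(ent.get("description") or "").strip(),
        })

    known = {e["name"] for e in entities}
    relationships = []
    for rel in _as_list(raw.get("relationships")):
        if not isinstance(rel, dict):
            continue
        src = str(rel.get("source") or "").strip()
        tgt = str(rel.get("target") or "").strip()
        if src in known and tgt in known:
            relationships.append({
                "source": src,
                "target": tgt,
                "type": str(rel.get("type") or "RELATED_TO").strip().upper(),
                "description": str(rel.get("description") or "").strip(),
            })
    return {"entities": entities, "relationships": relationships}
```

Running each extraction through a filter like this before graph construction keeps a single malformed response from raising a `KeyError` halfway through an indexing run.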

Step 2: Build the Knowledge Graph

Aggregate entity and relationship extractions across all document chunks into a unified knowledge graph. Entities may appear in multiple chunks, so merge them by name. The implementation here uses exact matching on the whitespace-stripped name; a production system would add alias and coreference resolution on top.

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Entity:
    name: str
    type: str
    descriptions: List[str] = field(default_factory=list)
    source_chunks: List[str] = field(default_factory=list)

@dataclass
class Relationship:
    source: str
    target: str
    rel_type: str
    descriptions: List[str] = field(default_factory=list)

class KnowledgeGraph:
    def __init__(self):
        self.entities: Dict[str, Entity] = {}
        self.relationships: List[Relationship] = []
        self.adjacency: Dict[str, Set[str]] = defaultdict(set)

    def add_extraction(self, extraction: Dict, source_chunk_id: str):
        """Add entities and relationships from one extraction."""
        for ent_data in extraction.get("entities", []):
            name = ent_data["name"].strip()
            if name not in self.entities:
                self.entities[name] = Entity(name=name, type=ent_data["type"])
            self.entities[name].descriptions.append(ent_data.get("description", ""))
            self.entities[name].source_chunks.append(source_chunk_id)

        for rel_data in extraction.get("relationships", []):
            src = rel_data["source"].strip()
            tgt = rel_data["target"].strip()
            if src not in self.entities or tgt not in self.entities:
                continue  # Skip relationships with unknown entities

            # Find existing relationship or create new
            existing = next(
                (r for r in self.relationships
                 if r.source == src and r.target == tgt and r.rel_type == rel_data["type"]),
                None,
            )
            if existing:
                existing.descriptions.append(rel_data.get("description", ""))
            else:
                self.relationships.append(Relationship(
                    source=src,
                    target=tgt,
                    rel_type=rel_data["type"],
                    descriptions=[rel_data.get("description", "")],
                ))

            self.adjacency[src].add(tgt)
            self.adjacency[tgt].add(src)  # undirected for community detection

    def stats(self):
        print(f"Entities: {len(self.entities)}")
        print(f"Relationships: {len(self.relationships)}")
        if self.adjacency:  # avoid dividing by zero on a graph with no edges
            avg_degree = sum(len(v) for v in self.adjacency.values()) / len(self.adjacency)
            print(f"Avg entity degree: {avg_degree:.1f}")


# Build graph from corpus
kg = KnowledgeGraph()

# Process each chunk
chunks = [
    {"id": "chunk_1", "text": "Alice Johnson at MIT published on transformer attention..."},
    {"id": "chunk_2", "text": "Google DeepMind cited Alice's work in Gemini architecture..."},
]

for chunk in chunks:
    extraction = extract_entities_and_relations(chunk["text"])
    kg.add_extraction(extraction, chunk["id"])

kg.stats()

Step 3: Community Detection with the Leiden Algorithm

The Leiden algorithm (Traag et al., 2019, an improvement on Louvain) detects communities - clusters of densely connected entities - in the knowledge graph. These communities represent the major topics, research areas, or organizational structures in your document corpus.

# Using the graspologic library (Microsoft's graph analysis library)
# pip install graspologic
import networkx as nx
from graspologic.partition import leiden

def detect_communities(kg: KnowledgeGraph) -> Dict[str, int]:
    """
    Run Leiden community detection on the knowledge graph.
    Returns: {entity_name: community_id}
    """
    # Build an undirected NetworkX graph from entities and relationships
    G = nx.Graph()
    for entity_name in kg.entities:
        G.add_node(entity_name)
    for rel in kg.relationships:
        G.add_edge(rel.source, rel.target)

    # leiden() accepts the NetworkX graph directly and
    # returns a dict: {node: community_id}
    partition = leiden(G, resolution=1.0)
    return partition


# Alternative using python-louvain (simpler, widely available)
# pip install python-louvain
import community as community_louvain

def detect_communities_louvain(kg: KnowledgeGraph) -> Dict[str, int]:
    """Community detection using Louvain algorithm."""
    G = nx.Graph()
    for entity_name in kg.entities:
        G.add_node(entity_name)
    for rel in kg.relationships:
        # Weight edges by number of relationship mentions
        existing = G.get_edge_data(rel.source, rel.target, default={"weight": 0})
        G.add_edge(rel.source, rel.target, weight=existing["weight"] + 1)

    partition = community_louvain.best_partition(G, resolution=1.0)
    return partition


# Group entities by community
def get_community_members(partition: Dict[str, int]) -> Dict[int, List[str]]:
    communities = defaultdict(list)
    for entity, community_id in partition.items():
        communities[community_id].append(entity)
    return dict(communities)

Step 4: Community Summary Generation

For each detected community, generate a natural language summary using an LLM. These summaries become the retrievable units for global queries.

def generate_community_summary(
    community_entities: List[str],
    kg: KnowledgeGraph,
    model: str = "gpt-4o-mini",
) -> str:
    """Generate a summary for a community of related entities."""
    # Collect entity descriptions and relationships for this community
    entity_info = []
    for name in community_entities:
        if name in kg.entities:
            ent = kg.entities[name]
            desc = " ".join(ent.descriptions[:3])  # First 3 descriptions
            entity_info.append(f"- {name} ({ent.type}): {desc}")

    # Find relationships within this community
    community_set = set(community_entities)
    internal_rels = [
        r for r in kg.relationships
        if r.source in community_set and r.target in community_set
    ]
    rel_info = [
        f"- {r.source} --[{r.rel_type}]--> {r.target}: {r.descriptions[0] if r.descriptions else ''}"
        for r in internal_rels[:10]
    ]

    prompt = f"""Summarize the following community of related entities from a document corpus.
Explain what this group is about, the key themes, and the important relationships.

Entities:
{chr(10).join(entity_info)}

Key relationships:
{chr(10).join(rel_info)}

Write a concise paragraph (3-5 sentences) summarizing what this community represents."""

    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


def build_community_summaries(
    kg: KnowledgeGraph,
    partition: Dict[str, int],
) -> Dict[int, str]:
    """Generate summaries for all communities."""
    community_members = get_community_members(partition)
    summaries = {}

    for community_id, members in community_members.items():
        if len(members) < 2:  # Skip singleton communities
            continue
        summary = generate_community_summary(members, kg)
        summaries[community_id] = summary
        print(f"Community {community_id} ({len(members)} entities): {summary[:100]}...")

    return summaries

Graph RAG provides two retrieval modes:

Local search: Entity-centric. Given a query, find the relevant entity nodes, retrieve their descriptions, related entities (1-2 hops), and the source document chunks. Best for specific factual questions about known entities.

Global search: Community-centric. Retrieve community summaries, synthesize across them. Best for corpus-wide thematic questions.

def local_graph_search(
    query: str,
    kg: KnowledgeGraph,
    vector_store,  # placeholder interface for semantic entity lookup
    n_hops: int = 2,
    top_k: int = 5,
) -> str:
    """Local search: find relevant entities and traverse the graph."""
    # Find relevant entities via semantic search on entity descriptions
    entity_docs = [
        f"{name}: {' '.join(ent.descriptions[:2])}"
        for name, ent in kg.entities.items()
    ]

    # Retrieve most relevant entities.
    # vector_store.search(query, corpus=..., top_k=...) is a placeholder API
    # that is assumed to return the matching strings from `corpus`.
    relevant_entity_texts = vector_store.search(
        query,
        corpus=entity_docs,  # search against entity descriptions
        top_k=top_k,
    )

    # Extract entity names from results (split only on the first colon)
    seed_entities = [text.split(":", 1)[0] for text in relevant_entity_texts]

    # Expand via graph traversal (BFS up to n_hops)
    expanded = set(seed_entities)
    frontier = set(seed_entities)
    for hop in range(n_hops):
        next_frontier = set()
        for entity in frontier:
            neighbors = kg.adjacency.get(entity, set())
            for neighbor in neighbors:
                if neighbor not in expanded:
                    next_frontier.add(neighbor)
                    expanded.add(neighbor)
        frontier = next_frontier
        if not frontier:
            break

    # Collect entity context
    context_parts = []
    for entity_name in expanded:
        if entity_name in kg.entities:
            ent = kg.entities[entity_name]
            context_parts.append(f"[{ent.type}] {entity_name}: {' '.join(ent.descriptions[:2])}")

    # Collect relevant relationships (capped at 20 to bound context size)
    rel_context = [
        f"{r.source} --[{r.rel_type}]--> {r.target}"
        for r in kg.relationships
        if r.source in expanded and r.target in expanded
    ][:20]

    return "\n".join(context_parts + ["", "Relationships:"] + rel_context)


def global_graph_search(
    query: str,
    community_summaries: Dict[int, str],
    model: str = "gpt-4o",
) -> str:
    """Global search: synthesize across community summaries."""
    # Use all community summaries as context
    summaries_text = "\n\n".join([
        f"Community {cid}: {summary}"
        for cid, summary in community_summaries.items()
    ])

    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": (
                    "Answer the question using the provided community summaries from a document corpus. "
                    "Identify themes, patterns, and global insights across all communities."
                ),
            },
            {
                "role": "user",
                "content": f"Community Summaries:\n{summaries_text}\n\nQuestion: {query}",
            },
        ],
    )
    return response.choices[0].message.content
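The global search above places every community summary in a single prompt, which stops working once the summaries outgrow the context window. One common workaround is a map-reduce pass: answer the question per batch of summaries, then combine the partial answers. The sketch below assumes a hypothetical `ask` callable that wraps whatever LLM call you use, and it budgets by characters rather than tokens to stay dependency-free.

```python
from typing import Callable, Dict, List


def batch_summaries(summaries: Dict[int, str], max_chars: int) -> List[List[str]]:
    """Greedily pack summaries into batches of at most max_chars characters.

    A single summary longer than max_chars still gets a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for cid, summary in summaries.items():
        entry = f"Community {cid}: {summary}"
        if current and used + len(entry) > max_chars:
            batches.append(current)
            current, used = [], 0
        current.append(entry)
        used += len(entry)
    if current:
        batches.append(current)
    return batches


def map_reduce_answer(
    query: str,
    summaries: Dict[int, str],
    ask: Callable[[str], str],  # hypothetical: prompt in, completion out
    max_chars: int = 8000,
) -> str:
    """Map: answer per batch. Reduce: merge the partial answers."""
    partials = [
        ask("Community Summaries:\n" + "\n\n".join(batch) + f"\n\nQuestion: {query}")
        for batch in batch_summaries(summaries, max_chars)
    ]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]  # everything fit in one batch, no reduce step needed
    return ask(
        "Partial answers:\n" + "\n\n".join(partials)
        + f"\n\nCombine these into a single answer to: {query}"
    )
```

The trade-off is one extra LLM call per batch, so `max_chars` should be set as high as the model's context window comfortably allows.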

LlamaIndex Knowledge Graph Implementation

LlamaIndex provides a higher-level API for Graph RAG:

from llama_index.core import (
    KnowledgeGraphIndex,
    Settings,
    SimpleDirectoryReader,
    StorageContext,
)
from llama_index.core.graph_stores import SimpleGraphStore
from llama_index.llms.openai import OpenAI as LlamaOpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure LlamaIndex
Settings.llm = LlamaOpenAI(model="gpt-4o-mini", temperature=0)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Load documents
documents = SimpleDirectoryReader("./docs").load_data()

# Build knowledge graph index; the graph store is attached via a StorageContext
graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=storage_context,
    max_triplets_per_chunk=10,  # entity-relation triplets per chunk
    include_embeddings=True,    # also build vector embeddings for hybrid retrieval
    show_progress=True,
)

# Persist the index
kg_index.storage_context.persist(persist_dir="./kg_storage")

# Query with local context
local_query_engine = kg_index.as_query_engine(
    include_text=True,        # include source document text
    retriever_mode="hybrid",  # use both graph and vector retrieval
    response_mode="tree_summarize",
    embedding_mode="hybrid",
    similarity_top_k=3,
)

response = local_query_engine.query("Who collaborated with Alice Johnson on attention mechanisms?")
print(response)

Neo4j for Graph Storage

For production deployments with large graphs (100K+ entities), Neo4j provides purpose-built graph database infrastructure:

import re
from typing import List

from neo4j import GraphDatabase

class Neo4jGraphStore:
    def __init__(self, uri: str, username: str, password: str):
        self.driver = GraphDatabase.driver(uri, auth=(username, password))

    def close(self):
        self.driver.close()

    def create_entity(self, name: str, entity_type: str, description: str):
        with self.driver.session() as session:
            session.run(
                "MERGE (e:Entity {name: $name}) "
                "SET e.type = $type, e.description = $description",
                name=name, type=entity_type, description=description,
            )

    def create_relationship(self, source: str, target: str, rel_type: str, description: str):
        # Relationship types cannot be passed as Cypher parameters, so the
        # value is validated before being interpolated into the query text.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel_type):
            raise ValueError(f"Invalid relationship type: {rel_type!r}")
        with self.driver.session() as session:
            session.run(
                "MATCH (s:Entity {name: $src}), (t:Entity {name: $tgt}) "
                f"MERGE (s)-[r:{rel_type}]->(t) "
                "SET r.description = $desc",
                src=source, tgt=target, desc=description,
            )

    def query_entity_neighbors(self, entity_name: str, hops: int = 2) -> List[dict]:
        """Get all entities within N hops of a given entity."""
        hops = int(hops)  # path-length bounds cannot be parameterized either
        with self.driver.session() as session:
            result = session.run(
                f"MATCH path = (start:Entity {{name: $name}})-[*1..{hops}]-(neighbor) "
                "RETURN neighbor.name AS name, neighbor.type AS type, "
                "       neighbor.description AS description, "
                "       min(length(path)) AS distance "  # one row per neighbor
                "ORDER BY distance",
                name=entity_name,
            )
            return [dict(record) for record in result]

    def query_relationship_path(self, from_entity: str, to_entity: str) -> List[dict]:
        """Find shortest path between two entities."""
        with self.driver.session() as session:
            # Parameter names in the query must match the keyword arguments
            result = session.run(
                "MATCH path = shortestPath("
                "(a:Entity {name: $from_entity})-[*]-(b:Entity {name: $to_entity})) "
                "RETURN [node IN nodes(path) | node.name] AS path_nodes, length(path) AS hops",
                from_entity=from_entity, to_entity=to_entity,
            )
            return [dict(record) for record in result]


# Usage
store = Neo4jGraphStore("bolt://localhost:7687", "neo4j", "password")
store.create_entity("Alice Johnson", "PERSON", "ML researcher at MIT")
store.create_entity("MIT", "ORGANIZATION", "Massachusetts Institute of Technology")
store.create_relationship("Alice Johnson", "MIT", "AFFILIATED_WITH", "Alice is a professor at MIT")

# Find all entities within 2 hops of Alice
neighbors = store.query_entity_neighbors("Alice Johnson", hops=2)
for n in neighbors:
    print(f"Distance {n['distance']}: [{n['type']}] {n['name']}")

store.close()

When Graph RAG Beats Standard RAG

| Query Type | Standard RAG | Graph RAG |
| --- | --- | --- |
| "What is the refund policy?" | Excellent | Unnecessary overhead |
| "Who collaborated with Alice?" | Poor | Excellent |
| "What are the main themes in our corpus?" | Cannot answer | Excellent |
| "What drugs interact with compound X?" | Poor | Excellent |
| "Summarize the debate about topic Y" | Poor | Good |
| "What changed between v2 and v3?" | Poor | Good |
| "Give me the definition of X" | Excellent | Unnecessary |

Choose Graph RAG when:

  • Questions are about entity relationships, not facts
  • Questions are about global corpus structure ("themes," "patterns," "common threads")
  • Multi-hop reasoning is needed ("who works with X, and what do they work on?")
  • The corpus has rich entity structure (research papers, legal documents, news)

Stick with standard RAG when:

  • Questions are specific factual lookups
  • The corpus is conversational (support tickets, chat logs)
  • Indexing cost is a constraint (Graph RAG is roughly 10-25x more expensive to index)
  • Latency is critical (graph traversal adds latency)

The Cost Problem: Graph RAG Is Expensive

The graph construction phase requires many LLM calls:

  • One extraction call per chunk: at 1000 chunks, that's 1000 LLM calls just for entity extraction
  • Community summary generation: one call per community (typically 50-200 communities)
  • Global queries require retrieving and synthesizing many community summaries

Cost estimation for indexing 1000 chunks:

  • Entity/relation extraction: 1000 calls × 1000 tokens each × $0.15/1M tokens = $0.15
  • Community summaries (100 communities): 100 × 500 tokens × $0.15/1M tokens = $0.0075
  • Total indexing: ~$0.20 for gpt-4o-mini

Compare to standard RAG indexing 1000 chunks:

  • Embedding only: 1000 × 400 tokens × $0.02/1M tokens = $0.008

Graph RAG is ~25x more expensive to index than standard RAG. For a 1M-document corpus, that's thousands of dollars in indexing cost vs tens of dollars for standard RAG.
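The arithmetic behind these estimates is easy to rerun with your own chunk counts and prices. The helper below is a hypothetical sketch that counts input tokens only, so its Graph RAG total lands slightly under the rounded ~$0.20 figure; the per-token prices are the ones quoted in this section and should be replaced with current rates.

```python
def llm_cost(calls: int, tokens_per_call: int, usd_per_million_tokens: float) -> float:
    """Input-token cost in USD for `calls` requests of `tokens_per_call` tokens each."""
    return calls * tokens_per_call * usd_per_million_tokens / 1_000_000


# Prices quoted in this section (assumptions, not live pricing)
CHAT_PRICE = 0.15   # USD per 1M input tokens, gpt-4o-mini
EMBED_PRICE = 0.02  # USD per 1M tokens, embedding model

extraction = llm_cost(calls=1000, tokens_per_call=1000, usd_per_million_tokens=CHAT_PRICE)
summaries = llm_cost(calls=100, tokens_per_call=500, usd_per_million_tokens=CHAT_PRICE)
embedding = llm_cost(calls=1000, tokens_per_call=400, usd_per_million_tokens=EMBED_PRICE)

graph_total = extraction + summaries
print(f"Graph RAG indexing (input tokens only): ${graph_total:.4f}")
print(f"Standard RAG indexing:                  ${embedding:.4f}")
print(f"Ratio: {graph_total / embedding:.1f}x")
```

Because every term is linear in the number of chunks, the ratio stays the same at any corpus size; only tokens per call and the price gap between the chat model and the embedding model move it.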

The trade-off is clear: Only use Graph RAG when you have structural queries that flat vector search genuinely cannot answer.

Production Engineering Notes

Incremental graph updates: Adding new documents to a knowledge graph is expensive - you need to extract entities, resolve coreferences against existing entities, and potentially update community structures. Plan for batch updates (weekly or monthly) rather than real-time indexing if your corpus updates frequently.

Entity resolution: The same entity may appear under different names ("Alice Johnson," "A. Johnson," "Johnson, A."). Without coreference resolution, you'll get duplicate nodes. At minimum, do lowercase normalization and fuzzy matching for entity merging. For production, use an LLM-based entity resolution step that checks whether two entity mentions refer to the same entity.
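A minimal sketch of the normalization and fuzzy-matching tiers, using only the standard library. The suffix list, the 0.9 threshold, and the function names are illustrative assumptions; pairs this step leaves unmerged are the candidates for an LLM-based check.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "ltd", "llc", "corp"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop a trailing corporate suffix."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    if len(tokens) > 1 and tokens[-1] in SUFFIXES:
        tokens = tokens[:-1]
    return " ".join(tokens)


def merge_aliases(names: List[str], threshold: float = 0.9) -> Dict[str, str]:
    """Map each raw name to a canonical name; the first-seen spelling wins."""
    canonical: Dict[str, str] = {}  # normalized key -> canonical raw name
    mapping: Dict[str, str] = {}
    for name in names:
        key = normalize(name)
        match = canonical.get(key)  # tier 1: exact match after normalization
        if match is None:
            # Tier 2: fuzzy match against keys seen so far (O(n^2), fine for a sketch)
            for seen_key, seen_name in canonical.items():
                if SequenceMatcher(None, key, seen_key).ratio() >= threshold:
                    match = seen_name
                    break
        if match is None:
            canonical[key] = name
            match = name
        mapping[name] = match
    return mapping


names = ["OpenAI", "OpenAI Inc.", "Open AI", "openai", "Alice Johnson", "A. Johnson"]
for raw, canon in merge_aliases(names).items():
    print(f"{raw!r} -> {canon!r}")
```

String similarity collapses the OpenAI variants but leaves "A. Johnson" separate from "Alice Johnson", which is the kind of case that needs context to decide.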

Graph size limits: For LlamaIndex's SimpleGraphStore, the graph must fit in memory. For production with 100K+ entities, use Neo4j or a dedicated graph database. Neo4j Community Edition is free and handles graphs up to millions of nodes.

Microsoft's GraphRAG library: The graphrag Python package from Microsoft implements the full pipeline described in the paper, including the Leiden algorithm, prompt templates, and both local/global search. It's a production-ready starting point: pip install graphrag.

Common Mistakes

:::danger Using Graph RAG for Simple Factual Queries

Graph RAG adds 10-25x indexing cost and 2-5x query latency compared to standard RAG. For a knowledge base that serves mostly direct factual questions ("what is the refund window?"), this overhead has zero benefit. Benchmark your query distribution before committing to Graph RAG. If fewer than 20% of queries are relational or global in nature, the cost isn't justified.

:::
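Measuring the query distribution can be as simple as labeling a sample of logged queries and computing the share that is relational or global. The sketch below assumes the labels already exist (assigned by hand or by a classifier); the label names and the function are hypothetical, and the 20% cutoff is the rule of thumb stated above.

```python
from collections import Counter
from typing import Iterable

# Query labels that would be routed to the graph system.
GRAPH_TYPES = {"relational", "global"}


def graph_query_share(labels: Iterable[str]) -> float:
    """Fraction of labeled queries that are relational or global."""
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[t] for t in GRAPH_TYPES) / total


# Hypothetical labeled sample of 20 logged queries
sample = ["factual"] * 17 + ["relational"] * 2 + ["global"]
share = graph_query_share(sample)
print(f"{share:.0%} of sampled queries need graph retrieval")
if share < 0.20:
    print("Below the 20% rule of thumb: standard RAG alone is likely enough.")
```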

:::warning Entity Resolution Failure Creates Duplicate Nodes

Without coreference resolution, "OpenAI," "OpenAI Inc.," and "Open AI" become three separate nodes in the graph with no connections. Multi-hop queries that should traverse through this entity fail silently. Always implement entity normalization: lowercase, common abbreviation expansion, and LLM-based duplicate detection for entity names that appear similar. This is the most common quality issue in production knowledge graph deployments.

:::

:::warning Not Building a Hybrid System

Graph RAG excels at relational and global queries. Standard RAG excels at specific factual queries. The best production system uses both: a router that classifies incoming queries and routes global/relational queries to the graph system and factual queries to the standard vector store. Build this hybrid from the start rather than committing to pure Graph RAG.

:::

Interview Questions and Answers

Q: Explain the Microsoft Graph RAG approach. Why does it need community detection?

A: Microsoft's Graph RAG (Edge et al., 2024) addresses a fundamental limitation of flat vector search: it cannot answer questions about the global structure of a document corpus. The pipeline: (1) Extract entities and relationships from all document chunks using an LLM; (2) Build a knowledge graph where entities are nodes and relationships are edges; (3) Run the Leiden community detection algorithm to identify clusters of densely connected entities - these represent the major topics or themes in the corpus; (4) Generate natural language summaries for each community; (5) At query time, use community summaries to answer global questions ("what are the main themes?") or use graph traversal for entity-relationship questions. Community detection is essential because it compresses the relational structure of potentially thousands of entities into a manageable set of summaries, enabling global queries that don't require reading the entire corpus.

Q: What is the Leiden algorithm and why is it used for community detection in knowledge graphs?

A: The Leiden algorithm (Traag, Waltman, van Eck, 2019) is a community detection algorithm that partitions a graph into communities - sets of nodes more densely connected to each other than to the rest of the graph. It's an improvement on the earlier Louvain algorithm, fixing a known Louvain flaw in which communities could become badly connected or even internally disconnected. Leiden iterates over three phases: local node reassignment (moving each node to the community that most improves modularity), community refinement (splitting poorly connected communities), and aggregation of the refined communities into a coarser graph on which the process repeats. It runs efficiently on graphs with thousands to millions of nodes. In Graph RAG, Leiden groups related entities into communities representing coherent topics - "clinical trial researchers," "regulatory agencies," "drug interaction studies" - enabling community-level summaries that can answer global corpus queries.
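Modularity, the quantity being maximized, is simple enough to compute by hand for a small graph. The sketch below is a plain-Python illustration for an unweighted, undirected graph, not the Leiden algorithm itself. It uses the standard form Q = Σ_c [ L_c / m − (d_c / 2m)² ], where L_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges.

```python
from typing import Dict, List, Tuple


def modularity(edges: List[Tuple[str, str]], partition: Dict[str, int]) -> float:
    """Modularity of `partition` on an unweighted, undirected edge list."""
    m = len(edges)
    if m == 0:
        return 0.0
    internal: Dict[int, int] = {}  # community -> edges with both ends inside it
    degree: Dict[int, int] = {}    # community -> sum of member degrees
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by a single bridge edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

print(f"two triangles as communities: {modularity(edges, two_groups):.3f}")
print(f"everything in one community:  {modularity(edges, one_group):.3f}")
```

Splitting at the bridge scores higher than lumping everything together, and that gain is what the local-move phase chases one node at a time.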

Q: When would you recommend Graph RAG over standard RAG, and vice versa?

A: Use Graph RAG when: queries are about entity relationships ("who collaborates with whom"), queries ask about global corpus themes ("what are the main research directions"), multi-hop reasoning is needed ("find all drugs related to X through shared mechanisms"), or the corpus has rich structured relationships (academic papers, legal documents, enterprise knowledge bases with organizational structure). Use standard RAG when: queries are specific factual lookups answerable from a single passage, the corpus is conversational or flat (support tickets, FAQ), latency is critical, or indexing cost is a constraint. Graph RAG is ~25x more expensive to index and adds 2-5x query overhead. In practice: build standard RAG first, identify the query types that consistently fail, then add Graph RAG specifically for those failure patterns. Most production systems that use Graph RAG use it as a complement to standard RAG, not a replacement.

Q: How do you handle entity resolution in knowledge graph construction?

A: Entity resolution - ensuring that "Alice Johnson," "A. Johnson," and "Dr. Johnson" map to the same entity - is the hardest engineering problem in knowledge graph construction. A tiered approach: (1) String normalization: lowercase, strip punctuation, expand common abbreviations; (2) Exact match deduplication after normalization; (3) Fuzzy string matching (Levenshtein distance, token sort ratio) for near-duplicates; (4) For remaining ambiguous cases, use an LLM to decide: "Are these two entity mentions likely the same entity given their context?" Provide the entity description, type, and surrounding text. LLM-based resolution is accurate but expensive, so run it only on candidate pairs that fuzzy matching flags as similar. For production systems, build a human review queue for likely duplicates that the automated steps could not resolve. Entity resolution quality directly determines graph quality; poor resolution creates fragmented graphs where multi-hop queries fail.

Q: What is the cost trade-off for Graph RAG vs standard RAG at scale?

A: For indexing 100,000 documents (1M chunks at 512 tokens each): Standard RAG indexing - embed 1M chunks × 512 tokens at $0.02/1M tokens ≈ $10 total. Graph RAG indexing - entity extraction (1 LLM call per chunk at ~1500 tokens each, gpt-4o-mini at $0.15/1M input): 1M × 1500 × $0.15/1M = $225. Plus community summaries and other LLM calls: ~$30 more. Total Graph RAG indexing: ~$250, roughly 25x standard RAG. For query costs: a global Graph RAG query synthesizes 100+ community summaries (~50K tokens total) at $0.15/1M = $0.0075 per query. A standard RAG query uses 5 chunks (~2500 tokens) ≈ $0.0004 per query at the same price, roughly 20x cheaper. The question is whether the query quality improvement for relational/global queries justifies these costs. For a system with 20% of queries being global/relational, the additional quality on those queries needs to be substantial to justify the cost premium. Run a business case analysis: what is the value of correctly answering a previously-unanswerable global query?

The Microsoft GraphRAG Library

Microsoft Research released the graphrag Python library (2024) that implements the full pipeline described in their paper. It provides a production-ready starting point:

pip install graphrag

# Initialize a GraphRAG project
graphrag init --root ./my_graphrag_project

# Project structure created:
# my_graphrag_project/
#   settings.yaml   <- configuration (LLM, chunking, community detection)
#   .env            <- API keys
#   prompts/        <- customizable extraction prompts

# Index documents
graphrag index --root ./my_graphrag_project

# Query with local search (entity-centric)
graphrag query --root ./my_graphrag_project --method local --query "Who collaborated with Alice on attention mechanisms?"

# Query with global search (community-level)
graphrag query --root ./my_graphrag_project --method global --query "What are the main research themes?"

Configuration (settings.yaml excerpt):

# Key names differ between graphrag releases; treat the settings.yaml
# generated by `graphrag init` as the source of truth.
llm:
  api_key: ${GRAPHRAG_API_KEY}
  type: openai_chat
  model: gpt-4o-mini  # entity extraction model
  model_supports_json: true

embeddings:
  llm:
    api_key: ${GRAPHRAG_API_KEY}
    type: openai_embedding
    model: text-embedding-3-small

chunks:
  size: 300  # smaller chunks for more precise entity extraction
  overlap: 100

entity_extraction:
  max_gleanings: 1  # number of entity extraction passes per chunk

community_reports:
  max_length: 2000        # max length of community summary
  max_input_length: 8000  # max context for summary generation

The library handles the full pipeline: chunking, entity extraction, graph construction, community detection (Leiden), community summary generation, and both local and global search. For most Graph RAG use cases, starting with the Microsoft library is faster than building from scratch.

Hybrid Graph + Vector RAG Architecture

The most effective production architecture combines both systems:

from typing import Dict, Any
from openai import OpenAI
import json

client = OpenAI()

def hybrid_graph_vector_rag(
    query: str,
    vector_store,
    knowledge_graph: KnowledgeGraph,
    community_summaries: Dict[int, str],
    model: str = "gpt-4o",
) -> Dict[str, Any]:
    """
    Route query to the appropriate retrieval strategy.
    Uses LLM classification to determine query type.
    """
    # Step 1: Classify the query
    classification = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify this query. Is it: "
                    "(1) FACTUAL: asking for a specific fact from documents, "
                    "(2) RELATIONAL: asking about relationships between entities, "
                    "(3) GLOBAL: asking about themes or patterns across the corpus. "
                    "Respond with JSON: {\"type\": \"factual|relational|global\"}"
                ),
            },
            {"role": "user", "content": query},
        ],
        temperature=0,
        response_format={"type": "json_object"},
    )
    query_type = json.loads(classification.choices[0].message.content).get("type", "factual")

    if query_type == "factual":
        # Use standard vector search
        results = vector_store.search(query, top_k=5)
        context = "\n\n".join([r["text"] for r in results])
        method = "vector_rag"

    elif query_type == "relational":
        # Use graph local search
        context = local_graph_search(query, knowledge_graph, vector_store)
        method = "graph_local"

    else:  # global
        # Use community summaries (first 20 only, to bound prompt size)
        summaries_text = "\n\n".join([
            f"Community {cid}: {summary}"
            for cid, summary in list(community_summaries.items())[:20]
        ])
        context = summaries_text
        method = "graph_global"

    # Generate answer
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using the provided context.",
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {query}",
            },
        ],
    )

    return {
        "answer": response.choices[0].message.content,
        "method": method,
        "query_type": query_type,
    }

Summary: Graph RAG in Context

Graph RAG is a powerful but expensive addition to the RAG toolkit. Use it when:

  • Your corpus has rich entity relationships that standard RAG can't traverse
  • Users need answers to global, corpus-wide questions
  • Multi-hop reasoning over entity connections is required
  • You have the budget for 10-25x higher indexing costs and 2-5x query costs

Do not use it as a replacement for standard RAG - use it as a complement. The best production systems route simple factual queries to standard RAG and complex relational/global queries to Graph RAG. Building this router adds a few hours of engineering work and dramatically reduces the per-query cost while maintaining quality on the queries that need graph-level understanding.


© 2026 EngineersOfAI. All rights reserved.