A GraphRAG system over the last 3 years of 10-K filings from the top 100 S&P 500 companies. Claude extracts Company · Person · Product · Risk · Subsidiary entities and 14 typed relationships. Neo4j 5, with its native vector index, stores the graph (6,247 nodes / 27,418 edges). Multi-hop questions run through query classification → entry-point vector search → Cypher traversal → result aggregation → natural-language synthesis with node citations. On 3-hop questions, hybrid Graph+Vector retrieval beats Vector-only by +0.56.
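The pipeline opens with query classification, which decides how much of the graph machinery a question needs. The sketch below is a minimal illustration under stated assumptions: the three classes (`LOOKUP`, `MULTI_HOP`, `AGGREGATION`), the keyword rules, and the `PIPELINES` route-to-stage mapping are all hypothetical stand-ins. The system itself likely classifies with an LLM call.

```python
import re
from enum import Enum


class QueryType(Enum):
    # Hypothetical classes; the real label set is not specified.
    LOOKUP = "lookup"            # answerable from a single entry node
    MULTI_HOP = "multi_hop"      # needs a typed-edge traversal
    AGGREGATION = "aggregation"  # needs a count/group over the whole graph


# Hypothetical keyword rules standing in for the actual classifier.
AGGREGATION_PATTERN = re.compile(r"\b(how many|count|number of|percentage)\b", re.IGNORECASE)
MULTI_HOP_PATTERN = re.compile(
    r"\b(board|boards|director|directors|compet\w*|depend\w*|supplie\w*|subsidiar\w*)\b",
    re.IGNORECASE,
)


def classify(question: str) -> QueryType:
    """Route a question to a retrieval strategy; aggregation wins if both rules match."""
    if AGGREGATION_PATTERN.search(question):
        return QueryType.AGGREGATION
    if MULTI_HOP_PATTERN.search(question):
        return QueryType.MULTI_HOP
    return QueryType.LOOKUP


# Assumed mapping from class to the pipeline stages it runs.
PIPELINES = {
    QueryType.LOOKUP: ["entry_point_vector_search", "synthesis"],
    QueryType.MULTI_HOP: ["entry_point_vector_search", "cypher_traversal", "synthesis"],
    QueryType.AGGREGATION: ["cypher_traversal", "result_aggregation", "synthesis"],
}
```

For example, `classify("which directors serve on boards of competing pharma companies?")` routes to `MULTI_HOP`. Checking aggregation first keeps "how many ... depend on TSMC?" on the counting path.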
A semantic search over the Apple 10-K can find "TSMC is our primary chip supplier." It cannot find that four other S&P 500 companies have the same exposure, because that fact sits in no single chunk. It only emerges from structure across filings.
For a 3-hop question ("which directors serve on boards of competing pharma companies?"), the model has to traverse Person → BOARD_MEMBER → Company → COMPETES_WITH → Company ← BOARD_MEMBER ← same Person. Vector retrieval flattens this graph into chunks and prays that one of them happens to mention the whole chain. On 3-hop questions, the Vector-only hit rate drops to 18%.
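Here is the same 3-hop pattern as an explicit traversal, shown over an in-memory edge list so it runs without a database. This is a sketch. The `(source_id, relationship_type, target_id)` edge tuples, the function name `interlocking_directors`, and the node IDs are hypothetical, and the pharma-industry filter is omitted. The Cypher in the comment is the assumed graph-side equivalent.

```python
from collections import defaultdict

Edge = tuple[str, str, str]  # (source_id, relationship_type, target_id)

# Cypher equivalent (property names are hypothetical; the pharma filter is omitted):
#   MATCH (p:Person)-[:BOARD_MEMBER]->(a:Company)-[:COMPETES_WITH]-(b:Company)<-[:BOARD_MEMBER]-(p)
#   WHERE a.id < b.id
#   RETURN p.name, a.name, b.name


def interlocking_directors(edges: list[Edge]) -> set[tuple[str, str, str]]:
    """Return (person, company_a, company_b) where one person sits on two competing boards."""
    boards: dict[str, set[str]] = defaultdict(set)       # person -> companies
    competitors: dict[str, set[str]] = defaultdict(set)  # company -> competitors
    for src, rel, dst in edges:
        if rel == "BOARD_MEMBER":
            boards[src].add(dst)
        elif rel == "COMPETES_WITH":
            # Competition is symmetric, so index it in both directions.
            competitors[src].add(dst)
            competitors[dst].add(src)

    results = set()
    for person, companies in boards.items():
        for a in companies:                   # hop 1: Person -> Company A
            for b in competitors.get(a, ()):  # hop 2: A -- COMPETES_WITH -- B
                if b in companies and a < b:  # hop 3: B <- same Person; a < b drops mirrored pairs
                    results.add((person, a, b))
    return results
```

The closing condition `b in companies` is what makes this a cycle back to the *same* person. A chunk retriever has no equivalent of that check.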
Aggregation is even worse: "how many S&P 500 companies mention China as a risk?" A vector retriever can return the 5 most relevant chunks, but it cannot count across all 100 companies' filings.
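To make the contrast concrete, the sketch below counts over the whole graph and compares that with what a top-k retriever can see. It assumes China exposure is stored as an `EXPOSED_TO` edge to a `Risk` node. The IDs such as `Risk:China` and both function names are hypothetical.

```python
Edge = tuple[str, str, str]  # (source_id, relationship_type, target_id)

# Cypher equivalent (property names are hypothetical):
#   MATCH (c:Company)-[:EXPOSED_TO]->(:Risk {name: $risk})
#   RETURN count(DISTINCT c) AS companies


def count_exposed(edges: list[Edge], risk_id: str) -> int:
    """Count distinct companies with an EXPOSED_TO edge to risk_id, across the whole graph."""
    return len({src for src, rel, dst in edges if rel == "EXPOSED_TO" and dst == risk_id})


def count_from_top_k(ranked_chunks: list[dict], k: int = 5) -> int:
    """Best case for a top-k retriever: it sees only k chunks, so it can count at most k companies."""
    return len({chunk["company"] for chunk in ranked_chunks[:k]})
```

The set comprehension deduplicates companies that appear once per filing year. The top-k count is capped at `k` no matter how many companies are actually exposed.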
Entity extraction: section-aware chunking (10-K Items 1 Business, 1A Risk Factors, 7 MD&A, and 8 Financial Statements are different beasts) → Pydantic-validated extraction → Neo4j MERGE. Each node carries its embedding as a property.
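Section-aware chunking can be sketched as two steps. First split the filing on its `Item N.` headings, then pack paragraphs into size-capped chunks inside each Item, so no chunk straddles two sections. This is an illustration, not the project's chunker. The heading regex, the 1,500-character cap, the "last heading wins" table-of-contents heuristic, and all names (`Chunk`, `split_items`, `chunk_section`, `section_aware_chunks`) are assumptions.

```python
import re
from dataclasses import dataclass

# Items called out above: 1 (Business), 1A (Risk Factors), 7 (MD&A), 8 (Financial Statements).
TARGET_ITEMS = {"1", "1A", "7", "8"}
# Assumed heading shape: "Item <number><optional letter>." at the start of a line.
ITEM_HEADING = re.compile(r"^[ \t]*item[ \t]+(\d{1,2}[A-C]?)\.", re.IGNORECASE | re.MULTILINE)


@dataclass
class Chunk:
    item: str  # 10-K Item the text came from, e.g. "1A"
    text: str


def split_items(filing_text: str) -> dict[str, str]:
    """Map each Item number to the text up to the next Item heading.

    A later heading with the same number overwrites an earlier one, a heuristic
    that lets the body win over a leading table of contents.
    """
    matches = list(ITEM_HEADING.finditer(filing_text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(filing_text)
        sections[match.group(1).upper()] = filing_text[match.end():end].strip()
    return sections


def chunk_section(item: str, text: str, max_chars: int = 1500) -> list[Chunk]:
    """Pack whole paragraphs into chunks of at most max_chars (an oversized paragraph stays whole)."""
    chunks: list[Chunk] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(Chunk(item, current))
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(Chunk(item, current))
    return chunks


def section_aware_chunks(filing_text: str, max_chars: int = 1500) -> list[Chunk]:
    """Chunk only the target Items, tagging every chunk with its Item."""
    sections = split_items(filing_text)
    return [
        chunk
        for item in sorted(sections.keys() & TARGET_ITEMS)
        for chunk in chunk_section(item, sections[item], max_chars)
    ]
```

Tagging each chunk with its Item lets the extraction step apply different prompts per section. For example, Item 1A is the natural source for `Risk` entities.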
Hybrid retrieval: vector search finds the entry node ("TSMC" → Company:TSM), then Cypher traverses outward from there along typed edges ([:SUPPLIED_BY], [:COMPETES_WITH], [:EXPOSED_TO]).
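The two retrieval steps can be sketched in plain Python. Brute-force cosine similarity stands in for Neo4j's vector index, and a breadth-first search stands in for the Cypher traversal. The function names, the toy embeddings, the index name `entity_embeddings` in the comment, and the choice to follow edges in either direction are assumptions for illustration.

```python
import math
from collections import defaultdict, deque

Edge = tuple[str, str, str]  # (source_id, relationship_type, target_id)

# Assumed Cypher equivalent (index name is hypothetical):
#   CALL db.index.vector.queryNodes('entity_embeddings', 1, $query_embedding) YIELD node AS entry
#   MATCH (entry)-[:SUPPLIED_BY|COMPETES_WITH|EXPOSED_TO*1..2]-(n)
#   RETURN DISTINCT n


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def entry_node(query_vec: list[float], embeddings: dict[str, list[float]]) -> str:
    """Step 1: nearest node by cosine similarity (brute force in place of the vector index)."""
    return max(embeddings, key=lambda node_id: cosine(query_vec, embeddings[node_id]))


def traverse(start: str, edges: list[Edge], allowed: set[str], max_hops: int) -> dict[str, int]:
    """Step 2: BFS from the entry node along allowed edge types (either direction); node -> hops."""
    adjacency: dict[str, list[str]] = defaultdict(list)
    for src, rel, dst in edges:
        if rel in allowed:
            adjacency[src].append(dst)
            adjacency[dst].append(src)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor in adjacency[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return hops
```

Restricting `allowed` to `{"SUPPLIED_BY"}` answers "which companies depend on TSMC?" from the `Company:TSM` entry node. Edges of other types, such as `BOARD_MEMBER`, are never followed.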
Synthesizer cites nodes: every claim in the answer is grounded in a specific entity ID in the graph. Faithfulness is 0.94, vs. 0.79 for Vector RAG.
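Node citations make grounding mechanically checkable: every claim should cite at least one node, and every cited node should come from the retrieved subgraph. The sketch below is a hypothetical guard, not the project's faithfulness metric. It assumes an `[node:<id>]` citation format, treats each sentence as one claim, and uses an illustrative `check_citations` name.

```python
import re

# Hypothetical citation format: the synthesizer appends [node:<entity id>] to each claim.
CITATION = re.compile(r"\[node:([^\]]+)\]")
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def check_citations(answer: str, retrieved_ids: set[str]) -> tuple[list[str], set[str]]:
    """Return (sentences with no citation, cited IDs that were not in the retrieved subgraph)."""
    uncited: list[str] = []
    unknown: set[str] = set()
    for sentence in SENTENCE_BOUNDARY.split(answer.strip()):
        if not sentence:
            continue
        cited = CITATION.findall(sentence)
        if not cited:
            uncited.append(sentence)
        unknown.update(node_id for node_id in cited if node_id not in retrieved_ids)
    return uncited, unknown
```

A non-empty result in either position signals an answer to regenerate or flag. IDs pointing outside the retrieved subgraph are the graph-side analogue of a hallucinated source.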
Neo4j + Postgres Docker stack; SEC EDGAR download for the top-100 S&P 500 companies (300 filings, 4.2 GB); section-aware chunking (18,742 chunks); parallel entity and relationship extraction with Claude (6,247 unique nodes, 27,418 edges); Neo4j MERGE with APOC batching; vector index over voyage-3 embeddings; multi-hop eval (100 questions across 4 categories); Next.js demo with react-force-graph.
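The loading step can be sketched as batched `UNWIND ... MERGE` writes plus a vector index definition. Note the swap: this uses plain driver-side batching rather than the APOC batching named above. The index name `company_embedding`, the `Company` property names, and the 1024 dimensions are assumptions (1024 is voyage-3's default output size). `CREATE VECTOR INDEX` requires Neo4j 5.11 or later.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

# Assumes voyage-3's default 1024-dimensional embeddings; the index name is hypothetical.
CREATE_VECTOR_INDEX = """
CREATE VECTOR INDEX company_embedding IF NOT EXISTS
FOR (c:Company) ON (c.embedding)
OPTIONS {indexConfig: {`vector.dimensions`: 1024, `vector.similarity_function`: 'cosine'}}
"""

# MERGE matches on id and creates a node only when none exists,
# so re-running the load updates nodes instead of duplicating them.
MERGE_COMPANIES = """
UNWIND $rows AS row
MERGE (c:Company {id: row.id})
SET c.name = row.name, c.embedding = row.embedding
"""


def batches(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def write_in_batches(run_batch: Callable[[list[dict]], object], rows: Iterable[dict], size: int = 1000) -> int:
    """Send each batch to run_batch (one write transaction per batch); return the batch count."""
    count = 0
    for batch in batches(rows, size):
        run_batch(batch)
        count += 1
    return count


# Usage with the official neo4j Python driver (not executed here):
#   from neo4j import GraphDatabase
#   with GraphDatabase.driver(uri, auth=(user, password)) as driver, driver.session() as session:
#       session.run(CREATE_VECTOR_INDEX).consume()
#       write_in_batches(
#           lambda b: session.execute_write(lambda tx: tx.run(MERGE_COMPANIES, rows=b).consume()),
#           company_rows,
#       )
```

One transaction per batch keeps memory bounded on a load of this size. A uniqueness constraint on `Company.id` would additionally make concurrent MERGEs safe.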
First, entities materialize from extraction (companies, then directors, then risks). Then the edges form. Then query 1 fires: "which S&P 500 companies depend on TSMC?" Watch the traversal light up. Query 2, "which directors sit on competing pharma boards?", activates a different subgraph. The right panel shows the generated Cypher and the synthesized answer with node citations.