I built this after getting frustrated with the "knowledge graphs beat vector RAG" claims that have no numbers behind them. I took the same AWS Compute docs, indexed them 6 ways, asked 75 questions across 5 difficulty tiers (from single-fact lookup through multi-hop architecture synthesis), and evaluated the answers with structural checks plus LLM-as-judge.
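To make the two-stage evaluation concrete, here is a minimal sketch, not kb-arena's actual code: a cheap structural check (do required fact strings appear in the answer?) gates a more expensive LLM judge. All names and the rubric are illustrative assumptions; the judge is a stub you would wire to a model API.

```python
# Hypothetical sketch of "structural checks + LLM-as-judge" evaluation.
# Not kb-arena's implementation; names and rubric are illustrative.

def structural_check(answer: str, required_facts: list[str]) -> bool:
    """Pass only if every required fact string appears in the answer."""
    lowered = answer.lower()
    return all(fact.lower() in lowered for fact in required_facts)

def llm_judge(question: str, answer: str, reference: str) -> float:
    """Placeholder for an LLM call that scores answer vs. reference on 0..1.
    A real judge would send a grading prompt to a model API."""
    raise NotImplementedError("wire up your model API here")

def evaluate(question, answer, reference, required_facts):
    # Fail fast on the structural check so a bad answer never costs an LLM call.
    if not structural_check(answer, required_facts):
        return 0.0
    return llm_judge(question, answer, reference)
```

The gating order is the point: structural checks are free, so they filter out clear failures before any judge tokens are spent.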
Surprising result: pre-generated Q&A pairs won overall at 81.5%. Knowledge graph came second at 70.3%. RAPTOR and naive vector were both under 25%. The most sophisticated architectures were also the slowest — hybrid took 39.5s/query at $3.00/run and still lost to a simple Q&A lookup at 8.4s/$0.48.
The tool is bring-your-own-docs. Point it at Markdown, HTML, PDF, Word, CSV, or a URL, run the pipeline, get your own numbers. Comes with the AWS corpus as a built-in example so pip install kb-arena && kb-arena demo works with no API keys.
The six strategies: naive vector, contextual embeddings, Q&A pairs, knowledge graph (Neo4j), hybrid, and RAPTOR. For reference, naive vector scored 19.5% on its own. The demo runs against bundled results, so no API keys are needed until you benchmark your own corpus.
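For readers unfamiliar with the winning strategy, here is a toy sketch of what "pre-generated Q&A pairs" means as a retrieval method, under my own assumptions rather than kb-arena's code: at index time you generate question/answer pairs from each chunk, and at query time you match the user query against the stored questions and return the paired answer. Token-overlap similarity stands in for real embeddings.

```python
# Toy sketch of the pre-generated Q&A pairs strategy (not kb-arena's code).
# Real systems would embed the questions; Jaccard overlap stands in here.

def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def build_index(qa_pairs):
    """qa_pairs: (question, answer) tuples generated ahead of time from chunks."""
    return [(tokenize(q), a) for q, a in qa_pairs]

def retrieve(query: str, index):
    """Return the answer whose stored question best overlaps the query."""
    q_tokens = tokenize(query)
    scored = [(len(q_tokens & toks) / max(len(q_tokens | toks), 1), ans)
              for toks, ans in index]
    return max(scored)[1]

index = build_index([
    ("what is the max timeout for a lambda function", "15 minutes"),
    ("which ec2 purchase option is cheapest for interruptible work", "Spot Instances"),
])
print(retrieve("lambda function timeout limit", index))  # -> 15 minutes
```

The intuition for why this wins on lookup-heavy benchmarks: matching a query against pre-written questions is a much easier similarity problem than matching it against raw document chunks.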
Code: https://github.com/xmpuspus/kb-arena