PIPE-Cypher: Automatic Enterprise Benchmark Generation for Text-to-Cypher Systems
Authors: Suraj Ranganath, Anish Raghavendra
Organization: University of California at San Diego
Published: 2026-06-07
arXiv: 2606.08481
Code: https://github.com/suraj-ranganath/PIPE-Cypher
Abstract
A local benchmark-generation pipeline transforms live property graphs and seed queries into balanced NL-to-Cypher datasets for enterprise knowledge graphs, incorporating schema profiling, reverse-query grounding, and execution validation.
Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A deployment-relevant Text2Cypher benchmark therefore reflects the questions users and agents actually ask of that graph. Creating such a benchmark is difficult because schemas and values are unique, and graph structure changes over time. Each NL-query pair must also be executable, use real graph entities, preserve diversity, and remain balanced across query types and difficulty levels. We present PIPE-Cypher, a local benchmark-generation pipeline that turns a live property graph and optional seed queries from customer questions, analyst logs, or agent tool calls into balanced NL-to-Cypher benchmarks. PIPE-Cypher combines schema profiling, reverse-query grounding, constrained generation, deterministic Cypher governance, execution validation, redaction, diversity controls, and a calibrated local LLM judge. Using local Qwen3.5-9B generation and judging, PIPE-Cypher exports 3,000 accepted FinBench/SNB examples, completes three audited ablation suites, calibrates judge behavior with human labels, and evaluates 11 local downstream models. The resulting benchmark is deliberately discriminative: zero-shot transfer is weak, while a few-shot control shows that schema-specific example banks can help compatible model families. Together, PIPE-Cypher makes Text2Cypher benchmarking a repeatable process that evolves with the graph, its users, and its target workloads.
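To make two of the stages named in the abstract more concrete, here is a minimal sketch of what a deterministic Cypher governance check and a balance control could look like. It is not PIPE-Cypher's implementation: the deny-list of clauses, the function names `is_read_only` and `select_balanced`, and the example fields `cypher`, `query_type`, and `difficulty` are all assumptions made for illustration. The governance check is deliberately conservative, so it rejects every `CALL` (including read-only procedures) and may reject a query that merely uses a property named like a write clause.

```python
import re
from collections import defaultdict

# Hypothetical deny-list of write/admin clauses; a real governance policy may differ.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|CALL)\b",
    re.IGNORECASE,
)
# Single- or double-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def is_read_only(cypher: str) -> bool:
    """Return True if no deny-listed clause appears outside string literals."""
    without_literals = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_literals) is None


def select_balanced(candidates, quota_per_bucket):
    """Keep at most quota_per_bucket read-only examples per (query_type, difficulty)."""
    counts = defaultdict(int)
    accepted = []
    for example in candidates:
        if not is_read_only(example["cypher"]):
            continue  # governance: drop anything that could mutate the graph
        bucket = (example["query_type"], example["difficulty"])
        if counts[bucket] >= quota_per_bucket:
            continue  # balance: this bucket is already full
        counts[bucket] += 1
        accepted.append(example)
    return accepted


# Illustrative candidates; the labels and properties are made up for this sketch.
candidates = [
    {"cypher": "MATCH (a:Account) RETURN count(a)",
     "query_type": "aggregation", "difficulty": "easy"},
    {"cypher": "MATCH (a:Account) DETACH DELETE a",
     "query_type": "lookup", "difficulty": "easy"},
    {"cypher": "MATCH (a:Account) WHERE a.name = 'CREATE' RETURN a",
     "query_type": "lookup", "difficulty": "easy"},
    {"cypher": "MATCH (a:Account) RETURN count(a) AS n",
     "query_type": "aggregation", "difficulty": "easy"},
]
accepted = select_balanced(candidates, quota_per_bucket=1)
```

With a quota of one per bucket, the sketch keeps the first aggregation query and the lookup whose string literal happens to contain `CREATE`, and drops the `DETACH DELETE` query and the second aggregation query. A keyword filter like this is only a first line of defense; executing each accepted query against the live graph, as the abstract's execution-validation stage describes, would still be needed to confirm validity.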
Community
PIPE-Cypher is a synthetic data pipeline that creates balanced, executable, privacy-aware NL-to-Cypher benchmarks for enterprise knowledge graphs. The motivation is that enterprise graphs are highly differentiated: their schemas, terminology, query patterns, and even the questions users ask are unique to each deployment. A strong coding agent today can probably generate data by inspecting a schema, but PIPE-Cypher makes this scalable, cost-effective, and repeatable when the schema inevitably changes. Because generation is constrained to a pipeline, even small local models can efficiently create large amounts of synthetic benchmark data, with deterministic graph checks for balance, diversity, auditability, and execution validity. That makes it useful for keeping private Text2Cypher benchmarks grounded in how a graph is actually used as it evolves.
We look forward to hearing your thoughts, feedback, and suggestions!