BEAM 100K memory benchmark: CSM vs Hindsight local artifact comparison [R]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
I’m looking for feedback on a local agent-memory benchmark comparison, especially from people who care about evaluation methodology.
I built an open-source R&D memory system called Context Swarm Memory (CSM). It uses bounded read-only memory shards, query routing, probe/recall/synthesis, cited packets, and explicit Committer-gated writes.
The current comparison is against the accepted local Hindsight artifact on BEAM 100K:
- CSM: 0.757573 AMB score, 342 / 400 correct
- Hindsight: 0.733658 AMB score, 326 / 400 correct
- CSM uses 38.2% fewer answer-visible context tokens than Hindsight
- CSM is slower: 29.23s average retrieval vs 6.38s for Hindsight
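As a rough sanity check on the numbers above, here is a minimal sketch of how one might test whether the 342 vs 326 gap is distinguishable from noise. It uses the raw correct counts (not the AMB score) and an unpaired two-proportion z-test, which is an assumption on my part: both systems answered the same 400 questions, so a paired test such as McNemar's would be more appropriate, but that needs per-question results that are not listed in this post. The function name `two_proportion_z` is hypothetical and not part of the CSM repo.

```python
import math

def two_proportion_z(correct_a, correct_b, n):
    """Unpaired two-proportion z-test for two systems scored on n items each.

    Assumption: treats the two runs as independent samples, which is
    conservative-ish but ignores that the questions are shared.
    """
    p_a = correct_a / n
    p_b = correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Counts and latencies taken from the list above.
acc_csm, acc_hindsight, z, p_value = two_proportion_z(342, 326, 400)
latency_ratio = 29.23 / 6.38  # CSM average retrieval time relative to Hindsight

print(f"CSM accuracy:       {acc_csm:.3f}")
print(f"Hindsight accuracy: {acc_hindsight:.3f}")
print(f"z = {z:.3f}, two-sided p = {p_value:.3f}")
print(f"CSM retrieval is {latency_ratio:.2f}x slower on average")
```

If the unpaired p-value comes out above the usual 0.05 threshold, that would be one more reason to publish per-question outcomes so a paired test and confidence intervals can be computed.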
I want to be precise about the claim:
This is not an official leaderboard claim. It is not a BEAM 10M claim. It is a committed local accepted-artifact comparison at 100K, and the next step should be independent replication or official chart acceptance.
Repo:
https://github.com/muhamadjawdatsalemalakoum/context-swarm-memory
Evidence and reproducibility notes:
https://muhamadjawdatsalemalakoum.github.io/context-swarm-memory/
The main question: what would make this comparison scientifically stronger before it is presented as a serious agent-memory result?