LQS v3.1 — an open methodology for rating AI training data (multi-oracle consensus + signed certificates) [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Solo author here. I spent the last six months building (and then sunsetting) a marketplace for AI training data. The marketplace failed for an interesting reason: the actual bottleneck isn't supply. There's tons of data. The bottleneck is that buyers can't independently evaluate quality, and there's no Cleanlab/Galileo-style tool that occupies the rating-authority position — those products are diagnostics owned by the data owner, not third-party attestations a procurement team or model risk officer can cite.
So I rebuilt the whole thing as the rating layer. The methodology is published with a DOI (10.5281/zenodo.20278981, CC BY 4.0) — full v3.1 paper, every dimension defined.
What's in v3.1:
- 19 dimensions: label correctness, coverage, leakage, contamination, plausibility, oracle agreement, conformal coverage, downstream projection, adversarial stability, subgroup equity, license clarity, provenance chain, and more
- 7-oracle consensus on every score, with oracle_agreement itself being a scored dimension (i.e., the score knows when it is uncertain)
- Outcome Registry: downstream signals feed back to recalibrate oracle credibility, so the rating learns from real-world quality outcomes, not just inter-rater agreement
- Ed25519-signed certificates auditors can verify offline against the published public key (no API call needed)
- Public LQS Index: 11 tickers, ~263 datasets scored, daily rebalance, free API
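To make the consensus and Outcome Registry bullets concrete, here is a minimal sketch of the general idea. It is not the aggregation from the v3.1 paper: the credibility-weighted mean, the spread-based agreement measure, the update rule, and every name in it (`consensus`, `oracle_agreement`, `recalibrate`, `rate`, the oracle labels) are assumptions made up for illustration. Scores, credibilities, and outcomes are assumed to lie in [0, 1].

```python
from statistics import pstdev


def consensus(scores, credibility):
    """Credibility-weighted mean of per-oracle scores (hypothetical rule)."""
    total = sum(credibility[name] for name in scores)
    return sum(scores[name] * credibility[name] for name in scores) / total


def oracle_agreement(scores):
    """1.0 when all oracles agree, lower as their scores spread out.

    The population std of values in [0, 1] never exceeds 0.5,
    so dividing by 0.5 keeps the result in [0, 1].
    """
    return 1.0 - pstdev(list(scores.values())) / 0.5


def recalibrate(credibility, scores, outcome, rate=0.2):
    """Nudge each oracle's credibility toward how close it was to an
    observed downstream outcome (hypothetical update rule)."""
    return {
        name: (1 - rate) * credibility[name]
        + rate * (1.0 - abs(scores[name] - outcome))
        for name in scores
    }


# Seven hypothetical oracles scoring one dataset on one dimension.
scores = {
    "oracle_a": 0.82, "oracle_b": 0.79, "oracle_c": 0.85, "oracle_d": 0.40,
    "oracle_e": 0.81, "oracle_f": 0.77, "oracle_g": 0.83,
}
credibility = {name: 1.0 for name in scores}

print("consensus:", round(consensus(scores, credibility), 3))
print("agreement:", round(oracle_agreement(scores), 3))

# Suppose a downstream signal later puts realized quality at 0.8:
# oracle_d, the outlier, loses credibility relative to the others.
credibility = recalibrate(credibility, scores, outcome=0.8)
print("oracle_d credibility:", round(credibility["oracle_d"], 3))
```

Treating agreement as its own output means a dataset can carry a high consensus score and still be flagged as uncertain when the oracles disagree, which is the behaviour the second bullet describes.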
This is genuinely pre-revenue (zero acquired customers — being honest with you, not posturing). What I'd actually value from this sub:
- Methodology review. The paper is open. If any dimension definitions are wrong, weights are gameable, or the oracle aggregation is misspecified, I want to know now before this gets cited.
- Adversarial datasets. If you have a dataset where you think the LQS would score it wrong (either direction), I'll score it free and we can publish the disagreement.
- Comparable systems I should be citing. I'm aware of Cleanlab, Galileo, the FT Spectrum project. What else?
Free score for any public dataset: labelsets.ai/rate
Happy to AMA on the architecture, conformal intervals, the marketplace pivot, or anything else.