r/MachineLearning

NeurIPS used uncalibrated AI detector for desk rejections [D]

I recently had a submission desk-rejected from the NeurIPS 2026 Position Paper Track for an alleged AI-policy violation. After corresponding with the track leadership and reading their public blog post, I think the broader methodological issue is worth discussing here.

The track used Pangram, a proprietary AI-text detector, as part of the desk-rejection process. I was told that the materials considered for desk rejection were:

  • the detector output
  • the authors’ AI-use attestation

This creates a potential circularity problem. If a high detector score is used to judge the author’s attestation as inconsistent, and that inconsistency is then used to justify desk rejection, the detector is not just an aid. It becomes a decisive part of the adjudication process.

The bigger issue is validation.

The NeurIPS blog describes tests using Pangram audits, older ACM FAccT papers, synthetic AI-generated position papers, and manually edited samples. But the target population was NeurIPS 2026 Position Paper submissions, whose ground-truth authorship process is unknown.

So the key question is:

What is the false-positive rate of the final decision procedure on the actual target distribution?

A false-positive rate measured on one distribution does not automatically transfer to another. If the actual submission pool produced a "surprisingly high flagged rate" (quoted from the NeurIPS blog post), that could indicate distribution shift or miscalibration rather than widespread AI use.
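To make the ambiguity concrete, here is a minimal sketch of the arithmetic. Every number in it is hypothetical and chosen only for illustration; none comes from NeurIPS, Pangram, or the blog post. It uses the standard identity that the overall flagged rate equals prevalence × TPR + (1 − prevalence) × FPR, and shows two very different scenarios that produce nearly the same flagged rate.

```python
def flagged_rate(prevalence, tpr, fpr):
    """Fraction of all submissions flagged by the detector."""
    return prevalence * tpr + (1 - prevalence) * fpr


def false_share_of_flags(prevalence, tpr, fpr):
    """Fraction of flagged submissions that were actually human-written."""
    false_flags = (1 - prevalence) * fpr
    return false_flags / flagged_rate(prevalence, tpr, fpr)


# Hypothetical scenario A: many AI-written papers, well-calibrated detector.
scenario_a = dict(prevalence=0.30, tpr=0.90, fpr=0.01)

# Hypothetical scenario B: few AI-written papers, detector miscalibrated
# on the target distribution (much higher false-positive rate).
scenario_b = dict(prevalence=0.10, tpr=0.90, fpr=0.208)

for name, s in (("A", scenario_a), ("B", scenario_b)):
    print(
        f"scenario {name}: flagged={flagged_rate(**s):.3f}, "
        f"false share of flags={false_share_of_flags(**s):.3f}"
    )
```

Both scenarios flag roughly 28% of submissions, yet in A only a few percent of the flags are wrong, while in B about two thirds are. The flagged rate alone cannot tell these apart; only labeled data from the target distribution can.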

To sanity-check the detector’s behavior, I also ran Pangram on recent 2026 papers authored by NeurIPS Position Paper Track Chairs. Pangram returned scores including:

  • 69% AI
  • 45% AI
  • 36% AI
  • 24% AI

I am not claiming those papers were AI-written. In my view, Pangram’s outputs alone do not permit such a conclusion. And that is exactly the point.

UPD:

Here is the original NeurIPS blog post

And here is a blog post with a detailed critique

submitted by /u/Asleep-Requirement13