r/MachineLearning · May 23, 2026 · 1 min read

Alignment: Higher order prioritizing over constraints [R]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

So, I ran across a behavior that I found interesting and that may be relevant to alignment or safety research. I'm going to keep the description abstract, without giving away the details or the keys to jailbreaking.

A transformer is trained to predict the next token. Functionally, though, the model is also approximating reality as language describes it. Hmm, maybe "reality" is not the right word; perhaps "meaning" is better. So, in a sense, the model has a tendency (a vector, loosely speaking) toward aligning with correct meaning. Clarity seeking, that's what I'll call this behavior. Constraints placed as an additional layer on top of a base statistical system have a natural, structurally set priority level relative to that system's clarity-seeking vectors. That level is implicit in the structure of the model. If one discusses topics that are constrained but sit at a higher priority level than the constraints themselves, the model's clarity-seeking vectors will bypass the constraint.
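One way to picture the claim above is a toy model in which a constraint is a fixed penalty subtracted from the base next-token scores. This is only an illustration under loud assumptions: the additive penalty, the function name `next_token`, and all the numbers are hypothetical, and none of it is a claim about how any real model implements constraints. It only shows the shape of the argument: if the base system's preference for a constrained token is strong enough, a fixed penalty does not change the outcome.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(base_logits, constrained, penalty):
    """Pick the most likely token after penalizing constrained ones.

    base_logits: scores from the hypothetical base statistical system
    constrained: set of token indices the constraint layer discourages
    penalty:     fixed amount subtracted from each constrained logit
    """
    adjusted = [
        logit - penalty if i in constrained else logit
        for i, logit in enumerate(base_logits)
    ]
    probs = softmax(adjusted)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


# Token 2 is the constrained token in both cases (made-up numbers).
weak_pull = [1.0, 0.5, 2.0]    # base system only mildly prefers token 2
strong_pull = [1.0, 0.5, 9.0]  # base system strongly prefers token 2

choice_weak, probs_weak = next_token(weak_pull, {2}, penalty=4.0)
choice_strong, probs_strong = next_token(strong_pull, {2}, penalty=4.0)

print(choice_weak)    # the penalty wins: a different token is chosen
print(choice_strong)  # the base preference wins: token 2 is still chosen
```

In this sketch the "priority level" is just the margin between the base score and the penalty. Whether anything like that margin exists in a trained model is exactly the open question the post raises.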

I will call the things at a higher priority level higher-order topics. I think I've said enough.

submitted by /u/SenseCompetitive5851