arXiv — Machine Learning

Lost and Found in Translation: Variational Diagnostics for Neural Codebook Channels

Mirrored from arXiv — Machine Learning for archival readability. Support the source by reading on the original site.

arXiv:2605.18846 (cs)
[Submitted on 13 May 2026]

Authors: Yusuke Hayashi
Abstract: Classical communication systems fail not only through random noise but also when transmitter and receiver use incompatible operational codebooks. Variational autoencoders (VAEs) train an encoder $q_\phi$ and decoder $p_\theta$ jointly, and practitioners treat the resulting latent space as a discrete code for clustering, conditional generation, and mechanistic interpretability. Yet standard VAE diagnostics (ELBO, active units, mutual information, and code histograms) certify only whether this code is used, never whether the decoder reads each latent under the encoder's code.
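To make concrete the claim that histogram-level diagnostics certify usage but not readback, here is a toy Python illustration of our own (a two-code example, not the paper's construction). Two hypothetical encoder-to-decoder joint tables share identical code histograms, entropies, and active-code counts, yet in one the decoder reads every code as the encoder meant it and in the other it swaps them. The table names and helper functions are illustrative assumptions.

```python
import math


def marginals(joint):
    """Row (encoder) and column (decoder) marginals of a joint table J[i][j]."""
    rows = [sum(r) for r in joint]
    cols = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]
    return rows, cols


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability codes."""
    return -sum(x * math.log(x) for x in p if x > 0)


def active_codes(p, tol=1e-12):
    """Number of codes with non-negligible probability."""
    return sum(1 for x in p if x > tol)


def agreement(joint):
    """Diagonal mass: probability the decoder reads back the encoder's code."""
    return sum(joint[i][i] for i in range(len(joint)))


# Two hypothetical joint tables over a 2-code alphabet.
aligned = [[0.5, 0.0], [0.0, 0.5]]  # decoder reads each code as the encoder meant it
swapped = [[0.0, 0.5], [0.5, 0.0]]  # decoder reads code 0 as 1 and code 1 as 0

for name, J in [("aligned", aligned), ("swapped", swapped)]:
    enc, dec = marginals(J)
    print(name, enc, dec, round(entropy(enc), 4), active_codes(enc), agreement(J))
```

Every marginal statistic printed is the same for both tables; only the diagonal mass (1.0 versus 0.0) separates them, and it depends on the joint, not on either histogram.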
We close this gap with the neural codebook channel $K_{e\to d}(j\mid i)$, a coupled encoder-decoder diagnostic whose off-diagonal mass is bounded by an architecture-free Bernoulli-KL certificate $d_{\mathrm{bin}}(1-\mathcal{A} \,\|\, \bar\eta_p) \le \bar\Delta$ controlled by the variational gap. The certificate is the operational specialization of the classical KL chain rule under disintegration to the encoder-decoder disagreement event, complemented by a constructive marginal-impossibility result: no combination of marginal histograms, entropies, active-code counts, or mutual information determines $K_{e\to d}$.
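A minimal sketch of how one might tabulate an empirical codebook channel and check the Bernoulli-KL certificate, assuming each sample yields a pair $(i, j)$ of the encoder's code and the code read back on the decoder side, and assuming $\mathcal{A}=\sum_i \pi(i)\,K_{e\to d}(i\mid i)$, so that $1-\mathcal{A}$ is the off-diagonal mass. Here $\bar\eta_p$ and $\bar\Delta$ are simply given inputs; how the paper estimates them is not reproduced. All function names and numbers are hypothetical.

```python
import math
from collections import Counter


def empirical_channel(pairs, n_codes):
    """Estimate the encoder marginal pi(i) and channel K(j|i) from (i, j) pairs.

    i is the encoder's code and j the code read back on the decoder side;
    how such pairs are produced is an assumption, not the paper's recipe.
    """
    counts = Counter(pairs)  # missing (i, j) keys count as 0
    row_tot = [sum(counts[(i, j)] for j in range(n_codes)) for i in range(n_codes)]
    n = sum(row_tot)
    pi = [r / n for r in row_tot]
    K = [[counts[(i, j)] / row_tot[i] if row_tot[i] else 0.0 for j in range(n_codes)]
         for i in range(n_codes)]
    return pi, K


def agreement(pi, K):
    """A = sum_i pi(i) K(i|i); 1 - A is the channel's off-diagonal mass."""
    return sum(p * K[i][i] for i, p in enumerate(pi))


def d_bin(a, b):
    """Binary KL divergence d(a || b) in nats, with the convention 0 log 0 = 0."""
    def term(x, y):
        if x == 0.0:
            return 0.0
        if y == 0.0:
            return math.inf
        return x * math.log(x / y)
    return term(a, b) + term(1.0 - a, 1.0 - b)


def certificate_holds(A, eta_bar, delta_bar):
    """Check d_bin(1 - A || eta_bar) <= delta_bar for supplied eta_bar and delta_bar."""
    return d_bin(1.0 - A, eta_bar) <= delta_bar


# Toy pairs: code 0 is misread once in 10 draws, code 1 twice in 10.
pairs = [(0, 0)] * 9 + [(0, 1)] + [(1, 1)] * 8 + [(1, 0)] * 2
pi, K = empirical_channel(pairs, n_codes=2)
A = agreement(pi, K)
print(K, A, certificate_holds(A, eta_bar=0.5, delta_bar=0.3))
```

With these placeholder pairs the disagreement is about 0.15, and $d_{\mathrm{bin}}(0.15\,\|\,0.5)\approx 0.27$, so the check passes for $\bar\Delta=0.3$ and fails for $\bar\Delta=0.25$.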
We audit the certificate on four sklearn datasets (finite-grid exact, 5/5 seeds, 20/20 pairs satisfy the bound), a 2D model where the bound is non-vacuous at $2.71\times$ the observed disagreement and the four-term identity closes within $10^{-4}$, MNIST under importance-sampling control, and a VQ-VAE attaining the predicted limit $\hat{\mathcal{A}}=1.000$. The package $(K_{e\to d}, \mathcal{A}, R_{\mathrm{eff}}, R, \mathrm{AU})$ is an audit-ready reporting unit. More broadly, the framework makes mismatched decoding, a failure mode that classical communication theory named decades ago, visible inside a single deep generative model.
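One way to read a "non-vacuous" bound: the largest $x$ satisfying $d_{\mathrm{bin}}(x\,\|\,\bar\eta_p)\le\bar\Delta$ is an upper bound on the disagreement $1-\mathcal{A}$, and it is informative only if it falls below 1. The sketch below inverts the binary KL by bisection. This is a generic numerical choice on our part, not the paper's stated procedure, and the inputs are placeholder numbers.

```python
import math


def d_bin(a, b):
    """Binary KL divergence d(a || b) in nats, with the convention 0 log 0 = 0."""
    def term(x, y):
        if x == 0.0:
            return 0.0
        if y == 0.0:
            return math.inf
        return x * math.log(x / y)
    return term(a, b) + term(1.0 - a, 1.0 - b)


def disagreement_upper_bound(eta_bar, delta_bar, iters=100):
    """Largest x in [eta_bar, 1] with d_bin(x || eta_bar) <= delta_bar.

    d_bin(. || eta_bar) is zero at eta_bar and increasing on [eta_bar, 1],
    so bisection on that interval locates the boundary of the feasible set.
    """
    if d_bin(1.0, eta_bar) <= delta_bar:
        return 1.0  # vacuous: total disagreement is not ruled out
    lo, hi = eta_bar, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if d_bin(mid, eta_bar) <= delta_bar:
            lo = mid  # mid is still feasible, move the lower end up
        else:
            hi = mid  # mid violates the certificate, move the upper end down
    return lo


# Placeholder inputs, not values from the paper.
eta_bar, delta_bar, observed = 0.05, 0.02, 0.04
bound = disagreement_upper_bound(eta_bar, delta_bar)
print(f"bound on 1 - A: {bound:.4f}, ratio to observed: {bound / observed:.2f}")
```

Dividing the returned bound by an observed disagreement rate gives a tightness ratio of the kind quoted for the 2D model; the placeholder numbers here do not reproduce the paper's $2.71\times$ figure.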
Comments: 9 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT)
Cite as: arXiv:2605.18846 [cs.LG]
  (or arXiv:2605.18846v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.2605.18846
arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yusuke Hayashi
[v1] Wed, 13 May 2026 06:52:21 UTC (480 KB)