EMA-Gated Temporal Sequence Compression in Vision Transformers [P]
Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.
Vision Transformers waste 90% of their compute recalculating stationary asphalt. NeuroFlow tracks semantic surprise in embedding space, physically eliminating background tokens before the encoder.
NeuroFlow is a dynamic routing framework for Vision Transformer video inference. It exploits temporal redundancy by tracking per-patch semantic surprise via an Exponential Moving Average (EMA) of patch-level embeddings, addressing the architectural mismatch between O(N²) self-attention and highly redundant natural video streams.
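To make the gating idea concrete, here is a minimal pure-Python sketch of an EMA-based surprise gate. It is not taken from the NeuroFlow repository: the class name `EmaSurpriseGate`, the use of cosine distance as the surprise measure, and the `alpha` and `threshold` values are all assumptions chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally surprising
    return 1.0 - dot / (norm_a * norm_b)


class EmaSurpriseGate:
    """Hypothetical per-patch gate: keep a patch when it drifts from its EMA."""

    def __init__(self, alpha=0.9, threshold=0.05):
        self.alpha = alpha          # EMA decay (assumed value)
        self.threshold = threshold  # surprise cutoff (assumed value)
        self.ema = None             # one running embedding per patch position

    def step(self, patches):
        """Return indices of patches to keep for this frame, then update the EMA."""
        if self.ema is None:
            # First frame: nothing to compare against, so keep every patch.
            self.ema = [list(p) for p in patches]
            return list(range(len(patches)))
        keep = []
        for i, patch in enumerate(patches):
            if cosine_distance(patch, self.ema[i]) > self.threshold:
                keep.append(i)
            # Blend the new embedding into the running average.
            self.ema[i] = [self.alpha * m + (1.0 - self.alpha) * x
                           for m, x in zip(self.ema[i], patch)]
        return keep


# Toy two-dimensional "embeddings" for three patch positions.
gate = EmaSurpriseGate()
frame0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # only patch 1 changed
first = gate.step(frame0)
second = gate.step(frame1)
sparsity = 1.0 - len(second) / len(frame1)
```

In this toy run the first frame keeps all three patches and the second keeps only the changed one, so two thirds of the tokens would be skipped. A real gate would operate on batched tensors and would need a policy for refreshing patches that stay below the threshold for a long time.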
Key Contributions
- Architecture C (Dual-Memory Reconstruction): A completely training-free inference engine that combines a Layer 0 Retinal Gate with a Layer 12 Cortical Cache. It achieves 71.55% zero-shot top-1 accuracy at 84.0% token sparsity on SigLIP, retaining 92.4% of dense accuracy without modifying any weights.
- Architecture B (Extreme Wall-Clock Speedup): Physically eliminates stationary tokens before the encoder. With sparse manifold distillation, it reduces 1792p SigLIP 2 inference from 678 ms to 11.9 ms, a 55.80× wall-clock speedup at 97.37% embedding fidelity.
- LLM Ablation: Characterises the architectural boundaries of applying similarity-gated bypass to autoregressive language models (Phi-3-mini), demonstrating 0% token drift in syntactically constrained generation.
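One possible reading of "physically eliminating" tokens combined with an output cache is a gather, encode, scatter loop: only the surviving tokens enter the encoder, and the skipped positions reuse their last cached output. The sketch below illustrates that reading only. `sparse_forward`, `toy_encoder`, and `attention_cost_fraction` are hypothetical names, the encoder is a stand-in rather than a ViT, and the actual Retinal Gate and Cortical Cache in the repository may work differently.

```python
def sparse_forward(patches, keep, cache, encoder):
    """Encode only the kept patches; reuse cached outputs for the rest."""
    active = [patches[i] for i in keep]   # gather: a physically shorter sequence
    encoded = encoder(active)             # encoder work scales with len(keep)
    for i, out in zip(keep, encoded):
        cache[i] = out                    # refresh the cache at active positions
    return list(cache)                    # full-length output sequence


def toy_encoder(tokens):
    """Placeholder for a real encoder: doubles every feature."""
    return [[2.0 * x for x in t] for t in tokens]


def attention_cost_fraction(sparsity):
    """Fraction of dense pairwise-attention cost left after dropping tokens."""
    kept = 1.0 - sparsity
    return kept * kept


frame0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # only patch 1 changed

cache = toy_encoder(frame0)                # dense pass on the first frame
outputs = sparse_forward(frame1, [1], cache, toy_encoder)
cost = attention_cost_fraction(0.84)       # sparsity reported for Architecture C
```

Because self-attention compares every token with every other token, keeping 16% of the tokens leaves about 0.16² ≈ 2.6% of the pairwise attention work. That figure covers only the quadratic attention term, so measured wall-clock gains will also depend on the linear layers, memory traffic, and the cost of the gate itself. In a real transformer, cached outputs are also only approximations, since each token's output depends on every other token through attention.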
Code and paper: https://github.com/ynnk-research/-NeuroFlow