From Llama to Cria: Scaling Down Neural Networks via Neuron-Level Spectral Structural Importance Evaluation
Abstract: This paper proposes a neuron pruning framework based on neuron-level spectral structural importance evaluation. Given a trained neural network, we record the hidden states of each hidden layer during inference and model neurons as graph nodes, with hidden states treated as graph signals. Using ideas from graph signal processing, we infer layer-wise input and output graphs that characterize the structural relationships among neurons before and after each layer transformation. We then evaluate the spectral structural importance of neurons by analyzing the transformation between these graphs based on spectral graph theory. Neurons with high spectral structural importance are regarded as strongly involved in the internal representation transformation and are therefore preserved, while neurons with low importance scores are selected as pruning candidates. The pruning process is conducted iteratively until a predefined effective parameter reduction target is reached. Instead of fine-tuning after every pruning step, the proposed strategy first removes low-importance neurons to obtain a compact architecture and then applies a final recovery fine-tuning stage to restore task performance. By connecting neuron pruning with graph signal processing and spectral structural analysis, the proposed framework offers a principled way to reduce neural network size while maintaining solution quality. Experimental results on CIFAR-10 image classification and SST-2 sentiment classification show that our method can effectively remove low-importance neurons and achieve compact networks with competitive performance after recovery fine-tuning.
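The iterative loop described in the abstract (record hidden states, build a neuron graph, score neurons, drop the lowest-scoring ones until a parameter-reduction target is met) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the graph is built from absolute activation correlations, the score `placeholder_importance` is simply each neuron's weighted degree rather than the paper's spectral structural importance, and the names `prune_hidden_layer`, `target_reduction`, and `step` are hypothetical. The final recovery fine-tuning stage is not shown.

```python
import numpy as np


def neuron_graph_laplacian(hidden):
    """Laplacian of a neuron graph built from hidden states (samples x neurons).

    Assumption: edge weights are absolute Pearson correlations between
    neuron activations; the paper's graph inference may differ.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.abs(np.corrcoef(hidden, rowvar=False))
    corr = np.nan_to_num(corr)  # constant (e.g. dead) neurons yield NaN
    np.fill_diagonal(corr, 0.0)
    return np.diag(corr.sum(axis=1)) - corr


def placeholder_importance(hidden):
    """Stand-in score: weighted degree of each neuron (the Laplacian diagonal).

    This is NOT the paper's spectral score, only a placeholder with the
    same interface (one score per neuron).
    """
    return np.diag(neuron_graph_laplacian(hidden)).copy()


def prune_hidden_layer(w1, w2, inputs, target_reduction=0.5, step=2):
    """Iteratively prune hidden neurons of the MLP x -> relu(x @ w1) @ w2."""
    original = w1.size + w2.size
    budget = (1.0 - target_reduction) * original
    while w1.shape[1] > step + 1 and (w1.size + w2.size) > budget:
        hidden = np.maximum(inputs @ w1, 0.0)      # recorded hidden states
        scores = placeholder_importance(hidden)    # re-scored every round
        keep = np.sort(np.argsort(scores)[step:])  # drop the `step` lowest
        w1, w2 = w1[:, keep], w2[keep, :]
    return w1, w2


rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))
w1, w2 = rng.normal(size=(8, 16)), rng.normal(size=(16, 4))
p1, p2 = prune_hidden_layer(w1, w2, x, target_reduction=0.5)
print(p1.shape, p2.shape)
```

In this toy setting each hidden neuron accounts for 12 parameters (8 incoming, 4 outgoing), so a 50% reduction target leaves 8 of the 16 hidden neurons. Scores are recomputed after every pruning round because removing neurons changes the graph over the remaining ones; no fine-tuning happens inside the loop, matching the prune-first, recover-later ordering the abstract describes.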
| Subjects: | Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV) |
| Cite as: | arXiv:2605.18860 [cs.LG] (or arXiv:2605.18860v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.18860 (arXiv-issued DOI via DataCite, pending registration) |