Algebraic Machine Learning for Small-to-Medium Datasets Is Competitive against Strong Standard Baselines
Abstract: Symbolic methods are generally not considered competitive with strong modern learners on realistic supervised tasks. We evaluate Algebraic Machine Learning (AML), a framework that learns through subdirect decomposition of algebraic structure rather than numerical optimization, against standard baselines on image and tabular classification across varying training-set sizes. We find that AML, trained only on training data with no validation or cross-validation, outperforms a family of cross-validated baseline methods, including CNNs, on small-to-medium image datasets (50 to 2000 training examples). On tabular datasets in the same size range, XGBoost is the best-performing method overall, but AML is nonetheless comparable to methods that incorporate task-specific biases, such as LightGBM and random forests. AML achieves this competitive performance across two very different types of datasets using a generic algebraic inductive bias, rather than the modality-specific biases built into standard baselines such as CNNs for images or XGBoost for tabular data, and it requires no cross-validation because it has no task-dependent hyperparameters to tune.
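The abstract contrasts baselines whose hyperparameters are chosen by cross-validation with a learner that needs no tuning. As a rough illustration of what the baseline side of that protocol involves, the sketch below selects a hyperparameter by k-fold cross-validation at several training-set sizes and then scores the tuned model on a held-out test set. Everything in it is an assumption made for illustration: the k-nearest-neighbour classifier, the toy two-blob data, the candidate grid, and all function names are hypothetical stand-ins, not the paper's baselines, datasets, or AML itself.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cross_validate(train, k, folds=5):
    """Mean held-out accuracy over `folds` interleaved splits of the training set."""
    scores = []
    for i in range(folds):
        held_out = train[i::folds]
        rest = [p for j, p in enumerate(train) if j % folds != i]
        scores.append(accuracy(rest, held_out, k))
    return sum(scores) / folds

def select_k(train, candidates=(1, 3, 5, 7)):
    """Pick the candidate with the best cross-validated accuracy (training data only)."""
    return max(candidates, key=lambda k: cross_validate(train, k))

def make_data(n, rng):
    """Hypothetical toy data: two Gaussian blobs in 2D, labels alternating 0/1."""
    data = []
    for i in range(n):
        label = i % 2
        center = 2.0 * label
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

def learning_curve(sizes=(50, 200), seed=0):
    """For each training-set size, tune k by cross-validation, then score on a fixed test set."""
    rng = random.Random(seed)
    test = make_data(200, rng)
    results = {}
    for n in sizes:
        train = make_data(n, rng)
        k = select_k(train)
        results[n] = (k, accuracy(train, test, k))
    return results
```

Calling `learning_curve()` returns a dictionary mapping each training-set size to the selected `k` and the resulting test accuracy. A method with no task-dependent hyperparameters would skip the `select_k` step entirely and fit once on the full training set, which is the property the abstract attributes to AML.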
| Comments: | 9 pages, 4 figures |
| Subjects: | Machine Learning (cs.LG) |
| ACM classes: | I.2.6 |
| Cite as: | arXiv:2605.22155 [cs.LG] (or arXiv:2605.22155v1 [cs.LG] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.22155 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Gonzalo G. De Polavieja
[v1] Thu, 21 May 2026 08:25:22 UTC (262 KB)