LLMTabBench: Evaluating LLMs on Binary Tabular Classification From Zero to Few Shots
Abstract: Supervised classification for tabular data remains a core machine learning task, yet its reliance on large labeled datasets limits applicability in data-scarce domains. For such few-shot scenarios, specialized methods like TabPFN, a state-of-the-art Prior-Data Fitted Network, have set a high standard by leveraging large-scale synthetic pretraining, though they still require a context of labeled examples to function. In contrast, Large Language Models (LLMs) could offer a more flexible alternative via zero- and few-shot in-context learning directly from task descriptions, but their performance on tabular data remains inconsistent and poorly understood. We introduce LLMTabBench, a benchmark designed to systematically evaluate LLMs for tabular classification under data-scarce conditions. LLMTabBench explicitly probes (i) how LLM prior knowledge interacts with in-context information (task descriptions and few-shot examples), and (ii) how model performance scales with increasing data complexity, using both real-world and controlled synthetic datasets. Our findings include: (1) LLMs are highly competitive in zero-shot settings and can outperform alternative models, even when those models have access to few-shot examples; (2) incorporating additional few-shot examples can conflict with LLM prior knowledge, limiting or even degrading performance; and (3) there is a data complexity threshold beyond which LLMs' performance declines and few-shot examples become less effective. Together, these findings reveal fundamental constraints of in-context learning for tabular data and provide practical guidance for deploying LLMs in low-data regimes.
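To make the zero-shot versus few-shot distinction in the abstract concrete, the following is a minimal sketch of how a table row might be serialized into a prompt. It is an illustration only, not the benchmark's actual prompt format: the function names (`serialize_row`, `build_prompt`), the "feature is value" template, and the example task and feature names are all assumptions introduced here.

```python
from typing import Mapping, Sequence, Tuple

Row = Mapping[str, object]


def serialize_row(row: Row) -> str:
    """Render one table row as 'feature is value' clauses (hypothetical template)."""
    return "; ".join(f"{name} is {value}" for name, value in row.items())


def build_prompt(
    task_description: str,
    query: Row,
    shots: Sequence[Tuple[Row, int]] = (),
) -> str:
    """Build a zero-shot prompt when `shots` is empty, a few-shot prompt otherwise."""
    lines = [task_description, ""]
    # Each labeled example is shown in the same format as the final query.
    for features, label in shots:
        lines.append(f"Example: {serialize_row(features)}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The query row ends with an open "Label:" for the model to complete.
    lines.append(f"Example: {serialize_row(query)}")
    lines.append("Label:")
    return "\n".join(lines)


# Hypothetical task and rows, used only to show the two prompt variants.
task = "Predict whether the person earns over 50K per year. Answer 1 for yes, 0 for no."
query = {"age": 41, "education": "Bachelors"}
shots = [({"age": 23, "education": "HS-grad"}, 0)]

zero_shot_prompt = build_prompt(task, query)
few_shot_prompt = build_prompt(task, query, shots)
```

In this sketch the zero-shot prompt contains only the task description and the query row, so the model has to rely on its prior knowledge, while the few-shot prompt adds labeled rows as in-context evidence. That is the interaction the abstract says the benchmark probes.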
Subjects: Machine Learning (cs.LG)
Cite as: arXiv:2605.24417 [cs.LG] (or arXiv:2605.24417v1 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2605.24417 (arXiv-issued DOI via DataCite, pending registration)
Submission history
From: Kseniia Kuvshinova
[v1] Sat, 23 May 2026 06:05:20 UTC (37,359 KB)