Hugging Face Daily Papers · June 11, 2026 · 8 min read

TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.




arxiv:2606.09323


Published on Jun 8 · Submitted by Wei Pang on Jun 11

Chinese University of Hong Kong, Shenzhen


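Authors: Wei Pang, Xiangru Jian, Hehan Li, Zhixuan Yu, Alex Xue, Jinyang Li, Zhengyuan Dong, Xinjian Zhao, Hao Xu, Chao Zhang, Reynold Cheng, M. Tamer Özsu, Tianshu Yu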

Abstract

TRL-Bench establishes a standardized benchmark for evaluating tabular representation learning models across multiple granularities, revealing that encoder performance varies by task type and requires capability-specific assessment rather than single leaderboard rankings.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Tabular encoders are usually evaluated inside task-specific end-to-end pipelines, so models from different training paradigms are difficult to compare directly even when they operate on similar tabular signals. We introduce TRL-Bench, a multi-granular tabular representation learning (TRL) benchmark that standardizes cross-paradigm representation-level evaluation: each encoder exports row-, column-, or table embeddings through its supported wrapper, and shared lightweight heads probe them across three suites: TRL-CTbench (column/table), TRL-Rbench (row), and TRL-DLTE (compositional Data-Lake Table Enrichment spanning all three granularities). To support this standardized setting, we release curated benchmark assets and task reformulations, including 50 OpenML tables with 123 verified targets, 16 row-pair linkage rewrites, and a 47,772-table DLTE lake derived from 1,379 parent tables. Across 20 models and 16 tasks, TRL-Bench shows that once downstream conditions are standardized, encoder quality is capability-specific rather than captured by a single leaderboard. In TRL-CTbench, generic text encoders often lead on tasks with strong surface-text signal, while tabular specialists win where their pretraining objective aligns with the task. In TRL-Rbench, within-table prediction and cross-table linkage favor different training regimes, with atomic linkage performance correlating strongly with the row-matching stage of DLTE pipelines. In TRL-DLTE, the strongest pipelines combine capability-matched specialists rather than reuse a single encoder, and top end-to-end quality depends on non-additive compositional fit rather than per-stage marginal rank alone. TRL-Bench provides a common protocol for measuring reusable signal in exported tabular representations under shared downstream conditions. Code and data: https://github.com/LOGO-CUHKSZ/TRL-Bench


Community

weipang142857

Paper submitter about 17 hours ago

📊 Releasing TRL-Bench: a unified framework and library that serves as a one-stop shop for tabular representation learning.
🧩 20 encoders · 16 tasks · 87 datasets across 3 suites
🔍 Built to make heterogeneous tabular models directly comparable, and reusable as embedding models

Tabular encoders come in every shape: different input formats, training objectives, and output heads. So even two models built for the same job are hard to compare head-to-head.
We built TRL-Bench to make them comparable.

It unifies everything at the representation level: each model is wrapped behind one shared interface that exports row, column, and table embeddings. Shared lightweight heads then probe those embeddings under common task definitions, so 20 encoders from every paradigm finally sit on the same axes.
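
To make the "one shared interface plus shared heads" idea concrete, here is a minimal sketch. It is not the TRL-Bench API: the `TabularEncoder` protocol, its method names, the toy `HashingEncoder`, and the nearest-centroid head are all assumptions chosen for illustration, and the real library may structure its wrappers and probes quite differently.

```python
import zlib
from typing import Dict, List, Protocol, Sequence

Table = Dict[str, List[str]]  # toy table type: column name -> cell values
Vector = List[float]


class TabularEncoder(Protocol):
    """Hypothetical shared interface (an assumption, not the TRL-Bench API)."""

    def embed_rows(self, table: Table) -> List[Vector]: ...
    def embed_columns(self, table: Table) -> List[Vector]: ...
    def embed_table(self, table: Table) -> Vector: ...


def hash_tokens(tokens: Sequence[str], dim: int) -> Vector:
    """Hashed bag-of-tokens vector, L2-normalized (zero vector if no tokens)."""
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.lower().encode("utf-8")) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec


class HashingEncoder:
    """Toy stand-in encoder that only looks at surface text."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed_rows(self, table: Table) -> List[Vector]:
        names = list(table)
        n_rows = len(table[names[0]]) if names else 0
        return [
            hash_tokens([f"{c}={table[c][i]}" for c in names], self.dim)
            for i in range(n_rows)
        ]

    def embed_columns(self, table: Table) -> List[Vector]:
        return [hash_tokens([name, *cells], self.dim) for name, cells in table.items()]

    def embed_table(self, table: Table) -> Vector:
        tokens = [tok for name, cells in table.items() for tok in (name, *cells)]
        return hash_tokens(tokens, self.dim)


def nearest_centroid_probe(
    train_x: List[Vector], train_y: List[str], test_x: List[Vector]
) -> List[str]:
    """Stand-in lightweight head: predict the label of the closest class centroid."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in s] for y, s in sums.items()}

    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda y: sq_dist(centroids[y], x)) for x in test_x]
```

Because the head only ever sees vectors, any encoder that satisfies the protocol can be swapped in without touching the probe, which is the property that puts different paradigms on the same axes.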

It's also a library: 20 different types of tabular models are adapted into embedding models that export row, column, and table embeddings for the community to reuse.
It spans three suites:
🧩 TRL-CTbench — 13 column/table tasks: schema, joinability, unionability, grounding
🔗 TRL-Rbench — multi-target row prediction (50 subtasks, 123 targets) + record linkage (16 datasets)
🌊 TRL-DLTE — a 47,772-table data-lake enrichment pipeline spanning all three granularities
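
As a rough illustration of what record linkage asks of row embeddings, the sketch below matches each row of one table to its most similar row in another using cosine similarity. This is a deliberately simple stand-in, not the linkage head used in TRL-Rbench; the function names and the `threshold` parameter are assumptions. It only works if embeddings from the two tables live in a comparable space.

```python
import math
from typing import List, Optional

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_records(
    rows_a: List[Vector], rows_b: List[Vector], threshold: float = 0.8
) -> List[Optional[int]]:
    """For each row embedding of table A, return the index of its best match
    in table B, or None when no candidate reaches the similarity threshold."""
    matches: List[Optional[int]] = []
    for a in rows_a:
        scores = [cosine(a, b) for b in rows_b]
        if not scores:
            matches.append(None)
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append(best if scores[best] >= threshold else None)
    return matches
```

An encoder tuned to a single table can score well on row prediction yet produce vectors whose geometry shifts from table to table, in which case a cross-table comparison like this one has little to work with.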

The main takeaway is clear: there is no single best tabular encoder; strengths are split across different table jobs. The choice of tabular model should be task-aware.

We also find that:

📌 Off-the-shelf text encoders are surprisingly strong when the signal is in the surface text (column names and cell values); cross-table alignment and matching instead reward structure-aware specialists

📌 Predicting a value inside a table and matching the same record across tables call for different encoders: one rewards adapting to a single table, the other rewards embeddings that stay comparable across tables

📌 Stacking the best per-stage encoders does not give the best compositional pipeline, and neither does reusing one encoder end-to-end; the winning recipe matches a different specialist to each step (find related tables → align columns → match rows)
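
The last point, that per-stage rank does not add up to end-to-end quality, can be sketched with a toy search over stage assignments. Everything here is hypothetical: the encoder names, the per-stage scores, and the compatibility penalty are made-up numbers for illustration, not results from the paper, and the multiplicative scoring rule is an assumption.

```python
from itertools import product
from typing import Callable, Dict, Tuple

STAGES = ("find_tables", "align_columns", "match_rows")
Pipeline = Tuple[str, ...]

# Made-up per-stage scores for hypothetical encoders (NOT benchmark results).
MARGINAL: Dict[str, Dict[str, float]] = {
    "find_tables": {"enc_A": 0.90, "enc_B": 0.85},
    "align_columns": {"enc_A": 0.60, "enc_B": 0.80, "enc_D": 0.75},
    "match_rows": {"enc_A": 0.60, "enc_C": 0.90},
}

# Made-up interaction: enc_B's column alignment feeds enc_C's row matcher badly.
PENALTY = {("enc_B", "enc_C"): 0.5}


def end_to_end(pipeline: Pipeline) -> float:
    """Toy end-to-end score: product of stage scores times an interaction term."""
    score = 1.0
    for stage, enc in zip(STAGES, pipeline):
        score *= MARGINAL[stage][enc]
    return score * PENALTY.get((pipeline[1], pipeline[2]), 1.0)


def greedy_pipeline() -> Pipeline:
    """Stack the top-ranked encoder of each stage independently."""
    return tuple(max(MARGINAL[s], key=MARGINAL[s].get) for s in STAGES)


def best_pipeline(score: Callable[[Pipeline], float]) -> Pipeline:
    """Score every combination of per-stage encoders and keep the best one."""
    return max(product(*(MARGINAL[s] for s in STAGES)), key=score)
```

In this toy setup the greedy stack is `("enc_A", "enc_B", "enc_C")`, while the exhaustive search picks `("enc_A", "enc_D", "enc_C")`, which also beats reusing `enc_A` at every stage. The point is only that an interaction term between stages is enough to make the best composition differ from both simple recipes.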

TRL-Bench is meant to serve both as a diagnostic benchmark and as a practical library for building on tabular representations.

📄 Paper: https://arxiv.org/abs/2606.09323
🌐 Website: https://logo-cuhksz.github.io/trl-bench.github.io/
🤗 Datasets: https://huggingface.co/datasets/logo-lab/trl-ctbench · trl-rbench · trl-dlte
💻 Code: https://github.com/LOGO-CUHKSZ/TRL-Bench