Hugging Face Daily Papers · 3 min read

TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.22355

Published on May 21, 2026 · Submitted by xiaochonglinghu on May 22, 2026
Authors: Hanyu Guo, Jiedong Yang, Chao Chen, Longfei Xu, Kaikui Liu, Xiangxiang Chu

AI-generated summary

The TransitLM dataset enables end-to-end transit route planning with large language models trained on structured transit data, removing the need for traditional map-based approaches.

Abstract

Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus and benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes at high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.
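To make the phrase "grounds arbitrary GPS coordinates to appropriate stations" concrete, the following is a minimal sketch of the explicit mapping step that a conventional pipeline might use and that the abstract says the trained model avoids. It is not taken from the paper or its evaluation code: the station table, the coordinates, and the function names `haversine_km` and `nearest_station` are all hypothetical, and a nearest-neighbor lookup by great-circle distance is only one plausible way to do explicit grounding.

```python
import math

# Hypothetical station table: (name, latitude, longitude).
# These entries are illustrative and do NOT come from the TransitLM dataset.
STATIONS = [
    ("Station A", 39.9042, 116.4074),
    ("Station B", 39.9139, 116.3917),
    ("Station C", 39.9990, 116.3264),
]

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station(lat, lon, stations=STATIONS):
    """Explicit grounding: return the name of the station closest to the query point."""
    return min(stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))[0]


if __name__ == "__main__":
    # Query point chosen close to the hypothetical "Station A".
    print(nearest_station(39.9050, 116.4070))
```

A map-based system needs a station table like this (plus a routing graph) at inference time. The abstract's claim is that a model trained on TransitLM produces suitable boarding and alighting stations directly from origin-destination coordinates, without such a lookup.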

Get this paper in your agent:

hf papers read 2605.22355
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
