Toto 2.0: Time Series Forecasting Enters the Scaling Era
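Authors: Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, Chenghao Liu, Ameet Talwalkar, David Asker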
Abstract
We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on three forecasting benchmarks: BOOM, our observability benchmark; GIFT-Eval, the standard general-purpose benchmark; and the recent contamination-resistant TIME benchmark. This report describes our experimental results and details the design decisions behind Toto 2.0: its architecture and training recipe, training data, and the u-muP hyperparameter transfer pipeline. All five base checkpoints are released under Apache 2.0.
AI-generated summary: Time series foundation models demonstrate scalable forecasting performance across parameter sizes, with Toto 2.0 achieving state-of-the-art results on multiple benchmarks through a unified training approach.
Community
Toto 2.0 is designed to answer a simple and open question: Can time series foundation models (TSFMs) improve as they scale?
Our results show they can. The highlights:
- Scaling that works. Every size improves on the one below it, with no sign of saturation at 2.5B.
- Best in class on every benchmark we tested. Toto 2.0 takes the top spots on BOOM (Datadog's observability forecasting benchmark), GIFT-Eval (the standard general-purpose benchmark), and TIME (a new contamination-resistant zero-shot benchmark).
- A generational jump from Toto 1.0. Toto 2.0 is 7× more parameter-efficient at matching quality and dramatically faster at inference time.
- Trained on observability and synthetic data, generalizes broadly. Toto 2.0 does not see any public forecasting data during pretraining, yet leads the field on general-purpose benchmarks.
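The first highlight, that every size improves on the one below it, amounts to a monotonicity check over (parameter count, error) pairs. The sketch below is only an illustration of that check: the function name `improves_with_scale` is hypothetical, it assumes a lower-is-better error metric, and the sample values are placeholders, not results from the report.

```python
from typing import Sequence, Tuple


def improves_with_scale(results: Sequence[Tuple[float, float]]) -> bool:
    """Return True if error strictly decreases as parameter count grows.

    Each entry is (parameter_count, error), where lower error is better.
    """
    # Order the models from smallest to largest.
    ordered = sorted(results, key=lambda r: r[0])
    errors = [err for _, err in ordered]
    # Every model must beat the next-smaller one.
    return all(later < earlier for earlier, later in zip(errors, errors[1:]))


# Placeholder (parameter count, error) pairs; NOT numbers from the report.
hypothetical = [(4e6, 1.00), (2.5e9, 0.80), (1e8, 0.90)]
print(improves_with_scale(hypothetical))
```

A check like this only says whether the ordering holds; it says nothing about saturation, which would need the size of each successive improvement as well.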
arXiv: arxiv.org/abs/2605.20119