Hugging Face Daily Papers · May 21, 2026 · 4 min read

Toto 2.0: Time Series Forecasting Enters the Scaling Era

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.



arxiv:2605.20119

Published on May 19 · Submitted by Emaad Khwaja on May 21 · Datadog


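Authors: Emaad Khwaja, Chris Lettieri, Gerald Woo, Eden Belouadah, Marc Cenac, Guillaume Jarry, Enguerrand Paquin, Xunyi Zhao, Viktoriya Zhukov, Othmane Abou-Amal, Chenghao Liu, Ameet Talwalkar, David Asker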

Abstract

AI-generated summary: Time series foundation models demonstrate scalable forecasting performance across parameter sizes, with Toto 2.0 achieving state-of-the-art results on multiple benchmarks through a unified training approach.

We show that time series foundation models scale: a single training recipe produces reliable forecast-quality improvements from 4M to 2.5B parameters. We release Toto 2.0, a family of five open-weights forecasting models trained under this recipe. The Toto 2.0 family sets a new state of the art on three forecasting benchmarks: BOOM, our observability benchmark; GIFT-Eval, the standard general-purpose benchmark; and the recent contamination-resistant TIME benchmark. This report describes our experimental results and details the design decisions behind Toto 2.0: its architecture and training recipe, training data, and the u-muP hyperparameter transfer pipeline. All five base checkpoints are released under Apache 2.0.

Project page: https://www.datadoghq.com/blog/ai/toto-2/ · GitHub: https://github.com/DataDog/toto (437 stars)

Community

Emaad · Paper author, paper submitter · May 21, 2026

Toto 2.0 is designed to answer a simple and open question: Can time series foundation models (TSFMs) improve as they scale?

Our results show they can. The highlights:

- Scaling that works. Every size improves on the one below it, with no sign of saturation at 2.5B.
- Best in class on every benchmark we tested. Toto 2.0 takes the top spots on BOOM (Datadog's observability forecasting benchmark), GIFT-Eval (the standard general-purpose benchmark), and TIME (a new contamination-resistant zero-shot benchmark).
- A generational jump from Toto 1.0. Toto 2.0 is 7× more parameter-efficient at matching quality and dramatically faster at inference time.
- Trained on observability and synthetic data, generalizes broadly. Toto 2.0 does not see any public forecasting data during pretraining, yet leads the field on general-purpose benchmarks.
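
For readers who want to check a claim like "every size improves on the one below it" against their own evaluation numbers, here is a minimal Python sketch. It is not the authors' analysis code. The function names are hypothetical, and the numbers in `placeholder` are synthetic values generated from an assumed power law, not results from the paper. Only the endpoints (4M and 2.5B parameters) and the count of five models come from the abstract; the intermediate sizes are made up for illustration.

```python
import math


def improves_at_every_size(results):
    """Return True if error strictly decreases as parameter count grows.

    `results` is a list of (param_count, error) pairs; lower error is better.
    """
    ordered = sorted(results)
    return all(later[1] < earlier[1] for earlier, later in zip(ordered, ordered[1:]))


def loglog_slope(results):
    """Least-squares slope of log(error) against log(param_count).

    A clearly negative slope with no flattening at the largest sizes is one
    informal way to read "no sign of saturation".
    """
    xs = [math.log(n) for n, _ in results]
    ys = [math.log(e) for _, e in results]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# SYNTHETIC placeholder data: error = 2.0 * N ** -0.05 (assumed form, not measured).
# Intermediate sizes are hypothetical; only 4M and 2.5B appear in the abstract.
sizes = [4e6, 2e7, 1e8, 5e8, 2.5e9]
placeholder = [(n, 2.0 * n ** -0.05) for n in sizes]

monotone = improves_at_every_size(placeholder)
slope = loglog_slope(placeholder)
```

With real benchmark scores substituted for `placeholder`, `monotone` answers the first highlight directly, and `slope` gives a rough single-number summary of how quickly error falls with scale.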