Hugging Face Daily Papers · 6 min read

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.30344

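Published on May 28, 2026 · Submitted by Ismini Lourentzou on May 29, 2026
Authors: Xiaona Zhou, Muntasir Wahed, Tianjiao Yu, Constantin Brif, Ismini Lourentzou
Organization: PLAN Lab @ University of Illinois Urbana-Champaign
Project page: https://plan-lab.github.io/projects/VisAnom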

Abstract

AI-generated summary: A parameter-efficient vision-language model is developed for time-series anomaly detection using a novel benchmark with natural-language rationales, achieving superior performance and generalization across multiple datasets.

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, a curated benchmark built from public time-series datasets and augmented with high-quality anomaly explanations selected from multiple large VLMs using fine-grained, task-specific rewards. Through fine-tuning on this benchmark, we develop VisAnomReasoner, a parameter-efficient VLM for time-series anomaly detection. Experimental results on VisAnomBench show that VisAnomReasoner achieves more accurate anomaly localization and consistently outperforms all baselines, with improvements of at least 21.23 and 23.87 percentage points in precision and F1, respectively. Additional experiments on the TSB-AD-U benchmark demonstrate strong cross-benchmark generalization, with VisAnomReasoner improving precision and F1 by 9.57 and 13.39 percentage points, respectively.
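The abstract reports gains in precision and F1 for anomaly localization but does not spell out the scoring protocol on this page. As a minimal sketch, assuming point-wise scoring over half-open `(start, end)` index intervals (an assumption, not necessarily the protocol used by VisAnomBench or TSB-AD-U), the two metrics can be computed like this. The helper names `intervals_to_mask` and `precision_recall_f1` are hypothetical.

```python
from typing import List, Tuple

# Assumption: an interval is (start, end) with start inclusive, end exclusive.
Interval = Tuple[int, int]


def intervals_to_mask(intervals: List[Interval], length: int) -> List[bool]:
    """Expand a list of intervals into a per-timestep boolean mask."""
    mask = [False] * length
    for start, end in intervals:
        for t in range(max(0, start), min(length, end)):
            mask[t] = True
    return mask


def precision_recall_f1(
    predicted: List[Interval], truth: List[Interval], length: int
) -> Tuple[float, float, float]:
    """Point-wise precision, recall, and F1 between two interval sets."""
    pred_mask = intervals_to_mask(predicted, length)
    true_mask = intervals_to_mask(truth, length)

    tp = sum(p and g for p, g in zip(pred_mask, true_mask))
    fp = sum(p and not g for p, g in zip(pred_mask, true_mask))
    fn = sum(g and not p for p, g in zip(pred_mask, true_mask))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative values only: a prediction overlapping half of the true anomaly.
scores = precision_recall_f1([(10, 20)], [(15, 25)], length=30)
print(scores)
```

Under this reading, a "percentage point" improvement is the plain difference between two such scores multiplied by 100, not a relative change.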

Community

Ismini Lourentzou (paper submitter) · May 29, 2026

We introduce Tiny but Trusted, a parameter-efficient vision-language framework for time-series anomaly detection with grounded reasoning, enabling compact VLMs to provide accurate and explainable reasoning over sequential data. Instead of treating anomaly detection as interval prediction alone, we construct VisAnomBench, a curated benchmark from public time-series datasets augmented with natural-language anomaly rationales selected using fine-grained, task-specific rewards from multiple large VLMs. Fine-tuning on this benchmark yields VisAnomReasoner, a lightweight VLM that jointly localizes abnormal temporal regions and explains the underlying pattern shifts. The model improves anomaly localization and interpretability while outperforming strong baselines on VisAnomBench and generalizing to TSB-AD-U.
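The reward-based selection of rationales described above can be pictured as a best-of-N choice over candidates produced by several VLMs. The sketch below is illustrative only: the `Candidate` type, the reward terms (interval IoU plus keyword coverage), and the 0.7/0.3 weights are all assumptions made for this example, since the page does not specify the actual reward design.

```python
from dataclasses import dataclass
from typing import List, Tuple

Interval = Tuple[int, int]  # assumption: (start inclusive, end exclusive)


@dataclass
class Candidate:
    source: str         # hypothetical label for the VLM that produced it
    interval: Interval  # anomaly interval proposed by that VLM
    rationale: str      # natural-language explanation


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two half-open intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def reward(c: Candidate, truth: Interval, keywords: List[str]) -> float:
    """Hypothetical reward: localization quality plus keyword coverage."""
    localization = interval_iou(c.interval, truth)
    text = c.rationale.lower()
    coverage = (
        sum(k.lower() in text for k in keywords) / len(keywords)
        if keywords
        else 0.0
    )
    # The 0.7 / 0.3 weighting is arbitrary and chosen only for illustration.
    return 0.7 * localization + 0.3 * coverage


def select_best(
    candidates: List[Candidate], truth: Interval, keywords: List[str]
) -> Candidate:
    """Keep the candidate rationale with the highest reward."""
    return max(candidates, key=lambda c: reward(c, truth, keywords))


candidates = [
    Candidate("vlm_a", (15, 25), "Sudden spike above the seasonal baseline."),
    Candidate("vlm_b", (0, 5), "The series looks odd here."),
]
best = select_best(candidates, truth=(15, 25), keywords=["spike", "baseline"])
print(best.source, best.rationale)
```

A real pipeline would likely score rationales with finer-grained criteria than keyword matching; the point here is only the shape of the selection step, where the annotated interval anchors the reward and the top-scoring explanation is kept for fine-tuning.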

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

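The following papers were recommended by the Semantic Scholar API:

* Reasoning-Guided Grounding: Elevating Video Anomaly Detection through Multimodal Large Language Models (2026), https://huggingface.co/papers/2605.02912
* LATERN: Test-Time Context-Aware Explainable Video Anomaly Detection (2026), https://huggingface.co/papers/2605.15054
* Detecting Time Series Anomalies Like an Expert: A Multi-Agent LLM Framework with Specialized Analyzers (2026), https://huggingface.co/papers/2605.05725
* AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection (2026), https://huggingface.co/papers/2605.30140
* IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools (2026), https://huggingface.co/papers/2605.20682
* CoReVAD: A Contextual Reasoning Framework for Training-Free Video Anomaly Detection (2026), https://huggingface.co/papers/2605.23116
* Anomaly-Aware Vision-Language Adapters for Zero-Shot Anomaly Detection (2026), https://huggingface.co/papers/2605.12069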

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
