Hugging Face Daily Papers · 6 min read

PreScam: A Benchmark for Predicting Scam Progression from Early Conversations

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

arxiv:2605.12243

Published on May 12 · Submitted by Weixiang Sun on May 15
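Authors: Weixiang Sun, Shang Ma, Yiyang Li, Tianyi Ma, Zehong Wang, Colby Nelson, Xusheng Xiao, Yanfang Ye
Organization: University of Notre Dame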

Abstract

AI-generated summary: The PreScam benchmark enables modeling of scam progression through multi-turn conversations by structuring real-world reports according to a scam kill chain and annotating psychological actions and victim responses.

Conversational scams, such as romance and investment scams, are emerging as a major form of online fraud. Unlike one-shot scam lures such as fake lottery or unpaid toll messages, they unfold through multi-turn conversations in which scammers gradually manipulate victims using evolving psychological techniques. However, existing research mainly focuses on static scam detection or synthetic scams, leaving open whether language models can understand how real-world scams progress over time. We introduce PreScam, a benchmark for modeling scam progression from early conversations. Built from user-submitted scam reports, PreScam filters and structures 177,989 raw reports into 11,573 conversational scam instances spanning 20 scam categories. Each instance is hierarchically structured according to the scam lifecycle defined by the proposed scam kill chain, and further annotated at the turn level with scammer psychological actions and victim responses. We benchmark models on two tasks: real-time termination prediction, which estimates whether a conversation is approaching the termination stage, and scammer action prediction, which forecasts the scammer's subsequent actions. Results show a clear gap between surface-level fluency and progression modeling: supervised encoders substantially outperform zero-shot LLMs on real-time termination prediction, while next-action prediction remains only moderately successful even for strong LLMs. Taken together, these results show that current models can capture some scam-related cues, yet still struggle to track how risk escalates and how manipulation unfolds across turns.
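The two benchmark tasks can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PreScam's released data format or evaluation code. The schema (`Turn`, `Stage`, `ScamInstance`), the stage names, the action labels, and the `horizon` rule deciding when a conversation counts as "approaching termination" are all hypothetical. The sketch shows how a kill-chain-structured instance could be flattened into per-turn prefixes, labeled for real-time termination prediction, and turned into next-action targets for scammer action prediction.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# HYPOTHETICAL schema: field names, stage names and action labels below are
# illustrative placeholders, not PreScam's released data format.


@dataclass
class Turn:
    speaker: str                  # "scammer" or "victim"
    text: str
    label: Optional[str] = None   # scammer psychological action or victim response


@dataclass
class Stage:
    name: str                     # a kill-chain stage (placeholder names)
    turns: List[Turn] = field(default_factory=list)


@dataclass
class ScamInstance:
    category: str                 # one of the scam categories
    stages: List[Stage]           # ordered along the scam lifecycle

    def flat_turns(self) -> List[Tuple[str, Turn]]:
        """Flatten the stage hierarchy into (stage_name, turn) pairs, in order."""
        return [(s.name, t) for s in self.stages for t in s.turns]


def termination_examples(inst: ScamInstance, terminal_stage: str,
                         horizon: int = 2) -> List[Tuple[List[Turn], int]]:
    """Real-time termination prediction as per-prefix binary classification.

    ASSUMPTION: a prefix is labeled 1 when the terminal stage starts within
    `horizon` turns after the prefix ends. Only prefixes that end before the
    terminal stage are emitted, mimicking a model reading the chat as it unfolds.
    """
    turns = inst.flat_turns()
    term_idx = next((i for i, (s, _) in enumerate(turns) if s == terminal_stage), None)
    last = term_idx if term_idx is not None else len(turns)
    examples = []
    for end in range(1, last + 1):
        prefix = [t for _, t in turns[:end]]
        # No terminal stage at all means every prefix is a negative example.
        label = int(term_idx is not None and term_idx - end < horizon)
        examples.append((prefix, label))
    return examples


def next_action_examples(inst: ScamInstance) -> List[Tuple[List[Turn], str]]:
    """Scammer action prediction: given the history so far, predict the
    label of the next scammer turn."""
    turns = [t for _, t in inst.flat_turns()]
    return [(turns[:i], t.label) for i, t in enumerate(turns)
            if i > 0 and t.speaker == "scammer" and t.label is not None]


def accuracy(preds: List[str], golds: List[str]) -> float:
    """Exact-match accuracy; only one of several metrics one might use."""
    if not golds:
        return 0.0
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


# Toy instance with made-up text and labels.
demo = ScamInstance("investment", [
    Stage("contact", [Turn("scammer", "Hi, is this Anna?", "rapport"),
                      Turn("victim", "No, wrong number.", "engage")]),
    Stage("hook", [Turn("scammer", "I trade crypto, want tips?", "lure"),
                   Turn("victim", "Maybe, tell me more.", "engage")]),
    Stage("termination", [Turn("scammer", "Deposit today or miss out.", "payment_request")]),
])

print([label for _, label in termination_examples(demo, "termination")])  # [0, 0, 1, 1]
print([target for _, target in next_action_examples(demo)])  # ['lure', 'payment_request']
```

Emitting one example per prefix is what makes the termination task "real-time" in this sketch: a classifier is scored on every partial conversation rather than only on the full transcript. The paper's actual labeling rule, splits, and metrics may differ.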
