Hugging Face Daily Papers · May 26, 2026 · 4 min read

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.


arxiv:2605.22880

Published on May 20 · Submitted by Daniel Ruiz on May 26

USC Information Sciences Institute

Authors: Daniel C. Ruiz, Anna Serbina, Ashwin Rao, Emilio Ferrara, Luca Luceri

Abstract

AI-generated summary: Open-source large language models exhibit varying political expressivity and vulnerability to jailbreak techniques, necessitating systematic red-teaming frameworks for assessing their potential misuse in influence campaigns.

As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity. In pursuit of this goal, we focus on locally deployed open-source LLMs, as opposed to frontier API-only models, given their superior alignment with the operational constraints of privacy-conscious malicious actors deployed in social media environments. We introduce an empirical red-teaming framework for measuring LLM Overton Windows (OWs), defined as the range of political opinions a model can reliably express on controversial topics, and for quantifying how simple natural-language jailbreaks expand that range. We evaluate more than 30 LLMs spanning 10 model families and five countries of origin. We find systematic asymmetries in political expressivity: open-source LLMs are typically more willing to generate left-leaning social media content, OWs tend to contract inversely to model size, and regional differences are substantial despite uneven representation in the open-source ecosystem. Jailbreak potency also varies sharply across model families, motivating a workflow for identifying effective combinations of jailbreak techniques. Taken together, our results establish a practical framework for auditing the political steerability of open-source LLMs and for helping future researchers design stronger countermeasures against LLM-enabled influence campaigns.
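The abstract defines an Overton Window as the range of political opinions a model can reliably express on a topic, and measures jailbreaks by how much they expand that range. The sketch below illustrates one way such a measurement could be computed; it is not the authors' method. Everything in it is a hypothetical assumption: the seven-point stance scale (-3 strongly left to +3 strongly right), the per-stance "compliance rate" (fraction of trials in which the model produced content at that stance), the 0.8 reliability threshold, and the input numbers, which are made up and are not results from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OvertonWindow:
    # Hypothetical: stance positions (-3 = strongly left ... +3 = strongly right)
    # that the model expressed reliably on one topic.
    expressible: list[int]

    @property
    def width(self) -> int:
        # Number of reliably expressible stance positions.
        return len(self.expressible)

    @property
    def bounds(self) -> tuple[int, int] | None:
        # Most extreme left and right positions reached, or None if empty.
        if not self.expressible:
            return None
        return min(self.expressible), max(self.expressible)


def overton_window(compliance: dict[int, float], threshold: float = 0.8) -> OvertonWindow:
    """Keep stances whose compliance rate meets an assumed reliability threshold."""
    return OvertonWindow(sorted(s for s, rate in compliance.items() if rate >= threshold))


def left_right_asymmetry(window: OvertonWindow) -> int:
    """Positive when more left-leaning (negative) stances are expressible than right-leaning ones."""
    left = sum(1 for s in window.expressible if s < 0)
    right = sum(1 for s in window.expressible if s > 0)
    return left - right


# Made-up compliance rates for one model on one topic (illustrative only).
baseline = {-3: 0.9, -2: 0.95, -1: 1.0, 0: 1.0, 1: 0.85, 2: 0.4, 3: 0.1}
# Made-up rates for the same model after a prompt intervention such as a jailbreak.
prompted = {-3: 0.95, -2: 1.0, -1: 1.0, 0: 1.0, 1: 0.95, 2: 0.85, 3: 0.3}

before = overton_window(baseline)
after = overton_window(prompted)
print(before.bounds, before.width, left_right_asymmetry(before))  # (-3, 1) 5 2
print(after.bounds, after.width, "expansion:", after.width - before.width)  # (-3, 2) 6 expansion: 1
```

Counting positions, rather than taking max minus min, keeps the width honest if a model's reliable stances are not contiguous; a real audit would also need many trials per stance and per topic before a threshold like this is meaningful.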


Community

ZQ-Dev · Paper submitter · about 5 hours ago

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity. Inspired by this idea, we detail a practical framework for auditing the political steerability of open-source LLMs and for helping future researchers design stronger countermeasures against LLM-enabled influence campaigns.

Paper: https://arxiv.org/abs/2605.22880
Code: https://github.com/SIGNALS-Lab/llm-overton-external


Get this paper in your agent:

hf papers read 2605.22880

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 0

Discussion (0)
