Tag

Copyright

22 articles archived under #copyright

arXiv — Machine Learning research 1d ago

CBD: API-Only LLM Black-Box Unlearning through Controlled Behavioral Divergence

arXiv:2606.27683v1 Announce Type: new Abstract: Edge devices increasingly invoke large language models (LLMs) through API services for context-aware edge intelligence, while edge-generated data may be collected to improve LLMs and may introduce sensitive, copyrighted, harmful,…

11
arXiv — NLP / Computation & Language research 1d ago

Position: The Term "Machine Unlearning" Is Overused in LLMs

arXiv:2606.27379v1 Announce Type: new Abstract: Large language models increasingly face demands to "forget" training data, knowledge, or behaviors due to regulatory deletion obligations, copyright/licensing disputes, and safety or product-policy requirements. This position paper…

15
Ars Technica — AI news-outlet 3d ago

NYT slams Microsoft for building copyright-infringing supercomputer for OpenAI

NYT shifts OpenAI/Microsoft copyright claims after SCOTUS ruling against Sony.

20
arXiv — NLP / Computation & Language research 12d ago

Output Vector Editing for Memorization Mitigation in Large Language Models

arXiv:2606.18767v1 Announce Type: new Abstract: Large language models memorize and reproduce sequences from their training data, creating privacy, copyright, and security risks. Existing neuron-level mitigation methods equate editing with zeroing out neuron activations, but the…

24
llama.cpp releases dev-tools 12d ago

b9684

[SYCL] Add conv_3d (#24691) add conv_3d optimize update ops.md restore test script rm unused code rm copyright notes macOS/iOS: macOS Apple Silicon (arm64) macOS Apple Silicon (arm64, KleidiAI enabled) DISABLED macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu…

15
Hacker News — AI on Front Page community 19d ago

H.R. 6028 would fundamentally change the U.S. Copyright Office

Article URL: https://www.eff.org/deeplinks/2026/06/congress-just-rushed-through-disastrous-copyright-office-overhaul Comments URL: https://news.ycombinator.com/item?id=48484496 Points: 209 # Comments: 65

25
r/MachineLearning community 22d ago

ICML rejected paper visibility [D]

If an ICML conference paper is rejected and no one opts in or opts out to keep the reviews visible, will the reviews be visible to everyone? There was a clear instruction that only papers with at least 1 opt-in AND zero opt-out options will be visible. None of the authors selected…

7
arXiv — Machine Learning research 22d ago

Where Rectified Flows Leak: Characterising Membership Signals Along the Interpolation Path

arXiv:2606.07271v1 Announce Type: new Abstract: Understanding what generative models retain from training data remains challenging, with implications for copyright and privacy. Beyond verbatim reproduction, models can encode subtler traces of their training data that never…

12
TechCrunch — AI news-outlet 26d ago

Publishers will be able to opt out of AI Search, thanks to new regulation

U.K. regulators are requiring Google to offer a tool that lets website publishers opt out of generative AI search features. The option will be tested in the U.K., then rolled out globally.

10
arXiv — Machine Learning research 28d ago

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows

arXiv:2606.00140v1 Announce Type: new Abstract: While the rapid adoption of multimodal generative models offers immense potential, it has also increased the risks of harmful content synthesis, deepfakes, and copyright infringements. To address these challenges, concept erasure…

21
arXiv — NLP / Computation & Language research 29d ago

Divergence Decoding: Inference-Time Unlearning via Auxiliary Models

arXiv:2605.31293v1 Announce Type: new Abstract: Large Language Models (LLMs) frequently memorize sensitive training data, thereby creating significant privacy and copyright risks. Addressing these risks, i.e., removing such knowledge from an existing model checkpoint, has proven…

34
arXiv — Machine Learning research 1mo ago

Localizing Memorized Regions in Diffusion Models via Coordinate-Wise Curvature Differences

arXiv:2605.26756v1 Announce Type: new Abstract: Diffusion models can unintentionally memorize training samples, raising concerns about privacy and copyright. While recent methods can detect memorization, they often rely on global or model-specific signals and provide limited…

30
arXiv — NLP / Computation & Language research 1mo ago

Translators as Invisible Teachers of AI: Copyright, Translation Memory, and the Political Economy of Linguistic Data

arXiv:2605.24842v1 Announce Type: new Abstract: This paper examines how the labour of translators has been transformed into foundational data capital for the age of artificial intelligence (AI). Translation memories (TM) and parallel corpora preserve a one-to-one correspondence…

6
Hacker News — AI on Front Page community 1mo ago

Show HN: Auto-identity-remove – Automated data broker opt-out runner for macOS

Article URL: https://github.com/stephenlthorn/auto-identity-remove Comments URL: https://news.ycombinator.com/item?id=48178184 Points: 282 # Comments: 112

15
arXiv — NLP / Computation & Language research 1mo ago

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

arXiv:2506.01732v3 Announce Type: replace Abstract: Large Language Models (LLMs) are pre-trained on large amounts of data from different sources and domains. Such datasets often contain trillions of tokens, including large portions of copyrighted or proprietary content, which…

11
Ars Technica — AI news-outlet 1mo ago

Authors fight for higher payouts from Anthropic’s $1.5B copyright settlement

Lawyers accused of rushing historic settlement to seize $320 million in fees.

8
arXiv — NLP / Computation & Language research 1mo ago

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

arXiv:2605.14291v1 Announce Type: cross Abstract: The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing…

8
arXiv — Machine Learning research 1mo ago

Inference-Time Machine Unlearning via Gated Activation Redirection

arXiv:2605.12765v1 Announce Type: new Abstract: Large Language Models memorize vast amounts of training data, raising concerns regarding privacy, copyright infringement, and safety. Machine unlearning seeks to remove the influence of a targeted forget set while preserving model…

10
arXiv — NLP / Computation & Language research 1mo ago

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

arXiv:2605.11685v1 Announce Type: new Abstract: Large language model (LLM) unlearning aims to remove specific data influences from a pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical…

17
Vercel — AI dev-tools 2mo ago

Team-wide Zero Data Retention and prompt training controls now on AI Gateway

AI Gateway now supports Zero Data Retention (ZDR) at the team level, removing the need to configure opt-outs or reach agreements with each provider individually. It routes requests only to providers where ZDR agreements are in place, with support for Anthropic, OpenAI, Google,…

35
Smol AI News news-outlet 5mo ago

not much happened today

Stanford paper reveals Claude 3.7 Sonnet memorized 95.8% of Harry Potter 1, highlighting copyright extraction risks compared to GPT-4.1. Google AI Studio sponsors TailwindCSS amid OSS funding debates. Google and Sundar Pichai launch Gmail Gemini…

21
Eugene Yan research 27mo ago

Task-Specific LLM Evals that Do & Don't Work

Evals for classification, summarization, translation, copyright regurgitation, and toxicity.

9
