Tag

Code

106 articles archived under #code · RSS

Simon Willison community 1mo ago

Microsoft Copilot Cowork Exfiltrates Files

The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork (yes, that's a real product name) was allowing agents to send emails…

21
Hugging Face Daily Papers research 1mo ago

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

Abstract CoSPlay is a GT-free framework that jointly improves code generation and unit test quality through cooperative self-play, achieving competitive performance without ground-truth unit tests. AI-generated summary Recently, Reinforcement Learning with Verifiable Rewards…

7
arXiv — NLP / Computation & Language research 1mo ago

Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction

arXiv:2605.25297v1 Announce Type: new Abstract: Effective features are crucial for predictive model performance, but creating them often requires domain expertise, limiting scalability across applications. We define feature engineering as an agentic code generation problem:…

35
Hacker News — AI on Front Page community 1mo ago

Microsoft Copilot Cowork Exfiltrates Files

Article URL: https://www.promptarmor.com/resources/microsoft-copilot-cowork-exfiltrates-files Comments URL: https://news.ycombinator.com/item?id=48272354 Points: 201 # Comments: 43

33
arXiv — NLP / Computation & Language research 1mo ago

RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

arXiv:2605.22937v1 Announce Type: new Abstract: Inference-time scaling can reduce errors in structured query generation, but methods to allocate the compute for query code generation remain underexplored. We study Text2Cypher, where language models generate Cypher queries that…

26
arXiv — NLP / Computation & Language research 1mo ago

CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test

arXiv:2605.23491v1 Announce Type: cross Abstract: Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR…

8
Hacker News — AI on Front Page community 1mo ago

Constraint Decay: The Fragility of LLM Agents in Back End Code Generation

Article URL: https://arxiv.org/abs/2605.06445 Comments URL: https://news.ycombinator.com/item?id=48256912 Points: 232 # Comments: 126

13
r/LocalLLaMA community 1mo ago

Top 10 Fastest Growing AI repos this week

Curated this list of fastest growing AI repos. They are mostly AI coding agents, personal AI, memory, browser automation, Claude Skills and local-first dev tooling: colbymchenry/codegraph (+14.1K stars) Pre-indexed local code knowledge graph for Claude Code, Codex, Cursor,…

34
r/LocalLLaMA community 1mo ago

Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness? CC, OC, LC, Aider etc..

For me, opencode is doing fantastically, but I was wondering whether Qwen Code would be more native and have better functionality, since I don't know which agentic harness they used to get their benchmark results. Submitted by /u/EggDroppedSoup

13
The Information — AI news-outlet 1mo ago

Cursor Sees Opening as GitHub Flounders

Microsoft's GitHub unit has been on the defensive lately. Amid a series of outages and other snags, Jay Parikh, who oversees the software-project management platform, recently warned deputies that coding tools from Cursor and Anthropic could eventually make GitHub obsolete, my…

12
r/LocalLLaMA community 1mo ago

Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B

I wanted to know how much of a coding agent's performance came from the model and how much came from the harness, so I vibed a setup to allow me to test multiple agentic harness/model combinations on the same task. All the images above come from the same model, but with a…

24
arXiv — Machine Learning research 1mo ago

LEAP: A closed-loop framework for perovskite precursor additive discovery

arXiv:2605.20242v1 Announce Type: new Abstract: Efficient discovery of precursor additives is essential for improving the performance of perovskite solar cells, yet the large chemical space makes conventional trial-and-error screening inefficient. We develop LEAP(LLM-driven…

14
arXiv — NLP / Computation & Language research 1mo ago

DEL: Digit Entropy Loss for Numerical Learning of Large Language Models

arXiv:2605.20369v1 Announce Type: new Abstract: Number prediction stands as a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation. The widely adopted maximum likelihood estimation (MLE) for LLM training is not tailored to…

30
r/LocalLLaMA community 1mo ago

How can you stop your model from looping

So I thought this was a small-model issue, but after I added a new GPU and was able to run a low-to-mid model like Qwen 3.6 35b at q4 or q5, the issue still exists. It's not as frequent as with a small model, but it does break when linking the model to Copilot Chat or Hermes: the model mid-task will…

33
The Information — AI news-outlet 1mo ago

SpaceX to Acquire Cursor 30 Days After IPO

SpaceX and Cursor expect to proceed with their planned acquisition 30 days after SpaceX begins trading publicly, according to someone familiar with the matter. SpaceX is expected to go public in mid-June in the largest IPO in U.S. history. The Elon Musk-founded rockets-and-AI…

36
VentureBeat — AI news-outlet 1mo ago

Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.

For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list of blue links. On Tuesday, Google will formally retire that paradigm. At its annual I/O developer…

32
The Information — AI news-outlet 1mo ago

How Microsoft, Meta, xAI Get AI Training Data From Their Employees

Microsoft’s GitHub Copilot may have lost much of its early lead in the AI coding race to rivals like Anthropic and Cursor, but Microsoft thinks it has an advantage over those companies: roughly 100,000 software engineers who work for Microsoft. As we reported Monday, Microsoft…

16
r/LocalLLaMA community 1mo ago

Public Repository "Codegraph" claims to reduce Claude, Cursor, Codex, and OpenCode API tool calls by 94% locally, an innovation that could directly offset the most recent Claude API pricing model.

Author Colbymchenry has developed a tool that leverages Claude's Explore agents to use a pre-indexed knowledge graph of symbol relationships, call graphs, and code structure. Agents query the graph instantly instead of scanning files, which he says reduces API tool calls by up…

14
arXiv — NLP / Computation & Language research 1mo ago

Constrained Code Generation with Discrete Diffusion

arXiv:2605.16829v1 Announce Type: new Abstract: Discrete diffusion models are a powerful, emerging paradigm for code generation. They construct programs through iterative refinement of partially corrupted token sequences and enable parallel token refinement. Importantly, this…

12
Hacker News — AI on Front Page community 1mo ago

Cursor Introduces Composer 2.5

https://twitter.com/cursor_ai/status/2056415413077233983 Comments URL: https://news.ycombinator.com/item?id=48182516 Points: 215 # Comments: 164

19
GitHub Blog — AI & ML official-blog 1mo ago

Take your local GitHub sessions anywhere

Kick off work in VS Code or the CLI, finish it from your phone. Remote control for GitHub Copilot sessions is now generally available on github.com and GitHub Mobile.

32
The Information — AI news-outlet 1mo ago

Microsoft Executives Sound the Alarm Over GitHub’s Eroding AI Lead

No part of Microsoft better illustrates its predicament in AI than GitHub. The AI boom has boosted usage and revenue of the code repository as well as GitHub Copilot, an AI coding assistant. But GitHub has struggled to respond to new AI coding competitors that have since…

34
r/LocalLLaMA community 1mo ago

I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how

I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart. I find that often tool calls fail, context overflows, multi-step tasks collapse. So I…

12
arXiv — NLP / Computation & Language research 1mo ago

Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language

arXiv:2605.15607v1 Announce Type: new Abstract: Large language models (LLMs) achieve high pass rates on code generation benchmarks, yet whether they can transfer this ability to languages absent from pretraining remains poorly understood. We introduce PyLang, a minimal…

32
Hugging Face Daily Papers research 1mo ago

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution

Abstract Solvita is an agentic evolution framework that enables continuous learning in code generation through reinforcement learning updates to graph-structured knowledge networks, achieving state-of-the-art performance on competitive programming benchmarks. AI-generated…

30
r/LocalLLaMA community 1mo ago

Moving from Composer 2/Kimi 2.6 to Qwen3.6:35b-a3b

I can't believe it, but I'm able to do my daily software development work on this model. We have a 500-700k line of code enterprise software suite that I'm devving for 60 hours a week. I've been hunting for a cursor replacement for a little bit now, and was previously toying…

27
r/LocalLLaMA community 1mo ago

I built a self-hosted open-source MCP server that gives any local LLM real financial data — SEC filings, 13F, insider & congressional trades, short data, FRED

One thing missing when running local models as agents: real, current data. So I built Equibles — a self-hosted MCP server that scrapes and serves public U.S. financial data and exposes it as MCP tools, so any MCP-capable client (Claude Code/Desktop, Cursor, or your own…

30
arXiv — NLP / Computation & Language research 1mo ago

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

arXiv:2605.14478v1 Announce Type: cross Abstract: Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states. Objectives: We study whether temporally stale repository snippets act as harmless…

26
r/LocalLLaMA community 1mo ago

VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a Github Copilot plan (because we can't have nice things)

At first I was excited to see this, but I guess I'll wait till someone figures out what people actually want. Submitted by /u/_wsgeorge

5
r/LocalLLaMA community 1mo ago

Computer-use MCP that can control multiple machines (Integrate with claude, Cursor, Codex or your custom harness)

Hey everyone, We built opendesk: it lets AI agents control your desktop using computer use MCP that can integrate with your custom workflow. Today we shipped something a bit wild: Your AI can now see, click, type, and navigate on a completely different computer, over your WiFi.…

20
Smol AI News news-outlet 1mo ago

not much happened today

**OpenAI** expanded **Codex** integration with the ChatGPT mobile app enabling remote task management and introduced Remote SSH, hooks, and programmatic tokens for enterprise automation. The IDE ecosystem is shifting to "agent-first" UX with **GitHub Copilot App** preview and…

26
Smol AI News news-outlet 1mo ago

not much happened today

**Cline, LangChain, Notion, and Cursor** advanced agent infrastructure and developer platforms with innovations like **Cline SDK**, **LangSmith Engine**, **SmithDB** (offering **12–15×** faster observability), and Notion's External Agents API integrating third-party agents such…

14
arXiv — Machine Learning research 1mo ago

Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

arXiv:2605.11299v1 Announce Type: new Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enriches the inference procedure by…

32
arXiv — NLP / Computation & Language research 1mo ago

An Empirical Study of Automating Agent Evaluation

arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate…

5
ThursdAI news-outlet 2mo ago

📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more

From Weights & Biases - one last one for April, with incredible AI news, a monthly recap and Max from Pangram as a guest + I have OpenClaw a credit card!

21
Vercel — AI dev-tools 2mo ago

Custom tags available in beta on Vercel Sandbox

As teams scale isolated environments for AI agents, code generation, or dev workflows, keeping track of which sandbox belongs to whom, and why, becomes critical. Custom tags allow you to organize, filter, and manage Vercel Sandboxes at scale. Each sandbox supports up to five…

29
The Algorithmic Bridge news-outlet 2mo ago

Weekly Top Picks #119

SpaceX + Cursor + Mistral / Jensen v Jensen / The job AI can't take / GPT-5.5 and ChatGPT Images 2.0 / An anti-grammar app / Terence Tao on the future

20
Latent.Space news-outlet 2mo ago

AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

Note: This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.

37
Latent.Space news-outlet 2mo ago

[AINews] OpenAI launches GPT-Image-2

with Cursor getting a $10B contract with xAI and a right to acquire for $60B.

33
NVIDIA Developer Blog official-blog 2mo ago

Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments

AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating...

4
Vercel — AI dev-tools 2mo ago

GLM 5V Turbo on AI Gateway

GLM 5V Turbo from Z.ai is now available on Vercel AI Gateway. GLM 5V Turbo is a multimodal coding model that turns screenshots and designs into code, debugs visually, and operates GUIs autonomously. It's strong at design-to-code generation, visual code generation, and…

26
Smol AI News news-outlet 3mo ago

not much happened today

**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming…

12
Smol AI News news-outlet 3mo ago

not much happened today

**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into…

36
ThursdAI news-outlet 3mo ago

ThursdAI - Opus 1M, Jensen declares OpenClaw as the new Linux, GPT 5.4 Mini & Nano, Minimax 2.7, Composer 2 & more AI news

From Weights & Biases, here's what happened in AI this week. Jensen goes ClawPilled with NemoClaw, new smaller GPT 5.4s, MiniMax autoresearches 3.7 and Composer 2 from Cursor beats Opus + more AI

15
Smol AI News news-outlet 3mo ago

not much happened today

**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into…

36
Vercel — AI dev-tools 3mo ago

Introducing the Vercel plugin for coding agents

Claude Code and Cursor can now further understand Vercel projects using the new Vercel plugin and a full platform knowledge graph. The plugin observes real-time activity, including file edits and terminal commands, to dynamically inject Vercel knowledge into the agent's context.…

28
Smol AI News news-outlet 4mo ago

Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2

**Alibaba** launched the **Qwen 3.5 Medium Model Series** featuring models like **Qwen3.5-Flash**, **Qwen3.5-35B-A3B (MoE)**, and **Qwen3.5-122B-A10B (MoE)** emphasizing efficiency over scale with innovations like **1M context** and INT4 quantization. **OpenAI** released…

14
Smol AI News news-outlet 4mo ago

not much happened today

**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as…

34
Zed Editor dev-tools 4mo ago

Choose Your Edit Prediction Provider

Zed now supports multiple edit prediction providers: Zeta, Mercury Coder, Sweep, Ollama, and GitHub Copilot Next-Edit.

8
Smol AI News news-outlet 5mo ago

not much happened today

**Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor**…

25
