Tag: Code
106 articles archived under #code

Simon Willison · community · 1mo ago
Microsoft Copilot Cowork Exfiltrates Files
The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork (yes, that's a real product name) was allowing agents to send emails…

Hugging Face Daily Papers · research · 1mo ago
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
AI-generated summary: CoSPlay is a GT-free framework that jointly improves code generation and unit test quality through cooperative self-play, achieving competitive performance without ground-truth unit tests.
Abstract: Recently, Reinforcement Learning with Verifiable Rewards…

arXiv — NLP / Computation & Language · research · 1mo ago
Eureka: Intelligent Feature Engineering for Enterprise AI Cloud Resource Demand Prediction
arXiv:2605.25297v1 · Announce Type: new
Abstract: Effective features are crucial for predictive model performance, but creating them often requires domain expertise, limiting scalability across applications. We define feature engineering as an agentic code generation problem:…

Hacker News — AI on Front Page · community · 1mo ago
Microsoft Copilot Cowork Exfiltrates Files
Article URL: https://www.promptarmor.com/resources/microsoft-copilot-cowork-exfiltrates-files
Comments URL: https://news.ycombinator.com/item?id=48272354
Points: 201 · Comments: 43

arXiv — NLP / Computation & Language · research · 1mo ago
RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation
arXiv:2605.22937v1 · Announce Type: new
Abstract: Inference-time scaling can reduce errors in structured query generation, but methods to allocate the compute for query code generation remains underexplored.
We study Text2Cypher, where language models generate Cypher queries that…

arXiv — NLP / Computation & Language · research · 1mo ago
CoSPlay: Cooperative Self-Play at Test-Time with Self-Generated Code and Unit Test
arXiv:2605.23491v1 · Announce Type: cross
Abstract: Recently, Reinforcement Learning with Verifiable Rewards (RLVR) and Test-Time Scaling (TTS) have advanced LLM code generation through executable verification. Yet Ground-Truth Unit Tests (GT UTs) remain a bottleneck: SOTA RLVR…

Hacker News — AI on Front Page · community · 1mo ago
Constraint Decay: The Fragility of LLM Agents in Back End Code Generation
Article URL: https://arxiv.org/abs/2605.06445
Comments URL: https://news.ycombinator.com/item?id=48256912
Points: 232 · Comments: 126

r/LocalLLaMA · community · 1mo ago
Top 10 Fastest Growing AI repos this week
Curated this list of fastest growing AI repos. They are mostly AI coding agents, personal AI, memory, browser automation, Claude Skills and local-first dev tooling: colbymchenry/codegraph (+14.1K stars) Pre-indexed local code knowledge graph for Claude Code, Codex, Cursor,…

r/LocalLLaMA · community · 1mo ago
Anyone evaluated the difference between Qwen Code for the local qwen models vs another harness?
CC, OC, LC, Aider, etc. For me, opencode is doing fantastically, but I was wondering if Qwen Code would be more native and have better functionality, since I don't know which agentic harness they used to get their benchmark results.
Submitted by /u/EggDroppedSoup

The Information — AI · news-outlet · 1mo ago
Cursor Sees Opening as GitHub Flounders
Microsoft's GitHub unit has been on the defensive lately.
Amid a series of outages and other snags, Jay Parikh, who oversees the software-project management platform, recently warned deputies that coding tools from Cursor and Anthropic could eventually make GitHub obsolete, my…

r/LocalLLaMA · community · 1mo ago
Same task in github-copilot, pi, claude-code, and opencode with Qwen3.6 27B
I wanted to know how much of a coding agent's performance came from the model and how much came from the harness, so I vibed a setup to allow me to test multiple agentic harness/model combinations on the same task. All the images above come from the same model, but with a…

arXiv — Machine Learning · research · 1mo ago
LEAP: A closed-loop framework for perovskite precursor additive discovery
arXiv:2605.20242v1 · Announce Type: new
Abstract: Efficient discovery of precursor additives is essential for improving the performance of perovskite solar cells, yet the large chemical space makes conventional trial-and-error screening inefficient. We develop LEAP (LLM-driven…

arXiv — NLP / Computation & Language · research · 1mo ago
DEL: Digit Entropy Loss for Numerical Learning of Large Language Models
arXiv:2605.20369v1 · Announce Type: new
Abstract: Number prediction stands as a fundamental capability of large language models (LLMs) in mathematical problem-solving and code generation.
The widely adopted maximum likelihood estimation (MLE) for LLM training is not tailored to…

r/LocalLLaMA · community · 1mo ago
How can you stop your model from looping
So I thought this was a small-model issue, but after adding a new GPU I can run a low-to-mid model like Qwen 3.6 35B at q4 or q5, and the issue still exists. It's not as bad as with the small model, but it does break when linking the model to Copilot Chat or Hermes: the model, mid-task, will…

The Information — AI · news-outlet · 1mo ago
SpaceX to Acquire Cursor 30 Days After IPO
SpaceX and Cursor expect to proceed with their planned acquisition 30 days after SpaceX begins trading publicly, according to someone familiar with the matter. SpaceX is expected to go public in mid-June in the largest IPO in U.S. history. The Elon Musk-founded rockets-and-AI…

VentureBeat — AI · news-outlet · 1mo ago
Google just redesigned the search box for the first time in 25 years: here’s why it matters more than you think.
For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin white rectangle, a blinking cursor, a few typed words, and a list of blue links. On Tuesday, Google will formally retire that paradigm. At its annual I/O developer…

The Information — AI · news-outlet · 1mo ago
How Microsoft, Meta, xAI Get AI Training Data From Their Employees
Microsoft’s GitHub Copilot may have lost much of its early lead in the AI coding race to rivals like Anthropic and Cursor, but Microsoft thinks it has an advantage over those companies: roughly 100,000 software engineers who work for Microsoft. As we reported Monday, Microsoft…

r/LocalLLaMA · community · 1mo ago
Public Repository "Codegraph" claims to reduce Claude, Cursor, Codex, and OpenCode API tool calls by 94% locally, an innovation that could directly offset the most recent Claude API pricing model.
Author Colbymchenry has developed a tool leveraging Claude's Explore Agents to utilize a pre-indexed knowledge graph: symbol relationships, call graphs, and code structure. Agents query the graph instantly instead of scanning files, which he declares reduces API tool calls by up…

arXiv — NLP / Computation & Language · research · 1mo ago
Constrained Code Generation with Discrete Diffusion
arXiv:2605.16829v1 · Announce Type: new
Abstract: Discrete diffusion models are a powerful, emerging paradigm for code generation. They construct programs through iterative refinement of partially corrupted token sequences and enable parallel token refinement. Importantly, this…

Hacker News — AI on Front Page · community · 1mo ago
Cursor Introduces Composer 2.5
Article URL: https://twitter.com/cursor_ai/status/2056415413077233983
Comments URL: https://news.ycombinator.com/item?id=48182516
Points: 215 · Comments: 164

GitHub Blog — AI & ML · official-blog · 1mo ago
Take your local GitHub sessions anywhere
Kick off work in VS Code or the CLI, finish it from your phone. Remote control for GitHub Copilot sessions is now generally available on github.com and GitHub Mobile.

The Information — AI · news-outlet · 1mo ago
Microsoft Executives Sound the Alarm Over GitHub’s Eroding AI Lead
No part of Microsoft better illustrates its predicament in AI than GitHub. The AI boom has boosted usage and revenue of the code repository as well as GitHub Copilot, an AI coding assistant. But GitHub has struggled to respond to new AI coding competitors that have since…

r/LocalLLaMA · community · 1mo ago
I built a coding agent that gets 87% on benchmarks with a 4B parameter model, here's how
I was frustrated that every coding agent (OpenCode, Cursor, Claude Code) assumes you're running GPT-5.4 or Claude Opus. If you try them with a local model like Gemma or Qwen they fall apart.
I find that tool calls often fail, context overflows, and multi-step tasks collapse. So I…

arXiv — NLP / Computation & Language · research · 1mo ago
Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language
arXiv:2605.15607v1 · Announce Type: new
Abstract: Large language models (LLMs) achieve high pass rates on code generation benchmarks, yet whether they can transfer this ability to languages absent from pretraining remains poorly understood. We introduce PyLang, a minimal…

Hugging Face Daily Papers · research · 1mo ago
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution
AI-generated summary: Solvita is an agentic evolution framework that enables continuous learning in code generation through reinforcement learning updates to graph-structured knowledge networks, achieving state-of-the-art performance on competitive programming benchmarks.

r/LocalLLaMA · community · 1mo ago
Moving from Composer 2/Kimi 2.6 to Qwen3.6:35b-a3b
I can't believe it, but I'm able to do my daily software development work on this model. We have a 500-700k line of code enterprise software suite that I'm devving for 60 hours a week. I've been hunting for a Cursor replacement for a little bit now, and was previously toying…

r/LocalLLaMA · community · 1mo ago
I built a self-hosted open-source MCP server that gives any local LLM real financial data: SEC filings, 13F, insider & congressional trades, short data, FRED
One thing missing when running local models as agents: real, current data. So I built Equibles, a self-hosted MCP server that scrapes and serves public U.S.
financial data and exposes it as MCP tools, so any MCP-capable client (Claude Code/Desktop, Cursor, or your own…

arXiv — NLP / Computation & Language · research · 1mo ago
When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context
arXiv:2605.14478v1 · Announce Type: cross
Abstract: Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states. Objectives: We study whether temporally stale repository snippets act as harmless…

r/LocalLLaMA · community · 1mo ago
VS Code's new "Agents window" lets you use local AI models. Still requires an Internet connection and a GitHub Copilot plan (because we can't have nice things)
At first I was excited to see this, but I guess I'll wait till someone figures out what people actually want.
Submitted by /u/_wsgeorge

r/LocalLLaMA · community · 1mo ago
Computer-use MCP that can control multiple machines (Integrate with Claude, Cursor, Codex or your custom harness)
Hey everyone, we built opendesk: it lets AI agents control your desktop using a computer-use MCP that can integrate with your custom workflow. Today we shipped something a bit wild: Your AI can now see, click, type, and navigate on a completely different computer, over your WiFi.…

Smol AI News · news-outlet · 1mo ago
not much happened today
OpenAI expanded Codex integration with the ChatGPT mobile app enabling remote task management and introduced Remote SSH, hooks, and programmatic tokens for enterprise automation.
The IDE ecosystem is shifting to "agent-first" UX with GitHub Copilot App preview and…

Smol AI News · news-outlet · 1mo ago
not much happened today
Cline, LangChain, Notion, and Cursor advanced agent infrastructure and developer platforms with innovations like Cline SDK, LangSmith Engine, SmithDB (offering 12–15× faster observability), and Notion's External Agents API integrating third-party agents such…

arXiv — Machine Learning · research · 1mo ago
Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling
arXiv:2605.11299v1 · Announce Type: new
Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enriches the inference procedure by…

arXiv — NLP / Computation & Language · research · 1mo ago
An Empirical Study of Automating Agent Evaluation
arXiv:2605.11378v1 · Announce Type: new
Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate…

ThursdAI · news-outlet · 2mo ago
📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more
From Weights & Biases - one last one for April, with incredible AI news, a monthly recap and Max from Pangram as a guest + I have OpenClaw a credit card!

Vercel — AI · dev-tools · 2mo ago
Custom tags available in beta on Vercel Sandbox
As teams scale isolated environments for AI agents, code generation, or dev workflows, keeping track of which sandbox belongs to whom, and why, becomes critical. Custom tags allow you to organize, filter, and manage Vercel Sandboxes at scale.
Each sandbox supports up to five…

The Algorithmic Bridge · news-outlet · 2mo ago
Weekly Top Picks #119
SpaceX + Cursor + Mistral / Jensen v Jensen / The job AI can't take / GPT-5.5 and ChatGPT Images 2.0 / An anti-grammar app / Terence Tao on the future

Latent.Space · news-outlet · 2mo ago
AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)
Note: This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.

Latent.Space · news-outlet · 2mo ago
[AINews] OpenAI launches GPT-Image-2 with Cursor getting a $10B contract with xAI and a right to acquire for $60B.

NVIDIA Developer Blog · official-blog · 2mo ago
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating...

Vercel — AI · dev-tools · 2mo ago
GLM 5V Turbo on AI Gateway
GLM 5V Turbo from Z.ai is now available on Vercel AI Gateway. GLM 5V Turbo is a multimodal coding model that turns screenshots and designs into code, debugs visually, and operates GUIs autonomously. It's strong at design-to-code generation, visual code generation, and…

Smol AI News · news-outlet · 3mo ago
not much happened today
Anthropic advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. Figma, GitHub, and Cursor launch design canvases with direct AI editing, showcasing tool-calling becoming…

Smol AI News · news-outlet · 3mo ago
not much happened today
Cursor's Composer 2, built on Kimi K2.5, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning.
Claude Code is expanding into…

ThursdAI · news-outlet · 3mo ago
ThursdAI - Opus 1M, Jensen declares OpenClaw as the new Linux, GPT 5.4 Mini & Nano, Minimax 2.7, Composer 2 & more AI news
From Weights & Biases, here's what happened in AI this week. Jensen goes ClawPilled with NemoClaw, new smaller GPT 5.4s, MiniMax autoresearches 3.7 and Composer 2 from Cursor beats Opus + more AI

Smol AI News · news-outlet · 3mo ago
not much happened today
Cursor launched Composer 2, a frontier-class coding model with major cost reductions and strong benchmark scores like 61.3 on CursorBench and 73.7 on SWE-bench Multilingual. The model was improved via a first continued pretraining run feeding into…

Vercel — AI · dev-tools · 3mo ago
Introducing the Vercel plugin for coding agents
Claude Code and Cursor can now further understand Vercel projects using the new Vercel plugin and a full platform knowledge graph. The plugin observes real-time activity, including file edits and terminal commands, to dynamically inject Vercel knowledge into the agent's context.…

Smol AI News · news-outlet · 4mo ago
Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2
Alibaba launched the Qwen 3.5 Medium Model Series featuring models like Qwen3.5-Flash, Qwen3.5-35B-A3B (MoE), and Qwen3.5-122B-A10B (MoE) emphasizing efficiency over scale with innovations like 1M context and INT4 quantization. OpenAI released…

Smol AI News · news-outlet · 4mo ago
not much happened today
OpenAI launched GPT-5.3-Codex with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces.
The model is rolling out across Cursor, VS Code, and GitHub with phased API access and is flagged as…

Zed Editor · dev-tools · 4mo ago
Choose Your Edit Prediction Provider
Zed now supports multiple edit prediction providers: Zeta, Mercury Coder, Sweep, Ollama, and GitHub Copilot Next-Edit.

Smol AI News · news-outlet · 5mo ago
not much happened today
Anthropic launches "Claude in Excel Pro" with enhanced features. OpenAI reveals upcoming Codex agent loop and cybersecurity measures. Google boosts Gemini App quotas and partners with Sakana AI for advanced AI Scientist projects in Japan. Cursor…

Page 2 of 3