Tag: Code
24 articles archived under #code

arXiv — Machine Learning · research · 16h ago · 32
Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling
arXiv:2605.11299v1 Announce Type: new Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enriches the inference procedure by…

arXiv — NLP / Computation & Language · research · 16h ago · 5
An Empirical Study of Automating Agent Evaluation
arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate…

ThursdAI · news-outlet · 12d ago · 21
📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more
From Weights & Biases - one last one for April, with incredible AI news, a monthly recap and Max from Pangram as a guest + I have OpenClaw a credit card!

Vercel — AI · dev-tools · 13d ago · 29
Custom tags available in beta on Vercel Sandbox
As teams scale isolated environments for AI agents, code generation, or dev workflows, keeping track of which sandbox belongs to whom, and why, becomes critical. Custom tags allow you to organize, filter, and manage Vercel Sandboxes at scale. Each sandbox supports up to five…

The Algorithmic Bridge · news-outlet · 19d ago · 20
Weekly Top Picks #119
SpaceX + Cursor + Mistral / Jensen v Jensen / The job AI can't take / GPT-5.5 and ChatGPT Images 2.0 / An anti-grammar app / Terence Tao on the future

Latent.Space · news-outlet · 20d ago · 37
AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)
Note: This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.

Latent.Space · news-outlet · 21d ago · 33
[AINews] OpenAI launches GPT-Image-2 with Cursor getting a $10B contract with xAI and a right to acquire for $60B.

NVIDIA Developer Blog · official-blog · 23d ago · 4
Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments
AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating...

Vercel — AI · dev-tools · 1mo ago · 26
GLM 5V Turbo on AI Gateway
GLM 5V Turbo from Z.ai is now available on Vercel AI Gateway. GLM 5V Turbo is a multimodal coding model that turns screenshots and designs into code, debugs visually, and operates GUIs autonomously. It's strong at design-to-code generation, visual code generation, and…

Smol AI News · news-outlet · 1mo ago · 12
not much happened today
**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming…

Smol AI News · news-outlet · 1mo ago · 36
not much happened today
**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into…

ThursdAI · news-outlet · 1mo ago · 15
ThursdAI - Opus 1M, Jensen declares OpenClaw as the new Linux, GPT 5.4 Mini & Nano, Minimax 2.7, Composer 2 & more AI news
From Weights & Biases, here's what happened in AI this week. Jensen goes ClawPilled with NemoClaw, new smaller GPT 5.4s, MiniMax autoresearches 3.7 and Composer 2 from Cursor beats Opus + more AI

Smol AI News · news-outlet · 1mo ago · 36
not much happened today
**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into…

Vercel — AI · dev-tools · 1mo ago · 28
Introducing the Vercel plugin for coding agents
Claude Code and Cursor can now further understand Vercel projects using the new Vercel plugin and a full platform knowledge graph. The plugin observes real-time activity, including file edits and terminal commands, to dynamically inject Vercel knowledge into the agent's context.…

Smol AI News · news-outlet · 2mo ago · 14
Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2
**Alibaba** launched the **Qwen 3.5 Medium Model Series** featuring models like **Qwen3.5-Flash**, **Qwen3.5-35B-A3B (MoE)**, and **Qwen3.5-122B-A10B (MoE)** emphasizing efficiency over scale with innovations like **1M context** and INT4 quantization. **OpenAI** released…

Smol AI News · news-outlet · 3mo ago · 34
not much happened today
**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as…

Zed Editor · dev-tools · 3mo ago · 8
Choose Your Edit Prediction Provider
Zed now supports multiple edit prediction providers: Zeta, Mercury Coder, Sweep, Ollama, and GitHub Copilot Next-Edit.

Smol AI News · news-outlet · 3mo ago · 25
not much happened today
**Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor**…

Smol AI News · news-outlet · 3mo ago · 19
not much happened today.
**OpenAI** launched **GPT-5.2-Codex** API, touted as their strongest coding model for long-running tasks and cybersecurity. **Cursor** integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. **GitHub** incorporated it into…

Smol AI News · news-outlet · 4mo ago · 26
not much happened today
**AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding…

Aider releases · dev-tools · 9mo ago · 13
Aider v0.86.0
Added support for all GPT-5 models. Added support for Grok-4 via xai/grok-4 and openrouter/x-ai/grok-4 model names. Added support for gemini/gemini-2.5-flash-lite-preview-06-17 model, by Tamir Zahavi-Brunner. /clear now prints “All chat history cleared.” so you know it worked,…

Eugene Yan · research · 20mo ago · 35
Building the Same App Using Various Web Frameworks
FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.

Eugene Yan · research · 35mo ago · 15
Obsidian-Copilot: An Assistant for Writing & Reflecting
Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.

Eugene Yan · research · 40mo ago · 4
Mechanisms for Effective Machine Learning Projects
Pilot & copilot, literature review, methodology review, and timeboxing.