Tag

Code

24 articles archived under #code · RSS

arXiv — Machine Learning research 16h ago

Primal Generation, Dual Judgment: Self-Training from Test-Time Scaling

arXiv:2605.11299v1 Announce Type: new Abstract: Code generation is typically trained in the primal space of programs: a model produces a candidate solution and receives sparse execution feedback, often a single pass/fail bit. Test-time scaling enriches the inference procedure by…

32

arXiv — NLP / Computation & Language research 16h ago

An Empirical Study of Automating Agent Evaluation

arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate…

5
ThursdAI news-outlet 12d ago

📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more

From Weights & Biases - one last one for April, with incredible AI news, a monthly recap and Max from Pangram as a guest + I gave OpenClaw a credit card!

21
Vercel — AI dev-tools 13d ago

Custom tags available in beta on Vercel Sandbox

As teams scale isolated environments for AI agents, code generation, or dev workflows, keeping track of which sandbox belongs to whom, and why, becomes critical. Custom tags allow you to organize, filter, and manage Vercel Sandboxes at scale. Each sandbox supports up to five…

29
The Algorithmic Bridge news-outlet 19d ago

Weekly Top Picks #119

SpaceX + Cursor + Mistral / Jensen v Jensen / The job AI can't take / GPT-5.5 and ChatGPT Images 2.0 / An anti-grammar app / Terence Tao on the future

20
Latent.Space news-outlet 20d ago

AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026)

Note: This episode was recorded just after AIE Europe, but before the Cursor-xAI deal.

37
Latent.Space news-outlet 21d ago

[AINews] OpenAI launches GPT-Image-2

with Cursor getting a $10B contract with xAI and a right to acquire for $60B.

33
NVIDIA Developer Blog official-blog 23d ago

Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments

AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating...

4
Vercel — AI dev-tools 1mo ago

GLM 5V Turbo on AI Gateway

GLM 5V Turbo from Z.ai is now available on Vercel AI Gateway. GLM 5V Turbo is a multimodal coding model that turns screenshots and designs into code, debugs visually, and operates GUIs autonomously. It's strong at design-to-code generation, visual code generation, and…

26
Smol AI News news-outlet 1mo ago

not much happened today

**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming…

12
Smol AI News news-outlet 1mo ago

not much happened today

**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into…

36
ThursdAI news-outlet 1mo ago

ThursdAI - Opus 1M, Jensen declares OpenClaw as the new Linux, GPT 5.4 Mini & Nano, Minimax 2.7, Composer 2 & more AI news

From Weights & Biases, here's what happened in AI this week. Jensen goes ClawPilled with NemoClaw, new smaller GPT 5.4s, MiniMax autoresearches 3.7 and Composer 2 from Cursor beats Opus + more AI

15
Smol AI News news-outlet 1mo ago

not much happened today

**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into…

36
Vercel — AI dev-tools 1mo ago

Introducing the Vercel plugin for coding agents

Claude Code and Cursor can now further understand Vercel projects using the new Vercel plugin and a full platform knowledge graph. The plugin observes real-time activity, including file edits and terminal commands, to dynamically inject Vercel knowledge into the agent's context.…

28
Smol AI News news-outlet 2mo ago

Claude Code Anniversary + Launches from: Qwen 3.5, Cursor Demos, Cognition Devin 2.2, Inception Mercury 2

**Alibaba** launched the **Qwen 3.5 Medium Model Series** featuring models like **Qwen3.5-Flash**, **Qwen3.5-35B-A3B (MoE)**, and **Qwen3.5-122B-A10B (MoE)** emphasizing efficiency over scale with innovations like **1M context** and INT4 quantization. **OpenAI** released…

14
Smol AI News news-outlet 3mo ago

not much happened today

**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as…

34
Zed Editor dev-tools 3mo ago

Choose Your Edit Prediction Provider

Zed now supports multiple edit prediction providers: Zeta, Mercury Coder, Sweep, Ollama, and GitHub Copilot Next-Edit.

8
Smol AI News news-outlet 3mo ago

not much happened today

**Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor**…

25
Smol AI News news-outlet 3mo ago

not much happened today.

**OpenAI** launched **GPT-5.2-Codex** API, touted as their strongest coding model for long-running tasks and cybersecurity. **Cursor** integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. **GitHub** incorporated it into…

19
Smol AI News news-outlet 4mo ago

not much happened today

**AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding…

26
Aider releases dev-tools 9mo ago

Aider v0.86.0

Added support for all GPT-5 models. Added support for Grok-4 via xai/grok-4 and openrouter/x-ai/grok-4 model names. Added support for gemini/gemini-2.5-flash-lite-preview-06-17 model, by Tamir Zahavi-Brunner. /clear now prints “All chat history cleared.” so you know it worked,…

13
Eugene Yan research 20mo ago

Building the Same App Using Various Web Frameworks

FastAPI, FastHTML, Next.js, SvelteKit, and thoughts on how coding assistants influence builders' choices.

35
Eugene Yan research 35mo ago

Obsidian-Copilot: An Assistant for Writing & Reflecting

Writing drafts via retrieval-augmented generation. Also reflecting on the week's journal entries.

15
Eugene Yan research 40mo ago

Mechanisms for Effective Machine Learning Projects

Pilot & copilot, literature review, methodology review, and timeboxing.

4
