Agents + tool use 238 articles archived under #agents OpenAI official-blog -1723m ago Building a safe, effective sandbox to enable Codex on Windows Learn how OpenAI built a secure sandbox for Codex on Windows, enabling safe, efficient coding agents with controlled file access and network restrictions. 15 Simon Willison community 3h ago Quoting Boris Mann “11 AI agents” is meaningless as a phrase. If I said “I have 11 spreadsheets” or “I have 11 browser tabs” to do my work, it means about the same thing. — Boris Mann Tags: ai-agents, ai, agent-definitions 27 NVIDIA Developer Blog official-blog 4h ago Transform Video Into Instantly Searchable, Actionable Intelligence with AI Agents and Skills In today’s data-driven world, organizations increasingly rely on video to capture critical information, yet extracting meaningful, real-time insights from... 20 OpenAI official-blog 5h ago Claude 4 announced — context window doubles, agentic tools land Anthropic published the Claude 4 release notes today, doubling the context window to 400K tokens and shipping native tool-use across the API + Claude.ai web client. 18 2 Hugging Face official-blog 6h ago Cursor 0.50 ships — agent mode + multi-file edits The Cursor IDE team rolled out 0.50 with a redesigned agent panel, multi-file refactor flow, and native support for Anthropic + OpenAI tool-use APIs. 6 2 Perplexity official-blog 8h ago GitHub Copilot now supports MCP servers natively A new section in the Copilot settings panel lets users wire arbitrary MCP servers into their agent context — first wave of MCP-native IDE integrations. 22 Stack Overflow Blog news 11h ago How Braze’s CTO is rethinking engineering for the agentic era Jon Hyman, co-founder and CTO of Braze, shares how he's led the company's engineering organization over nearly 15 years of growth — and how they transformed into an AI-first team in just a few months. 
10 r/LocalLLaMA community 11h ago How many of you tried BeeLlama.cpp? How's it? Agentic coding possible with 8GB VRAM? We'll be getting those features(check bottom link) on mainline soon or later anyway. But for now this fork could be useful to see the full potential of our poor GPUs(and also big, large GPUs). Any 8GB VRAM(and 32GB RAM) folks already doing Agentic coding with models(@ Q4 at… 12 arXiv — Machine Learning research 15h ago SkillGen: Verified Inference-Time Agent Skill Synthesis arXiv:2605.10999v1 Announce Type: new Abstract: Skills are a promising way to improve LLM agent capabilities without retraining, while keeping the added procedure reusable and controllable. However, high-quality skills are still largely written by hand. We introduce SkillGen, a… 33 arXiv — Machine Learning research 15h ago GRAFT-ATHENA: Self-Improving Agentic Teams for Autonomous Discovery and Evolutionary Numerical Algorithms arXiv:2605.11117v1 Announce Type: new Abstract: Scientific discovery can be modeled as a sequence of probabilistic decisions that map physical problems to numerical solutions. Recent agentic AI systems automate individual scientific tasks by orchestrating LLM-driven planners,… 22 arXiv — Machine Learning research 15h ago Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe arXiv:2605.11469v1 Announce Type: new Abstract: Decentralized multi-agent path finding (MAPF) routes a team of agents on a shared grid, each acting from its own local view. The standard solution trains one shared neural policy with Proximal Policy Optimization (PPO), a popular… 20 arXiv — Machine Learning research 15h ago CTFusion: A CTF-based Benchmark for LLM Agent Evaluation arXiv:2605.11504v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) have enabled agentic systems for complex, multi-step tasks; cybersecurity is emerging as a prominent application. 
To evaluate such agents, researchers widely adopt Capture The Flag… 23 arXiv — NLP / Computation & Language research 15h ago ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction arXiv:2605.11212v1 Announce Type: new Abstract: Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cost increases rapidly,… 11 arXiv — NLP / Computation & Language research 15h ago An Empirical Study of Automating Agent Evaluation arXiv:2605.11378v1 Announce Type: new Abstract: Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate… 5 arXiv — NLP / Computation & Language research 15h ago Deep Reasoning in General Purpose Agents via Structured Meta-Cognition arXiv:2605.11388v1 Announce Type: new Abstract: Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified… 5 arXiv — NLP / Computation & Language research 15h ago Learning Agentic Policy from Action Guidance arXiv:2605.12004v1 Announce Type: new Abstract: Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically depends on the exploration capability of the base policy, as training signals emerge only within its in-capability region. 
For tasks where the base… 12 arXiv — NLP / Computation & Language research 15h ago SkillGraph: Skill-Augmented Reinforcement Learning for Agents via Evolving Skill Graphs arXiv:2605.12039v1 Announce Type: new Abstract: Skill libraries enable large language model agents to reuse experience from past interactions, but most existing libraries store skills as isolated entries and retrieve them only by semantic similarity. This leads to two key… 11 arXiv — NLP / Computation & Language research 15h ago PRISM: Pareto-Efficient Retrieval over Intent-Aware Structured Memory for Long-Horizon Agents arXiv:2605.12260v1 Announce Type: new Abstract: Long-horizon language agents accumulate conversation history far faster than any fixed context window can hold, making memory management critical to both answer accuracy and serving cost. Existing approaches either expand the… 8 arXiv — NLP / Computation & Language research 15h ago LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues arXiv:2605.12493v1 Announce Type: new Abstract: Long-term memory is crucial for agents in specialized web environments, where success depends on recalling interface affordances, state dynamics, workflows, and recurring failure modes. However, existing memory benchmarks for… 17 arXiv — NLP / Computation & Language research 15h ago AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents arXiv:2605.11026v1 Announce Type: cross Abstract: Defenses against indirect prompt injection (IPI) in tool-using LLM agents share two structural weaknesses. First, they all attempt to prevent attacks rather than detect the compromises that slip through. 
Second, they have only… 21 arXiv — NLP / Computation & Language research 15h ago On Problems of Implicit Context Compression for Software Engineering Agents arXiv:2605.11051v1 Announce Type: cross Abstract: LLM-based Software Engineering agents face a critical bottleneck: context length limitations cause failures on complex, long-horizon tasks. One promising solution is to encode context as continuous embeddings rather than discrete… 27 arXiv — NLP / Computation & Language research 15h ago PresentAgent-2: Towards Generalist Multimodal Presentation Agents arXiv:2605.11363v1 Announce Type: cross Abstract: Presentation generation is moving beyond static slide creation toward end-to-end presentation video generation with research grounding, multimodal media, and interactive delivery. We introduce PresentAgent-2, an agentic framework… 30 arXiv — NLP / Computation & Language research 15h ago Test-Time Compute for Dense Retrieval: Agentic Program Generation with Frozen Embedding Models arXiv:2605.11374v1 Announce Type: cross Abstract: Test-time compute is widely believed to benefit only large reasoning models. We show it also helps small embedding models. Most modern embedding checkpoints are distilled from large LLM backbones and inherit their representation… 21 arXiv — NLP / Computation & Language research 15h ago Can a Single Message Paralyze the AI Infrastructure? The Rise of AbO-DDoS Attacks through Targeted Mobius Injection arXiv:2605.11442v1 Announce Type: cross Abstract: Large Language Model (LLM) agents have emerged as key intermediaries, orchestrating complex interactions between human users and a wide range of digital services and LLM infrastructures. 
While prior research has extensively… 20 arXiv — NLP / Computation & Language research 15h ago AutoLLMResearch: Training Research Agents for Automating LLM Experiment Configuration -- Learning from Cheap, Optimizing Expensive arXiv:2605.11518v1 Announce Type: cross Abstract: Effectively configuring scalable large language model (LLM) experiments, spanning architecture design, hyperparameter tuning, and beyond, is crucial for advancing LLM research, as poor configuration choices can waste substantial… 13 arXiv — NLP / Computation & Language research 15h ago Controllable User Simulation arXiv:2605.11519v1 Announce Type: cross Abstract: Using offline datasets to evaluate conversational agents often fails to cover rare scenarios or to support testing new policies. This has motivated the use of controllable user simulators for targeted, counterfactual evaluation,… 20 TechCrunch — AI news-outlet 18h ago Medicare’s new payment model is built for AI, and most of the tech world has no idea There is no governmental mechanism to pay for an AI agent that monitors a patient between visits, calls to check in, coordinates a housing referral, or makes sure someone picks up their medication. ACCESS creates that mechanism for the first time. 12 Hacker News — Front Page community 1d ago Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the little… 28 r/LocalLLaMA community 1d ago Needle: We Distilled Gemini Tool Calling Into a 26M Model We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. 
We were always frustrated by the little effort made towards building agentic models that run on budget phones, so we conducted… 4 r/LocalLLaMA community 1d ago Agentic harness for theoretical physics research Hi everyone, at Hugging Face we've been developing agentic harnesses for various domains and today we're releasing physics-intern to tackle research-level problems in theoretical physics. It's a multi-agent framework which we designed to mimic the research process and decomposes… 16 TechCrunch — AI news-outlet 1d ago Everything Google announced at its Android Show, from Googlebooks to vibe-coded widgets Google unveiled its new AI-first Googlebooks laptops, more agentic Gemini features, vibe-coded Android widgets, Gemini in Chrome, refreshed Android Auto, and more ahead of I/O. 32 TechCrunch — AI news-outlet 1d ago Google brings agentic AI and vibe-coded widgets to Android Gemini Intelligence will also include Gboard-based dictation and form-filling capabilities. 30 r/LocalLLaMA community 1d ago Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM Today I set up a full coding toolbox on a single RTX 5080 (with RAM offloading) that's actually viable. Autocomplete : bartowski/Qwen2.5-Coder-7B-Instruct-GGUF:Q6_K_L Agentic : unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL Why these models: Qwen2.5 is still the best model for infill… 9 LangChain releases dev-tools 1d ago langchain==1.3.0 This release adds support for version="v3" in stream_events / astream_events for langchain agents. Refer to the event streaming guide for details. 30 Vercel — AI dev-tools 1d ago Manage Vercel Firewall in the CLI You can now manage the Vercel Firewall directly from the CLI. Using the vercel firewall command, you can configure custom rules , IP blocks , system bypasses , attack mode , and system mitigations . 
Building on the new CLI commands, the Vercel Firewall skill lets agents interact… 31 Simon Willison community 1d ago Thoughts on GitLab's "workforce reduction" and "structural and strategic decisions" GitLab Act 2 There's a lot going on in this announcement from GitLab about the "workforce reduction" and "structural and strategic decisions" they are making with respect to the agentic era. They're "planning to reduce the number of countries by up to 30% where we have small… 35 TechCrunch — AI news-outlet 1d ago GM just laid off hundreds of IT workers to hire those with stronger AI skills Some of the positions focus on AI-native development, data engineering and analytics, cloud-based engineering, and agent and model development as well as prompt engineering and new AI workflows. 20 The Algorithmic Bridge news-outlet 2d ago How to Stop AI Agents From Frying Your Brain Avoid the latest AI-induced disease 5 Stack Overflow Blog news 2d ago When the Sensor Starts Thinking: SnortML, Agentic AI, and the Evolving Architecture of Intrusion Detection Signature-based detection has always known what it was looking for. Machine learning and autonomous agents are changing the question entirely, shifting from "does this match a known pattern?" to "does this actually make… 8 Simon Willison community 2d ago Learning on the Shop floor Tobias Lütke describes Shopify's internal coding agent tool, River, which operates entirely in public on their Slack: River does not respond to direct messages. She politely declines and suggests to create a public channel for you and her to start… 19 Vercel — AI dev-tools 3d ago How Superset built the IDE for AI agents on Vercel Superset on Vercel 1,000–1,400 deployments per week ~600 preview deployments per day ~30 second average build time 57–64% week-over-week DAU growth Software development with AI started as a single engineer chatting with a single agent about a local repo. 
Today, developers direct… 5 NVIDIA Developer Blog official-blog 5d ago Improving Bash Generation in Small Language Models with Grammar-Constrained Decoding Bash is one of the most flexible and powerful interfaces exposed to AI agents. In the right system, a model that emits grep, curl, tar, or a shell pipeline is... 16 Don't Worry About the Vase community 5d ago Claude Code, Codex and Agentic Coding #8 When I started this series, everyone was going crazy for coding agents. 20 Stack Overflow Blog news 5d ago No Dumb Questions: What is an MCP server and why do I care? Welcome to No Dumb Questions, a column where our least technical writer asks our technical staff the simple, basic tech questions people are afraid to ask. In this first entry, Stack's Director of Ecosystem… 9 NVIDIA Developer Blog official-blog 5d ago Streaming Tokens and Tools: Multi-Turn Agentic Harness Support in NVIDIA Dynamo An agentic exchange must preserve a structured interaction: assistant turns interleave reasoning with one or more tool calls, and subsequent user turns return... 11 OpenAI news 5d ago Running Codex safely at OpenAI How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption. 23 Smol AI News news-outlet 5d ago not much happened today **OpenAI** rapidly expanded the **GPT-5.5** family with multiple variants including **gpt-image-2**, **GPT-5.5 Pro**, and **GPT-5.5 Cyber**, receiving positive feedback for efficiency and usability. **Codex** evolved into a long-running agent runtime with a new **/goal**… 35 Vercel — AI dev-tools 5d ago Chat SDK adds Messenger adapter support Chat SDK now supports Messenger as a chat adapter. Build agents that support messages, reactions, multimedia downloads, postback buttons, and direct conversations, with display names fetched automatically from user profiles. 
Read the Chat SDK documentation to get started, browse… 13 Vercel — AI dev-tools 5d ago Chat SDK adds web adapter support You can now build chat UIs that connect to Chat SDK with the new web adapter. Build in-product assistants, support agents, or any other browser-based chat experience. Define the bot on your server: Then stream replies live to the browser with a preconfigured @ai-sdk/react… 36 GitHub Blog — AI & ML official-blog 5d ago Improving token efficiency in GitHub Agentic Workflows Agentic workflows that run on every pull request can quietly accumulate large API bills. Here's how we instrumented our own production workflows, found the inefficiencies, and built agents to fix them. 39 GitHub Blog — AI & ML official-blog 6d ago Agent pull requests are everywhere. Here’s how to review them. A practical guide to reviewing agent-generated pull requests: what to look for, where issues hide, and how to catch technical debt before it ships. 31 Smol AI News news-outlet 6d ago GPT-Realtime-2, -Translate, and -Whisper: new SOTA realtime voice APIs **OpenAI** released **GPT-Realtime-2**, a voice model with **GPT-5-class reasoning**, tool use, interruption handling, and extended context windows up to **128K tokens**, achieving top scores on **Big Bench Audio** and **Conversational Dynamics** benchmarks. They also launched a… 21 GitHub Blog — AI & ML official-blog 6d ago Validating agentic behavior when “correct” isn’t deterministic How to build the “Trust Layer” for GitHub Copilot Coding Agents without brittle scripts or black-box judgements by using dominator analysis. 37 Ars Technica — AI news-outlet 7d ago Anthropic's Claude Managed Agents can now "dream," sort of Also, 5-hour usage limits will double for Pro and Max users of Claude Code. 
14 Anthropic SDK (Python) releases dev-tools 7d ago v0.100.0 0.100.0 (2026-05-06) Full Changelog: v0.99.0...v0.100.0 Features api: add support for Managed Agents multiagents and outcomes, webhooks, vault validation ( 3b3deee ) Bug Fixes api: Adjust webhook configuration ( 8c3339e ) 13 Simon Willison community 7d ago Vibe coding and agentic engineering are getting closer than I'd like I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison . Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started… 25 Google DeepMind official-blog 7d ago AlphaEvolve: How our Gemini-powered coding agent is scaling impact across fields Explore how AlphaEvolve's Gemini-powered algorithms are driving impact across business, infrastructure, and science. 20 OpenAI news 7d ago How frontier firms are pulling ahead OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage. 10 Marcus on AI community 8d ago Breaking: Autonomous Agents are a Shitshow Sorry to use a technical term in the title 6 NVIDIA Developer Blog official-blog 8d ago How to Build In-Vehicle AI Agents with NVIDIA: From Cloud to Car The automotive cockpit is undergoing a fundamental shift from rule-based interfaces to agentic, multimodal AI systems capable of reasoning, planning, and... 10 NVIDIA Developer Blog official-blog 8d ago Building for the Rising Complexity of Agentic Systems with Extreme Co-Design Generative AI’s explosive first chapter was defined by humans sending requests and models responding. The agentic chapter is different. Agents don't... 32 Vercel — AI dev-tools 8d ago Query observability metrics using the Vercel CLI You can now access Observability Plus metrics in the Vercel CLI. 
Query observability data for any Vercel team or project using the new vercel metrics command. Coding agents can also leverage this new command to better analyze the performance, reliability, or security of… 31 OpenAI news 8d ago OpenAI and PwC collaborate to reimagine the office of the CFO OpenAI and PwC are partnering to help enterprises use AI agents to automate finance workflows, improve forecasting, strengthen controls, and modernize the CFO function. 20 NVIDIA Developer Blog official-blog 8d ago Optimize Supply Chain Decision Systems Using NVIDIA cuOpt Agent Skills Modern supply chains operate under the constant pressures of fluctuating demand, volatile costs, constrained capacity, and interdependent decision-making.... 23 Anthropic SDK (Python) releases dev-tools 9d ago v0.98.0 0.98.0 (2026-05-04) Full Changelog: v0.97.0...v0.98.0 Features api: improve Managed Agents APIs ( 7faf393 ) client: add Workload Identity Federation, interactive OAuth, and auth profiles ( 6458bcc ) support setting headers via env ( 52eb8cd ) Bug Fixes streaming: propagate… 13 Vercel — AI dev-tools 9d ago How General Intelligence used agents to build an agent platform on Vercel General Intelligence on Vercel 8-person team (5 engineers) shipping 10 PRs and 70+ commits per engineer, per day 4,000+ preview branches with ~100 parallel app versions running at any moment 90% of SRE work automated through Vercel and their own agent (Cofounder) Cofounder… 27 Vercel — AI dev-tools 9d ago Introducing deepsec: The security harness for finding vulnerabilities in your codebase Today we’re open sourcing deepsec : a security harness powered by coding agents. It runs on your own infrastructure and surfaces hard-to-find issues in large codebases. You can run deepsec on your laptop without setting up a cloud service for privileged source code access. 
For… 38 Latent.Space news-outlet 11d ago [AINews] AI Engineer World's Fair — Autoresearch, Memory, World Models, Tokenmaxxing, Agentic Commerce, and Vertical AI Call for Speakers a quiet day lets us make a call for speakers! 24 Latent.Space news-outlet 12d ago [AINews] Agents for Everything Else: Codex for Knowledge Work, Claude for Creative Work a quiet day lets us reflect on coding agents "breaking containment" 13 ThursdAI news-outlet 12d ago 📅 ThursdAI - Apr 30 - DeepSeek V4 (1.6T MoE), Cursor SDK Wins WolfBench, Mayo's REDMOD Saves Lives, Stripe Gives Agents a Wallet & more From Weights & Biases - one last one for April, with incredible AI news, a monthly recap and Max from Pangram as a guest + I gave OpenClaw a credit card! 21 Stack Overflow Blog news 13d ago The Worst Coder in the World goes agentic: building a leaderboard cracking AI Agents are everywhere, so isn't it fitting that the Worst Coder in the World goes agentic? A coding newbie explores the challenges and rewards of building an agent for work—and trying to learn a few things about… 27 NVIDIA Developer Blog official-blog 13d ago Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl NVIDIA CUDA Tile (cuTile) is a tile-based programming model that enables developers to write GPU kernels in terms of tile-level operations—loads, stores, and... 15 Vercel — AI dev-tools 13d ago Custom tags available in beta on Vercel Sandbox As teams scale isolated environments for AI agents, code generation, or dev workflows, keeping track of which sandbox belongs to whom, and why, becomes critical. Custom tags allow you to organize, filter, and manage Vercel Sandboxes at scale. Each sandbox supports up to five… 29 NVIDIA Developer Blog official-blog 14d ago Powering AI Factories with NVIDIA Enterprise Reference Architectures The next wave of enterprise productivity is being built on AI factories. As organizations deploy agentic AI systems capable of reasoning, automation, and... 
23 Vercel — AI dev-tools 14d ago Vercel now supports Pro plan in Stripe Projects You can now sign up for or upgrade to a Vercel Pro plan directly from Stripe Projects using shared payment tokens (SPTs). Agents and developers can manage plan changes programmatically from the Stripe CLI, without leaving their workflow. What’s new Provision or upgrade to Vercel… 28 NVIDIA Developer Blog official-blog 15d ago NVIDIA Nemotron 3 Nano Omni Powers Multimodal Agent Reasoning in a Single Efficient Open Model Agentic systems often reason across screens, documents, audio, video, and text within a single perception‑to‑action loop. However, they still rely on... 7 NVIDIA Developer Blog official-blog 15d ago 24/7 Simulation Loops: How Agentic AI Keeps Subsurface Engineering Moving The subsurface industry is at a critical point in its digital evolution. For decades, unlocking reservoir potential has relied on experts performing essential... 20 OpenAI news 15d ago OpenAI models, Codex, and Managed Agents come to AWS OpenAI GPT models, Codex, and Managed Agents are now available on AWS, enabling enterprises to build secure AI in their AWS environments. 18 OpenAI news 16d ago An open-source spec for orchestration: Symphony Learn how Symphony, an open-source spec for Codex orchestration, turns issue trackers into always-on agent systems—boosting engineering output and reducing context switching. 21 OpenAI news 16d ago Choco automates food distribution with AI agents How Choco used OpenAI APIs to streamline food distribution, boost productivity, and unlock growth—an in-depth customer story on real-world AI impact. 5 Vercel — AI dev-tools 19d ago GPT 5.5 on AI Gateway GPT-5.5 is now available on Vercel AI Gateway . There are 2 variants: GPT-5.5 and GPT-5.5 Pro. 
Both models are tuned for long-running agentic work across coding, computer use, knowledge work, and scientific research, and are more token-efficient than the previous generation.… 37 NVIDIA Developer Blog official-blog 19d ago Winning a Kaggle Competition with Generative AI–Assisted Coding In March 2026, three LLM agents generated over 600,000 lines of code, ran 850 experiments, and helped secure a first-place finish in a Kaggle playground... 11 Latent.Space news-outlet 19d ago AIE Europe Debrief + Agent Labs Thesis: Unsupervised Learning x Latent Space Crossover Special (2026) Note: This episode was recorded just after AIE Europe, but before the Cursor-xAI deal. 37 Vercel — AI dev-tools 20d ago Deepseek V4 on AI Gateway DeepSeek V4 is now available on Vercel AI Gateway . There are 2 model variants: DeepSeek V4 Pro and DeepSeek V4 Flash. A 1M token context window is the default across both models. DeepSeek V4 Pro focuses on agentic coding, formal mathematical reasoning, and long-horizon… 27 Smol AI News news-outlet 20d ago GPT 5.5 **OpenAI launched GPT-5.5** as its new flagship model for "real work and powering agents," immediately available in ChatGPT and Codex but with delayed API access due to enhanced safety requirements. The model features improved token efficiency and supports longer multi-step… 14 OpenAI news 21d ago Speeding up agentic workflows with WebSockets in the Responses API A deep dive into the Codex agent loop, showing how WebSockets and connection-scoped caching reduced API overhead and improved model latency. 31 OpenAI news 21d ago Workspace agents Learn how to build, use, and scale workspace agents in ChatGPT to automate repeatable workflows, connect tools, and streamline team operations. 5 OpenAI news 21d ago Introducing workspace agents in ChatGPT Workspace agents in ChatGPT are Codex-powered agents that automate complex workflows, run in the cloud, and help teams scale work across tools securely. 
29 Zed Editor dev-tools 21d ago Introducing Parallel Agents in Zed Run multiple agents at once, in the same window. 8 NVIDIA Developer Blog official-blog 23d ago Mitigating Indirect AGENTS.md Injection Attacks in Agentic Environments AI tools are significantly accelerating software development and changing how developers work with code. These tools serve as real-time copilots, automating... 4 NVIDIA Developer Blog official-blog 25d ago Full-Stack Optimizations for Agentic Inference with NVIDIA Dynamo Coding agents are starting to write production code at scale. Stripe’s agents generate 1,300+ PRs per week. Ramp attributes 30% of merged PRs to agents.... 14 NVIDIA Developer Blog official-blog 26d ago Build a More Secure, Always-On Local AI Agent with OpenClaw and NVIDIA NemoClaw Agents are evolving from question-and-answer systems into long-running autonomous assistants that read files, call APIs, and drive multi-step workflows.... 10 NVIDIA Developer Blog official-blog 27d ago How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents Developing real-time vision AI applications presents a significant challenge for developers, often demanding intricate data pipelines, countless lines of code,... 11 Vercel — AI dev-tools 27d ago Claude Opus 4.7 on AI Gateway Claude Opus 4.7 from Anthropic is now available on Vercel AI Gateway . Opus 4.7 is optimized for long-running, asynchronous agents and handles complex, multi-step tasks with reliable agentic execution. The model shows gains on knowledge-worker tasks, particularly where it needs… 6 Smol AI News news-outlet 27d ago Anthropic's Claude Opus 4.7 **Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. 
Benchmarks show substantial gains, including **SWE-bench Pro… 37 Don't Worry About the Vase community 28d ago Claude Code, Codex and Agentic Coding #7: Auto Mode As we all try to figure out what Mythos means for us down the line, the world of practical agentic coding continues, with the latest array of upgrades. 7 Stack Overflow Blog news 28d ago Human input needed: take our survey on AI agents Are you still "human-in-the-loop," or have you moved to "human-on-the-loop," overseeing a bot that’s doing the driving? 12 Hugging Face official-blog 28d ago Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents 7 OpenAI news 28d ago The next evolution of the Agents SDK OpenAI updates the Agents SDK with native sandbox execution and a model-native harness, helping developers build secure, long-running agents across files and tools. 34 Smol AI News news-outlet 28d ago not much happened today **OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes,… 37 GitHub Blog — AI & ML official-blog 29d ago Hack the AI agent: Build agentic AI security skills with the GitHub Secure Code Game Learn to find and exploit real-world agentic AI vulnerabilities through five progressive challenges in this free, open source game that over 10,000 developers have already used to sharpen their security skills. The post… 20 Vercel — AI dev-tools 1mo ago Copy-to-Prompt instructions now available for Flags The feature flags details page now includes copy-to-prompt instructions in the instructions pane. You or your agent can install the Flags SDK, link the project using the Vercel CLI , and add the required flag definitions to the code base. 
Teams that prefer manual configuration… 9 Import AI news-outlet 1mo ago Import AI 453: Breaking AI agents; MirrorCode; and ten views on gradual disempowerment Was fire equivalent to a singularity for people at the time? 38 OpenAI news 1mo ago Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI Cloudflare brings OpenAI’s GPT-5.4 and Codex to Agent Cloud, enabling enterprises to build, deploy, and scale AI agents for real-world tasks with speed and security. 20 Smol AI News news-outlet 1mo ago not much happened today **Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. **OpenAI's Codex** is expanding agentic coding workflows beyond software engineering, including codebase… 32 NVIDIA Developer Blog official-blog 1mo ago MiniMax M2.7 Advances Scalable Agentic Workflows on NVIDIA Platforms for Complex AI Applications The release of MiniMax M2.7 adds enhancements to the popular MiniMax M2.5 model, built for agentic harnesses,... 33 Stack Overflow Blog news 1mo ago Gen Z needs a knowledge base (and so do you) AI tool use is inescapable...especially if you're a young person trying to get an edge in an increasingly difficult job market. But cognitive offloading is dangerous, no matter what age you are. Building a knowledge base can save your brain and skills from atrophy. 13 Vercel — AI dev-tools 1mo ago Agentic Infrastructure Every generation of software eventually demands a new generation of infrastructure. First, we configured servers by hand. Next, the cloud turned infrastructure into APIs. Then, a more important shift: infrastructure derived from the application itself. LLMs and coding agents are… 21 Smol AI News news-outlet 1mo ago not much happened today **Anthropic's Mythos** and **OpenAI's** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. 
**LangChain's Deep Agents deploy** introduces an open memory, model-agnostic agent harness… 36 Zed Editor dev-tools 1mo ago Introducing Zed's Agent Metrics A public, weekly view of AI agent adoption and turn times inside Zed, plus a few patterns worth watching. 19 OpenAI news 1mo ago The next phase of enterprise AI OpenAI outlines the next phase of enterprise AI, as adoption accelerates across industries with Frontier, ChatGPT Enterprise, Codex, and company-wide AI agents. 7 Smol AI News news-outlet 1mo ago not much happened today **Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future… 29 Vercel — AI dev-tools 1mo ago Manage Vercel Microfrontends with AI Agents and the CLI Vercel Microfrontends now include two new setup and management tools: an AI skill for coding agents and new Vercel CLI commands. New Vercel Microfrontends skill: Install the Microfrontends skill to let your AI coding agent guide you through group creation with natural language… 5 Smol AI News news-outlet 1mo ago not much happened today **Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes… 19 Ahead of AI (Sebastian Raschka) research 1mo ago Components of A Coding Agent How coding agents use tools, memory, and repo context to make LLMs work better in practice 7 Smol AI News news-outlet 1mo ago not much happened today **Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. 
It outperforms models 10x larger and has immediate ecosystem support including… 35 Google DeepMind official-blog 1mo ago Gemma 4: Byte for byte, the most capable open models Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows. 7 Vercel — AI dev-tools 1mo ago Build an MCP server with Nuxt Developers building AI features with Nuxt can now create Model Context Protocol (MCP) servers directly within their applications using the Nuxt MCP Toolkit. Install the module The module lets you define tools with Zod validation, expose data as resources, and create reusable… 29 Vercel — AI dev-tools 1mo ago Qwen 3.6 Plus on AI Gateway Qwen 3.6 Plus from Alibaba is now available on Vercel AI Gateway. Compared to Qwen 3.5 Plus, this model adds stronger agentic coding capabilities, from frontend development to repository-level problem solving, along with improved multimodal perception and reasoning. It features… 19 Vercel — AI dev-tools 1mo ago Gemma 4 on AI Gateway Gemma 4 26B (MoE) and 31B (Dense) from Google are now available on Vercel AI Gateway. Built on the same architecture as Gemini 3, both open models support function-calling, agentic workflows, structured JSON output, and system instructions. Both support up to 256K context, 140+… 25 Smol AI News news-outlet 1mo ago not much happened today **Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal… 13 Vercel — AI dev-tools 1mo ago How Waldium made a blog platform work for humans and AI alike Waldium is a two-person, YC-backed startup that built an agentic CMS for businesses.
Co-founded by Amrutha Gujjar and CTO Shivam Singhal, the platform automates content research and creation, and gives every customer blog its own MCP server endpoint so AI agents can query it… 9 OpenAI news 1mo ago Gradient Labs gives every bank customer an AI account manager Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability. 18 Hugging Face official-blog 1mo ago Training mRNA Language Models Across 25 Species for $165 Team Article Published March 31, 2026 Maziyar Panahi MaziyarPanahi OpenMed Part II: Building the Pipeline, From Structure Prediction to Codon Optimization By OpenMed, Open-Source Agentic AI for… 14 Stack Overflow Blog news 1mo ago How can you test your code when you don’t know what’s in it? Ryan hosts SmartBear’s VP of AI and Architecture Fitz Nowlan to explore how we’re moving away from old assumptions about software development, the challenges of testing MCP servers as LLM-driven agents introduce non-determinism that breaks tradition, and how data locality and… 14 Vercel — AI dev-tools 1mo ago How FLORA shipped a creative agent on Vercel's AI stack FLORA on Vercel 2x faster to production with their generation system Zero infrastructure debates after migration 50+ image models orchestrated A seasonal fashion launch is a story, not a single frame. Crafting that story is a process of exploration: It’s the same piece, worn by… 12 Vercel — AI dev-tools 1mo ago Agent responsibly The following is based on an internal talk given at Vercel. We're sharing it publicly because the problem it describes isn't unique to us, and the framework is useful for any team shipping with agents. Coding agents generate code at unprecedented speeds.
In the hands of… 10 Vercel — AI dev-tools 1mo ago Making Turborepo 96% faster with agents, sandboxes, and humans Turborepo is now 81-91% faster to compute its task graph in our repositories, scaling with repo size. On our 1,000+ package monorepo, turbo run now feels instant. Time to First Task is now 11x faster. After testing my changes with some open source Turborepos and asking Vercel… 14 Stack Overflow Blog news 1mo ago Prevent agentic identity theft Ryan is joined by Nancy Wang, CTO of 1Password, to discuss the security challenges local agents present, how enterprises can create robust governance of credentials through zero-knowledge architecture, and the implications of agent intent and misuse in a world where AI agents… 4 Stack Overflow Blog news 1mo ago Building shared coding guidelines for AI (and people too) Coding guidelines and standards for agents need to be a little different—more explicit, demonstrative of patterns, and obvious. 36 Vercel — AI dev-tools 1mo ago Vercel plugin now supported on OpenAI Codex and Codex CLI The Vercel plugin now supports OpenAI Codex and the Codex CLI. With the plugin, teams can access over 39 platform skills, three specialist agents, and real-time code validation to assist with their development workflow. Install it in the Codex app or from the Codex CLI: Learn… 27 NVIDIA Developer Blog official-blog 1mo ago Building NVIDIA Nemotron 3 Agents for Reasoning, Multimodal RAG, Voice, and Safety Agentic AI is an ecosystem where specialized models work together to handle planning, reasoning, retrieval, and safety guardrailing. As these systems scale,... 37 Smol AI News news-outlet 1mo ago not much happened today **Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. 
**Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming… 12 Smol AI News news-outlet 1mo ago not much happened today **ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation generalization and human-like learning efficiency.… 4 Smol AI News news-outlet 1mo ago not much happened today **Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting… 31 Hugging Face official-blog 1mo ago A New Framework for Evaluating Voice Agents (EVA) Enterprise Article Published March 24, 2026 Tara Bogavelli tarabogavelli ServiceNow-AI Gabrielle Gauthier Melancon gabegma ServiceNow-AI Katrina Stankiewicz kstankiewicz ServiceNow-AI Nifemi Bamgbose… 7 Smol AI News news-outlet 1mo ago not much happened today **Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is evolving towards long-running, parallel,… 29 Stack Overflow Blog news 1mo ago After all the hype, was 2025 really the year of AI agents? Ryan is joined by Stefan Weitz, CEO and co-founder of the HumanX Conference, for a conversation on how AI has evolved in the last year. 33 Vercel — AI dev-tools 1mo ago Build knowledge agents without embeddings Most knowledge agents start the same way. You pick a vector database, then build a chunking pipeline. You choose an embedding model, then tune retrieval parameters.
Weeks later, your agent answers a question incorrectly, and you have no idea which chunk it retrieved or why that… 38 Vercel — AI dev-tools 1mo ago Two startups at global scale without DevOps Leonardo.AI processes more than 4.5 million images every day across cities worldwide, and Relevance AI's agents run autonomously across time zones, touching Salesforce, HubSpot, Slack, and dozens of other systems without pause. Neither company has a dedicated DevOps team. That's… 17 Vercel — AI dev-tools 1mo ago Vercel is now available in Stripe Projects You can now signup and deploy to Vercel through Stripe Projects. Available in developer preview, this CLI-based workflow lets teams and AI agents create infrastructure environments directly from the terminal. As a launch and design partner for Stripe Projects, Vercel enables a… 18 Stack Overflow Blog news 1mo ago Building a global engineering team (plus AI agents) with Netlify Dana Lawson, CTO of Netlify, shares her insights on leading a lean, globally distributed engineering team that powers 5% of the internet. 11 Vercel — AI dev-tools 1mo ago Chat SDK brings agents to your users In early January, we gave the entire company a challenge: figure out how to multiply your output. People created agents. Mostly chat bots, but dedicated ones, purpose-built for real workflow augmentation: the agents were doing things automatically that would otherwise be tedious… 10 NVIDIA Developer Blog official-blog 1mo ago How to Build Deep Agents for Enterprise Search with NVIDIA AI-Q and LangChain While consumer AI offers powerful capabilities, workplace tools often suffer from disjointed data and limited context. Built with LangChain, the NVIDIA AI-Q... 30 Interconnects research 1mo ago GPT 5.4 is a big step for Codex On evaluating and understanding the frontier of agents, and why I still turn to Claude. 
30 Vercel — AI dev-tools 1mo ago MiniMax M2.7 is live on AI Gateway MiniMax M2.7 is now available on Vercel AI Gateway in two variants: standard and high-speed. M2.7 is a major step up from previous M2-series models in software engineering, agentic workflows, and professional office tasks. The model natively supports multi-agent collaboration,… 34 Vercel — AI dev-tools 1mo ago 360 billion tokens, 3 million customers, 6 engineers Impact at a glance Durable ships new production agents to customers in a single day AI features and agents serve ~1.1B tokens per day (360B per year) 10x leverage for every engineer, product manager, and designer 3-4x lower infra cost compared to self hosting Durable began with… 5 NVIDIA Developer Blog official-blog 1mo ago Building the AI Grid with NVIDIA: Orchestrating Intelligence Everywhere AI-native services are exposing a new bottleneck in AI infrastructure: As millions of users, agents, and devices demand access to intelligence, the challenge is... 14 Vercel — AI dev-tools 1mo ago Introducing the Vercel plugin for coding agents Claude Code and Cursor can now further understand Vercel projects using the new Vercel plugin and a full platform knowledge graph. The plugin observes real-time activity, including file edits and terminal commands, to dynamically inject Vercel knowledge into the agent's context.… 28 Vercel — AI dev-tools 1mo ago Updates to Terms of Service Agents are reshaping the tools developers use, the applications they build, and the infrastructure that runs them. 
We’ve updated our Terms of Service and Privacy Policy to reflect how Vercel uses data to support agentic features, improve our platform, and contribute to the AI… 24 Hugging Face official-blog 1mo ago Holotron-12B - High Throughput Computer Use Agent Team Article Published March 17, 2026 Pierre-Louis Cedoz plcedoz38 Hcompany Hamza Benchekroun hamza-hcompany Hcompany Aurélien Lac h-aurelien-lac Hcompany delfosse aureliendelfosseathai Hcompany Tony Wu… 6 Vercel — AI dev-tools 1mo ago New GitHub App permissions for Actions and Workflows The Vercel GitHub App now requests two additional repository permissions on install: Actions (read) and Workflows (read & write). These permissions enable Vercel Agent to read workflow run logs to help diagnose CI failures and configure CI workflow files on your behalf. This… 21 NVIDIA Developer Blog official-blog 1mo ago Introducing NVIDIA BlueField-4-Powered CMX Context Memory Storage Platform for the Next Frontier of AI AI‑native organizations increasingly face scaling challenges as agentic AI workflows drive context windows to millions of tokens and models scale toward... 27 NVIDIA Developer Blog official-blog 1mo ago How NVIDIA Dynamo 1.0 Powers Multi-Node Inference at Production Scale Reasoning models are growing rapidly in size and are increasingly being integrated into agentic AI workflows that interact with other models and external tools.... 6 NVIDIA Developer Blog official-blog 1mo ago Scaling Autonomous AI Agents and Workloads with NVIDIA DGX Spark Autonomous AI agents are driving the next wave of AI innovation. These agents must often manage long-running tasks that use multiple communication channels and... 23 NVIDIA Developer Blog official-blog 1mo ago Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell AI has evolved from assistants following your directions to agents that act independently.
Called claws, these agents can take a goal, figure out how to achieve... 17 NVIDIA Developer Blog official-blog 1mo ago NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer Artificial intelligence is token-driven. Every prompt, reasoning step, and agent interaction generates tokens. Over the past year, token consumption has grown... 33 Smol AI News news-outlet 2mo ago not much happened today **MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and… 5 Smol AI News news-outlet 2mo ago not much happened today **Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in… 26 NVIDIA Developer Blog official-blog 2mo ago Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning Agentic AI systems need models with the specialized depth to solve dense technical problems autonomously. They must excel at reasoning, coding, and long-context... 6 NVIDIA Developer Blog official-blog 2mo ago Reliable AI Coding for Unreal Engine: Improving Accuracy and Reducing Token Costs Agentic code assistants are moving into daily game development as studios build larger worlds, ship more DLCs, and support distributed teams. These assistants... 31 NVIDIA Developer Blog official-blog 2mo ago How to Minimize Game Runtime Inference Costs with Coding Agents NVIDIA ACE is a suite of technologies for building AI agents for gaming. ACE provides ready-to-integrate cloud and on-device AI models for every part of in-game... 
23 Import AI news-outlet 2mo ago Import AI 447: The AGI economy; testing AIs with generated games; and agent ecologies What might a superintelligence arcology be like? 30 NVIDIA Developer Blog official-blog 2mo ago Develop Native Multimodal Agents with Qwen3.5 VLM Using NVIDIA GPU-Accelerated Endpoints Alibaba has introduced the new open source Qwen3.5 series built for native multimodal agents. The first model in this series is a ~400B parameter native... 25 ThursdAI news-outlet 2mo ago 📅 ThursdAI - Feb 26 - The Pentagon wants War Claude, every benchmark collapsed, and a solo founder hit $700K ARR with AI agents From Weights & Biases, this week is the closest I've felt to the AI singularity starting, bonkers 1 man AI startups crossing $700K ARR live on show, DoD vs Anthropic, Anthropic vs Chinese models & mor 17 Smol AI News news-outlet 2mo ago Agentic Engineering: WTF Happened in December 2025? **Perplexity** launched **Computer**, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-agents for distributed workflows. 
**Andrej Karpathy** claims a "phase change" in coding agents since December,… 21 Smol AI News news-outlet 2mo ago Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2 **Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool… 10 Hugging Face official-blog 2mo ago IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST Enterprise Article Published February 18, 2026 Ayhan Sebin ayhansebin ibm-research Rohan Arora rohan-arora ibm-research Saurabh Jha saurabhjha1 ibm-research Ayhan Sebin… 7 Smol AI News news-outlet 2mo ago not much happened today **Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls.… 5 One Useful Thing (Ethan Mollick) community 2mo ago A Guide to Which AI to Use in the Agentic Era It's not just chatbots anymore 12 Smol AI News news-outlet 2mo ago Claude Sonnet 4.6: clean upgrade of 4.5, mostly better with some caveats **Anthropic** launched **Claude Sonnet 4.6**, an upgrade over Sonnet 4.5, featuring broad improvements in **coding, long-context reasoning, agent planning, knowledge work, and design**, plus a **1M-token context window (beta)**. Benchmarks show Sonnet 4.6 leading on **GDPval-AA… 4 Smol AI News news-outlet 2mo ago MiniMax-M2.5: SOTA coding, search, toolcalls, $1/hour **MiniMax-M2.5** is now open source, featuring an "agent-native" reinforcement learning framework called **Forge** trained across **200k+ RL environments** for coding, tool use, and workflows.
It boasts strong benchmark scores like **80.2% SWE-Bench Verified** and emphasizes… 20 Hugging Face official-blog 2mo ago Custom Kernels for All from Codex and Claude Published February 13, 2026 ben burtenshaw burtenshaw Sayak Paul sayakpaul Aritra Roy Gosthipaty ariG23498 shaun smith evalstate tl;dr: We built an agent skill that teaches coding agents how… 18 Hugging Face official-blog 3mo ago OpenEnv in Practice: Evaluating Tool-Using Agents in Real-World Environments Published February 12, 2026 Christian Washington christian-washington TuringEnterprises Ankit Jasuja ajasuja TuringEnterprises Santosh Sah santosh-iima… 6 Smol AI News news-outlet 3mo ago Qwen-Image 2.0 and Seedance 2.0 **OpenAI** advances its Responses API for multi-hour agent workflows with features like **server-side compaction**, **hosted containers**, and **Skills API**, alongside upgrading **Deep Research** to **GPT-5.2** and adding connectors. Discussions around sandbox design highlight… 6 ThursdAI news-outlet 3mo ago 📆 ThursdAI - Feb 5 - Opus 4.6 was #1 for ONE HOUR before GPT 5.3 Codex, Voxtral transcription, Codex app, Qwen Coder Next & the Agentic Internet From Weights & Biases - a hell of a week to be covering the AI news, with 2 big model drops live during the show, 1 interview with VB from OpenAI about Codex app and the new model, Voxtral and more AI 19 Smol AI News news-outlet 3mo ago OpenAI and Anthropic go to war: Claude Opus 4.6 vs GPT 5.3 Codex **OpenAI** launched **GPT-5.3-Codex**, emphasizing **token efficiency**, **inference speed**, and hardware/software co-design with **GB200-NVL72** and **NVIDIA** collaboration.
The new **Frontier** agent platform supports business-context agents with execution environments and… 15 Smol AI News news-outlet 3mo ago ElevenLabs $500m Series D at $11B, Cerebras $1B Series H at $23B, Vibe Coding -> Agentic Engineering **Google's Gemini 3** is being integrated widely, including a new **Chrome side panel** and **Nano Banana** UX features, with rapid adoption and a **78% unit-cost reduction** in serving costs. The **Gemini app** reached **750M+ MAU** in Q4 2025, nearing ChatGPT's user base.… 23 Import AI news-outlet 3mo ago Import AI 443: Into the mist: Moltbook, agent ecologies, and the internet in transition Plus, a story about agents corrupting other agents 35 Smol AI News news-outlet 3mo ago OpenAI Codex App: death of the VSCode fork, multitasking worktrees, Skills Automations **OpenAI** launched the **Codex app** on macOS as a dedicated agent-native command center for coding, featuring **multiple agents in parallel**, **built-in worktrees** for conflict isolation, **skills** for reusable bundles, and **scheduled automations**. The app emphasizes… 19 Smol AI News news-outlet 3mo ago MoltBook takes over the timeline **Moltbook** and **OpenClaw** showcase emergent multi-agent social networks where AI agents autonomously interact, creating an AI-native forum layer with complex security and identity challenges. **Karpathy** describes this as "takeoff-adjacent," highlighting bots… 15 ThursdAI news-outlet 3mo ago 📆 ThursdAI - Jan 29 - Genie3 is here, Clawd rebrands, Kimi K2.5 surprises, Chrome goes agentic & more AI news From Weights & Biases (live from SF) - Genie 3 is finally here and made us go "whoah", Clawdbot delivers despite rebrand, Kimi K2.5 king OSS, Chrome crushes Atlas & Grok Imagine #1 26 Hugging Face official-blog 3mo ago We Got Claude to Build CUDA Kernels and teach open models! We got Claude to teach open models how to write CUDA kernels!
Published January 28, 2026 ben burtenshaw burtenshaw shaun smith evalstate merve merve Pedro Cuenca pcuenq The best thing about agent skills is upskilling your agents on… 22 Zed Editor dev-tools 3mo ago The ACP Registry is Live Easily distribute your agent through the ACP Registry: register once, and work in Zed, JetBrains IDEs, and any ACP-compatible editor. 27 One Useful Thing (Ethan Mollick) community 3mo ago Management as AI superpower Thriving in a world of agents 11 Smol AI News news-outlet 3mo ago Moonshot Kimi K2.5 - Beats Sonnet 4.5 at half the cost, SOTA Open Model, first Native Image+Video, 100 parallel Agent Swarm manager **MoonshotAI's Kimi K2.5** is a **32B active-1T parameter open-weights model** featuring **native multimodality** with image and video understanding, built through continual pretraining on **15 trillion mixed visual and text tokens**. It introduces a new **MoonViT vision… 22 Hugging Face official-blog 3mo ago Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Team Article Published January 27, 2026 Jason Zhu JasonZhu13 LinkedIn Hejian Sang pb09204048 LinkedIn Arup De arde171 LinkedIn Rohit Jain rohjain LinkedIn Yanning Chen m0m0chen… 32 Smol AI News news-outlet 3mo ago Anthropic launches the MCP Apps open spec, in Claude.ai **Anthropic** has officially absorbed the independent MCP UI project and, collaborating with **OpenAI**, **Block**, **VS Code**, **Antigravity**, **JetBrains**, and **AWS**, released the **MCP Apps spec** and official support in **Claude.ai**. This standard aims to enable a rich… 9 Zed Editor dev-tools 3mo ago On Programming with Agents Agents handle typing so we can focus on thinking. 9 Smol AI News news-outlet 3mo ago not much happened today **Anthropic** launches "Claude in Excel Pro" with enhanced features.
**OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor**… 25 Hugging Face official-blog 3mo ago AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality Enterprise Article Published January 21, 2026 Dhaval Patel DhavalPatel ibm-research James Rayfield jtrayfield ibm-research Saumya Ahuja saumyaahuja ibm-research… 22 Import AI news-outlet 3mo ago Import AI 441: My agents are working. Are yours? Plus: Corrupting AI systems with a poison fountain 29 VentureBeat — AI news-outlet 3mo ago Claude Code costs up to $200 a month. Goose does the same thing for free. The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-based AI agent that can write, debug, and deploy code autonomously, has captured the imagination of software developers worldwide. But its pricing —… 26 ThursdAI news-outlet 3mo ago 📆 ThursdAI - Jan 15 - Agent Skills Deep Dive, GPT 5.2 Codex Builds a Browser, Claude Cowork for the Masses, and the Era of Personalized AI! From Weights & Biases - come learn what agent skills are all about, Claude Cowork opens the door for non coders to do agentic stuff, GPT 5.2 Codex in API and Gemini get personalized! Big week! 34 Smol AI News news-outlet 3mo ago Open Responses: explicit spec for OpenAI's Responses API supported by OpenRouter, Ollama, Huggingface, vLLM, et al **OpenAI** launched the **Open Responses** API spec, an open-source, multi-provider standard for interoperable LLM APIs designed to simplify agent stacks and tooling.
Early adopters like **ollama** and **vLLM** support the spec, while notable absences include **anthropic** and… 4 VentureBeat — AI news-outlet 4mo ago Salesforce rolls out new Slackbot AI agent as it battles Microsoft and Google in workplace AI Salesforce on Tuesday launched an entirely rebuilt version of Slackbot, the company's workplace assistant, transforming it from a simple notification tool into what executives describe as a fully powered AI agent capable of searching enterprise data, drafting documents,… 37 Smol AI News news-outlet 4mo ago Anthropic Labs: Cowork, Claude Code, MCP, Skills incubator led by Mike Krieger and Ben Mann **Anthropic** consolidates its AI agent products under the **Cowork** brand, integrating prior tools like **Claude Code** and **Claude for Chrome** into a unified agent with sandboxed Linux VM environments using **Apple's virtualization** and **bubblewrap** for security.… 23 VentureBeat — AI news-outlet 4mo ago Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required Anthropic released Cowork on Monday, a new AI agent capability that extends the power of its wildly successful Claude Code tool to non-technical users — and according to company insiders, the team built the entire feature in approximately a week and a half, largely using Claude… 38 Smol AI News news-outlet 4mo ago not much happened today **Anthropic** tightens usage policies for **Claude Max** in third-party apps, prompting builders to adopt **model-agnostic orchestration** and **BYO-key** defaults to mitigate platform risks.
The **Model Context Protocol (MCP)** is evolving into a key tooling plane with **OpenAI… 31 Smol AI News news-outlet 4mo ago not much happened today **AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding… 26 VentureBeat — AI news-outlet 4mo ago The creator of Claude Code just revealed his workflow, and developers are losing their minds When the creator of the world's most advanced coding agent speaks, Silicon Valley doesn't just listen — it takes notes. For the past week, the engineering community has been dissecting a thread on X from Boris Cherny, the creator and head of Claude Code at Anthropic.… 31 Hugging Face official-blog 4mo ago NVIDIA brings agents to life with DGX Spark and Reachy Mini Published January 5, 2026 Jeff Boudier jeffboudier Nader Khalil nader-at-nvidia nvidia Alec Fong alecfong nvidia Today at CES 2026, NVIDIA unveiled a world of new open models… 7 Smol AI News news-outlet 4mo ago not much happened today **MiniMax M2.1** launches as an **open-source** agent and coding Mixture-of-Experts (MoE) model with **~10B active / ~230B total parameters**, claiming to outperform **Gemini 3 Pro** and **Claude Sonnet 4.5**, and supports local inference including on **Apple Silicon M3 Ultra**… 10 Smol AI News news-outlet 4mo ago not much happened today **GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters… 18 Smol AI News news-outlet 4mo ago not much happened today **Zhipu AI's GLM-4.7** release marks a significant 
improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model… 30 Hugging Face official-blog 4mo ago CUGA on Hugging Face: Democratizing Configurable AI Agents Enterprise Article Published December 15, 2025 Jim Laredo laredoj ibm-research Avi Yaeli aviyaeli ibm-research Sami Marreed samimarreed ibm-research Ayhan Sebin ayhansebin ibm-research Merve… 33 Google DeepMind official-blog 5mo ago Improved Gemini audio models for powerful voice experiences Improved Gemini audio models for powerful voice interactions Bibo Xu Director of Product Management Tara Sainath Distinguished Research Scientist General summary Google enhanced Gemini 2.5 Flash Native Audio for better live voice agents. Expect… 37 Hugging Face official-blog 5mo ago DeepMath: A lightweight math reasoning Agent with smolagents Published December 4, 2025 Daniel Fleischer danf Intel Moshe Berchansky mber Intel Moshe Wasserblat moshew Intel By Intel AI Software Group DeepMath is an aligned math… 22 Hugging Face official-blog 5mo ago Continuous batching from first principles Published November 25, 2025 Rémi Ouazan Reboul ror Arthur Zucker ArthurZ Luc Georges mcpotato TL;DR: in this blog post, starting from attention mechanisms and KV caching, we derive continuous batching by optimizing… 38 Hugging Face official-blog 5mo ago Building Deep Research: How we Achieved State of the Art Team Article Published November 24, 2025 Michael Griff michaelgriff Tavily Dean Sacoransky deansaco 
Tavily Noah Nefsky noahnefsky Tavily Research agents are rapidly becoming one of the most… 23 One Useful Thing (Ethan Mollick) community 5mo ago Three Years from GPT-3 to Gemini 3 From chatbots to agents 20 Google DeepMind official-blog 6mo ago SIMA 2: An Agent that Plays, Reasons, and Learns With You in Virtual 3D Worlds Introducing SIMA 2, a Gemini-powered AI agent that can think, understand, and take actions in interactive environments. 16 Zed Editor dev-tools 6mo ago Introducing Agent Extensions Zed launches Agent Server Extensions, enabling one-click installation of ACP-compatible agents like Augment Code and OpenCode. 31 Zed Editor dev-tools 6mo ago AI's 70% Problem From the Agentic Engineering Sessions | Aired on November 6th, 2025 We hosted Addy Osmani , who works on AI and dev tools at Google's Chrome Developer Experience team, to talk about what he calls the "70% problem" in AI coding. Over the past two years, Addy has been… 34 Hugging Face official-blog 6mo ago Aligning to What? Rethinking Agent Generalization in MiniMax M2 Back to Articles Aligning to What? Rethinking Agent Generalization in MiniMax M2 Community Article Published October 30, 2025 Upvote 43 MiniMax MiniMax-AI It's been fantastic to see the community dive into our new MiniMax M2 , with many highlighting its impressive skills in… 7 Google DeepMind official-blog 6mo ago Gemini Robotics 1.5 brings AI agents into the physical world We’re powering an era of physical agents — enabling robots to perceive, plan, think, use tools and act to better solve complex, multi-step tasks. 7 Google DeepMind official-blog 6mo ago Introducing CodeMender: an AI agent for code security Using advanced AI to fix critical software vulnerabilities 24 Google DeepMind official-blog 6mo ago Introducing the Gemini 2.5 Computer Use model Available in preview via the API, our Computer Use model is a specialized model built on Gemini 2.5 Pro’s capabilities to power agents that can interact with user interfaces. 
21
Hugging Face official-blog 6mo ago Building the Open Agent Ecosystem Together: Introducing OpenEnv Published October 23, 2025. Joseph Spisak (openenv), Davide Testuggine (openenv), Zach Wentz (openenv), Pierre Andrews (openenv), Sanyam Bhutani… 9
Zed Editor dev-tools 6mo ago Codex is Live in Zed OpenAI's Codex AI agent is now available in Zed via the Agent Client Protocol (ACP). 23
Zed Editor dev-tools 7mo ago How to Have Productive Conversations About AI From the Agentic Engineering Sessions | Aired on October 14th, 2025 We sat down with Steve Klabnik for a discussion about something that might sound meta but turns out to be incredibly practical: how to actually have useful conversations about AI when your goal is learning… 20
Zed Editor dev-tools 7mo ago ACP Brings JetBrains on Board The Agent Client Protocol reaches a major milestone: JetBrains is committing to bringing ACP support to their entire ecosystem. 8
Zed Editor dev-tools 7mo ago How the Community is Driving ACP Forward A progress report on the adoption of the Agent Client Protocol (ACP) since we launched it. 35
One Useful Thing (Ethan Mollick) community 7mo ago Real AI Agents and Real Work The race between human-centered work and infinite PowerPoints 15
Zed Editor dev-tools 8mo ago Claude Code: Now in Beta in Zed You asked, and here it is. Use Claude Code in public beta directly in Zed, built on the new Agent Client Protocol. 8
Zed Editor dev-tools 8mo ago Bring Your Own Agent to Zed — Featuring Gemini CLI Zed now lets you use the agent of your choice through the new Agent Client Protocol, starting with Google's Gemini CLI. 12
Zed Editor dev-tools 8mo ago Async Agents: Signal Over Noise Jessie Frazelle shares her hands-on experience evaluating async AI agents in production, highlighting common pitfalls and which stood out for its precision and restraint.
13
Zed Editor dev-tools 9mo ago Container Use for Locally Sandboxed, Background Agents in Zed Run AI agents in parallel without interference using containerized environments and Git Worktrees. 17
Zed Editor dev-tools 10mo ago Leveling Up Agents with MCPs Kent Dodds provides a thorough overview of MCP (Model Context Protocol) and how it is changing the way we interact with software. 21
Zed Editor dev-tools 10mo ago Agentic Engineering in Action Mitchell Hashimoto walked Richard Feldman through his approach to using AI when building Ghostty. 14
Google DeepMind official-blog 12mo ago AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms New AI agent evolves algorithms for math and practical applications in computing by combining the creativity of large language models with automated evaluators 12
Eugene Yan research 12mo ago Building News Agents for Daily News Recaps with MCP, Q, and tmux Learning to automate simple agentic workflows with Amazon Q CLI, Anthropic MCP, and tmux. 7
Maarten Grootendorst research 14mo ago A Visual Guide to LLM Agents Exploring the main components of Single- and Multi-Agents 21
Chip Huyen research 16mo ago Agents Intelligent agents are considered by many to be the ultimate goal of AI. The classic book by Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach (Prentice Hall, 1995), defines the field of AI research as “the study and design of rational agents.” The… 15
Lil'Log (Lilian Weng) research 17mo ago Reward Hacking in Reinforcement Learning Reward hacking occurs when a reinforcement learning (RL) agent exploits flaws or ambiguities in the reward function to achieve high rewards, without genuinely learning or completing the intended task. Reward hacking exists because RL environments are often imperfect, and it is… 26
Lil'Log (Lilian Weng) research 35mo ago LLM Powered Autonomous Agents Building agents with LLM (large language model) as its core controller is a cool concept.
Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories,… 26
Lil'Log (Lilian Weng) research 72mo ago Exploration Strategies in Deep Reinforcement Learning [Updated on 2020-06-17: Add “exploration via disagreement” in the “Forward Dynamics” section.] Exploitation versus exploration is a critical topic in Reinforcement Learning. We’d like the RL agent to find the best solution as fast as possible.… 27
Lil'Log (Lilian Weng) research 83mo ago Meta Reinforcement Learning In my earlier post on meta-learning, the problem is mainly defined in the context of few-shot classification. Here I would like to explore more into cases when we try to “meta-learn” Reinforcement Learning (RL) tasks by developing an agent that can solve unseen… 12