Smol AI News
133 articles archived · Visit source ↗ · RSS
-
Smol AI News news-outlet 2mo ago
not much happened today
**Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though…
7 -
Smol AI News news-outlet 2mo ago
Anthropic's Claude Opus 4.7
**Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. Benchmarks show substantial gains, including **SWE-bench Pro…
37 -
Smol AI News news-outlet 2mo ago
not much happened today
**OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes,…
37 -
Smol AI News news-outlet 2mo ago
not much happened today
**Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. **OpenAI's Codex** is expanding agentic coding workflows beyond software engineering, including codebase…
32 -
Smol AI News news-outlet 2mo ago
not much happened today
**GLM-5.1** has reached **#3 on Code Arena**, surpassing **Gemini 3.1** and **GPT-5.4**, and matching **Claude Sonnet 4.6** in coding performance. **Z.ai** now holds the **#1 open model rank** close to the top overall. The advisor pattern, combining a cheap executor with an…
12 -
Smol AI News news-outlet 2mo ago
not much happened today
**Anthropic's Mythos** and **OpenAI's** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. **LangChain's Deep Agents deploy** introduces an open memory, model-agnostic agent harness…
36 -
Smol AI News news-outlet 2mo ago
not much happened today
**Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future…
29 -
-
Smol AI News news-outlet 2mo ago
not much happened today
**Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes…
19 -
Smol AI News news-outlet 2mo ago
not much happened today
**Google** introduced **Skills in Chrome**, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. **Tencent** teased **HYWorld 2.0**, an open-source 3D world model generating editable scenes from a single…
8 -
Smol AI News news-outlet 2mo ago
not much happened today
**Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including…
35 -
Smol AI News news-outlet 2mo ago
Gemma 4
**Google DeepMind** released **Gemma 4**, a family of open-weight, multimodal models with long-context support up to **256K tokens** under an **Apache 2.0 license**, marking a major capability and licensing shift. The lineup includes **31B dense**, **26B MoE (A4B)**, and two…
14 -
Smol AI News news-outlet 3mo ago
not much happened today
**Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal…
13 -
Smol AI News news-outlet 3mo ago
not much happened today
**Anthropic** introduced **computer use inside Claude Code** for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. **OpenAI** released a **Codex plugin for Claude Code**, enabling cross-agent composition and signaling a shift…
16 -
Smol AI News news-outlet 3mo ago
not much happened today
**Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion…
38 -
Smol AI News news-outlet 3mo ago
not much happened today
**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming…
12 -
Smol AI News news-outlet 3mo ago
not much happened today
**ARC-AGI-3** benchmark introduced by **@arcprize** and **François Chollet** resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation generalization and human-like learning efficiency.…
4 -
Smol AI News news-outlet 3mo ago
not much happened today
**Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting…
31 -
Smol AI News news-outlet 3mo ago
The Claude Code Source Leak
**Anthropic's** closed-source coding product **Claude Code** experienced a significant source leak exposing over **500k lines** of orchestration logic, including autonomous modes and memory systems, but not model weights. The leak led to rapid public reverse-engineering,…
14 -
Smol AI News news-outlet 3mo ago
not much happened today
**Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is evolving towards long-running, parallel,…
29 -
Smol AI News news-outlet 3mo ago
not much happened today
**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into…
36 -
Smol AI News news-outlet 3mo ago
not much happened today
**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into…
36 -
-
Smol AI News news-outlet 3mo ago
not much happened today
**OpenAI** released **GPT-5.4 mini** and **GPT-5.4 nano**, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a **400k context window** and over **2x speed** compared to GPT-5 mini. The mini model approaches larger GPT-5.4…
32 -
Smol AI News news-outlet 3mo ago
not much happened today
**Moonshot's Attention Residuals** paper introduced an input-dependent attention mechanism over prior layers with a **1.25x compute advantage** and less than **2% inference latency overhead**, validated on **Kimi Linear 48B total / 3B active**. The paper sparked debate on…
26 -
Smol AI News news-outlet 3mo ago
not much happened today
**MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and…
5 -
Smol AI News news-outlet 3mo ago
not much happened today
**Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in…
26 -
Smol AI News news-outlet 3mo ago
not much happened today
**NVIDIA’s Nemotron 3 Super** is a **120B parameter / ~12B active** open model featuring a **hybrid Mamba-Transformer / SSM Latent MoE** architecture and **1M context window**, delivering up to **2.2x faster inference than GPT-OSS-120B** in FP4 with strong throughput gains. It…
10 -
-
-
Smol AI News news-outlet 3mo ago
not much happened today
**OpenAI** rolled out **GPT-5.4**, achieving tied **#1** on the **Artificial Analysis Intelligence Index** with **Gemini 3.1 Pro Preview** scoring **57** (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger **~1.05M token** context window and higher per-token prices…
12 -
-
Smol AI News news-outlet 3mo ago
not much happened today
**Gemini 3.1 Flash-Lite** is highlighted by **Demis Hassabis** for its speed and cost-efficiency, focusing on latency and cost per capability rather than raw performance. **NotebookLM Studio** introduces a new feature for generating immersive cinematic video overviews. Rumors…
20 -
Smol AI News news-outlet 3mo ago
not much happened today
**Google DeepMind** launched **Gemini 3.1 Flash-Lite**, emphasizing *dynamic thinking levels* for adjustable compute, with notable metrics like **$0.25/M input**, **$1.50/M output**, **1432 Elo on LMArena**, and **2.5× faster time-to-first-token** than Gemini 2.5 Flash. It…
35 -
Smol AI News news-outlet 4mo ago
not much happened today
**Alibaba** released the **Qwen 3.5** series with models ranging from **0.8B to 9B** parameters, featuring **native multimodality**, **scaled reinforcement learning**, and targeting **edge and lightweight agent** deployments. The models support very long context windows up to…
18 -
-
-
Smol AI News news-outlet 4mo ago
Agentic Engineering: WTF Happened in December 2025?
**Perplexity** launched **Computer**, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-agents for distributed workflows. **Andrej Karpathy** claims a "phase change" in coding agents since December,…
21 -
-
-
Smol AI News news-outlet 4mo ago
not much happened today
**Gemini 3.1 Pro** demonstrates strong retrieval capabilities and cost efficiency compared to **GPT-5.2** and **Opus 4.6**, though users report tooling and UI issues. The **SWE-bench Verified** evaluation methodology is under scrutiny for consistency, with updates bringing…
27 -
Smol AI News news-outlet 4mo ago
Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2
**Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool…
10 -
Smol AI News news-outlet 4mo ago
not much happened today
**Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls.…
5 -
-
-
-
-
Smol AI News news-outlet 4mo ago
Z.ai GLM-5: New SOTA Open Weights LLM
**Zhipu AI** launched **GLM-5**, an **Opus-class** model scaling from **355B to 744B parameters** with **DeepSeek Sparse Attention** integration for cost-efficient long-context serving. GLM-5 achieves **SOTA on BrowseComp** and leads on **Vending Bench 2**, focusing on office…
18 -
Smol AI News news-outlet 4mo ago
Qwen-Image 2.0 and Seedance 2.0
**OpenAI** advances its Responses API for multi-hour agent workflows with features like **server-side compaction**, **hosted containers**, and **Skills API**, alongside upgrading **Deep Research** to **GPT-5.2** and adding connectors. Discussions around sandbox design highlight…
6 -
Smol AI News news-outlet 4mo ago
not much happened today
**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as…
34