Smol AI News
Smol AI News news-outlet 1d ago
not much happened today
**Research-level reasoning benchmarks** are advancing with **439 new math problems** from **64 mathematicians** and expanded medical benchmarks in **Medmarks v1.0** covering **30 benchmarks** and **61 models**. **Google DeepMind's AI Co-Mathematician** achieves **48% on…
Smol AI News news-outlet 2d ago
not much happened today
**Thinking Machines** previewed their new **native interaction models** designed for **full-duplex multimodal interaction** enabling real-time concurrent listening, speaking, watching, thinking, searching, and reacting, marking a shift beyond turn-based AI. This approach…
Smol AI News news-outlet 5d ago
not much happened today
**OpenAI** rapidly expanded the **GPT-5.5** family with multiple variants including **gpt-image-2**, **GPT-5.5 Pro**, and **GPT-5.5 Cyber**, receiving positive feedback for efficiency and usability. **Codex** evolved into a long-running agent runtime with a new **/goal**…
Smol AI News news-outlet 9d ago
not much happened today
**OpenAI** rolled out **GPT-5.5 Instant** as the new default for ChatGPT and API, enhancing **factuality, intelligence, image understanding, and tone** with stronger personalization features like saved memories and Gmail integration. OpenAI also shared infrastructure updates on…
Smol AI News news-outlet 9d ago
not much happened today
**AI Twitter Recap** highlights the shift from model-centric AI to **context pipelines** and **agent orchestration** as key performance drivers. Notably, **gpt-5.2-codex** and **gpt-5.3-codex** showed significant benchmark improvements through prompt and middleware tuning. The…
Smol AI News news-outlet 12d ago
not much happened today
**xAI released Grok 4.3**, improving cost/performance with a **53 Intelligence Index score**, 4 points higher than Grok 4.20, and significant gains on **GDPval-AA** and **τ²-Bench Telecom**. However, accuracy tradeoffs raised reliability concerns. Community opinions are mixed,…
Smol AI News news-outlet 13d ago
not much happened today
**OpenAI's GPT-5.5** achieves top-tier performance in long-horizon cyber tasks, matching or surpassing **Claude Mythos Preview** with a **71.4%** pass rate and showing ongoing improvement beyond **100M tokens** of inference. OpenAI also released an **Advanced Account Security**…
Smol AI News news-outlet 14d ago
not much happened today
**OpenAI** is expanding **Codex** from a coding tool to a general work surface with persistent context, tools, integrations, and team rollout, including **Codex-only seats with $0 seat fee** for Business/Enterprise customers through June. Performance improvements focus on…
Smol AI News news-outlet 15d ago
not much happened today
**vLLM v0.20.0** introduces significant improvements in memory and MoE serving efficiency, including **TurboQuant 2-bit KV cache** for **4× KV capacity** and a **2.1% latency improvement**. The update supports multiple hardware platforms like **DeepSeek V4 MegaMoE on…
Smol AI News news-outlet 16d ago
not much happened today
**OpenAI** loosens its **Azure exclusivity**, allowing distribution across **Google TPU**, **AWS Trainium**, and **Bedrock** with commitments through **2032** and revenue share through **2030**. **GPT-5.5** shows improved benchmarks but is not uniformly dominant, ranking…
Smol AI News news-outlet 19d ago
DeepSeek v4
**DeepSeek-V4** technical release features a **1.6T-parameter MoE with 49B active parameters** and **1M-token context**, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the **#2 open-weights reasoning model** behind **Kimi K2.6**…
Smol AI News news-outlet 20d ago
GPT 5.5
**OpenAI launched GPT-5.5** as its new flagship model for "real work and powering agents," immediately available in ChatGPT and Codex but with delayed API access due to enhanced safety requirements. The model features improved token efficiency and supports longer multi-step…
Smol AI News news-outlet 21d ago
not much happened today
**Alibaba** released **Qwen3.6-27B**, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench. It supports native vision-language reasoning over…
Smol AI News news-outlet 22d ago
GPT-Image-2
**OpenAI** launched **GPT-Image-2**, enhancing image generation with improved text rendering, layout fidelity, editing, multilingual support, and "thinking" capabilities. It supports generating slides, infographics, diagrams, UI mockups, and QR codes, and integrates with tools…
Smol AI News news-outlet 23d ago
not much happened today
**Moonshot's Kimi K2.6** is a major open-weight **1T-parameter MoE** model featuring **32B active parameters**, **384 experts**, **MLA attention**, **256K context window**, native multimodality, and **INT4 quantization**. It supports day-0 integration with platforms like…
Smol AI News news-outlet 26d ago
not much happened today
**Anthropic** launched **Claude Design**, a prototyping tool powered by **Claude Opus 4.7**, targeting design workflows and competing with **Figma** and others. Benchmarks show **Opus 4.7** leading in coding and text tasks, with improved efficiency and adaptive reasoning, though…
Smol AI News news-outlet 27d ago
Anthropic's Claude Opus 4.7
**Anthropic** launched **Claude Opus 4.7**, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new **xhigh** reasoning tier. Benchmarks show substantial gains, including **SWE-bench Pro…
Smol AI News news-outlet 28d ago
not much happened today
**OpenAI** expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now open-source and supports execution via partner sandboxes,…
Smol AI News news-outlet 1mo ago
not much happened today
**Harness engineering** is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. **OpenAI's Codex** is expanding agentic coding workflows beyond software engineering, including codebase…
Smol AI News news-outlet 1mo ago
not much happened today
**GLM-5.1** has reached **#3 on Code Arena**, surpassing **Gemini 3.1** and **GPT-5.4** and matching **Claude Sonnet 4.6** in coding performance. **Z.ai** now holds the **#1 open model rank**, close to the top overall. The advisor pattern, combining a cheap executor with an…
Smol AI News news-outlet 1mo ago
not much happened today
**Anthropic's Mythos** and **OpenAI's** upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. **LangChain's Deep Agents deploy** introduces an open memory, model-agnostic agent harness…
Smol AI News news-outlet 1mo ago
not much happened today
**Meta Superintelligence Labs** launched **Muse Spark**, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on **meta.ai** and the Meta AI app with a private API preview and plans for open-sourcing future…
Smol AI News news-outlet 1mo ago
not much happened today
**Hermes Agent** is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new **Manim skill** enables generation of math/technical animations, expanding agent capabilities. The Hermes…
Smol AI News news-outlet 1mo ago
not much happened today
**Google** introduced **Skills in Chrome**, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. **Tencent** teased **HYWorld 2.0**, an open-source 3D world model generating editable scenes from a single…
Smol AI News news-outlet 1mo ago
not much happened today
**Gemma 4** was launched by **Google** under an **Apache 2.0 license**, marking a significant open-model release focused on **reasoning, agentic workflows, multimodality, and on-device use**. It outperforms models 10x larger and has immediate ecosystem support including…
Smol AI News news-outlet 1mo ago
Gemma 4
**Google DeepMind** released **Gemma 4**, a family of open-weight, multimodal models with long-context support up to **256K tokens** under an **Apache 2.0 license**, marking a major capability and licensing shift. The lineup includes **31B dense**, **26B MoE (A4B)**, and two…
Smol AI News news-outlet 1mo ago
not much happened today
**Arcee’s Trinity-Large-Thinking** was released with **open weights under Apache 2.0**, featuring a **400B total / 13B active** model size and strong agentic performance, ranking **#2 on PinchBench**. **Z.ai’s GLM-5V-Turbo** is a **vision coding model** with **native multimodal…
Smol AI News news-outlet 1mo ago
not much happened today
**Anthropic** introduced **computer use inside Claude Code** for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. **OpenAI** released a **Codex plugin for Claude Code**, enabling cross-agent composition and signaling a shift…
Smol AI News news-outlet 1mo ago
not much happened today
**Anthropic** is reportedly introducing a new AI model tier called **Capybara**, which is larger and more intelligent than **Claude Opus 4.6**, showing improved performance in coding, academic reasoning, and cybersecurity. The model is speculated to be around **10 trillion…
Smol AI News news-outlet 1mo ago
not much happened today
**Anthropic** advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. **Figma**, **GitHub**, and **Cursor** launch design canvases with direct AI editing, showcasing tool-calling becoming…
Smol AI News news-outlet 1mo ago
not much happened today
The **ARC-AGI-3** benchmark, introduced by **@arcprize** and **François Chollet**, resets the frontier for general agentic reasoning, with humans solving 100% of tasks versus under 1% for current models, and focuses on zero-preparation generalization and human-like learning efficiency.…
Smol AI News news-outlet 1mo ago
not much happened today
**Google** launched **Gemini 3.1 Flash Live**, a realtime voice and vision agent model with **2x longer conversation memory**, supporting **70 languages** and **128k context**. **Mistral AI** released **Voxtral TTS**, a low-latency, open-weight text-to-speech model supporting…
Smol AI News news-outlet 1mo ago
The Claude Code Source Leak
**Anthropic's** closed-source coding product **Claude Code** experienced a significant source leak exposing over **500k lines** of orchestration logic, including autonomous modes and memory systems, but not model weights. The leak led to rapid public reverse-engineering,…
Smol AI News news-outlet 1mo ago
not much happened today
**Anthropic** introduced **Claude Cowork** and **Claude Code** enabling desktop control of mouse, keyboard, and screen in a **macOS research preview**, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is evolving towards long-running, parallel,…
Smol AI News news-outlet 1mo ago
not much happened today
**Cursor's Composer 2**, built on **Kimi K2.5**, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and reinforcement learning. **Claude Code** is expanding into…
Smol AI News news-outlet 1mo ago
not much happened today
**Cursor** launched **Composer 2**, a frontier-class coding model with major cost reductions and strong benchmark scores like **61.3 on CursorBench** and **73.7 on SWE-bench Multilingual**. The model was improved via a **first continued pretraining run** feeding into…
Smol AI News news-outlet 1mo ago
not much happened today
**OpenAI** released **GPT-5.4 mini** and **GPT-5.4 nano**, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a **400k context window** and over **2x speed** compared to GPT-5 mini. The mini model approaches larger GPT-5.4…
Smol AI News news-outlet 1mo ago
not much happened today
**Moonshot's Attention Residuals** paper introduced an input-dependent attention mechanism over prior layers with a **1.25x compute advantage** and less than **2% inference latency overhead**, validated on **Kimi Linear 48B total / 3B active**. The paper sparked debate on…
Smol AI News news-outlet 2mo ago
not much happened today
**MCP tools** remain relevant for deterministic APIs despite ergonomic criticisms, with new **web MCP support in Chrome v146** enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for agents, with IBM improving task completion rates and…
Smol AI News news-outlet 2mo ago
not much happened today
**Harnesses, agent infrastructure, and the MCP protocol** are central themes, with emphasis on how **harnesses, sandboxes, filesystem access, skills, memory, and observability** shape agent UI/UX and runtime environments. Despite jokes about MCP's demise, it remains vital in…
Smol AI News news-outlet 2mo ago
not much happened today
**NVIDIA’s Nemotron 3 Super** is a **120B parameter / ~12B active** open model featuring a **hybrid Mamba-Transformer / SSM Latent MoE** architecture and **1M context window**, delivering up to **2.2x faster inference than GPT-OSS-120B** in FP4 with strong throughput gains. It…
Smol AI News news-outlet 2mo ago
not much happened today
**OpenAI** rolled out **GPT-5.4**, which ties **Gemini 3.1 Pro Preview** for **#1** on the **Artificial Analysis Intelligence Index** with a score of **57** (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger **~1.05M token** context window and higher per-token prices…
Smol AI News news-outlet 2mo ago
not much happened today
**Gemini 3.1 Flash-Lite** is highlighted by **Demis Hassabis** for its speed and cost-efficiency, focusing on latency and cost per capability rather than raw performance. **NotebookLM Studio** introduces a new feature for generating immersive cinematic video overviews. Rumors…
Smol AI News news-outlet 2mo ago
not much happened today
**Google DeepMind** launched **Gemini 3.1 Flash-Lite**, emphasizing *dynamic thinking levels* for adjustable compute, with notable metrics like **$0.25/M input**, **$1.50/M output**, **1432 Elo on LMArena**, and **2.5× faster time-to-first-token** than Gemini 2.5 Flash. It…
Smol AI News news-outlet 2mo ago
not much happened today
**Alibaba** released the **Qwen 3.5** series with models ranging from **0.8B to 9B** parameters, featuring **native multimodality**, **scaled reinforcement learning**, and targeting **edge and lightweight agent** deployments. The models support very long context windows up to…
Smol AI News news-outlet 2mo ago
Agentic Engineering: WTF Happened in December 2025?
**Perplexity** launched **Computer**, an orchestration-first agent platform featuring multi-model routing, usage-based pricing, and parallel asynchronous sub-agents for distributed workflows. **Andrej Karpathy** claims a "phase change" in coding agents since December,…
Smol AI News news-outlet 2mo ago
not much happened today
**Gemini 3.1 Pro** demonstrates strong retrieval capabilities and cost efficiency compared to **GPT-5.2** and **Opus 4.6**, though users report tooling and UI issues. The **SWE-bench Verified** evaluation methodology is under scrutiny for consistency, with updates bringing…
Smol AI News news-outlet 2mo ago
Gemini 3.1 Pro: 2x 3.0 on ARC-AGI 2
**Google** released **Gemini 3.1 Pro**, a developer preview integrated across the **Gemini app**, **NotebookLM**, **Gemini API / AI Studio**, and **Vertex AI**, highlighting a significant reasoning improvement with **ARC-AGI-2 = 77.1%** and strong coding and agentic-tool…
Smol AI News news-outlet 2mo ago
not much happened today
**Anthropic** released **Claude Opus/Sonnet 4.6**, showing a significant intelligence index jump but with increased token usage and cost. **Anthropic** also shared insights on AI agent autonomy, highlighting human-in-the-loop prevalence and software engineering tool calls.…
Smol AI News news-outlet 3mo ago
Z.ai GLM-5: New SOTA Open Weights LLM
**Zhipu AI** launched **GLM-5**, an **Opus-class** model scaling from **355B to 744B parameters** with **DeepSeek Sparse Attention** integration for cost-efficient long-context serving. GLM-5 achieves **SOTA on BrowseComp** and leads on **Vending Bench 2**, focusing on office…
Smol AI News news-outlet 3mo ago
Qwen-Image 2.0 and Seedance 2.0
**OpenAI** advances its Responses API for multi-hour agent workflows with features like **server-side compaction**, **hosted containers**, and **Skills API**, alongside upgrading **Deep Research** to **GPT-5.2** and adding connectors. Discussions around sandbox design highlight…
Smol AI News news-outlet 3mo ago
not much happened today
**OpenAI** launched **GPT-5.3-Codex** with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across **Cursor, VS Code, and GitHub** with phased API access and is flagged as…
Smol AI News news-outlet 3mo ago
not much happened today
**AI News** for early February 2026 highlights a detailed comparison between **GPT-5.3-Codex** and **Claude Opus 4.6**, with users noting **Codex's** strength in detailed scoped tasks and **Opus's** ergonomic advantage for exploratory work. Benchmarks on Karpathy's **nanochat…
Smol AI News news-outlet 3mo ago
MoltBook takes over the timeline
**Moltbook** and **OpenClaw** showcase emergent multi-agent social networks where AI agents autonomously interact, creating an AI-native forum layer with complex security and identity challenges. **Karpathy** describes this as "takeoff-adjacent," highlighting bots…
Smol AI News news-outlet 3mo ago
not much happened today
**AI News for 1/27/2026-1/28/2026** highlights a quiet day with deep dives into frontier model "personality split" where **GPT-5.2** excels at *exploration* and **Claude Opus 4.5** at *exploitation*, suggesting **OpenAI** suits research workflows and **Anthropic** commercial…
Smol AI News news-outlet 3mo ago
not much happened today
**Anthropic** launches "Claude in Excel Pro" with enhanced features. **OpenAI** reveals upcoming **Codex** agent loop and cybersecurity measures. **Google** boosts **Gemini App** quotas and partners with **Sakana AI** for advanced AI Scientist projects in Japan. **Cursor**…
Smol AI News news-outlet 3mo ago
not much happened today
**X Engineering** open-sourced its new transformer-based recommender algorithm, sparking community debate on transparency and fairness. **GLM-4.7-Flash (30B-A3B)** gains momentum as a strong local inference model with efficient KV-cache management and quantization tuning…
Smol AI News news-outlet 3mo ago
not much happened today
**AI News for 1/16/2026-1/19/2026** covers new architectures for scaling Transformer memory and context, including **STEM** from **Carnegie Mellon** and **Meta AI**, which replaces part of the FFN with a token-indexed embedding lookup enabling CPU offload and asynchronous…
Smol AI News news-outlet 3mo ago
not much happened today.
**OpenAI** launched the **GPT-5.2-Codex** API, touted as their strongest coding model for long-running tasks and cybersecurity. **Cursor** integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million lines of Rust code. **GitHub** incorporated it into…
Smol AI News news-outlet 4mo ago
not much happened today
**Anthropic** tightens usage policies for **Claude Max** in third-party apps, prompting builders to adopt **model-agnostic orchestration** and **BYO-key** defaults to mitigate platform risks. The **Model Context Protocol (MCP)** is evolving into a key tooling plane with **OpenAI…
Smol AI News news-outlet 4mo ago
not much happened today
**Stanford paper** reveals **Claude 3.7 Sonnet** memorized **95.8% of Harry Potter 1**, highlighting copyright extraction risks compared to **GPT-4.1**. **Google AI Studio** sponsors **TailwindCSS** amid OSS funding debates. **Google** and **Sundar Pichai** launch **Gmail Gemini…
Smol AI News news-outlet 4mo ago
not much happened today
**AI News for 1/6/2026-1/7/2026** highlights a quiet day with key updates on **LangChain DeepAgents** introducing **Ralph Mode** for persistent agent loops, **Cursor** improving context management by reducing token usage by **46.9%**, and operational safety measures for coding…
Smol AI News news-outlet 4mo ago
xAI raises $20B Series E at ~$230B valuation
**xAI**, Elon Musk's AI company, completed a massive **$20 billion Series E funding round**, valuing it at about **$230 billion** with investors like **Nvidia**, **Cisco Investments**, and others. The funds will support AI infrastructure expansion including **Colossus I and II…
Smol AI News news-outlet 4mo ago
not much happened today
**AI News** from early January 2026 highlights a viral economic prediction about **Vietnam** surpassing Thailand, **Microsoft**'s reported open-sourcing of **bitnet.cpp** for 1-bit CPU inference promising speed and energy gains, and a new research partnership between **Google…
Smol AI News news-outlet 4mo ago
not much happened today
**DeepSeek** released a new paper on **mHC: Manifold-Constrained Hyper-Connections**, advancing residual-path design as a key scaling lever in neural networks. Their approach constrains residual mixing matrices to the **Birkhoff polytope** to improve stability and performance,…
Smol AI News news-outlet 4mo ago
not much happened today
**South Korea's Ministry of Science** launched a coordinated program with **5 companies** to develop sovereign foundation models from scratch, featuring large-scale MoE architectures like **SK Telecom A.X-K1 (519B total / 33B active)** and **LG K-EXAONE (236B MoE / 23B…
Smol AI News news-outlet 4mo ago
not much happened today
**Z.ai (GLM family)** plans to **IPO in Hong Kong on Jan 8, 2026**, aiming to raise **$560M** at **HK$4.35B**, which would make it the "first AI-native LLM company" to list publicly. The IPO highlights **GLM-4.7** as a starting point. **Meta AI** acquired **Manus** for approximately **$4–5B**, with…
Smol AI News news-outlet 4mo ago
not much happened today
**MiniMax M2.1** launches as an **open-source** agent and coding Mixture-of-Experts (MoE) model with **~10B active / ~230B total parameters**, claiming to outperform **Gemini 3 Pro** and **Claude Sonnet 4.5**, and supports local inference including on **Apple Silicon M3 Ultra**…
Smol AI News news-outlet 4mo ago
not much happened today
**GLM-4.7** and **MiniMax M2.1** open-weight model releases highlight day-0 ecosystem support, coding throughput, and agent workflows, with GLM-4.7 achieving a +9.5% improvement over GLM-4.6 and MiniMax M2.1 positioned as an OSS Claude-like MoE model with 230B total parameters…
Smol AI News news-outlet 4mo ago
not much happened today
**Zhipu AI's GLM-4.7** release marks a significant improvement in **coding, complex reasoning, and tool use**, quickly gaining ecosystem adoption via Hugging Face and OpenRouter. **Xiaomi's MiMo-V2-Flash** is highlighted as a practical, cost-efficient mixture-of-experts model…
Smol AI News news-outlet 4mo ago
not much happened today
**Alibaba** released **Qwen-Image-Layered**, an open-source model enabling Photoshop-grade layered image decomposition with recursive infinite layers and prompt-controlled structure. **Kling 2.6** introduced advanced motion control for image-to-video workflows, supported by a…