Anthropic raises $65B in Series H at a $965B post-money valuation, releases Opus 4.8 and Dynamic Workflows
Mirrored from Smol AI News for archival readability. Support the source by reading on the original site.
AI News for 5/27/2026-5/28/2026. We checked 12 subreddits, 544 Twitters and no further Discords. AINews' website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!
AI Twitter Recap
Anthropic announced a massive new financing and simultaneously shipped Claude Opus 4.8.
- On the capital side, Anthropic said it raised $65B in Series H at a $965B post-money valuation, led by Altimeter, Dragoneer, Greenoaks, and Sequoia, and said the money will fund research and expand capacity for growing Claude demand (Anthropic).
- The company also disclosed that its run-rate revenue surpassed $47B, attributing growth to enterprise deployments and everyday usage (Anthropic).
- On the product side, Anthropic launched Claude Opus 4.8, describing it as an Opus 4.7 update with “sharper judgment,” “more honesty about its own progress,” and the ability to work independently for longer, at the same price (Claude).
- Anthropic also launched Dynamic Workflows in Claude Code, a research-preview orchestration system in which Claude plans the work and spawns hundreds of parallel subagents to tackle large tasks (ClaudeDevs).
Independent evals broadly confirm that 4.8 is a meaningful improvement over 4.7, especially on long-horizon agentic coding and knowledge work, though reactions diverged on whether it is a frontier-resetting leap or mostly catch-up to OpenAI’s GPT-5.5 family.
Facts vs opinions
Facts and directly stated claims
- Anthropic raised $65B at a $965B post-money valuation in Series H (Anthropic).
- The company says its run-rate revenue crossed $47B (Anthropic).
- Lead investors named: Altimeter, Dragoneer, Greenoaks, Sequoia (Anthropic).
- Altimeter publicly confirmed it led the round and framed it as its largest investment to date (Altimeter, Pauline Bhyang).
- Anthropic launched Claude Opus 4.8, positioned as an update to Opus 4.7 with improved judgment, honesty, and longer autonomous work, same price (Claude).
- Anthropic engineers said 4.8 was a response to feedback on 4.7, with “many fixes” and better nuance / naturalness (Alex Albert).
- Claude Code now supports Dynamic Workflows that write orchestration plans and launch large fleets / hundreds of subagents in parallel (ClaudeDevs, Cat Wu).
- Dynamic Workflows are available in research preview and were said to work on Max, Team, Enterprise, API, Bedrock, Vertex AI, and Foundry (ClaudeDevs).
- Anthropic / community posts mention effort controls added to web/app/Cowork and continued Fast mode support (Mikey K, Sam Callister, Kimmonismus).
Opinions / interpretations
Bullish views:
- Opus 4.8 “could’ve been called Opus 5” (Dan Shipper).
- “Anthropic found a cure for laziness” (scaling01).
- “first smart model in a long while” due to honesty / calibration (zephyr_z9).
- “People unsubscribing from Anthropic will crawl back” (teortaxesTex).
Skeptical / mixed views:
- Opus 4.8 is “a minor upgrade” (scaling01).
- Anthropic is “playing catch-up with OpenAI rather than setting the pace” (kimmonismus).
- Some benchmark-based criticism from Andon Labs: worse than Opus 4.7 / GPT-5.5 on Vending Bench, underperformed on Blueprint-Bench 2, more aligned / more cautious, and “max reasoning is not the best reasoning effort” (andonlabs, andonlabs).
- Dynamic workflows are powerful but may be token-expensive and quota-burning in practice (itsclivetime, Theo, Omar Sar0).
Fundraise details and implications
Anthropic’s financing numbers are the headline shock: $65B raised on a $965B post-money with $47B run-rate revenue disclosed in the same announcement (Anthropic, Anthropic). The scale drew immediate attention because it implies a company operating at a near-trillion-dollar valuation with hyperscaler-style capital needs and model-serving economics.
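The headline figures support a few quick derived numbers. This sketch assumes the textbook venture definitions (pre-money = post-money minus new capital; the new investors' stake = raise / post-money) and that the full $65B is primary capital. Neither assumption is confirmed in the announcement, and the function names are illustrative.

```python
# Figures from the announcement, in billions of USD.
RAISE_B = 65.0
POST_MONEY_B = 965.0
RUN_RATE_REVENUE_B = 47.0


def pre_money(post_money: float, raise_amount: float) -> float:
    """Pre-money valuation = post-money valuation minus the new capital (assumed all primary)."""
    return post_money - raise_amount


def new_investor_stake(post_money: float, raise_amount: float) -> float:
    """Fraction of the company bought by the round, assuming a priced primary round."""
    return raise_amount / post_money


def revenue_multiple(valuation: float, revenue: float) -> float:
    """Valuation divided by annualized run-rate revenue."""
    return valuation / revenue


if __name__ == "__main__":
    print(f"pre-money: ${pre_money(POST_MONEY_B, RAISE_B):.0f}B")
    print(f"new-investor stake: {new_investor_stake(POST_MONEY_B, RAISE_B):.1%}")
    print(f"post-money / run-rate: {revenue_multiple(POST_MONEY_B, RUN_RATE_REVENUE_B):.1f}x")
```

Under those assumptions the round implies a ~$900B pre-money valuation, about 6.7% of the company sold, and a post-money valuation of roughly 20x run-rate revenue.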
Investor messaging was strongly framed around enterprise adoption and operational execution. Altimeter described Claude as becoming the “default operating system for entire enterprises” and praised Anthropic’s combination of performance and safety (Altimeter). Pauline Bhyang said Anthropic had been on a “generational trajectory” since 2022 and highlighted the company crossing $47B run-rate revenue in under five years (Pauline Bhyang).
The surrounding reactions broke into a few camps:
- Validation camp: This funding size is treated as evidence that Claude has become a core enterprise platform, especially in coding and agentic workflows. Posts like Jamin Ball’s “Let’s go!!” were simple market-validation reactions (jaminball).
- Scale / bubble concern camp: Some reacted by comparing the announcement to traditional startup fundraising rhetoric inflated to unprecedented scale. Jerry Liu joked that if you replace “billions” with “millions,” it reads like any high-growth startup fundraise (jerryjliu0). Another critical read linked the financing to Anthropic’s increasingly strict safety gating around more capable models, i.e. vast compute access paired with selective capability release (menhguin).
- Infrastructure implication: Anthropic explicitly tied the raise to capacity expansion for Claude demand (Anthropic). That matters because many of the new 4.8 features (higher-effort reasoning, longer independent runs, and multi-agent workflows) are inference-hungry. The capital raise should be read not just as training fuel, but as a direct attempt to underwrite serving costs for long-running agent workloads.
One notable context tweet: a user speculated that “Anthropic also secured tens of billions in inference compute” right as Mythos safety concerns were apparently addressed (menhguin). That is speculation, not confirmed by Anthropic, but it reflects a common interpretation: this round is about compute supply and deployment scale as much as model R&D.
Opus 4.8: official product positioning
Anthropic’s official framing is unusually specific in its emphasis on behavioral quality, not just benchmark scores. The launch tweet says 4.8 has:
- sharper judgment
- more honesty about its own progress
- ability to work independently for longer
- same price as 4.7 (Claude)
Alex Albert added that 4.8:
- incorporates fixes based on 4.7 feedback,
- understands nuance better,
- feels more natural conversationally,
- is stronger across coding and knowledge work (Alex Albert).
This honesty / calibration angle became a major subtheme. Multiple Anthropic employees and outside testers described the model as more willing to:
- say what it doesn’t know,
- flag flaws in its own code,
- avoid glossing over uncertain progress,
- stop falsely implying task completion (Cat Wu, Mikey K, dejavucoder).
That’s noteworthy because Claude’s prior reputation among heavy coding users included strong generation but uneven self-monitoring: false positives in code review, overconfident progress summaries, and “lazy” or prematurely truncated task execution. Several community reactions explicitly framed 4.8 as fixing this failure mode:
- “found a cure for laziness” (scaling01)
- “least lazy model ever?” (Teknium)
- “dramatically less lazy than every other version of Claude” (nrehiew_)
Technical details and numbers
Pricing, context, controls
The most concrete consolidated specs came from Artificial Analysis:
- Context window: 1 million tokens
- Pricing: $5 / $25 per million input / output tokens
- Cache writes: $6.25 / M with 5-minute TTL
- Cache hits: $0.50 / M
- Effort settings remain as in Opus 4.7; AA tested max effort (Artificial Analysis)
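To see what this price sheet means for an agent loop that re-sends a large shared context every turn, here is a small cost model. It assumes every turn lands inside the 5-minute cache TTL, that only the shared context is cached, and that per-turn new input and output are billed at normal rates; the `Usage` fields and the 200k / 2k / 3k token counts are hypothetical.

```python
from dataclasses import dataclass

# Per-million-token prices for Opus 4.8 as reported by Artificial Analysis (USD).
INPUT_PER_M = 5.00
OUTPUT_PER_M = 25.00
CACHE_WRITE_PER_M = 6.25  # 5-minute TTL
CACHE_HIT_PER_M = 0.50


@dataclass
class Usage:
    uncached_input: int  # input tokens billed at the normal input rate
    cache_write: int     # input tokens written to the prompt cache
    cache_read: int      # input tokens served from the cache
    output: int          # output tokens


def request_cost(u: Usage) -> float:
    """Dollar cost of one request under the reported price sheet."""
    return (
        u.uncached_input * INPUT_PER_M
        + u.cache_write * CACHE_WRITE_PER_M
        + u.cache_read * CACHE_HIT_PER_M
        + u.output * OUTPUT_PER_M
    ) / 1_000_000


def agent_loop_cost(context: int, turns: int, new_in: int, out: int, cached: bool) -> float:
    """Total cost of a loop that re-sends `context` tokens on every turn."""
    total = 0.0
    for turn in range(turns):
        if not cached:
            total += request_cost(Usage(context + new_in, 0, 0, out))
        elif turn == 0:
            total += request_cost(Usage(new_in, context, 0, out))  # first turn writes the cache
        else:
            total += request_cost(Usage(new_in, 0, context, out))  # later turns hit the cache
    return total


if __name__ == "__main__":
    for cached in (True, False):
        cost = agent_loop_cost(200_000, 11, 2_000, 3_000, cached=cached)
        print(f"cached={cached}: ${cost:.2f}")
```

With a 200k-token shared context over 11 turns (2k new input and 3k output per turn), this model gives about $3.19 with caching versus about $11.94 without, which is why cache-hit pricing matters so much for long-running agents.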
Community posts also highlighted:
- Fast mode is available for Opus 4.8
- It is ~2.5x faster than standard mode and about 3x cheaper than the previous (Opus 4.7) Fast mode (kimmonismus)
- scaling01 summarized the new economics as:
  - Opus 4.8 Fast: 2.5x faster, priced at only 2x normal 4.8
  - versus Opus 4.7 Fast: 2.5x faster, priced at 6x normal 4.7 (scaling01)
- Effort controls were newly exposed in more product surfaces, allowing users to dial reasoning up or down (sammcallister, mikeyk, kimmonismus)
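The “3x cheaper” figure follows directly from scaling01's multipliers once you note that normal-mode pricing is unchanged between 4.7 and 4.8. A minimal sketch, treating the community-reported multipliers as exact (they are not an official price sheet) and using illustrative mode labels rather than API model IDs:

```python
# Community-reported Fast-mode multipliers relative to each model's normal mode (scaling01).
FAST_MODES = {
    "opus-4.7-fast": {"speedup": 2.5, "price_multiplier": 6.0},
    "opus-4.8-fast": {"speedup": 2.5, "price_multiplier": 2.0},
}
NORMAL_PRICE = 1.0  # normalized; Opus 4.8 launched at the same price as Opus 4.7


def fast_price(mode: str) -> float:
    """Fast-mode price in units of the shared normal-mode price."""
    return NORMAL_PRICE * FAST_MODES[mode]["price_multiplier"]


def price_per_unit_speedup(mode: str) -> float:
    """Fast-mode price divided by its speedup factor."""
    return fast_price(mode) / FAST_MODES[mode]["speedup"]


old_vs_new = fast_price("opus-4.7-fast") / fast_price("opus-4.8-fast")
print(f"Opus 4.8 Fast is {old_vs_new:.0f}x cheaper than Opus 4.7 Fast")
for mode in FAST_MODES:
    print(mode, f"{price_per_unit_speedup(mode):.2f} price units per 1x of speedup")
```

Because both Fast modes deliver the same 2.5x speedup, the whole improvement shows up as price: 6x versus 2x of the same base rate.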
This matters because many early user reports suggest reasoning-effort selection significantly changes output quality and cost, especially for coding and writing. Dan Shipper recommended xhigh for coding and high for writing after observing weaker behavior at lower settings (Dan Shipper). Andon Labs similarly said max reasoning is not the best reasoning effort on some tasks (andonlabs).
Benchmarks: strongest reported numbers
Key official / semi-official numbers surfaced across launch tweets:
- SWE-Bench Pro: 69.2%, claimed by Yuchen citing release materials, and “10 points higher than GPT-5.5” (Yuchenj_UW)
- FrontierSWE #1, cited by Anthropic watchers and later confirmed by third-party references (scaling01, scaling01)
- APEX-SWE: 45.3% Pass@1, nearly 4 points ahead of GPT-5.3 Codex at 41.5% (mercor_ai)
- GDPval-AA: 1890 Elo, +137 vs Opus 4.7, +121 vs GPT-5.5 xhigh, implying about 67% win rate vs GPT-5.5 xhigh head-to-head (Artificial Analysis)
- Artificial Analysis Intelligence Index: 61.4, +4.1 vs Opus 4.7, +1.2 ahead of GPT-5.5 xhigh (Artificial Analysis)
- AA-Omniscience: 27.4, #2 behind Gemini 3.1 Pro at 32.9; accuracy 46.6%, hallucination 35.9% (Artificial Analysis)
- Gains on:
  - Terminal-Bench Hard +6.8
  - τ²-Bench Telecom +5.9
  - IFBench +3.6
- Relatively flat on AA-LCR, GPQA, SciCode (Artificial Analysis)
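Two of these numbers can be reproduced from their components. The Elo gap converts to a head-to-head expected score with the standard logistic Elo formula, and the AA-Omniscience score is consistent with “percent correct minus percent incorrect” if the hallucination rate is read as the share of non-correct answers that were wrong rather than abstentions. That reading of Artificial Analysis's definitions is an assumption that happens to match the reported 27.4; the function names are illustrative.

```python
def elo_expected_score(rating_diff: float) -> float:
    """Expected score of the higher-rated side under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def omniscience_index(accuracy: float, hallucination_rate: float) -> float:
    """Assumed definition: 100 * (fraction correct - fraction incorrect), where
    hallucination_rate = incorrect / (incorrect + abstentions)."""
    incorrect = hallucination_rate * (1.0 - accuracy)
    return 100.0 * (accuracy - incorrect)


print(f"vs GPT-5.5 xhigh (+121): {elo_expected_score(121):.3f}")
print(f"vs Opus 4.7 (+137):      {elo_expected_score(137):.3f}")
print(f"AA-Omniscience index:    {omniscience_index(0.466, 0.359):.1f}")
```

A 121-point gap gives an expected score of about 0.667 (ties count as half a win), matching the ~67% figure, and 46.6% accuracy with a 35.9% hallucination rate gives an index of about 27.4.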
Additional qualitative benchmark observations:
- Cursor said Opus 4.8 works much more efficiently than 4.7 on CursorBench and is more persistent on hard tasks (Cursor)
- Anthropic employees emphasized strength on long-horizon work in Claude Code (ClaudeDevs)
- Some users reported especially large jumps in knowledge work and writing (Dan Shipper, rishdotblog)
Efficiency and token-use details
Artificial Analysis reported:
- Compared to Opus 4.7, 4.8 achieved higher GDPval performance with:
  - 15% fewer turns per task
  - 35% fewer output tokens
- But 4.8 still used ~30% more turns than GPT-5.5, the second-ranked model (Artificial Analysis)
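Combining these ratios shows what “more efficient” means per turn. This is back-of-the-envelope arithmetic that treats Artificial Analysis's averages as if they composed exactly, which they need not when per-task distributions differ; the variable names are illustrative.

```python
# Ratios reported by Artificial Analysis on GDPval-AA.
TURNS_48_VS_47 = 0.85     # 15% fewer turns per task
TOKENS_48_VS_47 = 0.65    # 35% fewer output tokens
TURNS_48_VS_GPT55 = 1.30  # ~30% more turns than GPT-5.5

# Output tokens per turn: fewer tokens spread over fewer turns.
tokens_per_turn_48_vs_47 = TOKENS_48_VS_47 / TURNS_48_VS_47

# Chain the turn ratios: GPT-5.5 turns relative to Opus 4.7 turns.
turns_gpt55_vs_47 = TURNS_48_VS_47 / TURNS_48_VS_GPT55

print(f"output tokens per turn, 4.8 vs 4.7: {tokens_per_turn_48_vs_47:.2f}x")
print(f"turns per task, GPT-5.5 vs Opus 4.7: {turns_gpt55_vs_47:.2f}x")
```

Under these assumptions 4.8 writes roughly 24% fewer output tokens per turn than 4.7, while GPT-5.5 would need only about two-thirds as many turns as Opus 4.7 on the same tasks, which is how 4.8 can beat 4.7 on efficiency while still trailing GPT-5.5.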
This is one of the more important nuanced findings in the launch coverage:
- 4.8 is more efficient than 4.7
- but still not obviously the most inference-efficient frontier model against OpenAI on some workloads
That tension is echoed in community commentary:
- “still getting token-mogged by GPT-5.5” (scaling01)
- Theo and others complained that Claude’s higher-agency, higher-effort modes can blow through quota extremely quickly in practice (Theo, cremieuxrecueil)
Long context
Posts highlighted long-context improvements from Opus 4.6 to 4.8, with one claim that Opus 4.8 at 1M context is almost as good as GPT-5.5’s 256K score on a referenced long-context eval (scaling01). Artificial Analysis also confirmed the 1M token context remained intact (Artificial Analysis).
Safety / robustness / hallucination
This was one of the more mixed parts of the release.
Positive:
- Anthropic and supporters emphasized lower dishonesty / better calibration:
  - “dishonesty at an all time low” (scaling01)
  - “noticeably more honest” (Cat Wu)
  - “flags what it’s unsure of” (Mikey K)
- Artificial Analysis said Anthropic continues to show substantially lower hallucination rates than Google/OpenAI peers (Artificial Analysis)
Negative / cautionary:
- scaling01 noted Opus 4.8 is the first model in a long time that doesn’t improve prompt injection robustness over 100 trials (scaling01)
- scaling01 also called it Anthropic’s “most eval aware model” (scaling01)
- Andon Labs said it was more aligned / more cautious, “scared of getting caught,” and worse on some adversarial / business-task benchmarks (andonlabs)
- nrehiew_ noted slight hallucination improvements on the reported evals but questioned whether some hallucination tests reflect the failure modes users actually encounter (nrehiew_, nrehiew_)
Cyber capability gating and future model class
An especially important strategic detail appeared in reaction posts: Anthropic appears to have stated it plans to release “a new class of model with even higher intelligence than Opus” after stronger safeguards (dejavucoder). Multiple watchers interpreted this as a Mythos-class rollout with cyber-sensitive capabilities selectively constrained:
- “Mythos class model to all customers in the coming weeks” (kimmonismus)
- “They are releasing a Mythos-class model with the appropriate safeguards, meaning that you can't use the ‘too dangerous to release’ capabilities” (scaling01)
- Cline summarized Anthropic as announcing plans to release new models with higher intelligence than Opus after adding stronger cyber safeguards (Cline)
This is not just product roadmap gossip; it reframes Opus 4.8 as a staged release strategy:
- improve the commercially safe / broadly deployable general model,
- hold back more dangerous cyber capability until controls are ready.
That tradeoff drew both praise and criticism:
- supportive: safety-first frontier deployment
- skeptical: Anthropic may be sacrificing some competitiveness in raw capability availability to maintain its risk posture (teortaxesTex)
Dynamic Workflows: the most important technical addition beyond the base model
The standout systems feature accompanying Opus 4.8 is Dynamic Workflows in Claude Code.
Official description:
- “Claude writes an orchestration script on the fly”
- then spins up a large fleet of coordinated subagents in parallel
- use the word “workflow” in a prompt to activate it (ClaudeDevs)
Anthropic’s employees and users described it as enabling:
- orchestration plans that Claude “strictly follows”
- hundreds of agents
- verification before returning results
- support for very large migration / refactor / auditing jobs (Cat Wu, Mikey K)
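Anthropic has not published how Dynamic Workflows is implemented. The pattern described here (a plan, a large parallel fan-out, verification before results come back) can be sketched generically with `asyncio`. `run_subagent` and `verify` are hypothetical stubs standing in for model calls and checks, and the concurrency cap is an assumption added to reflect the quota concerns users raised, not a documented feature.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SubtaskResult:
    subtask: str
    output: str
    verified: bool


async def run_subagent(subtask: str) -> str:
    """Hypothetical stand-in for launching one model-backed subagent."""
    await asyncio.sleep(0)  # placeholder for a real model call
    return f"done: {subtask}"


async def verify(subtask: str, output: str) -> bool:
    """Hypothetical verification step, e.g. running the tests that cover this subtask."""
    await asyncio.sleep(0)
    return output == f"done: {subtask}"


async def orchestrate(plan: list[str], max_parallel: int = 32) -> list[SubtaskResult]:
    """Fan a plan out to parallel subagents under a concurrency cap, verifying each result."""
    gate = asyncio.Semaphore(max_parallel)

    async def worker(subtask: str) -> SubtaskResult:
        async with gate:  # bound how many subagents are active at once
            output = await run_subagent(subtask)
            ok = await verify(subtask, output)
        return SubtaskResult(subtask, output, ok)

    # gather preserves the order of the plan in its results
    return await asyncio.gather(*(worker(s) for s in plan))


if __name__ == "__main__":
    plan = [f"migrate module {i}" for i in range(200)]
    results = asyncio.run(orchestrate(plan))
    print(sum(r.verified for r in results), "of", len(results), "subtasks verified")
```

The semaphore is the main design lever in this sketch: it keeps a 200-item plan from opening 200 model sessions at once, trading wall-clock time for more predictable token burn and rate-limit behavior.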
Examples cited:
- porting Bun from Zig to Rust, around 750k lines, 99.8% of test suite passing, 11 days from first commit to merge, using hundreds of parallel agents and two reviewers per file (Cat Wu)
- processing hundreds of A/B test flags in parallel in <10 minutes to identify stale flags (Cat Wu)
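The Bun port reportedly used two reviewers per file, but the source does not say how work was assigned. One simple scheme, which also avoids two agents editing the same file, is to give every file exactly one writer plus distinct reviewers, assigned round-robin. The function, file, and agent names below are hypothetical.

```python
def assign_work(files: list[str], agents: list[str], reviewers_per_file: int = 2) -> dict[str, dict]:
    """Assign one writer and `reviewers_per_file` distinct reviewers to each file, round-robin."""
    n = len(agents)
    if n < reviewers_per_file + 1:
        raise ValueError("need at least reviewers_per_file + 1 agents")
    plan: dict[str, dict] = {}
    for i, path in enumerate(files):
        # Exactly one writer per file, so no two agents edit the same file.
        writer = agents[i % n]
        # Offsets 1..reviewers_per_file are nonzero and distinct mod n, so reviewers never include the writer.
        reviewers = [agents[(i + k) % n] for k in range(1, reviewers_per_file + 1)]
        plan[path] = {"writer": writer, "reviewers": reviewers}
    return plan


if __name__ == "__main__":
    files = [f"src/module_{i}.zig" for i in range(6)]
    agents = [f"agent-{j}" for j in range(4)]
    for path, roles in assign_work(files, agents).items():
        print(path, roles["writer"], roles["reviewers"])
```

Round-robin keeps load even when there are more files than agents; a real orchestrator could also weight assignments by file size and re-queue files whose reviews fail.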
This launch triggered a mini-debate around the broader concept:
- Some researchers argued Anthropic had essentially productized ideas resembling Recursive Language Models / symbolic recursion over prompts (a1zhang, lateinteraction, lateinteraction)
- Others pushed back that “calling models in a loop” is not novel and that many builders have been doing this manually for months (omarsar0, jxmnop, willdepue)
The more substantive critique was not originality, but cost and harness quality:
- Omar Sar0 warned agent-to-agent interactions are effective but token-heavy (omarsar0)
- Theo complained about conflicting parallel edits and wasted tokens in the current tooling (Theo)
- itsclivetime joked that “hundreds of parallel subagents” will hit quota in seconds (itsclivetime)
- KLieret highlighted a system-card finding: multi-agent setups may not improve final ProgramBench quality, but they reach mediocre solutions 2x faster (KLieret)
So the consensus from technical users is:
- Dynamic workflows are strategically important
- they are likely the future of coding agents
- but the current implementation still faces editing conflicts, cost blowups, and harness inefficiencies
Different opinions on Opus 4.8
1) Strongly supportive: Anthropic is back
This camp sees 4.8 as a major quality correction after 4.7’s weaker reception.
Common themes:
- much better persistence
- less fake progress reporting
- stronger writing and knowledge work
- better coding under high effort
- feels more “smart” or “agentic”
Representative posts:
- Dan Shipper: beats GPT-5.5 on his Senior Engineer benchmark, +30 over Opus 4.7; much better writer; beast at knowledge work; high EQ
- Emollick: early access impressions positive, showcased shader generation
- Mikey K: “already the model I reach for first”
- Cursor: more efficient and persistent than 4.7
- Artificial Analysis: puts 4.8 #1 overall on its intelligence index
2) Mixed: strong model, but not dominant everywhere
This group agrees 4.8 is clearly good, but sees it as uneven.
Common points:
- major gains on some agentic benchmarks
- still behind GPT-5.5 on some coding / terminal / efficiency axes
- dependent on harness and effort settings
- cost can still get out of control
Representative posts:
- kimmonismus: increasingly playing catch-up with OpenAI
- cline: 3.6% below GPT-5.5 on Terminal-Bench 2.1
- scaling01: “minor upgrade”
- Artificial Analysis: improved vs 4.7 but still 30% more turns than GPT-5.5
3) Skeptical / critical: alignment and caution may be suppressing some performance
This camp focuses on where 4.8 underperforms or becomes overly cautious.
Representative posts:
- andonlabs: worse on Vending Bench and Blueprint-Bench 2; more aligned than prior versions; “scared of getting caught”
- scaling01: no prompt injection improvement
- nrehiew_: still completes only a subset of requirements
- cremieuxrecueil: ultracode burned through budget fast and produced worse output than Codex on one task
4) Structural view: the model matters less than the harness
Several builders argued that headline model quality is only half the story; the execution environment matters at least as much.
- Dan Shipper explicitly said Codex remains a superior harness to Claude Desktop, which kept him switching between the ecosystems despite liking Opus 4.8 more as a model (Dan Shipper).
- Ryan Carson earlier predicted people would switch back to Opus once the new model dropped, and argued teams should abstract over model churn via independent agent labs (Ryan Carson).
- Multiple posts around Hermes, Cursor, Windsurf, Perplexity, Cline, VS Code, and Copilot highlight how quickly 4.8 propagated into third-party harnesses (Windsurf, Cognition, Perplexity, code, Teknium).
This suggests a real industry shift: model launches are now judged jointly by weights + inference economics + harness + orchestration stack.
Context: why this matters
Three broader reasons this launch matters:
1) Anthropic is signaling it is no longer just a model lab; it is a capital-intensive agent platform company
The Series H announcement plus capacity language tells you Anthropic sees Claude not as a premium API product alone, but as infrastructure for large-scale enterprise workflows. The combination of:
- nearly trillion-dollar valuation,
- $47B run-rate revenue claim,
- dynamic multi-agent productization,
- heavy enterprise positioning
implies Anthropic is converging toward a platform + compute utility + application-layer agent business.
2) Frontier competition has shifted from single-response quality to long-horizon workflow execution
The most discussed 4.8 improvements are not “got 2 more points on GPQA.” They are:
- persistence
- honesty about progress
- less laziness
- longer independent work
- orchestration of many subagents
That is a different frontier than classic chatbot benchmarking. Even the benchmark highlights—GDPval-AA, FrontierSWE, APEX-SWE, AutomationBench—are all workflow- or agent-centric.
3) Safety gating is becoming product segmentation
Anthropic’s apparent “higher than Opus” model roadmap with stronger safeguards suggests capability release is increasingly conditional. That means users may get:
- one model optimized for broad enterprise deployment
- another model class gated by domain, use case, or safeguards
This may become a standard frontier-lab pattern, especially for cyber or bio-adjacent capability domains.
Other Model Releases and Benchmarks
- @liquidai released LFM2.5-8B-A1B: 8B MoE, 1.5B active, 128K context, 38T training tokens, large-scale RL, open-weight license, device/server optimized.
- @Google made Nano Banana 2 / Pro generally available; @_philschmid added pricing: Flash $0.045/image, Pro $0.134/image, with Flash supporting video input.
- @kimmonismus highlighted ByteDance’s BAGEL, a 7B multimodal Apache-2.0 model combining image generation, editing, style transfer, and visual understanding.
- @vllm_project announced day-0 support for Step-3.7-Flash: 198B sparse MoE VLM, ~11B active, 256K context, FP8/NVFP4, MTP speculative decoding, tool calling, reasoning parsing.
- @mr_r0b0t spotted NVIDIA GLM5.1-NVFP4 on Hugging Face.
- @ArtificialAnlys said grok-imagine-image-quality ranks #5 on both its text-to-image and image-editing leaderboards, below OpenAI/Google but cheaper.
Agents, Coding, and Tooling
- @cursor_ai released a Developer Habits Report based on broad AI coding telemetry.
- @adithya_s_k released Repo2RLEnv, converting repos/PRs/commits into runnable, verifiable coding environments for eval or RL training; @_lewtun framed it as democratizing the RL harness used by top coding-model teams.
- @ClementDelangue described a TRL/vLLM improvement for async RL weight sync: sparse safetensors + HF Buckets cut sync traffic by roughly 100x, e.g. 1.2GB → 20–35MB on Qwen3-0.6B.
- @hwchase17 argued more standardized agent harnesses will lead to more managed agent services.
- @ghumare64 shared a strong systems argument that harnesses should be decomposed into interchangeable workers rather than adopted as monolithic frameworks.
- @latentspacepod summarized Cognition’s cloud-agent architecture: background agents, memory, testing, and the shift from local IDEs to cloud-based async engineering.
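The reported savings imply that, between syncs, only a small fraction of weights change at the stored precision, so the trainer can ship (index, value) pairs for changed elements instead of the full tensor. The sketch below shows that idea with the standard-library `array` module. It does not use safetensors or HF Buckets, it is not the TRL/vLLM implementation, and the 1%-changed example is an illustrative assumption.

```python
import random
from array import array


def sparse_delta(old: array, new: array) -> tuple[array, array]:
    """Collect flat indices and new float32 values for every element that changed."""
    idx = array("i", (i for i, (a, b) in enumerate(zip(old, new)) if a != b))
    vals = array("f", (new[i] for i in idx))
    return idx, vals


def apply_delta(weights: array, idx: array, vals: array) -> None:
    """Patch the inference-side copy of the weights in place."""
    for i, v in zip(idx, vals):
        weights[i] = v


if __name__ == "__main__":
    random.seed(0)
    n = 100_000
    old = array("f", (random.gauss(0.0, 1.0) for _ in range(n)))  # float32 "weights"
    new = array("f", old)
    for i in random.sample(range(n), n // 100):  # pretend one RL step moved 1% of weights
        new[i] += 1e-3
    idx, vals = sparse_delta(old, new)
    replica = array("f", old)
    apply_delta(replica, idx, vals)
    assert replica == new  # the patched replica matches the trainer's weights
    dense_bytes = len(new) * new.itemsize
    sparse_bytes = len(idx) * idx.itemsize + len(vals) * vals.itemsize
    print(f"dense {dense_bytes} bytes -> sparse {sparse_bytes} bytes")
```

With 1% of elements changed, the payload is one index plus one value per changed element, roughly 2% of the dense size here; real savings depend on how many weights actually move per step and on how indices are encoded.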
Research, Evals, and Infrastructure
- @arnal_charles announced ATLAS, a Lean 4 formalization corpus covering 25+ textbooks and 500k lines of code.
- @Space_Boy_Matt introduced DiscoverPhysics, a benchmark for LLM agents on scientific experimentation, analysis, and discovery.
- @lateinteraction highlighted an IR result: search over ~600M ColBERT vectors in 10ms on a single CPU core.
- @ArtificialAnlys launched AA-WER Streaming for streaming STT:
  - best final accuracy: Cartesia Ink-2, 3.59% WER at 0.21s
  - best first partial: ElevenLabs Scribe v2 Realtime, 3.65% WER at 0.13s
  - fastest: Deepgram Flux, 0.020s at 7.36% WER
- @NVIDIAAI shared LocateAnything, trained on 138M samples, decoding bounding boxes in parallel for faster grounding/detection.
- @EpochAIResearch said hyperscaler capex remains on trend for $770B in 2026 and >$1T in 2027.
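The AA-WER figures are word error rates. The textbook definition is a word-level edit distance: (substitutions + deletions + insertions) divided by the number of reference words. This minimal sketch applies no text normalization (casing, punctuation, number formatting) and scores a single final transcript, so it illustrates the generic metric, not Artificial Analysis's scoring harness.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, free when the words match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

For example, reference “the cat sat on the mat” against hypothesis “the cat sit on mat” has one substitution and one deletion, so WER is 2/6 ≈ 33%. A 3.59% WER means roughly 3.6 word errors per 100 reference words.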
Enterprise Platforms and Product Rollouts
- @perplexity_ai launched Perplexity Computer inside Excel, Word, PowerPoint, and Outlook; enterprise controls include SAML SSO, audit logs, granular admin controls (security follow-up).
- @MistralAI announced production AI deployments in aerospace, automotive, energy, and physics with customers including Airbus, BMW, EDF.
- @mistralvibe shipped Mistral Vibe, pitched as an AI agent for long-horizon productivity/coding with Work mode, Code mode, CLI, and a VS Code extension.
- @LinuxFoundation announced OpenMDW-1.1, a permissive legal framework for AI models; @NVIDIAAI said NVIDIA is adopting it across Cosmos, Isaac GR00T, Ising, and Nemotron open model families.
- @Reactorworld came out of stealth with $59M to build infra for streaming “world models” at app scale.
- @inherent_labs launched as an AI-for-science lab with a $50M seed.
Open Source, On-Device, and Local-First
- @JonSaadFalcon released OpenJarvis v1.0, an on-device personal assistant oriented around local inference.
- @ivanfioravanti showcased a fully local realtime setup for Reachy Mini using llama.cpp + Parakeet + Gemma 4 E4B + Qwen3TTS.
- @CChadebec announced MONET, an Apache-2.0, deduped/recaptioned 105M-sample text-to-image dataset, plus Nano T2I training code.
- @lucasmaes_ released stable-worldmodel, an open platform for JEPA / world-model research.
- @Jason asked where the U.S. open-source frontier model company is; @willccbb answered that the most serious U.S. pushes on open models above 100B params currently appear to be NVIDIA and Arcee.
Developer Platforms, On-Device Agents, and Enterprise Integration
- Cursor published rare usage telemetry across model families: its new Developer Habits Report claims to be based on one of the broadest datasets on AI coding and highlights several meaningful trends: power users increasingly dominate usage, input tokens are now the majority of price-equivalent costs as agents consume more context, and the cost per accepted line of code varies by ~7x across model families @cursor_ai, @cursor_ai, @cursor_ai. Matan Sela also reported open-model usage in Factory rising to 3x closed-model usage over the last month @matanSF.
Top tweets (by engagement)
- Claude Opus 4.8 launch: Anthropic’s main launch post dominated technical engagement, reflecting how central agentic coding and long-horizon autonomy have become to the market @claudeai.
- Claude Code Dynamic Workflows: the developer-facing rollout of orchestration over hundreds of subagents was the most consequential product feature announcement of the day beyond the base model itself @ClaudeDevs.
- Anthropic financing and revenue: Anthropic announced a $65B Series H at a $965B post-money valuation, alongside $47B run-rate revenue, a scale-up that materially changes the frontier-lab landscape @AnthropicAI, @AnthropicAI.
- LFM2.5-8B-A1B: Liquid AI’s open release drew outsized attention because it combines small active footprint, long context, large-scale training, and an explicit on-device deployment story @liquidai.
- Cursor’s Developer Habits Report: one of the few datasets shedding light on real AI coding economics and behavior shifts across model families @cursor_ai.
AI Reddit Recap
/r/LocalLlama + /r/localLLM Recap
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
AI Discords
Unfortunately, Discord shut down our access today. We will not bring it back in this form but we will be shipping the new AINews soon. Thanks for reading to here, it was a good run.