Claude Code's product lead talks usage limits, transparency, and the "lean harness"
SAN FRANCISCO—Amid an ever-expanding array of surfaces, growing demand for tokens and compute, and a rapidly evolving user base, Anthropic doesn’t have a long-term road map for Claude Code. Instead, it’s betting that any such plan would be rendered moot by improvements in model capabilities and by new signals from developers about how best to use the tool. That’s the takeaway from a 30-minute conversation Ars had with Cat Wu, Anthropic’s head of product for Claude Code.
Last week, in a three-level car rental parking garage meticulously converted into an event space in downtown San Francisco, Anthropic put on its second annual Code with Claude developer conference. As previously reported, the single-day event included a keynote introducing new features for Managed Agents and announcing a compute deal with SpaceX.
That compute deal was accompanied by a doubling of usage limits for Claude Code users on the company’s Pro and Max plans, a response to widespread user frustration over a compute crunch that has grown especially acute in recent weeks.
Anthropic’s products—especially Claude Code, its tool for agentic software development—have seen runaway popularity. “We tried to plan very well for a world of 10x growth per year,” Anthropic CEO Dario Amodei said on stage at the conference. “And yet we saw 80x, and so that is the reason we have had difficulties with compute.”
User growth was accompanied by a shift in how people used the company’s models, away from simple chat interfaces to complex, multi-agent workflows that are many times more demanding.
During the crunch, Anthropic has been testing solutions to reduce demand, like enforcing stricter limits during peak hours or removing Claude Code from its cheaper subscription plan.
And over the past year, Anthropic has released a plethora of new features, products, and surfaces for interacting with its models. Claude Code went from the CLI to the IDE to the desktop, and new tools for managing multiple agents were rolled out, too. The pace at which the company has shipped has been intense and chaotic at times.
Meanwhile, competitors like OpenAI’s Codex, GitHub Copilot, the Cursor IDE, Augment Code, and others are rolling out their own new products and features in this space, sometimes with differentiating hooks like more explicit use of structured codebase context, which they claim leads to better results or greater efficiency.
At the event, I spoke with Wu about how Anthropic is operating in this context.
As head of product for Claude Code, Wu works closely with its creator, Boris Cherny, to identify which features to prioritize and how the teams at Anthropic test, use, and roll out those features. She does not oversee the models, but the product strategy she describes makes a big bet that the models will continue to improve so rapidly that it’s hard to make a plan for what a product like Claude Code should look like in the future.
As Wu tells it, the Claude Code team is going through development cycles of just a week or so to roll out new products or features in a Wild West of experimentation, discovering new use cases and methodologies.
We discussed user frustrations with usage limits, the role of structured data in making Claude Code work, IDE integration, future capabilities of the tool, and more.
A conversation with Cat Wu
This interview has been edited for length and clarity.
Ars: You’re shipping a lot of things and adding a lot of different surfaces very quickly. There’s the command line, and there’s the IDE integration, there’s the desktop app, and then there’s all this differentiation between Code and Cowork and Managed Agents and so on.
Do you still consider the command line the center of gravity? Or do you see people moving more and more to the desktop or web apps?
Wu: We actually find that every developer has a different preference, so our usage is pretty split across all of them, and each has a substantial number of users. I would say the center of gravity is still the CLI. It’s still the one with the most power-user features, it’s where most of our features land first, and it’s also just the fastest for us to iterate on. It’s also what most of our team uses.
However, we are seeing a gradual shift in our team from the CLI to desktop because maybe last year people had like one agent and then over the course of the year they started having six terminal tabs, and then people started adding fancy ways to monitor a bunch of terminal tabs and get pings and other notifications, and I think people are now at the point where they feel, “OK, I don’t want to read ten tabs anymore. I just want to, like—I understand why people have graphical interfaces,” so a lot of people are moving over to desktop just to get that rich view.
Ars: At the rate that you’re shipping new surfaces, if you extrapolate that out, it might become unmanageable from a development or product point of view. You would have all these things to maintain for different purposes, and it could also get confusing for users.
Do you see a world where you might start consolidating, or do you think this is the best solution because you have something customized for each kind of user?
Wu: Yeah, I think of it as a bit of a progression. Most people start in the CLI or IDE; you get to the point where you’re managing a ton of agents and you want to know which ones are blocked so you can focus on those, and people go to desktop for that.
And then people are like, “Wait, I’m just copying and pasting messages from my customer feedback channel into Claude and babysitting it locally.” So that’s why we added these higher-level things, like routines that can just watch that Slack channel where you’re getting feedback or data or whatever to kick off these runs… all the products are just a way to help you more easily elicit the intelligence of the models. We actually remove scaffolds. We remove parts of the system prompt and tool descriptions over time as models get smarter.
I can definitely see a world where maybe we all just collapse back to the text box again because maybe the model is just always right, so you actually don’t need to follow every step from every prompt. Maybe it doesn’t get blocked. So I can see a world where it collapses, but I think for now we need all these tools to meet people where they are while the models get better.
Ars: Is part of that approach following signals from users in real time, as opposed to having, you know, a lot of companies—they have this grand plan, here’s our whole year of everything…
Wu: Oh, we have no grand plan! [Laughs.]
Ars: I can tell! I don’t mean that in a bad way, because it seems you’re rolling stuff out to meet all these latent demands and signals. It’s different than something like OpenAI, where they’re talking about a super app, right? What’s your thought on that kind of approach of bringing it all into one super app that does everything for everybody? Do you think that’s a bad idea?
Wu: There are a few guiding principles for us. One is we believe that models will continue to grow on the exponential, and it’s really important for us to build where the puck is going.
I think we’re pretty humble about not knowing exactly what the right form factor is but encouraging our teams to explore that as much as possible to figure out what’s best for the next model.
I don’t know if you’ve read “The Bitter Lesson”?
Ars: Mm-hmm, yeah.
[The Bitter Lesson is a 2019 essay by computer scientist and reinforcement learning pioneer Richard Sutton. In part, it argues that efforts to bake domain-specific structures into AI systems have often “proved ultimately counterproductive” and that the methods that win out over time are general-purpose ones that scale with available compute.]
Wu: Yeah, that is one of our guiding principles for our team. And I think it’s really hard—because the models are changing so quickly—it’s really hard to say that this will definitely be the next form factor. We have a few guesses. We dogfood internally a lot of these ideas, but we’re pretty open-minded to just being wrong, and we just need to stay really close to the model capabilities.
Ars: I know some of these things have arisen from seeing users who are just using it this way and then deciding to productize that and make it more convenient.
Are there things that you’re seeing right now that are like that, that you haven’t productized yet where you’re saying, “OK, now we need to be thinking about this in the near future”?
Wu: We try to go from conviction to shipped product pretty quickly, ideally in a week or so. So usually there isn’t a big delay between us feeling that user pull and shipping something.
I think there is maybe this next level where Claude can anticipate what you want. Like, it can proactively know that, “Oh, you’re working on a voice feature,” so it should monitor GitHub issues and feedback in Slack and Twitter and whatever for people saying that there are bugs in voice or that they have new feature requests, and it just makes the routine for itself to monitor for this.
It’s actually not that far away; I think this is an imminent next step… Claude should probably decide to actually listen for feedback on your feature and then decide how to notify you of its ideas. So the engineer doesn’t need to set up an automation, but Claude just thinks, “OK, this is what you work on, so let me monitor it and then propose what you could do today.”
Ars: Developers using these tools are frustrated that there’s just not enough compute to go around. The limits are a problem.
There are tools that exist right now that use what the IDE knows about the codebase—this function is referenced in these different places, and so on—and the idea is that this structured data makes combing through the codebase more efficient in terms of token usage. Is that something that you are considering, or do you have reasons not to go in that direction?
Wu: We do have plugins that give Claude Code this semantic information. We have a few LSPs available that let you, for example, say, “I want to go to where this function is defined,” and it will jump to that exact spot without using search and stuff.

We don’t find that it makes a measurable improvement in performance, but we’ve designed Claude Code to be extensible enough that if you want a plugin that does that, it’s available, and you can connect it. But we’ve found that Claude Code is pretty good at generating high-quality code without needing that extra layer to navigate the codebase.
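[For reference, the Language Server Protocol handles this kind of lookup as a JSON-RPC request over a simple framed transport. Below is a minimal sketch of what a go-to-definition query looks like on the wire; the file path and cursor position are hypothetical, and this shows generic LSP rather than the internals of any Claude Code plugin.

    import json

    def lsp_message(method: str, params: dict, msg_id: int = 1) -> bytes:
        """Frame a JSON-RPC request the way the Language Server Protocol
        expects: a Content-Length header followed by the JSON body."""
        body = json.dumps({
            "jsonrpc": "2.0",
            "id": msg_id,
            "method": method,
            "params": params,
        }).encode("utf-8")
        return b"Content-Length: %d\r\n\r\n%b" % (len(body), body)

    # Ask the language server where the symbol under the cursor is defined.
    # LSP positions are zero-based; this URI and position are made up.
    request = lsp_message("textDocument/definition", {
        "textDocument": {"uri": "file:///project/src/app.py"},
        "position": {"line": 41, "character": 17},
    })
    # The server replies with the exact file and range of the definition,
    # letting an agent jump straight there instead of grepping the codebase.

The appeal for an agent is that one structured lookup can replace several rounds of search-tool calls; per Wu, though, Anthropic’s evals haven’t shown a measurable win from shipping that by default.]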
Ars: The question is less about the quality of the code than the efficiency of getting there, right? Because, again, people get very frustrated with usage limits. Sometimes people try to introduce some kind of structure for an LLM, and they find out that has an unexpected hidden cost. Is that what you’re saying happens with that kind of semantic information? Do you have data that tells you that’s not the way to go with this?
Wu: Going by the evals, we don’t see a measurable change. And I think we generally lean more toward shipping a leaner harness with fewer opinionated tools and just letting developers add their own if they want. So unless a tool clearly improves token performance or accuracy, we default toward not shipping it.
I think token efficiency is always top of mind for us because we just want to give people the maximum amount of intelligence per token, so we’re constantly experimenting with ways to reduce it, but it’s actually harder than I wish it were to do it well.
For us, the most important thing is just maintaining intelligence, so we would only ship something if we felt like it actually makes the model more intelligent, because that’s really the north star for us, not token efficiency.
Ars: For some users, it might be easier to accept limits on token availability if things were more transparent. But at the same time, my impression is that offering real transparency about token usage—“this task cost this much because you did this instead of that”—is actually hard to do.
I assume you’ve looked into ways to communicate that to users. What have you found when you’ve tried to do that?
Wu: We did get a lot of questions about that, like, “Hey, my usage limits got used up quickly, where did they go?” And I think that’s totally valid, and we need to be transparent about that. It is hard to diagnose.
So when people have these complaints, we pick a few people, we jump on a call with them, and we actually just debug live, because your full transcript is stored locally; you actually have all the data on your computer already about all the tokens that you use…
We noticed two main patterns. One, people have these really long sessions, they step away for two hours, they come back and then the cache is broken—and when the cache is broken, it’s actually much more expensive to send the next query. So we started showing a notification that says, “Hey, the cache is broken, run /clear if you want to start a new session.” So it’s just a reminder that this one’s pretty expensive to resume. Also, when you run /usage, you’ll actually see, “Hey, these sessions cost a lot because your cache is broken.”
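[To make the cache point concrete, here’s a back-of-the-envelope sketch. The multipliers are assumptions for illustration, loosely patterned on published prompt-caching pricing, in which cached reads bill at a small fraction of the base input rate and rebuilding the cache bills at a premium; the session size is invented.

    # Why resuming a long session with a broken cache is expensive.
    # These multipliers are illustrative assumptions, not official pricing.
    CONTEXT_TOKENS = 150_000    # hypothetical accumulated session context
    CACHE_READ_MULT = 0.10      # assumed multiplier for a warm-cache read
    CACHE_WRITE_MULT = 1.25     # assumed multiplier to rebuild a cold cache

    warm = CONTEXT_TOKENS * CACHE_READ_MULT    # resume while cache is live
    cold = CONTEXT_TOKENS * CACHE_WRITE_MULT   # resume after cache expiry

    print(f"warm resume: ~{warm:,.0f} token-equivalents")   # ~15,000
    print(f"cold resume: ~{cold:,.0f} token-equivalents")   # ~187,500
    print(f"cold/warm ratio: ~{cold / warm:.1f}x")          # ~12.5x

Under these assumed numbers, stepping away long enough to lose the cache makes the very next query roughly an order of magnitude more expensive, which is the pattern the /usage warning is meant to surface.]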
Wu: Another common pattern we found was that some people were installing plugins that actually told Claude to kick off a ton of subagents, so there were people running a hundred subagents behind the scenes, which is really expensive because that’s like 100 Claude Codes, but they didn’t realize… and then we added some usage reporting, so it will tell you when you’re using a ton of subagents.
I think there’s more we can do, and as we figure out what the bad patterns are, or the concerning patterns are, we want to surface it to the users.
This is really pretty hard. So it’s really important for us to be transparent, but we need to have a strong thesis about where the tokens are going before we know how to surface it in a useful way.
Ars: Last question: You have these very different kinds of organizations and users using this. You’ve got people vibe-coding with no programming experience whatsoever, you have experienced individual developers, you have teams of a few dozen people, and you have large enterprises. How do you make this work for all those different kinds of users and team sizes? Because they have such radically different needs.
Wu: We try to make the core harness as un-opinionated as possible. It’s just our minimal viable set of tools: make a plan, make a to-do list, edit files, ask clarifying questions of the user, and a few others. We try to make this very general and then let everyone else bring in the customizations that work well.
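[For a sense of what a “minimal viable set of tools” can look like at the API level, here’s a sketch in the tool-definition format of Anthropic’s Messages API. The tool names and schemas are hypothetical stand-ins, not Claude Code’s actual internals.

    # A hypothetical "lean harness" tool set, in the shape the Anthropic
    # Messages API expects for its tools= parameter. Names and schemas
    # are illustrative stand-ins, not Claude Code's real internal tools.
    MINIMAL_TOOLS = [
        {
            "name": "make_plan",
            "description": "Write a step-by-step plan before changing code.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "steps": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["steps"],
            },
        },
        {
            "name": "edit_file",
            "description": "Replace a snippet of text in a file.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "path": {"type": "string"},
                    "old_text": {"type": "string"},
                    "new_text": {"type": "string"},
                },
                "required": ["path", "old_text", "new_text"],
            },
        },
        {
            "name": "ask_user",
            "description": "Ask the user a clarifying question and wait.",
            "input_schema": {
                "type": "object",
                "properties": {"question": {"type": "string"}},
                "required": ["question"],
            },
        },
    ]

In this philosophy, everything beyond such a baseline (semantic code search, linters, language servers) arrives as an optional plugin rather than shipping in the default harness.]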
Wu: Along that adoption axis you just walked through, what we find is that as the models get better, adoption grows pretty much along the trajectory you described. Opus 4.5, for example, was a massive unlock for enterprises. That was the one where, even if you weren’t familiar with how to get the most out of AI tools, you could just use Claude Code on your possibly legacy codebase, and it just works. And I think that was a big unlock for the people who didn’t want to do a bunch of education to learn how to coax it to work.
Ars: Thank you for chatting with me.
Wu: Yeah, thank you!
Takeaways
Anthropic isn’t alone in betting that models will keep improving as compute scales, so much so that planning far ahead becomes difficult. That belief is part of why Anthropic and other model developers are investing so massively in compute infrastructure: the guiding philosophy of the current AI spring depends on growing compute efficiency and access. For now, though, that existing infrastructure isn’t always keeping up with user demand.
As Wu noted at the end, the better the models perform, the deeper AI integration goes at increasingly large organizations. Maybe it starts with vibe coding, but larger, more buttoned-down organizations are using Claude Code and tools like it more than they were before. Several questions remain that we didn’t get into in our short conversation, though, such as the methodologies and governance needed to ensure that work done by these agents meets the standards required in that kind of environment.
Anthropic also isn’t alone in its philosophy of maintaining an un-opinionated harness, largely clear of structured semantic data and the like. Nonetheless, several competitors, like Cursor or Augment Code, are actively exploring what domain-specific structure might do for specialized applications. It remains to be seen exactly where we’ll land on that.
The messaging and product strategy around Claude Code have at times seemed chaotic to developers and users. Wu presents that as both a side effect of the current landscape and a virtue in terms of staying agile as models improve. If the next year moves at the pace of the previous one, that may be a reasonable tack—but for better or worse, there are sure to be some more surprises for both Anthropic and developers along the way.