How to Make Claude Code Your AI Engineering Team
By Y Combinator
Summary
Topics Covered
- I coded more in two months than all of 2013
- The bottleneck is scaffolding, not model intelligence
- Office hours is a conversation, not rails
- Browser in the cloud is just someone else's computer
- The barrier to building just collapsed
Full Transcript
Hi, I'm Garry, president and CEO of Y Combinator. I'm also an engineer who spent the first decade of my career building software full-time.
I studied computer systems engineering at Stanford, then was employee number 10 at Palantir, where I was an engineer, designer, and product manager all at once.
I co-founded Posterous, a microblogging platform that sold to Twitter, and I also built the first version of Bookface, YC's internal social platform and knowledge base.
Basically, I've written a lot of code in my career, and I'm here to tell you we are in a completely new era of building software, the agent era.
It turns out the way to get agents to do real work is the same way humans have always done it: as a team, with roles, with process, with review.
I built GStack to encode this 3 weeks ago and now it has more GitHub stars than Ruby on Rails. In this video I want to explain how it can help you build with agents.
I've coded more in the past two months than I did in all of 2013, which is the last time I worked really, really hard as an engineer.
I started playing with Claude Code back in January after hearing people like Andrej Karpathy and Boris Cherny say they weren't manually writing any code anymore. And I got completely hooked.
Along the way, I've essentially built all of my startup Posterous, which took 2 years, $10 million, a co-founder, and a team of 10 engineers to build. Out
of the box, the model wanders. It doesn't know your data well. So, it guesses. And guessing at that scale is how you get plausible-looking code that silently breaks.
The bottleneck here is not the model's intelligence. As long as you set the models up right, they are already smart enough to do extraordinary work on your codebase.
This is backwards. The scaffolding should be trivially thin. GStack is my implementation of the thin-harness, fat-skills approach.
It's an open-source repo that I built that turns Claude Code into an AI engineering team for you: skills that act like a team of specialists.
Office hours is one of those skills. It's actually modeled exactly after what we go through at YC as a partner doing office hours with startups.
It starts by asking six forcing questions for you to reframe your product before you start building.
Let me show you how it works. The best way to get started with GStack is actually Conductor. And so we're going to go in quick start. And GStack is actually built into Conductor right now.
You just click GStack. And today we're going to make a tax app. It's going to go into your Gmail and fish out all of your 1099s, because it's tax day as of today.
GStack is actually a set of skills. And the first one that we're actually going to use is called office hours.
This is actually the distilled version of what is thousands and tens of thousands of hours that the 16 YC partners have spent many, many years honing and perfecting. And this is a distilled-down, 10%-strength version of what we do at YC every day.
So as you can see, Conductor actually just drops you right in there. We're in YC office hours now and I'm trying to do a startup
to help people get all their 1099-INTs out of their Gmail and financial institutions.
Many banks will email you with new tax documents, but some won't. So, we need to both search the user's inbox and accept URLs to go and search and download the 1099-INT PDFs.
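As a minimal sketch of the inbox half of that idea, assume the messages have already been fetched as plain sender and subject pairs. The `Message` type, the subject pattern, and the `build_checklist` helper are all hypothetical illustrations; none of them come from GStack or from any Gmail API.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for tax-document notification subjects (an assumption).
TAX_SUBJECT = re.compile(r"1099|tax (document|form|statement)", re.IGNORECASE)


@dataclass(frozen=True)
class Message:
    sender_domain: str  # e.g. "examplebank.com"
    subject: str


def build_checklist(messages, known_banks):
    """Map each bank domain to a status based on inbox evidence.

    known_banks holds domains the user listed by hand, which covers
    banks that never email about tax documents.
    """
    notified = {
        m.sender_domain.lower()
        for m in messages
        if TAX_SUBJECT.search(m.subject)
    }
    checklist = {domain: "email found" for domain in notified}
    for domain in known_banks:
        # Only add banks that the inbox search did not already surface.
        checklist.setdefault(domain.lower(), "no email, check portal")
    return dict(sorted(checklist.items()))


inbox = [
    Message("examplebank.com", "Your 1099-INT tax document is ready"),
    Message("examplebank.com", "Weekly newsletter"),
    Message("shop.example", "Your order has shipped"),
]
checklist = build_checklist(inbox, known_banks=["othercredit.example"])
for domain, status in checklist.items():
    print(f"{domain}: {status}")
```

Banks that emailed a notification and banks the user had to name by hand end up in the same checklist, which is the "search the inbox and accept URLs" split from the prompt.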
Cool. That's our startup idea. It's just something to help people with their taxes, and it's something that I had to deal with just yesterday.
So, the user wants to do office hours about a startup idea, and it's starting on office hours.
So, one of the things you'll notice is we have Garry mode on, and what that does is it actually shows you all of the reasoning traces. So, that's one of the things I really like about using GStack.
With Conductor you actually get to see exactly what the model is thinking as it does it. Right now it's just sort of getting started, but now it's starting to do some context.
"Okay, this is a fresh project with an initial commit. No prior design docs. We're in startup mode."
Oh, and here, this is sort of the first thing that happens. You know, your model with the skill of office hours does a lot of thinking. It searches the web sometimes and figures out, well, what are you trying to do?
"Here's the question that determines everything else. What's the strongest evidence that you have that someone actually wants this?"
This is actually one of the most important questions to ask yourself when you're trying to decide, should I work on project X or work on that startup? I actually just experienced this recently.
So, I'm going to click that. I had this experience just the other day.
So, what happened exactly? How many bank accounts do we have and which ones sent a 1099? Well, I have a few, and then the process of hunting them down is actually quite painful.
"Okay, the pain is real, but the consequence is friction and annoyance, not actual penalties."
Well, I mean, frankly, my accountant sends me annoyed emails, and sometimes I get a phone call because they're just trying to do their job, and I'm supposed to be there to help them.
And so, I do have more than five accounts. This is actually a really interesting thing.
It points out, "TurboTax and H&R Block actually have 1099 import features, and Plaid connects to banks. Why aren't those solving it for you?"
So, I answered and it said, "Well, the user's thinking bigger than a document aggregator. They're seeing this as a funnel. The hook is: we'll find all your 1099-INTs for you, solving an immediate pain.
But the expansion is: now that you have your docs, let's actually get your taxes prepared, which is matchmaking and lead gen for tax preparers. And it's a classic wedge strategy.
It's actually a more interesting business model than just the document aggregation alone," which is right.
You could probably charge $2 or $5 a month, or $2 or $5 maybe for an entire year, for 1099-INT aggregation, but you can actually charge a percentage of the transaction that ends up happening with a tax preparer, which might be 10x more.
This is actually a very cool part of office hours. I might just build this for myself just for fun, because I do have to do this every single year.
And this is actually one of the most fun parts about using office hours and GStack: it's not this sort of on-rails thing. It's more like a conversation with your model.
And so if you just came in and gave instructions to the model as it is, if I just typed the original thing, which is "I need to go and find my 1099s," it'll go do that.
It will just literally do it, but it won't think about who's the user, what is this, what is the business model, who wants this, what's the pain point, how does it work, right?
Like, this is the kind of stuff that we get to do every day with lots and lots of founders in office hours, and we're pretty good at it, but so is this skill.
"Revised: AI browser automation, GStack browser style. Lets the user log in, then AI navigates to tax docs and downloads PDFs. The user watches it happen."
I think that's correct. Let's see. I mean, this is a very interesting thing for me. I feel like I learn about the process of building a little bit, even though I do office hours with people all the time.
"This is sharp pushback on premise 3. You're right. I do have GStack browser, and the approach you're describing is genuinely clever.
The model would be: user logs in, AI takes over, navigates to tax docs, finds the 1099-INTs, downloads them. No Plaid, no stored credentials. The user watches the whole thing happen in the visible browser."
I mean, what I like about it is it can happen on their actual browser.
It's not happening in the cloud. The cloud is just someone else's computer.
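The handoff described here (the user logs in, then the agent takes over in a visible local browser) might look roughly like the sketch below. It uses Playwright's Python sync API, which is a real library, but the portal URL, the link-text pattern, and both helper names are assumptions made for illustration. This is not GStack's browser code.

```python
import re
from pathlib import Path

# Assumed pattern for link text that points at a 1099-INT document.
TAX_LINK = re.compile(r"1099[-\s]?INT", re.IGNORECASE)


def pick_tax_links(link_texts):
    """Return only the link texts that look like 1099-INT documents."""
    return [text for text in link_texts if TAX_LINK.search(text)]


def download_tax_docs(portal_url, out_dir):
    """Let the user log in by hand, then download matching documents.

    Requires `pip install playwright` and `playwright install chromium`.
    portal_url is a hypothetical tax-documents page for one bank.
    """
    from playwright.sync_api import sync_playwright  # imported lazily

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with sync_playwright() as p:
        # headless=False keeps the browser visible so the user can watch.
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(portal_url)
        # No credentials are stored: the user types them into the page.
        input("Log in in the browser window, then press Enter here... ")
        texts = page.get_by_role("link").all_inner_texts()
        for text in pick_tax_links(texts):
            # expect_download waits for the download the click triggers.
            with page.expect_download() as download_info:
                page.get_by_role("link", name=text).first.click()
            download = download_info.value
            target = out / download.suggested_filename
            download.save_as(target)
            saved.append(target)
        browser.close()
    return saved


# The filtering step runs without a browser, so it can be tried on its own.
print(pick_tax_links(["Statements", "2024 Form 1099-INT", "1099 int (PDF)", "Help"]))
```

Because the browser runs on the user's own machine and the login is typed by hand, the script never sees or stores a password, which is the property the transcript calls out.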
One of the cool things that GStack does increasingly is it lets you use Codex to actually sort out all of the crazy bugs that might be in here.
I'm not going to do it right now, but it is something that you can do when you're at home working on it yourself. The way to think about Claude Code is that by default it uses Claude.
And I think Opus 4.6 is sort of ADHD CEO. He's the guy you want to get a beer with and he's got a billion ideas, but when the going gets tough, you got to call in your autistic CTO, and that's Codex.
All right, we're going to skip for now because we're actually pretty close. I feel like basically we're in plan mode, and office hours helps us start off with a plan that has a lot of the things thought through.
So, here's actually a really cool example. It actually thinks through, and here's three different approaches.
The first approach is Gmail OAuth, then search for tax doc notifications, then output a checklist of banks which issue 1099s. There's no browser automation initially. The effort is small and the risk is small.
You know, when I look at that, it's like, that sounds interesting, but it doesn't sound big enough for me to actually even work on this. Like, I could do that myself.
Next is full stack: Gmail and AI browser automation, and a CPA marketplace. This sounds like what I want, actually.
And then it sort of thinks out of the box. It says, "Oh, okay. What about approach C? CPA first, flip the go-to-market." You know, I would say B sounds right.
And then actually, sometimes I like to add this extra thing, which is when one of the approaches speaks to me, but then I think about something else. I'm like,
"Okay, well, I like B, but actually we could use the browser interaction
to skip Google OAuth entirely and just have the user open Gmail, and a version
of GStack browser could just use Gmail to find the 1099s to search for automatically."
Simultaneous to that, it could also ask the user what other banks they have. Also, and this is what happens for me:
if they already have a CPA, you can find out from the email. And if you're me, you probably already have a bunch of
emails from your CPA bugging you for the specific accounts.
We're sort of at the end of office hours, but as you can see, we already went from sort of a half-baked rough idea for something that we might want to do.
I'm not saying this is actually a good startup idea, but you can see how this got farther along. We started with something that might start with OAuth and
then the CPA's nagging emails, but in the end we realized, well, we have a browser, and the browser could be used with browser automation to search the inbox and
find all of the 1099s that you need to download. It can also, using an LLM, ask you which bank portals you need to add, and it can go log in with your account and actually download the PDFs for you
and then send an email to the CPA. So I really like this. Browser automation is a very out-of-pocket, sort of unusual way to solve this problem.
And the wild thing about coding models is, you know, a year ago, two years ago, even like 3 months ago, it's not clear to me that anyone would even try this. I think that's the most interesting thing about our time right now.
You're able to have an idea and then get farther along with it than you ever would be. Frankly, sometimes I use office hours and maybe one in three times I get to the end of it and I say,
"You know what? This isn't something that makes sense." You'll notice that there's actually a feasibility aspect of office hours, and that's one thing I really pride myself on in office hours working with startups.
I have a very strong opinion about how the world works and what might work, and it's just very interesting to see Opus 4.6 mirror that in trying to help you figure out
what your startup or product idea might be. Now, what it's doing is a multi-step adversarial review. It's trying to put your idea through the paces.
And as you can see, it's already found a bunch of things and it's going to try to autofix them. There's no failure handling. There's no privacy section. 2FA handoff has no proposed solution.
It actually tries to auto-fill these things, and if it can, it does. And so our doc survived two rounds of adversarial review, and it automatically caught and fixed 16 issues. So we're going to approve this design doc.
So, as you can see, the adversarial review improved the score from 6 out of 10 to 8 out of 10, with three remaining issues that we can worry about later.
Now that we've locked in the adversarial review and addressed all these issues, normally what I would do is run plan CEO review, but instead I
think what we're going to do is jump directly to design shotgun, which is one of my most fun ways to use this. And this is just one of a bunch of different design tools that are in the bag.
So, it figured out, here's a bunch of different views. What do you want to actually design? And let's just do the main checklist dashboard.
Design shotgun is my visual brainstorming tool. So, it'll actually generate multiple AI versions and then ask us questions about it.
These are three directions. It takes about 60 seconds. It actually farms it out to OpenAI Codex, which is able to use image gen.
So all right, there's three versions: command center, friendly progress, and split view. Let's take a look.
All right, so let's let the agents cook and we'll be back in about 5 minutes. Great. The agents are done cooking. And this is what we got back.
We got three different options for the actual page that shows up in the command center for tracking down our tax documents. So let's look at them one by one. There's option A, B, and C.
All right, here's one: command center. There's a dashboard. Here's all the specifics. I mean, this looks pretty good.
If you can extract, here are all the banks and here are all the 1099s and where are they coming from, and what their status is, that's pretty good. I like that. I'm going to give that four out of five stars.
Option B is much more friendly. So option A is sort of like, if you're a Linux hacker, I bet you would really like this. But option B, I think it's more friendly for just normal people.
So I kind of like it. I might put that as a five. That might be a pick. And then let's see, option C.
This makes it way more complicated than it needs to be. So I really wouldn't do that. Let's go with option B.
And then the cool thing is, if you don't like it, you can enter any of your feedback. You can click regenerate. But in this case, we're just going to run with option B and continue.
So that comes back in. And you know, as you can see, we're going to go ahead and select option B. And there it is.
"So a friendly card-based approach with progress and the progress ring.
Good instinct. Variant B is locked in."
So while I have you, I mean, that is just two of 28 different commands. We've got more than 70,000 stars now.
And some of the people who use it, they actually talk about how when they're using Claude Code they spend 80 to 90% of their time in office hours, plan CEO review, and auto plan.
This is sort of a rough view of how that sprint process actually works. We already talked about office hours, but if you don't want to do a lot of back and forth, if you don't want to be in the weeds, I did create auto plan, which gets you through CEO, engineering,
design, and developer experience review using basically my default recommendations. Like, these are sort of programmed to be what I would do if I were you.
There are a bunch of design skills that you can use after the code is actually done. Claude Code will actually build when you click approve on the plan, and then after it's done
writing the code you can run review, which is a staff-level bug-catching service that puts the work through the paces: full code review, finding bugs that might not have been in the plan mode.
And then the coolest part, I think, which is actually an incredible amount of code, is I wrote a CLI around Playwright and Chromium. So there's actually an entire headed and headless browser in there.
And that was a real magic moment for me as I was using Claude Code, as I sped up.
There's this idea of trying to get to a level 8 software factory, and GStack does not get you to level 8, but I do think it gets you to level seven.
And that's where I can run multiple Conductor windows on different projects, and sometimes three or four all on the same project, all at the same time.
These are parallel PRs with parallel branches and parallel different features that all can land more or less simultaneously.
And one of the bottlenecks I ran into was that, you know, once the agent was doing all the work of planning and design and coding it, I found myself sitting there
doing QA, probably the least fun part of software development. So that made it very, very important for me to try to automate that.
And when I did, Claude in Chrome MCP is one of the worst pieces of software I've ever used. You know, every time it would try to do an action, it would think and think and think.
There was crazy context bloat. Often it wouldn't even do anything, but it would take two to three seconds even when it was working to be able to take an action.
And I was amazed that I could use all of my other skills in GStack to create the /qa and /browse tools. I basically wrapped Playwright at the CLI level.
And now your Claude Code and any agent can actually just use the browser. And so, you know, not only can it use the browser, it can take screenshots.
It can do complex interactions. It can click on things. It can fill things out.
Now it can even download media, run eventually full regression tests, and update CSS and assess real browser bug issues, whether it's JavaScript or CSS.
And finally, there's a ship tool. So, it's sort of the last step, to make sure that your PR is ready to land on main. And
this is actually how I work. I run 10 to 15 parallel Claude Code sessions all at the same time. I might in one session be running office hours on a brand new idea.
And I actually now have multiple open-source projects with tens of thousands of stars. And I'm probably sitting on about 400 PRs to review right now.
And so I almost always have one or two sessions active for each project just evaluating and bringing in all the open-source fixes that I'm getting from the community. And I evaluate them in waves.
One of the things that's been really scary in AI coding right now is supply chain attacks. So I'm really, really paranoid about it. But the great thing is I have GStack that has my back.
So I don't have a to-do list anymore. One of the things that has emerged is, whenever I have an idea or I get a bug
report from a user or I see something on X where someone's frustrated with what GStack or GBrain does, I just click the plus icon in Conductor. It creates a new worktree, and each one of these things
is a new work item. And all I have to do is run office hours, CEO review, eng review, adversarial review, and then I just run my normal process.
When it's ready to land, it lands, and I can do 10, 15, 20, sometimes 50 PRs in any given day, depending on the number of meetings I have in that day.
So that's it. GStack is available right now. Just go
to github.com/garrytan/gstack.
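The one-worktree-per-idea habit described above rests on a plain Git feature, `git worktree add`. As a rough sketch, and assuming nothing about how Conductor implements it internally, the commands could be generated like this. The `slugify` and `worktree_command` helpers and the `../worktrees` layout are hypothetical.

```python
import re
import shlex


def slugify(title):
    """Turn a free-form idea title into a branch-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def worktree_command(title, base_dir="../worktrees", base_branch="main"):
    """Build the git command that creates a new branch in its own worktree.

    Equivalent shell: git worktree add -b <branch> <path> <base_branch>
    """
    branch = slugify(title)
    path = f"{base_dir}/{branch}"
    return ["git", "worktree", "add", "-b", branch, path, base_branch]


ideas = ["Fix 2FA handoff", "Add privacy section"]
for idea in ideas:
    # Print instead of executing, so the sketch is safe to run anywhere.
    print(shlex.join(worktree_command(idea)))
```

Each worktree is a separate checkout on its own branch, which is what lets several agent sessions edit the same repository at once without stepping on each other.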
When you run /office-hours, you're getting a version of the real product thinking we do at YC with founders: similar pushback and similar reframing before you ever meet us.
Give it a try and let me know what you think. This is the most incredible time in history to build software.
The barrier to building just collapsed. The only question left is, what are you going to build?
It's time to let it rip. Go make something people want.