LongCut logo

How to Make Claude Code Your AI Engineering Team

By Y Combinator

Summary

Topics Covered

  • I coded more in two months than all of 2013
  • The bottleneck is scaffolding, not model intelligence
  • Office hours is a conversation, not rails
  • Browser in the cloud is just someone else's computer
  • The barrier to building just collapsed

Full Transcript

Hi, I'm Gary, president and CEO of Y Combinator. I'm also an engineer who

Combinator. I'm also an engineer who spent the first decade of my career building software full-time. I studied

computer systems engineering at Stanford, then was employee number 10 at Palunteer, where I was an engineer, designer, and product manager all at

once. I co-founded Posterous, a micro

once. I co-founded Posterous, a micro blogging platform that sold to Twitter, and I also built the first version of Bookface, YC's internal social platform

and knowledge base. Basically, I've

written a lot of code in my career, and I'm here to tell you we are in a completely new era of building software, the agent era. It turns out the way to

get agents to do real work is the same way humans have always done it as a team with roles with process with review. I

built GStack to encode this 3 weeks ago and now it has more GitHub stars than Ruby on Rails. In this video I want to explain how it can help you build with

agents. I've coded more in the past two

agents. I've coded more in the past two months than I did in all of 2013, which is the last time I worked really, really hard as an engineer. I started playing

with Claude Code back in January after hearing people like Andre Karpathy and Boris Churnney say they weren't manually writing any code anymore. And I got

completely hooked. Along the way, I've

completely hooked. Along the way, I've essentially built all of Posterus, which took 2 years to build with a co-founder and a team of 10 engineers. I've

essentially built all of my startup Posterus, which took 2 years, $10 million, and 10 engineers to build. Out

of the box, the model wanders. It

doesn't know your data well. So, it

guesses. And guessing at that scale is how you get plausible looking code that silently breaks. The bottleneck here is

silently breaks. The bottleneck here is not the model's intelligence. As long as you set the models upright, they are already smart enough to do extraordinary

work on your codebase. This is

backwards. The scaffolding should be trivially thin. GStack is my

trivially thin. GStack is my implementation of the thin harness fat skills approach. It's an open-source

skills approach. It's an open-source repo that I built that turns clawed code into an AI engineering team for you.

skills that act like a team of specialists. Office hours is one of

specialists. Office hours is one of those skills. It's actually modeled

those skills. It's actually modeled exactly after what we go through at YC as a partner doing office hours with startups. It starts by asking six

startups. It starts by asking six forcing questions for you to reframe your product before you start building.

Let me show you how it works. The best

way to get started with GStack is uh actually conductor. And so we're going

actually conductor. And so we're going to go in quick start. And GStack is actually built into Conductor right now.

You just click GStack. And today we're going to make a tax app. Uh it's going to go into your Gmail and fish out all of your 1099s cuz it's tax day as of

today. Gstack is actually a set of

today. Gstack is actually a set of skills. And the first one that we're

skills. And the first one that we're actually going to use is called office hours. This is actually the distilled

hours. This is actually the distilled version of what is thousands and tens of thousands of hours that the 16 YC

partners have spent many many years honing and perfecting. And uh this is a distilled down 10% strength version of

what we do at YC every day. So as you can see uh conductor actually just drops you right in there. We're in YC office hours now and I'm trying to do a startup

to create to help people get all their 1099 ins out of

their Gmail and financial institutions.

Many banks will email you with uh new tax documents, but some won't. So, we

need to both search the user's inbox and accept URLs to go and search and

download the 1099 int PDFs.

Cool. That's our startup idea. It's just

uh something to help people with their taxes and it's something that I had to deal with just yesterday. So, the user wants to do office hours about a startup idea. um and it's starting on office

idea. um and it's starting on office hours. So, one of the things you'll

hours. So, one of the things you'll notice is um we have Gary mode on and uh what that does is it actually shows you all of the reasoning traces. So, that's

one of the things I really like about um using GStack. It actually with conductor

using GStack. It actually with conductor you actually get to see exactly what um the model is thinking as it does it. Uh

right now it's just sort of getting started but now it's starting to do some context. Okay, this is a fresh project

context. Okay, this is a fresh project with an initial commit. No prior design docs. We're in startup mode. Oh, and

docs. We're in startup mode. Oh, and

here this is uh this is sort of the first thing that happens. You know, your model with the skill uh of office hours does a lot of thinking. It searches the web sometimes and figures out well what

what are you trying to do? Here's the

question that determines everything else. What's the strongest evidence that

else. What's the strongest evidence that you have that someone actually wants this? This is actually one of the most

this? This is actually one of the most important questions to ask yourself when you're trying to decide, should I work on project X or work on that startup? I

actually just experienced this recently.

So, I'm going to click that. I have this experience uh just the other day.

So, what happened exactly? How many bank accounts do we have and which one sent uh 1099? Well, I have a few and then the

uh 1099? Well, I have a few and then the process of hunting down is actually quite painful. Okay, the pain is real,

quite painful. Okay, the pain is real, but the consequence is friction and annoyance, not actual penalties. Well, I

mean, frankly, my accountant uh sends me annoyed emails, and sometimes they get a phone call because they're just trying to do my their job, and uh I'm supposed to be there to help them.

And so, I do have more than five accounts. This is actually a really

accounts. This is actually a really interesting thing. It points out,

interesting thing. It points out, Turboax and HR Block actually have 1099 import features, and Plaid connects to banks. Why aren't those solving it for

banks. Why aren't those solving it for you? So, I answered and it said, "Well,

you? So, I answered and it said, "Well, the user's thinking bigger than a document act creator. They're seeing

this as a funnel. The hook is we'll find all your 1099 ins for you, solving an immediate pain. But the expansion is now

immediate pain. But the expansion is now that you have your docs, let's actually get your taxes prepared, which is matchmaking and lead genen for tax preparers. And it's a classic wedge

preparers. And it's a classic wedge strategy. It's actually a more

strategy. It's actually a more interesting business model than just the document aggregation alone, which is right. You could probably charge $2 or

right. You could probably charge $2 or $5 a month or $2 or $5 maybe an entire year for $1099 in aggregation, but you can actually charge um a percentage of

the transaction that ends up happening with a tax preparer, which uh might be 10x more. This is actually a very cool

10x more. This is actually a very cool part of office hours. I might just build this for myself just for fun because I do have to do this every single year.

And this is actually one of the most fun parts about using office hours and GStack is it's not this sort of on rails thing. It's more like a conversation

thing. It's more like a conversation with your model. And so if you just came in and gave instructions to the model as it is, if I just typed the original

thing, which is I need to go and find my 1099s, it'll go do that. It will just literally do it, but it won't think about who's the user, what is this, what is the business model, who wants this,

what's the pain point, how does it work, right? Like this is the kind of stuff

right? Like this is the kind of stuff that we get to do every day with lots and lots of founders in office hours and we're pretty good at it, but so is this skill. Revised AI browser automation

skill. Revised AI browser automation GStack browser style lets the user log in then AI navigates to tax docs and downloads PDFs. The user watches it

downloads PDFs. The user watches it happen. I think that's correct. Let's

happen. I think that's correct. Let's

see. I mean this is a very interesting thing for me. I mean I feel like I learn about the process of building a little bit even though I do office hours with people all the time. This is sharp push back on premise 3. You're right. I do

have GStack browser and the approach you're describing is genuinely clever.

The model would be user logs in AI takes over, navigates to tax docs, finds the 1099 ins, downloads it. No plaid, no stored credentials. The user watches the

stored credentials. The user watches the whole thing happen in the visible browser. I mean, what I like about it is

browser. I mean, what I like about it is it can happen on their actual browser.

It's not happening in the cloud. The

cloud is just someone else's computer.

One of the cool things that GStack does increasingly is it lets you use codecs to actually sort out all of the crazy bugs that might be in here. I'm not

going to do it right now, but it is something that you can do when you're at home working on it yourself. The way to think about Claude code is that by default it uses Claude. And I think Opus

4.6 is sort of ADHD CEO. He's the guy you want to get a beer with and he's got a billion ideas, but when the going gets tough, you got to call in your autistic

CTO and that's Codex. All right, we're going to skip for now because we're actually pretty close. I feel like basically we're in plan mode and office hours helps us start off with a plan

that has a lot of the things thought through. So, here's actually a really

through. So, here's actually a really cool uh example. It actually thinks through and here's three different approaches. The first approach is Gmail

approaches. The first approach is Gmail off then search for tax doc not notification then output a checklist of banks which issue 1099s. There's no

browser automation initially. The effort

is small and the risk is small. You know

when I look at that it's like that sounds interesting but it doesn't sound big enough for me to actually even work on this. Like I could do that myself.

on this. Like I could do that myself.

Next is full stack Gmail and AI browser automation using uh and a CPA marketplace. This sounds like what I

marketplace. This sounds like what I want actually.

And then uh it sort of thinks out of the box. It says, "Oh, okay. What about

box. It says, "Oh, okay. What about

approach C? CPA first flip the go to market. You know, I would say B sounds

market. You know, I would say B sounds right." And then actually I sometimes I

right." And then actually I sometimes I like to add this extra thing which like when I have an idea when I one of the approaches speaks to me, but then I think about something else. I'm like,

"Okay, well, I like B, but actually we could use the browser interaction

to skip Google OOTH entirely and just have the user open Gmail and a version

of GStack browser could just use Gmail to find the 1099s to search for automatically."

Simultaneous to that, it could also ask the user what other banks they have. Also, and this is

what happens for me.

If they already have a CPA, you can find out from the email. And if you're me, you probably already have a bunch of

emails from your CPA bugging you for the specific accounts.

We're sort of at the end of office hours, but as you can see, we already went from sort of a halfbaked rough idea for something that we might want to do.

I'm not saying this is actually a good startup idea, but you can see how this got farther along. We started with something that might start with OOTH and

then CPA's nagging emails, but in the end we realized, well, we have a browser and the browser could be used with browser automation to search the inbox,

find all of the 1099s that you need to download. It can also using LLM ask you

download. It can also using LLM ask you which bank portals you need to add to and it can go log in with your account and actually download the PDFs for you

and then send an email to the CPA. So I

really like this browser automation is a very outofpocket sort of unusual way to solve this problem. And the wild thing about coding models is, you know, a year

ago, two years ago, even like 3 months ago, it's not clear to me that anyone would even try this. I think that's the most interesting thing about uh our time

right now. You're able to have an idea

right now. You're able to have an idea and then get farther along with it than you ever would be. Frankly, sometimes I use office hours and maybe one in three times I get to the end of it and I say,

"You know what? This isn't something that makes sense." You'll notice that there's actually a feasibility aspect of office hours and that's one thing I really pride myself on in office hours working with startups. I have a very

strong opinion about how the world works and what might work and um it's just very interesting to see Opus 4.6 mirror that in trying to help you figure out

what your startup or product idea might be. Now, what it's doing is a multi-step

be. Now, what it's doing is a multi-step adversarial review. It's trying to put

adversarial review. It's trying to put your idea through the paces. And as you can see, it's already found a bunch of things and it's going to try to autofix it. There's no failure handling. There's

it. There's no failure handling. There's

no privacy section. 2FA handoff has no proposed solution. It actually tries to

proposed solution. It actually tries to auto fill out these things. And it if it can, it does. And so our doc survived two rounds of adversarial review. And it

automatically caught and fixed 16 issues. Um so we're going to approve

issues. Um so we're going to approve this design doc. So, as you can see, the adversarial review improved the score from 6 out of 10 to 8 out of 10 with three remaining issues that we can worry

about later. Now that we've locked in

about later. Now that we've locked in the adversarial review and addressed all these issues, uh, normally what I would do is run plan CEO review, but instead I

think what we're going to do is jump directly to design shotgun, which is one of my most fun uh, ways to use this. And

this is just one of a bunch of different design tools that are in the bag. So, it

figured out here's a bunch of different views. What do you want to actually

views. What do you want to actually design? And let's just do the main

design? And let's just do the main checklist dashboard. Design checks my

checklist dashboard. Design checks my visual brainstorming tool. So, it'll

actually generate multiple AI versions and then ask us questions about it.

These are three directions. It takes

about 60 seconds. it actually farms it out to uh OpenAI codecs which um is able to use image gen. So all right let's there's three versions command center friendly progress and split view. Let's

take a look. All right so let's let the agents cook and we'll be back in about 5 minutes. Great. The agents are done

minutes. Great. The agents are done cooking. And this is what we we got

cooking. And this is what we we got back. We got three different options for

back. We got three different options for the actual page that shows up in the command center for tracking down our tax documents. So let's look at them one by

documents. So let's look at them one by one. There's option A, B, and C. All

one. There's option A, B, and C. All

right, here's one command center.

There's a dashboard. Here's all the specific. I mean, this looks pretty

specific. I mean, this looks pretty good. If you can extract here are all

good. If you can extract here are all the banks and here are all the 1099s and where are they coming from? Um, and what their status is, that's pretty good. I

like that. I'm I'm going to give that a four out of five stars. Option B is like much more friendly. Um, so option A is sort of like if you're a Linux hacker, I bet you would really like this. But

option B, I think it's more friendly for just normal people. So I kind of like I I I might put that as a five. That might

be a pick. And then let's see. Option C.

This makes it way more complicated than it needs to. So I really wouldn't do that. Let's go with option B.

that. Let's go with option B.

Uh, and then the cool thing is if you don't like it, you can enter uh, you know, any of your feedback. You can

click regenerate. But in this case, we're just going to run with option B and continue. So that comes back in. And

and continue. So that comes back in. And

you know, as you can see, we're going to go ahead and select option B. And there

it is. So a friendly card-based approach with progress and the progress ring.

Good instinct. Variant B is locked in.

So while I have you, I mean, that is just two of 28 different commands. We've

got more than 70,000 uh stars now. And

some of the people who use it like they they actually talk about how when they're using cloud code they spend 80 to 90% of their time in office hours

plan CEO review and auto plan. Um this

is sort of a rough view of how that sprint process actually works. We

already talked about office space, but if you don't want to do a lot of back and forth, if you don't want to be in the weeds, I did create auto plan, which gets you through CEO, engineering,

design, and developer experience review using basically my default recommendations. Like these are sort of

recommendations. Like these are sort of programmed to be what I would do if I were you. There are a bunch of design

were you. There are a bunch of design skills that you can use after the code is actually done. Cloud code will actually build when you click approve on the plan and then after it's done

writing the code you can run review which does a staff level uh bug catching service that goes through puts the work

through the paces full code review uh finding bugs that might not have been in the plan mode and then the coolest part I think that um is actually an incredible amount of code is I wrote a

CLI around playrite and chromium So there's actually an entire headed and headless browser in there. And that was a real magic moment for me as I was

using cloud code as I sped up. Um

there's this idea of trying to get a to a level 8 software factory and GSAC does not get you to level 8, but I do think it gets you to level seven. And that's

where I can run multiple conductor windows on different projects and sometimes three or four all on the same project all at the same time. These are

parallel PRs with parallel branches and parallel different features that all can land more or less simultaneously. And

one of the bottlenecks I ran into was that, you know, once the agent was doing all the work of planning and design and coding it, I found myself sitting there

doing QA, probably the least fun part of software development. So that made it

software development. So that made it very, very important for me to try to automate that. And when I did, Claude in

automate that. And when I did, Claude in Chrome MCP is one of the worst pieces of software I've ever used. You know, every time it would try to do an action, it would think and think and think. There

was crazy context bloat. Often it

wouldn't even do anything, but it would take two to three seconds even when it was working to be able to take an action. And I was amazed that I could

action. And I was amazed that I could use all of my other skills in GStack to create the SLQA and SL browse tool. I

basically wrapped Playright at the CLI level. And now your cloud code and any

level. And now your cloud code and any agent now can actually just use the browser. And so you know, not only could

browser. And so you know, not only could it use the browser, it can take screenshots. It can do complex

screenshots. It can do complex interactions. It can click on things. It

interactions. It can click on things. It

can fill things out. Now it can even download media, run eventually full regression tests and update CSS and

assess real browser bug issues, whether it's JavaScript or CSS. And finally,

there's a ship tool. So, it's sort of the last step before to make sure that your PR is ready to land on main. And

this is actually how I work. I run 10 to 15 parallel cla code sessions all at the same time. I might in one session be

same time. I might in one session be running office hours on a brand new idea. And I actually now have multiple

idea. And I actually now have multiple open- source projects with tens of thousands of stars. And I'm probably sitting on about 400 uh PRs to review

right now. And so I almost always have

right now. And so I almost always have one or two sessions active for each project just evaluating and bringing in all the open- source fixes that I'm getting from the community. Uh and I

evaluate it in waves. Um one of the things that's been really scary in AI coding right now is supply chain attacks. So I'm really really paranoid

attacks. So I'm really really paranoid about it. But the great thing is I have

about it. But the great thing is I have GStack that has my back. So I don't have a to-do list anymore. One of the things that has emerged is I actually click on whenever I have an idea or I get a bug

report from a user or I see something on X where someone's frustrated with what GStack or GBrain does, I just click the plus icon in Conductor. It creates a new work tree and each one of these things

is a new work item. And all I have to do is run office hours, CEO review, end review, uh adversarial review, and then I just run my normal process. when it's

ready to land, it lands and I can do 10, 15, 20, sometimes 50 PRs in any given day, depending on the number of meetings I have in that day. So that's it. Uh,

GStack is available right now. Just go

to github.com/gritan/GStack.

When you run/off hours, you're getting a version of the real product thinking we do at YC with founders. similar push

back and similar reframing before you ever meet us. Give it a try and let me know what you think. This is the most incredible time in history to build

software. The barrier to building just

software. The barrier to building just collapsed. The only question left is

collapsed. The only question left is what are you going to build? It's time

to let it rip. Go make something people want.

Loading...

Loading video analysis...