How to Build a Self-Improving Company with AI

By Y Combinator

Summary

Topics Covered

AI Makes the Traditional Hierarchy Obsolete
Build a Self-Improving Company Loop
Burn Tokens, Not Headcount
Record Everything to Make Your Company Legible
Software Is Ephemeral, Context Is King

Full Transcript

This is based a little bit on a talk Diana gave. There's a video that went up over the weekend, which is super cool. Jack Dorsey was tweeting some stuff two or three weeks ago that I thought was super cool, and I've kind of stolen a bunch of those ideas and shoved them in here. This talk is pretty conceptual and high level, about how to think about building companies.

The Roman legions were designed to project power over two continents or something, from Rome at the center out to the people on Hadrian's Wall up in Scotland. The idea was nested hierarchies with consistent spans of control: you had named individuals with spans of control to pass orders down and send information back up the hierarchy. If you think about most companies today, they are organized like a Roman legion, where human beings are the conduit for information flowing up and down. Jack Dorsey's tweet, which I thought was great, was about this underlying assumption that hierarchically organized companies are the way we should be organizing our economic units of value. And I think AI basically breaks that.

If you talked to people a year ago about how AI was useful, they talked about productivity: copilots, making engineers 20% more productive, adding copilots to workflows, shipping more software. But I think that is actually a broken way of thinking about AI. Pete had a great blog post on this: we're basically just taking the old way of working and adding a more powerful engine onto it. Instead of that, I think you can reimagine what a company is and how it acts. As Garry was saying, he, I genuinely believe, can produce more code than an entire engineering team.

The thing that has really stuck with me is this idea of extracting the domain knowledge from your company and defining it as context, or a set of skills, or whatever you want to call it.

It's this idea that there's domain knowledge, or business knowledge, or some know-how that lives inside people's heads, in Slack messages, in emails, and in Notion. All of this information together defines how your company works. And if you can make that legible, you can suddenly move from this hierarchical organization to a sort of intelligent, AI-powered organization with AI-native software. AI is not something you bolt onto the side of a company. It's not a tool you give to your engineers to make them more productive. I think you can reimagine what a company is as a set of recursive, self-improving AI loops. I think this is really, really important, because when it gets there, the company starts to self-improve even when you're sleeping.

So let me give you an example. Diana talks about this AI loop as well. You start with a sensor layer, which is a fancy word, but really it might be emails from your customers.

It might be support tickets, code changes, people canceling their subscriptions, product telemetry. It's sensor data to get information from the outside world. Then a policy layer, or decision layer: rules about what it can do, what it has to ask a human's permission for, what it must log. A tool layer: that's kind of Garry's skills and code. It's basically deterministic APIs, things like "query my database" or "look at my calendar", a set of tools that the AI can call. A quality gate: that might be evals, heuristic checks, safety filters, human review for high-risk stuff. And then a learning mechanism. Your system interacts with the real world, picks up where it doesn't work, and loops back into the top again.
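As a rough illustration of the loop just described, here is a minimal Python sketch. Every name in it (`Event`, `Loop`, `handle`, `learn`, `non_empty`) is a hypothetical stand-in, not anything from YC's codebase, and the quality gate is reduced to a simple non-empty check where a real system would presumably use evals and safety filters.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Event:
    """Sensor layer: one signal from the outside world."""
    kind: str  # e.g. "support_ticket", "cancellation"
    text: str


def non_empty(output: str) -> bool:
    """Placeholder quality gate: accept any non-blank output."""
    return bool(output.strip())


@dataclass
class Loop:
    tools: dict[str, Callable[[Event], str]]              # tool layer
    needs_human: set[str] = field(default_factory=set)    # policy layer
    quality_gate: Callable[[str], bool] = non_empty       # quality gate
    audit_log: list[str] = field(default_factory=list)    # what must be logged
    failures: list[Event] = field(default_factory=list)   # input to learning

    def handle(self, event: Event) -> str:
        self.audit_log.append(f"seen:{event.kind}")
        # Policy: some kinds of event always go to a human.
        if event.kind in self.needs_human:
            return "escalated"
        tool = self.tools.get(event.kind)
        output = tool(event) if tool else ""
        # Quality gate: anything that fails is kept for the learning step.
        if not self.quality_gate(output):
            self.failures.append(event)
            return "failed"
        return output

    def learn(self) -> list[str]:
        """Learning mechanism: report which event kinds still lack a working tool."""
        return sorted({e.kind for e in self.failures})
```

The point of the sketch is the shape: failures are not dropped but collected, so that the output of `learn` can feed back into the top of the loop as new tools or updated rules.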

And if you can run every single step of that with minimal human intervention, your system gets better and better while you're sleeping. I can give you actual examples of this that are live right now.

We started with an agent that you can ask questions, and it has deterministic tools to query our database. Pretty simple, like, "When did I last have office hours with this company?" Then it got a little bit smarter: the company I'm doing office hours with right now needs introductions to anyone in petrochemicals or something, and it could query the database in different ways and use RAG and all sorts of stuff to come up with five relevant founders for you to meet. But again, this is a sidekick, right? This is an agent. This is last year's version of how AI is making me better as a group partner. It's making me 20 or 30% more effective.

The aha moment for me came when we put a monitoring agent on top of that, which looked at every single query every single YC employee was making and saw when it worked and when it did not. And when it did not work, it asks: why not? What would have made this query work? Do we need different deterministic tools? Do we need to update the skills file? Do we need a different database view? Do we need a new index? And this literally happens overnight now: write the code, put in a merge request to the YC codebase, have an agent review it, merge it, and deploy it. So when a human comes the next day to ask the same query, it will now succeed. For me, that was like, holy [ __ ], right?
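To make the monitoring step concrete, here is a small Python sketch of just the nightly triage: count why queries failed and map each reason to a candidate fix. The record fields, the failure-reason labels, and the `PLAYBOOK` mapping are all assumptions made for illustration; writing the code, agent review, merge, and deploy are left out.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    question: str
    succeeded: bool
    failure_reason: str = ""  # hypothetical label assigned by the monitoring agent


# Hypothetical mapping from failure reason to the kind of fix to propose.
PLAYBOOK = {
    "missing_tool": "add a deterministic tool",
    "stale_skills": "update the skills file",
    "missing_view": "add a database view",
    "missing_index": "add a database index",
}


def nightly_review(records: list[QueryRecord]) -> list[tuple[str, int]]:
    """Return (proposed fix, number of failed queries), most frequent first."""
    reasons = Counter(r.failure_reason for r in records if not r.succeeded)
    return [
        (PLAYBOOK.get(reason, "needs human review"), count)
        for reason, count in reasons.most_common()
    ]
```

Ranking by frequency is one simple way to decide which fix to attempt first; anything the playbook does not recognize falls through to a human.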

That's not just AI making you 20 or 30% more valuable. It is the AI going through this loop to figure out how to self-improve. And I think if you can identify parts of your company that work like this, and have the human in a kind of monitoring or supervisory capacity, you can just throw tokens at the problem and your company will get better.

Other examples: if you have product analytics, have an agent go through them to figure out what part of your sales funnel is presenting the most friction, research best practices, put an A/B test in place, run it for a week, pick the best version, and deploy it. Then do that again and again and again for your product.
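The "pick the best version" step can be sketched in a few lines of Python. This is a deliberately naive version under stated assumptions: the `Variant` type and the `min_visitors` cutoff are hypothetical, and the winner is simply the highest conversion rate among variants with enough traffic, with no statistical significance test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        # Guard against division by zero for variants that got no traffic.
        return self.conversions / self.visitors if self.visitors else 0.0


def pick_winner(variants: list[Variant], min_visitors: int = 100) -> Optional[Variant]:
    """Return the variant with the best conversion rate, ignoring tiny samples."""
    eligible = [v for v in variants if v.visitors >= min_visitors]
    if not eligible:
        return None  # not enough data yet; keep the test running
    return max(eligible, key=lambda v: v.rate)
```

In a loop that deploys automatically, the `None` branch matters as much as the winner: it is what stops the agent from shipping a change on the strength of a handful of visits.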

You just have a self-optimizing product loop. Or you do it with customer service queries. You have customer suggestions coming in and in and in. You triage them with an agent that acts like your chief product officer and your chief technology officer, making judgment calls: okay, this is a suggestion we just don't want to do, we'll discard it; but this one is in line with our roadmap, we can do it overnight, so let's write the code, deploy it, and ship it to the customer without a human being involved. So I think if you can think about each part of your company as a self-improving, recursive AI loop, it becomes very, very different from this hierarchically organized Roman legion of a company.

So if you want to do this, what are the implications? One is: burn tokens, not headcount. We are seeing companies get to Demo Day with about 5x more revenue per employee than they did 18 months ago. I think that's going to continue to Series A and Series B. So I think you're going to be constrained on token usage, not on headcount, really, really soon. The blunt measure now is just measuring everyone's token usage, which is obviously dumb and gameable at the extreme, but directionally I think it's correct. We're in the phase of "what is possible right now?", so everyone should be experimenting to the max to figure out what we can even do with this crazy new intelligence we have. As soon as you turn it into a leaderboard and people get promoted or fired based on it, obviously it gets gamed, obviously that's dumb. But directionally, figuring out who in your organization is token-maxing and who is not is a good way to think about which employees you should be spending your time with.

I think middle management is done. I just don't think you need middle management for this coordination problem. I think AI should be doing it. For me, there are two roles. Jack Dorsey has three; I actually don't like the third one, so I deleted it. But there are two roles that really, really matter to me. I think everyone just has to be an IC now: a builder, an operator. And, crucially, you need directly responsible individuals. To get anything done, I think you need a named human: not a committee, not a group of people, just a single person. I think you can build companies based on ICs effectively. Middle management is just over.

So, building this self-improving company: that's the dream. And by the way, I think people are at the bleeding edge of this right now. I'd be interested to see where you all are, but it feels like people are exploring the boundaries here. I'm not sure anyone has a truly self-improving company in every function. I might be wrong. You might prove me wrong.

What would I do? First of all, and this is really, really important, I would make the entire organization legible to AI. What does that mean? It means you've got to record everything.

Simplistically, take all of our partner emails: now, if you email a YC partner, that email is in the YC database. Every Slack message, every DM, every office hour, which we've been recording for the last three or four months. Every single thing that happens: if it is recorded, it happened as far as the AI is concerned. If it did not get recorded, it did not happen to your intelligence. You know what I mean?

I was talking with some founders over here just now, and we were having really good conversations about their companies, but in every conversation I had, I was like, "Fuck, I need to be recording this conversation." Because some guy wanted an introduction to... I can't even remember who the introduction was to now. Who was that? I was talking to someone and I promised you an introduction. I said yes. And I said, "Email me afterwards, because I'm going to forget this. I'm going to talk to 20 people." So it needs to be on my phone, or a clip, or smart glasses, or we deck out every room with microphones. But basically, everything needs to be recorded so that it can be legible to the AI.

And then, as Garry talked about, diarization: you cannot pump 100,000 hours' worth of recordings into a context window. So you have to diarize it. You have to aggregate it down, synthesize it into the important parts, and then give the AI breadcrumbs.

Okay, so here's an example. Who's read the user manual? The YC user manual.

Hopefully, everyone in this room has at least opened the user manual at one point in time, right? Like, it's fine.

Most of it was written 5 to 10 years ago. It's kind of out of date. So Haj thought last weekend: since we've now got about 2,000 hours of recorded office hours from the last 3 months, why don't we regenerate the user manual? You give it a set of instructions. You basically diarize it down, categorize it into certain areas like fundraising, hiring, co-founder disputes, whatever, and then say, "Write me a new user manual." By the end of the weekend, he had a 150-page user manual, which is dramatically better than the existing one. And now we can update it every single month, so our user manual becomes self-improving. Every new piece of advice we give is compared with the existing user manual and either incorporated or thrown away. So the user manual becomes this up-to-date, living brain of the advice we give to founders.
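The "compare, then incorporate or throw away" step might look something like the Python sketch below. It is an assumption-heavy stand-in: the manual is modeled as a dict of topic to advice strings, and word-overlap (Jaccard) similarity with an arbitrary `threshold` substitutes for whatever comparison a model would actually make.

```python
def _words(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two pieces of advice (0.0 to 1.0)."""
    wa, wb = _words(a), _words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def update_manual(
    manual: dict[str, list[str]],
    topic: str,
    advice: str,
    threshold: float = 0.6,  # hypothetical cutoff for "already covered"
) -> bool:
    """Add advice under topic unless a near-duplicate exists. Returns True if added."""
    entries = manual.setdefault(topic, [])
    if any(jaccard(advice, existing) >= threshold for existing in entries):
        return False  # thrown away: the manual already says this
    entries.append(advice)  # incorporated: genuinely new advice
    return True
```

Run over each month's new recordings, a function like this is what keeps the manual growing only where the advice has actually changed.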

And obviously it doesn't stop at a user manual. You then pump it in as context to an AI agent, and suddenly you can ask a super-intelligent AI and get the combined wisdom of 16 YC partners in one, but only if it's legible. So you have to record everything.

The second point is kind of the same, right? If it creates an artifact that can self-improve, it's legible. If it doesn't, you throw it away.

The third point, then, is that every function can generate, well, this used to say dashboards.

It's not just dashboards. It's on-demand software. Codex 55 is now good enough. You can one-shot most simple internal software and dashboards to a pretty high level of quality. I tried it over the weekend on a bunch of our stuff. It's just unreal.

So all of your internal operations teams should be sitting on this layer of intelligence and understanding, and then creating their own dashboards and their own workflows. And I would see those as entirely disposable. I would very preciously store all the data. As Garry said, he puts all of his emails in Markdown. Never throw anything away, but then treat the software as ephemeral. You can generate it, and you can regenerate it.

The valuable part is the comprehension inside people's heads: this is how the function works.

This is how we run a YC event. Whatever software you need to actually run the event, you can generate for the event and then throw away. The models get smarter in a month or two: throw the software away, give the model your original set of instructions, and regenerate the software. So I think the business context and skills are the valuable part. I think the software on top of it is ephemeral.

So what are humans for in this world? I think basically we're talking about a company brain, and I know a bunch of people in this room are building this. The bit in the middle, all of your data, all of your emails, your DMs, the skills, the know-how: that is the company brain. And I think the humans sit around the edge of it, interfacing with the real world. It's where this intelligence makes contact with reality. Human beings reach into places the models can't go yet. That might be a conference. It might be... I'm trying to think of examples. I would say a phone call, but I think the AI can reach into phone calls pretty easily now. I think it's novel situations, ethical considerations, high-stakes moments. It's where the founder comes to us and is thinking about breaking up with their co-founder, right? It's those really high-stakes, high-emotion moments where you really want a human being. I think that's where the human fits. For all of you, sales conversations: I think that's a human being in the room for the next 20 years.

So the humans live, I think, around the edge. I'm over time, and cool vision should bullhorn me. I will leave you with this one question: if you were building your company today, would you start it in this shape? Most of you are small enough to build it right, so I don't think you have any excuse. And I know there are a few of you who are in the process of ripping up and rebuilding your company. So with that I will stop, and we'll hand over to Pete. Thank you for listening.
