
Windsurf CEO: Betting On AI Agents, Pivoting In 48 Hours, And The Future of Coding

By Y Combinator

Summary

Key takeaways

  • **Weekend Pivot to AI Coding**: In mid-2022, with $28 million raised and $2 million in revenue from GPU virtualization, Varun Mohan and his team of eight pivoted over a single weekend to building Codeium after realizing transformers would commoditize their infrastructure business. [02:37], [03:35]
  • **Irrational Optimism Meets Realism**: Startups need irrational optimism to act boldly, like betting against GitHub Copilot despite its advantages, combined with uncompromising realism to pivot when facts change, such as abandoning GPU infra for AI tools. [07:00], [07:14]
  • **Custom Models Outpace Copilot**: After shipping a free VS Code extension worse than Copilot, the team trained their own models from scratch in two months using their GPU expertise, enabling infilling of incomplete code and surpassing Copilot's autocomplete by early 2023. [08:22], [09:01]
  • **Enterprise Wins via Multi-IDE Support**: To serve enterprises like JP Morgan with diverse IDEs, they built shared infrastructure for VS Code, JetBrains, Eclipse, and Vim early on, landing pilots with massive codebases of over 100 million lines within months. [10:48], [12:14]
  • **Fork to Windsurf in Under 3 Months**: In mid-2024, believing agents would dominate, the team of under 25 engineers forked VS Code to create Windsurf, shipping across operating systems in less than three months to better showcase agent capabilities beyond extensions. [15:51], [16:46]
  • **Non-Technical Vibe Coding Success**: Non-technical users, like a partnerships lead, build apps in Windsurf's Cascade agent without touching code, using browser previews and domain expertise to replace sales tools, with one person deploying many prototypes securely. [18:40], [19:25]

Topics Covered

  • Why pivot a profitable startup in 2022?
  • Balance optimism with uncompromising realism?
  • Build evals before complex AI systems?
  • Will everyone become a software builder?

Full Transcript

One of the things that I think is true for any startup is you have to keep proving yourself.

Every single insight that we have is a depreciating insight.

You look at a company like Nvidia.

If Nvidia doesn't innovate in the next two years, AMD will be on their case.

That's why I'm completely okay with a lot of our insights being wrong. If we don't continually have insights that we are executing on, we are just slowly dying.

This notion of just a developer is probably going to broaden out to what's called a builder. And I think everyone is going to be a builder.

I think software is going to be this very very democratized thing.

[Music] Welcome back to another episode of the Light Cone.

Today we've got a real treat.

We have the co-founder and CEO of Windsurf, one of the people who literally brought vibe coding into existence, Varun Mohan.

Thanks for joining us.

Thanks for having me, guys.

Where is Windsurf now? Like, all of us intuitively know.

I mean, well, we use it, but you know, how big is it now?

Where is it? Yeah. So, the product has had well over a million developers use it and has hundreds of thousands of daily active users right now. It's being used for all sorts of things, from modifying large codebases to building apps extremely quickly, zero to one. And we're super excited to see where the technology is going. Let's get down to brass tacks.

How did you get started?

The company actually started four years ago.

We didn't start as Windsurf; we actually started as a company called Exafunction. We were a GPU virtualization company at the time. Previously, my co-founder and I had worked on autonomous vehicles and AR/VR, and we believed deep learning was going to transform many, many industries, from financial services to defense to healthcare. We might have timed it wrong, though. Ultimately we built a system to make it easier to run these deep learning workloads; similar to what VMware does for computers and CPUs, we did that for GPUs.

The middle of 2022 rolled around, though, and at the time we were managing upwards of 10,000 GPUs for a handful of companies and had made it to a couple million in revenue. But the transformer became very popular with these models like text-davinci from OpenAI, and we felt that was going to fundamentally disrupt the small business we had at the time, because we felt that everyone was going to run these transformer-type models. In a world in which everyone was going to run one type of model architecture, transformers, we thought that if we were a GPU infrastructure provider, we would get commoditized, right?

If everyone's going to do the same thing, what is our alpha going to be? So at the time we basically said, hey, could we take our technology and wholesale pivot the company to do something else?

And that's scary. Yeah, that was a bet-the-company moment. We did it within a weekend. So over that weekend, my co-founder and I had a conversation along the lines of, I don't think this is going to work.

We don't think we know how to scale this company.

At the time, we were early adopters of GitHub Copilot. We told the rest of the company on Monday, and everyone started working on Codeium, which is the extension product, the Monday immediately after.

I'm just curious to dig into this pivot story, because it's actually pretty rare to hear the details of a pivot, especially a late-stage pivot. At the time that you guys decided to pivot to Codeium,

how far along was the company? How many people? One of the things about the company was, I guess, that we tried to embrace a lot of the YC analogies here, you know, ramen profitability, all these other key insights.

We were only a team of eight people at the time even though we were making a couple million in revenue.

We were kind of like free cash flow positive.

It was the peak of the zero-interest-rate era at that time.

So the company was a year and a half old.

We had somehow magically raised $28 million of cash at the time. And I think the big point in our minds was that it doesn't matter if we're doing kind of well now; if we didn't know how to scale it, we needed to change things really fast.

I guess the thing that's remarkable is when you started the company, you had this thesis where you were betting that a lot of companies were going to build their own custom deep learning pipelines to train BERTs, right?

That was like the thing that was working. But in 2022, you saw the hockey stick shift that suddenly there would be one model that would rule them all.

So you were foreshadowing a lot of the future and there's a lot of it that came from that conviction.

So I'm curious, what were those signs? Because you had to be really embedded into it. You were already making seven figures.

You could have raised a series A.

You're like, we're going to throw all that away and burn it. Yeah.

So actually, crazily enough, we had raised our Series A at that time.

But whether or not we should have been able to is a different question.

No, I think you're totally right.

I think one of the things that was happening at the time was we were working with, largely speaking, these autonomous vehicle companies, because they had the largest deep learning workloads at the time, and we were seeing, hey, that workload is growing and it's large. But we were betting fundamentally that the other workloads, these natural language workloads in other industries like financial services and healthcare, would take off. But once we saw these generative models handle so many of the use cases... maybe an example is, in the past you would train a BERT model to go and do sentiment classification.

But very quickly, when we tried even a bad version of GPT-3, the very old version, we were like, this is going to kill sentiment classification.

There's no reason why anyone is going to train a very custom model anymore for this task.
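To make the shift concrete, here is a minimal sketch of the point being made: a general-purpose generative model can do zero-shot sentiment classification from a prompt, where previously you would fine-tune a task-specific BERT model. The `call_llm` parameter is an assumption, a placeholder for any text-completion API rather than a specific vendor SDK.

```python
# Hedged sketch: zero-shot sentiment classification with a prompt instead of a
# task-specific fine-tuned classifier. `call_llm` is a placeholder for any
# text-completion API; the stand-in lambda below just makes the sketch runnable.
def classify_sentiment(text: str, call_llm) -> str:
    prompt = (
        "Classify the sentiment of the following text as positive, negative, or neutral.\n\n"
        f"Text: {text}\n"
        "Sentiment:"
    )
    return call_llm(prompt).strip().lower()

# Toy stand-in for a real model call, so the example runs end to end.
print(classify_sentiment("The service was fantastic!", lambda prompt: " positive"))
```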

I think we saw the writing on the wall that our hypothesis was just wrong, right?

And that's one of those things: you go in with some thesis on where you believe the space is going to go, but if your hypothesis is wrong and the information on the ground changes, you have to change really fast. So then what did you decide to do? You decided, okay, we're going to pivot, and when we work with founders, that's kind of stage one. So you're not half a foot in, half a foot out. You had that conviction: we need to try something out.

How do you figure out what was going to be the next step? I think we needed to pick something that actually everyone at the company was going to be excited about.

I think if we had picked something that we thought could be valuable but people were not excited about, we would ultimately fail, and fail immediately.

We came in with an opinionated stance: we were early adopters of a product called GitHub Copilot.

We thought that that was the tip of the iceberg on where the technology could go.

Obviously everyone at the company was a developer.

Dev tool companies, generally speaking, usually haven't done that well in the past.

They have not. But hey, when you have no other options it's a very easy decision, right? Like, you're going to be a zero with high probability anyways.

You might as well pick something that you think could be valuable and that everyone's going to be motivated to work on.

Everyone's forgotten this now, it feels like, because GitHub Copilot's in the background, but at that particular moment it felt inevitable that Copilot was going to win, right? It just had everything: the GitHub connection, Microsoft distribution, OpenAI. It seemed like no one could compete. So how did you have the bravery to be like, ah yeah, we can totally crush Copilot? Yeah, so this is where there's an irrational optimism piece; I've said this before to the company.

But I think startups require two distinct beliefs, and they actually kind of run counter to each other.

You need this irrational optimism because if you don't have the optimism you just won't do anything.

You're just a pessimist and a skeptic and those people don't really accomplish anything in life.

And you need uncompromising realism, which is that when the facts change, you actually change your mind, right?

And that's a very hard thing to do, because the thing that makes you succeed through irrational optimism is exactly opposite to the things that allow you to be a very realistic company.

So on the irrational optimism side, we basically said, hey, we know how to run and train models ourselves.

We actually trained the first autocomplete models ourselves, ran them on our product, and gave it out for free.

I don't think we had the exact road map on where this was going but we just felt there was a lot more to do here.

And if we couldn't do it, then I guess we'd die, but we might as well bet that we could do it. Were your early versions better than GitHub Copilot at the time? So our earliest version that we shipped out was materially worse than GitHub Copilot.

The only difference was it was free.

We built a VS Code extension after the pivot; within, I think, two months we had shipped the product and given it out on Hacker News, like posted something on Hacker News.

And we built that out.

It was missing a lot of key features, and the model we were running was an open source model that was not nearly as good as the model GitHub Copilot was running.

Very quickly, though, our training infrastructure got better.

So we actually went out and trained our own models based on the task, and then suddenly it got capabilities that even GitHub Copilot didn't, within two months.

Basic capabilities. Now we'd find it hilarious that this was even state-of-the-art, but our model could actually fill in the middle of code.

So when you're writing code, you're not only just adding code at the end of your cursor, but you're filling it in between two parts of a line, right?

And that code is very incomplete and looks nothing like the training data of these original models, right?

So we trained our models to make it actually very very capable for that use case.

And that actually allowed us to pull ahead in terms of quality and latency; we were able to control a lot of the details within a couple of months.

So by the beginning of 2023, I'd say the autocomplete capabilities were much better than what Copilot had.
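For readers unfamiliar with infilling: a common way to train this capability is to split complete files into a prefix, a masked-out middle, and a suffix, and teach the model to predict the middle given both sides. The sketch below shows the general fill-in-the-middle recipe; the sentinel token names and splitting strategy are illustrative assumptions, not necessarily the exact format Codeium used.

```python
# Minimal sketch of fill-in-the-middle (FIM) training-example construction.
# The sentinel tokens and split strategy below are illustrative assumptions.
import random

FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def make_fim_example(source: str, rng: random.Random) -> str:
    """Turn a complete source file into a training string that asks the model
    to predict a masked-out middle span given the code before and after it."""
    if len(source) < 3:
        return source  # too short to split into prefix / middle / suffix
    i, j = sorted(rng.sample(range(1, len(source)), 2))
    prefix, middle, suffix = source[:i], source[i:j], source[j:]
    # Prefix and suffix appear in the prompt; the middle is the target the
    # model learns to generate after the <fim_middle> sentinel.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

rng = random.Random(0)
print(make_fim_example("def add(a, b):\n    return a + b\n", rng))
```

Training on examples like this is what lets the model complete code between two existing fragments, rather than only continuing from the end of the cursor.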

Was that a totally new capability for you guys?

Cuz like you guys have been building GPU infrastructure.

It sounds like you basically hacked together the first version by taking an off-the-shelf open source model, sticking it into like a VS Code extension and just kind of like wiring the two to like talk to each other.

But then right after that, you had to train your own coding model like from scratch.

And you guys have been like following the transformer stuff but like you hadn't built it.

Like in order to do that, I assume you had to like download all of GitHub and like train a whole model from scratch.

How did you all figure out how to do that in only two months? Yeah, it's a great question. So first of all, when we ran the model ourselves, the reason we were able to run it and provide it for free is that we actually had our own inference runtime at the time, and that obviously came from the fact that we were originally a GPU virtualization company to start with.

So that enabled us to ship the v0 of the product, with an open source model, quite quickly.

Immediately after that you're totally right. We had never trained a model like this in the past.

But I think we hired people that were smart, capable, and excited to win.

Uh so we needed to figure it out.

There is no other option, right?

Otherwise you die. It makes the decision-making really simple.

So yeah, we had to figure out how how do you get a lot of data? How do you do this at scale? How do you clean this data?

How do you make it so that it's actually capable of handling this case where code is very incomplete?

And we shipped a model very quickly after that. Wow.

And you did all of that with like eightish people in two months.

Yeah that's right. And then right after that because you were running your own models you started getting interesting customers right?

Yeah. So basically what happened was the product was free at the time.

So we ended up getting a lot of developers using the product across all the IDEs.

So VS Code, JetBrains, Eclipse, Vim. Companies started reaching out because they wanted to not only run the product in a secure way.

They also wanted to personalize it to all the private data inside the company.

So very quickly afterwards, in the coming months, companies like Dell and JP Morgan Chase started to become customers of our product. Now these companies have tens of thousands of developers on the product internally.

But we started making this a major focus of the company:

making sure that the product works on these very large codebases. Some of these companies have codebases that are well over 100 million lines of code, right?

And making sure that the suggestions are fast is one thing, but making sure it's actually personalized to the codebase and the environment that they have was almost a requirement at the time.

You did that pivot, you built it in two months, then shipped it, and within a couple of months, you got these big logos.

Yeah. So, I mean, obviously these companies take some time to close, but pilots were starting within a couple of months or a quarter after that. Obviously we had no salespeople at the company, so the founding team was just trying to run as many pilots as possible

to see what would ultimately work.

At what point did you expand beyond just the VS Code extension into supporting all these other IDEs?

That was actually very very soon afterwards.

Um how did you think about that?

Like you know there's like one argument that you could make which is like there's lots of VS Code developers. You had a tiny team.

You could have made the argument that I just focus on building a great experience for VS Code.

You'd only captured a tiny percentage of the market of all possible VS Code developers.

And that that's not what you did.

You expanded horizontally very quickly and built extensions for all those IDEs.

Why? I think maybe the fundamental reason we thought it was quite critical is that if we were going to work with companies, companies have developers that write in many languages.

For instance, a company like JP Morgan Chase might have over half of their developers writing in Java, and those developers are going to use JetBrains and IntelliJ.

Over 70 to 80% of all Java developers in the world currently use IntelliJ. So we would just need to turn away a lot of companies; a lot of companies would not be able to use us as the de facto solution, and we'd be one of many solutions inside the company. So because of that we made the decision, and luckily, because we made the decision early enough, it changed the architecture of how we built the product out, which is to say we are not building a separate version of the product for every single IDE.

We have a lot of shared infrastructure, and only a thin layer lives on a per-editor basis. So it's actually a very small amount of code that needs to get written to make sure we can support as many IDEs as possible.
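As an illustration of that kind of architecture (class and method names here are hypothetical, not Windsurf's actual interfaces), the heavy logic lives in one shared engine and each editor only needs a thin adapter that forwards events to it:

```python
# Illustrative sketch of the "shared core, thin per-editor adapter" pattern.
# Names and interfaces are hypothetical, not Windsurf's actual code.
from dataclasses import dataclass

@dataclass
class Completion:
    text: str
    start_offset: int

class SharedEngine:
    """All heavy lifting (context retrieval, model calls, ranking) lives here
    and is reused by every editor integration."""
    def complete(self, file_text: str, cursor_offset: int) -> Completion:
        return Completion(text="# model suggestion goes here", start_offset=cursor_offset)

class VSCodeAdapter:
    """Per-editor code is reduced to translating editor events into engine calls."""
    def __init__(self, engine: SharedEngine) -> None:
        self.engine = engine
    def on_keystroke(self, file_text: str, cursor_offset: int) -> str:
        return self.engine.complete(file_text, cursor_offset).text

class JetBrainsAdapter:
    """A second adapter: the same few lines, so supporting another IDE is cheap."""
    def __init__(self, engine: SharedEngine) -> None:
        self.engine = engine
    def on_keystroke(self, file_text: str, cursor_offset: int) -> str:
        return self.engine.complete(file_text, cursor_offset).text

engine = SharedEngine()
print(VSCodeAdapter(engine).on_keystroke("def f(", 6))
```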

So this is one of those things where an early decision we made ended up making that transition much easier. How about the transition from Codeium to Windsurf?

At the time, you know, now we're probably getting to the middle of 2023.

We start working with some very large enterprises.

Within the next year, the business had gotten to well over eight figures in revenue just from these enterprises using the product, and we had this free individual product.

But I think one of the things about this industry that we all know is that the space moves really fast, and we are basically always making bets on things that are not working. Actually, most of the bets we make in the company don't work, and I'm happy when only, let's say, 50% of the things we're doing are actually working, because if 100% of the things we're doing are working, I think it's a very bad sign for us. It probably means one of maybe three things.

The first is, hey, we're not trying hard enough.

Right, that's probably what it means.

The second is that we somehow have a lot of hubris, and the hubris is that we believe everything we do is right despite the facts on the ground.

And the third key piece is that we're not actually testing our hypotheses in a way that tells us where the future is going.

We're not actually at the frontier of what the capabilities and the technology ultimately are.

We actually believed, at the very beginning of last year, that agents were going to be extremely huge, and we had prototypes of this at the beginning of last year, and they just didn't work.

But there were different pieces we were building that we felt were going to be important to making agents work: understanding large codebases, understanding the intent of developers, making edits on the codebase really quickly. And we had all these pieces.

The thing we didn't have is a model that was capable of calling these tools efficiently enough.

And then obviously in the middle of last year that completely changed with the advent of Sonnet 3.5, and with that we basically said, okay, we now have these agent capabilities, but the ceiling of what we could show to our developers in VS Code was limited. We were not able to provide a good enough experience, and we thought what was going to happen is that developers would spend way more time not writing software but reviewing software that the AI was going to put out.

We are, I think, a technology company at heart. I think we are a product company, but the product serves the technology, which is to say we want to make the product as good as possible so that people can experience the technology, and we felt that with VS Code we were not able to do that. So in the middle of last year we decided, hey, we need to actually go out and have our own IDE out there. That's what triggered creating Windsurf. The way that you did that was you forked VS Code. That was a whole new set of capabilities that you guys had to learn, basically how to develop on this VS Code codebase that I'm sure is super complicated.

Yeah, we needed to figure that out.

That was once again another thing where we ended up shipping Windsurf within less than three months of starting the project.

That's when we shipped it out across all operating systems. Wow.

And what happened? Did it take off immediately, or did it go unnoticed for a long time?

It took off pretty quickly I would say.

I think the speed at which it took off among early adopters was quite high.

There were obviously some very rough edges, and this is one of those things where, because of the rough edges, people started coming to and leaving the platform fairly quickly.

But what we saw was that as we improved the capabilities of the agent and of the passive experience, and even the passive tab experience has made massive leaps in the last couple of months,

we started realizing that not only were people talking about the product more and more, people were also staying on the product at a higher and higher rate.

How many people worked on shipping Windsurf?

And it was done in a period of one or two months? A couple of months.

So yeah, like less than three months.

This was another one of those; I wouldn't say it's a bet-the-company moment, because it's not a fundamentally different paradigm compared to moving from a GPU virtualization product to an AI code product. But yeah, anyone that could work on it needed to drop what they were working on in the past and work on it immediately.

And at that time, how big were you guys?

The engineering team was probably still less than 25 people. Wow. This is crazy.

Interestingly, our company, from an employee standpoint, actually didn't have that few people. We had a fairly large go-to-market team because it's in the AI space. One thing that's a little bit weird about our company compared to most other companies is that we have a fairly large go-to-market team.

Like we were selling our product to the largest Fortune 500 companies.

It's very hard to do that purely by letting them swipe a credit card.

You need a lot of support. You need to make sure that the technology is getting adopted properly,

which is very different than just giving people the product and watching it grow.

So from an engineering standpoint, we've always run fairly lean.

But because of the market interest, we've always had a lot of people in go-to-market, actually.

Who are like sort of the ideal people to go into that function?

Is it you know really good engineers who want to be forward deployed?

Yeah, so we have two components of it. We have account executives.

These are folks where, in general, we try to find people that are very curious and excited about the capabilities.

In fact, people that would use Windsurf in their free time, because they're providing the product to leaders who also love software and technology, right?

So if they're just completely unaware of the technology, they're not going to be helpful.

And then we also have these deployed engineer roles, similar to what you said, that get their hands really dirty with the technology and make sure that our customers get the most value from it. I mean, the wild thing is, because everyone uses Windsurf, it sounded like you're having even these AEs, who are non-technical, become just vibe coding champions.

Yeah. No, one of our biggest users of Windsurf at the company is a non-technical person who leads partnerships at the company.

He has actually replaced buying a bunch of sales tools inside the company.

And this is one of those things where I think Windsurf is giving power back to the domain experts. In the past, what would happen in an organization is he would need to talk to a product manager, who would talk to an engineer, and the engineer would have a large backlog, because this clearly doesn't immediately make the product better, so it has to be a lower priority. But now he is actually empowered to go and build these apps. Does he have any programming background at all? Interesting, because that's definitely one of the controversies on Twitter at the moment: can you actually vibe code unless you already know some amount of coding?

Yeah. One of the things we do have is that if we do need to deploy one of these apps, we have a person who focuses on making sure that these apps are secure and deployed. But the amount of leverage that that person has is ridiculous.

Instead of him going out and building all of these apps, the v0 can actually get built by people who are domain experts but non-technical inside the company.

With the Codeium launch, you went head-on against Microsoft and GitHub, these huge incumbents.

With the IDE launch, you went sort of head-on against Cursor.

It's like the hot startup of the moment. How did you all think about that internally?

This might be a weird thing about our company, but morale just isn't really affected by what other companies do, right?

It's not possible; our company has gone through a lot of basically very turbulent times.

The fact that we needed to pivot at 10 employees and just completely kill our idea is a normal thing for the company. And second of all, the companies that are relevant in our space have always been a fluctuating set of companies. I really respect all the companies in our space, but if you were to go back to the beginning of 2023, everyone would have thought GitHub Copilot was the product that everyone would use and that there was no point building. And in the middle, Devin came out and everyone was like, hey, Devin is going to solve everything, right?

And I'm sure they're doing good work now. And then after that, obviously, Cursor is doing a really great job. So I think what matters to us most is: do we have a good long-term strategy, and are we executing in a way where we're getting towards that long-term strategy while being flexible with the details? And as long as we're doing that, I think we have a fighting chance. That's the way we've always... Do you educate yourself at all on the competitors' products, though?

Yeah. Yeah. I think we don't want to put our heads in the sand and kind of tell ourselves our product is awesome.

Because it's very easy to do that, especially given the fact that before we worked on Windsurf, the company was also growing very quickly from a revenue standpoint.

What sort of taste or opinions did you have for the full IDE that were maybe different from Cursor?

I'm asking because Cursor is a very well-liked product, obviously, and so at a product level, why are you like, oh yeah, actually we want to build it this way?

Yeah. No, I think it's a great question.

So maybe the first point is that at the time we started working on Windsurf, all the products were basically chat and this autocomplete capability.

I think that's basically what GitHub Copilot was, what Cursor was at the time.

I think we took a very opinionated stance that agents were where the technology was actually going.

We were the first agentic editor that was out there. And I think the biggest takeaway was that we didn't believe in this paradigm where everyone would be @-mentioning everything, right?

This almost reminded us of the anti-pattern of what Google and these search engines were before Google improved their product a lot, which was these landing pages that had every distinct bucket of things you could search for. But Google came out with this very clean search box.

Even with Google at the time, you would get better answers if you wrote AND or OR operators or a site: link.

Um and now it's gotten way better, right?

And I guess we had a belief that the software would get easier and easier to build, right?

And we would build from that starting point, when we saw all the other players in the space making their products so configurable, in a way that we thought was good for users at the time, for what the technology was, but something that would be unnecessary down the line. So we invested in capabilities like: how do you deeply understand the codebase to understand the intent of the developer? How do you actually go out and make changes to the codebase very quickly?

We took the approach of, hey, instead of having this read-only system where you tag everything, what happens if you could make changes very quickly? At the time we were kind of the first to do that. Now if you were to ask, is that a very obvious decision, I think it looks very obvious now. And this is where one of the things that I think is true for any startup is you have to keep proving yourself. Every single insight that we have is a depreciating insight, a very depreciating insight. The reason why companies win at any given point is not that they had a tech insight one year ago, right? Actually, if a company wins, other than the fact that they have, you know, a monopoly, it's actually a compounding tech advantage that keeps existing over and over again.

And I think the example of this that I find most exciting is a company like Nvidia. If Nvidia doesn't innovate in the next two years, AMD will be on their case, right? And Nvidia will not be able to make 60 to 70% gross margins at that point, even though it's one of the largest companies in existence right now. By basically having good insights to start with, you're able to learn from the market and maybe compound that advantage with time, and that's the only thing that could be persistent.

It sounds like a moat: we think of it as a noun, but it's actually a verb.

Yeah, something that could change with time, right. I also think, and I tell the company this, that's why I'm completely okay with a lot of our insights being wrong. If we don't continually have insights that we are executing on, we are just slowly dying; that's what's actually happening. I think the interesting thing is that it's easier now, looking back and connecting the dots on your journey, to see how a lot of these technology bets you took actually did end up compounding into what Windsurf became. It was happenstance that being really good at GPU deployment and VMware-style optimization ended up being the thing that made you really good at blazingly fast autocomplete, because it's faster than other products. Right. So that kind of compounded there.

There's the aspect of you building all these plugins for enterprises and being so good at reading large codebases, and you did something that was contrarian. There were a lot of products, and we work with a lot of YC companies.

A lot of codegen tools use vector databases; because we work with a lot of companies, we know that was the standard approach for how a lot of folks were building.

But you guys did something very different, right? Yeah. So, one of the things that got really popular is this term RAG.

You weren't anti-RAG. Yeah.

I don't know if we're anti-RAG.

RAG obviously makes sense. You do want to retrieve some stuff, and based on the retrieval you want to generate some stuff.

So I guess the idea is correct that everything is retrieval-augmented generation.

But I think what people got maybe a little too opinionated about was the way RAG is implemented:

It has to be a vector database that you go out and search.

I think a vector database is a tool in the toolkit.

Right? If you were to think about what users ultimately want, they want great answers and they want uh great agents.

That's what they actually want.

And how do you end up doing that?

You need to make sure that what's in the context is as relevant as possible.

So what we ended up doing is having a series of systems that enable us to pack the context with the most relevant snippets of code. The way we ultimately did that was a combination of keyword search, RAG, abstract syntax tree parsing, and then, on top of that, using, as you mentioned, all the GPU infrastructure we have to take large chunks of the codebase and rank them in real time as the query is coming in. We found that that is the best way for us to find the best context for the user. And the motivation for this is that people have kind of weird questions.

They might have a question over a large codebase like: upgrade all usages of this API to this other API.

And if embedding search only finds five of them out of the ten, it's not a very useful feature at that point.

So we needed to make sure the precision and recall were as high as possible, which meant that we used a series of technologies to get to the best solution.
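A toy sketch of that kind of hybrid retrieval is below. The scoring functions are stand-ins for what would presumably be trained rankers running on GPUs; the idea is simply to combine a keyword score with a semantic-style score, re-rank chunks, and pack as many top chunks as fit into the context budget.

```python
# Hedged sketch of hybrid retrieval for packing an LLM context window:
# keyword scoring plus a (stand-in) semantic score, then a final re-rank.
# The scoring functions are toy placeholders, not Windsurf's actual ranking models.
from dataclasses import dataclass

@dataclass
class Chunk:
    path: str
    text: str

def keyword_score(query: str, chunk: Chunk) -> float:
    terms = set(query.lower().split())
    words = chunk.text.lower().split()
    return sum(words.count(t) for t in terms) / (len(words) + 1)

def semantic_score(query: str, chunk: Chunk) -> float:
    # Stand-in for an embedding or cross-encoder model run on GPUs.
    return float(len(set(query.lower().split()) & set(chunk.text.lower().split())))

def pack_context(query: str, chunks: list[Chunk], budget_chars: int) -> list[Chunk]:
    ranked = sorted(
        chunks,
        key=lambda c: 0.5 * keyword_score(query, c) + 0.5 * semantic_score(query, c),
        reverse=True,
    )
    picked, used = [], 0
    for chunk in ranked:
        if used + len(chunk.text) > budget_chars:
            continue  # skip chunks that would overflow the context budget
        picked.append(chunk)
        used += len(chunk.text)
    return picked

chunks = [Chunk("api_v1.py", "def call_api_v1(): ..."),
          Chunk("api_v2.py", "def call_api_v2(): ..."),
          Chunk("readme.md", "project docs")]
print([c.path for c in pack_context("upgrade api v1 to v2", chunks, budget_chars=200)])
```

The "upgrade every usage of this API" example is exactly the case where recall matters: the packer has to surface all the relevant files, not just the few that happen to score highest on embedding similarity.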

There's a bit of a thing going on with a lot of AI startups getting built taking too much of an intellectual shortcut with what works for the problem domain, but you took it from first principles, right? So you built a way more complex system that did AST parsing and all this stuff, which is cool. Yeah, I think maybe one of the things that is potentially interesting to discuss there is that a lot of the company started off working on autonomous vehicles, and the reason that's important is these are systems you can't just YOLO, which is to say you build the software and then you just let it run. You need really good evaluation.

I think at the company we don't strive for complexity.

We strive for what works. And then the question is, why is the system so much more complex? It's because we built really good evaluation systems. Oh, interesting.

How do the evals work?

Yeah, so the evals for code are actually really cool.

Basically, the idea is that you can leverage a property of code, which is that it can be run, right? We not only have real-time user data, which we can put aside for now, but we can also take a lot of open source projects and find commits in these open source projects with tests attached to them.

So you can imagine a lot of cool things we can do based on that.

You can take the intent of a commit and delete all the code that is not the unit test, right?

And then you can see, hey, are you able to retrieve the parts where the change needs to get made? Do you have a good high-level intent to make those changes?

And then, after making the changes, does the test pass? You can do that task, and you can mask the task.

And by masking the task, it's more like the Google task.

And what I mean by the Google task is it's trying to predict your intent, which is to say, let's say you only put in a third of the change, but you don't get the intent.

Can you then fill out the rest to make the test pass? There are so many ways you can slice this, and each of them you can break down into a lot of granularity.

You can be like, what is my retrieval accuracy?

What is my intent accuracy?

What is my test-passing accuracy?

You can do that, and now you have a hill to climb. And I think that's actually important before you add a lot of complexity for any of these AI apps:

I think you need to make a rigorous hill that you can actually climb, right?

Otherwise, you're just shooting in the dark, right?

Why would we add the AST parsing if it's unnecessary?

Actually, it's awesome if it's unnecessary, right?

I don't want to add a lot of complex stuff to our code.

In fact, I want the simplest code that ends up having the most impact. So the evals were really critical for us to make a lot of these investments at the company.
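A minimal sketch of how one of those hills might be scored is below. The data structures and the metric split are illustrative assumptions, not Windsurf's actual harness: hide a commit's code change, keep its tests, ask the system to regenerate the change from the commit message, and then score retrieval and test results separately so each one is a hill you can climb independently.

```python
# Hedged sketch of the commit-based eval idea described above: hide the code
# change from a commit that shipped with tests, ask the system to reproduce it
# from the commit's intent, then score retrieval and test results separately.
# All data structures and the example values are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class EvalCase:
    intent: str            # commit message used as the prompt
    gold_files: set[str]   # files the real commit actually changed

@dataclass
class AgentResult:
    touched_files: set[str]  # files the system chose to edit
    tests_pass: bool         # did the commit's attached tests pass afterwards?

def score(case: EvalCase, result: AgentResult) -> dict:
    hit = case.gold_files & result.touched_files
    return {
        "retrieval_recall": len(hit) / max(len(case.gold_files), 1),
        "retrieval_precision": len(hit) / max(len(result.touched_files), 1),
        "tests_passed": result.tests_pass,
    }

case = EvalCase(intent="Handle empty input in parse()", gold_files={"parser.py"})
result = AgentResult(touched_files={"parser.py", "utils.py"}, tests_pass=True)
print(score(case, result))  # separate hills: retrieval accuracy vs. test pass rate
```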

How much of the development that you do is basically driven by improving the scores on the evals versus basically vibes-based? You guys are all using Windsurf yourselves, you're getting feedback from users all the time, and then you just have a sense that this thing is going to work better, and the evals are just sort of a check that you didn't screw up something else. It's a little bit of both, but for some kinds of systems evals are more important, or at least easier, than vibes, just because for the system that takes a large chunk of the code, chunks it up, and passes it to hundreds of GPUs in parallel to give you a result, it's very hard to have an intuition about whether this is way better, because that's a very complex retrieval question.

But on the other hand, there are much easier things that from a vibes perspective are valuable. What if we looked at the open files in a codebase?

This is actually a harder thing to eval, because when you're evaluating you don't know what the user is doing in real time.

This is one of those cases where having a product in the market helps us a lot, and we're able to take a lot of user data on how people use the product to actively make the product much better. So that maybe starts with vibes, and then after that you can build evals; so it's a little bit of both, basically. There's been a lot of chatter on the internet about how vibe coding is only for toy apps, but Windsurf is actually being used for real production, large codebases. Can you tell us about how the power users use it for more hardcore engineering? So this is an interesting thing where a lot of us at the company, and I'm not saying this is common, didn't get tremendous value from ChatGPT in the way that probably a lot of the rest of the world did.

And that's not because ChatGPT is not a useful product.

I think ChatGPT is an incredibly useful product.

It's actually because a lot of them had already used things like Stack Overflow at the time. And Stack Overflow is a worse version of ChatGPT for the kinds of questions you want, but that was just a thing they already knew how to use, right? And so they were able to get away with not using or relying on chat nearly as much.

But basically what happened is very recently with agents, the agent is making larger and larger scale changes with time.

And I think what developers at our company do now is they have felt the hills and valleys of this product, which is to say, if you don't provide enough intent,

it actually goes out and changes way more of the code than you actually need.

Right? And this is a real problem with the tool right now.

But they understand the hills and valleys, and now, the very first time they have a task, they are putting it into Windsurf.

Their first instinct is not to go and type in the actual editor.

It's to actually put in the intent and make those changes.

And they're doing very interesting things now; deploying our software to our servers actually now gets done with workflows that are entirely built inside Windsurf.

So a lot of boilerplate and repetitive tasks have been completely eliminated inside our company.

But the reason this is possible is because we're able to operate over a codebase that has many millions of lines of code really effectively.

So if you were to give some tips to the audience, how should a user of Windsurf properly provide this intent so that the changes are more surgical?

Because with the agents creating all these broad changes, I've seen that happen. But how do you get those precise changes?

What do you do? How do you feed the system?

Do you just shout at it in all caps, right? Yeah. No.

So I think this is one of those things where you kind of need to have a little bit of faith in the system and let it mess up a little bit,

which is kind of scary, because a lot of people, for the most part, will write off these tools really quickly.

Obviously no one at our company would write off the tool, because they're building the tools themselves.

I think people's expectations are very high, and maybe that's the main piece of feedback I'd give: for these larger and larger changes, our product might make 90% of the changes correctly, but if 10% is wrong, people will just write off the entire tool. I think at that point the right thing to do is probably either revert the change, and we have an ability to revert the change, or just keep going and see where it ultimately can go. And maybe the most important aspect is: commit your code as frequently as possible. That's maybe the big tip there; you don't want to get into a situation where you've made 20 changes, and on top of that made some changes yourself, and you can't revert it, and then you get very frustrated at the end of it.

One thing I've wondered, in that vein, is whether we need to change the way git works with this AI coding paradigm. Have you thought at all about whether doing git commit all the time is the right move, or whether there needs to be a deeper infrastructure change? Yeah, I think we have. One of the things that we always think about is that in the future you're going to have many, many agents running in parallel on your codebase.

That has some trade-offs, right?

If you have two agents that modify the same piece of code at the same time, it's hard to actually know what's going on. Another thing is that it's hard to have multiple branches checked out at the same time with different agents working on them independently.

All the merge conflicts.

Oh god. Yeah, there's a lot of that. But hey, that's how real software development works too.

When you have a lot of engineers that operate on a codebase, they're all kind of mucking around with the codebase at the same time. So that's not a very unique thing.

I think git is a great tool.

I think it's maybe a question of how you can skin git to work in a way that works for this product surface.

An example is that git has these things called worktrees, where you can have many worktrees, versions of the repository, all from the same underlying repo.

And perhaps you can have many of these agents working on different worktrees.

Or, instead of exposing the branch concept to you, you can maintain a branch yourself that you apply to the user's main branch

kind of repeatedly. One of the things that we think about at the company, in terms of why we think our agent is really good, is that we try to have a unified timeline of everything that happened.

The unified timeline is not just what the developer did, but what the developer did in addition to what the agent did. So with our product, if you end up doing things in the editor, if you end up doing things in the terminal, all of those things are captured, and the intent is tracked in such a way that when you use the AI, the AI knows. So in some ways we'd like this thing where the agent is not operating on a completely different timeline, but is something that's getting merged in at a fairly high cadence.

So I think this is an open problem.

I don't think we have the exact right answer for this.
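For reference, here is a minimal sketch of the worktree idea mentioned above, using standard `git worktree` commands from Python. It assumes git is installed and `repo_dir` is an existing repository; the path and branch naming are illustrative, not a prescription for how any particular agent product manages this.

```python
# Minimal sketch: give each agent its own worktree and branch so parallel
# agents don't clobber each other's checkout. Assumes git is installed and
# repo_dir is an existing repository; names are illustrative.
import subprocess

def add_agent_worktree(repo_dir: str, agent_name: str) -> str:
    """Create an isolated worktree plus branch that one agent can modify freely."""
    worktree_path = f"{repo_dir}-{agent_name}"
    subprocess.run(
        ["git", "-C", repo_dir, "worktree", "add", "-b", f"agent/{agent_name}", worktree_path],
        check=True,
    )
    return worktree_path  # the agent edits here; changes merge back later

# e.g. add_agent_worktree("/path/to/repo", "refactor-auth")
```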

What are other things that you envision changing about Windsurf in the future?

How how is it going to evolve?

There are probably a lot of people who think vibe coding is kind of a fad, but I think it's going to get more and more capable with time.

Whenever I hear someone saying, "Hey, this is not going to work for this complex use case," it feels like a Luddite saying something, right?

If you look at the way these AIs have gotten better year over year,

it's actually astonishing.

I'll give you an example of something that I held near and dear to my heart: there's this math olympiad called AIME, and I used to do it in high school.

And, you know, I was very excited about how well I would do.

My high score was somewhere close to 14.

And that's a very high score.

Yeah. But the crazy thing is, that was one of those things where I thought, oh wow, the AI systems are not going to get anywhere near as good. At the beginning of last year they were probably well under five, maybe, and now the average that OpenAI is getting is like 14 and a half to 15 for o4-mini. So it's almost like you have to keep projecting this out; it's going to get crazy. Basically every part of the software development life cycle, whether it be writing code, reviewing code, testing code, debugging code, or designing code, AI is going to be adding 10 times the amount of leverage very shortly. It's going to happen much more quickly than people imagine.

Going back to your current engineering team, I'm curious, if they have all this time freed up from not having to deal with version upgrades and boring boilerplate stuff, what do they spend the extra time on? One of the things about our company, and probably every startup that is building in the space, is that the ceiling of where the technology can go is so high.

It's so high. So if developers can spend less time doing boilerplate, they can spend more time testing hypotheses for things that they're not sure work. In some ways engineering becomes a lot more of a research kind of culture, where you're testing hypotheses really quickly, and that has some high cycle time attached to it: you need to go out and implement things, you need to build the evaluation, you need to test it out with our users. But those are the things that actually make the product way better. Does that mean you're going to hire a different type of engineer going forward, like you're looking for different things? Yeah, I think for engineers that we hire, we want to look for people with really high agency who are willing to be wrong and bold. But, you know, weirdly, I don't know if that's changed for a startup, right?

Startups should never be hiring people whose reason for joining a company is to very quickly write boilerplate code, right? Because in some sense, and this is not the goal, a startup can succeed even if they have extremely ugly code, right? That's not usually the reason why a startup fails.

Sounds like my startup. Yeah. Yeah. Exactly.

That's not usually the reason why a startup fails.

The reason why a startup fails is that they didn't build a product that was differentially good for their users.

That's why they ultimately failed.

This is all true, but also, in reality, you always need some sort of workhorses to just get certain things done. I feel like in the old days this was like building Android apps.

It's like you hired someone to do it because there were very few people who would just be willing to do it.

Yeah. Maybe in your vision for engineering you don't need those; that's not actually a useful skill to have in an organization anymore, because the AI is just your infinite workhorse.

Is that fair? Yeah.

Maybe the aspects of software that are really niche, that are undesirable for most people to do except a handful,

those things get democratized a lot more, unless they have a lot of depth attached to them, right? At least for the time being.

Yeah, if something is like, hey, we need to change a system to use a new version,

and there was someone that always got deeply into the weeds with version changes,

I don't think you'll have people inside companies that are just focused on that, right?

How about how you interview people?

Yeah, I think we have a fairly rigorous and high technical bar, and that's a combination of things. We give interviews that actually allow people to use the AI to solve a problem, because we want to validate whether people hate these tools or not. There are still some developers that do, and obviously if you do, we're probably the wrong company to work at. But at the same time, we also have interviews in person, on site, where we don't give them the AI, and we want to see them think, right? It would be a bad thing if, when someone needs to write a nested for loop, they need to go to ChatGPT. And that's fundamentally because it just feels like that is a good proxy for problem-solving skills, and I think problem-solving skills at a high level should still go at a premium. That is the valuable skill that humans have. Yeah.

A challenge that a lot of companies we've talked to have had, and that we've even had ourselves, is that Windsurf has gotten so good that if you give people Windsurf, it's difficult to even come up with an interview question that Windsurf can't just one-shot, where anyone can do it because you literally just copy and paste the question into Windsurf and hit enter, and so you're not really evaluating anything at that point.

So I actually think that's true; you're totally right.

There are very few problems now that something like o4-mini is not able to solve.

Right? I mean, if you look at competitive programming, it's just in a league of its own already at this point. The crazy thing is that interviews, by nature, are going to be isolated problems, right?

They are by nature, because if the problem actually required that much understanding to do, you wouldn't be able to explain the problem.

So that's perfect for the LLMs: you give them an isolated problem where you can test and run code extremely quickly.

So, yeah, you're totally right.

I think if you only have algorithmic interviews and you let people use the AI, I don't know, you're not really testing anything at that point.

Does that mean you've gone away from just algorithmic questions, and you ask different, much harder questions that are actually well suited to being able to use an AI?

Yeah, we have questions that are both system-design-y and algorithms related.

But these are questions that are fairly open-ended, right?

There may not be a correct answer.

There are trade-offs that you can ultimately make, and I think what we want to do is just see how people think given different trade-offs and different constraints. We're trying to validate for intellectual curiosity, and if someone ultimately says, "I don't know why," that's totally fine, as long as they've gone to a depth that we feel shows interest and good problem-solving skills, if that makes sense. You can tell when someone is curious and wants to learn things; it's very obvious. The next thought after that, which might be counterintuitive: you're at the forefront of building all these AI coding tools.

It hasn't affected your hiring plans at all. On the contrary, you actually need way more engineers to execute.

Tell us more about that.

So I think that just boils down to the fact that the problem has a very high ceiling, right?

There's so many more things that we really want to do.

The mission of the company is to reduce the time it takes to build technology and apps by 99%. It's going to take a lot of work to go out and do that.

Now granted each person in our company is way more productive than they were a year ago.

But I think for us to go out and accomplish that, it's, you know, a herculean task. We need to start targeting more of the development experience.

Right? Right now, we've helped a lot with the code writing process.

And maybe the navigating-code process, but we have not touched much on the design process or the deploying process, and the debugging process is fairly rudimentary right now.

There are just so many different pieces.

If I were to look at it, if you say you have 100 units of time, we have an axe and we've cut off maybe 40 or 50 of that time, but there are just a lot more snippets that we need to cut out at this point. It does feel like when I'm using Windsurf, I am often the extremely slow bridge between different pieces of technology, copying and pasting data back and forth. That's probably actually still a large chunk of your time. All the pieces have gotten so fast that now it's like the glue between them, but I'm the glue, and I'm much slower. Can I go off the reservation and ask a weird question?

Go for it. Okay.

One of the things is, I think Pete on our team just released a great essay about prompting and how you should let users have access to system prompts.

The other thing that he came up with, which we've all been using at YC internally, is a new agent infrastructure that has direct read access to our system of record, our Postgres database.

And in the process of using this, we're starting to realize: if codegen gets a lot better, which, based on this conversation, I think we can count on, like 10x or 100x better from here,

what if, instead of building packaged software, there's just-in-time software that the agent basically builds for you as you need it?

Does that change the nature of software and SaaS?

And, you know, what happens to all of us, and to Windsurf?

I don't know. I think this notion of just a developer is is probably going to broaden out to what's called a builder and uh I think everyone is going to be a builder and they can decide like you know how deep they want to go and build things. uh maybe maybe our current version of developers can go deep enough that they can build more complex things, right? Um in the in the shorter term, but yeah, I think software is going to be this very very democ democratized thing, right? I I imagine a future in which you know what actually happens when you ask an AI assistant hey, build me something that like tracks the amount of calories I have.

Why would you have a very custom app that goes out and does this? It's probably something that takes all the inputs from your AR glasses and everything and produces a custom piece of software, an app that is there for you, with tasks that tell you whether you're on track with the calories you're consuming. And I think that's a very custom piece of software that you have for yourself and can keep tweaking.

I can imagine a future like that where effectively everyone is building, but people don't think of what they're building as software.

They're just building capabilities and technology that they have for themselves.

Do many people use Windsurf who don't know how to write code at all?

It's actually a large number of our users. Yeah.

Interesting. Yeah. How did they end up getting into Windsurf? Did they work at some company where a programmer showed them how to use it?

I tend to think of Windsurf as targeting more the professional developer market that's using this as a new superpower, versus the non-technical user market that's doing what Gary was talking about.

We were shocked by this too, because our product is an IDE, but there's actually a non-trivial chunk of our users who have never opened the editor up. Our agent is called Cascade, right?

And they just live in Cascade. We have browser preview, so they just open up the browser preview.

They can click on things and make changes.

The benefit is, because we understand the code, when they come back to the repository and the code has actually gotten quite gnarly, we're actually able to pick up from where the developer, or the builder, left off and keep going from where they were. I will say we have not optimized tremendously for that use case, but it's actually kind of crazy how much is happening there.

Do you think in the long term this ends up being one product that targets both of these audiences, or do you think there are actually different products for different audiences?

There's a Windsurf which is focused on serious developers who want to see the code and be in the details, and then there are maybe other products for folks who are totally non-technical and don't even want to see the code.

I don't know what the long term is going to look like. Something tells me it's going to become more unified.

But one of the things I will just say is, as a startup, even though we do have a good number of people, there's a limit to what we can focus on internally. So we're not going to be focused on both building the best possible experience for the developer and building an experience with so many things for the non-developer.

But I have to imagine that, as you get better and better at understanding code, you're going to be able to deliver a great experience for non-developers as well.

But I don't know what the path dependence is.

I assume like a bunch of companies in the space will go from non-developers to then supporting an ability to edit the code, right?

And I think we're starting to see this already where you know the the lines are sort of getting blurred right now.

You probably care about it for your evals at least.

Yeah. No, you need to care about it for your evals.

Maybe that's the hard part for me to imagine for the pure non-developer product.

What is the hill you're climbing if you're not like kind of understanding the code? How do you know your product is getting better and better?

That's an open question.

Are you completely relying on the base models getting better? Which is fine, but then you should imagine your product is an extremely light layer on top of the base model, which is a scary place to be, right? That means you're going to get competed with across all different axes.

How do you think about that in general? Something we've talked about a lot on this podcast is that the GPT wrapper meme has completely gone away, I feel, though every big release from one of the labs sort of brings it back a little bit and everyone's a little bit scared that OpenAI is just going to eat everything.

Um how do you think about that?

I think the way I think about this is, as I mentioned before, it's a moving goalpost. Which is to say, if today we're generating 80 or 90% of all committed software, when the new model comes out we're going to need to up our game.

We can't just be at the same stage.

Maybe we need to be generating 95% of all committed code. I think our opportunity is the gap between where the foundation model is and what 100% is. And as long as we can continue to deliver an experience where there is a gap between the two, which I think there is as long as there's any human in the loop at all, we'll be able to go out and build things. But that is a constantly transforming goalpost for us. So you can imagine, when a new model comes out, maybe the baseline of what the foundation model by itself provides has doubled; the alpha we provide on top of the base model needs to double as well.

For me, the reason why this is not the most concerning is, let's suppose you were to take the foundation model and it's providing 90%.

It's reducing the time it takes by 90%.

That actually means, if we can deliver a couple more percentage points, say 90 becomes 92 or 93, that's a 20 or 30% gain relative to the new baseline, which is still very valuable, right?

Because at that point, effectively, the 90 becomes the new baseline for everyone.
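To make that arithmetic concrete, here is a small illustrative calculation using the hypothetical 90% and 92% figures from the example above:

```python
# Back-of-the-envelope numbers from the example above (purely illustrative).
baseline_time = 100.0          # units of time without any AI assistance
foundation_cut = 0.90          # foundation model alone removes 90% of the work
product_cut = 0.92             # product on top removes 92% of the work

time_left_foundation = baseline_time * (1 - foundation_cut)   # 10 units remain
time_left_product = baseline_time * (1 - product_cut)         # 8 units remain

# Two extra percentage points is a 20% saving relative to the new baseline.
relative_gain = (time_left_foundation - time_left_product) / time_left_foundation
print(f"{relative_gain:.0%}")  # -> 20%
```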

So I think basically the way we operate is: how can we provide as much additional value as possible? And as long as we have our eye on that, I think we're going to do fine.

What advice would you have for our startups that are working in the like AI coding space?

You have a ton of them.

What are the opportunities that you think are going to be open to new startups?

I've seen a lot of things that I think could be particularly interesting.

I don't think we've really adopted any of these technologies, but there are so many different pieces of how people build software, and, I'm not going to say niche, but there are so many different types of workloads out there.

I've not really seen a lot of startups in the space that say, "We do this one thing really, really well."

Um like I'll give you an example.

Like, we do these kinds of Java migrations really, really well. Crazy enough, if you look at this category, the amount people spend on these migrations every year is probably billions if not tens of billions of dollars.

It's a massive category.

Um, that's an example.

Migrations from what to what?

For example, this is actually kind of crazy: JVM 7 to 8 or something, Rails versions, even more than that, actually.

A lot of companies have COBOL, and crazy enough, most of the IRS's software is written in COBOL.

Apparently in the early 2000s they tried to migrate from COBOL to Java.

I think it was a five-plus-billion-dollar project.

Surprise, surprise, it didn't happen.

You think they could oneshot it now?

I don't know if they can one-shot it, but... I'm just kidding.

But no, imagine if you could do those tasks very well.

It's such an economically valuable sort of task.

We obviously don't have the ability to focus on these kinds of things inside the company. That's a very exciting space if you could do a really good job there. The second key piece is there are so many things that developers do that are not making the product better but are important, like automatic resolution of alerts and bugs in software.

That's also a huge amount of spend out there, and I'd be curious to see what a best-in-class product in that category actually looks like. I'm sure if someone got truly in the weeds on that, they could build an awesome product.

But I've not heard of one that has tremendously taken off.

I think those are actually both really great insights.

And one thing I like about them is that it's not just an opportunity for two startups. Each one of those is a bucket that could have a hundred large companies in it.

We actually do have a company from S21 called Bloop that does these COBOL-to-Java migrations with agents.

That's awesome. It's a gnarly problem.

It's a very gnarly problem, but if you were to talk to any company that has existed for over 30 years, this is probably something that is costing them hundreds of millions a year.

So reflecting on this journey, I mean we're all really thankful for you creating Windsurf.

You're supercharging all of society right now.

What would you say to the you from five years ago, before you started this whole thing?

The biggest thing I would say is change your mind much much faster than you believe is reasonable. Right?

It's very easy to fall in love with your ideas over and over again, and you do need to, otherwise you won't really do anything. But pivot as quickly as possible and treat pivots as a badge of honor.

Most people don't have the courage to change their mind on things, and they would rather fail doing the thing they told everyone they were doing than change their mind, take a bold step, and succeed.

Varun, thank you so much for joining us today.

We'll catch you guys next time.

[Music]
