Cursor's Third Era: Cloud Agents — ft. Sam Whitmore, Jonas Nelle, Cursor
By Latent Space
Summary
Topics Covered
- Synergistic Outputs from Multi-Provider Models
- Full VM Unlocks End-to-End Agentic Workflows
- Demo Videos Bypass Code Review Bottlenecks
- Hand Coding Becomes Obsolete Relic
- Agents Parallelize Throughput Over Speed
Full Transcript
So this is another experiment that we ran last year and didn't decide to ship at the time, but it may come back: an LLM judge, but one that was also agentic and could write code. So it wasn't just picking between outputs, but also taking the learnings from the two models it was looking at and writing a new diff. And what we found was that there were strengths to using models from different model providers as the base level of this process. You could get an almost synergistic output that was better than having a very unified bottom model tier.
>> We think that over the coming months the big unlock is not going to be one person with a model getting more done, like the water flowing faster. We'll be making the pipe much wider, and so parallelizing more, whether that's swarms of agents or parallel agents. Both of those are things that contribute to getting much more done in the same amount of time.
This week, one of the biggest launches that Cursor has ever done is cloud agents. I mean, I think you had cloud agents before, but this was, like: you give Cursor a computer, right?
>> So is this basically: they bought Autotab and then repackaged it? Is that what's going on, or...
>> That's a big part of it. Yeah. Cloud agents already ran in their own computers, but they were sort of sight-reading code. And those computers were typically blank VMs that were not set up with the DevX for whatever repo the agent's working on. One of the things that we talk about is: if you put yourself in the model's shoes, and you were seeing tokens stream by, and all you could do was sight-read code and spit out tokens and hope that you had done the right thing...
>> No chance.
>> I'd be so bad. Obviously you need to run the code. That, I think, is probably not that contrarian of a take, but no one has done it yet. And so giving the model the tools to onboard itself, and then use full computer use end to end (pixels in, coordinates out), and have sort of a cloud computer with different apps in it, is the big unlock that we've seen internally. Usage of this went from "oh, we use it for little copy changes" to "no, no, we're really driving new features" with this new type of agentic workflow.
>> All right let's see it.
>> Cool. So this is what it looks like at cursor.com/agents.
So this is one I kicked off a while ago.
So on the left-hand side is the chat, a very classic agentic thing. The big new thing here is that the agent will test its changes. So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end: starting dev servers, iterating when needed. So that's one part of it: the model works for longer and doesn't come back with an "I tried some things" PR, but an "I tested it" PR that's ready for your review. One of the other intuition pumps we use there is: if a human gave you a PR, asked you to review it, and they hadn't tested it, you'd also be kind of annoyed, because you'd be like, only ask me for a review once it's actually ready. So that's what we've done.
>> Simple question I wanted to ask up front. Some PRs are way smaller, like just a copy change. Does it always do the video, or only sometimes?
>> Sometimes.
>> Okay, so what's the judgment?
>> The model does it. So we do some default prompting with what types of changes to test. There's a slash command people can use, /no-test, where if you do that the model will not test; but the default is to test.
>> The default is to be calibrated. So we tell it: don't test very simple copy changes, but do test more complex things. And then users can also write their AGENTS.md and specify, like: if you're editing this subpart of my monorepo, never test it, because that won't work, or whatever.
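For illustration, a repo-level rules file along these lines might look like the sketch below. The paths and exact rule wording are invented; only the AGENTS.md convention itself comes from the conversation.

```markdown
## Testing guidance for cloud agents

- Copy-only changes (strings, docs): skip end-to-end testing.
- Changes under `apps/web/`: start the dev server and verify the change
  in the browser before opening the PR.
- Never run tests for changes under `legacy/billing/`; the test harness
  does not work in that subpart of the monorepo. Note the limitation in
  the PR description instead.
```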
Okay. So pillar one is the model actually testing. Pillar two is the model coming back with a video of what it did. We have found that in this new world, where agents can write much more code end to end, reviewing the code is one of the new bottlenecks that crops up. Reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at some giant diff. So typically you kick one off, it's done, you come back, and the first thing you do is watch the video. So this is, you know, a video of it. In this case, I wanted a tooltip over this button, and so it went and showed me what that looks like in this video. I think here it actually used a gallery; sometimes it will build Storybook-type galleries where you can see that component in action. And so that's pillar two: these demo videos of what it built.
And then pillar number three is: I have full remote-control access to this VM. So I can go in here, I can hover things, I can type, I have full control. Same thing for the terminal. So I have full access, and that is also really useful, because sometimes the video is all you need to see, and oftentimes, by the way, the video is not perfect. The video will show you: is this worth merging immediately, or, oftentimes, is this worth iterating with to get it to that final stage where I am ready to merge it? I can go through some other examples where the first video wasn't perfect, but it gave me confidence that we were on the right track, and two or three follow-ups later it was good to go. And I also have full access here, where some things you just want to play around with. You want to get a feel for: what is this like? And there's no substitute for a live preview, and the VNC-style remote VM access gives you that.
>> Amazing. Well, sorry, what is VNC?
>> It's just the remote desktop. Yeah.
>> Sam, any other details that you always want to call out?
>> Yeah. I mean, for me the videos have been super helpful, especially in cases where a common problem for me with agents, and cloud agents beforehand, was almost like underspecification in my requests. Our plan mode, and going really back and forth and getting a detailed implementation spec, is a way to reduce the risk of underspecification. But then, similar to how human communication breaks down over time, you have this risk where it's like: okay, when I go to the trouble of pulling down and running this branch locally, I'm going to see that I said this should be a toggle and you have a checkbox, and why didn't you get that detail right? Having the video up front makes that alignment very, very clear; you're talking about a shared artifact with the agent. That has been just super helpful for me.
>> I can quickly run through some other examples. So this is a very front-end-heavy one.
>> One question I was going to ask: is this only for front end?
>> Exactly, one question you might have is: is this only for front end? So this is another example, where the thing I wanted it to implement was a better error message for saving secrets. So the cloud agents support adding secrets; that's part of what they need to access certain systems. Part of onboarding the agent is giving...
>> The cloud agent is working on cloud agents.
>> Yes. So this is a fun thing...
>> It can get super meta.
>> It can get super meta. It can start its own cloud agents. It can talk to its own cloud agents. Sometimes it's hard to wrap your mind around that. We have disabled its cloud agents starting more cloud agents; we currently disallow that.
>> Someday you might.
>> Someday we might. Someday we might.
So this actually was mostly a backend change, in terms of the error handling, where if the secret is far too large, it would... oh, this is actually really cool.
>> That's the dev tools.
>> That's the dev tools. So, if the secret is far too large: we don't allow secrets above a certain size, we have a size limit on them, and the error message there was really bad. It was just some generic "failed to save" message. So I was like, "Hey, we want a proper error message." So, the first cool thing it did here, with zero prompting on how to test this: instead of typing out a character 5,000 times to hit the limit, it opens DevTools, writes JS to paste 5,000 characters of the letter A into the input, closes the DevTools, hits save, and gets the new error message. It looks like the video actually cut off, but here you can see the screenshot of the error message. So that is a front-end-plus-back-end, end-to-end feature.
>> You just need a full VM, a full computer, run everything, right?
>> Yeah.
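A rough sketch of that DevTools trick in console-style JavaScript. The selector, the 4,096-character limit, and the `validateSecret` function are all invented for illustration; the transcript only says the agent pasted 5,000 characters via DevTools.

```javascript
// Build the oversized secret in one expression instead of 5,000 keystrokes.
const oversizedSecret = "a".repeat(5000);

// Hypothetical client-side mirror of the server's size limit.
const SECRET_SIZE_LIMIT = 4096;
function validateSecret(value) {
  if (value.length > SECRET_SIZE_LIMIT) {
    return `Secret is too large: ${value.length} characters (limit ${SECRET_SIZE_LIMIT}).`;
  }
  return null; // no error: the secret would be accepted
}

console.log(validateSecret(oversizedSecret));
// In the real session the agent ran something along the lines of:
//   document.querySelector("textarea").value = "a".repeat(5000);
// in DevTools, then clicked save and read the rendered error message.
```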
>> Yeah. So we've had versions of this. This is one of the Autotab lessons, where we started that in 2022... no, 2023. And at the time it was browser use, you know, the DOM, all these different things. And I think we ended up very AGI-pilled, in the sense that: just give the model pixels, give it a box. Sort of a brain in a box is what you want. And you want to remove limitations around context and capabilities, such that the bottleneck is the intelligence. And given how smart models are today, that's a very far-out bottleneck. So giving it its full VM, and having it be onboarded with the DevX set up like a human would, has been, for us internally, a really big step change in capability.
>> Yeah. I would say, let's call it a year ago, the models weren't even good enough to do any of this stuff, right?
>> Even six months ago. Yeah.
>> So, yeah, what people have told me is that around about Sonnet 4.5 is when this started being good enough to just automate fully by pixel.
>> Yeah. I think it's always a question of when it's good enough. I think we found, in particular with Opus 4.5 and 4.6 and Codex 5.3, that those were additional step changes in sort of the autonomy-grade capabilities of the model, to just go off and figure out the details and come back when it's done.
>> I want to appreciate a couple of details. One: TanStack Router. I see it. I'm a big fan. Do you know who the TanStack is named after?
>> No.
>> It's just random lore. A guy named Tanner.
>> And then the other thing... if you switch back to the video.
>> Yeah.
>> So I want to shout out this thing. Probably Sam did it, I don't know: the chapters.
>> Oh yeah, this is called chapters. It's like a Vimeo thing, I don't know.
>> But it's so nice, the design details. And obviously a company called Cursor has to have a beautiful cursor.
>> And it is the Cursor cursor. You see it?
>> Yeah. Yeah. Okay, cool. And then I complained to Evan. I was like, "Okay, you guys branded everything but the wallpaper." And he was like, "No, that's a Cursor wallpaper." I was like, "What?"
>> Yeah, Rio picked the wallpaper. And the video, that was probably Alexi and a few others on the team, with the chapters on the video. Matthew Fria... there's been a lot of teamwork on this. It's been a huge effort.
>> I just like design details. And then when you download it, it adds a little Cursor, kind of TikTok-style, clip.
>> Yes. Yes. So, to make it really obvious it's from Cursor, we did the TikTok-style branding at the end. This was actually in our launch video.
Alexi demoed the cloud agent that built that feature. Which was funny, because that was an instance of one of the things that's been a consequence of having these videos: we use best-of, where you run different models head-to-head on the same prompt, a lot more. One of the complications with doing that before was you'd run four models and they would come back with some giant diff, 700 lines of code, times four. What are you going to do, review all of that? Horrible. But if they come back with four 20-second videos, you watch four 20-second videos, and then, even if none of them is perfect, you can figure out which one of those you want to iterate with to get it over the line. So that's been really fun.
Here's another example we found really cool, which we've since actually turned into a slash command as well, /repro: for bugs in particular, the model having full access to its own VM means it can first reproduce the bug, make a video of the bug reproducing, fix the bug, then make a video of the bug being fixed, doing the same workflow with, obviously, the bug not reproducing. And that has been the single category that has gone from: these types of bugs are really hard to reproduce and take you tons of time locally, and even if you tried a cloud agent on it, are you confident it actually fixed it, to: when this happens, you'll merge it in, like, 90 seconds or something like that.
So this is an example where... let me see if this is the broken one or... okay, this is the fixed one. Okay. So we had a bug on cursor.com/agents where if you would attach images, remove them, and then still submit your prompt, they would actually still get attached to the prompt. And here you can see Cursor is using its full desktop. By the way, this is one of the cases where if you just do browser-use-type stuff, you'll have a bad time, because it now needs to upload files; it just uses the native file viewer to do that. So you can see here it's uploading files. It's going to submit a prompt, and then it will go and open it up. So this is the meta part: this is Cursor agent prompting Cursor agent inside its own environment. And you can see here, the bug: there are five images attached, whereas when it submitted, it only had one image.
>> I see. Yeah. But you've got to enable that if you're going to use Cursor agent inside Cursor.
>> Exactly.
>> And so here, this is then the after video, where it does the same thing. It attaches images, removes some of them, hits send, and you can see here, once this agent is up, only one of the images is left in the attachments.
>> Yeah. Beautiful. Okay.
>> So easy merge.
>> Yeah.
>> When does it choose to do this? Because this is an extra step.
>> Yes. I think I've not done a great job yet of calibrating the model on when to reproduce these things. Sometimes it will do it of its own accord.
>> We've been conservative, where we try to have it only do it when it's quite sure, because it does add some amount of time to how long it takes to work on the task.
>> But we've also added things like the /repro command, where you can just say, you know, "fix this bug, /repro," and then it will know that it should first make you a video of it actually finding the bug and making sure it can reproduce it.
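Custom slash commands in Cursor are defined as markdown prompt files, so a repro-style command could be approximated with something like the sketch below. The file path and the exact wording are assumptions, not Cursor's actual implementation.

```markdown
<!-- .cursor/commands/repro.md (hypothetical) -->
Before fixing the bug described in the prompt:

1. Reproduce the bug in the running app and record a short video of it
   happening.
2. Implement the fix.
3. Repeat the same steps and record a second video showing the bug no
   longer reproduces.
4. Attach both videos so the reviewer can compare before and after.
```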
>> Yeah. One sort of ML topic this ties into is reward hacking, where you write tests that, oops, always pass. Right? So: first write the test, show me that it fails, and then make the test pass, which is a classic red-green...
>> TDD.
>> TDD thing, right.
>> Very cool. Was that the last demo?
>> Yeah. Anything I missed on the demos, or points, do you think?
>> That covers it well? Yeah.
>> Cool.
>> Before we stop the screen share, can you give me just a tour of the slash commands? Because there's so goddamn many, and I'm like, well, what are the good ones?
>> Yeah, we want to increase discoverability around this too. I think that'll be a future thing we work on, but there's definitely a lot of good stuff.
>> Now, we have a lot of internal ones that I think will not be that interesting. Here's an internal one that I made. I don't know if anyone else at Cursor uses this one.
>> Fix BB.
>> I've never heard of it.
>> Yeah. Fix Bugbot. So, this is a thing that we want to integrate more tightly.
>> So, you made this for yourself.
>> I made this for myself. It's actually available to everyone on the team, but no one knows about it. But, yeah, there will be Bugbot comments, and Bugbot has a lot of cool things. We actually just launched Bugbot Autofix, where you can click a button, or change a setting, and it will automatically fix its own findings. That works great in a bunch of cases. But there are some cases where having the context of the original agent that created the PR is really helpful for fixing the bugs, because it might be like: oh, the bug here is that this is a regression, and actually you meant to do something more like that. So it helps to have the original prompt and all the context of the agent that worked on it. And so here I could just do "fix bb" and it would do that. /no-test is another one that we've had; /repro is in here, we mentioned that one.
>> One of my favorites is cloud agent diagnosis. This is one that makes heavy use of the Datadog MCP, and I think Nick and David on our team wrote it. Basically, if there is a problem with a cloud agent, we'll spin up a bunch of sub...
>> A single instance?
>> Yeah. Well, we'll take the ID as an argument and spin up a bunch of sub-agents, using the Datadog MCP to explore the logs and find all of the problems that could have happened with that agent. It takes the debugging time down: I mean, you can do quick stuff quickly with the Datadog UI, but it takes it down to a single agent call, as opposed to trawling through logs yourself. You should also talk about the stuff we've done with transcripts.
>> Yes. So basically we've also done some things internally, and there'll be some versions of this as we ship publicly soon, where you can spin up an agent and give it access to another agent's transcript: to either debug something that happened, so kind of act as an external debugger, or to continue the conversation, almost like forking it.
>> A transcript includes all the train of thought for the 11 minutes here, 45 minutes there?
>> Exactly. So basically acting as a secondary agent that debugs the first.
So we've started to push...
>> And they're all the same code, just different prompts, but the same...
>> Yeah. So basically the same cloud agent infrastructure, and the same harness. For things like including an external transcript as an attachment, there's some extra infrastructure that goes into piping that in. But for things like cloud agent diagnosis, that's mostly just using the Datadog MCP, because along with this cloud agent launch we also launched support for cloud agent MCPs.
>> Oh, that got drowned out.
>> We'll be doing a bigger marketing moment for it next week, but...
>> And you can now use MCPs. People who listen to this will be ahead of the curve.
>> Yeah, you'll be ahead. And I actually don't know if the Datadog MCP is publicly available yet. I realized this while beta testing it, but it's been one of my favorites to use.
>> I think that one's interesting for Datadog, because Datadog wants to own that side, right? With Bits. I don't know if you've tried Bits.
>> I haven't tried Bits.
>> It's their cloud agent product. They want to be like: well, we own your logs, so give us some part of the, you know, self-healing software that everyone wants.
>> Yeah. But obviously Cursor has a strong opinion on coding agents, and you're kind of taking away from that, which obviously you're going to do, and not every company is like Cursor. But it's interesting: if you're Datadog, what do you do here? Do you expose your logs over MCP and let other people build on them, or do you try to own that yourself, because it's extra business for you? It's an interesting one.
>> It's a good question. All I know is that I love the Datadog MCP. And it's going to be no surprise that people will demand it, right?
>> Yeah. It's like any system-of-record company: how much do you give away, you know? Cool, I think that's that for the cloud agents tour. When did you launch cloud agents, do you know?
>> June last year.
>> June last year. So it's been a slowly developing thing. Michael did a post for himself where he showed this chart of agents overtaking Tab,
>> Right? And I'm like, "Wow, this is the biggest transition in code in, like, the last..." Yeah, I think that kind of got drowned out.
>> I think it's a very...
>> No, I think it's been highlighted by our friend Andrej Karpathy today.
>> Okay.
>> Talk more about it. What does it mean? I just got given, like, the Cursor Tab key.
>> Yes. Yes.
>> That's cool.
>> I know. It's going to be put in a museum.
>> It is.
>> I have to say I haven't used Tab in a little bit myself.
>> Yeah. I think that what it looks like to code with AI, to generally create software, even if you want to go higher level, is changing very, very rapidly. Not a hot take. But I think from our vantage point at Cursor, one of the things that is probably underappreciated from the outside is that we are extremely self-aware about that fact. Cursor, you know, got its start in phase one, era one, of tab and autocomplete. And that was really useful in its time, but a lot of it starts with looking at text files and editing code. We call it hand coding now, when you type out the actual letters.
>> Like, oh, that's cute.
>> Yeah. Oh, that's cute.
>> You're so boomer.
>> So boomer. And so that, I think, has been a slowly accelerating, and in the last few months rapidly accelerating, shift. And we think that's going to happen again with the next thing, where some of the pains around tab are like: that's great, but I actually just want to give more to the agent; I don't want to do one tab at a time. I want to give it a task, and it goes off and does a larger unit of work, and I can lean back a little more and operate at that higher level of abstraction. That's going to happen again, where it goes from agents handing you back diffs, and you're sort of in the weeds and giving it 30-second to three-minute tasks, to you're giving it three-minute to 30-minute to three-hour tasks, and you're getting back videos and trying out previews rather than immediately looking at diffs every single time.
>> Yeah. Anything to add?
>> One other shift that I've noticed as our cloud agents have really taken off internally has been a shift from primarily individually driven development to almost this collaborative nature of development. For us, Slack is actually almost like a development surface, an IDE, basically.
>> That's why I'm like: maybe don't even build a custom UI. Maybe, you know, that's like a debugging thing, but actually it's Slack.
>> I feel like, yeah, there's still so much left to explore there. But basically, for us, Slack is where a lot of development happens. We will have these issue channels, or just product discussion channels, where people are always @cursor-ing, and that kicks off a cloud agent. And, for us at least, we have team follow-ups enabled, so if Jonas kicks off an @cursor in a thread, I can follow up with it and add more context. So it turns into almost like a discussion surface where people can collaborate in the UI. Oftentimes I will kick off an investigation, and then sometimes I'll even ask it to git blame and then tag people who should be brought in, because it can tag people in Slack, and then other people will...
>> It can tag other people who are not involved in the conversation?
>> It can just do @Jonas.
>> Yeah.
>> That's cool. You guys should make a bigger deal out of that.
>> I know. I feel like there's a lot more to do with our Slack surface area to show people externally. But basically, it can bring other people in, and then other people can also contribute to that thread, and you can end up with a PR, again with the artifacts visible, and then people can be like: okay, cool, we can merge this. So for us, the IDE is almost moving into Slack in some ways as well.
>> I have the same experience, but it's not developers; it's me, a designer, salespeople.
>> Yeah.
>> So, me on technical marketing and vision, the designer on design, and then salespeople on, "well, here's the legal terms of what we agreed on," and then they all just collaborate and correct the agents.
>> I think what we've found in these threads is that the work that is left, that the humans are discussing, is the nugget of what is actually interesting and relevant. It's not, you know, the boring details of where does this if-statement go. It's: do we want to ship this? Is this the right UX? Is this the right form factor? How do we make this more obvious to the user? It's those really interesting, higher-order questions that are so easy to collaborate on, while leaving the implementation to the cloud agent.
>> Totally. And no more discussion of: am I going to do this? Are you going to do this? Cursor is doing it. You just have to decide if you like it.
>> You guys probably figured this out already, but sometimes you need, like, a mute button. Like: Cursor, we're going to take this offline (but still online); we need to talk among the humans first, so stop responding to everything, right?
>> Yeah. This is a design decision: currently Cursor won't chime in unless you explicitly at-mention it.
>> Yeah.
>> So it's not always listening.
>> Yeah. Yeah.
>> Well it can see all the intermediate messages.
>> Have you done the recursive thing? Can Cursor @-mention another Cursor, or spawn another Cursor?
>> We've done some versions of this.
>> You know, because it can add humans.
>> Yes. One of the other things we've been working on, as sort of an implication of "generating the code is so easy," is that getting it to production is still harder than it should be. Broadly, you know, you solve one bottleneck and three new ones pop up, and one of the new bottlenecks is getting it to production. We have a joke internally where you'll be talking about some feature and someone says, "I have a PR for that." It's so easy to get to "I have a PR for that," but it's still relatively hard to get from "I have a PR for that" to "I'm confident and ready to merge this." So over the coming weeks and months, a thing we think a lot about is how we scale up compute in that pipeline, getting things from a first draft an agent did...
>> I mean, isn't that what Graphite's for?
>> Graphite is a big part of that.
>> Is the cloud agent fully integrated, or are you still different companies?
>> I think we'll have more to share there in the future, but the goal is to have a great end-to-end experience, where Cursor doesn't just help you generate code tokens; it helps you create software end to end. And review is a big part of that, which, especially as models have gotten much better at writing and generating code, we've felt crop up as a relative bottleneck.
>> Sorry, this was completely unplanned, but I have people arguing, on one hand, that you need AI to review AI. And then there's another school of thought where it's like: no, reviews are dead. Just show me the video.
>> Yeah. I feel like, again, for me the video is often alignment, and then I often still want to go through a code review process, still look at the files.
>> There's a spectrum, of course. If the video is really well done and it fully tests everything, you can feel pretty confident, but it's still helpful to look at the code. I pay a lot of attention to Bugbot. Bugbot has been really highly adopted internally. We tell people: don't leave Bugbot comments unaddressed, because we have such high confidence in it. So people always address their Bugbot comments.
>> Once you've had two cases where you merged something, and then you went back later and there was a bug in it, and you were like, Bugbot had found that, I should have listened to Bugbot... once that happens two or three times, you learn to wait for Bugbot.
>> Yeah. So I think for us there's that code-level review, where it's looking at the actual code, and then there's the feature-level review, where you're looking at the feature. There are a whole number of different areas. There will probably eventually be things like performance review and security review, more aspects of how a feature might affect your codebase that you want to leverage an agent to help with. Some of those, like Bugbot, will be synchronous, things you'll typically want to wait on before you merge. But another thing we're starting to see is that as
with cloud agents, you scale up this parallelism and how much code you generate, 10-person startups start to need the DevX and pipelines that a 10,000-person company used to need. And that looks like a lot of the things 10,000-person companies invented in order to get that volume of software to production safely: release frequently, release slowly, have different release stages, checkpoints, automated ways of detecting regressions. And so I think we're going to need
>> Stacked diffs, merge queues.
>> Exactly. A lot
of those things are going to be important.
>> For what it's worth, I think the majority of people still don't know what stacked diffs are. And you know, I have many friends at Facebook and I'm pretty friendly with Graphite. I've just never needed it because I don't work on a team that large. And it's a democratization of, here's what we've already worked out at very large scale, and here's how it benefits you too.
Like I think to me one of the beautiful things about GitHub is that it's actually useful to me as an individual solo developer even though it's like actually collaboration software.
>> Yep.
>> And I don't think a lot of dev tools have figured out that transition from large down to small yet.
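A merge queue, one of those large-company inventions mentioned above, can be sketched in a few lines. This is a toy model, not Cursor's or GitHub's implementation; `run_ci` and `merge` are hypothetical callbacks standing in for a real CI run and a real merge:

```python
from collections import deque

def merge_queue(prs, run_ci, merge):
    """Toy merge queue: before merging each PR, run CI against main as it
    *would* look with everything queued ahead of it already merged, so two
    individually-green PRs can't combine to break main."""
    queue = deque(prs)
    merged = []
    while queue:
        pr = queue.popleft()
        candidate = merged + [pr]  # hypothetical post-merge state of main
        if run_ci(candidate):
            merge(pr)
            merged.append(pr)
        # a failing PR is dropped from the queue for its author to fix
    return merged
```

The key design point is that CI runs on the *combined* candidate state, not on each PR in isolation, which is what makes the queue safe at high merge volume.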
>> Yeah, Cursor is probably the inverse story. It went from small up to large.
>> Historically, part of why Cursor grew so quickly was that anyone on a team could pick it up, and in fact people would pick it up on the weekend for their side project and then bring it into work because they loved using it so much. Yeah.
>> Um, and a thing that we've started working on a lot more, not us specifically, but other folks at Cursor, is making it really great for teams, making it so that the 10th person who starts using Cursor on a team is immediately set up. We launched the marketplace recently, so other people can configure which MCPs and skills, like plugins, are available. Skills and MCPs, other people can configure those so that my Cursor is ready to go and set up. Sam loves the Datadog MCP, and the Slack MCP you've also been using a lot.
>> That's also pre-launch, but I feel like it's so good.
>> Yeah, my Cursor should be configured that way if Sam feels strongly that it's just amazing and required.
>> Is it automatically shared, or do you have to go and
>> Uh, it depends on the MCP. Some are OAuth per user, so Sam can't auth my Cursor with my Slack MCP, but some are team OAuth, and those can be set up by admins.
>> Yeah, that's cool. I mean, we had Aman on the pod when Cursor was five people, and everyone was like, okay, well, what's the business? And it's usually something teams and org and enterprise.
>> But it's actually working.
>> But usually at that stage, when you're five people and just a VS Code fork, it's like, how do you get there? Will people pay for this? And people do pay for it.
>> Yeah. And for cloud agents, we expect similar kinds of PLG dynamics. Off the bat we've seen a lot of adoption with smaller teams, where the codebases are not quite as complex to set up. If you need some insane Docker layer-caching thing for builds not to take two hours, that's going to take a little longer for us to support. Whereas if you have a front end and back end that set up in one click, agents can install everything they need themselves.
>> This is a good chance for me to ask some technical, check-the-box questions. Can I choose the size of the VM?
>> Not yet. We are planning on adding that.
>> Because obviously you want L, XXL, whatever, right? Like the Amazon sort of menu.
>> Yes, exactly. We will add that.
>> Yeah. In some ways you basically become like EC2, almost like you rent a box.
>> You rent a box, yes. We talk a lot about brain in a box. So Cursor, we want to be a brain in a box.
>> But is the mental model different? Is it
more serverless? Is it more persistent?
Is it something else?
>> We want it to be a bit persistent. I mean, the desktop should be something you can return to even after some days. Maybe you go back and it's still thinking about a feature for some period of time.
>> So full suspend-to-memory, bring it back, and keep going.
>> Exactly. That's an interesting one, because what I actually want from these agent environments is to be able to log in with my credentials to a thing but not actually store them in some secret store, because this is my most sensitive stuff. This is my email, whatever. And just have it persist through the image, I don't know how it works underneath, but rehydrate and keep going from there. But I don't think a lot of infra works that way. A lot of it is stateless, where you save it to a Docker image and it's only whatever you can describe in a Dockerfile, because that's the only thing you can clone multiple times
>> in parallel.
>> Yeah, we have a bunch of different ways of setting them up. There's a Dockerfile-based approach. But the main default way is actually snapshotting a Linux VM: you run a bunch of install commands and then you snapshot, more or less, the file system. That gets you set up with everything you'd want to bring a new VM up from that template. And that's distinct from what Sam was talking about with hibernating and rehydrating, where that is a full memory snapshot as well. So there, if I had the browser open to a specific page and we bring it back, that page will still be there.
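The distinction between a disk-level template snapshot and a full memory snapshot can be modeled in a few lines. This is a conceptual toy, not Cursor's snapshot implementation; the `VM` dataclass and its fields are invented for illustration:

```python
import copy
from dataclasses import dataclass, field

@dataclass
class VM:
    filesystem: dict                            # repo checkout, installed tools, caches
    memory: dict = field(default_factory=dict)  # running processes, open browser pages

def disk_snapshot(vm: VM) -> dict:
    """Template-style snapshot: capture only the file system after the
    install commands have run. Clones boot from it with fresh memory."""
    return copy.deepcopy(vm.filesystem)

def clone_from_template(template: dict) -> VM:
    """Bring up a new VM from the template; running state starts empty."""
    return VM(filesystem=copy.deepcopy(template))

def hibernate(vm: VM) -> VM:
    """Full memory snapshot: capture file system *and* RAM, so an open
    browser page survives suspend and resume."""
    return copy.deepcopy(vm)
```

The template is what you can safely clone many times in parallel; the hibernate path is what lets a single long-lived environment resume exactly where it left off.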
Was there any discussion internally, in building this stuff, about how every time you show the video you actually show a bit of the desktop and the browser, and it's kind of not necessary? If you know you're just demoing a frontend application, why not just show the browser?
>> We do have some panning and zooming; it can decide that when it's actually recording and cutting the video, to highlight different things. I think we've played around with different ways of segmenting it.
>> Yeah, there's been some different revs on it for sure.
>> Yeah, one of the interesting things is that the version you see now on cursor.com is actually about half of what we had at peak; we've decided to unship quite a few things. Two of them are interesting to talk about. One is directly in answer to your question: we had a native browser that you would have locally. It was basically an iframe that, via port forwarding, could load the URL and talk to localhost in the VM. So that gets you, basically,
>> In your machine's browser.
>> In your local browser. You would go to localhost:4000 and that would get forwarded to localhost:4000 in the VM via port forwarding.
>> Like an ngrok.
>> Like an ngrok, exactly. We unshipped that because we felt the remote desktop was sufficiently low latency and more general purpose. So we built Cursor Web, but we also built Cursor Desktop, and
so it's really useful to have the full spectrum. Even for Cursor Web, as you saw in one of the examples, the agent was uploading files, and I couldn't have uploaded files or opened the file viewer if I only had access to the browser. And we've thought a lot about this. It might seem funny coming from Cursor, where we started as this VS Code fork and inherited a lot of amazing things but also a lot of legacy UI from VS Code. With the web UI, we wanted to be very intentional about keeping it minimal and exposing the right set of primitives, app surfaces, we call them, that are shared features of the cloud environment that you and the agent both use. The agent uses the desktop and controls it; I can use the desktop and control it. The agent runs terminal commands; I can run terminal commands. So that's our philosophy around it.
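The port-forwarding mechanic described earlier (localhost:4000 on your machine relayed to localhost:4000 in the VM, ngrok-style) boils down to a TCP relay. Here is a minimal sketch, not Cursor's implementation, using only the standard library:

```python
import socket
import threading

def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src closes."""
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        dst.close()

def forward(listen_port: int, target_host: str, target_port: int) -> socket.socket:
    """Listen locally and relay every connection to target_host:target_port,
    the way an ngrok-style tunnel or `ssh -L` forwards a local port into a VM.
    Returns the listening socket (pass port 0 to let the OS pick one)."""
    server = socket.create_server(("127.0.0.1", listen_port))

    def accept_loop() -> None:
        while True:
            client, _ = server.accept()
            upstream = socket.create_connection((target_host, target_port))
            # one relay thread per direction
            threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return server
```

A real tunnel adds authentication, TLS, and multiplexing over a single outbound connection, but the byte-relay core is the same.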
The other thing we unshipped that's maybe interesting to talk about, and both of these we may reship if we change our minds on the trade-offs or get them to the right point, is
>> Get it out there, let users tell you they want it, and all right, fine.
>> So one of the other things is actually a files app. At one point during internal testing, next to where I had the desktop and terminal on the right-hand side of the tab earlier, you could also have a files app where you could see and edit files.
>> And we actually felt that by restricting and limiting what you could do there, people would naturally leave more to the agent and fall into this new pattern of delegating, which we thought was really valuable. So there's currently no way in Cursor Web to edit these files.
>> Yeah. Except you open up the PR, go to GitHub, and do the thing there.
>> Yeah. Which is annoying.
>> Just tell the agent. I have criticized OpenAI for this, because OpenAI's Codex app doesn't have a file editor. It has a file viewer, but it isn't a file editor.
>> Do you use the file viewer a lot?
>> No, I understand. But sometimes I want it, and the only way is to leave the app. Well, they have an open-in-Cursor button, or open in Antigravity, or open in whatever. And people pointed that out. I was part of the early testers group, and people pointed it out and said, this is a design smell: you actually want a VS Code fork that has all these things but also a file editor. And they were like, no, just trust us.
>> Yeah. I think Cursor as a product will want to offer the whole spectrum: you want to be able to work at really high levels of abstraction and double-click down to see the lowest level. That's important. But I also think you won't be doing that in Slack. So there are surfaces and ways of interacting where, in some cases, limiting the UX capabilities makes for a cleaner, simpler experience and drives people into these new patterns. Even locally, we kicked off joking about this, people don't really edit files and hand-code anymore. And so we want to build for where that's going, not where it's been.
>> A lot of cool stuff. Okay, I have a couple more observations about the design elements of these things. One of the things I'm always thinking about is: Cursor and its peers start from the dev tools and work their way towards cloud agents. Other people, the Lovables and Bolts of the world, start with the vibe-code full cloud thing. They were cloud agents before anyone else, and they give you the full deploy platform. They own the whole loop, they own all the infrastructure, they have the logs, they have the live site, and you can do that full cycle. Cursor doesn't own that cycle even today, right? You don't have the Vercel, you don't have the deploy infrastructure. Which gives you powers, because anyone can use it, any enterprise, whatever infra you're on, I don't care, but it also gives you limitations on how much you can fully debug end to end. I guess I'm just putting out there: is there a future with a full-stack Cursor, like cursorapps.com, where I host my Cursor site, which is basically a Vercel clone? I don't know.
>> I think that's an interesting question to be asking. And the logic you laid out for how you would get there is logic I largely agree with.
>> Yeah. Yeah.
>> I think right now we're really focused on what we see as the next big bottleneck. And because things like the Datadog MCP exist, I don't think the best way we can help our customers ship more software is by building a hosting solution right now.
>> By the way, these are things I've actually discussed with some of the companies I just named.
>> Yeah. Yeah. Sure.
>> Right now, the big bottleneck is just getting the code out there.
And also, unlike a Lovable or a Bolt, we focus much more on existing software, and the zero-to-one greenfield case is just a very different problem. Imagine going to a Shopify and convincing them to deploy on your deployment solution. That's very different, and I think it will take much longer to see how that works. It may never happen, relative to, oh, it's a zero-to-one app.
>> I'll say it's kind of tempting, because look, 50% of your apps are Vercel, Supabase, Tailwind, React. It's the stack, it's what everyone does. So it's kind of interesting.
>> Yeah. The other thing is the model picker. Right now in cloud agents it's stuck down in the bottom left. Sure, it's Codex High today, but do I care if it suddenly switched to Opus? Probably not.
>> We definitely want to give people a choice across models, because the meta changes very frequently. I was a big Opus 4.5 maximalist, and when Codex 5.3 came out I hard-switched, so that's all I use now.
>> So yeah, agreed. Basically, when I use it in Slack, Cursor does a very good job of exposing, here's the model we're using, here's how you switch if you want, but otherwise it's kind of abstracted away, which is beautiful, because then you should decide.
>> Yeah, I think we want to be doing more
with defaults, where we can suggest things to people. A thing we have in the editor, the desktop app, is Auto, which will route your request, and I think we'll want to do something like that for cloud agents as well. We haven't done it yet. We have both people like Sam, who are very savvy and know exactly what model they want, and people who want us to pick the best model for them, because we have amazing people like Sam and, you know, we are the experts. We have both the traffic and the internal taste and experience to know what we think is best.
>> Yeah. So I have this ongoing thesis of agent lab versus model lab, and to me Cursor and other companies are examples of agent labs, building a new playbook that's different from a model lab, which is very GPU-heavy, although Cursor obviously has a research team too. My thesis is that every agent lab is going to have a router, because you're going to be asked. I don't keep up every day, I'm not a Sam, I'm not the arbiter of taste. Put me on Cursor Auto. Is it free? It's not free?
>> Auto is not free, but there are different pricing tiers. Yeah.
>> Put me on Cursor Auto. You decide for me, based on all the other people; you know better than me. And I think every agent lab should basically end up doing this, because it actually gives you extra power: people stop caring about or having loyalty to any one lab.
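An auto-router of the kind described here can be sketched as a simple utility-maximizing default. This is a hypothetical illustration, not Cursor Auto's actual logic; `pick_model`, the catalog entries, and `estimate_quality` are all invented names:

```python
def pick_model(task_type: str, catalog: dict, estimate_quality) -> str:
    """Auto-routing sketch: choose a model per request from aggregate
    quality estimates (e.g. derived from eval traffic) traded off against
    cost, so users who don't track the model meta still get a sensible
    default."""
    def utility(model: str) -> float:
        return estimate_quality(model, task_type) - catalog[model]["cost_weight"]
    return max(catalog, key=utility)
```

In practice the quality term would come from the provider's traffic and internal evals, which is exactly the data advantage an agent lab has over any individual user.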
>> Yeah. Two other maybe interesting things, I don't know how much they're on your radar: one is the best-of-N thing we mentioned, where running different models head-to-head is actually quite interesting.
>> Which exists in Cursor.
>> That exists in the Cursor IDE and web. The problem is where you run them. I can share my screen again if that's interesting.
>> Yeah, yeah. Obviously parallel agents are very popular.
>> Yes, exactly, parallel agents.
>> In your mind, are best-of-N and parallel agents the same thing? I don't want to put words in your mouth.
>> Best-of-N is a subset of parallel agents where they're running on the same prompt. That would be my answer. So this is what that looks like. Here in this dropdown picker, I can just select multiple models. Yeah.
>> And now if I do a prompt, I'm going to do something kind of silly.
>> I am running uh these five models.
>> Okay. This is a straight clone of Cursor 2.0. Yeah.
>> Yes, exactly. But they are running, so in Cursor 2.0 you can do desktop or cloud, and this is cloud specifically, where the benefit over worktrees is that the agents have their own VMs and can run commands and won't try to kill ports that another one is using, which are some of the pains.
>> These are all cloud worktrees?
>> No, these are all cloud agents with their own VMs.
>> Okay.
>> But when you do it locally, sometimes people do worktrees, and that's been the main way people have set up parallel agents.
>> I've got to say, that's so confusing for folks.
>> Yeah. No one knows what worktrees are.
>> Exactly. I think we're phasing out worktrees.
>> Really?
>> Yeah.
>> Okay. But one other thing I would say on the multi-model choice. This is another experiment we ran last year and didn't decide to ship at that time, but may come back to, and there was an interesting learning that's relevant for these different model providers. It was something that would run a bunch of best-of-Ns but then synthesize: basically a synthesizer layer of agents, like an LLM judge, but one that was also agentic and could write code. So it wasn't just picking; it was also taking the learnings from the two or more models it was looking at and writing a new diff. And what we found, at the time at least, was that there were strengths to using models from different model providers as the base level of this process. You could get an almost synergistic output that was better than having a very unified bottom model tier. So it's really interesting, because even in the future, when maybe one model is ahead of the others for a little bit, there could be some benefit to having multiple top-tier models involved in whatever model swarm or agent swarm you're doing, since they each have strengths and weaknesses. Yeah.
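The best-of-N-plus-synthesizer loop just described can be sketched as follows. This is a minimal stand-in, not Cursor's experiment; the worker and `synthesize` callables are hypothetical placeholders for real model calls:

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, workers, synthesize):
    """Run several base models on the same prompt in parallel, then hand
    all drafts to an agentic synthesizer that can write a *new* output
    combining their learnings -- not just pick a winner the way a plain
    LLM judge would."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        drafts = list(pool.map(lambda worker: worker(prompt), workers))
    return synthesize(prompt, drafts)
```

The cross-provider observation maps onto the `workers` list: drawing the base drafts from different model families is where the reported synergy came from, since the synthesizer has more diverse material to merge.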
>> Andrej called this a council, right?
>> Yeah, exactly. Oh, that's another internal command we have that Ian wrote, /council, which will do some of this.
>> Yeah. I mean, this idea is in various forms everywhere, and for me, the productization of it you guys have done, this is very flexible. But if I were to add another layer on top of it, it would be too much, I don't want that, let's say.
>> Well, ideally it's something the user can just choose, and it all happens under the hood in a way where you just get the benefit of that process at the end, a better output basically, without having to get lost in the complexity of the judging along the way.
>> Okay, another interesting thing on the many-agents and parallel-agents front is an idea that's been around for a while but has started working recently: sub-agents. It's one other way to get agents with different prompts, different goals, different models, different vintages to work together, collaborate, and delegate.
>> Yeah. I'm always looking for, this is the year of the blah, right? And I think one of the candidates for the blah is sub-agents. But I haven't used them in Cursor. Are they fully formed? I almost need an intro, because: do I form them anew every time? Do I have fixed sub-agents? How are they different from slash commands? There are all these really basic questions that no one stops to answer for people, because everyone's too busy launching.
>> Honestly, you can see them in Cursor now if you just say, spin up 50 sub-agents to do something.
>> Cursor defines what the sub-agents are, you know.
>> Yeah. So basically, I shouldn't speak for the whole sub-agents team, this is a different team that's been working on this, but the thesis, the thing we saw internally, is that they're great for context management, for long-running threads, or if you're trying to just throw more compute at something.
We've mostly used almost a generic task interface, where the main agent can define what goes into the sub-agent. So if I say, explore my codebase, it might decide to spin up an explore sub-agent, or it might decide to spin up five explore sub-agents.
>> Right, but I don't get to set what those sub-agents are, right? It's all defined by the main model.
>> I'd actually have to refresh myself on the sub-agent details.
>> There are some built-in ones; the explore sub-agent is pre-built. But you can also instruct the model to use other sub-agents, and then it will. And one other example of a built-in sub-agent is
>> I actually just kicked one off in Cursor and can show you what that looks like.
>> Yes.
>> Cuz I tried to do this in pure prompt space.
>> So this is the desktop app, and
>> That's all you need to do, right?
>> Yeah, that's all you need to do. I said, use a sub-agent to explore, and I can even click in and see what the sub-agent is working on here. It ran some find command, and this is Composer under the hood: even though my main model is Opus, it does smart routing, because in this instance the explore requires reading a ton of things, so a faster model is really useful to get an answer quickly. But this is what sub-agents look like, and I think we want to do a lot more to expose hooks and ways for people to configure
these. Another example of a built-in sub-agent is the computer-use sub-agent in the cloud agent, where we found that those trajectories can be long and obviously involve a lot of images and the execution of some testing or verification task. We want to use models that are particularly good at that. So that's one reason to use sub-agents. The other reason is that we want context to be summarized, reduced down, at the sub-agent level. That's a really neat boundary at which to compress that rollout and testing into a final message that the sub-agent writes, which then gets passed to the parent, rather than having to do some global compaction or something like that.
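That compression-at-the-boundary idea can be sketched in a few lines. This is a toy, not Cursor's sub-agent implementation; `agent_step` and `summarize` are hypothetical callbacks standing in for real model calls:

```python
def run_subagent(task, agent_step, summarize, max_steps: int = 20) -> str:
    """Run an isolated agent loop and return only a compressed final
    message. The parent never sees the sub-agent's full trajectory, so
    long explore/verify rollouts don't blow up the parent's context."""
    trajectory = []
    for _ in range(max_steps):
        action, done = agent_step(task, trajectory)
        trajectory.append(action)
        if done:
            break
    # the sub-agent boundary is where compression happens:
    # the parent receives one summary string, not the whole trajectory
    return summarize(task, trajectory)
```

Compared with global compaction, the summary is written by the agent that actually did the work, at the moment it finishes, which is a much more natural place to decide what matters.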
>> Awesome. Cool. While we're in this sub-agent conversation, I can't do a Cursor conversation and not talk about the Wilson stuff. What is that?
>> What is he up to?
>> He built a browser. He built an OS.
>> Yes. He experimented with a lot of different architectures and basically ended up reinventing the software engineering org chart, which is all cool, but what's your take? Are there any behind-the-scenes stories about that whole adventure?
>> Some of those experiments have found their way into a feature that's available in cloud agents now: the long-running agent mode, internally we call it grind mode. There's a hint of grind mode accessible in the picker today, because you can choose "grind until done." That was really the result of experiments Wilson started in this vein. I think the Ralph Wiggum loop was floating around at the time, but it was something he also independently found and was experimenting with, and that's what led to this product surface.
>> And it's the simple idea of, have criteria for completion and do not stop until you complete. There's a bit more complexity in our implementation. You have to start out by aligning: there's a planning stage where it will work with you, and it won't start grind execution mode until it's decided the plan is amenable to both of you, basically.
>> Like, I refuse to work until you make me happy.
>> We found that's really important, because people would give a very underspecified prompt and then expect it to come back with magic. If it's going to go off and work for 3 minutes, that's one thing. When it's going to go off and work for 3 days, you probably should spend a few hours up front making sure you've communicated what you actually want.
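The loop structure being described, align first, then grind until explicit criteria pass, can be sketched as follows. This is a toy illustration, not Cursor's grind mode; all the callables are hypothetical stand-ins:

```python
def grind(goal, plan_with_user, step, is_done, max_iters: int = 10_000):
    """Grind-mode sketch: align on a plan first, then loop until explicit
    completion criteria pass (a Ralph-Wiggum-style 'do not stop until
    done' loop), instead of returning after a single agent turn."""
    plan = plan_with_user(goal)          # alignment stage: worth hours for a 3-day run
    state = {"goal": goal, "plan": plan, "log": []}
    for _ in range(max_iters):
        state = step(state)
        if is_done(state):               # explicit criteria, not vibes
            return state
    raise RuntimeError("iteration cap reached before completion criteria passed")
```

The two load-bearing parts are the up-front `plan_with_user` gate, which is what makes a multi-day run worth starting, and `is_done` being a checkable predicate rather than the model's own judgment.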
>> Yeah. And just to really drive home the point, we really mean three days. No human in the loop.
>> I don't know what the record is, but there have been some very long grinds.
>> Yeah. Yeah.
>> And so the thing that's available in Cursor, the long-running agent, if you want to think about it very abstractly, is one worker node, whereas what built the browser is a society of workers and planners, different agents collaborating. We started building the browser with one worker node; at the time that was just the agent, and it became one worker node among many when we realized the throughput of the system was not where it needed to be to get something as large-scale as the browser done.
>> Yeah.
>> And so this has also become a really big mental model for us with cloud agents: there are the classic engineering latency-throughput trade-offs, and the code is water flowing through a pipe. We think that over the coming months, the big unlock is not going to be one person with a model getting more done, the water flowing faster; we'll be making the pipe much wider, and so parallelizing more. Whether that's swarms of agents or parallel agents, both contribute to getting much more done in the same amount of time, even though any one task doesn't necessarily get done more quickly. Throughput is the really big thing. If you see a system of a hundred concurrent agents outputting thousands of tokens a second, you can't go back; you see a glimpse of the future. Obviously there are many, many caveats, no one is using this browser IRL, there are a bunch of things not quite right yet, but we are going to get to systems that produce real production code at that scale much sooner than people think. And it forces you to think about what even happens to production systems. We've broken our GitHub Actions recently because we have so many agents producing and pushing code that CI/CD is just overloaded. Effectively, Cursor is growing very quickly anyway, but you grow headcount 10x when people run 10x as many agents, and so a lot of these systems will need to adapt.
>> It also reminds me: the three of us live in the app layer, but if you talk to the researchers doing RL infrastructure, it's the same thing: all these parallel rollouts, scheduling them, making sure as much throughput as possible goes through them. It's the same thing.
>> We were talking briefly before we started recording, and you were mentioning memory chips and some of the shortages there. The other thing that's just hard to wrap your head around is the scale of the system that was building the browser, the concurrency there. If Sam and I both have a system like that running for us, shipping our software, the amount of inference we're going to need per developer is really mind-boggling. Sometimes when I think about that, I think even the most optimistic projections for what we're going to need in terms of buildout are underestimating the extent to which these swarm systems can, at scale, churn out code that is valuable to the economy.
>> You can cut this if it's sensitive, but do you have estimates
of how much your token consumption is, like, per developer? Or for yourself? I don't need the company average.
>> I feel like for a while I wasn't an admin on the usage dashboard, so I wasn't able to actually see, but
>> Mine has gone up.
>> Oh yeah, mine has gone up too. In terms of how much work I'm doing, I have no worries about developers losing their jobs, at least in the near term.
>> Well, I mean, that's a broader discussion.
>> Oh gosh. Yeah, you went there. I wasn't going there. I was just asking how much more you're using.
>> There's so much stuff to be built, and I feel like I have more ambitions than I did before, personally. So I can't speak to the broader thing, but for me, I'm busier than ever before. I'm using more tokens and I am also doing more things.
>> Yeah.
>> Yeah, I don't have the stats for myself, but broadly, a thing we've seen that we expect to continue is Jevons paradox, where
>> You can't do the podcast without saying it.
>> Exactly. We've done it now, we can wrap, we've said the words. You know, phase one, tab autocomplete, people paid like 20 bucks a month and that was great. Phase two, where you were iterating with these local models, today people pay hundreds of dollars a month. As we think about these highly parallel agents running off for a long time in their own VM systems, we're already at the point where people will be spending thousands of dollars a month per human, and I think potentially tens of thousands and beyond. It's not that we're greedy to capture more money; what happens is individuals get that much more leverage. If one person can do as much as 10 people, the tool that allows them to do that is going to be tremendously valuable and worth investing in, worth taking the best thing that exists.
>> One more question on
that exists. One more question on on just the cursor in general and then uh you know open-ended for you guys to plug whatever you want to plug. How is cursor hiring these days?
>> What do you mean by how?
>> Uh, so obviously LeetCode is dead.
>> Oh okay.
>> Um, everyone says work trial. Different people have different levels of adoption of agents. Some people can really adopt them and be much more productive, but other people you just need to give a little bit of time, and sometimes they've never lived in a token-rich place like Cursor. Mhm.
>> And once you live in a token-rich place, you just work differently. You need to have done that. And a lot of people... Anyway, this is kind of open-ended: how has agentic engineering, agentic coding changed your opinions on hiring? Are there any broad insights?
>> Yeah. Um basically I'm asking this for other people, right?
>> Yeah, totally. Totally. To hear Sam's opinion, we haven't talked about this, the two of us. I think that we don't see necessarily being great at the latest thing in AI coding as a prerequisite. I do think that it's a sign that people are keeping up and curious and willing to upskill themselves on what's happening, because as we were talking about, the last 3 months the game has completely changed. What I do all day is very different.
>> Like it's kind of my job and I can't...
>> Yeah. Yeah. Totally. I do think that still, as Sam was saying, the fundamentals remain important in the current age: being able to go and double-click down. Models today do still have weaknesses, where if you let them run for too long without cleaning up and refactoring, the code will get kind of sloppy and there'll be bad abstractions. So you still need humans that have built systems before, know good patterns when they see them, and know where to steer things.
>> Yeah, I would agree with that. I would say Cursor also operates very quickly, and leveraging agentic engineering is probably one reason why that's possible in this current moment. In the past it was just people coding quickly, and now there are people who use agents to move faster as well. So part of our process will always select for that ability to make good decisions quickly and move well in this environment. And I think being able to figure out how to use agents to help you do that is an important part of it, too.
>> Yeah. Okay. Uh, the fork in the road, either predictions for the end of the year, if you have any, or plugs.
Predictions are not going to go well.
>> I know it's hard.
>> They're so hard.
>> It's okay to get it wrong. Sorry. Yeah.
>> Well, one other plug that may be interesting, that I feel like we touched on but haven't talked a ton about: a thing these new interfaces and this parallelism enable is the ability to hop back and forth between threads really, really quickly.
>> You want to show something?
>> Yeah, I can show something. A thing that we have felt with local agents is this pain around context switching. You have one agent that went off and did some work, and another agent that did something else. And so here, by having... you know, I just have three tabs open, let's say, but I can very quickly hop in here. This is an example I showed earlier, but the actual workflow here, I think, is really different in a way that may not be obvious. I start the morning, I kick off 10 agents or something, the first one of them finishes, I come in, watch the video. Either it's kind of close, and so I might send a follow-up, I might say, "Hey, make it red," or I might hop into the desktop and try it out. And within, you know, 90 to 120 seconds, I've kicked this one back off and either started the merge process (CI is running now and I'll come back to it later) or it's off with some additional follow-up information. And then I can hop into the next one. And in the next one, I hop in and I'm like, okay, this looks kind of interesting; actually, try it out for real in the app. I want to see it in action, not just in the gallery. So I can kick that off and the agent will go and work on that, because maybe I wanted to try out what the button looks like in the actual thing. And then here I might hop in as well and check the video, or do something. So you're really parallelizing much more: follow up here, check in there. It's much more this higher level of abstraction, and having the different desktops where you can hop back and forth. You're not like, "Oh, I checked out this branch; where was that worktree again?" It's really solving for that, which we've ourselves struggled with in Cursor with these local agents: where was that diff again? It's lost in some worktree, never going to find it. Oh, my local thing is rebuilding. Oh, just make another one. That's what you end up with, and then you wait five more minutes for it to run. So this is really a new way of parallelizing that we found to be really fun, honestly, where you're just hopping in and injecting taste. You're like, "That doesn't quite feel right," or "Actually, this is not architected quite right," but you're just focusing on those interesting questions of taste.
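The morning routine described here — kick off a batch of agents, then review whichever finishes first — can be sketched as a small fan-out loop. This is purely illustrative: `run_agent` is a stand-in stub, not Cursor's API, and the task names are invented.

```python
import asyncio

async def run_agent(task: str, seconds: float) -> str:
    """Stand-in for a cloud agent: pretends to work, returns a diff summary."""
    await asyncio.sleep(seconds)
    return f"diff for {task!r}"

async def morning_session(tasks: dict[str, float]) -> list[str]:
    """Kick off every agent in parallel, then review each one the moment
    it finishes, instead of blocking on any single thread."""
    running = [asyncio.create_task(run_agent(name, t)) for name, t in tasks.items()]
    reviewed = []
    for fut in asyncio.as_completed(running):
        # Review step: watch the demo video, send a follow-up ("make it
        # red"), or start the merge. Here we just record the order.
        reviewed.append(await fut)
    return reviewed

if __name__ == "__main__":
    order = asyncio.run(morning_session(
        {"make-it-red": 0.05, "fix-login": 0.01, "add-gallery": 0.09}
    ))
    print(order)  # fastest agent surfaces first
```

The point is the shape of the loop: the human's attention goes to whichever thread is ready, rather than to one agent at a time.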
>> And for me, the cloud ecosystem also made this something that adds productivity to my dead time, like commuting or overnight, since I don't have to leave my computer open.
>> There's no Cursor... is there a Cursor mobile app?
>> If there is, I'm not sure it's the current thing. I use it on my phone all the time, just on the web. So, pretty good experience there for checking in and unblocking. I mean, you can see the videos and stuff in the web app, which is awesome.
>> Yeah.
>> Yeah. Well, I mean, I think this is one where the ADD will inherit the earth: if your attention span is cooked, you still can manage. Actually, this is good for you.
>> Yeah. Yeah. But also, I think this is where the coding tools start coming into conflict with the productivity tools, like Linear, right, the kanban boards. Because what you have there is cool, but you know what you actually need? A kanban board, which people have built; like, Vibe Kanban is out there, open source. I'm sure you guys have talked about it. But they'll start to conflict, because actually the code doesn't matter anymore; it's the process of the human interacting and checking in and, like, getting the World of Warcraft, uh, orc peon to go "work work" or whatever, like "job done." I don't know, it's an interesting future-of-productivity thing.
>> Yeah.
>> I also think another big theme: last year was called the year of coding agents. This year, coding agents spill over into the real world, into Claude Cowork and all the other stuff. Yeah, I'm sure Cursor is going to focus on software, but, let's call it, OpenClaw is extremely mind-expanding, in terms of, like, I did not know that could happen.
>> Yeah. And it's all coding-agent based, right? You know.
>> Totally.
>> And I think one of the things that's interesting with, you know, friends and family that are not in the software world is, speaking of predictions, I do think we are going to start to see other industries go through what software development has started going through. By virtue of how good models are at writing software, and of how early-adopter the people building the new technology are, trying it out and applying it to themselves, certain kinds of shifts will happen to other industries too. And there's a lot to be learned from how that's gone down, and is continuing to go down, in software: all the interesting questions about at what point people get more leverage, when you start changing the role to become much more generalist. All of these questions that we've seen some data on, but will see a lot more of in the coming months, will happen everywhere. Any parting thoughts, any plugs of your own?
>> Not really. Good, we covered so much. Coming up with a prediction: I mean, I just think agents are going to keep getting better. I'm going to stop doing as much manual coding. Probably zero lines of code written by myself in the whole month of December this year; 100% agents is a personal prediction.
>> Oh, you're not at zero today?
>> Um...
>> What, in what cases?
>> I think honestly it's like 1%: if I just get frustrated and I don't want to go tell an agent to change this one thing. And prompting: sometimes when I'm working on prompts I still go in and manually edit, because it's such bare intent transfer, telling the agent what I want. It's like writing an essay, where I don't use agents to write essays yet, because the process of writing is the thinking.
>> I still can't stand AI-generated writing. So yeah, I also can't have the agent write prompts.
>> So no DSPy, no GEPA, nothing like that here. We have some internal tooling around some of the prompt-optimization things, but there's a fair amount of just: what concepts do I need to communicate to the model?
>> Another thing I'm also looking for is voice. I noticed that you didn't use your voice to code. Even OpenAI, when we do podcasts with them, they don't use their voice, and I'm like, well, at some point this gets good and you can stop typing.
>> We have some people who like that a lot internally, and I think we'll be experimenting in that space too, for sure. Do you use voice a lot?
>> Not a lot. Sometimes. I mean, that's bound to my caps lock, right? So I can press it. I just...
>> And when you use it, do you want it to talk back, or do you just want...
>> Yeah.
>> Just dump it in.
>> Yeah. Yeah.
>> But the brain dump is good. Yeah. Because you can interrupt yourself, you can go on a tangent, whatever. It just captures everything, and you slop it into the LLM. It's fine.
>> Yeah. The way that we did this with Autotab was people would record full-screen recordings with audio to teach the model how to do a task. And one of the funny things we learned was people would use their Siri voice, where they would start talking in short, stilted sentences and enunciate really clearly, because the last time they used AI was, like, 2 years ago, when you had to.
>> Apple has damaged an entire generation of people's expectations.
>> Exactly. And we had to be like, no... I mean, you're very native, so you do this, but: just dump everything in. You can say "um," you can repeat yourself, you can contradict yourself. The models are smart enough to figure it out.
>> But it's still very bad. Voice coding was always, I considered, the hardest part, because you have to say technical things where spelling matters, capitalization matters, and it's all not in voice. So we'll see. So far it's been more sort of emotional companionship, that kind of stuff, but at some point it's going to hit voice coding.
>> Yeah. Um, I have a prediction for you.
>> Yeah?
>> I predict... I think it will take longer than people think, and longer than we think, for cloud agents working in their own boxes to surpass local agents. But I think that crossover will happen before the end of the year, and probably by the end of the year agents running in the cloud will be more than 2x the volume of local agents.
>> Okay, you're leaving me an opening.
What's not good today?
>> Yeah, there's a bunch of hard things. One of them is just getting those sandboxes to be really, really good. And a thing that was part of this launch that we spent an inordinate amount of time on is cursor.com/onboard, where you pick a repo, add secrets, give it access to things, and the agent just goes off and installs things.
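As a rough sketch of what such an onboarding record has to capture — environment setup plus the quirky, repo-specific commands discussed below — here is a hypothetical per-repo structure. Every field name and command is invented for illustration; this is not Cursor's actual format.

```python
from dataclasses import dataclass, field

@dataclass
class RepoOnboarding:
    """Hypothetical per-repo onboarding record for a cloud agent's VM."""
    repo: str
    secrets: list[str] = field(default_factory=list)      # secret names only, never values
    setup: list[str] = field(default_factory=list)        # run once per fresh sandbox
    quirks: dict[str, str] = field(default_factory=dict)  # task -> exact command

    def command_for(self, task: str) -> str:
        """Prefer the repo's recorded quirk over a generic default."""
        defaults = {"dev": "npm run watch", "test": "npm test"}
        return self.quirks.get(task, defaults.get(task, ""))

onboarding = RepoOnboarding(
    repo="acme/web",
    secrets=["DATABASE_URL"],
    setup=["npm ci", "cp .env.example .env"],
    quirks={"dev": "make dev PORT=4000"},  # the quirk the agent must learn
)
print(onboarding.command_for("dev"))   # the repo-specific quirk wins
print(onboarding.command_for("test"))  # falls back to the generic default
```

The design point is the `quirks` map: defaults like `npm run watch` cover the common case, but the value of onboarding is precisely the commands that deviate from the defaults.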
>> Yes, I think that whole thing was my favorite.
>> Yeah, we worked a lot on that. Sam and I in particular spent a lot of late nights making that good. But there's still a lot to do there, right? Like setup: one, maybe it's too bad; two, it's too slow. We're working on it. And setup is not a unitary thing where everything is set up or not, right? Things will break over time, you have new dependencies, you need access to new systems, you change where your database lives. So that's one part of it. And the other part is having these agents run in the cloud and be more autonomous: we've really started to see the lack of memory. And Sam, as someone who's thought about this, once you start getting the model operating the codebase, there are more particularities. It's not just a read-file tool: it needs to know how do I start up the backend, how do I check the status of the backend. That's very particular to your codebase, and even if it's great at npm run watch or whatever the default things are, there are always quirks; everyone has quirks, and getting the model good at those things will require more work. We're working on that, but we think that will be one of the big unlocks: having them be onboarded not only in terms of their environment but also in terms of their understanding of design trade-offs, how the codebase works, how to be a good developer in any one codebase.
>> It's Cursor rules. Is it going to be something else? Is it going to be a file? Do we just call the markdown file a different name?
>> I don't know. I mean, one thing that we learned this year, being Cursor the company: there's a really great blog post that Jai and other people on the agent quality team put out about dynamic file context.
>> Is that your team, or is there a different team?
>> Different team, yeah. They were working on, basically, everything-is-file-system, and so a lot of my thinking personally on memory this past year has changed to be more aligned with that: giving the agent pointers to things, annotations to things. The second thing I've started to think differently about is that memory is a subset of agent self-auditability and self-awareness. So basically, the agent might want to propose annotations or links or memory files to itself when it finds that there's some gap in its functionality, in its own harness, that needs to be filled by some piece of information on a semi-permanent basis. But there's a whole bunch of other things that are a side effect of self-auditability that are really interesting: potentially finding conflicting instructions, or skills and rules that might be, eh, kind of bugging each other, and also things like fixing DevX problems that it runs into. I think the dynamic file system stuff is probably very promising for memory, and there's also this notion of needing the agent to be a little bit more self-aware, in terms of being able to identify gaps in its own functionality and decide how to fill them.
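A minimal sketch of that "memory as self-audit" idea — an agent proposing annotation files to itself and flagging rules that contradict each other. The file paths and the `always .../never ...` rule format are invented for illustration; nothing here reflects Cursor's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Hypothetical agent memory: pointer-style annotations plus rules."""
    notes: dict[str, str] = field(default_factory=dict)  # file path -> annotation
    rules: list[str] = field(default_factory=list)

    def propose_note(self, path: str, gap: str) -> None:
        """Record an annotation when the agent finds a gap in its own
        harness (e.g. 'backend starts via make dev')."""
        self.notes[path] = gap

    def conflicting_rules(self) -> list[tuple[str, str]]:
        """Naive self-audit pass: flag pairs where one rule negates the
        other, e.g. 'always run tests' vs 'never run tests'."""
        return [
            (a, b)
            for i, a in enumerate(self.rules)
            for b in self.rules[i + 1:]
            if a.startswith("always ") and b == f"never {a.removeprefix('always ')}"
        ]
```

The audit here is deliberately trivial; the interesting part is the loop it implies: the same agent that uses the memory also inspects and repairs it.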
>> That's such a good point. Self-awareness broadly has been a really big thing that I think Sam has pushed us to do more and more of: the agent should understand how its environment works, it should understand how secrets work. It needs to be self-aware about its own harness and its environment.
>> And you think this is not inherent in the model? You have to do...
>> Well, specifics, right? If it's running in Cursor versus some other sandbox, that's a bit different. And then the other part that starts to get really interesting is when the model starts editing its own system prompt.
>> Yeah.
>> What does that even mean? How do you do that safely?
>> This is just research, right? This isn't...
>> I think it will do that. Yeah, it will manage its own context.
>> And the system prompt is part of the context, and you can argue about...
>> Yeah, like other things that it might decide to turn off or on, depending.
>> I mean, self-awareness to us in this context is not the model itself having a notion of consciousness, but more knowing what system it's operating in and the constraints of that system, and potentially having agency in optimizing itself to operate best in that system. This was one of the first things I learned at Dot when we launched: we had made the model, or the agent, or whatever we would call it at that time (it was far less agentic), make the product work very well at a certain number of things, but it didn't have complete self-awareness of its own boundaries. So people would be like, hey, can you do this thing? And the thing was there and could be done, and the product would be like, "Oh, no." And I'd be like, but you can! That was one of the earliest things I found.
>> Just believe in yourself.
>> I know. As a product developer: it needs both to be able to do the thing and to have complete knowledge of its ability to do the thing. Those are not obviously the same part of the prompt at all.
>> Yeah. Yeah,
>> It's something that I think has continued to be a theme in the ecosystem: users will often attribute increased intelligence to a system that is more highly self-aware and more able to manipulate itself to do well in its system, if that makes sense.
>> Yeah, this is more abstract than I ever thought we'd get in this discussion.
>> Is this the kind of conversation that you have...
>> We talk about this stuff all the time.
>> ...about improving agents in general?
>> Yeah, I think to your point about the agent layer: thinking a lot about the models and the harness and the product, and the affordances that fall from that.
>> I mean, you guys are my sort of leading example of what an agent lab looks like and how it can be successful, and I think people are always hungry for insights into how you guys operate. So, thank you for taking the time to share.
Yeah, >> thanks for coming.
>> Yeah, thank you.