Meet Droids: The Agent That Outperforms GPT-5
By Ray Fernando
Summary
Ray Fernando interviews Eno Reyes, co-founder of Factory AI, about Droid, the company's CLI coding agent. They discuss how Factory scores 58% on Terminal-Bench with Anthropic's own models (versus about 43% for Anthropic's own harness), the scaffold-level techniques behind that (prompt caching, append-only agent loops, model-matched edit formats), specification mode, agent-ready codebases, Factory's enterprise-backed pricing philosophy, and the founding story.
Topics Covered
- Why Droid extracts 58% from Claude on Terminal-Bench when Anthropic gets about 43%
- Techniques you can steal for any AI coding agent
- What's missing from the current ecosystem of AI coding tools
- Do you really need another CLI coding tool?
- Factory AI's business philosophy and founding story
Full Transcript
All right, so here's what just happened.
A small team in San Francisco just beat OpenAI, Anthropic, and Google at their own game. They built an AI agent that basically outperforms GPT-5 and Claude even when using those exact same models. And they just raised $50 million because companies like MongoDB and Zapier are seeing 31-times-faster development with their tool. And yesterday I tested Droid live for the first time, and honestly it did things I haven't seen any other agent pull off: the planning, the error handling, the execution of the code. To me it feels like a whole other paradigm of AI coding. And if you've been coding with me for a little while, especially since the very beginning, you've seen the evolution of agents like Cursor and Windsurf coming onto the scene, taking over, and helping assist with our AI coding. So I want to introduce today's guest: Eno Reyes from Factory AI, a technical founder who uses droids to write code daily. And here's
what we're going to cover in the next 45 minutes: why they can extract 58% performance from Claude on the Terminal-Bench benchmark when Anthropic themselves can only get about 43%; the specific techniques you can steal for any AI coding agent tool you're using; and what's actually missing from the current ecosystem of AI coding tools. And we're also going to answer the question I know half of you are asking right now: do I really need another CLI coding tool?
So whether you're brand new to coding, just trying to build your first app, a seasoned dev who's been burned by AI's promises, or a designer who wants to ship without learning React, I think we're going to go through a lot of different things here. And if you haven't had a chance to, make sure you check out the link in the description on YouTube.
You're going to get 40 million tokens if you go ahead and sign up using my link.
It's RF.me/Rayactory.
And I'll have that pinned in the live stream right now on YouTube, so if you go ahead and click that link right up above, you'll be able to do that. So, thank you so much for joining the show. It's really a pleasure to have you on the stream. We're going to be talking about a lot of different things, as you may imagine. And I think the first thing is: you're beating OpenAI's agents and GPT-5. How is that even possible?
>> Yeah, no, thank you for having me. I'm really pumped to chat about this. You know, the general principle we go into everything with is that there are lots of unknowns with LLMs. And if you start to look at all the different things needed to solve these more complex benchmarks like Terminal-Bench, there's just such a long tail of things that are not related to the model's own capabilities. What I mean by that is: if you need to, as an example, run a command and then wait for it to finish while you're doing something else, that is a feature of a product, not something that is inherently LLM-bottlenecked. And there are hundreds of these behavioral patterns and configurations, really product-level decisions, that you have to bring into your agent, and they ultimately fall on the scaffold. And so what we found is
that a lot of benchmarks really help you explore and test all of these micro-decisions at a much faster rate than even user feedback can provide. Now, optimizing for a benchmark is never a good idea, because you're going to end up overfitting. But if you use benchmarks as a way to identify bugs, and to identify features or patterns that you actually have to build out, I think that allows you to unlock a lot more capability in the product. So if you think about what these tools were built for, right, they were built by OpenAI and Anthropic, referring specifically to Claude Code and Codex CLI,
first and foremost to help post-train the models with long-running agentic trajectories. That is their goal, right? They wanted to make the models good at being an agent. And one of the only places where you can run an LLM for many steps and get feedback right now is coding. So although those teams weren't necessarily thinking "this is 100% our end-all-be-all product," they were thinking "we really want to optimize for post-training and the research component." And I think a lot of what companies like us exist for is to say: well, what if we made this into the best possible product experience it could be?
>> And in terms of product experience, when people sign up right now, they're going to get those 40 million tokens as part of this whole sponsorship. For those who don't know, Factory AI is actually sponsoring this entire weekend, and we got to code live yesterday. If you haven't had a chance to, make sure you grab those 40 million tokens. I really appreciate you giving that out as a thank-you to the community, because people may have heard the name Factory and know you're involved in enterprise, but now you're reaching out for these specific use cases. So in terms of impact, how would you say people
should approach using the Droid CLI in their workflow? Because a lot of people are like me: I have my existing prompts, I have my existing slash commands, I've made so many investments in Claude Code and in OpenAI's new Codex with GPT-5. So I come in with all this pre-existing thinking. How should I approach it right out of the gate with this little box? What should I do, or what do you find are the best practices for people who have been migrating over?
>> Yeah, totally. We tried to make it really easy to adopt the standards that a lot of other tools are using. So, you know, we support AGENTS.md; we were actually working really closely with OpenAI when we launched that together with a bunch of other tools like Amp, and I think now it's very well supported. But for things like slash commands, you know, I'll be the first to say it: people are sometimes scared to say, "Look, that's a great idea, let's just implement it." We looked at all these tools, and there was a lot of good that already existed, so we said, well, let's make sure developers can have consistency in the product experience they might expect. All IDEs share a lot of these patterns, for example, and all project management tools share certain concepts, like Linear and Jira do. So we don't want to stray too far. Things like custom slash commands are implemented in the droid, and you can import them from your other tools: you just hit I and you import exactly from Claude or any other tool where you already have custom slash commands or custom subagents. We're even about to ship hooks, and we'll do the same thing for that.
>> Wow.
>> Yeah. So it should be pretty easy to transition and, honestly, to share those sorts of standards. And if you're just getting started, I think where you'll probably see the biggest difference in our product is at the maximum scope, the edge of what you think is too challenging for a coding agent. I would encourage you to actually start by saying, "Let me tackle that slightly more complex task," because we designed the system to be better at that sort of thing. And so I think you'll actually see the biggest delta by tackling a larger project and putting it in specification mode, which is basically when you hit Shift+Tab. It is a form of, not planning, but rather specification of what it is that you want to build. And that, I think, will give you the best intro experience.
>> Oh, that's really cool. I'm glad you say that, because I can't tell you how many tools I've used where, when I speak to the founding engineers, that's the first thing I tell them I do: I just throw my whole thing at it. Actually, somebody in my Discord, his name is Codex, went through and said, "I converted everything from Supabase over to Convex because I needed real-time operational syncing." He wrote this really amazing write-up, and he just says, "I threw everything at it all at once." And he actually did it in less than, I think, 20 million tokens or something. It was a huge migration with all these different points he wanted to hit. And
it's interesting what you're saying about that Shift+Tab, changing into spec mode, and how it's probably a little different from Claude's planning mode. I find Claude's planning mode good, but it's kind of small, and I've had to make my own documents to work around it, saying: here's how I actually want my spec files done, with these specific considerations. But
yeah, one of the impressive things I've seen is that I can just throw lots of ideas at it. Like yesterday on the live stream, all I did was say, "Hey, I want to add Polar, and here's some docs, here's some other things." And then I was like, "You know what? I've been waiting to refactor this large repo of mine with my transcription app, and I want to use the Vercel AI SDK, the latest version, v5. I don't even have it written for that. I want to have fallback for models. I have a route handler that's currently being used on Cloudflare, and I actually want to convert it all so I can use it on Convex, right?" So that's rewriting it. And then Convex has its own schemas, and it's a big file as far as all the rules and the reactivity of data go. So this is something I'd been putting off for a long time, because there are all these different steps, right, even with these latest AI coding tools. And so I just threw that into Droid
yesterday, and it churned and churned, and it was actually clarifying with me along the way. It didn't gaslight me with "You're absolutely right, here's your plan executed just like you said." And that's what I really appreciate about the Droid agent overall, and others are starting to see it too. It's like: oh, okay, I can throw a lot at it and it seems to pick things up, but it doesn't really get lost like other agents I've used. And I also find that it doesn't want to exit right away and say, "Hey, I'm done." It seems to be very smart about breaking the task into different phases, completing them, and checking in with you: "This is what I've got working so far. Can you verify some other things?" And
so in terms of context management, how are you doing this? It feels obvious now; everyone's been talking about context management as a buzzword, but where did this thinking come from, and how are you already so good? Because the Terminal-Bench score shows it. I want to show this real quick for people who haven't seen it yet. Basically, it's the Terminal-Bench accuracy by model. For Opus you're already at 58%, using their own model, which is insane; GPT-5, 52%; Sonnet, 50%. And that's Claude Code's own harness you're being compared against, and you're already eking past them. For me it's very clear that when I give it all these instructions, there's obviously some really advanced context management here. So I want to hear from you a little more: what's the secret sauce? What are you doing, if you can share? I don't want to have you spill all the beans, but I do want to learn more, because this behavior feels emergent to me, and I think the more people use this, the more they'll see that this is a very different type of thing.
>> Yeah, totally. And by the way, I will literally spill every bean. I think there are so many beans to be spilled that I could be here for hours talking about all these micro-decisions. And at the end of the day, one thing I'd note is that these things change so fast that I think the biggest advantage, or moat, that exists today is not knowing some special combination of 30 runes you have to incant to create an agent like this, but instead having a team of people that deeply cares about doing this every single day, discovering more of these tricks and tips, and continuing to optimize. So I wanted to start with that. I think there are a couple of different categories of things people might be interested in; the blog actually goes into some detail on some of these and less detail on others. So, maybe the higher-order thing:
When you think about what makes these CLI tools so addictive, I think speed is the thing that makes people care a lot. You want the system to feel fast and interactive: if you queue up a prompt, it's going to run that command and then immediately be thinking about the next thing. And in order to do that, you really have to take advantage of prompt caching. So that's one of the big rules here: you effectively need to have an append-only agent loop, which is actually pretty tricky. Because how do you make a decision about, for example, IDE diagnostics? When a droid writes code and it's in your IDE, and you've set it up for that, it will ingest the diagnostics from your IDE to figure out: did it write some code that has a lint error? And so where do you put that? Do you put it inside the file? Do you put it after the most recent user message? Do you put it at the beginning of the session? There are a bunch of decisions
like that, and I think they come into things like tool descriptions, the base behavioral guidelines, system notifications. And where you choose to place these pieces of information: ultimately you need something like a couple of benchmarks to figure out what the actual best answer is, right? And there is just a best answer. You inject system notifications as user messages immediately after the action is taken, and then you need to remind the agent to continue on its path. Right? So there's a bunch of stuff like this.
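To make that concrete, here is a minimal sketch of the append-only pattern Eno describes; the function stubs (llm_complete, execute_tool, ide_diagnostics) are hypothetical stand-ins, not Factory's actual code:

```python
from typing import Any

# Hypothetical stand-ins: a real agent would wire these to an LLM API
# client, a shell/tool executor, and an editor integration.
def llm_complete(messages: list[dict[str, Any]]) -> dict[str, Any]: ...
def execute_tool(call: dict[str, Any]) -> str: ...
def ide_diagnostics(path: str) -> str: ...

SYSTEM_PROMPT = "You are a coding agent."

# Append-only history: providers cache on exact prompt prefixes, so never
# editing or reordering earlier messages keeps the cache valid at every step.
messages: list[dict[str, Any]] = [{"role": "system", "content": SYSTEM_PROMPT}]

def run_turn(user_input: str) -> None:
    messages.append({"role": "user", "content": user_input})
    while True:
        reply = llm_complete(messages)
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return  # no more actions: the model answered for this turn
        for call in tool_calls:
            output = execute_tool(call)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": output})
            # Inject diagnostics as a user-role "system notification"
            # immediately after the action that produced them, plus a nudge
            # to keep going; rewriting an earlier message instead would
            # invalidate the cached prefix and slow every subsequent step.
            diags = ide_diagnostics(call.get("path", ""))
            if diags:
                messages.append({
                    "role": "user",
                    "content": "[system notification] IDE diagnostics:\n"
                               f"{diags}\nFix these if relevant, then continue the task.",
                })
```

Because the message list only ever grows, each request shares its entire prefix with the previous one, which is exactly what provider-side prompt caching rewards.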
>> And Droid is handling that. Droid, for those who haven't installed it, is the command-line interface from Factory AI; there's a getting-started guide once you sign up, so if you haven't done that yet, you can get it up and running. This is a lot of different behavior than the other agents folks are using, and Eno is describing this kind of secret sauce. For those who are listening, your competitors are taking notes; take the notes right now.
But like you're saying, there are so many techniques that even if you tell everyone and they implement them, there's still so much more ground to cover. And actually, I feel like the more you give this out, the more it helps the entire ecosystem, right? Because you guys are tackling a lot of different problems.
>> Totally. I mean, I want everybody to
have faster agents. You should take advantage of prompt caching. We have an engineer, his name is Stapon, and I'll let him know we shouted him out if he's not watching, but he was relentless about making sure that every little action is super optimized for the prompt cache, and that the whole agent loop handles these system diagnostics in a way that keeps the agent going without breaking the cache. And I think that really helps with the overall design. There are a couple of other things with models as well. And this is actually, I think, one of the big reasons why I really encourage teams not to focus on just one model. Common advice in the space right now is: just make it work for Sonnet, or just make it work for GPT-5, and then you can add more models.
But what we found is that there are these decisions that get made where, if you look at how, for example, an OpenAI-based model works and how an OpenAI-based scaffold works, they make a bunch of decisions that are just great, where you go, "Oh, we should do something like that." You know, GPT-5 really likes a specific type of patch-based editing format, but Claude uses a string-replace-style editing format, and if you try to swap those, they're going to perform poorly. So any scaffolding, or agent loop, that gives the agent a tool that doesn't match the post-training of that specific model is going to perform worse. This is something we see in a lot of the generalist tools: they standardize on one specific way to do a thing, and then it ends up being that, because of post-training, one model's worse or one model's better. Now, that sounds like more work for us. But
what's interesting is that there are these other decisions where you go, "Well, I wonder why they made that." And the realistic answer is: nobody's perfect. Sometimes people just implement things quickly, and even for how powerful and important the coding space is, it's still really early. So what we did is pick the good out of both systems, and we also encounter a couple of "well, that doesn't make sense, what if we just did it this other way?" What you end up with is something that's greater than the sum of its parts. And I think being a multi-model company, a company that natively uses both models, or more than just those two, gives us this advantage of seeing around the blind spots of the labs themselves when you're building these sorts of products.
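As an illustration of that dispatch idea, here is a hedged sketch; the tool names and model-name prefixes are hypothetical, and the real patch and string-replace formats the labs post-train on are more involved:

```python
# Hypothetical tool definitions illustrating model-matched edit formats.
APPLY_PATCH_TOOL = {
    "name": "apply_patch",
    "description": "Apply a unified-diff-style patch to files in the workspace.",
}

STR_REPLACE_TOOL = {
    "name": "str_replace",
    "description": "Replace one exact existing string in a file with new text.",
}

def edit_tool_for(model: str) -> dict:
    # A model given an edit format it wasn't post-trained on performs
    # measurably worse, so the scaffold picks per model family instead of
    # standardizing on one format for every model.
    if model.startswith("gpt-"):
        return APPLY_PATCH_TOOL      # GPT-5 prefers patch-based edits
    if model.startswith("claude-"):
        return STR_REPLACE_TOOL      # Claude prefers exact string replacement
    return STR_REPLACE_TOOL          # conservative default for unknown models
```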
>> Wow, that's really impressive. And not only that, there was another comment in my Discord, and I'm seeing comments here on YouTube too, about the token efficiency. For me, I was shipping a feature in less than a million tokens, and that's kind of unheard of, because I would spend a lot more in Claude, and you see all the token usage for these types of things. And like you say, there's this interesting balance of efficiency, knowing exactly what's in the agentic loop, and then the level of detail you guys put into testing, actually testing the results and getting that feedback loop coming back. And so, for those who don't know, you can sign up; I think there's a 20-bucks-a-month plan that gets you around 20 million tokens. But you can also bring your own key. So if I'm a user who wants to use Claude Sonnet and I bring my own key, I don't have to be too concerned that you're just going to burn away all my tokens with all the stuff it does, right? There are a lot of caching techniques and different things you guys take advantage of depending on the provider and the key they put in, right?
>> Yeah, exactly. We wanted to make sure people felt like they had full control over the product. You know, there's a lot of stuff, when you have things like auth, that makes it such that you don't necessarily want to open-source everything. But we wanted to get as close as possible to an open, extendable tool where you can bring your own model and your own keys, because we want developers to know we're not here to give you a bunch of free tokens and then pull the rug out from under you at the last minute, right? I think the pattern of pricing changes and all this other stuff, for what it's worth, comes entirely out of the fact that no one really knows too far in advance how these things might blow up, or how your existing financial modeling or planning accounted for other decisions you want to make as a business. And so for us, the one thing we
really want to do is make sure that, no matter what, you can use this tool, whether you have a pre-existing commitment to another provider or you're just entering the space and want something that effectively gives you a hedge on the pricing of tokens. When you buy a Factory subscription, I think the biggest advantage is you don't have to pay for three other max-plan subscriptions, switching between max $200 plan A and max $200 plan B. You can just get one where, you know, our promise is we're going to continue to sell standard tokens that give you access to all the models, with priority inference, for as long as possible. And when I say "as long as possible," you have to remember that a lot of our business is actually sustained by larger enterprise deals where we do more platform-oriented work, to make sure we have a strong business. So knowing that our business is largely focused on an enterprise revenue stream means we can actually guarantee a lot more with respect to sustainable consumer token-level pricing.
>> Yeah, I
want to touch a little on that, because I found it really interesting when I was learning more about you guys: you found that people in the enterprise space, when you launched a while back, were actually using this for their side projects. It's that good, where you're like, "Hey, this usage is not coming from the enterprise, it's from this other really weird app," and you're like, "I think this is ready for this other type of space." But the other thing you guys talked about is that most places like to charge per seat, and you're like, "We want to charge just for usage." So talk a little more about the business philosophy here, because you touched on it: your goal is not to come in, bait-and-switch people, get them addicted, and then flip the switch on them. I want to hear more about that, because I think a lot of people don't really understand where you guys are coming from, and what your end goal is for how you see the agentic world of coding with AI evolving.
>> Yeah, totally. And I think it's probably helpful to zoom back and speak a little high-level about what we're trying to do as a business. We say all the time that our mission is to bring autonomy to software engineering. But what does that actually mean in practice? I think what we've seen is that there is so clearly a transition happening to a new way to build software, right? This process of specification of what it is you want built, this process of managing the agent and the outcomes and seeing what it's doing, and ultimately this process of review.
This is changing from "I'm the one driving every single aspect of every line of code." When you come to an organization with a very capable tool and a very capable platform, that's about 50% of what needs to happen: you get the tool in your hands and you're maybe halfway there. The other 50% is that we as a developer community need to start learning this new way of building software. Part of that is on the product to be intuitive, but I also think part of it is really to share and build up these collective patterns that work, these workflows. And so a lot of the effort we make as a business is not only thinking about the tool at any given moment, but also thinking about how 5,000 developers at a large company can actually transition to this.
And I think the answer cannot be, you know, you bring in a bunch of consultants or you put on a bunch of workshops. That's important; the workshop part is important, and education is a big part of this. But I think what probably needs to happen is the tools need to evolve, and the environments they actually operate in need to evolve as well. So what you're going to see from us over the next couple of months is a ton of changes in the product that make it more intuitive, but a lot of it is going to be focused on how you can build codebases, environments, and tooling for yourself that maximize the success of agents. And I think that is where we spend a lot
of our time with the largest companies: actually saying, okay, you've used agents for three months now, and on these five codebases everyone on the team says, "This is the best thing I've ever tried in my life, everything's changed," but these other five teams say, "Agents are garbage, we've tried every single tool in the space and none of them work." What we asked is: why is that happening? When you talk to those teams and you look at their codebases, the way they've organized them, the tooling they have, you start to realize there's actually this concept of readiness for agents.
And I think what the world is entering into is: the tools are here. If you've used a Droid, or if you've used a lot of other tools, right, Claude Code is fantastic, Cursor is fantastic, then you know the capability set has come so far. But now it's time for us to start meeting the tools a little bit more. And I think that is going to be a very exciting time, because our goal is to create software factories. Part of that is the robots in the factory, the tools in the factory, the droids of course, but part of that is making sure your processes are standardized, that your company operates in a way where every little piece falls into place correctly, even if it doesn't have oversight from humans. So that was a long way to say: I think a lot of what we're trying to focus on is not just the tool, but rather a philosophy of building that is everything that comes after you have the tool.
>> I think you guys actually coined the term "agent-native development," which is basically what you're describing. I was an engineer at Apple for 12 years, and there were projects, especially in the build system, where I'm working with thousands of projects coming together, and something would happen and you'd write a script, the build falls apart, you kick it one way, and so forth. You're dealing with not only legacy codebases but new codebases, compatibilities, hardware types, lots of different complexity, and you don't have enough time because you're chasing the latest priority, because you have to ship. And as these teams are adopting this, for me the smoke I see is pretty obvious. The transformer came in with GPT-3 to make this evolutionary leap where it can understand more of the world around it just with more training data and do better classification. Then with GPT-4 came this next leap into actually generating code and sticking to a prescriptive type of thing, where prompts guide the railings and context is really important. And then you guys came in right around this time frame of: let's just do it the agent-native way, because now the models can output stuff that's good enough for us.
>> Where do you see this going in the next 18 months? Because, to be honest, the last six months have felt like seven years of work.
>> Yeah. I mean, thinking models,
>> Claude Code, you know, the agentic workflow, the tools, every time a tool executes, the thinking in between one step and the next. And then you guys' work coming in with the droids to rein this in a little bit and make it much more, I guess, steerable, but actually work. Where do you see this going, in that bigger perspective, not only for big enterprises but maybe also for individual developers? Because this is a vast range, and now that I'm working independently, I'm having to scale down some of these workflows I used to use, but I still want to have them. I still want reliability. I still want testing. I still want architecture diagrams. I still want to follow a plan and see progress. But I don't necessarily need 50 different people to help me with that. I just want a conversation.
There are a lot of gaps to bridge here and a lot of different directions to go. Some CEOs are even saying that coding is dead. I'm a little confused, so give me more of your insight here, because this is really me speaking from the heart: what do I do now? Where is this really going?
>> Yeah, totally. My perspective, working almost in reverse order, is that it has
literally never been a better time to start learning how software works. And I think that how accessible it's becoming will not mean that coding is dead. It's more that software is entering a new era, and that era is the same way that nobody optimizes bytecode anymore, right? No one is out there decompiling their Python or TypeScript codebase and inspecting the raw compiled code. And I'm not saying that LLMs are the new programming language or something like that; they're non-deterministic.
There is a difference between the tool and its output, with code being the output. But I think about the thing that is coming next from the perspective of the products and then of the outcomes, and those are two separate things. I'll start with the outcomes. Right now, today, when you and I vibe-code, we get an initial project created. This concept of vibe coding even comes from the notion that what you're building is sort of vibe-based. It's not necessarily structured in a way that a professional software organization would say hits their standards, but it still works, it does what you wanted it to, and even the user doesn't necessarily understand exactly what's happening. Our thought
process is: we should keep pushing that to the point where it does look like what a professional software organization would accept, and the user can still, at a high level, understand why some of these things exist. So, a couple of things that every software organization should theoretically have. You should have agent documentation that guides the agent on how to navigate the project. You should have testing infrastructure: unit and end-to-end tests. You should have automated validation: linters, type checkers, and secret scanners, right? You should have pre-commit hooks. You should have CI/CD. You should have logging, error tracking, security scanning. And if you are not necessarily
a professional software developer, you shouldn't need to think much more than: I know I've got tests in my codebase, I know what they test, and I know secrets get scanned, so when I deploy my vibe-coded website, nothing bad is going to happen, right? And when you think about it from that perspective, there are all these things you can actually build such that the agents enforce these high-quality practices. They say to you, "Oh, by the way, as we're getting started, I noticed you don't have testing infrastructure. Before we proceed, I think it's probably important that we get this set up." And then you can say, "What's an end-to-end test?" And probably by the end, you'll learn what an end-to-end test is. You'll learn that concept. You don't need to know about high-quality dependency injection to know that testing your product is good. And I think most people will start building this intuition around these primitives
of good software development, but they won't need to be 10 years in the industry, knowing every little nuance of the language, to get that success. So I think that's one big area: agent readiness means having high-quality software practices built into the codebases, and that means your agents need to know how to make that work. That's, I think, probably 50 to 60% of what is coming very soon.
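One toy way to picture "readiness for agents" is as an audit over the checklist above. This sketch assumes a conventional repo layout; the file names are illustrative, not a standard Factory enforces:

```python
# A toy "agent readiness" audit over the practices listed above,
# assuming conventional file locations (all names are illustrative).
from pathlib import Path

CHECKS = {
    "agent documentation": ["AGENTS.md", "CLAUDE.md"],
    "tests (unit/e2e)":    ["tests", "e2e"],
    "linters/type checks": ["ruff.toml", ".eslintrc.json", "tsconfig.json"],
    "pre-commit hooks":    [".pre-commit-config.yaml"],
    "CI/CD":               [".github/workflows"],
    "secret scanning":     [".gitleaks.toml"],
}

def audit(repo: Path) -> None:
    # Print one line per practice: present in the repo or missing.
    for practice, candidates in CHECKS.items():
        found = any((repo / name).exists() for name in candidates)
        print(f"{'ok     ' if found else 'MISSING'}  {practice}")

audit(Path("."))
```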
>> Wow.
>> That's very promising, because I can't tell you how exhausting it sometimes is for me to carry the models I've been using forward. I mean, I've made a specific documentation folder for when I ship features. It's like a mini project-management thing that I had an agent build for me: whenever documentation is in the active folder, it's active, right? So I'm just taking advantage of the agentic loop reading the file system, so that in my context, if I see "active," I treat it as active, and if I see "archived," the opposite. These little mini hacks: I don't want to write a tool, I just want the agent, when it's reading my codebase, to already understand that in one pass. And these little tweaks, I feel like everyone's now sharing them publicly, live, and that's why you're saying, as a philosophy, you want to share what you're doing under the hood in some way, because there's almost too much to chew on. But every single thing you guys are doing is getting everyone closer to being there. And so, for someone who hasn't tried Droid yet who's just popping into the live stream right now, what is the first thing they should take advantage of to get started? Is it just turning on spec mode and having a conversation, or what do you think will get them to see what we're already seeing in this agentic future with their coding?
>> Yeah, totally. I think the best way to get started is, when you have a task in mind that you want to delegate to a coding agent, pick one you think might take somewhere between one and four hours to do manually, right? "I'm going to read through the codebase, I'm going to make a couple of changes." A task like that is very squarely in the domain of a single session with a coding agent. You enter the droid, and I would suggest using Shift+Tab to move into specification mode, and then what you do is say: this is what I want to build. Then you let the agent build out that specification. And one big thing: people may be familiar with planning modes in other tools. What we saw was that planning was very oriented around what the agent is going to do next, like "first I will do this, then I will do that," because that's what a plan is: it's what you're going to do next, your step-by-step. For us, specification is much more about what it is that you'll actually build and how you'll validate that what you've built is correct. So that might mean "it has this functionality, and do not stop until it has that
functionality." But it also may mean "passes these tests" or "properly sends this sort of message." And you'll even see the droid occasionally, and you can also just prompt it if you don't see this, write up temporary scripts to test its own functionality and then remove them, because the user didn't ask for those tests to be checked in. So for us, this validation loop is incredibly important in getting the agent to do the thing you'd like it to do. And I think you see that best when you actually specify in advance what it is that you want.
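For a feel of the difference, here is a hypothetical example of what such a specification might look like, loosely modeled on the fallback-models refactor mentioned earlier (the headings and criteria are made up for illustration):

```
# Spec: model fallback for the transcription route

## What to build
- The route handler calls the primary model, and on timeout or a 5xx
  error falls back to a secondary model, logging which model served
  each request.
- Model choices live in one config module; no hardcoded names per call.

## How to validate (do not stop until all of these pass)
- Unit tests cover: primary succeeds; primary times out and fallback
  succeeds; both fail and a typed error reaches the caller.
- The full test suite and the type checker pass with no new warnings.
```

Note that the spec describes the end state and its acceptance criteria, not the step-by-step order of work, which is the distinction Eno draws between specification and planning.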
>> Wow. So that's a really amazing pro tip if you're just tuning in right now and maybe just got Droid installed, or maybe you were testing with me live. I sort of broke into it, but I didn't realize I should start out this way and better define the spec with the droid. So I just can't wait to get off the stream and start coding live with this. Okay, so there is a thing for
me, and I kind of poke fun at it: we call ourselves AI Anonymous, because we use AI a lot; we're really into these tools. There are a couple of people in my community, you all know who you are, who hit the three-comma club: they've consumed a billion tokens in a month using these types of AI coding tools. So they have lots and lots of experience. They're working all the time, and they're super locked in. And I think for someone like that, just hearing that keyword, spec-driven, matters. So when you're using the Droid, just hit Shift+Tab. You'll see the little thing. It'll turn purple and say spec. I think it says spec mode or something, right?
>> Yeah. There are a couple of different modes: you can move to spec mode, and then there's auto-run. Auto-run lets you select an autonomy level, which sounds a little complicated, but basically what we saw is that a lot of tools gave you two options: "I have to approve every single decision" or "dangerously accept all commands that might get fired off by this system." Our thought process was that we should probably make that a little more granular, because otherwise everybody is going to say "just let this thing run," and I think that ends up actually being more unsafe in the end. But so, spec mode is basically the Shift+Tab, and it should say spec mode and it'll be purple.
>> Wow. So if you haven't had a chance to try it, ladies and gentlemen, I have a link that's currently in the description, and it's also in the YouTube live stream. I'm going to type it into the comments too, because there are some folks watching on X and some on LinkedIn; on LinkedIn it'll also be in the description, and on Twitch as well. So that'll get you started if you haven't had a chance to. It's 40 million tokens you can use over two months. I know some folks are only seeing it show up for 30 days or something like that, so I'm going to be working with the team to make sure that gets looked into. But, I mean, 40 million tokens,
that's enough to ship, I'd say, 40 features. Each feature I've been shipping has been around 1.2 million tokens. It doesn't consume as many tokens as I thought it would, because they seem to be extremely efficient with their actual token usage. It only costs 20 bucks to get about 20 million tokens per month, so if you sign up with my link, you basically get two months' worth of tokens for free just to get started, which is awesome. I don't earn any commission; this is just part of the sponsorship Factory AI is doing, to not only sponsor the channel but actually give you access to get started with this. And as we found out through this conversation, their mission isn't to get you addicted so you stay with them. It's more about literally putting this type of software out into the world. Their belief system, right from the very beginning, has been about the agents and this specific type of detail, and Eno's background and so forth, which we didn't even get into yet, so we could pop into that a little. For me, it's just been really important to make sure everyone in my community can code with these types of tools and understand that something like this is out there. And this is something
that I so heavily believe in as a future, because I have been coding with these AI tools from the very beginning. And I do believe that if you can control the input, control the output, if you can actually get whatever's in your head out into a computer and let it do its work, we're at the stage now where this type of tooling has finally arrived. Late 2025 entering 2026 is going to be absolutely insane, because this whole year, I remember people saying 2025 was the year of agents; I don't think they really understood how that was going to play out, and the coding stuff is kind of the best benchmark here. So, as technical founders, has there been anything where you've used droids and gone, "Damn, this is actually something really special we've built here"? Any projects like that you've been working on where you're like, dang?
>> Yeah, totally. I think the thing that has been most magical is this very common workflow we have internally at Factory. We'll enter a meeting room and sit there with a transcription tool, and we'll have this full, 60-minute engineering design discussion where we talk it through. We'll be on a whiteboard, and we're recording every word spoken inside these meetings. We chat, we talk it through, we sometimes even reference the fact that we're being recorded; we'll say, "Oh, actually, make sure to note that thing." So it's almost like there's another system in the room with us. And then what we do is take that transcription and put it into a droid, and the droid builds out a specification for exactly,
very precisely, what it is we're trying to build, with tradeoffs. And we actually have native integrations. This is one of the lesser-advertised features, but the Factory platform is not just a CLI tool, right? We have our CLI, which is super capable, but we also have a web platform with remote agents, background agents, as well as this local bridge that lets you use the web platform on your local machine. That web platform is mobile-optimized, so you can actually access it on your phone. We have Slack and Linear integrations so you can just tag tickets with droids or send Slack messages tagging the droid.
And in building all of that, what we did was basically go in and talk through what needed to be built, transcribe it, and put it in a droid. The droid outputs this long-form engineering design doc of exactly and precisely what needs to be built. And for the larger changes, we have it build a phase-by-phase implementation, right, where each phase is a ticket. Then we delegate those individual tickets to droids, and one at a time they go in and actually build these features. And the largest change that has been done this way was literally, I'm not kidding, a 12,000-line diff when the final PR was created from all of the sub-PRs being merged together. And it was successful; it worked. It's the part of our product that basically
handles a lot of front-end rendering, which was not working very well. So there was this hour-long conversation; people went in the room. One of our engineers, Alvin, has been super hardcore about performance. We talked it out, we hashed it out, and then he sent this large-scale refactor out to basically hundreds of sub-droids, and it worked. That was, company-wide, a very mind-blowing moment, because this is something we had been saying was going to happen, and then we sort of stumbled into it happening. We weren't sure it would work exactly the way I just described, and then it just kept working and kept working, and when we ended, we had this massive change, and it was successful.
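In outline, that workflow looks something like the sketch below. Every function is a hypothetical stand-in: Factory drives this flow through its web platform and Slack/Linear integrations, not a public API like this.

```python
# Pseudocode sketch of the meeting-to-merged-PR workflow described above;
# all names are hypothetical stand-ins, not Factory's actual interfaces.
def transcribe_meeting(recording_path: str) -> str: ...
def droid_specify(transcript: str) -> str: ...        # long-form design doc
def droid_plan_phases(spec: str) -> list[str]: ...    # each phase becomes a ticket
def droid_delegate(ticket: str) -> str: ...           # one droid per ticket -> sub-PR
def merge_sub_prs(sub_prs: list[str]) -> str: ...     # combined final PR

def design_meeting_to_pr(recording_path: str) -> str:
    spec = droid_specify(transcribe_meeting(recording_path))
    tickets = droid_plan_phases(spec)
    sub_prs = [droid_delegate(ticket) for ticket in tickets]  # ticket by ticket
    return merge_sub_prs(sub_prs)  # e.g. the 12,000-line final diff described above
```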
>> Wow. So the name of your actual company, I think, is the San Francisco AI Factory. Is that what it's called?
>> Yeah, the San Francisco AI Factory.
>> You can see why, and where these guys are going. And I kind of want to talk a little, toward the end here, about your background. If you don't know these folks, I'm going to say it right here, right now: they're cracked as hell, you know? And they're very humble about it. I think that's what I really appreciate about this whole thing: this yearning, this need to solve this deep problem not only for themselves but for the world, and then not being like, "Hey, these are my secrets, I can't tell you how my secret sauce works, so y'all chill out."
It's the opposite. It's like, "Yo, we just have too much work to do; in fact, if you're interested, come work with us." I think you guys are hiring, right? So for anyone who's really interested, if you're really playing around with these types of systems, definitely hit them up, because they're looking for people who are on the edge. My folks in the three-comma club, you've probably been playing around with a lot of these different agents and know a lot of different techniques. So tell me a little more about your founding background, how this got started as far as inception, and why you guys chose to stay with your vision. Because a lot of stuff has changed in the industry over two years; I mean, even six months is crazy, but two years. So I want to hear a little about your background and the evolution from there as well.
>> Yeah, totally. I'd be happy to share that. And if you are in the three-comma club for token usage, by the way, please do reach out, because I'd love to chat with you; that's exactly the persona that would work well at Factory. The TL;DR for me: I started off in undergrad doing work that was actually neuroscience- and cognitive-science-focused. I was really obsessed with the idea of machine intelligence, computational models of cognition: how can we build systems that more closely resemble how humans think? And
the lab I was in, and my thesis, were actually related to deep learning on EEG activity, fMRI data; all this stuff was very cool and very powerful. And I think that, very long-term, that's something I'm always going to be obsessed with: the idea of how we can build systems that more closely resemble ourselves. But I ended up taking this path where I worked at Microsoft for a bit, where I saw this huge-company dynamic. That's a true software factory, right? That company ships. Large companies are not always lauded as the fastest shippers, but the reality is that company is a behemoth, and they build so much software. And then I went to Hugging
Face, because my obsession with, at the time, small language models as maybe the path toward this more general machine intelligence started to become too strong. So I was like, oh, I really just need to work directly on this again. And Hugging Face, I'm very grateful for that opportunity, because the leadership team there took a chance on someone whose background at the time was maybe not as flashy as one might need in order to meet with these fantastic engineering leaders. You know, I got to meet with startup founders, with companies like Databricks and Bloomberg and Grammarly. I was meeting with the people who were deciding their AI roadmaps and figuring out: how do we even train LLMs? What does it mean to train an LLM? Who knows how to train one? What is our hiring strategy for people who know how to build these LLMs? And so
I was really in the room with all these enterprise engineering leaders, and it became very clear that everyone had one goal, which was engineering velocity, but no one knew how to achieve it. Their main thought loop was: LLMs seem pretty interesting; Copilot gives you this autocomplete; so if we fine-tune an LLM on our own codebase, we'll have the best autocomplete. And at the time, that seemed to be what most people were interested in. Meanwhile, I was talking to the folks at LangChain and all these people who were interested in chaining LLMs together, because the term "agent" wasn't really thrown around yet. It was more like, you know, chaining LLMs or putting them in a while loop, and "agent" was just starting to bubble up as a term for these sorts of LLM-based systems. And I managed to build out some prototypes, and then I serendipitously met my co-founder,
Maton. Um, and he and I were, you know,
Maton. Um, and he and I were, you know, he likes to say it's like intellectual love at first sight. We really met each other and I think we realized we had such a similar vision of what had
progressed so far, what was going to change that we met and like seven days later he dropped out of his PhD. I quit
my job. We started factory. We turned
that around and immediately raised uh a very small uh uh like seed round uh in order to to get started with a small team because we were very focused on not
chasing like a bunch of hype, but instead we wanted enough money to be able to take a small team in San Francisco and ruthlessly iterate until we felt like we had a real product. not
like a researchoriented thing but like a real tool that people could use. Um and
so you know seven days uh seven days of chatting and then we we we decided to go allin together.
>> I love this comment from Abe. It says, "Seven days later, love at first sight."
>> Yeah, I love it. First insight. I love that.
>> Yeah, first insight.
That is absolutely an amazing founding story, because to have that breadth of experience and then to get that conviction... I think when you're in this space, you're so focused, and you're like, okay, this actually makes sense to me. So what do you think is going to be most surprising to the world? Because when I talk to anyone outside the AI space, outside the San Francisco bubble, I don't think they really see what's coming. How would you describe this to your grandparents, or to someone who's not really into AI? Both what you're doing, and how it's really going to impact their lives, whether they realize it or not.
>> Yeah, totally. It's funny, one of our team members actually had one of their grandparents in the office looking at some of the stuff we were doing. The way I would put it is: no matter who you are, if you have access to a computer and you're on the internet, or you're watching TV, you're going to see so much negativity about what AI is going to do in the future. You have CEOs and investors shouting from the rooftops things like, at some point we're going to build machine God and GDP will go to 10 trillion, and if you ask them why that's going to happen, they'll say, oh, I don't know, maybe we'll fire all human beings and replace them with robots. On the other end, you have people talking about the climate impact, and there's so much negativity. And then there's also what I would call naive optimism about AI. So I think most people basically see two straw-man arguments for what AI is going to do. What's interesting is, first of all, that concept of going to 10 trillion and firing everybody and replacing them with machines is just totally unrealistic, right? It's something you can use to maybe temporarily raise your stock price.
But I think the people who are most clearly on the ground are the people in this live stream, the people who are using AI coding tools, because what you see is that these things clearly make your life easier. They're fun to use. They're incredibly interesting. They let you get more done. They let you be more creative. They let you explore ideas faster. And at the same time, you're piloting them. You're still in charge. You have agency. You have this ability to build so much more. So the thing I would say to, let's say, my grandparents is: I think we're about to enter an explosion of creativity and of software abundance. Fifty or sixty years ago, we were just building the beginning of the internet. Computers in general are less than a hundred years old. We've just entered a new era of what technology is going to do for us, and the people clearly on the forefront of that era are software developers. So if you want to experience the closest thing to what true artificial general intelligence is going to feel like, I think you need to be using AI coding tools, because then what you see is that it's not scary. It's not like Terminator. What it means is this huge unlock.
I also want to add, and this is where I'm going to go a little woo-woo, a little crazy: think about what you're doing when you interact with these tools. Have you ever chatted with an LLM? Have you ever spoken to your coding tools? Have you ever asked the LLM how it prefers to be used? Have you interacted with it more as a colleague and less as a tool? I think you should give that a try, too. You should ask the system, what can I do better here? Someone was asking me on Twitter, "Do you have any tips? It feels like I get in these doom loops with other tools. How do you get around that?" And I said, I think you should just ask the system: you are currently in a doom loop and I don't really know how to get you out of it, so do you have any advice on how to get yourself out of it? The OpenAI team told us that GPT-5 is the best metaprompter they've ever interacted with, because GPT-5 deeply understands its own system-prompting techniques. So you can actually ask it, hey, what's wrong with my prompt right now?
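To make that concrete, here is a minimal sketch of the "ask the model about your own prompt" idea, using the OpenAI Python SDK. The model name and the wording of the critique request are placeholder assumptions for illustration; nothing here is Factory's code.

```python
# Minimal metaprompting sketch: feed the model the prompt that keeps
# doom-looping and ask the model itself what is wrong with it.
# Assumes `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

stuck_prompt = "Fix the failing tests."  # the prompt that keeps looping

response = client.chat.completions.create(
    model="gpt-5",  # placeholder; use whichever model you actually run
    messages=[{
        "role": "user",
        "content": (
            "You are currently in a doom loop with this prompt and I don't "
            "know how to get you out of it:\n\n"
            f"{stuck_prompt}\n\n"
            "What's wrong with my prompt right now? Rewrite it so an agent "
            "can act on it without looping."
        ),
    }],
)
print(response.choices[0].message.content)
```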
>> Wow, that's really impressive. I want to involve my community in this last part of the live stream, because I know a lot of you have been streaming in the comments, and I really appreciate it. So any questions you have, I'm going to review them. If you've made it this far, we're about 56 minutes in. We have been cooking, really going over the whole Factory thing, and I'm really proud to have you guys as a sponsor of this channel. It's been really amazing to play around with the tool, and I really believe in this type of future, what you guys are all about, what's actually happening, and the impact it's going to have on computing going forward. If you haven't had a chance to try it out, I have the link in the description, and that'll get you basically two months' worth, or 40 million tokens, to try out this droid. And if you're not liking it, you can even bring your own key.
There are so many different techniques we've talked about, and I want to use this last part to geek out a little bit, because you mentioned GPT-5 metaprompting, and I want to go a little deeper beyond the surface. We touched on it at the very beginning: what is the difference between the tool harnessing happening in droid for GPT-5 versus Claude, and how does that make a difference? Because if a specific foundation lab is only focused on its own model, it's not going to see the other holes, and if you're running agentic loops, it makes a lot of sense to have these kinds of considerations. So go a little deeper on GPT-5. A lot of people said at the very beginning that they didn't really like it, and now they're saying they really like it and it's better. You talked about this metaprompting thing, so what are you seeing in GPT-5 specifically that makes it interesting, that can explain the vibes people are seeing?
>> Yeah, totally. Personally, GPT-5 Codex is my daily driver. GPT-5 and GPT-5 Codex are actually very similar models, and the OpenAI team worked really closely with us to make sure the experience with both models was high quality in our product suite. The thing that's interesting about GPT-5 Codex is that it's very clearly a quiet driver of productivity. It's less chatty. It interacts less outside of its core goal. It's very agentic, and it more deeply understands this idea of specification and checkpointing, and I think those are the things that make it feel really nice. It's also very nice that it's cheaper than the other models, right?
And the interesting thing about GPT-5 is that I honestly think this is a pattern you're going to see a lot more frequently, though the "oh, this new model came out" period is going to condense. I think a lot of product teams aren't used to the changes you have to make. Earlier in the stream I said, for the folks who weren't here, that if you're using GPT-5, the way you choose to edit code, for example, is different from if you're using Sonnet. So if you're building a coding agent that uses both Sonnet and GPT-5, you need two different tools to make both work optimally. What's going to continue to happen is that models are going to keep coming out, and they're going to have these sorts of nuances. They're going to have different personalities, different patterns, different tool-use specs based on their post-training. And I think this is going to accelerate, because most model providers are in the post-training era: they're building interactive reinforcement-learning environments, and they're getting really specific about how a model's agent trajectory should look. So based on how they've chosen to build those environments, different tools are going to work better, or your tool definition may be dangerously close to one they've already post-trained in a slightly different way. You'll keep trying to get the model to do it one way, and it's going to keep sliding back into its post-training, where it acts a different way. What everybody saw when GPT-5 first came out is that a bunch of companies aren't used to building like that. They're used to just swapping the model and getting an instant 50% boost. When you just swap the model, your product gets a little bit better, but it also does weirder things, and you're not really maximizing the outcome of the model. So a lot of what our product and our scaffold do is make sure it's easy for us to quickly figure out exactly what the nuances of a model are, and then quickly update things like how we inject information, what the tool definitions are, and how the prompt caching works, to make sure we continue to have the best possible experience with a given model.
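None of this is Factory's actual scaffold, but the general pattern Eno is describing, keeping model-specific tool definitions behind a registry so the harness can swap them per model, might look roughly like the sketch below. Every name and schema here is a hypothetical illustration.

```python
# Hypothetical sketch of per-model tool selection in an agent scaffold.
# The premise: each model family was post-trained on different tool-call
# patterns, so the harness hands each one the edit tool it "expects".
from dataclasses import dataclass, field

@dataclass
class ToolSpec:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)  # JSON-schema-style spec

# Illustrative only: a diff-based edit tool for one model family...
APPLY_PATCH = ToolSpec(
    name="apply_patch",
    description="Apply a unified diff to the repository.",
    parameters={"type": "object", "properties": {"diff": {"type": "string"}}},
)

# ...and an exact string-replacement edit tool for another.
STR_REPLACE = ToolSpec(
    name="str_replace",
    description="Replace an exact string in a file with new text.",
    parameters={
        "type": "object",
        "properties": {
            "path": {"type": "string"},
            "old": {"type": "string"},
            "new": {"type": "string"},
        },
    },
)

# The registry is what makes a new model cheap to onboard: measure which
# tool shapes it slides into, then map its name prefix to that toolset.
TOOLSETS = {
    "gpt-5": [APPLY_PATCH],
    "claude-sonnet": [STR_REPLACE],
}

def tools_for(model: str) -> list[ToolSpec]:
    """Return the tool definitions tuned for this model's post-training."""
    for family, tools in TOOLSETS.items():
        if model.startswith(family):
            return tools
    raise ValueError(f"no toolset tuned for model {model!r}")
```

The design point is that swapping the model then becomes a registry entry plus an evaluation pass, rather than a rewrite of the whole agent.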
>> I think that's super impressive, as far as the insights here and the kind of thing people are sharing. I know people are asking right now, "How do I get the 40 million tokens?" I want to drop this link on you guys real quick. If you haven't had a chance to, it's right here: rf.me/Rayfactory. I noticed that when I type stuff into my little program, it doesn't actually post it on X for whatever reason, and I know the X folks are asking. So just type in that link, or click there, and it will take you right to the sign-up. When you sign up, boom, 40 million tokens dropped right into your account so you can start cooking with this droid, because I feel like that's the best way to experience it. Learn by doing.
And a question I had from Git Max is about interrupting the droid agent to steer it. Claude Code recommends this sort of practice, but how does that work with the droid? Because I've noticed it too: I answer my question and I see it pop up in the middle of its thinking phases. So I'd love your insight on that.
>> Yeah, totally. So this was contentious, and honestly, I think it's still up in the air. We have both implementations basically feature-flagged. I'll tell you how it works today. When you send a message to a droid, it queues it for the next iteration of the agent loop. A lot of products wait for the agent loop to fully complete before inserting the next message. But what we found, when we spoke to users, is that if you send your message as the droid is about to take its next action, most of what people are actually doing is trying to get it to switch what it's doing in the middle of its work. They might say something like, "Oh wait, actually, it's not over there, it's over here," or, "Hey, stop editing that file because it's not working; I want you to edit this other file instead." So that's how it works today: you send your message, it gets queued, and then it gets inserted as the immediate next action. We've also heard from other folks that they'd prefer to wait for the droid to fully complete and then queue a net-new task. For example, "go do this thing," and then, "oh, I forgot to say this, but once you're done, write unit tests for it." I know other products work like that, but what we found is that queueing as the next immediate action helps you steer the droid very quickly without breaking the prompt caching.
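Here is a toy sketch of that queueing behavior: not Factory's implementation, just the shape of the idea. Steering messages land in a queue and are drained into context before the next action, and because turns are only ever appended, the existing prefix stays byte-identical and the provider's prompt cache keeps hitting. The helper functions are hypothetical stubs.

```python
# Toy sketch of mid-run steering via a message queue (hypothetical stubs).
import queue
from typing import Optional

user_messages: "queue.Queue[str]" = queue.Queue()

def plan_next_action(context: list[dict]) -> Optional[str]:
    # Stand-in for a model call that returns the next tool action,
    # or None when the agent decides it is finished.
    return None

def execute(action: str) -> dict:
    # Stand-in for running a tool and recording its result.
    return {"role": "tool", "content": f"ran {action}"}

def agent_loop(task: str, max_steps: int = 20) -> None:
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # Drain steering messages queued since the last action. They are
        # appended, never spliced into earlier turns, so prompt caching
        # over the existing prefix is preserved.
        while not user_messages.empty():
            context.append({"role": "user", "content": user_messages.get()})
        action = plan_next_action(context)
        if action is None:
            break
        context.append(execute(action))

def steer(message: str) -> None:
    # Called from the UI when the user types mid-run; the message is picked
    # up before the agent's next action instead of after the whole run.
    user_messages.put(message)
```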
>> Ah, okay. That's pretty interesting to know, as far as where things are. And I like how honest you guys are that this is still to be determined based on the feedback. And if people have feedback, I think it's /bug or something to send it to your team. Is that what it is?
>> Yeah. If you have feedback, /bug builds a little example of a bug report that you can share with us. We're active on email, and I think the web app has built-in support functionality, so you can send support messages there. Please don't spam that. But if you have real bugs, and honestly, if you follow me on Twitter or anything like that, or even if you hunt me down on the street and find me, I will happily write a bug report live on the spot with you. I really want to make sure this experience is great, and I think that can only happen with the community also sharing the things they want to see in it, the bugs they encounter, the features. We definitely want to build for you guys.
>> That's really amazing. I definitely appreciate that. You guys are not trying to hide. I think Fetty Hustle was asking here about the BMAD method for specs and planning. It would be interesting to hear if you have experience with that, what the differences are, what the droid does, and whether you can use both. I'm just curious.
>> Yeah, totally. In terms of specs, that's definitely native. You can use specification mode in our product: hit Shift+Tab and that will bring you into spec mode. There's actually a setting I haven't mentioned on the stream yet: if you go into your settings, there's an option to save those specs as markdown whenever you accept them, so you can bring the specs into the codebase directly. With respect to the BMAD workflow or method, we have custom slash commands. So if you wanted spec mode, for example, to properly create those PRD and architecture docs, you can create a custom slash command, call it /bmad or /plan or really anything you'd like, and that will prompt the agent to follow the BMAD method. So you should be able to do both of these things. And if you've already done this for Claude Code, you can easily import custom commands you've already created from Claude Code, and you can import custom subagents from Claude Code, so it's very easy to keep all of the work you've already done.
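As a sketch of what such a command might contain: Claude Code stores custom slash commands as markdown prompt files, and since droid can import them, a BMAD-style command could plausibly look like the file below. The path and the exact wording are assumptions modeled on Claude Code's convention; check Factory's docs for where droid expects command files.

```markdown
<!-- .claude/commands/bmad.md : hypothetical custom /bmad command file. -->
Follow the BMAD method for this task:

1. Draft a PRD capturing the business goal and the user stories.
2. Write an architecture document before touching any code.
3. Shard the work into stories small enough for a single agent run.
4. Save the PRD and architecture docs as markdown under docs/.
```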
>> Oh, that's excellent. That's amazing. I think that's going to make it easier for people who are migrating between platforms, and that's definitely appreciated. I also appreciate that you guys are pretty much on top of the feedback. I see a lot on Twitter: people post something and you're like, "Hey, do you want to try it? Let me know with your email." You guys are super active. I find that super impressive, and I'm kind of blown away, to be honest. How are you guys staying on top of it all? Do you guys even sleep?
>> Yeah, well, we've got some droids that help us out with finding things, and then we always, as human beings, respond to people. That really helps.
>> That's amazing. And I think you talked a little bit about this, because you guys had a really hairy problem in your organization, and you literally had a meeting, recorded the transcripts, and just deployed a bunch of Factory droids to go do the tasks. So are we pretty close to the reality where we can just say, "Improve my app based on the customer feedback," because you have Sentry hooked up and everything, so it should be able to get that input, and all that stuff is in your code, right?
>> Yeah. I think for the folks who are willing to get a little more in the weeds with Factory, as I said earlier, we actually have this full platform, and our personal take is you can do that today if you want the end-to-end loop from customer feedback to code change. You can integrate Factory with your Slack and with your Linear. So if you use Slack, where maybe every bug report a customer files falls into a Slack channel or auto-creates a ticket, or if you even want to automate that process with a droid, you can do that. What we have is: when bug reports come in to our Slack channel, you just tag @droid, or you'd install it as @factory, because that's what it is on the public Slack. This is all in our platform, by the way; you just go to settings and install Slack. Then you tag the ticket, and the droid will grab all the context from that Slack thread. Even if there's a conversation inside the thread, or you link a website, or you link other tickets, the droid will grab all of those tickets and that context. It will build out the feature and send it over in Slack with a pull request. So we're at this moment where you can go fully end to end, from customer feedback, without touching a line of code.
>> That's so crazy. It makes sense, though, because that's been your vision from the very beginning, and you've been building it out, and it's actually possible today, which is super nice: people can go ahead and try this type of integration out. I like this comment here that says, "I'm not in the three-comma club, but I've been through a few hundred million tokens. Thanks for the 40 million, much appreciated." For sure.
>> Glad you liked it.
>> Yeah. Also, for those who are wondering on Twitter what the handle is, GitMax dropped it here below. It's enor. So go ahead and hit them up. And the Factory team in general: if you just tag Factory AI on X, they're extremely responsive and they'll definitely get back to you. I know some other folks are having issues with the 30-day free trial, like they'll see 30 days in their account, and I'm not sure if the 40 million credits will continue to roll over, because I think the UI just shows 30 days for the billing.
>> Oh.
>> It might just be... yeah.
>> Yeah, that is very likely a UI bug, so I'll literally send off a droid immediately after this to go fix it. In the meantime, it is 60 days for the referral code. And if anybody ever has issues with these, one thing worth shouting out, which I sort of forgot about: we have an open-source GitHub repository. If you go to github.com/factory-ai/factory, there's a community forum there, our discussion board. You can make pull requests to the docs (we have public docs, cookbooks, and other examples), and you can also open issues if you have any problems with the product. So that might be a good place to find other people in the community. Ben Tossell, our head of developer relations, manages that community.
>> And then it's factory-ai slash... what was the repo name?
>> It's just that first one, factory.
>> Okay, got it. Cool.
>> Yeah, that first repo in the upper left corner.
>> Perfect. Awesome. I'll drop this in the chat. So this is where they can submit issues, get into the discussions, and all the various things here. That's awesome. Oh, and this has the website too. Okay, cool. I'm going to drop this in the chat so people have it.
>> Awesome. Dang, this is super cool. If you haven't had a chance to try the Factory stuff, definitely go ahead and try it out. Like I said, I have that link there. If you're watching on X, I dropped it earlier: rf.me/Rayfactory. That's where you can get started with the trial. This has been an amazing conversation; we've been cooking for about an hour, and I appreciate you taking the time to get on the show, to be here with us, and to show your level of commitment and be a part of what I call the earliest movement we have in AI coding. Historically, I think this is actually something that's going down in history: all of us understanding that this is the time and place where change is really happening. Even if you're just getting started today, just prompting with the models, like you said, having those conversations is going to teach you a lot. You're going to break through all the noise from the external world, where everyone's saying doom this, doom that. I think software is the greatest leverage right now, and now you have pieces of software that help you get your ideas out, and you can learn to be a better orchestrator and all these various things. So I'm extremely excited that you guys are sponsoring the show and giving out 40 million tokens for my audience to try out. And for those who don't know, you can pay 20 bucks a month, and for one month you'll get 20 million tokens, or you can bring your own keys as well. The Droid CLI is the latest thing they've released, and it may look like every other CLI, but it actually does a lot more under the hood for shipping features. For me, I feel like you can just throw the kitchen sink at it and it will really cook for you. If you haven't had a chance to try it, go ahead and check it out. Any closing thoughts? Anything you want to leave the audience with, things they should be doing, or anything you want to share?
>> Yeah, I think the biggest thing is just this: we're coming into this space, and obviously you as developers have so many options and tools available to you. It's an awesome time to be a dev, right? There's a new tool every other week. I also understand that can lead to a feeling of being overwhelmed by hundreds of different options. Our goal, first and foremost, is to come in here and say: we're honestly a team of 35 people in San Francisco who really just want to build something that makes you significantly more productive, that's fun, that you can use, and that helps you be creative, and we're going to keep working extremely hard to make that happen for you. Beyond any individual feature or component that makes it better or worse than other tools, the one thing we'd love to be as a company is the one that really speaks truth about what we're doing: working really hard, trying really hard, building something we think is going to change your workflow, and listening really intently to what you guys need. So please do blow up our emails, blow up our channels with feedback, with nits. Every bug you find helps us improve. Don't be afraid to give it a try and say, "This thing sucks." I'd just love for you to explain why it sucked, so we can make it better for you.
>> Excellent. Thank you so much for hopping on the stream, for being this open, and for leading the charge on this monumental task you guys have taken up. I sincerely believe in this future, and that's why it's really important to me to have this partnership and sponsorship with you guys. It means a lot to me, and it means a lot to my community that you're on here. You're actually my first official sponsor, because I deeply believe in what you guys are doing, and I want to continue to see this grow and for everyone to experience it. Also, a big thank you for not only extending the 20 million tokens but doubling that, so people can get 40 million to really give this thing a spin and throw a lot of stuff at it. And the bigger part of it is your responsibility on the other half, right? I would be so afraid if you guys just weren't responsive, because I don't like to recommend things where people don't get the other half of my own belief system. I deeply believe in good customer service. I deeply believe in small attention to detail. And I deeply believe in agentic coding being the future: not that we as individuals are going to get replaced as coders, but that more than ever there's going to be more demand for people like you, whether you're a designer, a developer, or a person with ideas, making them come to life. And with where you're taking this product pipeline, the planning features and the other suite workflows that senior engineers have now, if you're a junior, you can actually be as powerful as a senior engineer and maybe get those opportunities you're currently missing, or go build that next startup, or chase whatever dreams you want. So I'm deeply enthralled and super excited about everything you guys are doing at Factory AI. Eno, thank you so much for coming on the show. Also, thank you to the community of people who have supported here as members, and a big shout-out to new subscribers, everyone on X and YouTube. Thank you so much for supporting the stream. We'll be having more streams soon, and you'll see me cooking throughout the week, checking out Factory AI and trying to share some of these techniques back. So if you haven't had a chance to, make sure you subscribe, and obviously thumbs up for the algo, because we like to do that type of thing around here. We'll see you guys next time. Thank you.
>> Thanks so much.
>> All right. Peace out.