
Self Coding Agents — Colin Flaherty, Augment Code

By AI Engineer

Summary

Topics Covered

  • AI Agents Self-Build 90% of Codebase
  • Agents Optimize Their Own Performance
  • Context Engine Powers Agent Success
  • Onboard Agents Like New Hires
  • Tests Unlock 20% Agent Autonomy

Full Transcript

[Music] Hi everyone, thanks for coming today. I want to talk to you about something that sounds like science fiction but very much is reality: an AI coding agent that helped build itself. My name is Colin, and I'm an AI researcher at Augment Code, a company building AI-powered dev tools for software engineering orgs. I want to share with you a little bit about our journey working on AI coding agents.

Zooming out: AI dev tools is a fast-changing space. Everyone remembers that in 2023 we were all talking about autocomplete models, GitHub Copilot probably being the one that comes to mind. In 2024, chat models really started to penetrate software engineering orgs. In 2025, though, we think AI agents are going to dominate the conversation about how software engineering is changing. So naturally, a few months ago we started building our own agent at Augment.

I want to show you a sneak peek of what we built and share some hard-learned lessons about how this tech works. I want to reiterate that I've been really amazed to see the extent to which this agent has helped build itself. One fun statistic: we have about 20,000 lines of code in our agent codebase, and over 90% of that was written by our agent, with human supervision. So what does it mean for the agent to write itself? Implementing core features, for one. One of the first things we had to add was third-party integrations.

If our agent is going to work like a software engineer, it needs to interact with Slack, Linear, Jira, Notion, search Google, and muck around in your codebase, so we wanted the agent to help us build these features. After we added the first few ourselves, we found that when we gave it an instruction like "add a Google search integration," it was able to look through our codebase for the right file, figure out the right interface to use, and add it. One fun anecdote: when we were adding the Linear integration, the foundation model we're using didn't have the Linear API docs memorized, so the agent used the Google search integration, which it had written previously, to look up the Linear API docs, and then it was able to add the integration. We also used it to write tests.

We found that if we asked it something like "add unit tests for the Google search integration," it was able to add those. To do this, we just had to give it some basic process management tools: things like running a subprocess, interacting with it, not hanging if there's an infinite loop in some test it wrote, and reading the output.

I think this is super interesting: everyone has seen the Twitter demos of these agents writing features and writing tests, but I haven't yet seen a compelling example of them performing some kind of optimization. Well, over the course of our project we noticed the agent was pretty slow, and we weren't sure why, so we asked it to profile itself.

What it ended up doing, using all the tools we'd given it, was to add print statements to its own codebase, run sub-copies of itself, and look through those print statements. It figured out there was a part of our codebase where we were loading all the files in the user's repository synchronously and hashing them synchronously, so it added a process pool to speed this up, plus a stress test to confirm it was all working. By the end of this we had reached about 20,000 lines of code, and again, over 90% of that was written by the agent, with our help and supervision.

So let's walk through a couple of quick examples to see how the agent works. I'll focus on simple examples where it's reliable, so you can follow along easily. Here I asked the agent, "Are you able to search Google?" and it notes that it found a tool called Google search.

For those who aren't familiar with the notion of tools (I'm sure most of you are, but I'll quickly reiterate): the idea is that we have a master-level agent doing all the planning, and it has access to certain tools it can use to interact with its environment. Those are the third-party integrations I talked about, like Google, or editing a file in the user's repository. The agent then wants to confirm that this Google search tool is working, so it sends it a query of "test," and it responds to us, "Yes, I can search Google, and I see the first 10 results."
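The tool mechanism he sketches verbally can be made concrete with a small registry plus a dispatcher. All names here are invented for illustration, and the Google search tool is a stub rather than a real API call:

```python
from typing import Callable

# Registry mapping tool names to the Python functions the agent may call.
TOOLS: dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function as an agent tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("google_search")
def google_search(query: str) -> str:
    # A real implementation would call a search API; stubbed for illustration.
    return f"top 10 results for {query!r}"

def dispatch(tool_name: str, **kwargs) -> str:
    """Run the tool the planner picked; its return value becomes the observation."""
    if tool_name not in TOOLS:
        return f"unknown tool: {tool_name}"
    return TOOLS[tool_name](**kwargs)
```

The planner loop then just alternates between the model choosing a `(tool_name, kwargs)` pair and `dispatch` feeding the result back into the conversation.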

Let's try something a little more complicated: I ask it to instrument the agent's Google search tool with logs and then generate an example. It uses our retrieval tool, which allows it to search the local codebase, looking for a file related to the Google search integration. It finds the file deep in our directory hierarchy, at services/integrations/third_party/google_search_tool.py, and then calls its file editing tool to quickly and performantly edit that file and add those print statements.

This is a continuation of the last example. Having added those print statements, it now wants to run a sub-copy of itself so it can look at their output, because we asked it for example logs. But in doing so, it finds that we don't have Google credentials authorized, so it uses its clarify tool to ask the user: "I don't see Google credentials. Would you like me to (1) add a stub for the Google API, or (2) guide you through setting up credentials?" I note that the credentials are actually stored in augment google api.json; it had just missed this. And then here's a really cool extra feature we have: we want the agent to continuously learn as it interacts with humans.

Here it thought it was probably a good idea to remember where the Google credentials are stored, so it called its memory tool to create a memory of where they live and save that for later. This is another example of how having a really good context engine is critical to getting the agent to work well.
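A memory tool in this spirit can be as small as an append-only JSON file the agent writes facts into and searches later. The file name and the naive substring matching are assumptions for illustration, not Augment's implementation:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memories.json")  # illustrative location

def remember(fact: str) -> None:
    """Append a learned fact so future runs can retrieve it."""
    memories = search("")  # empty keyword matches every stored memory
    memories.append(fact)
    MEMORY_FILE.write_text(json.dumps(memories))

def search(keyword: str) -> list[str]:
    """Return stored memories containing the keyword (case-insensitive)."""
    if not MEMORY_FILE.exists():
        return []
    return [m for m in json.loads(MEMORY_FILE.read_text())
            if keyword.lower() in m.lower()]
```

At the start of each run the agent can search this store, so a fact learned once (like where credentials live) never has to be re-asked.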

Now we get our output: it prints the logs, showing that it searched with the example string "p programming language," and it gives some example URLs that were returned by Google: python.org and wikipedia.org. So we had the agent add logs to itself, run itself, and learn from user feedback, and it used all kinds of tools along the way: Google search, codebase retrieval, file editing, clarification from the user, and memorizing useful learnings.

So let's fast-forward and talk through some of our lessons from building this. I just want to note that we've been working on AI coding tools for a couple of years now, and we didn't set out to build agents; we've worked on things like completion models, chat, and so forth. But our focus the whole time was on building a super powerful, scalable, enterprise-ready context engine, because we knew that no matter how good these LLMs get, you're going to need that context. We also thought a lot about how to build great UI/UX so AI can seamlessly interoperate with humans.

It turns out this context engine and all that thinking around design provided a great foundation for us to quickly build this agent in just a couple of months. The three most important things were: access to context, via that context engine with all its different context sources, whether Slack or the codebase; reasoning capabilities from a best-in-class foundation model; and a code execution environment, so you can safely run commands in a customer's environment.

Now let's talk through a couple of assumptions we have frequently fallen into and remedied, and that some of you might encounter as well. The first is that L5 agents are here, that agents are at senior software engineer level. If you look at the Twitter demos, it can often seem like that: you have an agent write an entire website all on its own. In reality, professional software engineering is rarely zero-to-one, and the environments we're coding in are a lot messier than what those demos show you. As a result, these agents aren't quite there yet, but they're still super useful.

One framework I've seen people use when trying to figure out how to use these agents, and how to build them, is to assume agents will take over entire categories of tasks: first you build an agent that solves backend programming, then one focused on frontend, and maybe one focused on testing. In reality, this technology is very general-purpose, so instead of thinking about categories of tasks, we found it more helpful to think through levels of complexity. Our agent is decently good at tasks across frontend, backend, security, and so forth, and we're improving its capability level along all those fronts at once, because again, it's a very general-purpose technology.

We've also seen people anthropomorphize agents, assuming they're just like human software engineers: they map the characteristics of a weak software engineer onto what they think a weak agent would look like, and vice versa for strengths. In reality, agents have different strengths and weaknesses than humans. You may have an agent that can't do math but can implement a whole frontend feature way faster than any human could, and it's important to keep this in mind. Now let's talk through a couple of reflections and lessons. Here I asked the agent, "Can you create a stack of two PRs for the new reasoning module using Graphite?" Graphite is a version control tool for working with Git; it lets you stack PRs, which makes them a lot easier to review.

Unfortunately, foundation models have not memorized how Graphite works, so our general agent responds, "I don't know what Graphite is, so I'll use Git," and then calls our terminal tool to run a git checkout command. Well, what do we do here? We wanted it to use Graphite, and we can't exactly tell OpenAI or Anthropic to retrain the model to understand Graphite overnight. So we came up with the notion of a knowledge base: a set of information that we want the agent to understand but that it currently doesn't, which lets us patch holes in its knowledge.

One thing we wanted to add was this Graphite knowledge, so we created a markdown file describing Graphite: how to run common commands, things like using gt create to create a PR, and some things not to do. We created other files in our knowledge base for things like details on our tool stack, how to run tests, and our style guide. We then add this to the agent's context so it can dynamically search the knowledge base when it doesn't understand something.

Once we added this, we ask it again, "Can you create a stack of two PRs for the new reasoning module using Graphite?" This time it consults the knowledge base, reads about Graphite, and then can run the gt create command. So what's the learning here? Onboarding the agent to your organization is crucial. The analogy I like is hiring a new software engineer: you wouldn't tell them to just stare at the codebase for three days to figure out how your tech stack works. You'd let them ask you questions, and when there were things they didn't understand, you'd add some additional documents to your Notion. We should think similarly about agents.

Recall the third-party integrations we added, whether Linear tools or Slack tools and so forth. When we were working on these, we weren't really sure which ones to prioritize first on our product roadmap. In a normal world we'd make some educated guesses, implement a couple of them, and go from there; but with the agent, we were able to build them all at once. This starts to change the calculus around how product management works: if you can build everything at once, then maybe engineering hours aren't the bottleneck on what we build, and we start to be bottlenecked a little more on good product insights and good design. When code is cheap, you can explore more ideas.

Also recall the earlier example where we instrumented the agent's Google search tool with logs, and it was able to find the file to edit. Notice that we didn't have to give the model a very precise instruction: we just told it in natural language, the way we'd talk to another engineer, to instrument the agent's Google search tool, and it figured out which file to edit on its own. This only worked because we have that really good codebase awareness. We can also use the agent for tasks outside of writing code but still within the software development life cycle.

Here we asked it to look at the latest PRs in our codebase, generate an announcement about them, and post it to Slack. The post was titled "New tools for the CLI agent," and it talked about things like Slack notifications and the Linear integration. This only works because we had that Slack integration and understood our codebase well. The figure from the beginning of the talk may look familiar: we actually had the agent make that as well. We asked it, "Make me a plot of the agent's lines of code as a function of the date."

Good context was critical in all three of these tasks: we needed to pull in context from different sources, and it's not just the codebase; context comes in many forms. Also note that it's multiplicative: having access to both the codebase and Slack is four times as useful as having access to just one of them. Finally, I want to switch over and talk about testing. Here's a really hard-to-test edge case in our code.

The agent actually wrote this, and we only caught it because of some unexpected runtime behavior. We have caches that the agents store relevant information for their runs in; we can run multiple agents in parallel, and they all write to the same cache location. The agent wrote a save function for this location and put a lock around the json.dump, so there were no race conditions that would explicitly fail if multiple agents all wrote to the cache at the same time. But notice that there is no read before writing to the cache. As a result, you can hit a race condition where multiple agents running in parallel all overwrite each other's caches.

So when the agent wrote this save function, why did it miss the issue? Well, these agents make mistakes, and this is a hard-to-test situation: there's parallel programming, and there's a cache involved. So we didn't have a test, and because we didn't have a test, the agent messed up. My learning here is that we need to be very careful about having sufficient tests. We also have a pretty incredible statistic from our internal bug-fixing benchmark.

We found that when we upgraded our foundation model by about six months, our score on this benchmark improved by 4%. But when we added the ability to run tests, so the agent could suggest a fix for a bug, run the tests, look at the feedback, suggest another fix, run the tests again, and repeat up to four times, that led to a 20% gain on the benchmark. So what's the lesson? Better tests enable more autonomy: you can trust these agents more, and it just makes them smarter.
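That retry loop is straightforward to sketch. Here `suggest_fix` and `run_tests` are hypothetical stand-ins for the agent call and the test harness, not real Augment APIs:

```python
def fix_with_tests(bug_report: str, suggest_fix, run_tests, max_attempts: int = 4):
    """Let the agent iterate: propose a fix, run the tests, feed failures back."""
    feedback = ""
    for _ in range(max_attempts):
        patch = suggest_fix(bug_report, feedback)  # agent proposes a candidate fix
        passed, feedback = run_tests(patch)        # harness returns pass/fail + output
        if passed:
            return patch                           # tests green: accept the fix
    return None                                    # give up after max_attempts
```

The test feedback is what makes each retry better than a blind resample, which is presumably why four attempts beat a six-month model upgrade on this benchmark.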

So what does software engineering look like in a world of agents? Agents didn't work last year, but now they're pretty good. If you'd asked me two years ago whether we'd be working on this tech, I frankly wouldn't have guessed it. There's a compounding effect where these agents are starting to help build themselves, and that's only going to accelerate the pace at which they improve. Code isn't going away, because it's the spec of our systems, but our relationship to it is changing. Good test harnesses are becoming more important than ever, and we need to be especially careful about the parts of our codebases that tend to be less well tested. And the calculus of product development is changing: if code becomes super cheap to write, then our focus shifts more toward good product work, gathering customer feedback quickly, and building insights.

We're really excited about how this tech is going to positively transform our industry, and we'll be releasing our agent soon, so I'm really excited to share it with you. Find me after the talk if you want to discuss more. Thanks!

[Applause]

[Music]
