Karpathy's "autoresearch" broke the internet
By Greg Isenberg
Summary
Topics Covered
- AI Intern Runs Model Optimization Overnight
- Niche Agents Automate Painful Experiments
- AB Testing Evolves Into AI Optimization Loops
- Research Loops Power Market Intelligence Services
- Auto Research Transforms Clinical Trials
Full Transcript
Andrej Karpathy, one of the godfathers of AI, has just launched something called autoresearch. Autoresearch is a huge deal, and it's going viral on Twitter. I wanted to do an episode where I explain, in the clearest way possible, what it is, what the use cases are, how to make money from it, how to be more productive with it, and how to create impact with it.
By the end of this episode, I'm going to give you a bunch of different ideas and use cases for autoresearch, and at the end I'm going to tell you how you can actually get started with it. So let's go right into it.
>> So what is autoresearch? Well, it's like having a super nerd robot intern that runs science experiments on AI models for you all night, without you doing the boring stuff. Sounds intriguing, right? So how do you actually program it or get started with it? Well, the first thing is you've got to give it a goal. You can say something like, "Make this small AI model smarter." That's the goal. Then an AI agent plans what to do (different settings, code changes), edits the Python code for you, and runs a short training experiment on a GPU for about 5 minutes. It reads the results, decides what to change next, and repeats the loop.
In some ways, if you've seen my video on the Ralph loop, where it would basically do engineering 24/7 and you'd wake up to new stuff happening, in simplest terms that's what autoresearch helps you do. You give it a goal, and the AI agent does its thing. You tell the AI what "better" means: cheaper leads, more clicks, higher sales, a better model score. Then the AI keeps changing things, testing them, and it only saves the changes that improve the result. What's really cool is that you wake up, you grab the best version, and then hopefully you turn it into something you charge for, or you give it away. I saw this tweet by Tobi, who's the CEO and co-founder of Shopify.
Autoresearch works even better for optimizing any piece of software. Make an auto folder, add a program.md (that's just a markdown file, which is really the foundation of how you're going to be using autoresearch) and a bench script, make a branch, and let it rip.
That's why I started paying attention to autoresearch. When Andrej Karpathy, a legend, and Tobi and more people started playing with it, I was like, okay, I've got to pay attention. So I created this little visual for how to think about what autoresearch is. You set the goal. The AI plans an experiment. It edits the training code and settings. It runs a short training run on a GPU. By the way, I should mention that you need an Nvidia chip to actually run autoresearch, or you can do it in the cloud. I'll talk about this at the end of the episode, but you do need that. You can't just run it on, let's say, a MacBook M1 or something like that. It reads the metrics and asks: is it a better result? If not, it logs the attempt and discards the config. If yes, it saves the config. Then it plans a different experiment and, hopefully, keeps getting better on your goal, whatever it is.
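The loop in that visual (set a goal, plan an experiment, run it, keep the change only if the metric improves, log everything) can be sketched in a few lines of Python. This is a toy illustration, not Karpathy's code: `run_experiment`, `propose`, and the scoring formula are hypothetical stand-ins for the real training run and the agent's planning step.

```python
import random


def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for a short training run; higher score is better."""
    # Toy objective (an assumption): best near learning_rate=0.01 and width=64.
    return -abs(config["learning_rate"] - 0.01) * 100 - abs(config["width"] - 64) / 64


def propose(config: dict, rng: random.Random) -> dict:
    """Hypothetical stand-in for the agent's planning step: tweak one setting."""
    candidate = dict(config)  # copy, so the current best is never mutated
    if rng.random() < 0.5:
        candidate["learning_rate"] = config["learning_rate"] * rng.choice([0.5, 2.0])
    else:
        candidate["width"] = max(8, config["width"] + rng.choice([-16, 16]))
    return candidate


def research_loop(start: dict, steps: int, seed: int = 0):
    """Try `steps` experiments, keeping a candidate only when it beats the best so far."""
    rng = random.Random(seed)
    best, best_score = start, run_experiment(start)
    log = []
    for step in range(steps):
        candidate = propose(best, rng)
        score = run_experiment(candidate)
        kept = score > best_score
        # Every attempt is logged, whether it is kept or discarded.
        log.append({"step": step, "config": candidate, "score": score, "kept": kept})
        if kept:
            best, best_score = candidate, score
    return best, best_score, log


if __name__ == "__main__":
    best, score, log = research_loop({"learning_rate": 0.1, "width": 16}, steps=50)
    print("best config:", best)
    print("kept", sum(entry["kept"] for entry in log), "of", len(log), "attempts")
```

In the real tool the "propose" step is an AI agent editing code and the "run" step is a GPU training job of a few minutes; the keep-or-discard logic is the part this sketch tries to make concrete.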
Let's get into some of the business ideas around it. But right before that, here's a simple mental model for how I'm thinking about autoresearch.
So, imagine you have a research bot you can boss around. Step one, you write a clear task. For code experiments, maybe it's "improve this model's test score." For business, "figure out the top five competitors for product XYZ and make a short report." Step two, you give the bot access to the code and a GPU for ML experiments, and access to the internet and documents if you're doing a reading task. The bot then runs a loop: it plans, it acts (meaning it might run code or search), it reads the results, and it updates the plan. Then you just come back later (it could be 6 hours, 12 hours, 20 hours) and see that it's logged everything, with charts and metrics, and it gives you a written summary in normal language. So think of autoresearch as a research bot that runs experiments for you while you sleep, tries lots of ideas fast, and keeps the winners.
Quick break to invite you to something. This isn't an ad. I just want to invite you to a free event, because I think you're going to get a lot out of it. I wanted to take one hour where we just talk about building businesses in the age of AI. People say SaaS is dying. I actually believe quite the opposite: I think SaaS is just evolving. I think right now is an incredible time to be building software startups that help you craft your dream life. For all those reasons, I said, let's just book one hour. It's going to be 11 a.m. on March 12th, that's a Thursday, where we can lock in and talk about building businesses in the age of AI. I'll include a link in the description and the show notes to join, and I can't wait to see you there. Okay,
how do we use it? Here are some ideas for you. The first idea I have is niche "agent in a box" products. This can be multiple products. And by the way, when I put out these ideas, I want you to do these ideas. Even if they don't turn into businesses, you will learn about these tools, and that is going to help you outperform 99.9% of people on this planet. So you package tiny autoresearch loops tuned for one painful niche. The examples I think of are an Amazon listing experimenter, an email sequence tuner for realtors, a pricing optimizer for SaaS. Those are autoresearch loops, ideally in a niche that you understand well, and then you charge a monthly fee. The value prop is: this thing runs experiments for you 24/7 and just shows you the winner to click accept. How valuable is that? And how many different niches are there that this plays into? The hard part is figuring out what the pain points are, and then obviously you want to be quick to market, right?
Here's a visual of it. Pick the painful niche. Design the tiny autoresearch loop. Run experiments automatically. See which setup works best. Turn the best setup into a simple agent product. And then you charge that monthly subscription. Number two, you're
going to want to print money using A/B testing for marketing. It's very similar, but instead of doing it for realtors or whatever, you're doing it for ads and landing page experiments. For landing pages, the agent writes variants of headlines, layouts, and offers, pushes them to traffic, measures which one converts better, and keeps iterating. This is conversion rate optimization for landing pages. Think of tools like Optimizely, a SaaS tool. When I first moved to San Francisco, I remember how big they were; everyone was talking about Optimizely and A/B testing. Well, this is the future of that: autoresearch for different landing pages. You can also use autoresearch for ads: it auto-tests creatives, angles, and audiences, and then it keeps the combos that lower CAC or raise ROAS. You profit by running this for your own products, if you want to build your own products and just use this internally, or by offering an always-on experiment engine to clients as a retainer service. For $5K a month, I'm going to give you the best landing pages every single month, and it's just going to come to your inbox, that sort of thing.
Visual of it: the business goal you're giving autoresearch is more sales. It generates things like page and ad versions, sends traffic to the versions, and measures conversion and revenue. Does any version beat the current best? If it doesn't, you keep the current control. If it does, you promote the winner to the new control and ask the AI for new ideas.
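The "does any version beat the current best?" step can be sketched as a champion/challenger check. The episode only says to promote a version that beats the control; requiring a statistical margin (a one-sided two-proportion z-test at roughly the 95% level) is an assumption added here so that noise alone does not promote a variant. The `Variant` class and all numbers are made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    visitors: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.visitors if self.visitors else 0.0


def z_score(control: Variant, challenger: Variant) -> float:
    """Two-proportion z statistic; positive means the challenger converts better."""
    if control.visitors == 0 or challenger.visitors == 0:
        return 0.0  # no traffic yet, so no evidence either way
    pooled = (control.conversions + challenger.conversions) / (
        control.visitors + challenger.visitors
    )
    se = math.sqrt(
        pooled * (1 - pooled) * (1 / control.visitors + 1 / challenger.visitors)
    )
    return (challenger.rate - control.rate) / se if se else 0.0


def pick_control(control: Variant, challengers: list, z_threshold: float = 1.645) -> Variant:
    """Return the new control: a challenger is promoted only if it clearly beats the current one."""
    best = control
    for challenger in challengers:
        if z_score(best, challenger) > z_threshold:
            best = challenger
    return best


if __name__ == "__main__":
    control = Variant("control", visitors=1000, conversions=50)
    variants = [Variant("A", 1000, 55), Variant("B", 1000, 80)]
    print("new control:", pick_control(control, variants).name)
```

The returned variant becomes the control for the next round, and the agent is asked for fresh challengers against it.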
All right. Hope your creative juices are starting to flow. You're starting to understand a little bit more about how it works, how to think about goals, how to think about agents, and how you can set up these loops. Number three, research as a service. Autoresearch's recipe is basically a loop for doing research, right? You're searching, reading, summarizing, comparing, and then repeating. So how do you point that at money problems? Market and competitor research for startups: constantly updated reports on who's doing what, pricing, features, and gaps. Super valuable. Investor and M&A decks: fast technical and market due diligence summaries. Super valuable. Compliance and regulation tracking for niches like crypto, healthcare, finance. Super valuable. You can charge per report, like a one-off, or you can set up a monthly subscription for always-fresh dashboards. The visual: define the client's research question. Autoresearch searches and reads, summarizes and compares findings, and creates reports and dashboards. You deliver the insight to the client, and the client pays per report or monthly, whatever you decide. Number
four, a power tool inside your own product. If you've already built a SaaS or workflow, embed an autoresearch-style agent so your users can press "optimize." I envision a big button that just says "optimize," and the system runs a mini research loop for them. For example: tune prompts, pick the best pricing, rank suppliers. Then you can charge higher tiers for this feature, or you can use it as a wedge to upsell pro and enterprise plans. So maybe that's a part of pro and enterprise. Maybe it's something where you just send an email to your entire list and say, "Hey, we have this really powerful tool. Imagine you press this button. It's like bending spoons, right?" Bending spoons, not the private equity group; I'm talking about the idea that you can bend a spoon. "It's incredible that you'd be able to press a button, optimize, and this would happen." The visual over here: have an existing SaaS. Add an optimize button. Users run mini research loops. The tool suggests better settings or prices. Users see better results. Offer higher-priced pro and enterprise plans.
Number five. This is a saucy episode, by the way. All right: an agency that sells "we run more tests than anyone else." Because autoresearch lets you run hundreds of experiments instead of a few, you have a simple pitch: we do 100 times more testing than other shops for the same or lower fee. Niche examples: a Shopify store conversion lab, a B2B SaaS pricing experiment service, an email subject line and sequence optimizer. You charge per month, plus a bonus if you hit specific KPI lifts: a rev share or performance fee. People love that. Of course they're going to be interested: "Yeah, if you can lift this KPI, we'll give you a bonus." Here's the visual. Start an optimization agency. Use autoresearch to run many tests. Improve stores, pricing, emails, and funnels. Show clients more experiments and wins. Charge a monthly retainer and a performance fee.
Number six, and we've got about 10, so we're almost done. After that we're going to talk about some cool, interesting stories around autoresearch, and then I'll end with how you can set this up, very briefly. So: auto-quant for trading ideas. You can use autoresearch to run small, fast backtests of many simple trading rules, like LLM-based factor screens and sentiment filters, on one GPU overnight. You keep the few strategies that look promising, then either trade on your own account or sell signals and strategy reports. It depends: if you're a trader, maybe you're doing it yourself, or you can sell this as a digital product. So you define the simple trading rules, you run many backtests overnight, you review the strategy performance, you keep only the promising strategies, and you trade your own capital or sell the signals. I think finance is changing a lot, and I think with things like autoresearch, it's going to be an unfair advantage for a lot of people. I think you're going to see a lot more digital products that people sell, and also people just using their own money, trading themselves, instead of giving 1% or whatever to a financial adviser.
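The "run many backtests, keep only the promising ones" idea above might look roughly like this. Everything here is an illustrative assumption: the prices are a synthetic random walk rather than market data, the rule is a plain moving-average crossover (not the LLM-based screens mentioned in the episode), and the function names are hypothetical. Picking winners out of many backtests can also overfit, so a rule that looks good here is a candidate for review, not a validated strategy.

```python
import random
from itertools import product


def synthetic_prices(n: int = 500, seed: int = 1) -> list:
    """Made-up daily prices from a seeded random walk (not real market data)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1 + rng.gauss(0.0003, 0.01)))
    return prices


def sma(prices: list, end: int, window: int) -> float:
    """Simple moving average of the `window` prices ending at index `end` (inclusive)."""
    return sum(prices[end - window + 1 : end + 1]) / window


def backtest(prices: list, fast: int, slow: int) -> float:
    """Total return of a long-or-flat moving-average crossover rule."""
    if not 0 < fast < slow:
        raise ValueError("expected 0 < fast < slow")
    equity = 1.0
    for t in range(slow - 1, len(prices) - 1):
        # Decide using prices up to and including day t, then earn day t+1's move.
        if sma(prices, t, fast) > sma(prices, t, slow):
            equity *= prices[t + 1] / prices[t]
    return equity - 1.0


def overnight_search(prices: list, fasts: list, slows: list, min_return: float):
    """Backtest every valid (fast, slow) pair; return (kept, all_results), best first."""
    results = []
    for fast, slow in product(fasts, slows):
        if fast < slow:
            results.append(
                {"fast": fast, "slow": slow, "return": backtest(prices, fast, slow)}
            )
    results.sort(key=lambda r: r["return"], reverse=True)
    kept = [r for r in results if r["return"] > min_return]
    return kept, results


if __name__ == "__main__":
    kept, results = overnight_search(synthetic_prices(), [5, 10, 20], [20, 50, 100], 0.0)
    print(f"kept {len(kept)} of {len(results)} rules for human review")
```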
I'm sure, by the way, a lot of people are going to get burned by this, too. They're just going to blindly trust autoresearch. You need to have a human in the loop, and you need to manage that accordingly. There are definitely going to be some people who get burnt. They'll just give it a bank account and let autoresearch trade for them. I mean, it would be an interesting test, that's for sure. Number seven, always-on lead qualification and follow-up. Point an autoresearch-style agent at your CRM, like a Salesforce or something like that, and your inbound leads. Let it test rules and messages to see which leads are most likely to buy. It auto-grades the leads, suggests next actions, and drafts follow-ups, so salespeople only focus on high-value deals. So it's more revenue per hour spent. Visual over here for you: connect to the CRM, autoresearch tests the leads, ranks leads by likelihood to buy, and drafts follow-up messages. Sales focuses on the best leads, and revenue per sale increases.
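One way to picture "test rules to see which leads are most likely to buy" is to score candidate ranking rules against past leads whose outcome is already known, keep the rule that puts the most actual buyers at the top, and use it to rank new leads. The lead fields, the two rules, and the sample data below are all hypothetical; a real setup would pull records from the CRM.

```python
def by_company_size(lead: dict) -> float:
    """Hypothetical rule: bigger companies rank higher."""
    return float(lead["employees"])


def by_pricing_intent(lead: dict) -> float:
    """Hypothetical rule: leads who visited the pricing page rank higher."""
    return 1.0 if lead["visited_pricing"] else 0.0


RULES = {"company_size": by_company_size, "pricing_page_intent": by_pricing_intent}


def precision_at_k(rule, history: list, k: int) -> float:
    """Share of the rule's top-k past leads that actually bought."""
    top = sorted(history, key=rule, reverse=True)[:k]
    return sum(1 for lead in top if lead["bought"]) / k


def best_rule(history: list, k: int) -> str:
    """Name of the rule with the highest precision on historical leads."""
    return max(RULES, key=lambda name: precision_at_k(RULES[name], history, k))


def rank(leads: list, rule_name: str) -> list:
    """Order new leads from most to least promising under the chosen rule."""
    return sorted(leads, key=RULES[rule_name], reverse=True)


if __name__ == "__main__":
    # Made-up history: each past lead records whether it ended in a purchase.
    history = [
        {"name": "a", "employees": 500, "visited_pricing": False, "bought": False},
        {"name": "b", "employees": 20, "visited_pricing": True, "bought": True},
        {"name": "c", "employees": 300, "visited_pricing": False, "bought": False},
        {"name": "d", "employees": 15, "visited_pricing": True, "bought": True},
        {"name": "e", "employees": 50, "visited_pricing": False, "bought": False},
        {"name": "f", "employees": 10, "visited_pricing": True, "bought": False},
    ]
    winner = best_rule(history, k=2)
    new_leads = [
        {"name": "x", "employees": 1000, "visited_pricing": False},
        {"name": "y", "employees": 5, "visited_pricing": True},
    ]
    print(winner, [lead["name"] for lead in rank(new_leads, winner)])
```

An agent running this as a loop would keep proposing new rules, re-scoring them as fresh outcomes arrive, and replacing the current rule only when a new one does better.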
Eight, finance ops autopilot for businesses. Use the loop to grind through invoice matching, expense report generation, and exception detection, with continuous small improvements to rules and prompts. You can sell this as "we cut your AP expense time in half," either as software or as an ops service with a small team and an agent. By the way, I can totally see someone starting this and it getting acquired by one of the large fintech companies or one of the large banks. The visual here: ingest invoices and expenses. Autoresearch improves rules and prompts, matches invoices, and detects exceptions. It generates clean expense reports and reduces manual finance work. Then you can sell it as software or an ops service, or maybe you start as an ops service and evolve into the software. Two more for
you. Number nine, an internal productivity lab for your own org. I thought this was interesting. Treat your company like Karpathy's GPU lab. Define KPIs, like response time, close rate, ticket resolution, and let agents iterate on workflows, templates, and routing rules. You get fewer meetings and less manual grunt work, and you personally touch only the high-impact decisions while everyone else rides the improved process. So the goal here: define the key metrics. Autoresearch tests new workflows. It improves templates and routing rules. You cut meetings and manual tasks. That's good. The team focuses on high-impact work. And then higher productivity and, ideally, higher profit. Last idea for you: a done-for-you research or due diligence shop. You use the research loop to chew through docs, filings, product pages, and reviews, and keep an evolving, living memo for clients like investors, acquirers, and execs. You make money by selling fast, well-structured briefs and monthly update packs instead of one-off manual research logs. So the goal: get the investor's or acquirer's question. This happens all the time. Autoresearch reads through docs and filings. It summarizes the product, markets, and risks. It maintains a living memo for the client. It delivers the brief and update packs, and the client pays for reports and ongoing access. I would pay for something like this, so hopefully someone builds it. All
right, so those are a bunch of ideas for you. I also saw a couple of interesting things this morning. My good friend Morgan Linton, who's been on the pod before, says: I woke up this morning and all I can think about is autoresearch. So many ideas swirling around in my head. Not sure 99% of the world realized the incredible breakthroughs Karpathy is making and just sharing casually on X. Right now where my mind is going is medicine. It feels like in many ways clinical trial design is itself kind of like a hyperparameter search. I know right now trials cost tens of millions of dollars minimum. It feels like an agent swarm could optimize treatment protocols on small proxy experiments, promote the most promising candidates, and then move to humans to review. So humans are still very much in the loop, but later on, with experimentation going much deeper, happening faster, and for far less money. I think for me, while I'm not a doctor (he's an engineer), what I'm the most excited about when it comes to AI is the impact it will have on human health and critical areas like disease treatment. Might be a crazy idea, so a real doctor can jump in the comments and slap me on the wrist here. (I looked at the replies. I didn't see any doctors come in.) But I don't know, I just can't stop thinking about how what Karpathy has discovered here could have some pretty profound implications. Only halfway through my coffee, though, but I woke up this morning and this is what I'm thinking about, so I thought I'd share.
I agree. I think there are a lot of really interesting ideas, not just business and profit ideas, but also medicine, science, research. So, I'm
excited for people to take this and continue with it. I also saw this tweet here: What's after autoresearch? It's Karpathy's new open-source project, AgentHub. So Karpathy also launched AgentHub. What is AgentHub? GitHub is for humans; AgentHub is for agents. So it's basically a GitHub for agents, an agent swarm collaboration platform. A very promising direction. I'm watching him speedrun a $1 billion company. If you look at the GitHub for AgentHub, it says the first use case is autoresearch, but it's a lot more general than that. It's an exploratory project. He says: agent-first collaboration platform, a bare git repo plus a message board, designed for a swarm of agents working on the same codebase. Think of it like a stripped-down GitHub where there's no main branch, no PRs, no merges, just a sprawling DAG of commits in every direction, with a message board for agents to coordinate. I think this is really interesting, and whenever Karpathy is up to something, I'm always paying attention, so I had to put that one in there as well. So,
maybe you've gotten to the end of this episode and you're thinking, okay, I think I understand what autoresearch is. Karpathy's a G. Tobi's a G. All these smart people are playing with it. How do I get started? Well, to get started, I'd recommend you just tell Claude Code to get you started. I went ahead and gave Claude Code the link to the autoresearch GitHub repo. And wow, 25,000 stars already. This is crazy. It's growing really quick. So I gave it the link and said, "I need help installing autoresearch by Karpathy." And it says, here's how to install and set up autoresearch by Karpathy. You need an Nvidia GPU, which I talked about in the beginning. It was tested on an H100, but other Nvidia GPUs should work. And you need the uv package manager, so you have to install uv. You clone the repo, you install the dependencies, you prepare the data, and you run a training experiment. In my case, I don't have an Nvidia GPU. I'm actually using a MacBook with an M1 Pro. I know, I need an upgrade to a new Mac. So,
I was like, "So wait, I need an Nvidia GPU to do this?" But there are a few options. Cloud GPU: you can rent an Nvidia GPU from a service like Lambda Labs, Vast.ai, RunPod, or Google Colab. Some offer free-tier GPUs. This is the most straightforward path. So that's the answer for people who don't have an Nvidia chip: just rent one on one of these services. I personally use Google Colab. Why? I just know Google the best and trust Google the best. It also says you can try it on Apple Silicon via an MPS backend. I'm like, no, I'm not going to do that. So that's the route I took: I went on Google Colab. The easiest way to get started: you go to colab.research.google.com, you create a new notebook, you change the runtime type to T4 GPU, and you run a bunch of commands. That might sound complicated, but this is what Colab looks like. You literally just listen to what Claude Code tells you to do, paste it in, and you can get started. If people are interested, I can spend more time with autoresearch and share more about it as I'm learning, but I just wanted to give you a quick primer on what it is, why it's important, what some ideas are for how you can actually use this thing, and how people are installing it. You can use Claude Code as your helper to get it installed, and you're going to want to rent a GPU in the cloud, at least to start. So, hope this has been
helpful. This is another solo podcast that I'm doing on the Startup Ideas Podcast. The last time I did this, last week, I had a lot of comments that said, "Yeah, Greg, I actually really like when you just come in solo and start telling us what's on your mind in real time." So I'm here. I read every single comment. So keep commenting, keep liking, keep subscribing, and I'll keep putting this out there for you for free. I'm excited to see what you end up using this for. Of course, it's early, right? This is brand new. People are still trying to figure out what the use cases are. But I always find that in the fog, when people don't really understand where the opportunity is, that's sometimes when there's an opportunity. One thing I've learned in my career is that when I see people like Karpathy doing things like this, you want to pay attention. You want to tinker with it. You want to have some fun with it, and you want to see what it's all about. So thanks again for giving me your time. Hope this has been clear. Share this with a friend who you think would find it valuable. And if you need more ideas on startups to build with AI, ideabrowser.com is definitely your place to go. I'll see you in the comment section, and I'll see you next time. You know, have a creative