How to Build a Personal LLM Knowledge Base (Karpathy’s Method)
By Upgraded
Topics Covered
- The smartest AI experts use AI to build wikis, not code
- LLMs know consensus, not cutting-edge truth
- Ask your wiki to reveal your blind spots
Full Transcript
A few days ago, Andrej Karpathy, who was one of the early members of OpenAI and literally coined the term "vibe coding," posted a tweet that got 14 million views in 72 hours.
And when Lex Fridman saw it, he replied, "I do the exact same thing."
So what are two of the smartest people in AI doing with their time right now?
And the answer is that they are building personal LLM knowledge bases.
I've also heard this called an AI second brain, but it's essentially an AI-maintained wiki that grows over time.
You can ask complex questions against it, and it never forgets anything, so it gets stronger over time.
And they do the whole thing using just Obsidian, Claude, and files on their local computer.
And in this episode, I'm gonna show you exactly how to build one.
Because if the smartest people in AI are doing this, we probably should be too.
So before I show you the tutorial, let me walk you through how this thing works at a high level.
Because once you understand the concept, the tutorial will make way more sense.
There are three phases.
Collect, compile, and query.
Step one is collect.
You find the best content on the topic that you want to get smarter about.
The example that I'm going to use for the tutorial is lifting for hypertrophy.
So, building muscle, because that's something I let slip over the past few months, but I'm getting back into it now and really want to optimize it.
And the thing is, LLMs know a lot of information, but it's sort of an aggregate of what is generally agreed upon.
So if you want cutting-edge research or more niche information, it's harder to get that from an LLM.
So you find the best content on the topic that you're interested in.
You can pull from YouTube transcripts, Substacks, podcast transcripts, and research summaries, and then you save them all into a folder on your computer.
The tool they use for this, and the one I'll be using, is the Obsidian Web Clipper, which makes it really easy to save any of this information into a file on your computer.
So these are your raw ingredients.
You're not organizing anything yet, just collecting.
Phase two is compile.
You take everything in that folder and hand it to Claude with a specific prompt, and I'll show you the one I use.
Then Claude reads every file and creates a structured wiki: a set of interconnected Markdown files, broken down by concept, where each one links to other relevant ones.
To view and navigate that wiki, we'll use Obsidian, which, if you don't know, is a really popular note-taking and mind-mapping tool, and it's especially good at letting you read documents in a really clean and beautiful way.
Think of Claude as the one building and maintaining the wiki.
It's sort of like your librarian.
And then Obsidian is the tool that you use to look at it.
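To make "interconnected Markdown files" concrete: Obsidian links notes with double-bracket wikilinks like [[Mechanical Tension]] by default, so the links in each wiki article can be read straight out of the text. Here is a minimal Python sketch, assuming Claude writes its links in that [[Note]], [[Note|alias]] or [[Note#Heading]] style (it may instead use standard Markdown links, which this doesn't handle); the function name and sample note are made up for illustration.

```python
import re

# Obsidian-style links: [[Note]], [[Note|alias]], [[Note#Heading]], [[Note#Heading|alias]].
# Group 1 captures only the note name.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(markdown: str) -> list[str]:
    """Return the distinct note names a page links to, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in WIKILINK.finditer(markdown):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)


page = """# Progressive Overload
Keeps [[Mechanical Tension]] rising over time.
See also [[Deloading#When to deload|deloads]] and [[Mechanical Tension]].
"""
print(extract_links(page))  # ['Mechanical Tension', 'Deloading']
```

Each article is just text on disk, which is why both Claude and Obsidian can work with the same folder.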
Phase three is query.
So now that we have the wiki built, you go back to Claude, point it at the wiki, and start asking complex questions.
So not just what did X person say about this topic?
But what's the actual debate between all of the sources in my wiki around this topic?
Or what are the gaps in my knowledge right now?
And then every time you find something new, you can just drop it into the wiki.
You can add your own thoughts or information in there and tailor it as needed, and down the line you've got something insanely valuable that grows with you.
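If you keep dropping new sources into the raw folder, it helps to know which ones Claude hasn't compiled yet. A minimal Python sketch, assuming a hypothetical compiled.json manifest that you maintain yourself (nothing in Obsidian or Claude creates it) and that sources are .md or .txt files; the function names are illustrative.

```python
import json
from pathlib import Path

SOURCE_SUFFIXES = {".md", ".txt"}  # assumption: clipped notes and pasted transcripts


def _load(manifest: Path) -> set[str]:
    """Read the set of already-compiled file names (empty if no manifest yet)."""
    if not manifest.exists():
        return set()
    return set(json.loads(manifest.read_text(encoding="utf-8")))


def new_sources(raw_dir: Path, manifest: Path) -> list[Path]:
    """Return source files in raw_dir that the manifest doesn't list yet, sorted by name."""
    done = _load(manifest)
    return sorted(
        p for p in raw_dir.iterdir()
        if p.is_file() and p.suffix in SOURCE_SUFFIXES and p.name not in done
    )


def mark_compiled(manifest: Path, files: list[Path]) -> None:
    """Record files as compiled so they aren't handed to Claude again."""
    done = _load(manifest) | {p.name for p in files}
    manifest.write_text(json.dumps(sorted(done), indent=2), encoding="utf-8")
```

Subfolders such as the generated wiki are skipped because only files are listed. After Claude folds the new files into the wiki, call mark_compiled on them.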
So now let's jump into the tutorial.
Step one is downloading Obsidian.
So just type it in, click the first link and download.
Once that's installed, open it up, and the onboarding flow will ask you to create a new vault.
So we'll hit create, and then this is where we're going to give our vault a name.
Typically, people name it "raw," since it's where all the raw files you're inputting go.
Then you pick a location on your computer to put it and hit create.
So now we have our new vault, and we don't have to do anything here just yet, because we're going to collect information and put it into that vault, which is just the folder we created.
Now the next thing we're going to do is go to the Chrome Web Store and download the Obsidian Web Clipper.
So this one here.
And now we are going to start gathering information.
So I would start with a topic you're really interested in, one that you want to learn about and grow in.
Like I said, for me, that is hypertrophy, building muscle as efficiently as possible.
So, for example, here I pulled up a podcast transcript of an interview with Dr. Andy Galpin, who is a muscle-building expert.
He was on the Huberman Lab podcast.
So I've got this.
I'm just going to click the little Obsidian Web Clipper icon and hit "Add to Obsidian."
Boom.
Now I have that one in here.
If there's something that you would rather download as opposed to clipping, you can just download it, find where it was downloaded to and then move it into that Obsidian folder.
So here I'm moving it into the raw folder.
So that's how you add information.
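For the download-and-move step, a small Python helper can do the move without clobbering a same-named file already in the vault. This is a sketch under assumptions: the raw folder path is whatever you chose during onboarding, and the "name (1).md" renaming scheme and the function name are just illustrative choices.

```python
import shutil
from pathlib import Path


def add_to_raw(download: Path, raw_dir: Path) -> Path:
    """Move a downloaded file into raw_dir, renaming it instead of overwriting."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / download.name
    counter = 1
    while target.exists():
        # e.g. "notes.md" -> "notes (1).md" -> "notes (2).md"
        target = raw_dir / f"{download.stem} ({counter}){download.suffix}"
        counter += 1
    shutil.move(str(download), str(target))
    return target
```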
Now I'm going to fast forward here through me adding a bunch of this stuff.
But this is a very key phase as we talked about.
So spend some time here finding the best resources, the things that you want to pull from and adding them into Obsidian.
And then one nice tip if you want to grab YouTube videos: YouTube has built-in transcripts, so just go to whatever video you want, scroll down in the description, and you'll see the transcript.
Just click "Show transcript," highlight the whole thing, copy it, and then create a new text file with the transcript of the video and save it in your raw folder.
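Pasted YouTube transcripts usually come with timestamps mixed into the text. A minimal Python sketch for tidying one before saving it to the raw folder, assuming the timestamps look like 0:03 or 1:02:03 at the start of a line; the function names are hypothetical, and titles containing characters like / would need sanitizing first.

```python
import re
from pathlib import Path

# Assumed format: a timestamp such as "0:03" or "1:02:03" at the start of a line,
# either alone on its line or followed by the caption text.
TIMESTAMP = re.compile(r"^\s*\d{1,2}:\d{2}(?::\d{2})?\s*")


def clean_transcript(pasted: str) -> str:
    """Drop timestamps and blank lines, then join the captions into one paragraph."""
    pieces = []
    for line in pasted.splitlines():
        text = TIMESTAMP.sub("", line).strip()
        if text:
            pieces.append(text)
    return " ".join(pieces)


def save_transcript(raw_dir: Path, title: str, pasted: str) -> Path:
    """Write the cleaned transcript to raw_dir/<title>.md and return its path."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    path = raw_dir / f"{title}.md"
    path.write_text(f"# {title}\n\n{clean_transcript(pasted)}\n", encoding="utf-8")
    return path


print(clean_transcript("0:00\nwelcome back\n0:04\ntoday: hypertrophy\n"))
# welcome back today: hypertrophy
```

Saving as .md rather than .txt means Obsidian shows the transcript alongside your clipped notes.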
So now I've gathered a bunch of resources. You don't need a ton.
I would say start with 10 really solid resources rather than trying to get a hundred right off the bat.
Remember, you can always add more.
Now that we have all of that information, we're going to go into Claude Code.
You will need Claude Code or Claude Cowork for this, which costs $20 a month on the monthly plan.
I'm going to paste in this prompt.
I'll link to a more general prompt with fill-in-the-blanks that you can apply to your specific use case.
But this one illustrates the point just fine.
It says: I have a folder of raw sources from four experts (these guys here).
Read all of these and build me a structured wiki organized by concept, not by source or person.
For each major concept, write a standalone article that summarizes what the evidence says, notes where the sources disagree and why, and links to related articles.
Also create an index.md file that lists every article, so you can go back and reference these.
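The fill-in-the-blanks version of the prompt can be as simple as a string template. This Python sketch paraphrases the prompt read out above; the wording, the function name, and its parameters are illustrative assumptions, not the linked prompt itself. Asking for [[wikilinks]] is an added assumption so the links match Obsidian's default link style.

```python
def build_compile_prompt(topic: str, source_count: int, source_kind: str = "experts") -> str:
    """Fill in the blanks of a wiki-compilation prompt (illustrative wording)."""
    return (
        f"I have a folder of raw sources on {topic} from {source_count} {source_kind}. "
        "Read all of them and build me a structured wiki organized by concept, "
        "not by source or person.\n"
        "For each major concept, write a standalone Markdown article that:\n"
        "- summarizes what the evidence says\n"
        "- notes where the sources disagree, and why\n"
        "- links to related articles using [[wikilinks]]\n"
        "Also create an index.md file that lists every article."
    )


print(build_compile_prompt("lifting for hypertrophy", 4))
```

Paste the printed text into Claude Code, swapping in your own topic and source count.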
Now we are going to select our folder.
So we're going to click this raw folder where we have everything and then we're going to hit enter.
You'll have to allow things as it goes through this.
Now the real question is, what do you do while you're waiting for Claude Code?
If anybody has figured out a good answer for that, let me know.
Okay, we are done.
So this took about 20 minutes, 32.1k tokens.
I did have some pretty long files in there.
They're pretty wordy.
Yours might be shorter.
So now you can see it in Ben's Obsidian folder.
If we go into raw, we have our wiki, with all of these Markdown files on the key things we asked about.
So now if we go into Obsidian and choose File > Open Vault, we'll open this wiki folder.
Now you can see on the left here we have all of these files with the key concepts that we wanted to learn about.
So this is pretty cool.
We've got body recomposition, calorie surplus, hypertrophy mechanisms, strength versus hypertrophy, sleep and recovery.
This is awesome.
And then we have the index file that has all of the key articles here.
It lists progressive overload, if you want to understand the single most important training principle, and body recomposition, if you want to build muscle and lose fat at the same time, and then it breaks everything down into these articles.
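Claude writes index.md itself, but if you add or rename articles by hand, you can regenerate a basic index. A minimal Python sketch, assuming one article per .md file in the wiki folder, a file name that matches the note name, and that the first non-heading line works as a one-line summary (all assumptions about how your wiki is laid out); the function names are hypothetical.

```python
from pathlib import Path


def summary_line(text: str) -> str:
    """Return the first non-empty line that isn't a Markdown heading, or ''."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return ""


def build_index(wiki_dir: Path) -> str:
    """Build index.md text with one [[wikilink]] (plus summary) per article."""
    lines = ["# Index", ""]
    for path in sorted(wiki_dir.glob("*.md")):
        if path.name == "index.md":
            continue  # the index shouldn't list itself
        summary = summary_line(path.read_text(encoding="utf-8"))
        lines.append(f"- [[{path.stem}]]" + (f": {summary}" if summary else ""))
    return "\n".join(lines) + "\n"


def write_index(wiki_dir: Path) -> Path:
    """Overwrite wiki_dir/index.md with a freshly built index."""
    target = wiki_dir / "index.md"
    target.write_text(build_index(wiki_dir), encoding="utf-8")
    return target
```

This overwrites whatever Claude wrote, so it's best kept for quick fix-ups rather than replacing the richer index Claude produces.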
This is sweet.
Another cool thing with Obsidian is the graph view, where you can see how all of these things are interconnected.
So, for example, body recomp is tied heavily to genetics and age, sleep and recovery, calorie surplus, macronutrients, et cetera.
Honestly, sometimes I think this looks cooler than it really is helpful.
But depending on who you are and how advanced your usage is, it can be really cool to see how things string together.
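The graph view is essentially drawing the [[wikilinks]] between notes. As a rough text-only equivalent, this Python sketch counts each note's incoming plus outgoing links to find the most connected ideas. It assumes wikilink syntax, skips index.md (which links to everything), and the function names are hypothetical; Obsidian's own graph is computed by the app, not by this code.

```python
import re
from collections import Counter
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures the note name inside [[...]]


def link_graph(wiki_dir: Path) -> dict[str, set[str]]:
    """Map each note to the set of notes it links to (index.md is skipped)."""
    graph: dict[str, set[str]] = {}
    for path in wiki_dir.glob("*.md"):
        if path.stem == "index":
            continue  # the index links to everything and would swamp the counts
        text = path.read_text(encoding="utf-8")
        targets = {name.strip() for name in WIKILINK.findall(text)}
        graph[path.stem] = targets - {path.stem}  # ignore self-links
    return graph


def most_connected(graph: dict[str, set[str]], top: int = 5) -> list[tuple[str, int]]:
    """Rank notes by outgoing plus incoming links, roughly the edges the graph view draws."""
    degree = Counter()
    for source, targets in graph.items():
        for target in targets:
            degree[source] += 1
            degree[target] += 1
    return degree.most_common(top)
```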
But this isn't even the end of it.
If we go back to Claude code now, we can query and ask it questions based on the wiki.
So now I'm going to ask it something like, based on the information in my wiki, what is the most important variable for hypertrophy, how do I optimize it, and where do my four sources agree versus disagree?
And then we just have to make sure we point it at the actual wiki folder.
So go in here and change it to the wiki.
And one of the great things about having the wiki, and specifically the index in the wiki, is that it's way easier, quicker, and more efficient to get the information.
So you're not going to use a bunch of tokens once you have the index built out.
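Why the index saves tokens: instead of reading every article, you can read the short index first and open only the articles whose entries match the question. This Python sketch illustrates that idea with plain keyword overlap; it is not how Claude Code decides what to read, it assumes index entries look like "- [[Note]]: summary", and the stop-word list and function names are illustrative.

```python
import re

# Small, illustrative stop-word list so filler words don't count as matches.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "what", "how", "do", "i", "my", "of", "to", "for", "in"}
ENTRY = re.compile(r"-\s*\[\[([^\]|#]+)[^\]]*\]\]:?\s*(.*)")  # "- [[Note]]: summary"


def words(text: str) -> set[str]:
    """Lowercase alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def pick_articles(index_md: str, question: str, limit: int = 3) -> list[str]:
    """Return up to `limit` note names whose index entries share the most words with the question."""
    wanted = words(question) - STOP_WORDS
    scored = []
    for line in index_md.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue
        name, summary = match.group(1).strip(), match.group(2)
        score = len(wanted & words(f"{name} {summary}"))
        if score:
            scored.append((score, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep index order
    return [name for _, name in scored[:limit]]
```

Only the returned articles would then need to be read in full, which is the token saving in miniature.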
So it's saying the most important variable, mechanical tension, is unambiguous.
They all agree that mechanical tension is the primary driver of hypertrophy.
Progressive overload is the meta principle.
It's how you ensure mechanical tension keeps increasing over time.
Sources call it non-negotiable.
So this is awesome.
Great answer.
But then another one that I really like to ask is: "Where are the gaps in my wiki?
What knowledge is underrepresented that I should be learning more about or including more of?"
So this basically allows you to self-improve your wiki: you can ask what information is missing from it, and it'll suggest things to add.
So what does it say?
The wiki is genuinely strong on fundamentals, but the gaps are mostly in application, troubleshooting, and specialization.
So there's no decision tree for debugging stalled progress.
That's a cool one.
Deloading strategies is a pretty cool idea, and so are advanced intensity techniques, things like drop sets, rest-pause, myo-reps, supersets, et cetera. Yeah, so all of this makes sense, but again, this is a great question to ask consistently and then build on.
A cool use case here: Lex Fridman has said that he's taken this and fed it into voice mode, so when he goes on long runs he can chat with his wiki about key information.
He could even try explaining something and see what the wiki says he's missing: where is he not understanding something in his explanation?
This is like the Feynman technique, if you've ever heard of that.
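Asking Claude for gaps is the main approach here. A cheap mechanical complement is to list links that point at articles that don't exist yet, since those are concepts the wiki mentions but never explains. A minimal Python sketch, assuming wikilink syntax and exact, case-sensitive name matches (Obsidian's own link resolution may be more forgiving); the function name is hypothetical.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # captures the note name inside [[...]]


def missing_articles(wiki_dir: Path) -> dict[str, list[str]]:
    """Map each linked-but-unwritten note name to the articles that link to it."""
    existing = {p.stem for p in wiki_dir.glob("*.md")}
    missing: dict[str, list[str]] = {}
    for path in sorted(wiki_dir.glob("*.md")):
        for target in WIKILINK.findall(path.read_text(encoding="utf-8")):
            target = target.strip()
            if target in existing:
                continue
            mentioned_by = missing.setdefault(target, [])
            if path.stem not in mentioned_by:
                mentioned_by.append(path.stem)
    return missing
```

Each key is a candidate article to research and add, and the list can be pasted into Claude alongside the gaps question.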
So I hope this is helpful.
I really think this idea is going to be very popular in the next few months and years, so it makes sense to jump on the train now and use it to acquire knowledge and improve your life.