The 7 Levels of Claude Code & RAG
By Chase AI
Summary
Topics Covered
- Context rot makes AI worse the longer you chat
- Obsidian solves 99% of use cases for free
- Naive RAG is just an overcomplicated Control-F
- Graph RAG delivers 100%+ better results than naive RAG
- Experiment and scale up only when needed
Full Transcript
Let's solve the problem of Claude Code and memory. Getting AI systems to reliably and accurately answer questions about past conversations or giant troves of documents is a problem we have been
trying to solve for years. And the typical response has been RAG: retrieval-augmented generation. And while this video is titled the seven levels of Claude Code and RAG, what this video is really about is deconstructing that problem of Claude Code, and really AI
systems in general and memory. And even
more importantly, this video is about giving you a road map that shows you where you stand in this fight between AI systems and memory and what you can do to get to the next level. So as we journey through these seven levels of
Claude Code and RAG, we are going to hit on a number of topics, but we are not going to start with graph RAG or anything complicated. We're going to start at the beginning, which is just the basic memory systems that are native to Claude Code, because sad as it is to say, this is where most people not only begin but where they stay. From automemory and things like CLAUDE.md, we're going to move to outside tools like Obsidian, before we eventually find ourselves with the big boys: true RAG systems. At these levels, we'll talk about what RAG actually is, how it works, the different types of RAG, naive RAG versus graph RAG versus agentic RAG, things like rerankers, and everything in between. And at each level, we're going to break it down in the same manner. We're going to talk
about what to expect at that level, the skills you need to master, the traps you need to avoid, and what you need to do to move on to the follow-on level. What
this video will not be is a super in-depth technical explanation of how to necessarily set up these specific systems because I've already done this in many instances when we talk about
graph RAG and LightRAG, for example, or even more advanced topics like RAG-Anything and these different sorts of embedding systems. I've done videos where I break down, from the very beginning to the very end, how to set that up yourself. So when we get to those sections, I will link those videos. And this is for both our sakes, so this video isn't 5 hours long. But
for those levels, we're still going to talk about what that actually means, what each system buys you, and when you should be using it. But before we start with level one, a quick word from today's sponsor, me. So just last month,
I released the Claude Code Masterclass, and it is the number one way to go from zero to AI dev, especially if you don't come from a technical background. And this masterclass is a little bit different, because we focus on a number of different use cases to learn how to use Claude Code. One of those is something like production-level RAG: how to build the RAG systems you're going to see in this video in a real-life scenario and actually use them as a member of a team or sell them to a client. That's the kind of stuff we focus on. So if you want access, you can find it inside of Chase AI Plus. There's a link to that in the pinned comment, and we'd love to have you there. So now let's start with level one, and that's
automemory. These are the systems that Claude Code automatically uses to create some sort of memory apparatus to actually remember things that you've talked about. And you know you're here if you've never set anything up intentionally to help Claude Code remember context in general about previous conversations, or just stuff that's going on in your codebase. And
when we talk about automemory, that is quite literally what it is called: the automemory system. Automatically enabled when you use Claude Code, it essentially allows Claude Code to create markdown files on its own that sort of list out things it thinks are important about you and that particular project. And this is purely based off of its own intuition from your conversations. And I can see these memory files it's created. Again, it does this on its own. If you go into your .claude folder, then into projects, you will see a folder there called memory. And inside that folder, you'll see a number of markdown files.
Here, there are four of them. And
they're like Claude Code's version of Post-it notes saying, "Oh yeah, he mentioned this one time about his YouTube project growth goals. Let's
write that down." And inside of everyone's memory folder, there will be a memory file. So you see in this memory file, it has a little note about one of my skills. And then it has, you know, essentially an index of all these submemory files saying, "Hey, there's a YouTube growth one in here, a revenue one, a references one, and here's what's inside of them." So, if I'm just talking to Claude Code in my vault and I mention something about YouTube and my goals with growth, whatever, it's going to reference this and say, "Oh yeah, Chase is trying to get, you know, X amount of subscribers by the end of
2026." It's cute, but ultimately it's not that useful. It's kind of like when you're inside of ChatGPT and it brings up random stuff from previous conversations and almost shoehorns it in. It's like, "Okay, I get it. You remember this, but I don't really care. And honestly, it's a little weird you keep bringing that up. I'd prefer if you didn't." And
unfortunately, this is where most people stay in their memory journey. And it's
built upon a somewhat abusive past that we all have when it comes to using these chatbots, because these chatbots don't have any sort of real memory from conversation to conversation. And
so, we're always scared to death of having to exit out of a chat window or exit out of a terminal session because you think, "Oh my gosh, it's not going to remember my conversation." And this
is actually a real problem because what is everybody's answer to the chat window not being able to remember anything?
Well, the answer is you just keep that conversation going forever, because you don't want to get to a scenario where you have to exit out and it forgets everything. This is a fear that was born here, inside of these chat windows, beginning with ChatGPT, and the same thing with Claude's web app. And honestly, it
used to be infinitely worse with Claude's web app, because I think we all remember the days before the 1 million token context window, where you would have like 30 minutes to talk with Claude and be like, "Well, see you in four hours." The issue is people have brought that sort of psychotic, neurotic behavior to the terminal. And what they do in large part, because you now can get away with it with the 1 million token context window,
is they never clear. They just keep talking and talking and talking with Claude Code, because they never want it to forget what they're talking about, because of these memory problems. And the issue with that is your efficiency goes way down over time the more you talk with Claude Code inside the same session. And this is the fundamental idea of context rot. If you don't know what context rot is, it's the phenomenon that the more I use an AI system within the same session, within the same chat,
and I fill up that context window, the worse it gets. You can see that right here: Claude Code, 1 million token context window. At 256k tokens, aka only about a quarter of its context window filled, we're at 92%. By the end, I'm at 78%. So, the more you use it in the same chat, the worse it gets. And that's one of the primary issues people have with AI systems and memory. I have Claude Code. It has a million tokens of context now. And
yet, I do not want it to forget about the conversation I'm having. So, I just never exit the window. I just fill it up and fill it up and fill it up. And two
things happen. One, effectiveness goes down, like you just saw. Two, your usage fills up a ton, because the amount of tokens used at 1 million or 800,000 context is way more than at 80,000 context. So this isn't the only issue, but, kind of off topic, we're in a current ecosystem where everyone complains about Claude Code being nerfed and their usage getting run up automatically. There's
a number of reasons for that, but one of them undoubtedly is the fact that since 1 million context got introduced, people have no clue how to manage their own
context window. And they aren't nearly as aggressive with clearing and resetting the conversation as often as they should. But that's kind of off topic. The point of that whole
discussion is that when it comes to memory, in this discussion about RAG and Claude Code, we have to keep context rot in the back of our mind, because we're constantly dealing with this tension: I want to ingest context so Claude Code can answer questions about a number of things, yet at the same time, I don't want the context to get too large, because then it gets worse.
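To make that tension concrete, here's a rough back-of-the-envelope sketch of why a never-cleared session burns through usage: every new turn re-sends the whole history as input, so cost grows quadratically with session length. All the numbers here are invented for illustration.

```python
# Rough sketch: cumulative input tokens over a session, assuming each turn
# re-sends the entire conversation history as input. All numbers invented.

def session_cost(turns: int, tokens_per_turn: int, clear_every: int = 0) -> int:
    """Total input tokens billed across `turns`, optionally clearing
    the context every `clear_every` turns (0 = never clear)."""
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn          # this turn's prompt + response
        total += history                    # the whole history is re-sent as input
        if clear_every and turn % clear_every == 0:
            history = 0                     # /clear: start fresh
    return total

never_cleared = session_cost(turns=100, tokens_per_turn=2_000)
cleared = session_cost(turns=100, tokens_per_turn=2_000, clear_every=10)
print(never_cleared, cleared)  # 10_100_000 vs 1_100_000
```

With these made-up numbers, the never-cleared session bills roughly nine times more input tokens than one cleared every ten turns, and the gap widens the longer you go.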
So that always needs to be something we're thinking about in this conversation about memory. But to bring this back to the actual video: in level one, what are people doing? The answer is they're not really doing anything. And because they're not doing anything, they just rely on a bloated context window to remember things. So
you know you're here when you've never edited a CLAUDE.md file, and you've never created any sort of artifact or file that allows Claude Code to realize what the heck is going on, what
it's actually done in the past, and what it needs to do in the future. So what do we need to master at this level? Well,
really all you need to master, despite everything I've written here, is that automemory isn't enough, and we need to take an active role when it comes to Claude Code and memory. Because the trap at this level is that if you don't take an active role, you have no control, and we need to control what Claude Code considers when it answers our questions. And so to unlock level one and move on to level
two, we need memory that's explicit, and we need to figure out how to actually do that. What files do you need to edit, and do you even know they exist, in order to take an active role in this relationship? Now, level two is all about one specific file, and that is the CLAUDE.md file. When you learn about this thing, it feels like a godsend. Finally, there is a single place where I can tell Claude Code some rules and conventions that I always wanted it to follow, and it's going to do them. And in fact, I can include things that I want it to remember, and it always will. And it
definitely feels like progress at first.
So here's a template of a standard CLAUDE.md file for a personal assistant project. Now, Claude Code is going to automatically create a CLAUDE.md file, but you have the ability to edit it, or even update it on demand by using a command like /init. And the idea of this thing is, again, that it is like the holy grail of instructions for Claude Code for that particular project. For
all intents and purposes, Claude Code is going to take a look at this before any task it executes. So if you want it to remember specific things, what are you going to do? You're going to put them in the CLAUDE.md, theoretically. It's a bit smaller scale than something like RAG; we aren't putting complete documents in here, but it's things you want Claude Code to always remember and conventions you want
it to follow. So for this one, we have an about me section. We have a breakdown of the structure of the file system and how we want it to actually operate when we give it commands. And like I said, because this is referenced on
essentially every prompt, Claude Code is really good at following it. So for the idea of, hey, I want it to remember specific things, this seems like a great place to put them. But we've got to be careful, because we can overdo it. When
we look at studies like this one evaluating AGENTS.md, and you can swap AGENTS.md for CLAUDE.md, they found that these sorts of files can actually reduce the effectiveness of large language models at large. And why is that? Well, it's
because the thing that makes it so good, the fact that it's injected into basically every prompt, is what also can make it so bad. Are we actually injecting the correct context? Have we
pushed through the noise and are we actually giving it a proper signal? Or
are we just throwing in things that we think are good? Because if it isn't relevant to virtually every single prompt you're going to run in your project, should it be here in the CLAUDE.md? Is this a good way to let Claude Code remember things? I would argue no, not really. And that goes contrary to what a lot of people say about CLAUDE.md
and how you should structure it. Based
on studies like that and based on personal experience, less is more.
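To make "less is more" concrete, here's a hypothetical lean CLAUDE.md. The sections and wording are my own invention, not the template from the video, but every line in it is meant to apply to virtually every prompt:

```markdown
# CLAUDE.md

## About me
Solo operator; non-technical background; prefer plain explanations.

## Conventions (apply to every task)
- All notes live in `notes/` as markdown; one topic per file.
- Never delete files; move superseded notes to `archive/`.
- Ask before running anything destructive.

## Pointers (read only when relevant)
- Project goals: see `project.md`
- File layout: see `structure.md`
```

Anything that only matters for one kind of task gets pointed to, not pasted in.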
Context pollution is real. Context rot
is real. So if something is inside of CLAUDE.md and it doesn't make sense for, again, virtually every single prompt you give it, should it be in there? The
answer is no. But most people don't realize that and instead they fall into this trap of a bloated rule book.
Instead, the skills we should be mastering are: how do we create project context that is high signal? How do I make sure what I'm actually putting inside this thing makes sense? And with
that comes the idea of context awareness like we talked about in the last level.
And you take all that together, and level two feels like you've been moving forward: hey, I'm taking an active role in memory, I have this CLAUDE.md file. Then you realize it's not really enough. And when we talk about level three and what we can do to move forward there, we want to think about not a static rulebook, but something that can evolve, something that can include CLAUDE.md instead of relying on CLAUDE.md to do everything. What if we use CLAUDE.md as sort of an index file that points Claude Code in the right direction instead? So what did I mean about CLAUDE.md acting as sort of an index and pointing towards other files?
Well, I'm talking about an architecture within your codebase that doesn't just have one markdown file trying to deal with all the memory issues in the form of CLAUDE.md. I'm talking about having multiple files for specific tasks. I think a great example of this in action is what GSD, the get-stuff-done orchestration tool, does. It
doesn't just create one file that says, hey, this is what we're going to build, these are the requirements, this is what we've done and where we're going. Instead, it creates multiple. You can see over here on the left we have a project.md, a requirements.md, a roadmap, and a state file. The requirements exist so Claude Code always knows, and has memory of, what it's supposed to be building. The roadmap breaks down what exactly we are going to be creating, not just now but in the past and the future. And the project file gives it memory, gives it context, of what we are doing at a high-level overview. What is
our north star? And by breaking up memory and context and conventions into this sort of system, we're fighting against context rot and the problem brought up in that study: injecting these files into every prompt all the time, like we do with CLAUDE.md, is actually counterproductive. It doesn't help us get better outputs.
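One way to picture that split is a CLAUDE.md that acts as a thin index over the kinds of files the GSD example uses. The file names echo the ones above, but the contents here are invented for illustration:

```markdown
# CLAUDE.md (index, not rulebook)

Read only the file relevant to the current task:

- `project.md`      - the north star: what we're building and why
- `requirements.md` - what "done" means; never guess requirements
- `roadmap.md`      - what's shipped, what's in progress, what's next
- `state.md`        - session memory; update at the end of every session
```

Each prompt then carries only the index plus whichever file actually matters, instead of one bloated rulebook.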
Furthermore, breaking it down into these chunks gives Claude Code a clear path to go down: hey, I want to figure out where this information is. Oh, I go to CLAUDE.md. Oh, CLAUDE.md says, "These are my five options." Okay, here's that one. Let me go and find it. That sort of structure is what you're going to see 100% in the follow-on level when we talk about Obsidian. And it really is sort of a crude reimagining of the chunking system and the vector similarity search that we see in true RAG systems. But obviously, this is kind of small scale at this
level. We're talking about four markdown files here. We're not talking about a system that can handle thousands and thousands of documents.
But like you're going to hear me say a lot: what does that mean for you? Do you need a system, like the ones we're going to talk about in levels four, five, six, and seven, that can handle that many documents? The answer is maybe not. And so part of this RAG journey is understanding not just where you stand, but where you actually need to go. Do you always need to be at level seven and know how to build an agentic RAG system with Claude Code? It's probably good to know how to do it, but it's also just as good to know when you don't need to implement it. Sometimes a system like this is enough for a lot of people. So it's just as important to know how to do it as it is to know: do you need to? Should you do it? When we
talk about level three and we talk about state files, how do we know we're here?
Well, we know we're here when we're still strictly inside the Claude Code ecosystem. We haven't integrated outside tools or applications, and really we're just at the place where we're creating multiple markdown files to build our own homemade memory chunking system. But this still
is really important. We're still
mastering some true skills here. The
idea of actually structuring docs, having some sort of system in place that updates state every session, because this can be a problem with RAG too: how do you make sure everything is up to date? And chances are you're also starting to lean into orchestration layers at this point, things like GSD and Superpowers, that do this multi-markdown-file architecture on their own. But there is a real trap here: what we create in this project is very much just for that project. It's kind of clunky to then take those markdown files and shift them over to another project. So level four is where we bring in Obsidian. And this
is a tool that has been getting a ton of hype, and for good reason. When you have people like Andrej Karpathy talking about these LLM knowledge bases they've created, which are built for all intents and purposes on an Obsidian foundation, and it's getting almost 20 million views, we should probably listen and see how this actually operates. Now, for context, I've done a full deep dive on this Obsidian Andrej Karpathy LLM knowledge base. I'll link that above, so if you want to focus on that, on how to build it, make sure you check it out. And what I also want to mention
above. And what I also want to mention to most people is that this Obsidian thing we're going to talk about right here in level four, this is honestly the level most people should strive for
because this is enough for most people in most use cases. When we talk about levels five, six, and seven, we're going to talk about true RAG structures. And
to be honest, it's overkill for most people. Like, we love talking about RAG. It's great, I understand that. But Obsidian is that 80% solution that in reality is like a 99% solution for most people, because it's free, there's basically no overhead, and it does the job for the solo operator. And when I say it does the job for the solo operator, I mean it solves the problem of having Claude Code connected to a bunch of different documents, a bunch of different markdown
files, being able to get accurate, timely information from them, and having insight into those documents as the human being. Because when I click on these documents, it's very clear what is going on inside. And it's very clear what documents are related to them. When I click these links, I'm brought to more documents. And so for me as the human being, having this insight is important, because to be totally
honest, the Obsidian-based insight into the documents, I would argue, trumps a lot of the insight you get from RAG systems. When we talk about thousands and thousands of documents being embedded in something like a graph RAG system, it looks great visually, very stunning, but do you actually know what's going on inside? Maybe you do, but to be honest, you're kind of just relying on the answers you get and the links it shows, and it's a bit hard to piece through the embeddings, for sure. All that to say, you should pay special attention to Obsidian and Claude Code, because when we talk about this RAG journey, I always suggest to everybody, clients included: let's
just start with Obsidian and see how far we can scale it, and eventually, if we do hit a wall, you can always transition to a more robust RAG system. So why not try the simple option? If it works, great: it's free, it costs no money, versus trying to knock out a RAG system, which can be kind of difficult to put into production depending on what you're trying to do. Always start with the simple stuff. It's never too hard to transition to something more complicated. So, what
are we really talking about here in level four? We're talking about taking that structure we began to build in level three, with an index file pointing at different markdown files, scaling it up, and then bringing in this outside tool, Obsidian, to make it easy for you, the human being, to actually see these connections. And the platonic ideal of this is pretty much what Andrej Karpathy laid out: building an LLM knowledge base on top of Obsidian, powered by Claude Code. And what that looks like is a structure like this. So
when you download Obsidian, which is completely free (again, see that video I referenced earlier), you set a certain folder as the vault. Think of the vault as the RAG system, this quasi-RAG system you've created. And inside the vault, we then architect it. We structure it just with files. So we have the overarching folder called the vault. And
inside that vault, we create multiple subfolders. In Andrej Karpathy's case, he talks about three different subfolders. The reality is they could be any subfolders; they just need to match the theme we're going to talk about. In one folder, we have the raw data. This is everything we are
data. This is everything we are ingesting and eventually want to structure so that cla code can reference it later. Think of you know you have
it later. Think of you know you have claude code do competitive analysis on 50 of your competitors and it pulls 50 sites for each, right? We're talking
about a large amount of information.
It's probably 2,500 different things. All that will get dumped into some sort of raw folder. This is like the staging area for the data. We then have the wiki folder. The wiki folder is where the
structured data goes. So we then have Claude Code take this raw data and structure it into essentially different Wikipedia-type articles inside of the wiki folder. Each article gets its own folder. So the idea being, when you then ask Claude Code for information, let's say we had it search for stuff about AI agents and I say, hey Claude Code, talk to me about AI agents, the same way you would query a RAG system, well, Claude Code is going to go to the vault. From the vault, it's going to go to the wiki. The wiki has a master index markdown file. Think of sort of what we were doing with CLAUDE.md before, right? You
see how these sort of themes transition throughout the different levels. It
takes a look at that master index. The
master index tells it what exists in this Obsidian RAG system. Oh, AI agents exist. Cool. Guess what's going on down here? It also has an index file which talks about the individual articles that exist. What am I saying here? I am
saying there is a clear hierarchy for Claude Code to reference when it wants to find information: vault, wiki, index, article, etc. So because it is so clear how to find information, and because the raw data is first structured into the wiki, we can create a system that holds a lot of documents without RAG, hundreds or thousands if you do this properly. Because if the system is clear, hey, I check the vault, I check the index, and that has a clear delineation of where everything is, well then it's not too hard for Claude Code to figure out where to find stuff. And so you can get away with a non-RAG structure for thousands of documents, which has been really hard to do in the past. And that's because most people
don't structure anything at all. They just have a billion documents sitting in one folder. It's the equivalent of having 10 million files strewn across the factory floor and saying, well, Claude Code, find it. No, you actually just need a filing cabinet. Claude Code's actually pretty smart. And you can see that architecture in action right here.
So, right now, we're looking at a CLAUDE.md file that is in an Obsidian vault. And what does it say? It breaks down the vault structure, the wiki system, the overall structure of the subfolders, and how to essentially work it. Right? So again, we're using CLAUDE.md as a conventions-type file. Over here
on the left, you can see the wiki folder. Inside the wiki folder is a master index, and it lists what is inside. In this case, there's just one article: it's on Claude managed agents. Inside that folder, we see Claude managed agents. It has its own wiki folder breaking down the articles inside, until you get to the actual article itself. So
very clear the steps it needs to take.
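Putting that walkthrough together, the lookup path might look something like this; the folder and file names are illustrative, patterned on the Karpathy-style vault described above, not copied from it:

```text
vault/
├── CLAUDE.md              # conventions + how to navigate the vault
├── raw/                   # staging area: unprocessed ingested data
│   └── ...
└── wiki/
    ├── index.md           # master index: lists every topic folder
    └── ai-agents/
        ├── index.md       # per-topic index of individual articles
        └── claude-managed-agents.md
```

A query walks CLAUDE.md, then wiki/index.md, then the topic index, then the article, so every hop is a small, cheap file read.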
And so when I tell Claude Code, talk to me about the managed agents, we have a wiki on it. It's very easy for it to search for it via its built-in grep tool. It links me the actual markdown file and then breaks down everything that's happening. Now, the question at level four really becomes one of scale.
How many documents can we get away with before this sort of system stops working? Is there a point at which the Karpathy system begins to fall apart? Where, hey, I get it, there's a very clear path Claude Code needs to follow, it goes to the indexes, yada yada yada, but does that sustain itself at 2,000 documents? 2,500? 3,000? Is there a clear number? The answer is we don't really know, and there isn't a clear number, because all your documents are also different. And in terms of hitting a wall, it isn't just as simple as, well, Claude Code's giving us the wrong answers because it has too many files in the Obsidian system. How much is it costing you in terms of tokens now that we've added so many files, and how quickly is it answering? Because RAG can actually be infinitely faster and cheaper in certain situations. What we're looking at here
is a comparison between textual LLMs, in the giant bars, and textual RAG, in terms of the amount of tokens it took to get the correct answer and the amount of time it took to get that answer. What do we see here? We see that between textual RAG and textual LLMs, there's a massive difference, to the tune of like 1,200 times. I'm saying RAG is 1,200 times cheaper and 1,200 times faster than the textual LLM in these studies. Now, for
context, this was done in 2025. This was not done with Claude Code. These models
have changed significantly since then.
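Still, the shape of that cost gap is easy to sketch: with full-context prompting you pay for the whole corpus on every query, while with RAG you pay for a handful of retrieved chunks. Every number below is invented for illustration and has nothing to do with the study's actual setup.

```python
# Back-of-the-envelope: input tokens per query when you stuff the whole
# corpus into context versus retrieving only the top-k chunks.
# All numbers are invented for illustration.

def full_context_tokens(num_docs: int, tokens_per_doc: int, question: int = 50) -> int:
    """Tokens per query if the entire corpus rides along in the prompt."""
    return num_docs * tokens_per_doc + question

def rag_tokens(top_k: int, tokens_per_chunk: int, question: int = 50) -> int:
    """Tokens per query if only the top-k retrieved chunks ride along."""
    return top_k * tokens_per_chunk + question

full = full_context_tokens(num_docs=1_000, tokens_per_doc=1_200)  # 1.2M-token corpus
rag = rag_tokens(top_k=5, tokens_per_chunk=400)                   # 5 retrieved chunks
print(full, rag, full // rag)
```

Even with these made-up figures, the ratio lands in the hundreds, the same order of magnitude the study is pointing at.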
Again, these are just straight-up LLMs, not a coding agent, etc., etc. However, we were talking a 1,200x difference. So, when we're evaluating, hey, is Obsidian what I should be doing, versus should I be doing a RAG system?
It isn't as simple as just, well, is it giving the right answer or not, because you could have a scenario where you get the right answer with Obsidian, yet if you went to RAG, it's a thousand times cheaper and faster, right? So there's this very fuzzy line between when Obsidian and these sort of markdown-file architectures are good enough, and when we need to use RAG. There's not a great answer. I don't have a great answer for you. The answer is you have to
you. The answer is you have to experiment and you need to try both and see what works because this is frankly out of date totally like 2025 older models. The difference between rag and
models. The difference between rag and textual LLM is not 1,200 times. But how
much has that gap shrunk? Because that
is an insane gap. That isn't like 10x.
It's 1,200x the so there's a lot you have to know and again you you you won't know the answer ahead of time. You just won't watch every video you want. No one's
going to tell you where that line in the sand is. You literally just need to
sand is. You literally just need to experiment and see what works for you as you increase the amount of documents you're asking cloud code to answer questions about. So on that note, let's
So on that note, let's move on to level five, which is where we finally begin to talk about real RAG systems and some of the RAG fundamentals: embeddings, vector databases, and how data actually flows through a system when it becomes part of our RAG knowledge base. Let's begin with naive RAG, which is the most basic type of RAG out there, but it provides the foundation for everything else we do. You can think of a RAG system as being broken into three parts. On the left-hand side we have the embedding stage, then we have the vector database, and then we have the actual retrieval with the large language model. So: one, two, and three.
To best illustrate this model, let's start with the journey of a document that is going to be part of our knowledge base. Remember, in a large RAG system we could be talking about thousands of documents, and each document could be thousands of pages. But in this example, we have a one-page document. If we want to add this document to our database, it is not going to be ingested as a whole unit. Instead, we are going to chunk it up into pieces, so this one-pager essentially becomes three different chunks. These three chunks are then sent to an embedding model, and the job of the embedding model is to take each chunk and turn it into a vector in a vector database. Now, a vector database is just a different variation of your standard database. When we talk about a standard database, think of something like an Excel sheet: you have columns and you have rows. In a vector database, it's not two-dimensional columns and rows; it's hundreds, if not thousands, of dimensions. But for the purposes of today, just think of a three-dimensional graph like you see here. The vectors are just points in that graph, and each point is represented by a series of numbers. You can see here we have bananas, and bananas is represented by 0.52, 5.12, and 9.31, and that continues for hundreds of numbers. Where each vector gets placed in this giant multi-dimensional graph depends on its semantic meaning.
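To make the ingestion side concrete, here's a toy sketch of the chunk-and-embed flow. The embedding function is a made-up bag-of-words stand-in, not a real embedding model, and the "vector database" is just a Python list; real systems call an embedding model and get back dense vectors with hundreds of dimensions.

```python
# Toy sketch of naive-RAG ingestion: chunk a document, "embed" each
# chunk, and store the vectors. embed() is a fake stand-in -- a real
# system would call an embedding model here.

def chunk(text: str, size: int = 80, overlap: int = 20) -> list[str]:
    # Fixed-size character chunks with overlap, so a sentence cut in
    # half still appears whole in at least one chunk.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk_text: str, dims: int = 8) -> list[float]:
    # Fake embedding: bucket each word by the sum of its character
    # codes. Deterministic, but carries no real semantics.
    vec = [0.0] * dims
    for word in chunk_text.lower().split():
        vec[sum(ord(ch) for ch in word) % dims] += 1.0
    return vec

document = ("The Iowa-class battleships served in World War II. "
            "Their main battery consisted of nine 16-inch guns. "
            "USS Missouri hosted the Japanese surrender in 1945.")

# "Vector database": a list of (chunk, vector) pairs.
vector_db = [(c, embed(c)) for c in chunk(document)]
```

Chunk size, overlap, and the embedding model are exactly the knobs that matter here: choose them badly and retrieval quality collapses.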
What do the words actually mean? You can see over here, this is the fruit section: we have bananas, apples, and pears. Over here we have ships and boats. So going back to our document, let's imagine it's about World War II ships. Each of these chunks is going to get turned into a series of numbers, and those numbers will be represented as a dot in this graph. Where do you think it's going to go? Well, probably around this area, right? So that would be one, two, and three. That's how documents get placed: every document gets chunked, each chunk goes through the embedding model, and the embedding model inserts it into the vector database. Repeat, repeat, repeat for every single document. In the end, after we do that several thousand times, we get a vector database, which represents our knowledge base. And that moves us on to step three, the retrieval part.
So, where do you play into this? Well, let's depict you; we'll give you a different color. You get to be pink. So, this is you. You normally just talk to Claude Code and ask it questions about World War II battleships. In your standard non-RAG setup, what's going to happen? The large language model, Opus 4.6, is going to take a look at its training data and give you an answer based on that training data. But with a RAG system, it's going to do more: it's going to retrieve the appropriate vectors and use those vectors to augment the answer it generates for you. Hence, retrieval-augmented generation. That's the power of RAG. It allows our large language models to pull in information that is not part of their training data to augment their answers. In this example it's World War II battleships, and yes, I understand the large language model already knows about those, but replace this with any sort of proprietary company data that isn't just available on the web, and do it at scale. That's the sell for RAG.
Now, in our example, when we ask Claude Code for information about World War II battleships in a RAG setup, it's going to take our question and turn it into a series of numbers, similar to the vectors over here. It is then going to compare the numbers for our question against the numbers of the stored vectors and see which vectors most closely match the question's vector: how similar are the vectors to the question, pretty much. Then it's going to pull a certain number of vectors, whether that's one, two, three, four, five, ten, or twenty, and bring their information into the large language model. So now the large language model has its training-data answer plus, say, ten vectors' worth of information. That was the retrieval part, and then it augments and generates an answer with that additional information. And that is how RAG works.
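And here is the retrieval side as a toy sketch, with a tiny three-chunk database and the same fake bag-of-words embedding idea as before: embed the question, score every stored vector by cosine similarity, and hand the top-k chunks to the model. Again, the embedding is a stand-in, not a real model.

```python
# Toy sketch of the retrieval step in naive RAG: embed the question
# the same way the chunks were embedded, then pull the top-k most
# similar vectors. Cosine similarity stands in for whatever the
# vector DB actually uses.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def embed(text: str, dims: int = 8) -> list[float]:
    # Same fake bag-of-words embedding as before; real systems call
    # an embedding model here.
    vec = [0.0] * dims
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dims] += 1.0
    return vec

vector_db = [(text, embed(text)) for text in [
    "USS Missouri was an Iowa-class battleship.",
    "Bananas are rich in potassium.",
    "The Bismarck was sunk in May 1941.",
]]

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(vector_db, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# The retrieved chunks get stuffed into the LLM's context to
# "augment" its generated answer -- the A and G in RAG.
context = retrieve("Tell me about World War II battleships")
```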
That is how naive RAG works. Now, this is not particularly effective, for a number of reasons. This very basic structure falls apart right at the beginning when we think about how we're chunking up these documents. Is it random? Is it just off a pure number of tokens? Do we have a certain amount of overlap? Are the documents themselves set up in a way where it even makes sense to chunk them? Because what if chunk number three references something in chunk number one, and when we pull the chunks, we don't get that other chunk that's required as context to even make sense of what chunk three says? Very often the entire document itself is needed to answer questions about that document, so this idea of getting piecemeal answers doesn't really work in practice. Yet this is how RAG was set up for a long, long time. Other issues can come into play, like: what if I have questions about the relationships between different vectors? Right now I just pull vectors in a silo, but what if I wanted to know how boats relate to bananas? Sounds random, but what if I did? In this standard vector-database, naive-RAG approach, everything is in a silo. It's hard to connect information, and a lot of it depends on how well those original documents are even structured.
Are they structured in a manner that makes sense for RAG? Now, over the years, we've come up with some ways to alleviate these issues, things like rerankers: ranking systems that take all the vectors we grab and essentially do another pass on them with a large language model to rank them by relevance. But by and large, this naive RAG setup has fallen out of vogue. Yet it's still important to understand how it works at a foundational level, so it can inform your decisions if you go for a more robust RAG approach. Because if you don't understand how chunking or embeddings even work, how can you make decisions about how you should structure your documents when we talk about something like graph RAG, or about more complicated embedding systems like the brand-new one from Google, which can ingest not just text but videos? And if you don't understand this foundation, it's hard to understand the trap. The trap is that we've kind of just created a crappy search engine, because with these naive RAG systems, where all we do is grab chunks and we can't really understand the relationships between them, how is that different from an overcomplicated Ctrl-F? The answer: there's really not much of a difference.
Which is why these simplistic, kind of outdated RAG structures, which are actually still all over the place, can burn you. If you see someone saying, "here's my Pinecone RAG system" or "here's my Supabase RAG system," and they don't mention anything about graph RAG or a sophisticated reranker system, it's going to suck, to the tune of the actual effectiveness being something like 25%: you're almost better off guessing. So if you don't know that going in, you can definitely be hoodwinked, confused, or in some cases basically scammed into buying RAG systems that do not make sense. And so level five isn't about implementing these naive RAG systems. It's about understanding how they work, so that when it comes time to implement something more sophisticated, you actually understand what's going on. Because that five-minute explanation of RAG is sadly not something most people understand when they say, "I need a RAG system."
Well, do you? Because you also have to ask yourself what kind of questions you're actually asking of your system. If you're essentially treating your knowledge base as a giant rulebook and you just need specific things from it brought up, then Obsidian is probably enough, or you could probably even get away with a naive RAG system. But if we need to know about relationships, about how X interacts with Y when they're two separate documents that never even mention each other, and it's not something I can just stick inside the context directly because I have thousands of said documents, well, that is when you're going to need something more sophisticated than basic vector RAG. That is when we need to start talking about graph RAG.
So when we talk about level six of Claude Code and RAG, we're talking about graph RAG, and we're talking about this. In my opinion, if you are going to use RAG, this is the lowest level of infrastructure you need to create. This is using LightRAG, a completely open-source tool; I'll put a link above where I break down exactly how to use it and how to build with it. The idea of graph RAG is pretty simple: everything is connected. This isn't a vector database with a bunch of vectors in a silo; this is a bunch of things connected to one another. I click on this document and I can see over here on the right, and I'll move this over, the description of the vector, the name, the type, the file, the chunk, and, more importantly, the different relationships.
And this relationship-based approach results in more effective outcomes. Here is a chart from LightRAG's GitHub. It's about six to eight months old, and also of note, LightRAG is the lightest-weight graph RAG system out there that I know of; there are some very robust versions, including GraphRAG itself from Microsoft, which is literally called GraphRAG. But when we compare naive RAG to LightRAG, across the board we get jumps of oftentimes more than 100%: 31.6 versus 68.4, 24 versus 76, 24 versus 75, on and on. And according to LightRAG, it actually holds its own and beats out GraphRAG itself. But hey, these are LightRAG's numbers, so take them with a grain of salt.
Now, when you look at this knowledge-graph system, your mind probably goes right to Obsidian, because it looks very similar. However, what we're looking at in Obsidian is way more rudimentary than what's going on inside LightRAG or any graph RAG system, because the series of connections we see in Obsidian is all manual and somewhat arbitrary. Documents are only connected because we, or Claude Code, set related documents when a particular document was generated. For example, just add a couple of brackets and boom, that document's connected. So, in theory, I could connect a bunch of random documents that in reality have nothing to do with one another. Now, because Claude Code isn't stupid, it's not going to do that. But that's a lot different from what went on here: this went through an actual embedding system, it looked at the actual content, it set a relationship, it set an entity. There's a lot more work going on inside LightRAG in terms of defining the relationships than in Obsidian. Does that difference actually equate to some wild gap in performance at a small scale? No. At a huge scale? Maybe. Again, we're in that gray area; it depends on your scale and what we're actually talking about, and nobody can answer that question except you and some personal experience. But understand: these two things are not the same. We are not the same, brother. Two totally different systems; one is pretty sophisticated, one is pretty rudimentary.
Understand that. And so, to wrap up level six and graph RAG: we're really here when we've decided that stuff like Obsidian isn't working, we can't use something like naive RAG because it just doesn't work, and we need something that can extract entities and relationships and really leverage that hybrid vector-plus-graph query design. But there are some traps, some serious roadblocks, even here at level six. When we talk about LightRAG, this is just text. What if I have scanned PDFs? What if I have videos? What if I have images? We don't live in a world where all your documents are just Google Docs. So what do we do in those instances? Multimodal retrieval is a huge thing.
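As a toy sketch, multimodal ingestion often starts with nothing fancier than routing each file type to a matching embedder. The embed_* functions here are hypothetical stand-ins; a real pipeline would call OCR, a vision model, or a video-capable embedding model at those points.

```python
# Toy multimodal ingestion: route each file to an embedder based on
# its type. The embed_* functions are hypothetical stand-ins for
# real OCR / vision / video embedding calls.
from pathlib import Path

def embed_text(path: Path) -> str:
    return f"text-vector({path.name})"

def embed_image(path: Path) -> str:
    return f"image-vector({path.name})"

def embed_video(path: Path) -> str:
    return f"video-vector({path.name})"

EMBEDDERS = {
    ".md": embed_text, ".txt": embed_text,
    ".png": embed_image, ".jpg": embed_image,
    ".pdf": embed_image,  # scanned PDFs need OCR/vision, not text parsing
    ".mp4": embed_video,
}

def ingest(filename: str) -> str:
    embedder = EMBEDDERS.get(Path(filename).suffix.lower())
    if embedder is None:
        raise ValueError(f"no embedder for {filename}")
    return embedder(Path(filename))
```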
On top of that, what about bringing some more agentic qualities to these systems, giving them a little more AI power? Well, if we're talking about things that are multimodal, then we can finally move to the bleeding edge of RAG in today's day and age, as of April 2026, and that's what level seven is all about. Now, when we talk about level seven and agentic RAG, the big thing we want to index on is multimodal ingestion. We've done videos on these things, like RAG-Anything, which lets us import images and non-text documents, again, think scanned PDFs, into structures like the LightRAG knowledge graph you saw here. We also have new releases like the Gemini embedding model that just came out in March, which allows us to embed videos themselves into our vector database. And this is frankly where the space is going. It's not enough to just do text documents. How much information, how much knowledge is trapped on the internet, especially in places like YouTube, where it's purely video? We want more than just a transcript as well; a transcript doesn't do enough. So this multimodal problem is real, and again, this is stuff that just came out weeks ago. Level seven is also where we need to start paying attention to our architecture and pipelines when it comes to the data going in and out of our RAG system. It's not enough to just get data in here. This is great, you know: okay, we have all these connections and stuff. But how is the data getting there? How is the data getting there in the context of a team? How is data getting out of there? What if some of the information in a particular document has changed? What if somebody edits it; how does it get updated? What if we had duplicates?
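Those update and duplicate questions are exactly what an ingestion pipeline has to answer before anything touches the vector or graph store. Here's a toy sketch of one common approach, content hashing, so the pipeline can skip exact duplicates and re-index edited versions; the dict stands in for a real database.

```python
# Toy ingestion gate: hash each document's content so the pipeline can
# skip exact duplicates and detect edited versions that need
# re-indexing. The "index" dict stands in for a real vector/graph DB.
import hashlib

index: dict[str, str] = {}  # doc_id -> content hash

def ingest(doc_id: str, content: str) -> str:
    digest = hashlib.sha256(content.encode()).hexdigest()
    if index.get(doc_id) == digest:
        return "skipped (duplicate)"
    action = "updated (re-embed)" if doc_id in index else "inserted"
    index[doc_id] = digest
    # ...here a real pipeline would chunk, embed, and upsert the doc...
    return action

print(ingest("handbook", "v1 text"))   # inserted
print(ingest("handbook", "v1 text"))   # skipped (duplicate)
print(ingest("handbook", "v2 text"))   # updated (re-embed)
```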
Who can actually put these things in there? When it comes to production-level stuff, these are all questions you need to start asking yourself. When we look at an agentic RAG system like this one built in n8n, you can see that the vast majority of the infrastructure, everything outlined here, is about data ingestion and data syncing; only a very small part has anything to do with RAG itself, which is right there. Because we need systems that clean up the data, that can say: okay, we just ingested this document, and in fact this was version two of version one, so can we now go back and clean that data? Here's something like a data ingestion pipeline where documents don't get put directly into the system or into LightRAG. We instead put them inside something like a Google Drive, and from there they get ingested into the graph RAG system and logged. These are the sorts of things that will actually make or break your RAG system when you're using it for real. And when we talk about agentic RAG, you can see here, and I know this is rather blurry, that we have an AI agent running this whole program. So imagine you set up some sort of chatbot for your team. Does it always need to hit this database?
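One way to see the decision involved: a toy sketch of an agentic router that picks a backend per question. In a real system the LLM itself would make this choice via tool calling; the keyword rules just keep the sketch self-contained, and all backend names are hypothetical.

```python
# Toy agentic router: decide, per question, whether to hit the graph
# RAG store, a plain SQL database, or ordinary vector retrieval.
# A real agent would let the LLM pick a tool via tool calling.
def route(question: str) -> str:
    q = question.lower()
    if any(kw in q for kw in ("how many", "count", "total", "average")):
        return "sql"         # aggregations -> Postgres-style query
    if any(kw in q for kw in ("relate", "relationship", "connected", "between")):
        return "graph_rag"   # relationship questions -> graph traversal
    return "vector_rag"      # everything else -> plain semantic retrieval
```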
The answer is probably not. Chances are that in a team or business setting, you'll have information in a database like this one, text, but you probably also have another set of databases, just standard Postgres databases with a bunch of information you want to query with SQL as well. So when we talk about an agentic RAG system, we need something that has all of that: the ability to intelligently decide, am I going to hit the graph RAG database represented here, or am I just going to run some SQL queries against Postgres? These things can get complicated, and all of it is use-case dependent, which is why it's sometimes hard to make these videos and hit every single edge case. The point at level seven is not that there's some super RAG system you've never heard of. It's that the devil's in the details here, and that's mostly the data ingestion piece and keeping it up to date, but also how you actually access this thing. It's easy in a demo: I just go to the LightRAG interface, go to retrieval, and ask it questions.
It's a different scenario when we're talking about a team, with everyone approaching it from different angles, and you probably don't want everyone to have access to uploading directly to LightRAG itself on a web app. That being said, for the solo operator trying to create a sophisticated RAG system that can handle multimodal content, I would suggest the RAG-Anything plus LightRAG combination. I've done a video on that, and if I haven't linked it already, I'll link it above. I suggest it for a few reasons. One, it's open source and lightweight, so you're not spending a bunch of money or time spinning it up just to find out whether it makes sense for your use case. Again, we don't want to get stuck in systems with no way out after spending a bunch of money to get there, which is why I love Obsidian and always recommend things like LightRAG and RAG-Anything: if you try them and they don't work for you, okay, whatever, you wasted a handful of hours. It's not like spending a bunch of money on Microsoft's GraphRAG, which is in no way cheap. So when do you know you're at level seven? Really, it's multimodal needs, you need to index images, tables, and videos, and you're integrating some sort of agent system that can intelligently decide which path to go down to answer a question. Because at level seven, you're probably integrating all of this: you probably have a claude.md file with some permanent information, you probably have a codebase with markdown files organized for easy retrieval, perhaps you're also including Obsidian with some sort of vault, plus you probably have a set of documents in a graph RAG database and a top-of-the-funnel AI system that can decide: they asked this question, so I go down this route.
That's a mature memory architecture, and one I would suggest. But what's the trap here? The trap, honestly, is trying to force yourself into this level and this sort of sophistication when it's just not needed. To be honest, after all this, most of you are fine with Obsidian. It's more than enough. You don't need graph RAG; you really don't need RAG in general. And if it's not obvious that you need level seven, and certainly if you haven't already tried the Obsidian route, you don't need to be here; it's probably a waste of your time. But the whole point of this video was, to the best of my ability, to expose you to what I see as the different levels of RAG and memory and Claude Code: what this problem is, what some of the tensions are, what the trade-offs are, and where you should probably be for your use case. And again, the biggest thing is just to experiment. You don't have to know the answer before you get into this. Just try them out, and I would try them in ascending order. If you can get away with just markdown files in a Claude system, basically claude.md on steroids, sweet, go ahead. Then try Obsidian. If Obsidian's not enough, try LightRAG, and so on and so forth.
So, that is where I'm going to leave you guys for today. If you want to learn more, especially about the production side of RAG, like how to spin this up for a team or package it for a client, we have a whole module on that inside of Chase AI Plus, so check that out. Other than that, let me know what you thought. I know this was a long one, and I will see you in the next one.