
The 7 Levels of Claude Code & RAG

By Chase AI

Summary

Topics Covered

  • Context rot makes AI worse the longer you chat
  • Obsidian solves 99% of use cases for free
  • Naive RAG is just an overcomplicated Control-F
  • Graph RAG delivers 100%+ better results than naive RAG
  • Experiment and scale up only when needed

Full Transcript

Let's solve the problem of Claude Code and memory. Getting AI systems to reliably and accurately answer questions about past conversations or giant troves of documents is a problem we've been trying to solve for years, and the typical response has been RAG: retrieval augmented generation. And while this video is titled the seven levels of Claude Code and RAG, what this video is really about is deconstructing that problem of Claude Code, and really AI systems in general, and memory. Even more importantly, this video is about giving you a road map that shows you where you stand in this fight between AI systems and memory, and what you can do to get to the next level.

So as we journey through these seven levels of Claude Code and RAG, we're going to hit on a number of topics, but we're not going to start with graph RAG or anything complicated. We're going to start at the beginning, which is just the basic memory systems that are native to Claude Code, because, sad as it is to say, this is where most people not only begin but where they stay. From automemory and things like CLAUDE.md, we'll move to outside tools like Obsidian before we eventually find ourselves with the big boys: true RAG systems. At those levels, we'll talk about what RAG actually is, how it works, and the different types of RAG: naive RAG versus graph RAG versus agentic RAG, things like rerankers, and everything in between. And at each level, we're going to break it down in the same manner. We'll talk about what to expect at that level, the skills you need to master, the traps you need to avoid, and what you need to do to move on to the following level.

What this video will not be is a super in-depth technical walkthrough of how to set up these specific systems, because I've already done that in many instances. When we talk about graph RAG and LightRAG, for example, or even more advanced topics like RAG-Anything and these different embedding systems, I've done videos where I break down, from the very beginning to the very end, how to set that up yourself. So when we get to those sections, I will link those videos. And this is for both our sakes, so this video isn't five hours long. But for those levels, we're still going to talk about what each system actually means, what it buys you, and when you should be using it. But before we start with level one, a quick word from today's sponsor: me.

Just last month, I released the Claude Code Masterclass, and it's the number one way to go from zero to AI dev, especially if you don't come from a technical background. This masterclass is a little different because we focus on a number of different use cases to learn how to use Claude Code. One of those is production-level RAG: how to build the RAG systems you're going to see in this video in a real-life scenario, and actually use them as a member of a team or sell them to a client. That's the kind of stuff we focus on. So if you want access, you can find it inside of Chase AI Plus. There's a link to that in the pinned comment, and we'd love to have you there. So now let's start with level one, and that's automemory.

These are the systems Claude Code automatically uses to create some sort of memory apparatus to remember things you've talked about. You know you're here if you've never set anything up intentionally to help Claude Code remember context about previous conversations, or just stuff that's going on in your codebase. And when we talk about automemory, that is quite literally what it is called. The automemory system, which is automatically enabled when you use Claude Code, essentially allows Claude Code to create markdown files on its own that list out things it thinks are important about you and that particular project. And this is purely based off its own intuition from your conversations.

I can see the memory files it's created. Again, it does this on its own. If you go into your .claude folder, then into projects, you'll see a folder there called memory, and inside it you'll see a number of markdown files. Here, there are four of them. They're like Claude Code's version of Post-it notes saying, "Oh yeah, he mentioned this one time about his YouTube project growth goals. Let's write that down." And inside everyone's memory folder, there will be a main memory file. In this memory file, it has a little note about one of my skills, and then it has essentially an index of all these sub-memory files saying, "Hey, there's a YouTube growth one in here, a revenue one, a references one, and here's what's inside of it." So if I'm just talking to Claude Code in my vault and I mention something about YouTube and my goals with growth, it's going to reference this and say, "Oh yeah, Chase is trying to get x amount of subscribers by the end of 2026."
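To make the shape of this concrete, the index-style memory file described here looks roughly like the following. The file names and notes are illustrative stand-ins, not the actual contents Claude Code writes:

```markdown
# memory.md: auto-generated index (illustrative)

- Note: user has a skill for summarizing videos
- Sub-memory files:
  - youtube-growth.md: channel goals and subscriber targets
  - revenue.md: income notes
  - references.md: links and sources mentioned in chat
```

The point is the shape: one top-level file acting as an index over small topic files, all written by the model on its own initiative.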

It's cute, but ultimately it's not that useful. It's kind of like when you're inside of ChatGPT and it brings up random stuff from previous conversations and almost shoehorns it in. It's like, "Okay, I get it. You remember this, but I don't really care. Honestly, it's a little weird you keep bringing that up. I'd prefer if you didn't."

And unfortunately, this is where most people stay in their memory journey. It's built upon a somewhat abusive past that we all have when it comes to using these chatbots, because these chatbots don't have any sort of real memory from conversation to conversation. And so we're always scared to death of having to exit out of a chat window or a terminal session, because we think, "Oh my gosh, it's not going to remember my conversation." And this is actually a real problem, because what is everybody's answer to the chat window not being able to remember anything?

Well, the answer is you just keep that conversation going forever, because you don't want to get to a scenario where you have to exit out and it forgets everything. This is a fear that was born inside these chat windows, beginning with ChatGPT, and the same with Claude's web app. And honestly, it used to be infinitely worse with Claude's web app, because I think we all remember, before the days of the 1 million token context window, when you would have like 30 minutes to talk with Claude and then be like, "Well, see you in four hours."

The issue is people have brought that neurotic behavior to the terminal. What they do, in large part because you can now get away with it with a 1 million token context window, is they never clear. They just keep talking and talking and talking with Claude Code, because they never want it to forget what they're talking about. And the issue with that is your efficiency goes way down over time the more you talk with Claude Code inside the same session. This is the fundamental idea of context rot. If you don't know what context rot is, it's the phenomenon that the more I use an AI system within the same session, within the same chat, and the more I fill up that context window, the worse it gets. You can see that right here: Claude Code with a 1 million token context window, at 256k tokens, aka with only about a quarter of the context window filled, we're at 92%. By the end, I'm at 78%. So the more you use it in the same chat, the worse it gets.

And that's one of the primary issues people have with AI systems and memory. I have Claude Code, it has a million tokens of context now, and yet I don't want it to forget the conversation I'm having. So I just never exit the window. I fill it up and fill it up and fill it up. And two things happen. One, effectiveness goes down, like you just saw. Two, your usage fills up a ton.

That's because the number of tokens used per request at 800,000 context is way more than at 80,000 context. And this isn't the only issue, but kind of off topic: we're in a current ecosystem where everyone complains about Claude Code being nerfed and "my usage just gets run up automatically." There are a number of reasons for that, but one of them, undoubtedly, is the fact that since the 1 million token context window got introduced, people have no clue how to manage their own context window, and they aren't nearly as aggressive with clearing and resetting the conversation as they should be. But that's kind of off topic.

The point of that whole discussion is that when it comes to memory, in this discussion about RAG and Claude Code, we have to keep context rot in the back of our mind, because we're constantly dealing with this tension: I want to ingest context so Claude Code can answer questions about a number of things, yet at the same time I don't want the context to get too large, because then it gets worse.
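To see why usage balloons in a never-cleared session, here's a toy back-of-the-envelope model. The per-turn numbers are made up for illustration, and this is not Claude Code's actual billing, but the shape holds: every turn re-sends the whole accumulated context, so total tokens grow roughly quadratically with session length, while clearing regularly keeps growth closer to linear.

```python
# Toy model of session token usage. Numbers are illustrative only.

def session_tokens(turns: int, tokens_per_turn: int, clear_every: int = 0) -> int:
    """Total input tokens sent across a session.

    Every turn re-sends the entire accumulated context, so cost grows
    quadratically unless the context is cleared periodically.
    """
    total = 0
    context = 0
    for t in range(turns):
        if clear_every and t % clear_every == 0:
            context = 0  # a /clear resets the context window
        context += tokens_per_turn
        total += context  # the whole context rides along with each request
    return total

never_clear = session_tokens(turns=100, tokens_per_turn=2_000)
clear_often = session_tokens(turns=100, tokens_per_turn=2_000, clear_every=10)
print(never_clear, clear_often)  # 10100000 1100000 -- about 9x more without clearing
```

Same hundred turns, same work per turn, roughly an order of magnitude difference in tokens sent, which is exactly the "usage fills up a ton" effect.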

So that always needs to be something we're thinking about in this conversation about memory. But to bring this back to the actual video: in level one, what are people doing? The answer is they're not really doing anything. And because they're not doing anything, they just rely on a bloated context window to remember things. You know you're here when you've never edited a CLAUDE.md file and you've never created any sort of artifact or file that lets Claude Code realize what the heck is going on, what it's actually done in the past, and what it needs to do in the future.

So what do we need to master at this level? Really, despite everything I've written here, you just need to understand that automemory isn't enough, and that we need to take an active role when it comes to Claude Code and memory. Because the trap at this level is that if you don't take an active role, you have no control, and we need to control what Claude Code considers when it answers our questions. And so to unlock level one and move on to level two, we need memory that's explicit, and we need to figure out how to actually do that: what files you need to edit, and to understand that they even exist, in order to take an active role in this relationship.

Now, level two is all about one specific file, and that is the CLAUDE.md file. When you learn about this thing, it feels like a godsend. Finally, there is a single place where I can tell Claude Code some rules and conventions that I always want it to follow, and it's going to do it. And in fact, I can include things that I want it to remember, and it always will. And it definitely feels like progress at first.

So here's a template of a standard CLAUDE.md file for a personal assistant project. Now, Claude Code is going to automatically create a CLAUDE.md file, but you have the ability to edit it, or even update it on demand by using a command like /init. The idea of this thing is that it is, again, like the holy grail of instructions for Claude Code for that particular project. For all intents and purposes, Claude Code is going to take a look at this before any task it executes. So if you want it to remember specific things, what are you going to do? You're going to put them in the CLAUDE.md, theoretically. It's a bit smaller scale than something like RAG; we aren't putting complete documents in here, but it's things you want Claude Code to always remember and conventions you want it to follow. For this one, we have an about-me section, and we have a breakdown of the structure of the file system and how we want it to operate when we give it commands. And like I said, because this is referenced on essentially every prompt, Claude Code is really good at following it. So for the idea of, "Hey, I want it to remember specific things," this seems like a great place to put it. But we've got to be careful, because we can overdo it.

When we look at studies like this one, evaluating AGENTS.md files, and you can swap AGENTS.md for CLAUDE.md, they found that these sorts of files can actually reduce the effectiveness of large language models at large. And why is that? Well, it's because the thing that makes it so good, the fact that it's injected into basically every prompt, is also what can make it so bad. Are we actually injecting the correct context? Have we pushed through the noise, and are we actually giving it a proper signal? Or are we just throwing in things that we think are good? Because if it isn't relevant to virtually every single prompt you're going to run in your project, should it be in the CLAUDE.md? Is this a good way to let Claude Code remember things? I would argue no, not really.

And that goes contrary to what a lot of people say about CLAUDE.md and how you should structure it. Based on studies like that, and based on personal experience, less is more. Context pollution is real. Context rot is real. So if something is inside of CLAUDE.md and it doesn't make sense for virtually every single prompt you give, should it be in there? The answer is no. But most people don't realize that, and instead they fall into the trap of a bloated rule book.
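As a rough illustration of "less is more," a lean CLAUDE.md might look something like this. The project details and file names here are hypothetical, not taken from the video:

```markdown
# CLAUDE.md: personal assistant project (illustrative example)

## Conventions (relevant to every prompt)
- Write all notes as markdown files under `notes/`
- Keep responses concise; ask before deleting anything

## Where to find things (pointers, not content)
- Project goals: `docs/project.md`
- Current task state: `docs/state.md`
- Requirements: `docs/requirements.md`
```

Everything that isn't relevant to every single prompt lives in the pointed-to files instead of being pasted into the rule book itself.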

Instead, the skill we should be mastering is how to create project context that is high signal: how do I make sure what I'm actually putting inside this thing makes sense? And with that comes the idea of context awareness, like we talked about in the last level. You take all that together, and level two feels like you've been moving forward: "Hey, I'm taking an active role in memory. I have this CLAUDE.md file." Then you realize it's not really enough. And when we talk about level three and what we can do to move forward there, we want to think about not a static rulebook, but something that can evolve, something that can include CLAUDE.md instead of relying on CLAUDE.md to do everything. What if we used CLAUDE.md as sort of an index file that points Claude Code in the right direction instead?

So what did I mean about CLAUDE.md acting as an index and pointing toward other files? Well, I'm talking about an architecture within your codebase that doesn't have just one markdown file trying to deal with all the memory issues in the form of CLAUDE.md. I'm talking about having multiple files for specific tasks.

I think a great example of this in action is what the GSD orchestration tool does. It doesn't just create one file that says, "Hey, this is what we're going to build, these are the requirements, this is what we've done, and here's where we're going." Instead, it creates multiple. You can see over here on the left we have a project.md, a requirements.md, a roadmap, and a state file. The requirements exist so Claude Code always knows, and has memory of, what it's supposed to be building. The roadmap breaks down what exactly we're creating, not just now, but what we've done in the past and what comes in the future. And the project file gives it memory, gives it context, of what we're doing at a high-level overview: what is our north star? And by breaking up memory and context and conventions in this sort of system, we're fighting against context rot and against the idea brought up in that study, which is that injecting these files into every prompt all the time, like we do with CLAUDE.md, is actually counterproductive. It doesn't help us get better outputs.
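Sketched as a file tree, the idea looks roughly like this. The file names follow the ones mentioned above; the subdirectory name and comments are my own illustration, and the exact layout any given orchestration tool generates may differ:

```
project/
├── CLAUDE.md           # thin index: conventions plus pointers, loaded every prompt
└── planning/
    ├── project.md      # north star: what we're building and why
    ├── requirements.md # what must be true when we're done
    ├── roadmap.md      # past, current, and future phases
    └── state.md        # updated each session: where we left off
```

Only the thin index is injected everywhere; the heavy context is read on demand.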

Furthermore, breaking it down into these chunks gives Claude Code a clear path to go down: "Hey, I want to figure out where this information is. Oh, I go to CLAUDE.md. Oh, CLAUDE.md says these are my five options. Okay, here's the one. Let me go and find it." That sort of structure is what you're going to see 100% in the following level when we talk about Obsidian. And it really is sort of a crude reimagining of the chunking system and the vector similarity search that we see in true RAG systems.

But obviously, this is kind of small scale at this level. We're talking about four markdown files here. We're not talking about a system that can handle thousands and thousands of documents. But, like you're going to hear me say a lot: what does that mean for you? Do you need a system like the ones we're going to talk about in levels five, six, and seven that can handle that many documents? The answer is: maybe not. And so part of this RAG journey is understanding not just where you stand, but where you actually need to go. Do you always need to be at level seven and know how to build an agentic RAG system with Claude Code? It's probably good to know how to do it, but it's also just as good to know when you don't need to implement it. Sometimes systems like this are enough for a lot of people. So it's just as important to know how to do it as it is to know whether you actually need to do it.

When we talk about level three and state files, how do we know we're here? Well, we know we're here when we're still strictly inside the Claude Code ecosystem. We haven't integrated outside tools or applications; really, we're just creating multiple markdown files to build our own homemade memory chunking system. But this is still really important. We're mastering some true skills here: actually structuring docs, and having some sort of system in place that updates state every session, because this can be a problem with RAG too: how do you make sure everything is up to date? And chances are you're also starting to lean into orchestration layers at this point, things like GSD and Superpowers that do things like this, this multi-markdown-file architecture, on their own. But there is a real trap here: what we create in this project is very much just for that project. It's kind of clunky to then take those markdown files and shift them over to another project. So level four is where we bring in Obsidian.

This is a tool that has been getting a ton of hype, and for good reason. When you have someone like Andrej Karpathy talking about these LLM knowledge bases he's created, which are built, for all intents and purposes, on an Obsidian foundation, and it's getting almost 20 million views, we should probably listen and see how it actually operates. Now, for context, I've done a full deep dive on this Obsidian-based, Karpathy-style LLM knowledge base. I'll link that above, so if you want to focus on how to build that, make sure you check it out.

What I also want to mention is that this Obsidian setup we're going to talk about right here in level four is honestly the level most people should strive for, because it's enough for most people in most use cases. When we talk about levels five, six, and seven, we're going to talk about true RAG structures, and to be honest, they're overkill for most people. We love talking about RAG; it's great, I understand that. But Obsidian is that 80% solution that in reality is like a 99% solution for most people, because it's free, there's basically no overhead, and it does the job for the solo operator.

And when I say it does the job for the solo operator, I mean it solves the problem of having Claude Code connected to a bunch of different documents, a bunch of different markdown files, being able to get accurate, timely information from them, and having insight into those documents as a human being. Because when I click on these documents, it's very clear what's going on inside, and it's very clear what documents are related. When I click these links, I'm brought to more documents, and when I click those links, more documents again. And for me as the human being, having this insight is important, because, to be totally honest, the Obsidian-based insight into the documents, I would argue, trumps a lot of the insight you get from RAG systems. When we talk about thousands and thousands of documents being embedded in something like a graph RAG system, it looks great visually, very stunning, but do you actually know what's going on inside? Maybe you do, but honestly, you're mostly relying on the answers you get and the links they show; it's a bit hard to piece through the embeddings.

All that to say: you should pay special attention to Obsidian and Claude Code, because when we talk about this journey toward RAG, I always suggest to everybody, clients included: let's just start with Obsidian and see how far we can scale it, and if we eventually do hit a wall, you can always transition to more robust RAG systems. So why not try the simple option? If it works, great. It's free; it cost me no money, versus "let's try to knock out this RAG system," which can be kind of difficult to put into production depending on what you're trying to do. Always start with the simple stuff. It's never too hard to transition to something more complicated.

So what are we really talking about here in level four? We're talking about taking that structure we began to build in level three, you know, an index file pointing at different markdown files, scaling it up, and then bringing in this outside tool, Obsidian, to make it easy for you, the human being, to actually see these connections. And the platonic ideal of this is pretty much what Andrej Karpathy laid out: building an LLM knowledge base on top of Obsidian, powered by Claude Code.

What that looks like is a structure like this. When you use Obsidian, and you download it, it's completely free; again, reference that video I mentioned earlier, you set a certain folder as the vault. Think of the vault as sort of the RAG system, this quasi-RAG system you've created. And inside of the vault, we then architect it; we structure it just with folders. So we have the overarching folder called the vault, and inside that vault, we create multiple subfolders. In Karpathy's case, he talks about three different subfolders. The reality is they could be any subfolders; they just need to match the theme we're going to talk about.

In one folder we have the raw data. This is everything we're ingesting and eventually want to structure so that Claude Code can reference it later. Think of it this way: you have Claude Code do competitive analysis on 50 of your competitors, and it pulls 50 sites for each, right? We're talking about a large amount of information, probably 2,500 different things. All of that gets dumped into some sort of raw folder. This is like the staging area for the data. We then have the wiki folder. The wiki folder is where the structured data goes. So we have Claude Code take the raw data and structure it into essentially different Wikipedia-type articles inside of the wiki folder. Each article gets its own folder.

own folder. So the idea being when you then ask Claude code information about you know let's say we had it search for stuff about AI agents and I say hey claude code talk to me about AI agents.

It's the same way you would query a rag system. Well, claude code is going to go

system. Well, claude code is going to go to the vault. From the vault, it's going to go to the wiki. The wiki has a master index markdown file. Think of sort of what we were doing with talked about

doing with claw.md before, right? You

see how these sort of themes transition throughout the different levels. It

takes a look at that master index. The

master index tells it what exists in this obsidian rag system. Oh, AI agents exist. Cool. Guess what's going on down

exist. Cool. Guess what's going on down here? It also has an index file which

here? It also has an index file which talks about the individual articles that exists. What am I saying here? I am

exists. What am I saying here? I am

saying there is a clear hierarchy for claude code to reference when it wants to find information about files vault

wiki index article etc. So because it is so clear how to find information and also why it's so clear to first find information then

turn into wiki we can create a system that has a lot of documents without rag hundreds thousands if you do this properly because if the system is clear

hey I check the vault and I check the index and that has a clear delineation of like where everything is well then it's not too hard for cloud code to figure out where to find stuff and so you can get away with a non-rag

structure for thousands of documents and it's been really hard to do that in the past. And that's because most people

And that's because most people don't impose any sort of structure. They just have a billion documents sitting in one folder. It's the equivalent of having 10 million files strewn across the factory floor and saying, "Well, Claude Code, find it." No. You actually just need a filing cabinet; Claude Code's actually pretty smart.

And you can see that architecture in action right here. Right now, we're looking at a CLAUDE.md file that is in an Obsidian vault. And what does it say? It breaks down the vault structure, the wiki system, the overall structure of the subfolders, and how to essentially work with it, right? So again, we're using CLAUDE.md as a conventions-type file. Over here on the left, you can see the wiki folder. Inside the wiki folder is a master index, and it lists what is inside. In this case, there's just one article; it's on Claude managed agents. Inside that folder, we see the managed agents article. It has its own wiki index breaking down the articles inside, until you get to the actual article itself. So the steps it needs to take are very clear. And so when I tell Claude Code, "Talk to me about the managed agents, we have a wiki on it," it's very easy for it to search for it via its built-in Grep tool. It links me the actual markdown file and then breaks down everything that's happening.

Now, the question at level four really becomes one of scale. How many documents can we get away with before this sort of system stops working? Is there a point at which the Karpathy-style system begins to fall apart, where, hey, I get it, there's a very clear path Claude Code needs to follow, it goes to the indexes, yada yada yada, but does that sustain itself at 2,000 documents? 2,500? 3,000? Is there a clear number? The answer is we don't really know, and there isn't a clear number, because all your documents are also different. And in terms of hitting a wall, it isn't just as simple as "Claude Code's giving us the wrong answers, it has too many files in the Obsidian system." How much is it costing you in tokens now that we've added so many files, and how quickly is it answering? Because RAG can actually be massively faster and cheaper in certain situations.

What we're looking at here is a comparison between textual LLMs, the giant bars, and textual RAG, in terms of the number of tokens it took to get the correct answer and the amount of time it took to get that answer. What do we see? Between textual RAG and textual LLMs there's a massive difference, to the tune of about 1,200 times: RAG was roughly 1,200 times cheaper and 1,200 times faster than the textual LLM approach in these studies. Now, for context, this was done in 2025. It was not done with Claude Code. These models have changed significantly since then. These are just straight-up LLMs; this isn't a coding harness, etc. However, we were talking about a 1,200x difference.

So when we're evaluating, "Hey, is Obsidian what I should be doing, or should I be building a RAG system?", it isn't as simple as whether it gives the right answer or not, because you could have a scenario where you get the right answer with Obsidian, yet if you went to RAG, it would be a thousand times cheaper and faster, right? So there's this very fuzzy line between when Obsidian and these plain markdown-file architectures are good enough and when we need to use RAG. There's not a great answer. I don't have a great answer for you. The answer is you have to experiment; you need to try both and see what works, because that study is frankly out of date: 2025, older models. The difference between RAG and a textual LLM today is not 1,200 times. But how much has that gap shrunk? Because that is an insane gap. That isn't like 10x; it's 1,200x. So there's a lot you have to weigh, and again, you won't know the answer ahead of time. You can watch every video you want; no one's going to tell you where that line in the sand is. You literally just need to experiment and see what works for you as you increase the number of documents you're asking Claude Code to answer questions about.

So on that note, let's move on to level five, which is where we finally begin to talk about real RAG systems and cover some of the RAG fundamentals: embeddings, vector databases, and how data actually flows through a system when it becomes part of our RAG knowledge base. Let's begin with naive RAG, which is the most basic type of RAG out there, but it provides the foundation for everything else we do. You can think of RAG systems as being broken into three parts. On the left-hand side we have the embedding stage, then we have the vector database, and then we have the actual retrieval going on with the large language model. One, two, and three. To best illustrate this model, let's start with the journey of a document that is going to be part of our knowledge base. Remember, in a large RAG system we could be talking about thousands of documents, and each document could be thousands of pages.

But in this example, we have a one-page document. If we want to add this document to our database, it's not going to be ingested as a whole unit. Instead, we are going to take this document and chunk it up into pieces. So this one-pager essentially becomes three different chunks. These three chunks are then sent to an embedding model, and the job of the embedding model is to take these three chunks and turn each one into a vector in a vector database. Now, a vector database is just a different variation of your standard database. When we talk about a standard database, think of something like an Excel document, right? You have columns and you have rows. Well, in a vector database, it's not two-dimensional columns and rows; it's actually hundreds, if not thousands, of dimensions. But for the purposes of today, just think of a three-dimensional graph like you see here. The vectors are just points in that graph, and each point is represented by a series of numbers. So you can see here we have bananas, and bananas is represented by 0.52, 5.12, and 9.31 — you see that up here. In a real system, that continues for hundreds of numbers. Now, where each vector gets placed in this giant multi-dimensional graph depends on its semantic meaning.

What do the words actually mean? So you can see over here, this is like the fruit section: we have bananas, we have apples, we have pears. Over here we have ships and we have boats. Going back to our document, let's imagine it's about World War II ships. Each of these chunks is going to get turned into a series of numbers, and those series of numbers will be represented as dots in this graph. Where do you think they're going to go? Well, they'll probably go around this area, right? That would be one, two, and three. So that's how documents get placed. Every document gets chunked, each chunk goes through the embedding model, and the embedding model inserts them into the vector database.
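That chunk-embed-insert loop can be sketched in a few lines. Everything here is an illustrative stand-in: `fake_embed` replaces a real embedding model call, and a plain Python list replaces a real vector database:

```python
# Minimal ingestion sketch: chunk a document, "embed" each chunk,
# and insert it into an in-memory store.

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks with some overlap."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def fake_embed(chunk):
    """Stand-in for an embedding model; returns a tiny toy vector.
    A real system would call a model API here instead."""
    return [len(chunk) % 7, sum(map(ord, chunk)) % 13, len(chunk.split())]

vector_store = []  # each entry: (vector, original chunk text)

def ingest(document):
    for chunk in chunk_text(document):
        vector_store.append((fake_embed(chunk), chunk))

ingest("HMS Hood was a battlecruiser of the Royal Navy. " * 20)
print(len(vector_store))  # one entry per chunk
```

Real pipelines differ in how they split (by tokens, by sentences, by headings), but the shape — chunk, embed, insert, repeat — is the same.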

Repeat, repeat, repeat for every single document. And in the end, after we do that several thousand times, we get a vector database, which represents our knowledge base. That moves us on to step three, which is the retrieval part. So, where do you play into this? Well, let's depict you. We'll give you a different color; you get to be pink. So, this is you. All right.

You normally just talk to Claude Code, and you ask Claude Code questions about World War II battleships. In your standard non-RAG setup, what's going to happen? Well, the large language model, Opus 4.6, is going to take a look at its training data, and then it's going to give you an answer based on that training data: information about World War II battleships. But with a RAG system, it's going to do more. It's going to retrieve the appropriate vectors, and it's going to use those vectors to augment the answer it generates for you. Hence: retrieval-augmented generation. That's the power of RAG. It allows our large language models to pull in information that is not part of their training data to augment their answers. In this example it's World War II battleships, and yes, I understand the large language model already knows about those. But replace this with any sort of proprietary company data that isn't just available on the web, and do it at scale. That's the sell for RAG.

Now, in our example, when we ask Claude Code for information about World War II battleships and it's in a RAG setup, it's going to take our question and turn it into a series of numbers, similar to the vectors over here. It is then going to compare the numbers for our question against the numbers of the stored vectors, and see which of those vectors most closely match the question's vector. How similar are the vectors to the question, pretty much. Then it's going to pull a certain number of vectors — whether that's one, two, three, four, five, ten, or twenty — and it's going to pull those vectors and their information into the large language model. So now the large language model has its training-data answer plus, say, ten vectors' worth of information. That was the retrieval part, and then it augments and generates an answer with that additional information. And that is how RAG works.
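Here is that retrieve-then-augment loop as a minimal sketch. The vectors and chunk texts are made up, and `retrieve` is a brute-force scan over the whole store; real vector databases use approximate nearest-neighbor indexes to do the same thing at scale:

```python
import math

# Retrieval sketch: embed the question, find the top-k most similar
# chunks, and prepend them to the prompt sent to the LLM.

store = [
    ([0.9, 0.1, 0.2], "Bismarck was sunk in May 1941."),
    ([0.8, 0.2, 0.1], "Yamato carried 46 cm main guns."),
    ([0.1, 0.9, 0.8], "Bananas are rich in potassium."),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def retrieve(question_vector, k=2):
    """Rank every stored chunk by similarity to the question, keep top k."""
    ranked = sorted(store, key=lambda item: cosine(item[0], question_vector), reverse=True)
    return [text for _, text in ranked[:k]]

question_vec = [0.85, 0.15, 0.15]  # pretend embedding of the question
context = retrieve(question_vec)
prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: Tell me about WW2 battleships."
# `prompt` is what gets sent to the LLM: retrieval, then augmented generation.
print(context)
```

The two battleship chunks win and the banana chunk loses, so the model's answer gets augmented with the relevant material only.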

That is how naive RAG works. Now, this is not particularly effective, for a number of reasons. This very basic structure kind of falls apart right at the beginning, when we start to think about how we're chunking up these documents. Is it random? Is it just off a pure number of tokens? Do we have a certain amount of overlap? Are the documents themselves set up in a way where it even makes sense to chunk them? Because what if chunk number three is referencing something in chunk number one, and when we pull the chunks from the vector database, we don't get that other chunk that's required as context to even make sense of what number three says? You get what I'm saying? Very often, the entire document is needed to answer questions about said document, so this idea of getting piecemeal answers doesn't really work in practice.

Yet this is how RAG was set up for a long, long time. Other issues that can come into play are things like: what if I have questions about the relationships between different vectors? Because right now I kind of just pull vectors in a silo — but what if I wanted to know how boats relate to bananas? Sounds random, but what if I did? In this standard vector-database, naive-RAG approach, everything is kind of in a silo. It's hard to connect information, and a lot of it just depends on how well those original documents are even structured. Are they structured in a manner that makes sense for RAG? Now, over the years, we've come up with some ways to alleviate these issues, things like rerankers: ranking systems that take all the vectors we grab and essentially do another pass on them with a large language model to rank them by relevance. But by and large, this naive RAG setup has kind of fallen out of vogue. It's still important to understand how it works at a foundational level, though, because it can inform your decisions if you go for a more robust RAG approach. If you don't understand how chunking or embeddings even work, how can you make decisions about how you should structure your documents when we talk about something like graph RAG, or about more complicated embedding systems like the brand-new one from Google, which can actually ingest not just text but videos? And if you don't understand this foundation, it's hard for you to actually understand the trap.

And the trap is that we've kind of just created a crappy search engine. With these naive RAG systems, where all we do is grab chunks and we can't really understand the relationships between them, how is that different from basically just having an overcomplicated Ctrl+F? The answer: there's really not much of a difference. Which is why you see these simplistic, kind of outdated RAG structures still all over the place. If you see someone who's like, "oh, here's my Pinecone RAG system" or "here's my Supabase RAG system," and they don't mention anything about graph RAG or about having a sophisticated reranker system, it's going to suck — to the tune of the actual effectiveness being something like getting the right thing 25% of the time. You're almost better off guessing. So if you don't know that going in, you can definitely be hoodwinked, or confused, or in some cases basically scammed into buying RAG systems that do not make sense. And so level five isn't about implementing these naive RAG systems. It's about understanding how they work, so that when it comes time to implement something more sophisticated, you actually understand what's going on. Because that five-minute explanation of RAG is sadly not something most people understand when they say, "I need a RAG system."

Well, do you? Because you also have to ask yourself what kind of questions you're actually asking of your system. If you're essentially treating your knowledge base as a giant rulebook and you just need specific things from it brought up, then Obsidian is probably enough, or you could probably even get away with a naive RAG system. But if we need to know about relationships — if we need to know how X interacts with Y when they're two separate documents that never even mention each other, and it's not something I can just stick inside the context directly because I have thousands of said documents — well, that is when you're going to need RAG, and something more sophisticated than basic vector RAG. That is when we need to start talking about graph RAG.

So when we talk about level six of Claude Code and RAG, we're talking about graph RAG, and we're talking about this. In my opinion, if you are going to use RAG, this is sort of the lowest level of infrastructure you need to create. This is using LightRAG, which is a completely open-source tool; I'll put a link above where I break down exactly how to use it and how to build it. The idea of graph RAG is pretty obvious: it's the idea that everything is connected. This isn't a vector database with a bunch of vectors in a silo; this is a bunch of things connected to one another. Right? I click on this document, and I can see over here on the right (I'll move this over) the description of the vector, the name, the type, the file, the chunk, and, more importantly, the different relationships. And this relationship-based approach results in more effective outcomes. Here is a chart from LightRAG's GitHub. This is, I would say, six to eight months old. Also of note: LightRAG is the lightest-weight graph RAG system out there that I know of. There are some very robust versions, including GraphRAG itself from Microsoft — it's literally called GraphRAG. But when we compare naive RAG to LightRAG, across the board we get jumps of oftentimes more than 100%: 31.6 versus 68.4, 24 versus 76, 24 versus 75, on and on. And according to LightRAG, it actually holds its own and beats out GraphRAG itself. But hey, these are LightRAG's numbers, so take them with a grain of salt.
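Conceptually, what graph-RAG indexers do is have an LLM pull (entity, relation, entity) triples out of each chunk and merge them into one graph. The sketch below fakes the extraction step with canned output — `fake_extract` stands in for the LLM call, and none of this is LightRAG's actual API:

```python
from collections import defaultdict

def fake_extract(chunk):
    """Stand-in for an LLM extraction pass. A real graph-RAG system
    prompts a model to pull entities and relationships out of the text."""
    canned = {
        "Bismarck was a battleship of the Kriegsmarine.":
            [("Bismarck", "class_of", "battleship"),
             ("Bismarck", "operated_by", "Kriegsmarine")],
        "HMS Hood engaged Bismarck in the Denmark Strait.":
            [("HMS Hood", "engaged", "Bismarck")],
    }
    return canned.get(chunk, [])

graph = defaultdict(list)  # entity -> [(relation, other_entity), ...]

for chunk in ["Bismarck was a battleship of the Kriegsmarine.",
              "HMS Hood engaged Bismarck in the Denmark Strait."]:
    for head, relation, tail in fake_extract(chunk):
        graph[head].append((relation, tail))
        graph[tail].append((relation + "_inv", head))  # traversable both ways

# Unlike siloed vectors, the graph lets a query hop across documents:
# "what engaged Bismarck?" is a single edge lookup away.
print(graph["Bismarck"])
```

That cross-document hop — two chunks that never sat in the same vector neighborhood, now joined through a shared entity — is the whole point of the relationship-based approach.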

Now, when we look at this knowledge graph system, right away your mind probably goes to Obsidian, because this looks very similar. However, what we're looking at in Obsidian is way more rudimentary than what's going on inside of LightRAG or any graph RAG system, because the series of connections we see here is all manual and somewhat arbitrary. It's only connected because we set related documents, or Claude Code set related documents when it generated a particular document. For example: just add a couple of brackets and boom, that document's connected. So, in theory, I could connect a bunch of random documents that in reality have nothing to do with one another. Now, because Claude Code isn't stupid, it's not going to do that. But that's a lot different from what went on here. This went through an actual embedding system: it looked at the actual content, it set a relationship, it set an entity. There's a lot more work going on inside of LightRAG in terms of defining the relationships than in Obsidian. Now, does that difference actually equate to some wild gap in performance at a low level? No. At a huge scale? Maybe. Again, we're in that gray area; it kind of depends on your scale and what we're actually talking about, and nobody can answer that question except you and some personal experience. But understand: these two things are not the same. We are not the same, brother. Two totally different systems. One is pretty sophisticated, one's pretty rudimentary.

Understand that. And so, to wrap up level six and graph RAG: we're really here when we've decided that stuff like Obsidian isn't working, we can't use something like naive RAG because it just doesn't work, and we need something that can extract entities and relationships and really leverage that hybrid vector-plus-graph query design. But there are some traps, some serious roadblocks, even here at level six. When we talk about LightRAG, this is just text. What if I have scanned PDFs? What if I have videos? What if I have images? We don't live in a world where all your documents are just going to be Google Docs. So what do we do in those instances? Multimodal retrieval is a huge thing. And on top of that, what about bringing some more agentic qualities to these systems — giving them a little more AI power, some sort of boost in that department? Well, if we're talking about things that are multimodal, then we can finally move to sort of the bleeding edge of RAG in today's day and age, as of April 2026. And that's what level seven is all about.

Now, when we talk about level seven and agentic RAG, the big thing we want to index on is multimodal ingestion. We've done videos on these things, like RAG-Anything, which allows us to import images and non-text documents (again, think scanned PDFs) into structures like the LightRAG knowledge graph you saw here. We also have new releases like Google's new Gemini embedding model, which just came out in March and allows us to actually embed videos into our vector database — videos themselves. And this is frankly where the space is going. It's not enough to just do text documents. How much information, how much knowledge, is trapped on the internet, especially in places like YouTube, where it's purely video and we want more than just a transcript? A transcript doesn't do enough. So this multimodal problem is real.

And again, this is stuff that just came out weeks ago. Level seven is also where we need to start paying attention to our architecture and pipelines when it comes to the data going in and out of our RAG system. It's not enough to just get data in here. Like, this is great — okay, we have all these connections and stuff — but how is the data getting there? How is the data getting there in the context of a team? How is data getting out of there? What if some of the information here has changed in a particular document? What if somebody edits it? How does it get updated? What if we had duplicates?
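One common piece of that bookkeeping is hash-based upserts: skip exact duplicates, replace stale versions instead of piling them up. A minimal sketch, where `doc_id`, the `index` dict, and the return strings are all illustrative rather than any real pipeline's API:

```python
import hashlib

index = {}  # doc_id -> (content_hash, content)

def upsert(doc_id, content):
    """Insert a document, replacing an older version; skip unchanged ones."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if doc_id in index and index[doc_id][0] == digest:
        return "skipped (duplicate)"
    action = "updated" if doc_id in index else "inserted"
    index[doc_id] = (digest, content)  # a real pipeline would re-embed here too
    return action

print(upsert("spec.md", "v1 of the spec"))   # inserted
print(upsert("spec.md", "v1 of the spec"))   # skipped (duplicate)
print(upsert("spec.md", "v2 of the spec"))   # updated
```

A real system would also delete or re-embed the vectors tied to the old version, but this is the gist of "version two of version one" handling.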

Who can actually put these things in there? When it comes to production-level stuff, these are all questions you need to begin asking yourself. And so, when we look at an agentic RAG system like this one built in n8n, you can see that the vast majority of the infrastructure — everything outlined here — is all about data ingestion and data syncing. There's only a very small part that has anything to do with RAG, which is right there. Because we need systems that clean up the data, that are able to say, "okay, we just ingested this document; in fact, this was version two of version one. Can we now go back and clean that data?" Here's something like a data ingestion pipeline, where documents don't get put directly into the system or into LightRAG. We instead put them inside something like a Google Drive, and from there they get ingested into the graph RAG system and logged. These are the sorts of things that will actually make or break your RAG system when you're using it for real.

real. And when we talk about a gentic rag, you can see here, and I know this is rather blurry, but if we have an AI agent running this whole program, so you set up, imagine some sort of chatbot for

your team. Does it always need to hit

your team. Does it always need to hit this database?

The answer is probably not. Chances are

in a team setting, in a business setting, you're going to have information that's in a database like this like text or something, but you probably also have another set of databases like just standard Postgress databases with a bunch of information

you want to query with SQL as well. So

when we talk about in a gentic rag system, we need something that has all of that. The ability to intelligently

of that. The ability to intelligently decide, oh, am I going to be hitting the graph rag database represented here or am I just going to be doing some sort of SQL queries in Postgress? These things

can get complicated, right? And all of this is use case dependent, which is why it's kind of hard to sometimes make these videos and try to hit every single edge case. The point here at level seven

The point here at level seven is not that there's some super RAG system you've never heard of. It's that the devil's in the details — and that's mostly the data ingestion piece and keeping it up to date, but also how you actually access this thing. It's easy in a demo right here: oh, we just go to the LightRAG app, I go to retrieval, and I ask it questions. It's a different scenario when we're talking about a team, everyone's approaching it from different angles, and you probably don't want everyone to have access to upload to LightRAG itself on a web app. That being said, for the solo operator who is trying to create a sophisticated RAG system that can do multimodal stuff, I would suggest the RAG-Anything plus LightRAG combination. I've done a video on that, and if I haven't linked it already, I'll link it above. I suggest that for a few reasons. One, it's open source and it's lightweight, so it's not like you're spending a bunch of money or time to spin something like this up to make sure it actually makes sense for your use case. Again, the thing we want is to not get stuck in systems where there's no way out after we've spent a bunch of money to get there — which is why I do love Obsidian, and why I always recommend things like LightRAG and RAG-Anything. Because hey, if you try this out and it doesn't work for you and it doesn't make sense, okay, whatever: you wasted a handful of hours. It's not like you spent a bunch of money on Microsoft's GraphRAG, which is in no way cheap. And so, when do you know you're at level seven?

Really, it's multimodal stuff: you need to index images, tables, and videos, and you're integrating some sort of agent system that can intelligently decide which path it goes down to answer a question. Because at level seven, you're probably integrating all of this. You probably have a CLAUDE.md file with some permanent information. You probably have a codebase with some markdown files set up for easy retrieval. Perhaps you're also including Obsidian, with some sort of vault. Plus, you probably have some set of documents in a graph RAG database, and you have a top-of-the-funnel AI system that can decide, "they asked this question, so I go down this route." That's a mature sort of memory architecture that I would suggest. But what's the trap here? The trap, honestly, is trying to force yourself into this level and this sort of sophistication when it's just not needed. To be honest, after all this, most of you are fine with Obsidian. It is more than enough. You don't need graph RAG; you really don't need RAG in general. And if it's not obvious that you need level seven — and certainly if you haven't already tried the Obsidian route — you don't need to be here. It's probably a waste of your time.

But the whole point of this video, to the best of my ability, was to expose you to what I see as the different levels of RAG and memory and Claude Code: what this problem is, what some of the tensions are, what the tradeoffs are, and where you should probably be for your use case. And again, the biggest thing is just to experiment. You don't have to know the answer before you get into this; just try them out. And I would try them in ascending order. If you can get away with just markdown files in a Claude system, and it's basically just CLAUDE.md on steroids, sweet. Go ahead, and then try Obsidian. If Obsidian's not enough, try LightRAG, and so on and so forth. So, that is where I'm going to leave you guys for today. If you want to learn more, especially about the production side of RAG — like how to spin this up for a team or package it for a client — we have a whole module on that inside of Chase AI Plus. So check that out. Other than that, let me know what you thought. I know this was a long one, and I will see you.
