RAG From Scratch: Part 1 (Overview)

By LangChain

Summary

Topics Covered

LLMs have never seen your private or recent data
LLMs are the kernel of a new operating system
RAG has three stages: index, retrieve, generate

Full Transcript

Hi, this is Lance from LangChain. We're starting a new series called RAG From Scratch that's going to walk through some of the basic principles of RAG and build up to advanced topics. One of the main motivations for RAG is simply that LLMs haven't seen all of the data that you may care about.

So private data or very recent data would not be included in the pre-training run for these LLMs. You can see here on the graph, on the x-axis, the number of tokens that they're pre-trained on, which is of course very large, but it's still always going to be limited relative to private data that you care about or, for example, recent

data. But there's another interesting consideration: LLMs have context windows that are getting increasingly large, going from thousands of tokens to many thousands of tokens, which represents dozens of pages up to hundreds of pages. We can fit information into them from external

sources. A way to think about this is that LLMs are a kernel of a new kind of operating system, and connecting them to external data is a very central capability in the development of this new, emergent operating system. Retrieval-augmented generation, or RAG, is a very popular general

paradigm for doing this, which typically involves three stages. The first stage is indexing some external documents such that they can be easily retrieved based on an input query. So, for example, we ask a question, we retrieve documents that are relevant to that question, and we feed those documents into an LLM in the final generation stage to produce an answer that's grounded in those retrieved documents. Now, we're starting from scratch, but we're going to build

up to this broader view of RAG. You can see here there are a lot of interesting methods and tricks that fan out from those three basic components of indexing, retrieval, and generation, and future videos are going to walk through those in detail. We're going to try to keep each video pretty short, like five minutes, but we're going to spend a lot of time on some of those more advanced topics. First, over the next three videos, I'll just be laying out the very basic ideas behind indexing, retrieval,

and generation, and then we'll build beyond that into those more advanced themes. Now I want to show a quick code walkthrough, because we want to make these videos a little bit interactive. Right here (and this repo will be shared, it's public) I have a notebook open. I've installed a few packages and set a few environment variables for my LangSmith keys, which I personally do recommend. It's really useful for tracing and observability, particularly when

you're building RAG pipelines. What I'm going to show here is the code for a RAG quickstart, which is linked here. I'm going to run this, and then walk through everything that's going on. If we think back to our diagram, all we're doing here is loading documents (in this case I'm loading a blog post) and then splitting them. We'll talk in future short videos about why splitting is important, but for now just recognize that we're splitting them, setting a chunk size of a thousand

characters. So we're splitting up our documents. Every split is embedded and indexed into this vector store. We picked OpenAI embeddings, and we're using Chroma as our vector store, which runs locally. Now we've defined this retriever. We then have defined a prompt for

RAG, we've defined our LLM, and we've done some minor document processing. We set up this chain, which will take our input question, run our retriever to fetch relevant documents, put the retrieved documents and our question into our prompt, pass it to the LLM, and format the output as a string. And we can see here's

our output. Now we can open up LangSmith and see how this ran. Here was our question and here's our output, and we can look: here's our retriever, and here are our retrieved documents. So that's pretty nice. And ultimately, here was the prompt that we actually passed into the LLM:

"You're an assistant for QA tasks. Use the following pieces of retrieved content to answer the question." Here's our question, and then here's all the content that we retrieved, and there's our answer. So this just gives a very general overview of how RAG works, and in future short videos we're going to break down each of these pieces in a lot more detail. Thanks.
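
The walkthrough above (load a document, split it, embed and index the splits, retrieve, build the prompt) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's code: the notebook uses LangChain with OpenAI embeddings and Chroma, whereas here `split_text`, `embed`, `cosine`, `build_index`, `retrieve`, and `build_prompt` are hypothetical helpers, word counts stand in for a real embedding model, the prompt wording is paraphrased from the transcript, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=1000):
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text):
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing stage: store every chunk next to its embedding."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=2):
    """Retrieval stage: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, docs):
    """Generation stage input: question plus retrieved chunks in one prompt."""
    context = "\n\n".join(docs)
    return (
        "You are an assistant for question-answering tasks. "
        "Use the following pieces of retrieved content to answer the question.\n"
        f"Question: {question}\n"
        f"Context: {context}\n"
        "Answer:"
    )


# Small demo; chunk_size is reduced from 1000 so this short text yields several chunks.
document = (
    "Indexing splits documents into chunks and stores an embedding for each one. "
    "Retrieval ranks the stored chunks by similarity to the question. "
    "Generation passes the question and the retrieved chunks to an LLM."
)
index = build_index(split_text(document, chunk_size=80))
question = "How does retrieval rank chunks?"
prompt = build_prompt(question, retrieve(index, question, k=1))
print(prompt)  # in a real pipeline this string would be sent to the LLM
```

Fixed-width character splitting can cut words in half, and word counts miss synonyms; both are simplifications chosen to keep the sketch free of external services.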
