Integrating LangGraph RAG Agent with FastAPI | Production Setup with Sessions, History, Vector DB
By Pradip Nichite
Summary
Topics Covered
- Store Chat History Outside Agent
- Contextualize Follow-up Queries for RAG
- Delete Documents by File ID Metadata
- Expose Knowledge Management Endpoints
Full Transcript
Hey, hi everyone. In the last video, we built a RAG agent using LangGraph, and this was the architecture of the agent. We started with a Colab notebook and ended up with an agent that can use RAG, do web search, and answer the question. If you want the code for that last tutorial, go to the FutureSmart AI blog — you'll find all the code and explanations there, plus a link to the Colab notebook you can use directly. In this video, we're going to build a FastAPI app and integrate the agent we built in that Colab. If you want to try the agent yourself, you can go to agent.futuresmart.ai and try it there.
So let me show you what we'll build in this video. We'll create an API with four endpoints. One is the chat endpoint — you can talk to your agent, which has access to your documents. The other endpoints let you manage your vector database, which we didn't cover in the earlier tutorial. That's the new part: uploading documents, listing them, and deleting them. If you build any production-ready system, you're going to need these endpoints, because indexing isn't a one-time task.

Okay, let's try the endpoints. First, let's see what documents we have in the system. We don't have any documents yet, so let's upload some. Go to the upload-document endpoint — I'm going to use the same documents as before. We have this hypothetical company document called GreenGrow. Let me execute this.
Okay, it says the document has been indexed, and it gives me a file ID that I can use later. You can verify whether the file got indexed — yes, it shows up in the list of indexed files, and I have its ID. If I want to delete it, I can use the delete endpoint.
Now let's see whether we can query it. This document has information about an EcoHarvest system, so I'm going to ask a question — I kept a Notion page handy so I can just copy-paste. Let's chat with the endpoint. I don't have an existing session, so this is the first message in the session, and we can use GPT-4.1 mini or any of the models. Let me ask the question, and we'll watch in LangSmith what is happening. These are the LangSmith traces, so I can see what's going on inside the LangGraph run. You can see we got some content from the RAG retrieval, and the final answer from the answer node says 2018. Do I need to zoom? Yes. So we got an answer mentioning 2018, which is coming from this document.
And if you look at the chain again: we had a router node that decided to first check the RAG, and from the RAG lookup we got these documents. From the vector store we got a couple of documents, and using those documents we got the answer. So the context came from our knowledge base. Now let's delete the document and see what happens. Let me go back, delete this document, and ask the same question again.
Let me list the documents — we know the ID is 4. Let's delete document 4. It says the document has been deleted. To confirm, we can list the documents again: it's empty, there are no documents. Now let's ask the same question.
Let me go back and run the chat endpoint with the same question. It takes more time, and we get a random answer — that it was introduced in May 2022 — which is definitely not coming from our documents. We also got a session ID; we'll see how we maintain sessions and why they matter. Maintaining the session is very important when you have RAG. Okay, let's see what happened inside LangSmith. There's something called the contextualize chain, and I'll explain why it matters: without contextualizing the query, your RAG won't work when someone asks a follow-up question. So it's worth sticking around to see why it's important.
Now if I go to the LangGraph trace: the router decided to use the RAG here. (You could even handle the case where the knowledge base is empty, but here, based on the question, it decided to first check the RAG.) From the vector database it got nothing — there are no documents. Since there were no documents in the vector database, it decided to call the web search, found some information on the internet that isn't related to our deleted document, and eventually the answer node answered with that. So we got an answer, but it didn't come from our knowledge base, because we had no data — and now we can see exactly what was happening.
let me show you what is the importance of this contextualized chain that what we're going to uh you know see. So let
me show you one thing. Uh maybe I will show you uh once we start the you know explaining the uh you know code and uh let's start with the chat endpoint that what we have in the chat endpoint. So
let me go to my VS code here. We got our you know uh code.
here. We got our you know uh code.
So I have this API folder where I got all the fast API code and the docs folder has some sample documents that we can upload. Then we got the virtual
can upload. Then we got the virtual environment. This is where I'm running
environment. This is where I'm running uh you know the code. So let's start with our API folder. So before we jump we can just look at what is the requirement file uh we got and basically
First, the database structure — how we handle the chat endpoint and the document listing. I kept this Notion page handy so we can look at it. I have two tables. One maintains the chat history: earlier I used the chat history inside the LangGraph agent, but here I'm keeping chat history outside the agent so I can manipulate it and persist it in a DB. So I created a chat_history table. It stores the session ID — since this is a chat app, we'll have multiple messages inside the same chat session, and I'll show how the session is generated and how it feeds the contextualize step. It also stores the user's query, GPT's response, which model we used, and the timestamp of that exchange — the basic information we need to retrieve all the messages for a particular session. Then there's a second table for documents: we store which file we indexed and when we uploaded it, and we'll delete from it as well. The content of the file itself goes into the vector database — the same Chroma DB we're using in this video.
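From that description, the two tables might be created roughly like this — a sketch using Python's built-in sqlite3; the exact column names and database engine in the repo may differ:

```python
import sqlite3

# In-memory DB for illustration; the real app would use a file path.
conn = sqlite3.connect(":memory:")

# One row per question/answer exchange, keyed by session_id.
conn.execute("""
CREATE TABLE IF NOT EXISTS chat_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    user_query TEXT NOT NULL,
    gpt_response TEXT NOT NULL,
    model TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)""")

# One row per uploaded file; the file's chunks live in Chroma, not here.
conn.execute("""
CREATE TABLE IF NOT EXISTS document_store (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    upload_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)""")
conn.commit()
```

Note the split of responsibilities: chat_history holds conversation state, document_store holds only file bookkeeping, and the document content itself goes to Chroma.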
Okay, let's also have a look at the requirements file. Several packages come from LangChain: most are LangChain core, one is specific to OpenAI, one to Chroma, and one is for the web search we're using. Then there are the libraries that parse our documents — one for .docx files, one for PDFs — plus the package that lets FastAPI accept file uploads. We have FastAPI itself, Uvicorn, Pydantic so we can define the request/response structures, and python-dotenv so we can load environment variables. Very typical Python dependencies, and we'll see how each is used. Let's start with the main file and the chat endpoint.
This is the chat endpoint we just explored — let me reset what we saw. The chat endpoint has an input schema: it expects the question the user wants to ask and the session ID. Why does the session ID matter? Because if you want to ask a follow-up question in the same conversation, you'd better pass the session ID — and the session ID is generated on the back end, which I'll show you. You can also pass the model; it's better to give the UI a choice of which model to call.
All the request and response models — the Pydantic models — are defined in a file called pydantic_models. There you can see QueryInput, which has exactly the question, session ID, and model we just saw. The response is structured too: QueryResponse contains the answer, the session ID, and the model that was used. We did the same for the document endpoints, and there's also a list of allowed models — if I try anything else, it gives me an error.
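The shape of those models is roughly as follows — sketched here with stdlib dataclasses and an Enum standing in for the repo's actual Pydantic BaseModel classes; the field and model names are my guesses from the demo:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ModelName(str, Enum):
    """Allowed models -- anything outside this set is rejected."""
    GPT_4_1 = "gpt-4.1"
    GPT_4_1_MINI = "gpt-4.1-mini"

@dataclass
class QueryInput:
    question: str
    session_id: Optional[str] = None          # generated server-side if absent
    model: ModelName = ModelName.GPT_4_1_MINI

@dataclass
class QueryResponse:
    answer: str
    session_id: str
    model: ModelName
```

With Pydantic, the enum gives you the same behavior shown in the demo: an unknown model name fails validation with an error instead of reaching the agent.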
Let's go back to the main endpoint. We have the QueryInput we just saw, and QueryResponse is the structure we return at the end. Now, what happens between receiving the request and returning the answer? The first thing we check is whether we need to create a new session ID, because session-ID generation happens on the back end. In the utils folder we simply check whether the UI provided a session ID; if not, we generate a unique one. That's all it does — we either get the session ID from the user or we generate it. Next, we check whether there's any chat history associated with that session ID. We already have the chat_history table, and if I click through, we land in the database utils file, where a simple SQL query retrieves all the messages for that session ID.
Then we format those messages. Look at our schema again: it has the user query and the GPT response. The query is a human message, the response is an AI message, so we simply restructure the rows into messages with a human role and an AI role — that's how you typically structure them. Next, to pass these messages into the LangChain chain, we convert them into LangChain's format: HumanMessage and AIMessage objects, the typical message types LangChain supports. There's no logic change — we just convert the format so we can pass it to the prompt chain. Now we have the history in the format expected by both LangChain and LangGraph.
But as I told you, we need something in the middle: the contextualize chain. Let me show you what I mean — let's go back to this page where I put an example.
Imagine the first query in a session is "What is the GreenGrow harvest system?" That query is self-contained: we can pass it to the vector database and get the relevant documents. But imagine the user asks a follow-up: "When was it introduced?" If I take only that part — the user's latest message, exactly what they typed — I won't find the relevant documents, because "When was it introduced?" is ambiguous on its own; it won't match anything in the vector store. That's why, whenever you have RAG or a knowledge base, you need to convert this incomplete query into a standalone question that can match. Note that this has nothing to do with the LLM itself — the LLM already sees the history, so you don't need this for the LLM. You need it for the retriever, because the retriever has no history, and if you don't pass a complete query it will return arbitrary results. I see this a lot: people following tutorials, newcomers, interns — they miss this point. To make your RAG work, you need to handle follow-up questions properly. So let's see how we handle it — how we turn the follow-up into a complete message.
Let's take an example — we can try the same question here. I'm starting a fresh session and asking "What is the GreenGrow harvest system?" I might get any answer; the answer itself isn't what I care about. I'd rather show you how the follow-up question works.
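The rewriting step can be sketched like this — a hedged stand-in where `llm` is any callable that sends messages to a chat model (the repo does this with a LangChain chain; the prompt wording here is paraphrased, not the repo's exact prompt):

```python
REWRITE_INSTRUCTION = (
    "Given the chat history and the latest user question, "
    "reformulate the question so it is a standalone question "
    "that can be understood without the history. Do not answer it."
)

def contextualize(chat_history, question, llm):
    """Rewrite a follow-up question into a standalone one for retrieval."""
    messages = [{"role": "system", "content": REWRITE_INSTRUCTION}]
    messages += chat_history  # e.g. [{"role": "human", ...}, {"role": "ai", ...}]
    messages.append({"role": "human", "content": question})
    return llm(messages)

# With a real model, a history about "the GreenGrow harvest system" plus the
# follow-up "When was it introduced?" would come back as something like
# "When was the GreenGrow harvest system introduced?".
```

The output of this step — not the raw user message — is what goes to the retriever.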
(I should have used the mini model so we'd get a quicker response.) We got some answer — the harvest system is something — hopefully coming from the document, though it might be coming from the web, because I believe we deleted the document. Let's confirm and check whether there's a document. Okay, there's no document, so let's add it. Where is the upload-document endpoint? Yeah, here — let's quickly upload the document. Now let's ask the question again. Hopefully you'll see a quick response, because it won't go to the web search if it can find the answer in the RAG.
We can observe it in LangSmith — I think this is where our query ran. We asked "What is the GreenGrow harvest system?", the router decided to use the RAG, and the RAG was able to retrieve the relevant documents. Here's the interesting part — look at what query went to the RAG: the full "What is the GreenGrow harvest system?" question. That's fine, because it's not a follow-up question, right? Now let's ask the follow-up. I need the session ID so we can pull the previous messages, so let's paste that session ID, ask "When was it introduced?" inside the same session, and hit it.
Meanwhile, notice that when the router and RAG ran for the first question, there was no previous chat history, because it was the first message in the session. Now let's look at what happens in the contextualize chain. The contextualize chain uses GPT-4.1 mini, and the prompt says: you'll get the user's latest question and the chat history, and you should return a reformulated question that is standalone. So it has the history — everything we asked — then the new question, "When was it introduced?", and it needs to convert that into a standalone question. Here's how it converted it: "When was the GreenGrow harvest system introduced?" It took the history and gave me the complete query, and that complete query can then be used downstream in the LangGraph.
I don't know why it's still running — did we get the answer? Okay, we got the answer, and it's coming from the document — I know that's where it's from, hopefully. Do I need to refresh? Let's refresh this. LangSmith is taking a bit of time to refresh the records.
Here I just want to quickly point out what happened when the query went into the LangGraph: our query was rewritten to "When was the GreenGrow EcoHarvest system introduced?" One wrinkle: our chat history itself already contained the answer — the answer to the first question included it — and that's why it didn't need to go to the RAG this time. But you see my point: it rewrote our query, and if it had gone to the RAG, it would have sent the rewritten version. Maybe we can find a question that forces it to go to the RAG — let me check whether there's anything not already in that answer; the 30–40% figure was also part of it. Anyway, I think you got the point, so rather than belaboring it: we reformulate the latest user question so that even if the RAG is called, the question is complete and unambiguous. That's what happens in the contextualize step.
Then, once we have the standalone question, we take the previous messages from the chat history, append our standalone question, and pass the complete message history to the agent — and this agent is the same agent we built in the Colab notebook. Nothing has changed for the agent itself; it's only refactored a bit. You'll see the actual LangGraph agent here, and I separated the nodes into their own file — the router node and the others are the same functions, copy-pasted, nothing big. Rather than walking through all of it, you can check the code once I upload it to the GitHub repo; the agent has the same content we saw earlier.
Now let's go back to the main endpoint: after invoking the agent, we insert the response into the chat history so it's available next time, and we return the query response. That's the typical flow of the chat endpoint. The interesting parts — the things you need to think about when building anything production-grade that other services will use — are: handling the session ID and generating the session so clients can ask follow-up questions; maintaining the chat history, which I prefer to keep outside the agent rather than relying on LangChain or LangGraph memory; and the contextualize step, which you need whenever your agent has RAG access, to reformulate follow-ups into standalone questions. That's the most important part of your chat endpoint. The other endpoints are mostly straightforward database work.
Now, what happens when you upload a document? I'm currently supporting PDF and .docx files, so I first check whether the file is in my allowed extensions — typical Python code that rejects unsupported types. Then we save the file to a temporary location, and then we have two functions. First, insert_document_record: we simply insert a record saying this file name is part of the document store. Why? Because we'll be deleting from it later — we're just keeping track of which files we've inserted. Then we go to the actual insertion.
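That record-insert helper might look roughly like this (a sqlite3 sketch — the repo's helper name and schema may differ). The key point is that the new row's auto-generated ID becomes the file_id used everywhere else:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE document_store (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT NOT NULL,
    upload_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP)""")

def insert_document_record(filename):
    # Returns the auto-generated row ID, which we reuse as the file_id
    # when stamping chunk metadata and when deleting later.
    cur = conn.execute(
        "INSERT INTO document_store (filename) VALUES (?)", (filename,))
    conn.commit()
    return cur.lastrowid
```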
The actual insertion into the vector database happens in index_document_to_chroma. First we read the content and split it into small documents — the same function we had in the notebook. If it's a PDF it uses the PDF reader, if it's a .docx it uses the docx reader, and then we use the recursive character splitter to split the content into small chunks. That's all it does.
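The repo uses LangChain's RecursiveCharacterTextSplitter for this; conceptually, the splitting step is just a sliding window with overlap, something like this simplified stand-in:

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Naive character splitter: fixed-size windows that overlap, so context
    at chunk boundaries isn't lost. (LangChain's recursive splitter is
    smarter -- it prefers to break on paragraphs and sentences first.)"""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each resulting chunk becomes one small document to be embedded and stored in Chroma.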
Okay, now once we have these splits, we make sure we know which chunk in the vector database belongs to which file ID — the file ID we got back from the document store when we inserted the record. We put that file ID into each split's metadata, so every chunk carries a file_id we can use later for deletion.
Then we call the vector store's add_documents method with these documents. What is the vector store? It's defined here: a Chroma vector store with a persist directory — the same setup as in the Colab. If this feels disconnected, I'd strongly urge you to go watch that video first; without it, this won't make much sense. So we create the vector store and call add_documents — that's it. To recap the steps: we inserted a record into our SQL database, split the document, and indexed the chunks. Let's also look at the other functions we have here.
We just saw upload-document. The list-documents endpoint is the simplest one: it just queries the document store and returns the files that were inserted, with their IDs, so you can use them for deletion. The delete endpoint is where we need to delete from both places: the vector database and the document store. First the vector database: we have a function that takes the file ID and uses metadata filtering — from the vector store we get all the docs whose file_id matches, and those docs are the chunks. Each chunk has a unique ID in the Chroma vector DB, and the delete method requires the IDs of the chunks. So, to avoid confusion: we use the file ID to find the chunks first, then delete them by their IDs with the vector store's delete method. I couldn't find a method that deletes directly by metadata — one that would just take the file ID and delete everything matching — so if you know of one, leave a comment and I'll replace this in the next video. That's how the deletion happens.
So we just saw the knowledge-management endpoints: uploading documents, listing them, and deleting them. This is very important if you're building solutions in your company or for a client — give them these endpoints so they can manage their own knowledge base; otherwise they'll have to email you the files and you'll have to add them by hand. Basically, you hand them control. Let me see if I missed anything worth telling — once you have the code, you can look through everything yourself.
No, I think the main points of the integration are the ones I already covered: session management and handling follow-up questions for your RAG pipeline. So that's how we integrate LangGraph with FastAPI. Now, let me know whether you want a next part where I deploy this FastAPI app, or whether you'd rather see memory — like Mem0 — integrated with the LangGraph agent. Let me know if you're interested, and I can extend this tutorial to add that memory management for our agent. So either the next video integrates memory into the agent, or it covers deployment of this agent. I'd say deploying this is the same as deploying any FastAPI app on an EC2 VM or similar, which I've already covered; if you want a more complex deployment — for example AWS ECS or Fargate — we can create that video, but the basics are already on my channel for you to try. Let me know in the comments whether you found this video useful, whether you want any other tutorial, or what changes I should make — I can at least add them to the code or the blog. And again, don't forget to try the agent at agent.futuresmart.ai. Thank you.