AI agent long-term memory with memory bank
By Google Cloud Tech
Summary
Topics Covered
- Semantic search finds meaning, not just keywords
- Memory is a service, not just storage
- Ingest any media into long-term memory
- Agents recall memories without special logic
- Three-layer memory enables persistent personalization
Full Transcript
Hello everyone. Welcome back to this agent memory series, where we cover how to manage your agent's memory to build a personalized agent.
In the last two episodes, we covered short-term memory with session and state, and a persistent memory solution with a database. In this episode, we will build a long-term memory that spans many conversations and even different types of input: text, image, audio, and video. We will cover: one, the difference between a session service and a memory service; two, two memory service options, a simple in-memory version for quick tests and the Vertex AI Memory Bank service for production, with safe storage and search by meaning; and three, how to ingest entire sessions or direct media files into the memory bank and how to retrieve relevant facts automatically in a new chat using the preload memory tool.
Before we build our personalized agent with the Vertex AI Memory Bank service, we need to understand two service roles. First is the session service, which manages active chats and lets you resume a live conversation. Second is the memory service, which manages the long-term archive; it is the filing cabinet. We have two ways of using a memory service. The simple in-memory memory service is very good for quick local tests, but it doesn't save across restarts and uses basic keyword search. The Vertex AI Memory Bank service saves to the cloud and supports semantic search: for example, a search for "two-wheeled vehicle" can find a note about a bicycle. So in this episode, we will use the Vertex AI Memory Bank service and the Vertex AI session service, powered by Agent Engine, which can extract facts using Gemini, generate embeddings, and store meanings, not just text.
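The keyword-versus-semantic difference can be sketched with a toy example. Below, a plain keyword match misses the bicycle note for the query "two-wheeled vehicle," while a hand-written concept table, standing in for the real embedding model (which this sketch does not call), recovers it:

```python
# Toy contrast between keyword and semantic retrieval. The real Memory
# Bank uses embedding models; the "concepts" table below is a hand-made
# stand-in for illustration only.

notes = [
    "I bought a new bicycle last spring",
    "The hotel breakfast was great",
]

def keyword_search(query, docs):
    # Match only on literal shared words.
    q = set(query.lower().split())
    return [d for d in docs if q & set(d.lower().split())]

# Hand-written "meaning" table standing in for learned embeddings.
concepts = {
    "two-wheeled": {"bicycle", "bike", "vehicle"},
    "vehicle": {"bicycle", "bike", "car"},
}

def semantic_search(query, docs):
    # Expand the query with related concepts, then match.
    q = set(query.lower().split())
    expanded = set(q)
    for term in q:
        expanded |= concepts.get(term, set())
    return [d for d in docs if expanded & set(d.lower().split())]

print(keyword_search("two-wheeled vehicle", notes))   # [] -- no shared word
print(semantic_search("two-wheeled vehicle", notes))  # finds the bicycle note
```

Keyword search returns nothing because "two-wheeled vehicle" shares no literal word with either note; the expanded search finds the bicycle note by meaning, which is the behavior Memory Bank's semantic search provides for real.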
All right, let's start by setting up the memory bank. To configure the memory bank, we set up an Agent Engine that powers it. During this configuration, we choose two types of models: one model to extract facts from conversations and media, and another model to embed those facts so that we can search by meaning. We can also define topics to organize what we store, like user preferences and travel experiences. It is important to remember that this is not just a table; it is a service that processes content, finds the useful facts, and makes them searchable.
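Conceptually, the configuration bundles those choices together. The sketch below only illustrates the shape of such a config; the field names and model IDs are assumptions for illustration, not the verbatim Agent Engine API (check the Vertex AI Memory Bank docs for the real schema):

```python
# Illustrative memory-bank configuration. Field names and model IDs are
# hypothetical; the real Agent Engine schema may differ.
memory_bank_config = {
    # Model that reads conversations and media and extracts durable facts.
    "fact_extraction_model": "gemini-2.5-flash",
    # Model that embeds extracted facts so they can be searched by meaning.
    "embedding_model": "text-embedding-005",
    # Optional topics that organize what gets stored.
    "topics": ["user_preferences", "travel_experiences"],
}
```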
And now our agent has a long-term memory backend. Once we have it set up, we need a way to save memories. Here, we can ingest memories in two ways. The first way is to archive a full conversation: at the end of a session, we call add session to memory. The memory bank processes user messages, agent replies, and image, video, and audio references, and stores the key facts. The second way is to upload facts directly. As you can see from this code example, we can preload from a file: we can send an image, a video, or an audio file with some text context to generate and store facts directly, even if those files didn't come from a chat. Either way, we're building a long-term knowledge base that the agent can use later.

And once we have this knowledge base, we need to retrieve memories automatically, so we add the preload memory tool to the agent. It runs at the start of every turn: before responding, the tool reads the user's new message, runs a semantic search in the memory bank, gathers the most relevant facts, and injects them into the prompt. The agent doesn't need special logic; the tool enriches the context automatically.
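The whole loop above (archive a finished session, then recall before responding) can be mimicked in plain Python. This is a toy stand-in, not the real ADK or Memory Bank API: fact extraction is reduced to keeping user statements, and semantic search to word overlap.

```python
# Toy sketch of the two-step flow: archive a session into long-term
# memory, then let a preload step inject relevant facts before each
# new turn. Names mirror the video's description, not a real API.

class ToyMemoryBank:
    def __init__(self):
        self.facts = []

    def add_session_to_memory(self, session_messages):
        # Stand-in for Gemini fact extraction: keep user statements.
        for role, text in session_messages:
            if role == "user":
                self.facts.append(text)

    def search(self, query):
        # Stand-in for semantic search: simple word overlap.
        q = set(query.lower().split())
        return [f for f in self.facts if q & set(f.lower().split())]

def preload_memory(bank, user_message, prompt):
    # Runs before the agent responds: retrieve facts and inject them.
    recalled = bank.search(user_message)
    if recalled:
        prompt = "Known facts: " + "; ".join(recalled) + "\n" + prompt
    return prompt

# Session A ends; archive it.
bank = ToyMemoryBank()
bank.add_session_to_memory([
    ("user", "I love historical architecture"),
    ("agent", "Noted!"),
])

# Session B, a fresh chat: the preload step enriches the prompt.
prompt = preload_memory(bank, "suggest a historical destination",
                        "User: suggest a historical destination")
print(prompt)  # prompt now begins with the recalled fact
```

The agent itself never queries the bank; the preload step rewrites the prompt before the model sees it, which is the "no special logic" property the video describes.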
Now let's test this with a demo. In session A, a user shares a photo of a historical building, a short video of the sea, and a small audio note from a town. We end the chat and add the session to the memory bank with add session to memory. The engine extracts facts such as "likes historical buildings," "enjoys the coast," and "visited the town." Then we simulate a restart and some time passing. In session B, a brand-new chat with empty state, the user says, "Based on what I shared before (picture, video, and audio), can you suggest a cultural destination?" Before the agent replies, the preload memory tool searches the memory bank and injects those facts: it finds memories like "user likes historical architecture," "enjoys seaside areas," and "visited the town." The agent then recommends a destination that matches the historical architecture and responds with a personalized suggestion. This is long-term multimodal recall in
action. You can also use the link or QR code on the screen to find the whole demo. All right, to wrap up our three-episode series, we now have three layers of memory: one, session and state for working memory during a live chat; two, persistent sessions and a user profile that survive restarts and personalize new chats; and three, a memory bank that archives full conversations and media, then searches by meaning to bring back useful facts later. With these episodes, you can now build a personalized agent that is consistent and context-aware over days and weeks. In the description of this video, you can also find links to the demos and setup steps. Try them and tell me what you want to build next. All right, I will see you in future videos. Bye.