I Talked with Database Heavyweights: AI's Data Explosion, and What Are AMD and Alibaba Cloud Teaming Up On?
By 老石谈芯 Shilicon Talk
Summary
## Key takeaways

- **Alibaba's PolarDB: A Database Built from the Ground Up**: PolarDB, Alibaba Cloud's database, was built from scratch and is now one of China's largest database suppliers, serving over 15,000 enterprises with a sizable development team. [04:02], [04:11]
- **AI Demands New Database Capabilities**: AI necessitates databases handling multimodal data like text and vectors, moving beyond structured data. Databases must also embed AI for easier querying, like natural language to SQL, and manage concurrent requests from numerous AI agents. [07:36], [08:00]
- **AMD EPYC's Chiplet Architecture Scales for Cloud**: AMD's EPYC processors use a chiplet architecture, allowing scalability from 8 to 192 cores. This design is optimized for cloud environments, maximizing compute density to support a large number of virtual machines, database users, and AI applications on a single machine. [09:53], [10:20]
- **CXL Enables Memory Expansion for Databases**: The CXL standard allows memory bandwidth and capacity expansion, crucial for AI and transaction processing. It enables disaggregation of CPU and memory, letting users access large memory pools, and improves overall TCO by enabling resource sharing. [15:22], [16:02]
- **Hardware-Software Co-design Drives Database Evolution**: Cloud databases like PolarDB are developed with a philosophy of hardware-software co-design, aligning with advancements like AMD's scalable architecture. This collaboration drives the development of elastic, utility-like database services that can scale in and out based on demand. [12:24], [13:03]
- **CXL Ecosystem Growth Hinges on Hardware Adoption**: The widespread adoption of CXL is currently slowed by the difficulty of hardware development, particularly CXL switches. Building a fleet of CXL servers will be key to driving ecosystem growth and enabling broader utilization of CXL memories. [22:14], [22:54]
Topics Covered
- How AI's Multimodal Data Challenges Database Design
- Hardware-Software Co-Design Drives Cloud Database Innovation
- CXL: Disaggregating CPU and Memory for Efficiency
- The Future of Computing: In-Memory and In-Storage Processing
- Young Engineers: Embrace AI and Continuous Learning
Full Transcript
I'm basically a database guy.
>> I grew my team from a small, maybe 10-member team to 400 engineers.
>> PolarDB is like a cloud database.
>> How many best papers in the last couple of years?
>> Three in just two years. Three best papers. Yeah. So this is how heavily Alibaba invests in databases.
>> Yeah, I think the CXL ecosystem has been moving a little bit slowly in the past few years, mainly because there's no hardware, because it's always hard to make a chip.
>> Right now, when I try to explain that back then I had to pay three or four dollars per minute to call India, my kids don't believe it. "What is wrong with you guys?"
>> I think AI is changing everything.
So before we dive into the technical details, I'm very curious to know how you both got into this exciting field and have been working on it for so long.
>> Thank you, Ken. I'm Raghu. I've been in the industry for a very long time. I started my career with HPE, building server and storage systems. Then I joined Cisco when Cisco started its server business, in the 2006-2007 time frame; my last role was CTO of that business. Then I was looking for exciting opportunities, talked to the AMD leadership, and decided to join AMD. I joined in 2018; it was a very different AMD at that time, but if you look back at the last seven or eight years, it has been a really exciting ride. My charter was to create an ecosystem of partners and optimize software stacks, both open-source and closed-source, on AMD's EPYC processors. We have done a pretty good job: today we support 75-plus ISVs and over 300 unique software stacks. I grew my team from a small, maybe 10-member team to 400 engineers. I do have a presence in China; that is one area we are going to invest in, because we have customers like Alibaba. They've been a really good customer, with very close collaboration in several areas. One of them, of course, is databases, which is close to Jimmy's heart.
>> What about you Jimmy?
>> Yeah, I think mine is much simpler: I'm basically a database guy. I started my career at a database company called Sybase, which was later acquired by SAP. At that time Sybase's competitor was Oracle, and after Sybase was acquired by SAP, I joined that competitor: I worked at Oracle for quite a bit of time. Then I joined Alibaba, seven years ago. I mainly work on a cloud database called PolarDB, in Alibaba Cloud's database business unit. PolarDB is quite interesting: we basically built it from the ground up, and it's one of the largest database suppliers in China right now. We do business with over 15,000 enterprises now, so it's quite a big database, and we have a very sizable team as well, about two to three hundred people just for development. So this is Alibaba investing heavily in databases.
>> I want to add one more thing, actually, about my experience with databases: I've been associated with the TPC, the Transaction Processing Performance Council, for a very long time. This is an industry body that brought industry standards for database performance characterization, like the TPC-C and TPC-H benchmarks.
>> That's a long history, that's nice; I didn't even know. So, today's conversation is all about databases, cloud, and computing, right? Could you each describe in one keyword, in your opinion, the most impressive advancement in this area in the past few years?
>> So, Ken, I don't know whether I can represent things with one word; maybe three words. The first one is big data analytics. I think the data revolution started in the 2010 time frame. One of the reasons behind it was the explosion of connected devices: smartphones and other devices generating all kinds of data. Organizations realized that collecting and processing large amounts of data could give them a business advantage, and on the software side technologies evolved, like Hadoop and other software-defined storage systems; relational database management systems also built capabilities to run complex queries on unstructured data. So big data analytics is one. The second is the advancement in hardware technologies: if you look at AMD, we brought EPYC processors to market with 32 processor cores, and today we support 192 processor cores in a chip. So the second aspect is the advancements in hardware technologies to process the data. The third one is AI. Both the data from the big data revolution and the advancements in hardware technologies are basically accelerating AI.
>> Yeah, I totally agree with Raghu. But if you want me to pick one word, I think AI is the most important, even though I'm a database guy. Yet even with AI, data is the most important thing: where you store the data, how you train the LLM. Those kinds of things matter, so databases play a very crucial role in this AI wave.
>> Let's talk about AI, right? The rise of AI brings both new opportunities and new troubles, especially for databases. For example, AI brings semantic search, or vector search, which makes databases more expressive but also more technically difficult. Can you describe in more detail the new requirements that AI brings for databases specifically?
>> Sure. I think, as we said, AI comes from data and uses data for inference, right? We database people used to deal mostly with structured data, because we were working on relational databases. But nowadays all kinds of different data come up: it's multimodal data, and that's what we really want to handle. There are more videos, more text, and there is also vector data, as you said. So semi-structured, structured, and unstructured data all come up. That's a challenge for the database: how are we going to store so much data? It's a much larger volume than the structured data we used to have. That's one challenge, and we have to transform our database to be more open, less purely transactional, and able to handle a large quantity of data, and a large quantity of metadata as well. Another thing is that we need to embed AI into the database system, to make it easier for people to use AI to query the database. We actually do a lot of research on natural language to SQL, so people don't need to know SQL to be able to query data, right? Another thing is that a lot of agents are coming up. In yesterday's keynote, I think, it was said there could be more agents than people in the world. Those agents also interact with the database, so how the database handles so many concurrent queries, with so many agents asking questions and querying the database, is something we're looking into as well. Those are, I think, the three main areas we are looking at.
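Jimmy's natural-language-to-SQL point can be made concrete with a toy sketch. This is not PolarDB's implementation (real NL2SQL systems use LLMs with schema context); a minimal template-based translator, with a hypothetical `orders` table, just illustrates the interface: the user asks in plain language, the database receives SQL.

```python
# Toy natural-language-to-SQL translator. Real NL2SQL systems are far
# more capable; this only illustrates the idea that the user asks in
# plain language and the database receives SQL. Schema is hypothetical.
import re

def nl_to_sql(question: str) -> str:
    """Translate a narrow family of questions into SQL (toy rules only)."""
    q = question.lower()
    # "how many orders from <region>?" -> COUNT over a filtered table
    m = re.match(r"how many orders from (\w+)\?", q)
    if m:
        return f"SELECT COUNT(*) FROM orders WHERE region = '{m.group(1)}';"
    # "total sales in <year>?" -> SUM with a year filter
    m = re.match(r"total sales in (\d{4})\?", q)
    if m:
        return f"SELECT SUM(amount) FROM orders WHERE YEAR(ts) = {m.group(1)};"
    raise ValueError("question not understood")

print(nl_to_sql("How many orders from Hangzhou?"))
# SELECT COUNT(*) FROM orders WHERE region = 'hangzhou';
```

An LLM-based system replaces the hand-written rules with a model prompted with the table schema, but the contract stays the same: text in, SQL out.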
>> Thanks, Jimmy. All those requirements bring requirements on the hardware, right? Everything comes down to the hardware. So how was the EPYC processor originally designed, and how did it anticipate these requirements before they actually emerged?
>> When we brought this product to market in the 2017 time frame, we introduced the chiplet architecture. Here you see this is a chiplet; a chiplet has 8 cores, 16 cores, and in the future we'll have more. We are able to build CPUs by interconnecting chiplets with a high-speed bus, so we can scale from very small, like 8 cores, all the way up to 192 cores. This was primarily built for the cloud, where the number one criterion is how many virtual machines, how many database users, or how many AI users can run on one machine. The chiplet architecture worked really well for the compute density that cloud vendors and others were looking for. Our initial focus, from a solution-software perspective, was cloud and high-performance technical computing, where we have made a really big impact. If you look at companies like Alibaba, they have adopted AMD EPYC technologies since the sixth generation of instances, and they recently announced ninth-generation instances powered by AMD. Now it's all about AI, right? We have unique capabilities in AMD EPYC to support the new generation of AI applications, including agents, and of course we have a very compelling GPU product line as well.
>> One keyword I heard in your response is scaling out at the chip level, using chiplet technology to add more and more cores. Interestingly, from the database point of view, especially for PolarDB, there is also scaling out in the cloud, by building the storage-compute separation architecture. These two scale-out philosophies seem to align perfectly. Is that a happy coincidence, or an inevitable technology trend that you both found and converged on?
>> So I think we're all thinking the right way, right? How do we support a large number of customers, a large number of applications? Our scale-up and scale-out architecture pretty much complements PolarDB's architecture. By the way, I'm really excited about the capabilities PolarDB has brought to the table, including winning... how many best papers in the last couple of years?
>> Three in just two years. Three best papers. Yeah.
>> So what about you Jimmy?
>> Yeah, I think one philosophy for PolarDB is hardware-software co-design. We go along with new hardware, like AMD's scalable structure; I think we are on the same road, because the cloud, and cloud data centers, have driven the whole development of the past few years. Other than AI, I think cloud is very important: a lot of businesses run on the cloud, so how to utilize all those hardware resources is one question. For us, PolarDB is a cloud database, so you have to use all the resources you have. That's where we're coming from: it becomes more elastic in serving the customer. We call that serverless, but it's the kind of thing you use as you need, like water, like a utility, right? So we have scale-out, and not only scale-out, we can also scale in, right? We can scale back. That's one thing that drove us to this whole architecture: not only separating compute from storage, but also separating the memory. That's where the CXL story comes from.
>> Okay. So what's the primary focus of PolarDB in the AI era, and what other features are added, or will be added, to PolarDB?
>> I think for PolarDB, as I said, one important feature is serverless: elasticity, scaling out and scaling up, with multiple ways of doing that. But also, as we said, we have a lot of features coming up. We have the so-called PolarDB Limitless, with which we broke the TPC-C benchmark record: 2.055 billion transactions per minute. That's like serving 1.6 billion people buying things online at the same time. That's one thing: we want to make sure things can scale out, so one database can serve the whole requirement. We're also working on HTAP, and on AI inside PolarDB as well; those are the DB4AI features we have. So people can, as I said, do more vector search, more natural language to SQL, and they can even run their models using the SQL language. That's what we are working on.
>> Okay, thank you very much. Let's talk about some of the technologies that drive hardware and software, especially databases, forward together. One major technology innovation, which I personally think is very important, is CXL. I understand a lot of our viewers probably don't know what CXL is, so can you explain to our audience what CXL is, and why it is considered one of the most important technologies since probably PCIe?
>> So CXL is a standard for connecting devices to CPUs. It is something that AMD has pioneered: from the third generation of AMD EPYC processors onwards, we have been supporting CXL. As Jimmy said, if you look at PolarDB, it's a disaggregation of compute, storage, and memory. There are many use cases for CXL, but one of the most compelling, especially in the data-driven, AI era, is memory expansion. When you talk about memory expansion there are two aspects. One is memory bandwidth expansion: for many AI and high-performance technical computing applications, memory bandwidth is critical. Then there are applications, like relational transaction processing systems, where memory capacity is important. CXL enables memory bandwidth expansion as well as memory capacity expansion.
>> So from a database point of view, what actual changes would that bring to end users?
>> Yeah, as I said, we try to make database usage like a utility bill: you get a bill for how much you used. If you look at online utilization of CPU and memory, it's very low: overall about 40% of memory is being used, and CPU is even lower, sometimes 20%. One reason is that CPU is always packed together with memory. Somebody wants more memory, somebody wants more CPU, but they don't go together, so in any one box, either the memory is underused or the CPU is underused. That's where CXL comes in: it allows us to disaggregate CPU and memory. People can use a little bit of CPU and still access a huge memory pool through CXL, and if somebody uses a lot of CPU but less memory, their memory can be shared by others. That makes the overall TCO a lot better than it used to be, right? You can pay less because you share your unused resources with others. So I think that's very important, not only for cloud databases, but maybe for the cloud itself.
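Jimmy's utilization numbers imply a simple arithmetic argument for disaggregation. A back-of-the-envelope sketch, with fleet size and per-box DRAM as illustrative assumptions and the ~40% memory utilization taken from the figure he quotes:

```python
# Back-of-the-envelope: why pooling memory raises utilization.
# Fleet size and DRAM per box are hypothetical; the ~40% memory
# utilization matches the rough level quoted in the conversation.

boxes = 10
dram_per_box_gb = 512
used_pct = 40  # ~40% of memory actually used in fixed-ratio servers

# Fixed CPU+DRAM boxes: every box's unused DRAM is stranded.
stranded_gb = boxes * dram_per_box_gb * (100 - used_pct) // 100

# With a CXL-attached pool, unused DRAM can serve other tenants, so
# the same demand needs far less provisioned memory (plus headroom).
demand_gb = boxes * dram_per_box_gb * used_pct // 100
pooled_provisioned_gb = demand_gb * 110 // 100  # 10% headroom buffer

print(stranded_gb)            # 3072 GB of DRAM idle across the fleet
print(pooled_provisioned_gb)  # 2252 GB of pooled DRAM covers the demand
```

The stranded 3 TB is what today's fixed-ratio packaging pays for but never uses; pooling converts it back into capacity someone can rent, which is the TCO argument made above.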
>> Yeah, as you mentioned, this is an absolutely critical feature, especially for modern databases and cloud data centers. You also mentioned previously that the core of PolarDB is the separation of storage and compute, and your award-winning research papers say the next step is to build a decoupled memory pool, as you just mentioned. It seems like CXL is providing a perfect vehicle for achieving that vision. From your perspective, what will a CXL-enabled PolarDB look like in the future?
>> Right. You mentioned that we actually got a best paper award at SIGMOD; SIGMOD is an international conference for databases, and we just got the best paper this year in Berlin. In it we talk about why and how we use CXL. Basically, CXL is very tightly coupled with the CPU, so the CPU can access remote memory just as if the memory were local. It gives you load/store instructions, which makes things a lot better. We used to access remote memory using RDMA, which is a networking feature, but it's kind of hard: you have to learn the verbs, you have to program against the NIC, all those things. CXL is co-designed with CPUs, so we work with AMD a lot on how to make it work. I think it has advantages over RDMA even in terms of latency: it's 10 times better, getting into the hundreds-of-nanoseconds range, while RDMA is always in microseconds. So that's another advantage: not only ease of use, but latency, and bandwidth as well.
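The load/store point can be illustrated in user space. CXL-attached memory appears to software as ordinary addressable memory (typically a CPU-less "far" NUMA node), so code touches it with plain loads and stores; no RDMA verbs, no queue pairs, no NIC programming. A rough analogy with Python's `mmap` (the mapping below is ordinary anonymous memory, not actual CXL memory):

```python
# Analogy for load/store semantics: once memory is mapped into the
# address space, reads and writes are ordinary memory operations.
# CXL-attached DRAM looks like this to software (it usually shows up
# as a far NUMA node); RDMA instead requires memory registration,
# queue pairs, and verbs programming through the NIC.
import mmap

region = mmap.mmap(-1, 4096)  # anonymous mapping standing in for far memory

# "Store": write bytes at an offset, exactly as with local DRAM.
region[0:5] = b"hello"

# "Load": read them back with a plain slice; no network API involved.
data = bytes(region[0:5])
print(data)  # b'hello'

region.close()
```

The ease-of-use gap Jimmy describes is exactly this: the CXL path is the two slice operations above, while the RDMA path is an explicit networking program.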
>> But what about the elasticity issue? For cloud data centers you originally have to be elastic to thousands, if not hundreds of thousands, of users.
>> Yeah, we still work with RDMA; we're not saying we get rid of RDMA, because CXL is still more or less over PCIe, so it's more for in-rack communication. If you saw our machines back at the conference, we had a whole rack built with CXL, but across racks we still use RDMA. So the two can co-evolve.
>> Right. So from a hardware point of view, the introduction of CXL allows the database architecture to become more flexible, gives you more performance benefits, and allows data separation: you can have hot data in the CPU's local memory and warm or cold data stored elsewhere. What hardware features does the EPYC processor offer to help operating systems or upper-level software manage data across different memory tiers?
>> Yeah. The objective here is to give full transparency from the application's perspective, so the application doesn't know whether the memory is here or over there. We have been supporting CXL since Genoa (CXL 1.1), with CXL 2.0 in the current Turin generation, and we will be supporting CXL 3.0 in our next-generation silicon. The point I want to make is that we are 100% committed to CXL, because we understand that complex applications like PolarDB need that disaggregation, and the ability to expand memory capacity as the application demands.
>> Okay. So we've talked about the benefits of CXL, and the hardware is fully dedicated to supporting it, but it must come with challenges, and as I understand it, one of the main challenges is the ecosystem, right? In your vision, both of you, what should be done to help CXL become more widely utilized?
>> Yeah, I think the hardware technology, especially from the switching perspective, has to evolve. We have been working with a few companies, and things are going in the right direction.
>> What about you?
>> Yeah, I think the CXL ecosystem has been moving a little bit slowly in the past few years, mainly because there's no hardware. Hardware has been slow because it's always hard to make a chip. We are working with some startups to actually develop the first CXL switch this year. I think that's one breakthrough: even our best paper is based on this new CXL switch. For us, the best way to drive this ecosystem is to use it, so we're going to build a fleet of CXL servers. Then everybody can start to use CXL memories, and that's where I think everything will take off.
>> Thank you. I think today's conversation is a great demonstration of how hardware and software teams can mix things together, with your team focusing on defining chips for the future, and Jimmy's teams maximizing those capabilities and using them to provide true value to end users. I'm wondering if there are any examples, any future areas, where both of your teams will work together.
>> There is very close collaboration with Jimmy. We are looking at opportunities to optimize both hardware and software, so we have a joint roadmap. Of course, CXL memory expansion is one area we are jointly working on. Another area, I think, is demonstrating the super-high performance of PolarDB on AMD EPYC-based instances.
>> Yeah, we just met Raghu last week in Silicon Valley, and we meet Raghu very often, not only in China but also back in the US. The two teams work very closely, because Raghu has a sizable team in Shanghai as well as here, and they do a lot of support. For any questions, even about how CXL works with the CPU, Raghu's full team is there to support us, not only from here but from everywhere in the world. I really appreciate Raghu's support.
>> Thank you. Actually, I want to compliment you for the great work you have done. It's truly commendable.
>> That's actually my follow-up question: I'm very curious how both your teams work on a day-to-day basis, because from my understanding, cross-globe collaboration, especially between two different companies, can be very challenging. Can you share some examples of how your teams work closely and collaborate together?
>> That's very interesting. So we have a team in the US, and the same for Jimmy, right? Sometimes it's a challenge, but if you look at today's collaboration tools, there have been significant enhancements, so I don't feel that it's two different continents; in many cases it's like one company, one team, coming together very quickly.
>> Yeah, just as Raghu says, we have a kind of virtual task team working on particular projects, for example the CXL projects. We have our local engineers working on that, and Raghu has a dedicated team supporting us, so whatever question or problem we have, we can talk to each other and get it resolved very quickly.
>> Another thing is that we have visibility into the longer-term roadmap. There can be many surprises; our objective is to minimize the number of surprises, and having close roadmap alignment is really helping us a lot.
>> So firstly, you need a very close shared vision of the future, and you also need to work very closely together.
>> Yeah, because chip development takes years, like two or three years, and how to get it mature is also important. A lot of features have to be planned in advance, and the problems we find now may affect the next-generation design. That's why I think it's very important to work together.
>> Okay, thank you. So let's look into the future, right? For the next three to five years, in addition to the AI and CXL we've talked about today, what other emerging technologies or application scenarios do you see having an impact, especially on database design, and what's next for PolarDB?
>> Yeah, we are actually looking at a lot of interesting areas, built on top of AI and the database. We're looking at in-memory, near-memory computation: once we have CXL, a CXL switch, and a CXL controller, a lot of computation, even vector computation, can be pushed down to the memory or down to the storage. That's something we are looking at closely, because Google has TPUs, right? They also have this processing capability. So we're also looking at those innovations in terms of near-memory, near-storage computation.
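The appeal of push-down can be shown with simple data-movement arithmetic. In this sketch (all figures hypothetical), a filter or vector-distance predicate executed near the data means only matching rows cross the interconnect, instead of the whole table:

```python
# Illustrative data-movement arithmetic for computation push-down.
# Figures are hypothetical; the point is that a predicate executed
# near the memory or storage device moves only results, not raw
# rows, across the interconnect.

row_bytes = 256
rows = 100_000_000   # a 100M-row table
match_per_million = 1_000  # 0.1% of rows satisfy the predicate

# Host-side filtering: ship every raw row to the CPU first.
bytes_moved_host = rows * row_bytes

# Near-data filtering: the device applies the predicate, ships matches.
bytes_moved_pushdown = rows * match_per_million // 1_000_000 * row_bytes

print(bytes_moved_host // 2**30)      # ~24 GiB crossing the bus without push-down
print(bytes_moved_pushdown // 2**20)  # ~24 MiB crossing the bus with push-down
```

A 1000x reduction in bus traffic under these assumptions; the same logic motivates the TPU comparison above, where compute sits next to the data it consumes.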
>> A similar question for you then: what's the next thing, especially for hardware?
>> I think it's going to be AI hardware acceleration, to support large language models and the other applications emerging in the AI world. We truly believe that AI is going to change the world in a good way, and our strategy is to build the best products, to bring efficiency, and to support emerging AI applications.
>> A lot of my viewers are actually very young people: passionate engineers entering their careers. You both have had brilliant careers, starting from different places but converging on the common ground you're now working on together. I think your experience will be very inspiring to the people watching this interview. So the question we ask in every interview is: if you were just entering your career, like 10, 20, or 30 years ago, what's the most important piece of knowledge or skill you wish you had back then?
>> I think the new generation is very lucky; the possibilities for them are endless. I'll give this example. When I came to the US in the '90s, my number one expense was rent, and my number two expense was phone calls. Right now, when I try to explain that I had to pay three or four dollars per minute to call India, my kids don't believe it: "What is wrong with you guys?" Right? Technology has changed a lot. Take the internet, for example: the internet is nothing new as a topic, but 10 years ago it was very different. For the new generation, the data is available; you don't have to go to libraries to look for something, everything is available online. The second thing is that there are a lot of opportunities for online training, whether in AI or databases. The next one is AI itself: it is going to change the world, and the new generation has a lot of opportunities to shape the future of AI.
>> I think AI is changing everything, so I don't have a definite answer for how to tell them what things will look like 10 or 20 years into their career. From my own past experience: spend enough time to explore all the possibilities, find something you are passionate about, stay with it, and build a career out of it. If you stay with something and become really good in some niche market or niche skill, you're going to get rewarded. That's my only advice. But AI is going to change everything. Actually, I think for the new generation there are way more opportunities today than there were 20 or 30 years ago; the thing is, you need to take advantage of the opportunities available to you. And one piece of advice: work hard. There's no compromise there.
>> Thank you very much, Raghu and Jimmy, for today's conversation.