I Talked with Database Heavyweights: AI's Data Explosion, and What Are AMD and Alibaba Cloud Teaming Up On?
By 老石谈芯 Shilicon Talk
Summary
## Key takeaways

- **Alibaba's PolarDB: A Database Built from the Ground Up**: PolarDB, Alibaba Cloud's database, was built from scratch and is now one of China's largest database suppliers, serving over 15,000 enterprises with a sizable development team. [04:02], [04:11]
- **AI Demands New Database Capabilities**: AI necessitates databases handling multimodal data like text and vectors, moving beyond structured data. Databases must also embed AI for easier querying, like natural language to SQL, and manage concurrent requests from numerous AI agents. [07:36], [08:00]
- **AMD EPYC's Chiplet Architecture Scales for Cloud**: AMD's EPYC processors use a chiplet architecture, allowing scalability from 8 to 192 cores. This design is optimized for cloud environments, maximizing compute density to support a large number of virtual machines, database users, and AI applications on a single machine. [09:53], [10:20]
- **CXL Enables Memory Expansion for Databases**: The CXL standard allows memory bandwidth and capacity expansion, crucial for AI and transaction processing. It enables disaggregation of CPU and memory, letting users access large memory pools, and improves overall TCO by enabling resource sharing. [15:22], [16:02]
- **Hardware-Software Co-design Drives Database Evolution**: Cloud databases like PolarDB are developed with a philosophy of hardware-software co-design, aligning with advancements like AMD's scalable architecture. This collaboration drives the development of elastic, utility-like database services that can scale in and out based on demand. [12:24], [13:03]
- **CXL Ecosystem Growth Hinges on Hardware Adoption**: The widespread adoption of CXL is currently slowed by the difficulty of hardware development, particularly CXL switches. Building a fleet of CXL servers will be key to driving ecosystem growth and enabling broader utilization of CXL memories. [22:14], [22:54]
Topics Covered
- How AI's Multimodal Data Challenges Database Design
- Hardware-Software Co-Design Drives Cloud Database Innovation
- CXL: Disaggregating CPU and Memory for Efficiency
- The Future of Computing: In-Memory and In-Storage Processing
- Young Engineers: Embrace AI and Continuous Learning
Full Transcript
I'm basically a database guy.
>> I grew my team from a small, maybe 10-member team to 400 engineers.
>> PolarDB is like a cloud database.
>> How many best papers in the last couple of years?
>> Three in just two years. Three best papers. Yeah. So this is how heavily Alibaba invests in databases.
>> Yeah, I think the CXL ecosystem has been moving a little bit slowly in the past few years, mainly because there's no hardware, because it's always hard to make a chip.
>> Right now, when I try to explain that back then I had to pay three or four dollars per minute to call India, my kids don't believe it. "What is wrong with you guys?"
>> I think AI is changing everything.
So before we dive into the technical details, I'm very curious to know how you both got into this exciting field and have been working on it for so long.
>> Thank you, Ken. I'm Raghu. I've been in the industry for a very long time. I started my career with HPE, building server and storage systems. Then I joined Cisco when Cisco started its server business, in the 2006-2007 time frame; my last role was CTO of that business. Then I was looking for exciting opportunities, talked to the AMD leadership, and decided to join AMD. I joined in 2018; it was a very different AMD at that time, but if you look back at the last seven or eight years, it has been a really exciting ride. My charter was to create an ecosystem of partners and optimize software stacks, both open-source and closed-source, on AMD's EPYC processors. We have done a pretty good job: today we support 75-plus ISVs and over 300 unique software stacks. I grew my team from a small, maybe 10-member team to 400 engineers. I do have a presence in China; that is one area we are going to invest in, because we have customers like Alibaba. They've been a really good customer, with very close collaboration in several areas. One of them, of course, is databases, which is close to Jimmy's heart.
>> What about you Jimmy?
>> Yeah, I think mine is much simpler: I'm basically a database guy. I started my career at a database company called Sybase, which was later acquired by SAP. At that time Sybase's competitor was Oracle, and after Sybase was acquired by SAP, I joined that competitor: I worked at Oracle for quite a bit of time. Then I joined Alibaba, seven years ago. I mainly work on a cloud database called PolarDB, in Alibaba Cloud's database business unit. PolarDB is quite interesting: we basically built it from the ground up, and it's one of the largest database suppliers in China right now. We do business with over 15,000 enterprises now, so it's quite a big database, and we have a very sizable team as well, about two to three hundred people just for development. So this is Alibaba investing heavily in databases.
>> I want to add one more thing, actually, about my experience with databases: I've been associated with the TPC, the Transaction Processing Performance Council, for a very long time. This is an industry body that brought industry standards for database performance characterization, like the TPC-C and TPC-H benchmarks.
>> That's a long history, that's nice; I didn't even know. So, today's conversation is all about databases, cloud, and computing, right? Could you each describe in one keyword, in your opinion, the most impressive advancement in this area in the past few years?
>> So, Ken, I don't know whether I can represent things with one word; maybe three words. The first one is big data analytics. I think the data revolution started in the 2010 time frame. One of the reasons behind it was the explosion of connected devices: smartphones and other devices generating all kinds of data. Organizations realized that collecting and processing large amounts of data could give them a business advantage, and on the software side technologies evolved, like Hadoop and other software-defined storage systems; relational database management systems also built capabilities to run complex queries on unstructured data. So big data analytics is one. The second is the advancement in hardware technologies: if you look at AMD, we brought EPYC processors to market with 32 processor cores, and today we support 192 processor cores in a chip. So the second aspect is the advancements in hardware technologies to process the data. The third one is AI. Both the data from the big data revolution and the advancements in hardware technologies are basically accelerating AI.
>> Yeah, I totally agree with Raghu. But if you want me to pick one word, I think AI is the most important, even though I'm a database guy. Yet even with AI, data is the most important thing: where you store the data, how you train the LLM. Those kinds of things matter, so databases play a very crucial role in this AI wave.
>> Let's talk about AI, right? The rise of AI brings both new opportunities and new troubles, especially for databases. For example, AI brings semantic search, or vector search, which makes databases more expressive but also more technically difficult. Can you describe in more detail the new requirements that AI brings for databases specifically?
>> Sure. I think, as we said, AI comes from data and uses data for inference, right? We database people used to deal mostly with structured data, because we were working on relational databases. But nowadays all kinds of different data come up: it's multimodal data, and that's what we really want to handle. There are more videos, more text, and there is also vector data, as you said. So semi-structured, structured, and unstructured data all come up. That's a challenge for the database: how are we going to store so much data? It's a much larger volume than the structured data we used to have. That's one challenge, and we have to transform our database to be more open, less purely transactional, and able to handle a large quantity of data, and a large quantity of metadata as well. Another thing is that we need to embed AI into the database system, to make it easier for people to use AI to query the database. We actually do a lot of research on natural language to SQL, so people don't need to know SQL to be able to query data, right? Another thing is that a lot of agents are coming up. In yesterday's keynote, I think, it was said there could be more agents than people in the world. Those agents also interact with the database, so how the database handles so many concurrent queries, with so many agents asking questions and querying the database, is something we're looking into as well. Those are, I think, the three main areas we are looking at.
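Jimmy's natural-language-to-SQL point can be made concrete with a toy sketch. This is not PolarDB's implementation (real NL2SQL systems use LLMs with schema context); a minimal template-based translator, with a hypothetical `orders` table, just illustrates the interface: the user asks in plain language, the database receives SQL.

```python
# Toy natural-language-to-SQL translator. Real NL2SQL systems are far
# more capable; this only illustrates the idea that the user asks in
# plain language and the database receives SQL. Schema is hypothetical.
import re

def nl_to_sql(question: str) -> str:
    """Translate a narrow family of questions into SQL (toy rules only)."""
    q = question.lower()
    # "how many orders from <region>?" -> COUNT over a filtered table
    m = re.match(r"how many orders from (\w+)\?", q)
    if m:
        return f"SELECT COUNT(*) FROM orders WHERE region = '{m.group(1)}';"
    # "total sales in <year>?" -> SUM with a year filter
    m = re.match(r"total sales in (\d{4})\?", q)
    if m:
        return f"SELECT SUM(amount) FROM orders WHERE YEAR(ts) = {m.group(1)};"
    raise ValueError("question not understood")

print(nl_to_sql("How many orders from Hangzhou?"))
# SELECT COUNT(*) FROM orders WHERE region = 'hangzhou';
```

An LLM-based system replaces the hand-written rules with a model prompted with the table schema, but the contract stays the same: text in, SQL out.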
>> Thanks, Jimmy. All those requirements bring requirements on the hardware, right? Everything comes down to the hardware. So how was the EPYC processor originally designed, and how did it anticipate these requirements before they actually emerged?
>> When we brought this product to market in the 2017 time frame, we introduced the chiplet architecture. Here you see this is a chiplet; a chiplet has 8 cores, 16 cores, and in the future we'll have more. We are able to build CPUs by interconnecting chiplets with a high-speed bus, so we can scale from very small, like 8 cores, all the way up to 192 cores. This was primarily built for the cloud, where the number one criterion is how many virtual machines, how many database users, or how many AI users can run on one machine. The chiplet architecture worked really well for the compute density that cloud vendors and others were looking for. Our initial focus, from a solution-software perspective, was cloud and high-performance technical computing, where we have made a really big impact. If you look at companies like Alibaba, they have adopted AMD EPYC technologies since the sixth generation of instances, and they recently announced ninth-generation instances powered by AMD. Now it's all about AI, right? We have unique capabilities in AMD EPYC to support the new generation of AI applications, including agents, and of course we have a very compelling GPU product line as well.
>> One keyword I heard in your response is scaling out at the chip level, using chiplet technology to add more and more cores. Interestingly, from the database point of view, especially for PolarDB, there is also scaling out in the cloud, by building the storage-compute separation architecture. These two scale-out philosophies seem to align perfectly. Is that a happy coincidence, or an inevitable technology trend that you both found and converged on?
>> So I think we're all thinking the right way, right? How do we support a large number of customers, a large number of applications? Our scale-up and scale-out architecture pretty much complements PolarDB's architecture. By the way, I'm really excited about the capabilities PolarDB has brought to the table, including winning... how many best papers in the last couple of years?
>> Three in just two years. Three best papers. Yeah.
>> So what about you Jimmy?
>> Yeah, I think one philosophy for PolarDB is hardware-software co-design. We go along with new hardware, like AMD's scalable structure; I think we are on the same road, because the cloud, and cloud data centers, have driven the whole development of the past few years. Other than AI, I think cloud is very important: a lot of businesses run on the cloud, so how to utilize all those hardware resources is one question. For us, PolarDB is a cloud database, so you have to use all the resources you have. That's where we're coming from: it becomes more elastic in serving the customer. We call that serverless, but it's the kind of thing you use as you need, like water, like a utility, right? So we have scale-out, and not only scale-out, we can also scale in, right? We can scale back. That's one thing that drove us to this whole architecture: not only separating compute from storage, but also separating the memory. That's where the CXL story comes from.
>> Okay. So what's the primary focus of PolarDB in the AI era, and what other features are added, or will be added, to PolarDB?
>> I think for PolarDB, as I said, one important feature is serverless: elasticity, scaling out and scaling up, with multiple ways of doing that. But also, as we said, we have a lot of features coming up. We have the so-called PolarDB Limitless, with which we broke the TPC-C benchmark record: 2.055 billion transactions per minute. That's like serving 1.6 billion people buying things online at the same time. That's one thing: we want to make sure things can scale out, so one database can serve the whole requirement. We're also working on HTAP, and on AI inside PolarDB as well; those are the DB4AI features we have. So people can, as I said, do more vector search, more natural language to SQL, and they can even run their models using the SQL language. That's what we are working on.
>> Okay, thank you very much. Let's talk about some of the technologies that drive hardware and software, especially databases, forward together. One major technology innovation, which I personally think is very important, is CXL. I understand a lot of our viewers probably don't know what CXL is, so can you explain to our audience what CXL is, and why it is considered one of the most important technologies since probably PCIe?
>> So CXL is a standard for connecting devices to CPUs. It is something that AMD has pioneered: from the third generation of AMD EPYC processors onwards, we have been supporting CXL. As Jimmy said, if you look at PolarDB, it's a disaggregation of compute, storage, and memory. There are many use cases for CXL, but one of the most compelling, especially in the data-driven, AI era, is memory expansion. When you talk about memory expansion there are two aspects. One is memory bandwidth expansion: for many AI and high-performance technical computing applications, memory bandwidth is critical. Then there are applications, like relational transaction processing systems, where memory capacity is important. CXL enables memory bandwidth expansion as well as memory capacity expansion.
>> So from a database point of view, what actual changes would that bring to end users?
>> Yeah, as I said, we try to make database usage like a utility bill: you get a bill for how much you used. If you look at online utilization of CPU and memory, it's very low: overall about 40% of memory is being used, and CPU is even lower, sometimes 20%. One reason is that CPU is always packed together with memory. Somebody wants more memory, somebody wants more CPU, but they don't go together, so in any one box, either the memory is underused or the CPU is underused. That's where CXL comes in: it allows us to disaggregate CPU and memory. People can use a little bit of CPU and still access a huge memory pool through CXL, and if somebody uses a lot of CPU but less memory, their memory can be shared by others. That makes the overall TCO a lot better than it used to be, right? You can pay less because you share your unused resources with others. So I think that's very important, not only for cloud databases, but maybe for the cloud itself.
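Jimmy's utilization numbers imply a simple arithmetic argument for disaggregation. A back-of-the-envelope sketch, with fleet size and per-box DRAM as illustrative assumptions and the ~40% memory utilization taken from the figure he quotes:

```python
# Back-of-the-envelope: why pooling memory raises utilization.
# Fleet size and DRAM per box are hypothetical; the ~40% memory
# utilization matches the rough level quoted in the conversation.

boxes = 10
dram_per_box_gb = 512
used_pct = 40  # ~40% of memory actually used in fixed-ratio servers

# Fixed CPU+DRAM boxes: every box's unused DRAM is stranded.
stranded_gb = boxes * dram_per_box_gb * (100 - used_pct) // 100

# With a CXL-attached pool, unused DRAM can serve other tenants, so
# the same demand needs far less provisioned memory (plus headroom).
demand_gb = boxes * dram_per_box_gb * used_pct // 100
pooled_provisioned_gb = demand_gb * 110 // 100  # 10% headroom buffer

print(stranded_gb)            # 3072 GB of DRAM idle across the fleet
print(pooled_provisioned_gb)  # 2252 GB of pooled DRAM covers the demand
```

The stranded 3 TB is what today's fixed-ratio packaging pays for but never uses; pooling converts it back into capacity someone can rent, which is the TCO argument made above.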
>> Yeah, as you mentioned, this is an absolutely critical feature, especially for modern databases and cloud data centers. You also mentioned previously that the core of PolarDB is the separation of storage and compute, and your award-winning research papers say the next step is to build a decoupled memory pool, as you just mentioned. It seems like CXL is providing a perfect vehicle for achieving that vision. From your perspective, what will a CXL-enabled PolarDB look like in the future?
>> Right. You mentioned that we actually got a best paper award at SIGMOD; SIGMOD is an international conference for databases, and we just got the best paper this year in Berlin. In it we talk about why and how we use CXL. Basically, CXL is very tightly coupled with the CPU, so the CPU can access remote memory just as if the memory were local. It gives you load/store instructions, which makes things a lot better. We used to access remote memory using RDMA, which is a networking feature, but it's kind of hard: you have to learn the verbs, you have to program against the NIC, all those things. CXL is co-designed with CPUs, so we work with AMD a lot on how to make it work. I think it has advantages over RDMA even in terms of latency: it's 10 times better, getting into the hundreds-of-nanoseconds range, while RDMA is always in microseconds. So that's another advantage: not only ease of use, but latency, and bandwidth as well.
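The load/store point can be illustrated in user space. CXL-attached memory appears to software as ordinary addressable memory (typically a CPU-less "far" NUMA node), so code touches it with plain loads and stores; no RDMA verbs, no queue pairs, no NIC programming. A rough analogy with Python's `mmap` (the mapping below is ordinary anonymous memory, not actual CXL memory):

```python
# Analogy for load/store semantics: once memory is mapped into the
# address space, reads and writes are ordinary memory operations.
# CXL-attached DRAM looks like this to software (it usually shows up
# as a far NUMA node); RDMA instead requires memory registration,
# queue pairs, and verbs programming through the NIC.
import mmap

region = mmap.mmap(-1, 4096)  # anonymous mapping standing in for far memory

# "Store": write bytes at an offset, exactly as with local DRAM.
region[0:5] = b"hello"

# "Load": read them back with a plain slice; no network API involved.
data = bytes(region[0:5])
print(data)  # b'hello'

region.close()
```

The ease-of-use gap Jimmy describes is exactly this: the CXL path is the two slice operations above, while the RDMA path is an explicit networking program.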
>> But what about the elasticity issue? For cloud data centers you originally have to be elastic to thousands, if not hundreds of thousands, of users.
>> Yeah, we still work with RDMA; we're not saying we get rid of RDMA, because CXL is still more or less over PCIe, so it's more for in-rack communication. If you saw our machines back at the conference, we had a whole rack built with CXL, but across racks we still use RDMA. So the two can co-evolve.
>> Right. So from a hardware point of view, the introduction of CXL allows the database architecture to become more flexible, gives you more performance benefits, and allows data separation: you can have hot data in the CPU's local memory and warm or cold data stored elsewhere. What hardware features does the EPYC processor offer to help operating systems or upper-level software manage data across different memory tiers?
>> Yeah. The objective here is to give full transparency from the application's perspective, so the application doesn't know whether the memory is here or over there. We have been supporting CXL since Genoa (CXL 1.1), with CXL 2.0 in the current Turin generation, and we will be supporting CXL 3.0 in our next-generation silicon. The point I want to make is that we are 100% committed to CXL, because we understand that complex applications like PolarDB need that disaggregation, and the ability to expand memory capacity as the application demands.
>> Okay. So we've talked about the benefits of CXL, and the hardware is fully dedicated to supporting it, but it must come with challenges, and as I understand it, one of the main challenges is the ecosystem, right? In your vision, both of you, what should be done to help CXL become more widely utilized?
>> Yeah, I think the hardware technology, especially from the switching perspective, has to evolve. We have been working with a few companies, and things are going in the right direction.
>> What about you?
>> Yeah, I think the CXL ecosystem has been moving a little bit slowly in the past few years, mainly because there's no hardware. Hardware has been slow because it's always hard to make a chip. We are working with some startups to actually develop the first CXL switch this year. I think that's one breakthrough: even our best paper is based on this new CXL switch. For us, the best way to drive this ecosystem is to use it, so we're going to build a fleet of CXL servers. Then everybody can start to use CXL memories, and that's where I think everything will take off.
>> Thank you. I think today's conversation is a great demonstration of how hardware and software teams can mix things together, with your team focusing on defining chips for the future, and Jimmy's teams maximizing those capabilities and using them to provide true value to end users. I'm wondering if there are any examples, any future areas, where both of your teams will work together.
>> There is very close collaboration with Jimmy. We are looking at opportunities to optimize both hardware and software, so we have a joint roadmap. Of course, CXL memory expansion is one area we are jointly working on. Another area, I think, is demonstrating the super-high performance of PolarDB on AMD EPYC-based instances.
>> Yeah, we just met Raghu last week in Silicon Valley, and we meet Raghu very often, not only in China but also back in the US. The two teams work very closely, because Raghu has a sizable team in Shanghai as well as here, and they do a lot of support. For any questions, even about how CXL works with the CPU, Raghu's full team is there to support us, not only from here but from everywhere in the world. I really appreciate Raghu's support.
>> Thank you. Actually, I want to compliment you for the great work you have done. It's truly commendable.
>> That's actually my follow-up question: I'm very curious how both your teams work on a day-to-day basis, because from my understanding, cross-globe collaboration, especially between two different companies, can be very challenging. Can you share some examples of how your teams work closely and collaborate together?
>> That's very interesting. So we have a team in the US, and the same for Jimmy, right? Sometimes it's a challenge, but if you look at today's collaboration tools, there have been significant enhancements, so I don't feel that it's two different continents; in many cases it's like one company, one team, coming together very quickly.
>> Yeah, just as Raghu says, we have a kind of virtual task team working on particular projects, for example the CXL projects. We have our local engineers working on that, and Raghu has a dedicated team supporting us, so whatever question or problem we have, we can talk to each other and get it resolved very quickly.
>> Another thing is that we have visibility into the longer-term roadmap. There can be many surprises; our objective is to minimize the number of surprises, and having close roadmap alignment is really helping us a lot.
>> So firstly, you need a very close shared vision of the future, and you also need to work very closely together.
>> Yeah, because chip development takes years, like two or three years, and how to get it mature is also important. A lot of features have to be planned in advance, and the problems we find now may affect the next-generation design. That's why I think it's very important to work together.
>> Okay, thank you. So let's look into the future, right? For the next three to five years, in addition to the AI and CXL we've talked about today, what other emerging technologies or application scenarios do you see having an impact, especially on database design, and what's next for PolarDB?
>> Yeah, we are actually looking at a lot of interesting areas, built on top of AI and the database. We're looking at in-memory, near-memory computation: once we have CXL, a CXL switch, and a CXL controller, a lot of computation, even vector computation, can be pushed down to the memory or down to the storage. That's something we are looking at closely, because Google has TPUs, right? They also have this processing capability. So we're also looking at those innovations in terms of near-memory, near-storage computation.
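The appeal of push-down can be shown with simple data-movement arithmetic. In this sketch (all figures hypothetical), a filter or vector-distance predicate executed near the data means only matching rows cross the interconnect, instead of the whole table:

```python
# Illustrative data-movement arithmetic for computation push-down.
# Figures are hypothetical; the point is that a predicate executed
# near the memory or storage device moves only results, not raw
# rows, across the interconnect.

row_bytes = 256
rows = 100_000_000   # a 100M-row table
match_per_million = 1_000  # 0.1% of rows satisfy the predicate

# Host-side filtering: ship every raw row to the CPU first.
bytes_moved_host = rows * row_bytes

# Near-data filtering: the device applies the predicate, ships matches.
bytes_moved_pushdown = rows * match_per_million // 1_000_000 * row_bytes

print(bytes_moved_host // 2**30)      # ~24 GiB crossing the bus without push-down
print(bytes_moved_pushdown // 2**20)  # ~24 MiB crossing the bus with push-down
```

A 1000x reduction in bus traffic under these assumptions; the same logic motivates the TPU comparison above, where compute sits next to the data it consumes.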
>> A similar question for you then: what's the next thing, especially for hardware?
>> I think it's going to be AI hardware acceleration, to support large language models and the other applications emerging in the AI world. We truly believe that AI is going to change the world in a good way, and our strategy is to build the best products, to bring efficiency, and to support emerging AI applications.
>> A lot of my viewers are actually very young people: passionate engineers entering their careers. You both have had brilliant careers, starting from different places but converging on the common ground you're now working on together. I think your experience will be very inspiring to the people watching this interview. So the question we ask in every interview is: if you were just entering your career, like 10, 20, or 30 years ago, what's the most important piece of knowledge or skill you wish you had back then?
>> I think the new generation is very lucky; the possibilities for them are endless. I'll give this example. When I came to the US in the '90s, my number one expense was rent, and my number two expense was phone calls. Right now, when I try to explain that I had to pay three or four dollars per minute to call India, my kids don't believe it: "What is wrong with you guys?" Right? Technology has changed a lot. Take the internet, for example: the internet is nothing new as a topic, but 10 years ago it was very different. For the new generation, the data is available; you don't have to go to libraries to look for something, everything is available online. The second thing is that there are a lot of opportunities for online training, whether in AI or databases. The next one is AI itself: it is going to change the world, and the new generation has a lot of opportunities to shape the future of AI.
>> I think AI is changing everything, so I don't have a definite answer for how to tell them what things will look like 10 or 20 years into their career. From my own past experience: spend enough time to explore all the possibilities, find something you are passionate about, stay with it, and build a career out of it. If you stay with something and become really good in some niche market or niche skill, you're going to get rewarded. That's my only advice. But AI is going to change everything. Actually, I think for the new generation there are way more opportunities today than there were 20 or 30 years ago; the thing is, you need to take advantage of the opportunities available to you. And one piece of advice: work hard. There's no compromise there.
>> Thank you very much, Raghu and Jimmy, for today's conversation.