The AI Revolution Finally Comes to Structured Data
By Gradient Flow
Summary
Topics Covered
- Textifying Tables Fails for Predictions
- Zero-Shot Beats Month-Long Data Science
- Every Database is a Graph
- DoorDash 30% Recs Boost in Weeks
- Attention Reveals True Explanations
Full Transcript
[music] All right. So today we have Jure Leskovec. He is a professor at Stanford in computer science, and more importantly for this episode, he is co-founder at Kumo.ai.
Kumo's taglines are: the first foundation model built for relational enterprise data; get instantaneous predictions straight from your data warehouse; zero-shot out of the box, fine-tuned when needed; no ML pipelines required. And with that, Jure, welcome to the podcast.
>> Thanks for having me.
>> All right. So I guess we'll start at the beginning. We'll go through your technology in detail, but let's start with the basics: what problem is Kumo targeting?
>> Yeah, so the problem we are targeting is AI over structured business data. The most valuable data every enterprise has is this ground truth of the business, stored in the data warehouse in structured tabular form: customer records, product catalogs, transaction records, supply chain management data. All of that is the ground truth of the business, and we are doing AI, predictions, risk scoring, over this type of data. Traditionally, the solution would be that you have some sort of data science team.
>> And so the question is: will the data science team build their own ad hoc data science tools internally with a data platform team, or do you go to one of these cloud hyperscalers like Google or Azure who have some sort of data science solution? Is that kind of the de facto solution?
>> Yeah, I would say there are two potential solutions. One solution is to say: we have large language models, so why don't we just textify the tabular data, throw that into the large language model, and it's going to give me accurate predictions, accurate forecasts, accurate estimates. It turns out that works terribly; it's a bad idea. People have tried it, and it doesn't really work: next-token prediction is not the same as forecasting. The status quo solution is basically: let's do machine learning, which means let's hire a data science team, engineer a number of features, create a training data set, train a task-specific model, put the features in production, and then put the model in production. The issue with that approach is that it's super manual and super expensive.
On average it takes about two full-time employees to take care of a single model. More than half of the models developed in industry never reach production. Developing a model for a single task is like a six-month effort, and just maintaining these models and their production pipelines takes a very long time; the cost is significant. So somehow it seems the AI revolution skipped structured data. If you want to do something with structured data, you basically use 30-year-old technology.
>> So on the data science team side, there are a few things from several years ago. One is the time series libraries, and the data science solutions tried to do some sort of AutoML, right? They were trying to automate. But the problem there is that it might work, but it depends on how important the forecast is. If the forecast is super important, you probably don't want to go with the AutoML solution.
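The brute-force AutoML style discussed here amounts to random search over model configurations, training many challengers and keeping the best scorer. The toy below is only an illustration of that loop, not any particular AutoML product; the one-feature data set and the threshold "model" are made up for the sketch:

```python
import random

rng = random.Random(0)

# Toy labeled data: one feature, noisy label "is the feature above 0.6?".
data = [(x, int(x > 0.6) ^ (rng.random() < 0.1))
        for x in (rng.random() for _ in range(400))]

def accuracy(threshold, rows):
    """Score a trivial one-parameter classifier: predict 1 if x > threshold."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

# Brute-force "AutoML": sample the hyperparameter at random, train 17
# challenger models, and hope one of them works.
best_score, best_threshold = -1.0, None
for _ in range(17):
    t = rng.random()
    score = accuracy(t, data)
    if score > best_score:
        best_score, best_threshold = score, t
```

The search has no insight into the problem; it simply spends compute on the space of possibilities, which is the criticism raised in the conversation.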
>> Yeah, AutoML promised a lot, but it hasn't really delivered. And I think the reason it hasn't delivered is that it's too brute force.
>> Right, it's just for loops over hyperparameters, automated feature engineering. Sometimes it runs arbitrary SQL queries over databases, creating humongous joins, and then trains 17 different challenger models, hoping that one of them will work. So it's just a big random search over a humongous space of possibilities. The other, more recent development, which I have to profess I don't have a good feel for, and I'm a bit skeptical of, is the so-called time series foundation models. There are different kinds of them. One kind, as you point out, just takes a text model, treats temporal data as if it were sequential text data, and hopes for the best. And then some other people were trying to build time series foundation models much more from the ground up. I profess I don't know anyone who actually uses these seriously in production.
>> Yeah, but I like that work. There's promise. What that line of work did was basically say: let's represent a time series as a sequence of tokens, and a time series is naturally a sequence of tokens. So we can feed one time series into the model, and the model is going to predict it forward. Of course you need to change the transformer architecture, the tokenization, the attention mechanism, things like that, but you can pre-train, and it works quite well. I think what this showed is that it's possible to do a foundation model. I would argue that a foundation model for time series is somewhat easy, because a time series is a sequence, and we know how to attend over a sequence. The foundation model I'm talking about is a foundation model that can be applied to any database, any set of tables, any kind of predictive task. So it's another two steps forward from time series foundation models, which are basically meant for forecasting a single time series at a time.
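The tokenization idea described above, turning a numeric series into discrete tokens that a sequence model can predict forward, can be illustrated with a deliberately tiny stand-in: quantize values into bins and fit a bigram next-token table. Real time series foundation models use transformers and learned tokenizers; this sketch shows only the representation step, with a made-up seasonal series:

```python
import math
from collections import Counter, defaultdict

# A toy seasonal series standing in for real temporal data.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]

# "Tokenize": quantize each value into one of 8 discrete bins.
lo, hi, n_bins = min(series), max(series), 8
tokens = [min(int((v - lo) / (hi - lo + 1e-9) * n_bins), n_bins - 1)
          for v in series]

# A bigram "model": for each token, count which token follows it.
follows = defaultdict(Counter)
for a, b in zip(tokens, tokens[1:]):
    follows[a][b] += 1

# "Forecast" one step ahead: the most frequent successor of the last token,
# mapped back to a value via the bin midpoint.
next_token = follows[tokens[-1]].most_common(1)[0][0]
predicted = lo + (next_token + 0.5) * (hi - lo) / n_bins
```

The point of the sketch is the framing: once the series is a token sequence, "forecasting" becomes next-token prediction, which is exactly what sequence models are built for.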
>> So on the time series foundation models: there are some open-weights ones, there are research ones, there are ones the cloud providers are starting to introduce. And my impression, Jure, is that it's still at the point where it depends on how important the forecast or the task is for you. In other words, if it's super important, you probably still want a data scientist to look through the results, compare them, and really test them. But if the application is something like DevOps, where you have millions of time series and alerts, then as long as you don't get alert fatigue, you're okay with using one of these. Am I correct?
>> I'm not sure I would entirely agree with this. I think with foundation models you are able to make very accurate predictions as well. And of course you can have them give you these predictions almost in an ad hoc way with a frozen model. But if you really want high performance, you can always fine-tune for that specific thing. And what we see, and we can talk more about this later, is that you get to superhuman performance. You can build models, predictions, that are more accurate, like 10% or 20% more accurate, than any human with the traditional machine learning toolbox can achieve.
>> Okay. So now let's go to this notion of a relational foundation model, which, as you point out, goes beyond time series: it's basically any sort of machine learning task that involves relational data. The phrase foundation model is in there. So is it a true foundation model in the sense of what we think foundation models are?
>> Yeah, I can explain. The way this works is that you have a pre-trained model that is frozen and has never seen a given database.
>> Never seen my company's data.
>> It doesn't need to see your company's data. You connect this model to your company's data, to a set of tables that are interlinked with primary-foreign key relations. Maybe that's 10 tables, 20 tables, five tables, whatever it is. You say: this is my schema over which the foundation model operates. And then you have a querying mechanism, a prompting mechanism, where you prompt the model with a specific prediction task, and the model, in a single forward pass of its architecture, is going to give you an accurate prediction. Our tests show that the accuracy of this prediction is as good as one month of a PhD-level data scientist building a manual model.
>> So that's what you can do. You basically say: I want to predict 60-day churn for this particular user. 200 milliseconds later you have the answer, and it's as good as one month of a data scientist building a custom neural network, decision tree, whatever it is, for that 60-day churn prediction problem. Of course, if you don't want 60-day churn but 48-day churn, you just change your prompt and you get a different answer. You don't want churn based on the number of purchases; instead you want to ask: is this person going to spend less than $100 in the next three months? You ask this, you get an answer. So you can now ask these predictive questions in an ad hoc, real-time way and get back accurate answers 200 milliseconds later. Of course, if you say: I really care about this answer being as accurate as possible because it's so important to the business, you can always fine-tune the model for that specific question, for that specific data, and then you have a fine-tuned single-task model that is super accurate.
>> So I have several questions. The first one is a kind of cold-start question. If you look at what's happening in the text-to-SQL space, all these players, Databricks, of which I'm an advisor, Snowflake, almost all of them have this notion of: yes, we can help you with text-to-SQL, but first we have to build what they call the semantic layer, which is basically metadata. So I'm assuming you need some of that. How out of the box is it? Does it take a while for it to learn something about my data?
>> There are two separate problems here. One problem is how you prompt the model. We have actually developed a domain-specific language that looks like SQL, but rather than starting with SELECT, it starts with PREDICT. Then, with SQL-like operators, you say what you are predicting, say the 30-day count of purchases in the future, and then you say who you are predicting this for: maybe you are predicting this for a user, maybe for a product, because you are asking what's my product sales forecast. So you have this structured language to prompt the model. It looks like SQL, but the fundamental difference is that SQL is about the past. With SQL you ask what happened last week. Here you are asking what happens next week, or what happens next quarter.
So it's forward-looking; with that you prompt the model. Now, the same as in text-to-SQL, where we take text and translate it to SQL, you can take text and translate it to my predictive query language, and for that you need a semantic model. You don't need the semantic model to make predictions per se, the same way you don't need a semantic model to run SQL queries, but you do need a semantic model to go from text to the predictive specification. Because if you say "predict 30-day churn," in some organizations that means number of purchases, in other organizations it means number of clicks on the website, and in some others it may mean total estimated purchase amount less than $100, or $10, or whatever it is. So for that you need some kind of semantic model.
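To make the SELECT-versus-PREDICT contrast concrete, here is what the two kinds of queries might look like side by side. The predictive syntax below is an illustrative sketch of the idea, SQL-shaped but forward-looking, not Kumo's exact language:

```python
# Ordinary SQL: a question about the past.
sql_query = """
SELECT COUNT(*) FROM transactions
WHERE user_id = 42 AND ts >= CURRENT_DATE - INTERVAL '7 days'
"""

# Predictive query (hypothetical syntax): the same shape of statement,
# but about the future -- "for each user, predict the count of their
# transactions over the next 30 days". Churn could then be defined as
# that predicted count being zero.
predictive_query = """
PREDICT COUNT(transactions.*, 0, 30, days)
FOR EACH users.user_id
"""
```

The semantic model discussed above sits in front of this layer: it maps a phrase like "30-day churn" onto whichever concrete predictive specification that phrase means in a given organization.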
>> Yeah. And also, if you go into a company, they could have multiple data warehouses, and to a large extent a lot of how these analysts and data science teams work is that over time they develop institutional knowledge: "Oh yeah, for sales in the Southeast, we should hit this data warehouse, not that one, because the data for the Southeast is better over here." So how does your system learn all of this metadata and institutional knowledge?
>> I would say this is part of the setup process, where either this information is entered during setup, or the data scientist or operator of the system can select what the data is, what the semantics are, and all of that. That is part of the setup process for the model, and once the model is set up, you can ask it any predictive question over that database.
>> What is the typical setup process? If you go to a company, how long does it normally take?
>> A couple of days. A day.
>> But you need access to a few key people who know a lot about the underlying data.
>> Exactly. There's no way around that. We need that mainly to go from the textual description of the task to the structured prediction task that is then executed by the foundation model at the bottom. And then of course the foundation model needs access to the data. So part of the setup process is to decide which tables the foundation model is using, what the semantic types and data types are, and how the primary-foreign key relations work.
>> So in the forecasting example you gave, you said that after 200 milliseconds I get a prediction. Why is it so fast? Normally deep learning and neural nets are not as fast as traditional statistical models, right?
>> Yeah, the reason it's fast is that there's no model training happening. At query time, at inference time, the model is frozen.
>> Oh, I see.
>> So the model is frozen. The model doesn't do any parameter updates; there's no gradient descent. The way the foundation model operates is that once you specify the predictive task, the foundation model goes into the data and generates a set of in-context examples, and those in-context examples are then fed into the model, and in a single forward pass of the model the prediction, the forecast, the recommendation is made. The reason this is fast is that we can generate the in-context examples based on the task specification very quickly and then send them through this pre-trained architecture without any feature engineering, without any model training, no parameter updates, to give you an accurate prediction. That's why it's fast.
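The no-training inference loop described here, retrieve in-context examples, then run one forward pass of a frozen model, can be mimicked with a deliberately simple stand-in. In this sketch the "frozen model" is just a distance-weighted vote over the retrieved examples; the real system is a pretrained relational graph transformer, and every name below is hypothetical:

```python
# Hypothetical sketch: no gradient descent, no parameter updates at query
# time -- the "model" only consumes in-context examples in a single pass.

def retrieve_in_context_examples(task, database, k=4):
    """Stand-in for graph retrieval: pick the k rows most similar to the
    query entity (here, plain one-dimensional feature distance)."""
    rows = sorted(database, key=lambda r: abs(r["feature"] - task["feature"]))
    return rows[:k]

def frozen_forward_pass(examples, task):
    """Stand-in for the pretrained model's single forward pass:
    a distance-weighted average of the retrieved labels."""
    weights = [1.0 / (1e-6 + abs(e["feature"] - task["feature"]))
               for e in examples]
    return sum(w * e["label"] for w, e in zip(weights, examples)) / sum(weights)

database = [{"feature": f, "label": int(f > 5)} for f in range(10)]
task = {"feature": 7.2}                       # e.g. "will this user churn?"
examples = retrieve_in_context_examples(task, database)
score = frozen_forward_pass(examples, task)   # prediction, no training run
```

The structure mirrors the description: all the per-query cost is in retrieval plus one forward pass, which is why latency stays in the low hundreds of milliseconds.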
>> So this retrieval of in-context examples is some sort of graph retrieval process. Is that correct?
>> That's great, you mentioned graph retrieval. The way the underlying technology works, and the key insight, is that any database, any set of tables, can be represented as a graph. Just think about two tables: imagine you have a user catalog and a set of user transactions, and imagine you want to predict user churn, a classical, kind of boring task. What do you have to do today? You take the user table and the transactions table, join the two together, and then you need to aggregate the transactions per user. So you could say: count of transactions last week, sum of transaction prices over the last three weeks, count of transactions over the last two weeks but between 6:00 and 9:00 a.m. Those are your features. The difference here is that we can represent these two tables as a graph: every user is a node, every transaction is a node, and a user points to his or her own transactions. And now the attention mechanism attends over the raw transactions, and because it attends over the raw events, there's no feature engineering needed. It just attends over those raw events. The attention mechanism has so much more finesse, or fidelity, to extract signal from the data than manual features do. So you don't need to do any data joins, no feature engineering. You get these predictions super fast, and because they are computed on the raw data, the attention mechanism is able to extract more signal than manual features typically do. So these models tend to be very accurate and reach this superhuman regime of accuracy.
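The two-table example, users and transactions linked by a foreign key, can be turned into the node-and-edge form described above in a few lines. This is a schematic sketch with made-up table contents; graph learning frameworks handle this construction (and heterogeneous node types) properly:

```python
# Two toy tables linked by a primary/foreign key.
users = [
    {"user_id": 1, "signup": "2024-01-05"},
    {"user_id": 2, "signup": "2024-02-11"},
]
transactions = [
    {"tx_id": 10, "user_id": 1, "amount": 30.0},
    {"tx_id": 11, "user_id": 1, "amount": 12.5},
    {"tx_id": 12, "user_id": 2, "amount": 99.9},
]

# Every row becomes a node; each foreign-key reference becomes an edge.
# No join and no per-user feature aggregation: a model can attend directly
# over the raw transaction nodes attached to each user node.
nodes = ([("user", u["user_id"]) for u in users]
         + [("transaction", t["tx_id"]) for t in transactions])
edges = [(("user", t["user_id"]), ("transaction", t["tx_id"]))
         for t in transactions]

neighbors = {n: [] for n in nodes}
for src, dst in edges:
    neighbors[src].append(dst)
    neighbors[dst].append(src)
```

Once the schema is in this form, "aggregate transactions per user" becomes "attend over a user node's neighbors," which is the substitution the conversation is describing.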
>> And to our listeners: this is probably years of research in Jure's group at Stanford that makes this real. So would you consider this an example of a graph neural network, then?
>> Great question. My group pioneered graph neural networks. This is the next generation of that.
>> Okay.
>> And the beautiful thing is that it's general. It's not about graphs. It's about databases. And everyone has a database.
>> Yeah, because the problem with graph neural networks has always been that no one has a graph.
>> Nobody has a graph. But now I'm saying any database is a graph. You can now train what we call relational graph transformers directly on the raw data, and you get all the benefits of true deep learning. Think of computer vision pre- and post-deep learning: pre-deep learning it was about detecting edges, SIFT features, and Gabor filters, and then training a classifier; in the post-deep-learning era you learn directly on the raw pixels of the image. We are doing the same thing: we say, let's learn directly on your database. How do we do that? A database is not an image, and a database is not a sequence of text. We represent the database as a graph, and now your neurons can directly reason over that graph, and you get all the benefits we have already seen in computer vision, as well as in natural language understanding with large language models, where again we learn over a sequence. The only difference is that an image is a fixed grid of pixels, text is a linear sequence, and a database is a more complicated structure: it's a graph.
>> So we started this out with my example of forecasting and time series, but as you point out, this technology applies to relational data in general. So I assume I would be able to use this to do the things I would do as a data scientist. Someone asks me for, I don't know, churn prediction: score the customer database for likelihood to churn. In the past I would have to go into the database, start gathering the features, and then build the model; there's some feature engineering. This could take weeks, right?
>> Exactly. This is the traditional approach, where this takes weeks, and then it's actually much harder, because after weeks of work you've only just built a model. Putting this model in production is even harder, because, especially with these temporal types of problems, for every new event you have to recompute all the features. Keeping features up to date in a production setting is super hard: either the features get stale because they haven't been updated up to the present time, or the features have time-travel issues, information leakage issues, and your models end up miscalibrated and mistrained. That is a huge issue. The difference here is that the data scientist just says: these are the tables I want to build my model over, and it's the neural network that takes care of temporal consistency, attending over the raw data to give you that accurate prediction.
>> So one of the most painful things about data science work, obviously, is the data itself. The data has missing values and needs to be standardized and cleaned. So I'm assuming that your technology requires the data to be in a certain state, that it's not going to work miraculously if the data is a mess.
>> Good question. I think there are two parts to it. One fundamental truth is garbage in, garbage out.
>> Right. And also the whole notion that data science work is 80% data cleaning.
>> Yeah, I hear that. I think the difference is the following: as a data scientist you are operating truly at the modeling side of things rather than at the data cleaning side of things. You say: these are my tables, these are my raw events. Yes, it might be noisy. It might have missing labels. But because the model is learning over the graph, over the relationships, these models are actually more robust to these kinds of data issues.
>> I see, because they can kind of fill in missing values.
>> Every data point is situated in its relational context, so the model can learn to borrow information from nearby data points to robustify itself.
>> Or sometimes you have categorical values that are a mess, right? The state of California can be "CA" or "C-a-l-i-f". Can your technology overcome messy data like that?
>> That's a good point. The way we operate is that it's a collaboration between the model and the data scientist. We provide tools and diagnostics, and then of course the operator decides what they want to do with, say, a super messy categorical column. Do they want to model it as a categorical column, or do they want to model it as text, in which case all these different spellings will naturally be considered together? That's part of the modeling process, I would say.
>> So another task that might fall under the purview of relational data, at least in some cases: imagine you're an e-commerce site, your data is in a relational database, and I'm a data scientist tasked with building a recommender system. Would this technology be able to do a first-pass recommendations model?
>> This technology would allow you to build a state-of-the-art recommender system.
>> Oh, really? You have actual users doing it?
>> I have actual users doing it.
>> Wow.
>> I can tell the story. For example, one of the users of this in the recommender system space is DoorDash. The use case is restaurant recommendation. It's a flagship problem for DoorDash internally.
>> Oh yeah, this is one of the most important things that they do.
>> Exactly. And the use case specifically is "try something new": recommend a restaurant you have never ordered from before. After we implemented this technology, it was a 30% improvement over the existing internally built model. And this was a flagship model; it's not like somebody at DoorDash woke up yesterday and said, let's build a recommender.
>> In this case they hooked up your technology to their data, and pretty quickly they got to this 30%?
>> As with everything, it takes a couple of iterations, but maybe it took a month or two, so it was very short. Essentially the idea was: what are the tables we need? It was very easy to experiment with different data. Maybe first we started with users, orders, and restaurants, but then very quickly it was: why don't we also include the user behavior data on the website, and that was another table. Then it was: okay, we have restaurants, but maybe geography is important, so let's add the geography table as well. As this schema was curated, the next step was to decide what's the objective function, what's the predictive task: are you predicting orders next week, next month, in the next 48 hours? We tested and played with that as well. The platform allows you to do this kind of exploration very quickly. And then we measured accuracy, and it was a humongous lift.
>> And what you just described is essentially no code. It's all just prompting?
>> I mean, you prompt through code. At the end you use an SDK to write these things. I wouldn't say it's just prompting: it is data schema organization, it is deciding what the proper predictive task is. And, as you are hearing, this is a fine-tuned model, because you really care about accuracy.
>> You still need data scientists.
>> You need a data science operator who does this, yes, exactly. This wasn't just prompted; this was fine-tuned, because it's a super high-value task, and the benefit to DoorDash in this case was in hundreds of millions of dollars of additional food orders, because recommendations and notifications are now more accurate.
>> And then once they built the recommendation model, is it easy to deploy?
>> It is easy to deploy, actually. That's a good point: deployment with this type of technology also gets easier, because all you need to do is refresh the raw data.
>> What about latency?
>> Good point. On latency: you can do online inference over this graph, and that has, I don't know, a 100-millisecond-type response time. If you want shorter, what you can always do is use the model to prematerialize the embeddings, do batch inference, and then do the real-time scoring in 10 to 15 milliseconds based on those embeddings with a very efficient model. So both are possible. It just depends on the application and on the requirements.
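The batch-then-serve pattern described here, prematerialize embeddings offline, then score candidates in real time with a cheap operation, commonly reduces to a dot product at serving time. A minimal sketch with made-up embeddings and IDs:

```python
# Offline (batch inference): embeddings prematerialized by the model,
# e.g. refreshed nightly alongside the raw data.
user_embeddings = {"u1": [0.9, 0.1, 0.3], "u2": [0.2, 0.8, 0.5]}
item_embeddings = {"rest_a": [0.8, 0.2, 0.1], "rest_b": [0.1, 0.9, 0.4]}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Online: real-time scoring is just a few multiplications per candidate,
# which is how latency drops to ~10-15 ms instead of full-model inference.
def recommend(user_id, k=1):
    u = user_embeddings[user_id]
    ranked = sorted(item_embeddings,
                    key=lambda i: dot(u, item_embeddings[i]), reverse=True)
    return ranked[:k]
```

The trade-off is freshness: precomputed embeddings only reflect the data as of the last batch run, which is why the fully online path over the graph is also offered.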
>> So I'm assuming that you also have a class of users who are just prompting.
>> We have a class of users that are prompting. But, to say it honestly, where we see the most benefit is in these sophisticated
>> data science
>> organizations.
>> I see.
>> And the reason for that is that sophisticated organizations can measure value very quickly. With less sophisticated ones, it's more like: some data scientist feels they want to do something from the ground up with open source, and they keep doing that. In more sophisticated organizations, with proper A/B testing and so on, it's easier to prove value; the bar, of course, is much higher. I just gave you the DoorDash example, but they are sophisticated enough to say: look, this is actually providing this much value, we are going to do this. And not only DoorDash: for example, another use case is advertising models at Reddit. These are large-scale.
>> Yeah.
>> Flagship models, right? It's about predicting whether a user is going to click an ad; that's exactly what drives revenue. That was another amazing improvement. It was like four or five years' worth of improvement over what they would do internally, because every year the team increases model accuracy by about 1 to 2%, even with all the innovation and all the architectural changes. With this relational foundation model, this graph-based approach, it was like four years' worth of improvement in [clears throat] a couple of months. So that's another very successful use case, and we see examples at Coinbase, Expedia, and others.
>> Yeah. This is interesting, because when I first read about you, honestly, my first reaction was: man, this sounds cool, but I don't know how they can sell this, because they're going to sell it to the very people who will get displaced. But from what you're describing, you're not displacing them. You're making them look like rock stars.
>> Exactly. This is not displacement. This is not about: hey, we can replace your job. This is a tool so that data scientists can do what data scientists are there to do, which is impact the business, impact the business metrics. With this type of tool, a data scientist becomes a rock star who is truly driving business outcomes, who is now 20 times more productive, who is able to explore more, to think more about how do I model here, what do I do here, and impact the business downstream. So exactly, I think you said it very nicely: it goes from grunt work to being a rock star.
>> So you've mentioned fine-tuning. In foundation-model fine-tuning, the typical workflow is: here are some labeled examples, prompt and desired output, prompt and desired output; upload them to a fine-tuning service, go away, come back, and you have a fine-tuned model. So what is the workflow for fine-tuning a Kumo model?
>> Great question. The way fine-tuning works is that you specify the predictive task. The beautiful thing is that a lot of these tasks, if you think about churn or recommendation, are temporal, meaning you can slide through time: pretend that "now" keeps moving forward, look into the future to see what truly happened and obtain the label, and then use the past to predict that future label. So the point of fine-tuning is that, from the task specification, we can automatically generate a training data set through this time-travel mechanism.
>> Oh, you generate the data set?
>> We generate the labels by time-traveling through the original data set, so you don't have to do even that manually. Of course, you can modify and clean the automatically generated data if you like, as a data scientist, but you don't have to. You can just say "predict me this thing," and we say: OK, you are predicting 30-day churn, and churn means the count of future transactions is zero. So let's slide this 30-day window over all your data, and that defines the labels; then let's create the subgraphs in the past to predict the label in the future. That generates the fine-tuning examples that the platform then uses.
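To make the "time travel" concrete, here is a minimal sketch in plain Python over made-up transaction data. The 30-day churn definition follows the example above; the function name and data are purely illustrative and are not Kumo's API.

```python
from collections import defaultdict

def churn_labels(transactions, cutoffs, horizon=30):
    """Time-travel labeling: at each cutoff date, a customer's label is
    1 (churned) if they have zero transactions in the next `horizon`
    days; everything before the cutoff is available as features."""
    by_customer = defaultdict(list)
    for customer, day in transactions:
        by_customer[customer].append(day)

    examples = []
    for cutoff in cutoffs:
        for customer, days in by_customer.items():
            if not any(d <= cutoff for d in days):
                continue  # customer not yet active at this cutoff
            future = [d for d in days if cutoff < d <= cutoff + horizon]
            examples.append((customer, cutoff, int(len(future) == 0)))
    return examples

# Toy data: "a" keeps transacting, "b" goes quiet after day 10.
txns = [("a", 1), ("a", 20), ("a", 45), ("b", 5), ("b", 10)]
print(churn_labels(txns, cutoffs=[15, 30]))
# → [('a', 15, 0), ('b', 15, 1), ('a', 30, 0), ('b', 30, 1)]
```

Sliding the cutoff over more dates multiplies the labeled examples without any manual annotation, which is the point being made in the conversation.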
>> And what's the typical number of examples that would make for a good fine-tuned model? I guess it depends, but roughly?
>> That's a great question. The range can be quite humongous: we can do well with a thousand examples on the low end, all the way up to tens of billions if you think about, say, Reddit with all its users, clicks, posts, and votes. But the key is that I don't have to generate them; I just describe the task.
>> You don't have to generate them.
>> Exactly. You describe the task, you point me to the raw data, and the platform figures out the rest.
>> Ah, interesting. Obviously, as a data scientist I work with structured data, but increasingly that's not the only thing I'm responsible for: there's unstructured text, semi-structured data. So is there something on your roadmap that will expand the utility of Kumo beyond structured data?
>> That's a great point. Even now, we deal really well with unstructured data as long as it is part of tables: you have descriptions, strings, conversations, comments, images, and all of that can be used to build these predictive models. Kumo is about building predictive, scoring, forecasting-type models; we don't do document summarization, and we don't do question answering over a static set of documents. But in an e-commerce example with a lot of product reviews, the platform can leverage those reviews to make more accurate recommendations. Or, we are working with one of the largest dating apps out there, which is using Kumo to predict who's going to go on a date with whom. Their current system did not use images, so we said: let's just add an image column here with images of all these people. It was immediately 15% more accurate.
>> Oh, you actually leverage the images? So you use some sort of computer vision?
>> Exactly. The image embedding, the image encoding, is now part of the recommender system, and if you want to say who's going on a date with whom, then knowing how a person looks, unsurprisingly, contains a lot of signal. The beautiful thing is that with this technology you just add another column to the table, say "here's the image of the person," and the platform takes care of the rest. So it's very easy to build these multimodal predictive models that use textual data, comments, images, and things like that, but for predictive tasks.
>> If you were to rank the top three families of use cases for Kumo, would it be forecasting, recommendation? What are those top three?
>> Yeah, let me answer. One area where we see lots of traction is retail and commerce, which is about user-behavior questions: recommendation, churn, lifetime value, and so on. Another place where we see lots of value is safety, fraud, and financial abuse: fraud models and all kinds of bad-behavior models across social networks, financial institutions, and so on. And a third is risk models, predicting all kinds of risk: deciding whether to discharge a patient from the hospital, deciding whether an insurance claim is fraudulent, things like that. So the big ones are retail, media, and then these fraud and financial use cases.
>> Do you folks have some homegrown entity-resolution tool? Because, like I said, in some of this data, fraud for example, I could be using multiple versions of my name. Is that part of the system, or am I responsible for that?
>> That's a great question. If we want to go that far, the platform basically allows you to quickly build entity-resolution models as pre-processing, data-prep models that curate the data for the final model, so it's almost like a staged process, and it's in the platform. But because our information is graph-based, the need for entity resolution is actually smaller: these entities get connected through the graph anyway, and through the attention mechanism over the graph, the predictive model can learn to do entity resolution on the fly as it makes a prediction. So you don't have to do entity resolution separately from the predictive task.
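For readers who do want an explicit pre-processing step, here is a deliberately naive entity-resolution sketch. The platform's resolution models are learned; this stands in only for the "data-prep model" idea, and every name and rule below is illustrative.

```python
import re
from collections import defaultdict

def resolve_entities(records):
    """Naive entity resolution: normalize each name (lowercase, strip
    punctuation, collapse whitespace) and group records that share the
    normalized key. Real systems use learned similarity, not fixed rules."""
    groups = defaultdict(list)
    for rec in records:
        key = re.sub(r"[^a-z0-9 ]", "", rec.lower())  # drop punctuation
        key = " ".join(key.split())                   # collapse whitespace
        groups[key].append(rec)
    return dict(groups)

clusters = resolve_entities(["Acme Corp.", "ACME  corp", "Globex Inc"])
print(clusters)
# → {'acme corp': ['Acme Corp.', 'ACME  corp'], 'globex inc': ['Globex Inc']}
```

The contrast with the graph-based approach is that here the grouping is fixed before modeling, whereas attention over a graph can connect name variants per-prediction.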
>> So, as with any AI technology, you have users, and to engender user trust you have to provide some sort of explanation for how you arrived at a decision, some transparency. How do you do this for your users, who are mainly data scientists?
>> That's a great point. I think explanation is critical, because in contrast to business analytics, where once you have the result you can go back and verify it, with a prediction you cannot: you have to wait for it to happen to see whether it was correct. So the importance of explanations is even higher. And what we have is a completely new approach to explanations: because our models attend over the raw data, we can basically run them backwards and ask them what they attended over. Our models can go back to the raw tables, columns, and rows and say, "this is the data I used to make this prediction." So you get truly data-driven, data-rooted explanations.
And you can get them at the level of the entire model, where the model says, "here is what I am looking at to make these predictions," and you can confirm that it makes sense, that these are the right signals to look at. If the model is looking at the wrong signals, you know something is wrong and you can fix it.
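The idea of reading attention back as evidence can be sketched with toy numbers. The attention logits below are hand-picked stand-ins for what a trained model would assign; nothing here is Kumo's actual interface.

```python
import math

def explain_prediction(rows, attn_logits, top_k=2):
    """Toy version of 'running the model backwards': softmax the attention
    logits the model assigned to raw rows, then report the most-attended
    rows as the evidence behind a prediction."""
    exps = [math.exp(s) for s in attn_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    ranked = sorted(zip(rows, weights), key=lambda rw: rw[1], reverse=True)
    return [(row, round(w, 3)) for row, w in ranked[:top_k]]

rows = ["txn: refund $500", "txn: coffee $4", "login: new device", "txn: coffee $5"]
print(explain_prediction(rows, attn_logits=[2.0, 0.1, 1.5, 0.1]))
```

In this toy fraud example, the refund and the new-device login dominate the attention mass, so they become the "data I used to make this prediction."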
You can also do it at the level of an individual prediction: for a given recommendation, you can ask what events, what structures in the past made the model make this prediction. We can present that to the user in a structured form, but what we see works really well is to take this structured explanation plus the semantic model and put them into an LLM to generate a text-based explanation. Those explanations are easy to consume, easy to read, really good, and non-hallucinated, because they are truly derived from what the model is attending over.
>> Well, you might as well also put charts in the text explanation, if charts help explain what you did, right?
>> You could, yeah, exactly. You could go beyond that and present it in a more graphical form.
>> And just say, "here are the features I used," with a bar chart or whatever. So another obviously trendy topic in AI is agents. How do you fit into the world of agents? The reason I bring it up: one way to think of agents is that maybe you don't need a monolithic model; what you need is a model that breaks down the task and assigns the pieces to smaller models or even specialized tools, external tools via MCP. So, how do you fit into the world of agents?
>> That's a great point. One important thing about agents is that agents need decision power, and that decision power needs to be rooted in data. So we can basically be a tool the agent uses to make predictions, based on which it makes decisions. The present generation of agents mostly reason over some static text and then do something, but for an agent to be truly autonomous, reasoning over text is not enough: it needs to reason over internal structured data to make decisions. For example, say I'm an agent in the insurance industry estimating which of my customers are most likely to churn. Then I need to decide what offer I would send them, and then I need to write and send an email. If you think about this workflow, there are two predictive problems, two decision problems: first, estimate the risk score of every customer; then, for each high-risk customer, recommend the best offer, the one they are most likely to take; and then write the email. Today, for both of these decision problems, the risk-score estimation and the recommendation, you would need a data scientist to spend months solving them manually, and only then could the agent do anything. With a relational foundation model, the agent can just query the database and say: estimate the risk scores, select the top thousand people with the highest risk of churn, and for each of these people recommend the next best offer to give them.
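The retention workflow just described can be sketched as a few lines of agent glue code. Both predictive calls are stubbed with made-up scores; in the scenario described, they would instead be zero-shot queries against the relational foundation model.

```python
# Stubs standing in for two predictive-model queries (made-up values).
CHURN_RISK = {"alice": 0.9, "bob": 0.2, "carol": 0.7}
BEST_OFFER = {"alice": "20% discount", "bob": "free month",
              "carol": "plan upgrade"}

def retention_agent(customers, top_k=2):
    """Agent workflow: (1) score churn risk, (2) keep the riskiest
    customers, (3) recommend an offer, (4) draft the outreach email."""
    ranked = sorted(customers, key=lambda c: CHURN_RISK[c], reverse=True)
    return [{"customer": c,
             "offer": BEST_OFFER[c],
             "email": f"Hi {c}, here is a {BEST_OFFER[c]} just for you."}
            for c in ranked[:top_k]]

plan = retention_agent(["alice", "bob", "carol"])
print([step["customer"] for step in plan])  # → ['alice', 'carol']
```

The point is architectural: only the email-drafting step is a text problem; the two decision steps need predictions grounded in structured data.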
>> By the way, you do more than what the time-series foundation models do, but I could basically use you as a time-series foundation model, right?
>> You could use me as a time-series foundation model, of course.
>> Because it's still relational data.
>> It's a single table. But here's where the benefit comes in: time series are not isolated. If you have a time series per product, maybe you also have a product taxonomy, right?
>> And now our model not only learns from the target product's time series; through the product taxonomy, it can attend over the time series of other products. So it can borrow information from other time series correlated with the target product, to give you much more accurate forecasts.
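A toy illustration of "borrowing" across a taxonomy, with a fixed 50/50 blend standing in for the learned attention; all numbers and product names are invented.

```python
def forecast(series, taxonomy, product, blend=0.5):
    """Blend a product's own mean with the mean of sibling products in
    the same taxonomy category. A real model would learn how much to
    borrow, per product, via attention, rather than use a fixed blend."""
    own = sum(series[product]) / len(series[product])
    siblings = [p for p, cat in taxonomy.items()
                if cat == taxonomy[product] and p != product]
    if not siblings:
        return own
    pooled = sum(sum(series[s]) / len(series[s]) for s in siblings) / len(siblings)
    return blend * own + (1 - blend) * pooled

# "espresso" has one noisy observation; its coffee siblings stabilize it.
series = {"espresso": [10.0], "latte": [20.0, 22.0], "mocha": [18.0, 18.0]}
taxonomy = {"espresso": "coffee", "latte": "coffee", "mocha": "coffee"}
print(forecast(series, taxonomy, "espresso"))  # → 14.75
```

This is the classic "borrowing strength" effect: a product with sparse history inherits signal from related products instead of forecasting in isolation.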
>> All right. So now we're at the stage in the podcast where I have to ask: what is open source? What part of Kumo is open, and what is not?
>> Yeah, great question. We are the authors of the most popular open-source graph-learning library, called PyG, or PyTorch Geometric; pyg.org is where people can find it. Of course, putting graph learning into production is super hard: it requires years of engineering.
>> And the way you describe it, it seems like that's only one aspect of the offering, right?
>> Exactly. So what we did at Kumo is first build a large-scale, production-ready graph-learning platform that anyone on the planet can use. On top of this, we also built a foundation model that is accessible through an SDK and through MCP, and we make it available for people to use through our servers. We are not able to open-source it because of the infrastructure.
>> Open weights?
>> Even open weights is infeasible, because you need all our infrastructure to run it: you need graph-learning infrastructure to run the graph-learning model. If I open-weight the graph model, you still need the infrastructure to run it. So because of this coupling, rather than saying we'll open-source it, we just say: here are the APIs, you can experiment with it for free, you can use it for free, and once you start getting value out of it, we can talk about paying. The reason is that it's a special infrastructure plus a special model architecture.
>> By the way, I forgot to ask, I don't know how: you trained the foundation model, so what was the data?
>> Great question. The data is interesting: it's actually open data on the web, but it needs to be databases; it needs to be multi-tabular. Single tables do not work here. And then we do a lot of synthetic data generation and a lot of data augmentation, because what you need to do is teach this transformer how to learn from patterns. When you give it in-context examples, it needs to learn from those examples and generalize to the unlabeled example. So the key is to think of this as teaching the transformer to learn and recognize patterns. It's not so much about ingesting data semantics; it's really about pattern recognition, to some degree.
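The "learn from in-context examples" behavior can be caricatured with a one-nearest-neighbour lookup. A relational foundation model learns an analogous behavior with attention rather than an explicit distance, so this is only a mental model; the data is made up.

```python
def in_context_predict(examples, query):
    """Toy in-context learning: labeled (features, label) examples arrive
    at inference time, and the query's label is read off the closest
    example. No weights are updated; the 'learning' is in the lookup."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(examples, key=lambda ex: sqdist(ex[0], query))
    return label

# Labeled in-context examples: (feature vector, label).
examples = [((0.0, 0.1), "churn"), ((0.9, 1.0), "active"), ((0.8, 0.7), "active")]
print(in_context_predict(examples, (0.1, 0.0)))  # → churn
```

This is why the pre-training recipe emphasizes pattern variety over memorized semantics: the model must map new in-context examples onto patterns it has learned to recognize.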
>> And the typical usage is that you point it to data that's already being used by data science teams for analytics and forecasting, right? In other words, you're not going to hit the ERP system with its 10,000 tables; you go where the typical data scientist already goes: a data warehouse or lakehouse, right?
>> Exactly. You would connect to that: a data warehouse.
>> So the type of data you simulated with synthetic data more or less mimics data warehouses, that star-schema kind of thing?
>> Exactly, whatever the schema is. The platform can easily connect to your Snowflake, Databricks, BigQuery, Bigtable, S3 files, Parquet files, Iceberg, whatever it is. You just say: here's my data, these are my tables.
>> Can you give an intuitive explanation for why this works so well? In the following sense: if you just take univariate time-series forecasting, there are infinitely many patterns of time series, and you can't possibly have seen every possible time series out there. Look at just the Internet of Things: there are so many devices. So what's the intuition for why this works?
>> That's a great point. What I'm saying, what I'm promising, seems a bit crazy, unbelievable, magical, and we were surprised as well. It's unclear why this should even work, and I think the truth is a combination of two things. First, transformers, neural networks, seem to have this generalization ability where they go beyond what they were trained on: somehow, in some embedding space, they are able to put similar things close together, so that it's possible to connect them and generalize between them. That's one part of the answer. The other part is that maybe the world is less complicated.
>> Yeah.
>> Than we think, right? In some sense, maybe the best thing to think about is a Taylor expansion: to a first approximation, the world is mostly linear. So maybe the universe is more orderly, less complicated, than "there are arbitrary functions, a gazillion arbitrary things." Probably not, right? So I think it's those two things: there is order in the data, and while every pattern is unique, at some level of generalization they maybe collapse to a smaller set of patterns, and that is what the model is learning; that, together with the ability to generalize, allows you to do this.
>> I guess the litmus test, Jure, having worked as a quant at a hedge fund myself, is: can I use this in a quant setting to build trading models? That would be interesting.
>> That's interesting. I think that's interesting.
>> I don't know if anyone listening will attempt it, but to the extent that the data is relational, right? Maybe you have some earnings reports that you can embed and store in columns. I don't know. So what's your intuition: will it work? [laughter]
>> I think it's interesting. What this allows you to do is really learn from second- and third-order correlations, and the economy is a network of interdependent, intercorrelated parts.
>> I think you can put all of that data in a relational database, and then maybe embed some of the earnings reports.
>> Exactly. And I think this system will do all of the work for you.
>> Right.
>> That is true. So at that level, I think you can get an edge. At the same time, can we predict how some dictator is going to wake up tomorrow and what they are going to do? Certain things are simply unpredictable. But in terms of "given the data we have, what signal can we extract?", I think this technology is the way to extract that signal from structured data, to the maximum degree.
>> And the benchmark is the human: what is the human quant able to do, and can I beat that?
>> Exactly, the benchmark is the human. Of course, one thing you could do is unleash an LLM to go through this and build these models, but those take time, and it's clear they are subhuman in performance.
>> And one last example I'll throw out there is sports betting. All these people doing sports betting, right? All that data, the scores, the historical averages, all of that is in a relational database, right?
>> Exactly. That would be another way you could use it. Of course, all these things are ultimately uncertain, so predictions come with intervals. My baseline is the human.
>> Your baseline is the human. Exactly. And the point is that we can do better than that.
>> All right. So, listeners: we are not investment advisors, do not take our advice, but go ahead and try Kumo. And with that, thank you, Jure.
>> Thank you so much, Ben. [music]