
Learn MLOps with MLflow and Databricks – Full Course for Machine Learning Engineers

By freeCodeCamp.org

Summary

Topics Covered

  • ML Outputs Are Probabilistic, Not Deterministic
  • Notebooks Fail at Scale Without Tracking
  • Production Demands Auditability and Reproducibility
  • Prompts Replace Models in LLMOps Workflows
  • Databricks Eliminates MLflow Setup Overhead

Full Transcript

This course is an end-to-end guide to mastering MLflow, the industry standard for managing the machine learning lifecycle. It offers a deep dive into the internal mechanics and architectural patterns of MLOps. From your first local experiment to deploying production-ready models via a centralized tracking server, you're going to gain the hands-on expertise required to build reproducible and scalable ML systems. You'll learn how to use Databricks to integrate MLflow's tools into professional workflows as a single source of truth for your entire model registry.

Hi everyone. If you are learning machine learning, MLOps, or GenAI, at some point you will hear these questions often: How do I track experiments properly? How do I manage models, parameters, and metrics? And if you are dealing with LLMs, how do I manage prompts? That is exactly what this video is about. In this complete end-to-end session, I'll walk you through MLflow not as isolated features but as a real system used in modern MLOps and LLMOps workflows.

This is mostly a practical, working-understanding kind of video. You won't see much theory except in the first part, and across this video you will see how MLflow is used for experiment tracking, parameter and metric logging, model versioning and lifecycle, and MLOps workflows. Then there will be LLMOps as well: how do I manage prompts, how do I load prompts, how do I evaluate prompts, and things like that.

If you are a student, a working ML engineer, or someone transitioning into MLOps or GenAI, this video will give you a solid mental model of how MLflow actually fits into real projects. You don't need to watch everything in one go; treat this like a reference guide you can come back to. Now, let's get started and build a clear understanding of MLflow from the ground up.

Hello everyone. In this video we are going to discuss the most important and fundamental step before we actually begin doing hands-on work with MLflow, and that is why ML systems need experiment tracking beyond notebooks and Git. So let's get started.

I have not made these presentation slides very text-heavy, because I wanted to give you the gist of why MLflow exists without drowning in too much detail. If you look at this diagram, the left side represents a mess, and the right side represents something ordered and connected. We are describing this as going from ad hoc experiments to production ML systems.

Before we dive into anything of that sort, let's talk about how most ML projects start. Usually, if you have built a machine learning model, you started with a Jupyter notebook. So that is one notebook, and you will have one dataset. It could be downloaded locally on your laptop or on whichever compute instance you are using; it could be a large dataset or a small one; and it could be sitting in a cloud repository or locally, wherever your notebook instance is running.

And there is one model as well; initially, at least, you will have one model. You won't be doing much experimentation or fine-tuning, and there is one person. For most people this is fine. If, say, you are in academia doing some sort of research with no more than one or two people, in those scenarios I don't think MLflow even needs to be used, because it would mean spending extra time on tooling you don't yet require. Okay.

So let's move on.

Now, I have listed some naming conventions that people typically use in the initial stages of machine learning experimentation. They will have some naming scheme for their notebooks, and some naming for the trained model they believe is best, based on the metrics. And there is a lot of confusion here. Why do I call it confusion? Because you might be working in an organization where 10 or 20 other data scientists also exist, and in those scenarios each of them will have their own mess. If you have to communicate effectively, you won't be able to, and you can't trust your memory here. That is where the confusion lives, and that is what my next slide points to: the hidden assumption.

We humans tend to overestimate our power to remember. We say, "I'll remember how this model was created." But this does not always happen; we forget. You have to trust your experience, not your mind, because your experience tells you that you forget, while your mind tells you, "No, I'll remember; I don't have to write it down." So, what is the difference between machine learning and traditional software building? In traditional software building you have deterministic outputs: a given piece of code results in a given artifact or deployment, and that is fixed. But in machine learning, the outcomes are probabilistic in nature, because we are dealing with data and randomness.
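To make that concrete, here is a tiny stand-in example (the `train` function below is purely an illustration I've made up, not real model code): the same code on the same data gives a different result whenever the randomness changes, so the result is reproducible only if the seed is recorded.

```python
import random

def train(seed):
    # Stand-in for model training: the resulting "accuracy" depends on
    # random initialization, just like a real training run would.
    rng = random.Random(seed)
    return round(0.80 + rng.random() * 0.10, 4)

# Same code, same data -- a different seed yields a different "accuracy",
# so the run is only reproducible if the seed itself is logged.
print(train(seed=1), train(seed=2))
print(train(seed=1) == train(seed=1))
```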

That is why the meaning of "version" differs here from software engineering. In traditional software we version code, but here a version means something closer to a decision history. You have a set of tuning hyperparameters that you can tweak a little bit, and then your result will be drastically different. So those parameters have to be logged.

Now, what is an ML experiment? You can think of an experiment as an encapsulation of five components. These five components are: code (obviously, without code you can't train), data, parameters, and randomness. And there is a fifth point which I think most people overlook, and that is the environment, because you must be using some set of packages in your training experiments. So that environment also has to be considered part of the ML experiment.
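As a rough sketch of what a record of those five components might look like, here is a plain-Python illustration (the field names and the `snapshot_experiment` helper are my own invention, not MLflow's API):

```python
import hashlib
import sys

def snapshot_experiment(code_version, data_bytes, params, seed):
    """Capture the five components that together define an ML experiment."""
    return {
        "code": code_version,                                 # e.g. a git commit hash
        "data": hashlib.sha256(data_bytes).hexdigest()[:12],  # fingerprint of the dataset
        "parameters": dict(params),                           # hyperparameters used
        "randomness": seed,                                   # seed controlling randomness
        "environment": f"python {sys.version_info.major}.{sys.version_info.minor}",
    }

record = snapshot_experiment(
    code_version="abc1234",
    data_bytes=b"toy dataset",
    params={"learning_rate": 0.03, "epochs": 100},
    seed=42,
)
print(record)
```

A tracking tool like MLflow is essentially a structured, shared place to store records like this one for every run.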

These five things together create your experiment. Now, what does Git capture, and what can it not capture? Git only captures code changes, and that is why we are talking about something different from what traditional Git offers for machine learning systems and experiment tracking.

Now, why don't notebooks scale? The most primary reason is that there is no structured metadata. You run things line by line, cell by cell, and you have no record of how many models you have trained within one notebook or across several different notebooks. There is no inherent execution order, no structured metadata, and it is very hard to compare runs. (Within an experiment you can have multiple runs; we'll talk about that in later videos.)

What is the core problem? The core problem is that ML systems lose this decision history, and this decision history is what we are trying to capture using MLflow. Now, why does this become dangerous in production? Because production requires us to train our model again and again. Why? Because data changes and teams change: a new person comes along and probably wants to reproduce what the old person created. And not only that, your infrastructure will also change.

You might be working on GCP right now, but you may have to, let's say, switch to Azure. Those things are not in your control, because management can decide to change them at any moment. So every production team must be able to answer why a given model is in production; we should be able to clearly articulate why a given model was pushed there. What is the reason? Maybe its accuracy is best, but just answering that its accuracy is best is not enough, because you will have to answer a lot of different questions.

is uh best is not enough because you will have to answer lot of different things. Okay. So what happens without

things. Okay. So what happens without tracking so you don't have reproducibility, you don't have auditability. Now this is very important

auditability. Now this is very important because in a tight uh organization um when I say tight what I mean is where uh

these compliance checks are u very heavily enforced in those type of organizations auditability is something that

uh needs to be taken very seriously. So

um so this particular uh this particular u thing cannot be uh achieved without tracking. Okay, you can't uh audit

tracking. Okay, you can't uh audit through a notebook. Okay, no safe roll backs. Let's say if you have a model in

backs. Let's say if you have a model in production and if you would like to uh um go to some other previous healthy

version, you can't do that. Uh obviously

you can do that but there are uh manual things that you would like to uh you would you would have to do. Okay. So now

some of the excuses of not applying uh these u principles of ML ops and ML flow are as follows. We will clean up our code later. The code which is which

code later. The code which is which exist in notebook we will we are going to clean up it later. they will say this is just a research and I feel this can

this excuse can be uh taken if let's say your you have a single uh person team and you don't have uh much you know

organization and collaboration then I think that's fine tracking slows us down tracking will will never slow you down tracking will only enhance your future

productivity so those things uh uh are some of the common early stage excuses uses that people make.

Now, what breaks? We all know folder-based tracking: we create a folder for each experiment, maybe because we are testing a new feature within our training pipeline, so we create a new folder for that experiment. We also make spreadsheet comparisons, which don't work either, and we sometimes make memory-based decisions, because we overestimate our power of memory. Essentially, this is a systems-and-discipline problem, and it really becomes a problem when we are dealing with an organization that is deeply involved in compliance checks: a huge organization with multiple data scientists and multiple people working on one problem.

So what does experiment tracking actually solve? There are a few things experiment tracking can help you with. First is decision traceability: you can trace what led to a given decision, for example what led to a model being pushed to production. You also get system memory, team alignment, and production safety. How does it lead to team alignment? Because everybody is aligned on, say, a given model being pushed to production, and they know why that is so: they can go into a centralized tool such as the MLflow server (which we are going to talk about later) and clearly see why a given model has been pushed into production.

Okay, I think that would be it for this particular video. In this video I just wanted to give you the motivation to use MLflow within your organization's setup; we won't be doing any MLflow API or UI walkthrough just yet. We'll do that in the next video. That was the intent of this video. I hope you liked it. I'll see you in the next video. Thank you so much.

Hi everyone. In the last video we discussed why it is important to consider using MLOps, and in particular MLflow, in your machine learning projects and systems. In this video, assuming you are satisfied with the reasoning I gave and are ready to work with MLflow, we are going to set MLflow up on our local system, that is, a laptop. But you can reproduce these steps to run it even in your cloud environments. In the cloud you would have to do an extra step to configure the UI part, because on a laptop I can simply run a local server and open localhost to get started with the UI. In a cloud shell, depending on which cloud platform you are working with, you would have to do some extra step to enable that UI.

So let's get started. I have created a separate folder, MLflow YouTube. Within this folder I will create all the scripts I'll be using for demonstration. Let me open a terminal; in this terminal I'll first navigate into Documents and then into MLflow YouTube, and from here I can open Visual Studio Code. So we are now in Visual Studio Code. I won't be using any AI help in this video, but I will be copying a lot of code from the MLflow documentation.

First of all, I'll use Visual Studio Code's own terminal and create a virtual environment here. Let me check with something like python: we have Python installed (you would have to make sure Python is installed on your system). I have Python 3.11.2, and I'm actually using a version manager (pyenv) to work with multiple versions of Python. If I run python3 I get this version, but I can also get other versions through the version manager; I won't bore you with that in this video.

So what I'll do is create a virtual environment here in this directory, under a name of my choosing. Then I'll activate it: source, then bin, then activate. Our virtual environment has been activated; let's quickly install the MLflow package. It will take a couple of seconds; it should not take much time.

Meanwhile, I can come here to the documentation, go to the quickstart, and copy from it. Maybe I'll just copy this script; actually, I'm hesitant to copy it wholesale. Let's create a file, lecture1.py, and inside it I will import mlflow. Currently the editor is not using my local venv, which is why you see these squiggly underlines, but that should be fine for now.

Now, after importing this, the first step is to set the experiment. I will name my experiment something like, I don't know, a demo experiment. Okay.

Then I can run this script: python lecture1.py. It should do nothing, I guess, at this point, but you will see a bunch of things getting created here. It will take some time; let's wait.

You see a new directory called mlruns being created, but nothing else happened here, right? So let's start a server. Basically, when you install MLflow, you get a command-line tool named mlflow. If you run mlflow, you see all the options for working with MLflow's commands, and mlflow server shows all the commands related to the server. I started the server, and it runs on the default port, so let's open it now.

I open 127.0.0.1 and I get the server UI. You see these tabs: Home, Experiments, Models, Prompts. Why Prompts? In the earlier days, MLflow was used only for traditional machine learning, but nowadays it is used for GenAI-based experiments as well, so you see Prompts here. If I go to Experiments, you see my demo experiment. If I click on it, one second, it's asking me to confirm something because it's my first time visiting this dashboard; I'll just confirm.

So this is what I wanted to show. This server will continue running; we'll talk about the mlruns directory in a moment. I hope you understood what I wanted to demo in this video. Now we are set up with MLflow. In the next video, we'll create our first brand-new experiment, and within that experiment we'll create a new run. Thank you so much for watching this video. Have a nice day.

Hello everyone. In this video we are going to create our first experiment, and within that experiment we are going to create a bunch of runs. I hope you are excited. Let's get started.

We already have the code from last time, but let me create a new file for lecture2.py. Within it, I'll just copy that code and paste it right here.

Now that set_experiment is called here, anything I log after this line will be recorded within this experiment. How do I create a run within this experiment? Since the experiment has already been created, I can start with mlflow.start_run. This start_run creates a context, and within this context I can do some sort of logging. For now, let's keep it very simple: I'll call log_params and pass a dictionary

here. Maybe I have a hyperparameter.

Actually, I will try to create a new run here. With mlflow.start_run I'll set the run name to "test artifact", and in place of log_params I'll use log_artifact, where I pass an artifact path. For the sake of demonstration, let's log the lecture1.py file itself and run this really quickly: python lecture2.py.

Let's see what happens. It ran. Now let's look at the test-artifact run in the demo experiment. In that run, under Artifacts, I have the logged file; the entire file got copied there. Now, if I open mlruns, you see a certain structure being followed. What is this structure? Let's go and find out.

Inside mlruns you have a folder named 1; this 1 corresponds to the experiment ID (sorry, not the experiment name), and in fact you can see the same ID in the UI. And what is this folder whose name starts with 6? That is the run ID: if I go back to the test-artifact run, its run ID indeed starts with 6. What about the artifacts folder? Artifacts, as the name suggests, is for saving the actual artifacts: I can use it to save pickle files, images, datasets, anything you can think of that should not be saved in a DB. Now let's see what exists within mlflow.db.

Let me close this out and create a new file, explore_db.py. We can connect to this DB using SQLite: import sqlite3.

What we need to do here is create a connection. How? I can simply write connection = sqlite3.connect(...), passing the name of the database file that was created locally. Through this connection I can extract all the tables. Let's also import pandas for this demonstration: import pandas as pd. Then pd.read_sql_query, with a query like SELECT name FROM sqlite_master WHERE type = 'table' (with 'table' inside single quotes), and for the database I will just pass the connection. I want to get all the tables, so let's print them and see what we get. What exactly is stored in here?

You see there are a lot of tables here, for example model versions, evaluation datasets, and things like that. Let's explore one of these tables. Which one should we explore? We have a runs table.

That is what I'm interested in. So now I will say runs_df = pd.read_sql_query, with SELECT * FROM runs as the query (I'd like to select everything from runs) and the same connection. Now let's print the head of runs_df with runs_df.head() and run this.

You see this run_uuid column. What is this UUID? For the time being, let's print all the values instead of just the head. And you see the ID starting with 6: that is the run I actually created just now. So I hope you can see the objective of this. Let's also print the columns: run_uuid, name, source_name, entry_point_name, user_id; all these details are logged here.
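Put together, explore_db.py looks roughly like this (it assumes the backend store is a SQLite file named mlflow.db in the working directory; if the server was started with the default file store instead, that database file won't exist):

```python
import sqlite3
import pandas as pd

# Connect to the SQLite file MLflow uses as its backend store.
connection = sqlite3.connect("mlflow.db")

# List every table in the backend store.
tables = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type = 'table'", connection
)
print(tables)

# The runs table holds one row per run: run_uuid, name, user_id, status, ...
if "runs" in set(tables["name"]):
    runs_df = pd.read_sql_query("SELECT * FROM runs", connection)
    print(runs_df.columns.tolist())
```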

Now, for everything I have just explained, there are technical names. This store I'm using to hold the artifacts is known as the artifact store. This database I'm using to store all the parameters and metrics is known as the backend store. And this server I'm running here, together with its UI, is called the tracking server: the UI plus the service that serves it. These are three of the important concepts you have to remember while using MLflow.

Now the next question arises: is it advisable to store these artifacts on your local setup? No, it is not. In practice, we store these artifacts in object storage available from the various cloud providers, like S3 or a GCP storage bucket. The same goes for the backend store: we can go for Postgres, SQL Server, MySQL, anything you can get access to. How can we actually configure that? That is a topic for another discussion, which I can probably cover later, not right now.

I hope you have understood what I wanted to show you. The primary takeaway from this video is that the mlruns directory stores your artifacts, while mlflow.db stores your metadata and run properties. Thank you so much. I'll see you in the next video.

Hi everyone. In the previous video, we discussed what the mlruns directory points to and what mlflow.db contains, and we understood that mlflow.db is nothing but a database referencing all the metadata, tags, and run history we create during a run. In mlruns, we can log any artifact that a database would traditionally not be able to hold. Although we could store such things in binary form in the database, the idea is that the artifact store should be used for anything heavy, such as your pickle files, datasets, anything of that sort.

In this video, I would like to take a moment to discuss a comprehensive list of items that you can potentially log during a run. So, let's get started.

I'll create a new file called lecture4.py and import mlflow. Inside it I'd like to set the experiment with mlflow.set_experiment, and I'll call this experiment "YouTube tutorial". What better name for an experiment than that? Okay.

All right. After setting the experiment, what is the next thing we want to do? We would like to create a run, right? So, start_run, and let me give it a run name; I'm going to call it, I don't know, "logging demo". Okay.

Now, what are the different things we can log? First of all, parameters, and there are two ways to log them: a single key/value parameter, or a dictionary of parameters. Naturally, MLflow exposes two methods for these. I can call mlflow.log_param, providing a key and value; for example, I would like to log the learning rate of my machine learning algorithm, set to 0.03, and maybe an epoch count set to 100. (That single-value method is log_param, not log_params, sorry.)

Then there is the method where you build a parameters dictionary: inside it I can have a learning rate, maybe 0.04 this time, and an epoch entry pointing to 200, and I can log this entire dictionary in a single shot with log_params. Now let's view the output really quickly. I'll run this script, and I'd also like to capture its run ID so I can reference it at a later stage. But first, let me test whether I can specify my own run ID, say a run ID of 1 2 3. I have not tested this; I got the idea just now: can we provide a custom run ID or not? I feel it should not work, but let's see. Okay, it errors with "Run with run id...": it assumes that if you provide a run ID, it should start looking for an existing run. So let's first create a run.

Okay. So it's successful. Now,

now what I can do is okay YouTube tutorial and in this we have got logging demo and inside here you can see learning rate epoch learning rate one

and epoch one. Okay. So we have got uh this ready but now I would like to capture this run ID and I would like to get this here because I want to demo

this within the same run. I don't want to create unnecessary

run. I don't want to create unnecessary runs. Naturally, after parameters,

Naturally, after parameters come metrics, and the pattern is similar. Let me show you: mlflow.log_metric takes a key-value pair, so maybe I can log an accuracy, and the accuracy could be something like 90. And I can also do mlflow.log_metrics with a dictionary. So, just like we defined a parameters dictionary, we can define a metrics dictionary in a similar fashion. This metrics dictionary can have an accuracy as well, and since I have already logged one accuracy, I'll rename this one accuracy_1 and set it to 80. Then I pass the metrics dictionary to log_metrics, and let's run this.

Hmm, one second. I see; I think reusing that made-up run ID is not allowed. Let's rerun it. Okay, this time it did not throw an error regarding the run ID, because we already made sure that this run ID exists.

And now that a run with this run ID exists, we can go into the UI to see the output. So we see the metrics appearing here: we have accuracy and we have accuracy_1.

Sweet. Now let's move on to logging some other stuff. The third kind of thing we can log is artifacts, and artifacts appear here in the UI. Currently only one artifact is listed, because so far we have logged just one, and that is lecture1.py. But now let's try to log an image.

The main thing is that for any artifact the interface remains the same: mlflow.log_artifact. Before logging the artifact you need the artifact's path, and that path can come from anywhere. For the time being, let's assume we have some training job which is running for some number of epochs, and we need to look at the learning-rate progression or maybe the loss progression. So I'll search for something like "loss graph machine learning".

Okay.

So what we can do is grab this graph and save the image in the directory where we have this project, and I can call it anything I want for now. So we have this images.png, and I can log this artifact.

So let's log it.

A lot of things will change after this. First, if I open mlruns, you will see two subdirectories appearing. The first directory is the previously created experiment, and the second one is this YouTube Tutorial experiment. If I open the second one, you will see a directory named 890..., and that is the run ID; inside it, we have images.png.

That is viewing the artifact within the directory, within the store. We can view the artifact in the UI as well: if I go inside the artifacts tab and click on the file, you can see this image appearing here, and we can zoom in.

So that was logging artifacts. And you are not limited in the way you are limited with params or metrics; with artifacts you are free to log whatever you want. But this power should be used very cautiously: just because you have the ability to log anything, you should not abuse your storage server, whatever storage you are using. Currently we are on a local directory, but you might be given access to object storage on a cloud platform, and you don't want to misuse that.

Okay, so although I personally think those three cover most of the stuff, there are some other cool things you can do. For example, log_image is one such thing: it logs an image from an ndarray. If you have worked with image-based deep learning, like CNNs, you might already be aware that an image can be represented as an ndarray, and you can provide that array here directly.

So you can log an image like that, or you can save the image and log it separately as an artifact. And then there are methods for logging tables as well, such as log_table. Inside it you pass the data; you can, for example, pass a DataFrame. Let's create one really quickly and see where it appears. So: import pandas as pd, and let's create demo_df = pd.DataFrame with only one column called name, containing Daniel and Sam. Now I can log this df.

Let's see what we get. It says artifact_file, which means we also need to provide an artifact file name; that is the path under which it will appear when you go inside the UI. Let's say I want to name it demo_df.data. Invalid artifact path: please ensure the file you are trying to log ends in .json. Okay, let's provide .json here.

So if I now go inside the run, you will see demo_df.json, and you see this table appearing.

So in this way you can preview data as well: if I click on these tables I can preview them, and I can sort the whole table. What if you have very large tables? Let's do one small experiment and load some larger data here. Let me download the Titanic CSV, grab the file, put it inside the project, and read it: titanic_df = pd.read_csv("titanic.csv"). And inside log_table I will pass titanic_df with the artifact file titanic.json. Let's run this; it should be able to log it, and if I refresh, we see our titanic.json. This time it takes a little while, and you get this nice-looking table which you can click and sort. Not all the rows are loaded at once; there is a pagination option with which you can move through the rows to view them. And you can also choose columns if you go for the compact view. So these are the different options you get out of the box from log_table.

I think that would be it, although there are other options too. For example, you can log assessments as well, and an assessment takes a trace ID; this becomes important when you are dealing with, let's say, a GenAI-based application. And we have got log_figure, which is similar to log_image: here you pass a matplotlib figure object, and it will be rendered as an artifact.

So there is a multitude of options that you get with MLflow-based logging. I'll stop this video right here, and then I'll continue with the next topic, which is logging models.

Hi everyone. In this video I want to start talking about logging models.

If you are experienced in machine learning, you might be aware that models can come in various flavors: you could be building models in scikit-learn, PyTorch, or maybe TensorFlow. All these different frameworks have their own set of requirements and dependencies that you need to keep in mind before you actually log the model.

Now, there can be a manual logging step or an auto-logging step; those are the two different ways you can log. In this video I want to show you manual model logging using scikit-learn. So let me create a new script called lecture5.py. Inside this script, let me grab some training code from the MLflow quickstart, because that will make our job a lot easier.

So we have got this training script from the MLflow documentation, and let me also grab the train-model part. You see this autolog call; they have enabled it, but for now let's comment it out. Let me also confirm that my server is running; let me go to the tracking server at 127.0.0.1. Yeah, this is the tracking server we have.

Now let's first run the script as it is, to see whether we get any error. It's running; let's see why it is taking so much time. Hmm, okay, it's saying LogisticRegression got an unexpected argument called multi_class; it seems our newer scikit-learn no longer accepts it, so let's comment it out.

Yeah, so now our model is being trained.

But where exactly is this model? We don't know yet, because we have not saved it anywhere. Let's create an experiment: mlflow.set_experiment("sklearn model logging"). And inside it, let me start a run, and let's move the training lines within this run. We can also name the run and log the params: the run name would be "sklearn model logging", and I can call mlflow.log_params and pass the params.

Okay, so how do I log this scikit-learn model? What I can do is... no, sorry, I need to call the flavor first. The flavor here is sklearn, so it's mlflow.sklearn.log_model, and inside it the first argument will be the sk_model, which is lr, and the second argument will be the model name, which will be, let's say, "simple model". And let's run this.

Okay, so our run is complete. Now let's check whether our UI has been updated. We have got sklearn model logging, and inside it you can see our params listed, and under logged models we have got this entry with the model name "simple model". If I click on it, it basically shows us the model. Now, if I go inside mlruns and open the third experiment, which corresponds to sklearn model logging, I will see a models directory. This models directory corresponds to a model ID; this simple model has that model ID, and that's how MLflow is able to reference it.

Once a model is logged, we can do all sorts of cool things, such as registering it. Now, there is a difference between logging a model and registering a model; these two are entirely different things. Logging a model just means you put it in a dump, in a repository or storage. Registering a model means you can then use that registered model to, for example, deploy it to an HTTP-based endpoint that is supported by MLflow.

So far we have seen how we can use this to do manual logging.

Let me now copy this script, and I will name the run "auto". This time I won't be logging anything manually; I will just train the model, and MLflow will auto-detect all the required parameters, metrics and so on. I just need to enable the auto-logging feature. Let's run this script now.

After the run has completed, it will take some time, because it is generating some metadata and some graphs. So let's go to the UI. Now I have got two runs, you see: auto and logging. Let's open auto, and inside auto you see several different things, like parameters. Where are these coming from? These are coming from MLflow's backend compatibility with scikit-learn, so it is able to detect all of these automatically. We have model metrics, we have system metrics, we have traces and artifacts; in artifacts we have the estimator details, metrics and so on. Now if I go inside the logged models, I can see a similar structure: MLmodel, conda.yaml, model.pkl. model.pkl is the actual binary, and all the rest are just different ways to manage the dependencies, such as requirements.txt, python_env.yaml, and so on.

So that's how we do manual logging with the sklearn flavor, and that's also how we do auto-logging with the sklearn flavor. The process remains the same for other flavors as well. The primary thing you need to know is how to locate your desired artifact. So, for example, here I see models; I've got two models, including the one from the run that was created when I enabled autolog.

I hope it makes sense. Next we will cover model registry. Thank you so much.

Hi everyone. In the last video we talked about how to log models. Now I would like to switch gears a little bit and talk about some of the cool capabilities of MLflow that make it really exciting to use. In this video we are going to be looking at nested runs in MLflow. Let's see what they mean.

Lecture 7.py.

I'll just quickly import MLflow.

I will set my experiment as "nested run demo", and we will get started with the creation of a nested run. The idea behind a nested run is that there is a parent run. Let's create the parent run first: mlflow.start_run, and I will call this run "parent run". We can also alias this context manager, just like we do while opening a file: as parent_run, and then we can print its information, parent_run.info.run_id. And then within this run, let's log a param: mlflow.log_param("theta", 100).

Okay, let's create a child run. Within it, I can specify the run name as "child run 1", and here I can print its properties. I can copy-paste this block multiple times to create multiple children: child run 2 and child run 3. Now let's run it.

Although I could also log things inside the children, let's just run it the way it is right now. Okay, so we are getting an error; let's see where exactly the error is. It's saying "Run with UUID ... is already active". So I think we need to provide the parameter nested=True. One second; sorry, not on the parent. The parent run was created, but since the parent run was active, we could not create the child run. Let's pass nested=True on the first child, and let's use nested=True on the other children as well. Let's run this one more time.

Okay. So you see: parent run, child run, child run, child run. That's how it is done. Let's look at it in our UI, under nested run demo. Now I have got the parent run from the earlier attempt as well; you have to understand that names can be duplicated. If I expand this and click on the plus icon, you see the child runs.

Now, what would be an ideal scenario for this feature? One run can correspond to a larger piece of work, and then multiple nested runs can correspond to something more specific within it. One example where people often use nested runs is hyperparameter tuning, where each trial becomes a child run under the tuning job's parent run. So that is one use case you can potentially go for. So yeah, that is what I wanted to demo. I'll see you in the next video.

So now that we understand how to log scikit-learn-based models, it is the right time to introduce a new MLflow concept, and that is the MLflow model registry.

From the name itself it should be clear, but at the same time I feel that the name "MLflow model registry" is not entirely appropriate, and I will tell you why I think so. When you think about a registry, you often associate it with storage, but that's not the case with the MLflow model registry. It is not a store; although we call it a store, it is only a store for metadata. Ultimately your models keep staying inside the artifact store, which in our situation is our local directory.

What the model registry essentially does is keep track of your model's lineage, versioning and aliasing. There are things you would want to do with your best model, maybe deploy it into production, or see which run produced it, and things like that; that is what the MLflow model registry enables. So it essentially is a centralized model store, although there is no storage component in it; it just creates this registry using a YAML file, and that YAML file points back to the backend and artifact stores.

So let's see how we can register a model, using Python or the UI. Let me go here, and let's assume that I want to register this model. The only thing I need to do is come here and select the model. Actually, I have to create a registered model first. You might be confused about what exactly I'm doing here, because we already have lots of models that we trained in our last discussion using auto logging and manual logging. What we are creating now is a registered model. Let me go into Models: you see, currently there are no registered models. Let us create one, called "best production model". When I create it, certain things will automatically be created for this model; one important thing is its version. If I come here, you see it currently has no version, because we did not associate any model with it yet.

Okay, so let's go into the experiments again. Let's go to the autologging run and then to the models; note that this model is from the logged models, not the registered models. If I click on it, then click Register model and open the drop-down, I see this option: best production model. If I register it, my model will be registered, and if I now check the registered model, it says version 1.

But let me do one thing. Let me go back into the experiment, into the manual logging run with the simple model, and register that same way. What will happen? Let's see. You see, it has incremented the registered model to the next version, which shows this is a very good tool when you are building and pushing models into production. So if I open the registered model, we have got two versions, and I can click on the various versions. Maybe I click on version 1 and look at its schema. This schema is automatically inferred from the model's inputs and outputs. How does that work? Because the model was logged using MLflow, MLflow is able to infer its input and output schema, essentially the shapes and types.

Now, I can also come here and add tags to it, and I can add aliases to it. Aliases help when, let's say, I'm deploying the model as an HTTP-based server. For now, the basic idea to take away is that you can register an already-logged model, and that gives you some additional features such as versioning and lineage for the model.

Okay. Now the question is: can I do that via my code? Yes, we can. In fact, you can do it while logging the model. With auto logging you can't, but if I'm calling this sklearn log_model, inside it I can pass the sk_model, then some dummy name, and then I can pass something known as registered_model_name. The moment I pass this, let's run the code. Okay, let's say "my model", and for the registered model name, for the sake of it, let's use the same name as before, and let's run this. I'm running lecture5 only, so we are using its autolog facility and we are also logging the model manually. You see: the registered model exists, hence it is creating a new version. So if you refresh, you have now got version 3. I hope you are able to understand how model registration works.

I think that would be it for this video, because the model registry in itself is not a very huge topic. In the next video we will be taking up this model and trying to access it via an HTTP-based server. And I think I will also have to talk about the model URI; essentially, I can do it right away. A model URI is a unique identifier that you can use to reference a registered model, and the way you write it is models:/ followed by the model name and then the model version. Here my model name is best production model, and then comes the model version. That's it. This is the model URI you can use when you have to, let's say, deploy a model, and in fact we will be looking at how to do that as well. We can also reference a model using the runs syntax: runs:/ followed by the run ID and then the model name.

So let us see these various examples in the next video, where we deploy these models. Thank you so much.

So, in the last discussion we talked about how to take your models, which are logged, hidden inside a run, stored within the artifact store, and put them inside a separate store that essentially refers to the same thing, but now with some enhanced metadata. Registered models are just the same models, but with some additional metadata and tags, which help in organizational and operational aspects.

In this video, we will quickly look at how to take those registered models, or any model for that matter, even those logged within a run, and deploy them behind an HTTP-based endpoint. You might be wondering, or maybe worrying: I don't know Flask or FastAPI, how can I do that? The answer is that you don't need to know any of those things. How can you run an HTTP-based server then? Just open a new terminal, activate your virtual environment, and run this: MLflow provides a utility called models serve.

After serve you need to provide some arguments. The first one is -m, for model, where you specify the URI that we discussed in our last discussion. The URI follows this format:

models:/, then the name. Let's go to the registry and grab the model's name, best production model; let's paste it here. We also need to specify the version, so let's use the second version. After that we need the port; we can use almost anything. I don't know how I came up with this number, but sometimes what happens is I use some random port and there is already an existing service using that exact port, and in that situation MLflow will throw an error.

Finally, there is an argument you need to specify if you are not willing to serve this model using a container-based environment setup, so I'm telling MLflow not to build one. If I run this, it has basically started the process, and it has started a server on this URL. This URL by default exposes several endpoints, such as invocations, health, and so on. Let's try it out; let's curl into it.

Let me grab this URL first and paste it here. If I hit enter on the bare URL, it says not found. But let's call /health; health does not return anything, but it is also not giving us any error; in fact it returned 200. There is one endpoint in particular, /invocations. On a plain request it says 405, method not allowed. Why? Let me expand this. I must also have misspelled it at first: it's invocations, plural.

Okay, let's pass the proper arguments: a POST request with content type application/json, and we also need to provide data. In the data I'll pass "inputs", and inputs would be a list of all the rows you want to get the prediction for. Yes, you see: the method was "not allowed" simply because we weren't sending a POST with data. And if I remove one of the four values, it gives the error that x has 3 features but LogisticRegression is expecting 4 features as input. So where did we set this up? When we created this data we used load_iris, and iris has four features; that's why it is requesting four features. Okay. So that's it.

That's how you basically set up an HTTP-based model server from your registered or logged models. This is very simple, right? But this kind of setup is only ideal for real-time inferencing, or in ML-systems terminology, online inferencing, where you deploy this REST-based API into a Lambda function or maybe Azure Functions, and then you start accepting requests and serving them on the go. For batch-based requirements, you might instead create a script and load the model, and for that batch setup we can load the model using the sklearn flavor's load method. Let's try it out.

So let me create a new script, and inside it I'll just import mlflow and call mlflow.sklearn.load_model. Inside it I need to specify the model URI, so let's use the same model URI that we just deployed.

One second. Okay, so this is the model URI that we chose. Let's grab it, and I'll pass it as model_uri; what I get in return is an MLflow-loaded model which I can use for batch inferencing. Okay, let's run this script.

You see, it is printing LogisticRegression, and that's what our model is. Now we can use this model just like we would use a normal scikit-learn model. So this is how we load an already registered or logged model for our batch-inferencing workloads; and for real-time, online inferencing, we just spin up this nice-looking server. I hope it is making sense.

Hi everyone. In this video, we are going to take a little detour from traditional ML and start talking about GenAI-based MLOps, or rather what we should call GenAI ops or LLMOps. You can call it whatever you want, but the key thing is to master the fundamentals, so that whatever the outside world calls it, you know what you have to do. So the first topic we need to cover is: how does GenAI ops differ from traditional MLOps?

Now, in MLOps there is this central thing called the machine learning model, and in LLMOps that role gets taken over by prompts. Essentially, just like models can be random, not very deterministic but probabilistic, prompts can also behave probabilistically. It's a very simplistic explanation, I know, but it gives you a clear difference between GenAI ops and MLOps.

So in LLMOps we need a way to manage prompts, just like we had to manage models in MLOps. To manage models, MLflow provided you with the model registry, and you must be guessing it right: to manage prompts, MLflow provides you with the prompt registry. Let me quickly start the MLflow server and show you exactly what I mean by that. Let's open Chrome and open the server.

Okay, so you see this Prompts tab. If I go here, you will see some prompts which already got created when I was playing with this concept in MLflow. But we can come here and create a prompt.

Okay, we can provide the name of the prompt, and we can choose the type of the prompt: text-based or chat-based. I'll show you what exactly both of these mean. I will provide the prompt, and I will provide the commit message. So essentially what happens is you get to version your prompts: you can create multiple prompt versions.

Suppose we are building a question-answering or frequently-asked-questions (FAQ) bot, and we are trying to optimize and create the best prompts. So how can we go about it?

So we have named this prompt FAQ bot. Now, under it we can define what is known as a prompt template. Placeholders in that prompt template will be replaced by the actual variables that we need to inject into the prompt. But let's start with the text-based prompt, and I'll show you what the chat-based prompt looks like. In chat-based prompts, similar to how we call OpenAI's API, we can have various types of roles (user, system, and assistant), and for each of these roles we can have different messages. Okay, so let's go with the text-based prompt and write it down.

Okay: "You are a helpful assistant who answers the provided question concisely and precisely."

Okay, so this is the prompt, and now let's add the question. Here we write the placeholder as suggested by MLflow, with question as the placeholder name. Later you will see how we can replace this question, but for now we are just creating prompts; we are not yet loading this prompt and passing it to OpenAI or whatever LLM you are using. I'll use the commit message to track this: version one.

Okay, so let's create it. We just added a new prompt, and it is a version one prompt. Now what is the next step? The next step would be to try out this prompt and evaluate it, and we can evaluate multiple prompt versions against each other. I can create one more version of this prompt. You see the same prompt appearing because I can edit it now. So I can add something like: "Answer the question only in English. Second, the answer should be under three sentences."

Okay, and there are more things we could add. If I create it, you will see version two is added.

Now I have the ability to compare prompts. If I click on Compare, these two versions are automatically selected and we see color highlighting of what was added in version two. If I had one more version, I would get the ability to select which prompt to compare with which. Okay, so this is how MLflow provides you with prompt management, or rather I should say a prompt registry, just like it provided you with the model registry. You can also create and load these prompts through the Python SDK: the exact process I showed you in the UI can be replicated through Python, and I'll show you that in a moment.

Okay, so now let's try to create a prompt through code to demonstrate this. I will create lecture8.py and open a new shell here. Let's import mlflow, and let's define prompt version one. This time I won't be touching the frequently asked questions bot prompt; I'll be creating a new one. For the prompt text, let me grab the same template we used earlier rather than typing it out again.

So this is my prompt version one. Now I can call mlflow.genai.register_prompt (I first typed load_prompt, but that one is for loading; register_prompt is what we want). Inside it we have to pass the name first. Just like before, I could name it frequently asked question bot, but if I use that same exact name, MLflow will register one more version under it, just like it does with models. So I'll use a fresh name with a suffix. Next we can provide the template, which is this string, and we can provide a commit message here too, the same way we provided the message in the UI: version one prompt. Okay.

Now, if I run this, we should be able to see our prompt in the Prompts tab. You see the frequently asked question bot one that we just created, and we are seeing this prompt, right? Okay, so what is the next step? The next step is going to be loading the prompt. How can we load the prompt?

Now I want to get rid of this registration code, so let me comment it out for now. Suppose you are working in an organizational setting where maybe a hundred people are working on a project as data scientists or GenAI specialists, and they are all trying out their own prompts. Prompt building is much easier than model building, so prompts can be built not just by data scientists but by other folks too, maybe product specialists or managers. There will be a lot of prompts appearing, and MLflow can provide a centralized place where all those prompts go. And they don't need to do it through Python; they can create a prompt from the UI as well. Now, let's assume that we know from experience that a given prompt is working best. How can we go about actually loading that prompt? And after loading it, we should be able to use it too: by "use it" I mean pass it to an LLM, and I'll show you that in a moment.

So let's try to load this frequently asked questions bot using Python. How can I do that? I'll do something like mlflow.genai.load_prompt, and inside here I need to provide the name or the URI. I can either provide the name, or a URI which, just like models, follows a scheme like prompts:/ plus the name of the prompt and then the version. Let's load the first version for now. After loading the prompt, I can save it in a variable. Now let's try to print it out. What does it look like? Can you try to guess how this prompt will look? If I run this, the prompt looks something like a prompt version object: name, version, and template, with only the first few characters of the template displayed. Now, how can I actually use it? I can call prompt.format, and inside this I can provide all the placeholders that I have used. So let's pass the question here: "How is the weather like?" Okay, so now what do you think will happen? Let's run this.

What it will do is take my prompt and inject the question variable that I passed. So prompt.format with the question gives: "You are a helpful assistant who answers the provided question concisely and precisely. Question: How is the weather like?"

Let's extend this and register one more prompt version. Let me comment this out first. This time I'll be using the original frequently asked question bot name, or maybe I'll rename this to version three. In addition to the question, I want one more variable passed here. So what can I pass? A username. Okay, username it is. Now, first of all I need to register this prompt. Let's register it first.

Okay, if I reload this, I will now have this prompt, which also expects a username. Now let us comment out the prompt registration code and uncomment the prompt loading code. So now, besides the question, we need to pass the username. But let's see what happens if we don't pass it.

Okay, no, sorry, I think I need to change the version number as well.

Okay, so it says that variables are missing. Now I have two ways to solve this. Either I can provide a username, let's provide "Rahul", and run this; that should solve the issue, right? And it does. Or I can get rid of this and do something like allow_partial=True. That also resolves the error, but note what happens: what you get back now is not the text but a prompt version object only, because it was not able to render the template fully.

So I hope you are able to see how we can load the prompt. How we actually use the prompt is something we will be seeing in the next video. In this video the purpose was to show you how prompt management works and how similar it is to the model management we are already familiar with from our earlier lectures. In the next video, we'll talk about how to load the prompt and use it with, let's say, OpenAI's API. Thank you so much.

So, in the last lecture we looked at how we can manage prompts using MLflow.

Now, what we are going to do is use OpenAI to actually use the prompt in a realistic production setting.

So first you need to make sure that your LLM provider's SDK is installed. I'll install OpenAI's SDK, which is already installed in my case. After that, you need to somehow set your environment variables, and in particular the variable we want here is called OPENAI_API_KEY. This is the environment variable that you need to have available in the environment.

Now how can you set it? There are multiple options. You can use export from the command prompt, but I don't want to modify my system's environment variables. What I usually like to do is load the environment from a file called .env. I won't be showing you this file in the video, but it contains a key called OPENAI_API_KEY. To read it, I can use a Python package called python-dotenv: from dotenv I'll import load_dotenv. When you call load_dotenv, it takes the path of your .env file; if the file exists in the same directory from which the script is run, you don't need to provide the path. After this you can access those values, but for that you need to import the os package. What happens is that when you run this script, a new process starts, and that process has its own set of environment variables; load_dotenv injects any variable it finds in the .env file into that process's environment. So with os.getenv I will get the OPENAI_API_KEY. Okay.

And this will provide me OpenAI's API key. Now I can create a client: from openai import OpenAI, and I'll instantiate the client, which takes the API key. So now my client is ready, and I can use it to ask questions. I will be using prompt version one that I had.

Or maybe let's use version two, because two was the better prompt. Let's go to the UI and check. Yeah, so this was the better prompt. Although, to be precise, I cannot objectively say this prompt is better; I'm only saying so subjectively. The way to objectively say that one prompt is better is prompt evaluation, which is a separate topic we will take care of in later discussions. But here, from a visual look, the way humans interpret it, it's a more detailed prompt, so it should give us the right answers. So let's load the second version of the prompt and build the message: I'll call prompt.format and ask a question. Let's ask: what exactly is the difference between the sun and the moon? I'm not sure this is a very good question, but okay, sun and moon it is.

Okay, so now I'll also print this message just to see whether it is looking good or not. Let's run this. "You are a helpful assistant who answers... What exactly is the difference between sun and moon?" So we have the message. Now we can use the client's chat API. In fact, if you want to use the Responses API, that's also fine, and I feel it's even better, but let's go with chat completions, because its coverage is wider at the moment; people are moving toward the newer API, but it is still not very prevalent.

I will be using a mini model, and then I will use messages: inside messages I'll create a list where role would be user and content would be the message that I created from the prompt. I will save the result in response, and I can either print the whole response or print just the content. Let's first try to print the response.

Okay, what do we get? It's taking some time, and we are getting the chat completion. If you see: "The sun is a massive star that produces light and heat through nuclear fusion, while the moon is a natural satellite that reflects sunlight." Pretty good, right? But if I have to pull just this text out, how can I do that? I can take response.choices, get the first choice, then its message's content: response.choices[0].message.content.

Okay. So let's try to use that now in place of response.

Okay. Let's run this.
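The whole flow of this video can be sketched like this. The model name gpt-4o-mini is my assumption (the video only says "a mini model"), the .env handling is optional, and the API call is guarded so the script still runs without a key; the message is inlined rather than loaded from the prompt registry to keep the sketch short.

```python
import os

try:
    # python-dotenv is optional; exported environment variables work too
    from dotenv import load_dotenv
    load_dotenv()  # injects .env entries into this process's environment
except ImportError:
    pass

# In the video this string came from prompt.format(...)
message = (
    "You are a helpful assistant who answers the provided question "
    "concisely and precisely.\n\nQuestion: What exactly is the difference "
    "between the sun and the moon?"
)

api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed; any chat-completions model works
        messages=[{"role": "user", "content": message}],
    )
    # Pull just the text out of the ChatCompletion object
    print(response.choices[0].message.content)
else:
    print("OPENAI_API_KEY not set; skipping the API call")
```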

Now, remember that currently we are not logging these inputs and responses. That is a topic for a separate discussion, because the logging of both inputs and outputs in MLflow is called traces, and we will cover it in a later video when we start discussing prompt evaluation. So you see, this is what is being returned by OpenAI here. I hope it was clear. In this video we looked at how we can use the prompt: the prompt registry is a centralized place to create and manage prompts, and from there we can load and try out a bunch of prompts. We have not yet seen how to evaluate prompts; we are going to see that in the next videos. Thank you

everyone. So far we know how to register a prompt within the MLflow prompt registry, and we also know how to load those prompts for our consumption. And not just load: we can also format those prompts, because essentially what we register in the prompt registry is a template, and then we can load that template and use it within the LLM.

Now, the next thing: the way we evaluate models, we can also evaluate prompts. So let's see how we can do that. Let me create a new file for this.

In this file I'm going to keep all of this code. Okay, so we are going to be evaluating some prompts here. How can we do that? Let's work backwards: let's first see the exact function from MLflow that we use to start the evaluation process. What happens is that when you run the evaluation process, a new run is created, and when I say a new run is created, what it means is that the run will be tied to one experiment.

Similar to how we track an MLflow experiment, we can track an LLM-based experiment. So let me set an experiment first of all; maybe I'll call it "prompt evaluation". Currently, if I go to the server, refresh it, and look at the experiments, we have all sorts of experiments, but we don't have a prompt evaluation experiment. So let's run this file, though I will comment out most of this code first and keep only what we need: python lecture9.py.

After this we will see "Experiment prompt evaluation does not exist. Creating a new one." If I now go inside, you see the prompt evaluation experiment. Within it, MLflow provides two different kinds of views. If I click on it, notice what happens: it asks me what kind of experiment I am dealing with, a machine learning based experiment or a GenAI based experiment. Since this is going to be a GenAI based experiment, I'll confirm, because traditional machine learning experiments have different artifacts, a different set of things, than GenAI based experiments. As we discussed earlier, in a GenAI based experiment the prompt plays a role similar to the model, so it has a different set of constructs than a machine learning based experiment. Let me confirm this. To compare, in case you don't recall, let me open a scikit-learn based experiment: it looks different, right? It has runs and models, whereas a GenAI based experiment looks like this: it has traces, sessions, datasets, and evaluation runs. Currently we don't have any evaluation run, because we are going to create one. Okay, so let's first of all look at the

function that we use for the evaluation.

There is this function called evaluate. Now let's see what we are required to pass, because we are going to learn those in a backward fashion. Looking at the evaluate function: we have data, we have scorers, and then we have a predict function. These three things are required if you want to run the evaluation job. Now let's talk about the data first.

What is the data? Similar to traditional MLOps, where to evaluate a machine learning model you need a dataset, here you also need data. In traditional machine learning, along with your X's and y's you also need the predicted values; that is how we compute accuracy, F1 score, and things like that. Here too we need some sort of expected response. This data will be a list, essentially a list of dictionaries, and each dictionary can have inputs. Let's talk about what each record can contain. It will have expectations: expectations means what the right answer would be for that kind of input. After expectations we can optionally have outputs as well, but since we are passing a predict function, we are delegating output generation to genai.evaluate, which will call this function to generate the outputs. This list can contain as many test cases as you wish to include. Inside inputs you need a key; what sort of key should it be? That depends on the predict function you will be using, so let's design the predict function first. What my predict function is going to do is take a key, let's say a question key, and this particular function is going to essentially return an output.

Now what will be the output? Let's design that. What we are going to do is take this code: essentially we will load the prompt first.

Before even formatting the prompt, we have to load it, right? So from here I will load the second version of the prompt. I load the prompt, I format it (without printing it for now), and then I call the OpenAI client API. After I do that, I assign the output, which would be the response content. So now we have the predict function ready.

Okay, so now I will write the predict function here. Now let's come back to the data. In the inputs, I want to specify the key which is used in the predict function, so I will use the question key here. And for the question, let's ask: who invented the telephone? Now in the expectations I will write the expected response. Let me check who invented the telephone, because I don't remember exactly: okay, Alexander Graham Bell. So I'll copy this and write it here.

Okay, so let's go with this first. We have inputs, and we have expectations with the expected response. Now we can come here and wire up the data.

Okay, so the only missing piece now is scorers. How can we create scorers? Scorers is a list of all the different scoring methodologies that you would like to use to evaluate your answers. You can write custom scorers, but for now I'll use MLflow's inbuilt correctness scorer: mlflow.genai.scorers.Correctness. This is one scorer that MLflow offers, and it is an LLM-driven scorer. So let me pass it for now and run this experiment. What it will do is create a run under this prompt evaluation experiment. Let's run it and make sure it is successful. You see this progress bar: it is running the evaluation. You see "evaluating", then "evaluation complete", and the metrics and evaluation results are logged. Let's go back and refresh; we should be able to see this run. If I click on it, we see a trace, and if I click on that, on the right I see some sort of correctness evaluation. We will look at what those things mean in the next video. I hope you are able to understand so far how to set up the evaluation with MLflow. We will do a deep dive and write these evaluations in more depth in the next video. Thank you.

So in the last video we discussed how to set up a prompt evaluation using MLflow. In this video we are going to deep dive into what exactly the output of that evaluation is. Once the evaluation is complete, you get a new run, and inside that new run you will get traces, one for each of the data inputs that you provided. For the first data input, if I click on this trace, you will get two things on the left: inputs and outputs. The input would

be the prompt. You see "you are a helpful assistant" and then the exact question. One second, I think something is not right; I think I did not use this question. So let's do one thing: in place of this, let's use this question and run the evaluation again. Let me close this; a new run will get created after this, so you have to watch out for it. Currently this evaluation is running. It takes some time because it takes the help of an LLM to do this correctness test. Okay, so if I refresh it, you see a new run is created. If I click on it, you again see inputs and outputs. In the inputs, you see "you are a helpful assistant"; all these things have been extracted from the prompt template that we created earlier. That prompt template exists in the prompt registry, if you navigate there.

If I open this FAQ bot, I'm loading the second version; this is the prompt template that we are loading, and we are replacing the question. Going back, I'm replacing the question with "who invented the telephone", and that question is coming from our evaluation dataset. I can call this the eval dataset, and it has eval metrics: correctness is what we are measuring. Now, what is the output? The output is what the LLM model produced.

If we navigate here: "Alexander Graham Bell is credited with inventing the telephone." Why was it so concise? Because in the prompt we asked the LLM to answer the question only in English and in under three sentences, and it respected that. Now, on the right part of the screen you will see assessments. There is only one assessment for now, correctness, and I can expand it. It says "yes", meaning the response the LLM has given is correct. How is it correct? It is following our expected response: the claim states Alexander Graham Bell as the inventor of the telephone, and the document explicitly states that he is credited with inventing the telephone. Now, let's

do something dubious here. Let me run this evaluation again, but this time I'm kind of tricking the judge by changing the expected answer so the test fails. Let me write here, I don't know, Isaac Newton. I'm doing this for fun, but let's see what happens.

So: who invented the telephone? Isaac Newton. Now our evaluation is being run. This is our expected response based on our ground truth, and the ground truth itself is wrong. Let's see what happens. If I refresh, we have a new run and a new trace (each trace corresponds to one data record). You see "no". Why? Let's expand this and see the rationale: the question asks who invented the telephone; the document clearly states that the telephone was invented by Graham Bell; the claim, however, names Isaac Newton as the inventor, which is not supported or mentioned anywhere in the document; therefore the claim is not supported by the document. So the "document" here is what was returned by the LLM on the left, and this is the rationale for why it is judged incorrect based on the claim: we are claiming that it was Isaac Newton who invented it. Okay, let's go back to Alexander Graham Bell. I hope you are able to understand what this correctness scorer is doing. Now, there

are some pretty cool things that you can do with only the inbuilt scorers. There are also ways to build your own custom scorers, but maybe we will cover that in the next video. Let me cover one more important evaluation construct: guidelines. There is one more thing we can do: mlflow.genai.scorers.Guidelines. For guidelines, we can provide a name, so suppose I create a guideline called "is professional", and then I provide the guidelines text itself.

Again, both correctness and guidelines are LLM-driven, so an LLM is acting as a judge here. The guideline could be "the answer should be professional". But let me trick this one as well: in the prompt I'm telling the model to be precise and answer in as few words as possible, yet in the guidelines I'm saying the answer should be professional and really long. So now it will fail, right? The reason I'm covering this kind of situation is that I want to show you how this works under the hood. Let's run it and see what the output looks like.

Okay, so let's refresh it and see. This time a new run is created; let's open the trace. You see, correctness this time is true. Why? Because in the expected response we clearly stated that Alexander Graham Bell is the expected answer for this question.

This is the claim. Now, "is professional": why is it false? Let's look at the rationale. It says the guideline states that the answer should be professional and really long. And what is the response? This is the response: it is indeed professional in tone, but it is very brief and does not meet the criteria for being really long. Therefore, the guideline regarding length is not satisfied.

So in this particular assessment we were able to evaluate this prompt using two different metrics: one is the guideline and the other is correctness. There are also custom scorers you can develop, which we will look at in the next video, but I think this is going to be it for this one.

But before I close out this video, I want to draw your attention to one important feature of MLflow: in this tab, if you scroll to the right, you will see all the assessments and this nice-looking graph. Currently I have only one data point, which is why you are seeing only one trace; for that trace it shows passed or failed, and overall it tells you the percentage passed and the percentage failed for each assessment you are running. Hypothetically speaking, if you had 100 test cases here, there would be 100 traces, and then it would be a real test of your prompt: how many of them are correct, how many are professional, and so on. So I can quickly show you what exactly I'm trying to say here.
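As a rough illustration of what those pass/fail percentages are, here is a plain-Python sketch of aggregating per-assessment results over traces. This is illustrative only, not MLflow's API; the UI computes these numbers for you.

```python
# Illustrative only: how per-assessment pass rates over traces can be
# aggregated. MLflow's UI does this for you; this is not its API.

def pass_rates(traces):
    """traces: list of dicts mapping assessment name -> bool (passed)."""
    totals, passed = {}, {}
    for trace in traces:
        for name, ok in trace.items():
            totals[name] = totals.get(name, 0) + 1
            passed[name] = passed.get(name, 0) + (1 if ok else 0)
    return {name: 100.0 * passed[name] / totals[name] for name in totals}

# Hypothetical results for three traces
traces = [
    {"correctness": True,  "is_professional": True},
    {"correctness": False, "is_professional": True},
    {"correctness": True,  "is_professional": True},
]
rates = pass_rates(traces)
print(rates)  # correctness ~66.7, is_professional 100.0
```

With 100 test cases you would simply have 100 entries in `traces`, and the same aggregation gives you the overall pass percentage per assessment.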

Okay, so I have just added three questions here. The first question is: what is the capital of India? And let's write the answer as New Delhi. Then: which is the oldest university in the world? Let's search for the oldest university in the world and write that down: the University of Bologna. Let's put that as the expected response. Sorry, Ctrl+Z, Ctrl+Z... something odd happened there; Ctrl+V. Okay, so now we have three data points in our eval dataset. And let's change the guideline back to being professional only, and run it; now you will be able to see the true power of MLflow.

It will take some time, because it is going to call the LLM for three test cases. If I now refresh, you will have a new run, and in the new run you will have three inputs; if I scroll to the right, you will see one of them is failing. Which one is failing? Let's see. Our claim is that the oldest university is the University of Bologna; I don't know where that university is, but let's see why it is failing. What is the output? It says the University of al-Qarawiyyin, founded in such-and-such year. So it is failing on correctness. Why is it failing? Let's think about it step by step.

The claim states that the University of Bologna is the oldest university; however, the document explicitly states that the oldest is al-Qarawiyyin. So you get the idea, right? But on the "is professional" assessment, all of them are professional.

So this is how we evaluate prompts. But there is much more you can do. For example, you can compare two different runs. Let's say I want to compare this "salty" run to the "skillful" run: I'll select "skillful" and see what happens. You will be able to see, side by side, which of your prompts is doing better. It may happen that this run corresponds to one prompt and that run corresponds to another; in this manner you can make a side-by-side comparison of both prompts and clearly, decisively say which prompt is scoring better, which is doing better on correctness and on professionalism. And you can imagine, if you have hundreds of data points and hundreds of prompts, this can be a really useful and handy thing.
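The side-by-side view boils down to comparing per-metric pass rates between two runs. Here is a plain-Python sketch of that comparison (illustrative only, not MLflow's implementation; the run contents are made up):

```python
# Illustrative sketch of a side-by-side run comparison: for each metric,
# compute the fraction of traces that passed in each run.

def pass_rate(run, metric):
    vals = [trace[metric] for trace in run]
    return sum(vals) / len(vals)

def compare_runs(run_a, run_b, metrics):
    return {m: (pass_rate(run_a, m), pass_rate(run_b, m)) for m in metrics}

# Hypothetical per-trace results for two prompts' runs
run_a = [{"correctness": True,  "is_professional": True},
         {"correctness": False, "is_professional": True}]
run_b = [{"correctness": True,  "is_professional": False},
         {"correctness": True,  "is_professional": True}]

table = compare_runs(run_a, run_b, ["correctness", "is_professional"])
print(table)  # run_b wins on correctness, run_a on professionalism
```

The decision "which prompt is better" then becomes a per-metric comparison of numbers rather than a gut call.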

Okay, so thank you so much. In the next video, we will see how to write custom scorers.

So far, within our evaluation of prompts, we have only been using the inbuilt correctness and guidelines scorers that are provided by MLflow. Now let us write a custom scorer. For writing a custom scorer, you basically create a decorated function. So let me first import scorer from mlflow.genai.scorers, and let's create a function called custom scorer. Now, this custom scorer can return certain things: it can return a boolean, for example. And it can take various arguments, such as inputs, outputs, and expectations. There are other things it can take, such as traces, but for now let's go with these.

We won't be needing inputs here, so let's use outputs and expectations. Suppose the custom scorer I want to implement checks for an exact match; I'll call it exact match. I'll simply compute a match: whether my outputs are exactly equal to my expectation. Now, expectations will be a dictionary, so I need to extract the expected response from it. Then I just return this match.

Let's try to use this scorer. I will plug it in after my guideline: exact match, that's it. Now, exact match can take inputs, but I'm not using inputs here; that's not required. So let's now run this evaluation on the three-item dataset.

And I'm telling you, I can affirm that it will fail for all of these, because by default our prompt asks for a generic answer; it does not strictly tell the model to return only the exact string. Okay, let's run it and see what we get.

So: exact match is missing three positional arguments. All right, why are we getting that? If I provide the exact match just like this instead, I'm hoping that fixes it. Let's see. Okay, the evaluation completes, so let's go back to the run. We have a new run; let me refresh and open the trace in a new tab. In the new tab you can clearly see that exact match is failing. Why is it failing? Let's look: it simply says false, because we did not provide any rationale here. But if you want to provide a rationale, you can use something known as the Feedback class.

So, from mlflow.entities you can import a class called Feedback, and instead of returning the match directly, you can return a Feedback object. The first argument of the Feedback class will be the value, and the second argument will be the rationale. Okay, let's come back here. So here we are correctly getting the answer: we are getting a professional answer, but on "is match" we are not getting any explanation.
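A minimal sketch of a scorer that explains itself, using a stand-in dataclass in place of mlflow.entities.Feedback so it runs without MLflow installed; in an actual project you would swap in the real import:

```python
from dataclasses import dataclass

# Stand-in for mlflow.entities.Feedback, for illustration only; in a real
# project use `from mlflow.entities import Feedback` instead.
@dataclass
class Feedback:
    value: bool
    rationale: str

def exact_match(outputs, expectations):
    expected = expectations["expected_response"]
    match = outputs == expected
    return Feedback(value=match,
                    rationale=f"output {outputs!r} vs expected {expected!r}")

fb = exact_match("4", {"expected_response": "four"})
print(fb.value, fb.rationale)
```

Returning a Feedback instead of a bare boolean is what makes the rationale show up next to the pass/fail flag in the trace view.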

Okay, let's now do one thing: let's add one more eval input to see if we can get it to pass the test. I will ask a very deterministic, simple question: what is 2 + 2? Tell me just that, nothing else. And as the expected response I will just put "four". Now, here is the issue: the expected response could be "four" or maybe "4", or maybe we could even allow a list, but let's just run it, because otherwise I would have to change all the expected responses. Let's see if we can get it to pass this test or not.

Okay, so we have got "four". All right, let's refresh. We are getting this and... okay, even this one is failing. But let's see what the model's response was. It has basically given this rationale: that 2 + 2 is equal to 4.

And in the test we are strictly checking whether the output is exactly equal to the expected response. But we can modify it to check whether the expected response is in the outputs, which essentially means I'm treating outputs as a string and checking whether my expected response appears somewhere in the output. Let's try it and see what happens.
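The relaxed version only changes the comparison; as a standalone sketch (decorator again omitted):

```python
# Relaxed match: pass if the expected response appears anywhere in the output
# string, instead of requiring strict equality.

def relaxed_match(outputs, expectations):
    expected = expectations["expected_response"]
    return expected in outputs

print(relaxed_match("2 + 2 is equal to 4", {"expected_response": "4"}))  # True
print(relaxed_match("five", {"expected_response": "4"}))                 # False
```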

Okay, something happened, but let's look at it in the UI. Yes: we are able to pass all the tests. So let's open this trace in a new tab. If I open any one of the traces and expand the exact match assessment, I can see that it is passing, because we went from the earlier strict exact-match restriction to a very relaxed match, where I'm just checking whether the expected response is contained in the output.

Prompt engineering is everywhere right now, but most of the time we pick prompts by intuition, not by measurement. In this video I want to show you how to evaluate prompts properly. We will take a real-world task, resume skill extraction, and I'll define three prompts for the same input. Then, using MLflow, we will systematically evaluate which prompt actually performs better, using various metrics such as correctness, coverage, formatting rules, and even custom business-level scorers.

By the end of this video, you will have a clear framework to compare prompts for any LLM task not by gut feeling but

using reproducible evaluation. I hope

you are excited. Let's get started.

So before actually doing anything, I would like to frame the problem and the task clearly, so that we are both on the same page. First of all, the problem: we have multiple prompts, and we would like to choose the one that is actually better at solving our specific task. An LLM solves tasks via prompts, that is well known. So we have a task, which could be anything, such as resume skill extraction or a Q&A bot. We have an input, and we get output A corresponding to prompt A, output B corresponding to prompt B. So we have multiple prompts and different outputs, and we need a systematic way to choose the best one. That is exactly why we need prompt evaluation.

Now, what is the task we are solving? We have a resume, and in a real business setting maybe you are associated with an HR firm that deals with millions of résumés and wants to extract skills from them. So our task is to take the résumé as input and convert it into a skills list: the input will be resume text and the output will be a clean list of skills. We feed the resume text into the LLM with a prompt, and we get a clean list of skills as output. And here, these two arrows represent two different sets of skills that we could receive via two different prompts.

So essentially, we want to compare multiple prompts. We will have the resume text as input, and we will have prompt A; maybe prompt A is a shorter version of the prompt, while prompt B might instruct the model to extract skills in a very strict manner and also give us structured output. Each of these prompts corresponds to an output, though not exactly one fixed output: if you run the same prompt again and again, you are likely to receive different outputs. But the idea is that the more detailed the prompt, the higher the likelihood that the output stays the same.

Now, the next thing to think about: let's say I have four prompts and I would like to understand which prompt is better. How do we mathematically and objectively define that a given prompt is better? We have various metrics for this, and we will be using MLflow's inbuilt metrics along with some custom-scorer-based methods. So we have things like correctness. And what is coverage? Coverage is simply a guideline-based metric that we will be defining. We can also have custom business logic. All of these things we will define in the code we write in a moment.

So we can have LLM-based scorers. Essentially, an LLM-based scorer takes your LLM's output and the input and, based on its language and reasoning capabilities, tries to understand the output generated by the other LLM and gives you a sense of semantic quality: how well the output relates to the input. Then we can have rule-based scorers, in which we take the output and apply some heuristic rules, such as calculating the length of the output, or splitting the output by commas to count how many skills we got. So essentially, what we are doing here is evaluating outputs, not the model itself; or rather, I should say, we are evaluating prompts.
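A rule-based scorer in this spirit needs no LLM at all. Here are the two heuristics just mentioned (output length and comma-separated skill count) as plain functions; the names are my own:

```python
# Two simple rule-based heuristics over an LLM output string, as described
# above: total length, and skill count from splitting on commas. No LLM used.

def output_length(outputs: str) -> int:
    return len(outputs)

def skill_count(outputs: str) -> int:
    # split on commas and ignore empty fragments
    return len([s for s in outputs.split(",") if s.strip()])

out = "python, ml, sklearn, linear regression, machine learning"
print(output_length(out), skill_count(out))  # 56 5
```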

So which scorers are we going to use? Although there are a lot of scorers available, we will be using correctness and guidelines. This all might look very intimidating, but essentially we will use MLflow's inbuilt guidelines scorer, plus other scorers that we will write from scratch, which are called custom scorers.

All right. So essentially, what we want is to go from ad hoc testing to reproducible evaluation. That is the idea. We don't want to just randomly say that this prompt was better; we want to show that it was actually better. So what I'll do is open my project folder. This is my project folder, and I have already activated my virtual environment. If you are new to this, you might want to check out some of my older videos, where I demonstrated how to set up MLflow on your laptop and how to get started with it; I won't be showing all of that in this video.

So let us first create a plan. I'll create a plan.txt file, and in it I'll write down the steps we are going to follow in this video to set up this evaluation system. First things first, I want a resume. How can I get resumes? You can go ahead and download some sample resumes from the internet, but what I will do is just use my own. I have my CV downloaded here on my system; here it is, my CV. So I'll use this within my project. The project is resume-to-skills extraction.

So let's define the steps we are going to follow. First, we will create a data ingestion step. It's not a very sophisticated ingestion system: it only involves reading the PDF file. But it can be scaled to any number of PDFs you might have in a real-world setting. The second step is to clean the extracted text. We'll follow and demonstrate these two steps, and then data ingestion is done. After that, we'll set up MLflow. Although MLflow is already installed, we just want to write some introductory code that will allow us to log certain things. So we will be creating a new experiment.

Then, after creating the experiment, we will import some packages from MLflow, and we will create an eval dataset; I will tell you what format the eval dataset is written in. After creating the eval dataset, in the third step, we will evaluate it. Before that, still as part of the second step, we will define a list of scorers and, if applicable, write a custom scorer. So this is the framework we will keep coming back to during this video. Let's get started and quickly begin the project. I'll say: project1.py

In this project, I first want to create the pipeline quickly. I don't want to spend too much time on cleaning the dataset, so let's quickly get it sorted. I will import the pypdf module, and from pypdf I'll create a reader, which is essentially a PdfReader; here I want to pass the path of the PDF. Let me look at the path of my PDF: Users, then Rahul, then Desktop, and I have one CV there. Let me list that directory: I've got the PDF file here, so I will just grab it and append its name to the path.

Okay. So after getting this reader, I'll loop through all the pages: for page in reader.pages. I'll define a resume_text variable, starting as an empty string, and append the extracted text of every page to it with page.extract_text(). Now let's quickly look at what this resume text looks like: I'll print the first 100 characters, maybe. Let's run it: python project1.py. It says the reader object is not callable. Why is it saying that? Because we did not have to call it like this.

Okay, you see, it is loaded now, but it is very messy: there are a lot of unnecessary characters. So let's solve that first. Let me import the re package and quickly define a clean_text function, which takes a text and returns a text. First we lowercase the text and save it back into text. After lowering, the next thing I want to do is substitute all the disallowed characters: I take my text and substitute anything that is not an allowed character with a space. What is my pattern? It starts with r, for a raw string, and says I don't want anything besides a to z and the space character; I keep space as the one special character. The reason I'm not keeping uppercase A to Z is that I have already lowercased the text.

Now let's return this text and see what happens; although it's not complete yet, let's see. I'll say cleaned_resume_text = clean_text(resume_text), and let's see what it looks like by printing the cleaned resume text. If you look, there is not much difference right now; the only difference I see is that "objective" has been lowercased. Now, how do I remove these extra characters? There are some runs of extra spaces here that I want to get rid of. So I'll apply one more substitution, and this time the pattern replaces any space occurring more than once with a single space, again applied to the text. Now you will see a drastic difference.

Let me run it now. You see, this one was the uncleaned version and this one is the cleaned version; now it is making sense. The data ingestion pipeline is ready. Let me uncomment these print statements. Although, in a real-world setting, you would have separate scripts taking care of all these things: a separate Python script to handle the data ingestion pipeline, a separate script to set up the experiments, and so on. I hope that makes sense. For now, let's keep everything in this project1.py file, and later we can look at how to refactor this code into separate scripts.

Okay.
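Pieced together, the ingestion-and-cleaning step looks roughly like this. The PDF path is hypothetical, and clean_text mirrors the two regex substitutions shown above:

```python
import re

def read_pdf_text(path: str) -> str:
    # Requires `pip install pypdf`; imported inside the function so the
    # cleaning part below works even without pypdf installed.
    from pypdf import PdfReader
    reader = PdfReader(path)
    return "".join(page.extract_text() for page in reader.pages)

def clean_text(text: str) -> str:
    text = text.lower()
    # keep only lowercase letters and spaces, replacing everything else...
    text = re.sub(r"[^a-z ]", " ", text)
    # ...then collapse runs of spaces into a single space
    text = re.sub(r" +", " ", text)
    return text.strip()

# resume_text = read_pdf_text("/Users/Rahul/Desktop/cv.pdf")  # hypothetical path
print(clean_text("OBJECTIVE:\n  Data   Scientist (5+ yrs)"))
# -> objective data scientist yrs
```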

So now, let's look at the plan. We have completed the first step; next we need to set up MLflow. Let us do that. I will start by importing mlflow. To set up the experiment, I just need to call the method set_experiment, and I will name the experiment "resume skills extraction prompt evaluation".

Now, if you are new to MLflow, you might be interested in knowing how this experiment works. There is the experiment at the top, and within the experiment you have various runs; each run is essentially used for some sort of hypothesis testing. This is a prompt evaluation experiment, so within it we can have several runs to try out different things. Okay, so we have set up the experiment. Now, before running this, let me first start the MLflow server.

After starting the MLflow server, I can come to the browser and go to this URL to look at it. If I go inside, we already have some experiments, because this server was set up as part of the MLflow YouTube playlist we already did; so several experiments are already there. Now we can open a new terminal and run this script again, and this time it will create the experiment, because it does not exist yet. So I run project1.py... and it says "cannot set a deleted experiment"; you can restore the experiment or permanently delete it. Oh, I see. So let me rename it: prompt evaluation new. Okay, let me write it as that.

And this time it has created an experiment. If I refresh, we have one experiment. Now it asks what type of experiment this is; since we are evaluating GenAI-based models, we will choose "GenAI apps and agents". You can come here to the evaluation runs; currently we don't see any evaluation run because we haven't run one at this point. So let's set up the other things. In the plan, we have set up the experiment; now let's import some of the required MLflow packages.

Okay. So, what do I want to do? I think I should also have written one step in the plan to set up OpenAI, or any other LLM. We'll come to that. Let's come back to project1.py

and then let's import some packages. I would like to import from mlflow.genai.scorers the two inbuilt scorers, made available by MLflow, that we have to use: the first one is correctness and the second one is guidelines. Then we will also be defining our own custom scorer, and for that we need to import a decorator called scorer. And there is one more thing we could import, although I'm not sure how often we are going to use it: the Feedback class. When we write a custom scorer, we can either return a boolean or return a detailed output, with the rationale behind the decision, using the Feedback class. Okay, so after this I

feedback class. Okay. So after this I would like to create my ewell uh data set. So I will like to create my eval

set. So I will like to create my eval data set. Now this ewell data set sorry

data set. Now this ewell data set sorry will be a list of dictionaries. Each

dictionary must have keys such as inputs.

Input will be a dictionary. It could

also have outputs. Now there are two options with you here. You can either create a output within here or you can define a prediction function which will

actually um interface with any LLM of your choice. maybe openis lm or geminis

your choice. maybe openis lm or geminis but that's your choice we will be setting up openi so we don't have to pass this outputs otherwise we would have to pass the actual output that uh

you want to evaluate okay so uh I would like to just take uh inputs here for now and I would like to uh also create expectations now expectations would be

you can say um uh what are the expectations from for this question so whatever your input is uh you want to uh understand what what would be the

expectation from this that okay so I will have a question here and this question will be something but let's define the expectations first

expected response so in the expected response we have to define uh what is the expected response

for this question okay so now what we need to do is why it is uh giving us okay I think uh I should have done done this. Okay. So now this is uh this is

Okay. So now, this is the format our eval dataset will have. In the expected response we write down what we expect, and I can also define other keys inside expectations, maybe something like "skills": a list of skills I would like to capture for a given resume. And the question here is essentially nothing but the cleaned resume text, so in place of the generic key I should have just used "clean_resume_text".
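Putting that together, a single record of the eval dataset looks like this; the values are placeholders, not taken from the actual CV:

```python
# Shape of one eval-dataset record as described above; mlflow.genai.evaluate
# expects a list of such dicts. Values here are placeholders.
eval_dataset = [
    {
        "inputs": {"clean_resume_text": "objective data scientist python ml"},
        "expectations": {
            "expected_response": "python, ml",
            # extra expectation keys are allowed, e.g. a skills list:
            "skills": ["python", "ml"],
        },
    },
]

record = eval_dataset[0]
print(sorted(record))  # ['expectations', 'inputs']
```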

I need to define this expected response in a moment, but before that I would like to define a predict function. What this predict function does is take the clean resume text (you have to name the argument exactly the way you defined it in inputs) and return the response. The response should be in string format, so let me mention that explicitly. For now, for the sake of simplicity, I'm not setting up OpenAI's API; let's just hard-code the response as maybe "python, ml", and let's expand it a bit: maybe sklearn, linear regression, machine learning.

Okay. Now after this so this is my response. So I'm assuming that this will

response. So I'm assuming that this will be coming from uh where from uh from the uh LLM. Now this would be my expected

uh LLM. Now this would be my expected response. Okay.

response. Okay.

Okay. So the the idea is that uh in here you will have a code to basically use this clean res text. For now I'm hard coding the response but this is how it
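Put together, the eval dataset row and the placeholder predict function described above might look like this sketch (the key name `clean_resume_text` and the hard-coded response are this walkthrough's own choices; a real run would fill in an actual resume):

```python
# Sketch of the eval dataset and predict function built in this section.
eval_dataset = [
    {
        "inputs": {"clean_resume_text": "…cleaned resume text goes here…"},
        "expectations": {
            "expected_response": "python, ml, sklearn, linear regression, machine learning",
            "skills": [],  # extra key: skills we'd like captured for this resume
        },
    }
]

def predict_fn(clean_resume_text: str) -> str:
    # The argument must be named exactly like the key under "inputs".
    # Hard-coded for now; later this will call the LLM.
    return "python, ml, sklearn, linear regression, machine learning"
```
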

All right. Now we need to set up the scorers — scorers will be a list.

First is Correctness. Correctness doesn't take any argument, because it is an LLM-based correctness measure: it will use whatever LLM you have set up and available in your system. Then we have Guidelines — Guidelines is also an LLM-based scorer. In Guidelines, let's create a coverage-based guideline: I'm naming this guideline "coverage", and then we define the guidelines themselves, which are human-readable instructions — "Are all the skills captured?" So it is this guideline's job to tell us whether all the skills are captured or not.

Finally, we will have a custom scorer. Let's assume we are writing one scorer called minimum_five_skills. This scorer is something we need to define ourselves, so let's define it right now with the @scorer decorator — that's how we mark it. Now, this minimum_five_skills function has to accept some arguments, and the names are very restrictive: you cannot name the arguments anything you like. You may only pass inputs, outputs and expectations — nothing else.

And what will those values be? inputs will be exactly this dictionary — in fact, I can show you by printing all three. inputs will be exactly the inputs from the eval dataset, outputs will be the string returned by the predict function, and expectations will be the expected response we defined.

The idea is that you can take inputs, outputs and expectations inside this minimum_five_skills function and do anything with them — where "anything" ultimately means returning a boolean or a feedback-style response; you can return a Feedback object, or a boolean. For now, let's just assume it passes and return True. We still have to write the real logic inside this function, but for now we're not writing logic in the predict function either. So now we are ready to run this.

But we've only defined the scorers — how are we going to evaluate on the eval dataset? That's the final step. We will call mlflow.genai.evaluate. This evaluate call takes data, and data will be our eval dataset — we defined it, and it has to be in exactly this format. After the eval dataset we need to pass the predict function, which we have already defined. And finally we pass the scorers.

Now, if I run this right now, it will fail. Why? Because Correctness works on an LLM we haven't configured yet — but let's try it out and see what happens.
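The final call can be sketched like this (hedged: it assumes an MLflow tracking server and experiment are already configured, as set up earlier in the video):

```python
def run_evaluation(eval_dataset, predict_fn, scorers):
    # Hedged sketch of the evaluate call from this walkthrough.
    import mlflow  # lazy import: MLflow is only needed when actually run

    return mlflow.genai.evaluate(
        data=eval_dataset,
        predict_fn=predict_fn,
        scorers=scorers,
    )
```
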

Okay: python project1.py. We have already set up one experiment — where is it? Yeah, this is the experiment, and this run will create a log there. So what is happening? We are printing the values here, right? The input is coming from here — you see clean_resume_text; that's how we defined the input, and the clean resume text is a very long string. After that we have the outputs — the output is what we return from the predict function. And then we have the expectations: the expected response, with skills just empty for now.

And then the evaluation is complete. Now let's see what happened. Correctness and Guidelines did not throw any error in the console, but look here: we see a run name. If we click on it — this particular thing is called a trace. So this is a run within the experiment, and inside the run there is a trace; if you open the trace, you can see this was my input and this was my output. Makes sense. On the right, though, Correctness is throwing an error. What is the error? Let's expand it: the OpenAI API key is not available. That's what we need to fix right now — that was the "set up an LLM" step. And in fact you are not restricted to using only OpenAI; essentially you can use any LLM that you are familiar and comfortable with.

Okay, so let's set up the LLM. For the LLM, I have created this .env file that you see on my left; it contains a key called OPENAI_API_KEY. I will load that .env file into my environment during the run of this script: from dotenv import load_dotenv.

What load_dotenv does is load the .env file at a given path — I can pass a path here, or leave it empty, in which case it looks for a .env file in the current directory. That's what will happen here. When we run this Python script it creates a new process, and each process has its own set of environment variables; load_dotenv simply injects any variables that are in .env into this process's environment.

After this line, we can import os and read the OpenAI key with os.getenv — although we don't strictly have to read it ourselves, because the Correctness and Guidelines scorers will pick it up automatically in their backend code. But let's try it anyway: os.getenv, with the key named exactly the way they expect — you can see I've defined it like that. I won't open the file, since it obviously contains my API key, but you get the idea. So now I have the API key in a variable — though I didn't have to, because it's already loaded, and Correctness and Guidelines should be able to pick it up.

This time the run will take a while — and now that you've got the idea of inputs and outputs, I'm commenting that printing out. Let me run the project again; it will take some time because this time it will actually make the OpenAI API call using the API key we configured.

You can see the evaluation has started, and it is taking a while because it really did go ahead and run the OpenAI query. Now refresh, and we see one more run. Click on it and we see one more trace — and this time Correctness was "yes". What does that mean? If I expand it: it looks at your outputs — not the inputs, sorry, the outputs — compares them against the expectations defined here, and has the LLM judge the comparison. It says: let's think step by step; the document clearly mentions Python multiple times as a tool used in various projects and roles, and it also references ML extensively through experience — things like that. So overall, all components of the claim are directly supported by the document.

Notice that so far I have not set up any prompt. I just passed the clean resume text as-is; I did not tell the LLM to do anything in particular, and the predict function is just returning the response as-is. So there is no prompt evaluation happening yet.

So the next step — in fact I should have added it to the plan up front — is to define some prompts. We will do that from the portal; we can do it from code as well, and if you want to know how, check out my video on registering prompts in MLflow. For now, let's go to Prompts and create a new prompt. I'll name it resume_skill_extraction, and write: "You are an intelligent assistant helping the user extract skills from the resume text." Then I add a placeholder for the resume text — let me check how I defined the variable — clean_resume_text. So: "Here is the resume: {{clean_resume_text}}. Extract all the skills."

That's my first version of the prompt; I'll label it version 1 and create it. So now my prompt is ready — let's create another version. From here I can edit it and say: the output should be in this format. What format do I want? A JSON format, something like {"skills": [...]}, where skills is a list of skills — I can add a comment, "list of skills" — and: "The output should be strictly in the above JSON format." Let's create this second prompt version. So now we have two prompt versions ready.
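As mentioned, the same two versions could also be registered from code instead of the portal. A hedged sketch (the `register_prompt` signature is assumed from MLflow 3's prompt registry; the names and templates mirror what the video types into the UI):

```python
def register_prompt_versions():
    # Hedged sketch: the video creates these versions in the portal.
    import mlflow  # lazy import so the sketch can be read without MLflow

    template_v1 = (
        "You are an intelligent assistant helping the user extract skills "
        "from the resume text. Here is the resume: {{clean_resume_text}} "
        "Extract all the skills."
    )
    mlflow.genai.register_prompt(
        name="resume_skill_extraction",
        template=template_v1,
        commit_message="version 1",
    )

    # Version 2 adds the strict JSON output instruction.
    template_v2 = template_v1 + (
        ' The output should be strictly in this JSON format: {"skills": []}'
    )
    mlflow.genai.register_prompt(
        name="resume_skill_extraction",
        template=template_v2,
        commit_message="version 2",
    )
```
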

Now let's run the evaluation. How am I going to do that? I'll define the prompt versions — I want to load and evaluate two versions — and loop through them: for version in versions.

Inside the loop I load the prompt: prompt = mlflow.genai.load_prompt, passing the URI, which is "prompts:/" plus the prompt name plus the prompt version. I want a placeholder for v there, which is why it needs to be an f-string. Now that I have the prompt loaded, I'd like to format it: formatted_prompt = prompt.format(...), passing — what was the variable — clean_resume_text. So I pass that variable into format, named exactly the way I defined it.

For now, let me just print the formatted prompt and see what it looks like — I don't want to evaluate yet, just inspect the prompt. Let's run it.
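The loop above can be sketched as follows (hedged: it assumes MLflow 3's `load_prompt` and the `prompts:/<name>/<version>` URI scheme, with the prompt name from this walkthrough):

```python
def load_formatted_prompts(clean_resume_text, versions=(1, 2)):
    # Load each registered version by URI and fill in the resume placeholder.
    import mlflow  # lazy import

    formatted = []
    for v in versions:
        prompt = mlflow.genai.load_prompt(f"prompts:/resume_skill_extraction/{v}")
        formatted.append(prompt.format(clean_resume_text=clean_resume_text))
    return formatted
```
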

Okay, so we've got two prompts printed here. In the first case it just says "extract all skills" — because that's all the first prompt version asks for. The second prompt gives detailed instructions about how I would like the extraction to happen. Now let's see how we can use them — I'll just get rid of this print.

Now I'd like to bring the eval dataset in here. One second — Ctrl+Z — so this was my clean resume text, but now I can replace it with the formatted prompt. So I'm loading the prompt and creating a new eval dataset for each version, and then I can call evaluate. But it's not done yet, because we haven't updated the predict function: it is still taking the clean resume text and responding with the hard-coded response. So how can I change it to load the response from the actual prompt? How can I do that?

What I can do is define the OpenAI interface inside the predict function. For that we need to import OpenAI: from openai import OpenAI. Then I create a client, with api_key set to our API key, and create a response: client.chat.completions.create. I'll define the model — gpt-4o-mini, let's go with that — and write the messages: the role will be "user" and the content will be the clean resume text.

Wait — sorry. I want this clean_resume_text argument to match the eval dataset's clean_resume_text key, because that is where the formatted prompt will arrive. And in place of the hard-coded response, I return response.choices[0].message.content. So this is how we get the output.

After this we pass the eval dataset, which carries the formatted prompt that we are building here, and this stays the expected response. For now let's leave the minimum_five_skills logic unwritten and just see whether this runs successfully.
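The updated predict function can be sketched like this (the OpenAI client import is kept lazy so the sketch is readable without the package; by this point the `clean_resume_text` field carries the fully formatted prompt, not the bare resume):

```python
def make_predict_fn(api_key):
    # Returns a predict function that calls OpenAI's chat completions API.
    from openai import OpenAI  # lazy import

    client = OpenAI(api_key=api_key)

    def predict_fn(clean_resume_text: str) -> str:
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": clean_resume_text}],
        )
        return response.choices[0].message.content

    return predict_fn
```
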

Let's run this: python project1.py. It's running, but I don't see the evaluation line yet — okay, yeah, now we see it. It will run the evaluation twice, I guess, because we're looping — so the first evaluation is complete, and it will run one more; essentially we are running two evaluations for two different prompts, and I'll show you how to compare the two results in a moment.

Okay, both of our evaluations are complete. If I refresh the experiments page, you see the two runs used here. Let's look at the first one. In the input we were using the "extract all skills" prompt — the very simplistic prompt version 1. In Correctness we see "the claim mentions... the document exactly lists..." — so correctness passes. Coverage — how did we define coverage? Via the Guidelines class: "Are all the skills captured?" It says: yes, the sole guideline provided is whether all the skills are captured; the input is a detailed resume text containing numerous skills across programming languages and so on; and comparing the detailed resume text with the extracted skills, the response captures everything explicitly mentioned.

So what is the response? Let's look at it once: you can see the output — OpenAI is extracting all the skills, but it is not adhering to the structure that we specified in version 2. Let's look at the version-2 trace and its output: yes, in the outputs we are basically getting JSON with a skills key, and minimum_five_skills passes and coverage passes — though minimum_five_skills is just returning True for now; we'll fix that in a moment. But this is how it looks.

So now, how can I check minimum five skills properly? Let's write minimum_five_skills for real. What I want to do is take the outputs — outputs will be a string — and try to load it as JSON.

Maybe what I can do first is define another scorer, something like is_json. It takes outputs and tries to load them: import json, then inside a try block, json.loads(outputs); if that succeeds we return True, and if there is an error we return False. I'll also add this is_json — a JSON-validity scorer — to the scorers list.

Now, in minimum_five_skills, the first thing I do is exactly the same. Why? Because I can only count a minimum of five skills when I can structurally load the output. So: if we are not able to load the JSON, we return False — and I can get rid of the placeholder. After loading skills_json, I extract the skills: if skills_json.get("skills") — if we don't have a key called skills, return False. And if we do have the skills, I'd like to count the length: if the length of the skills list is at least five, return True; otherwise return False.

You can see the picture, right? First we try to load the JSON — if we can't, return False. If there's no skills key, return False. And if we have the skills and the list is long enough, we return True.
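The logic of the two custom scorers, written out (shown as bare functions so they're easy to test; in the script each is wrapped with MLflow's @scorer decorator and receives the restricted inputs/outputs/expectations arguments):

```python
import json

def is_json(outputs: str) -> bool:
    # Passes only when the output parses as JSON.
    try:
        json.loads(outputs)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

def minimum_five_skills(outputs: str) -> bool:
    # Only countable when the output structurally loads as JSON.
    try:
        skills_json = json.loads(outputs)
    except (json.JSONDecodeError, TypeError):
        return False
    skills = skills_json.get("skills") if isinstance(skills_json, dict) else None
    if not skills:
        return False
    return len(skills) >= 5
```
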

Okay, so this is how we create these two custom scorers, minimum_five_skills and is_json. Now let's run the evaluation once again, because in real-world projects you don't just want the output to be correct and follow the guideline — you also want it to satisfy the coverage part and the structure part. Let's run this one more time, one last time, see how our prompts perform, and then I'll also show you how to compare them side by side. Let's wait a couple of seconds for them to complete.

While the first prompt is being evaluated, I'll come over here and show you how you inspect the results. Open the evaluation run: on the right you see the assessments, and there's an option to pick another run to compare with — the runs get these auto-generated names. Okay, I think one evaluation has completed — one second. Okay, so we have the evaluations ready; let's refresh.

We have two evaluations. The first — the one with the second prompt — shows correctness and coverage, so let's open it in a separate page and look. You see is_json is false — it's not parsing as JSON even with the second prompt — and minimum_five_skills is false. Why? I don't know yet; we'd need to dig into why it isn't showing the result. Let's look at the second one — the other run is done too — and here we also aren't getting passing results. Why is this happening? One second.

The output — okay, for the first prompt this is expected, because we are not asking for a JSON-based output there. But here, with the second prompt, we should have a JSON-based output. Is anything else being returned as output as well? I think to see exactly why it's failing we would need to debug the output — ah, I see: because of how the JSON is being returned here, json.loads is not able to parse it cleanly.

But you can see the overall picture, right? The overall picture is how we define our custom scorers and how prompt evaluation works; and to compare prompts side by side, you come here, compare against another run, and the assessments appear next to each other. Anyway, this video is already 50 minutes long, so I'll stop right here. The idea was to show you how you can use MLflow to compare your prompts in a structured, reproducible manner, using both the built-in scorers and custom scorers. Please let me know if you have any feedback — I'd like to cover it in the next set of videos I prepare. Thank you so much.

Hello everyone. In this video, we are going to look at MLflow's AI Gateway feature. We have already seen prompt management and model management using MLflow; now we'll look at how you can manage your LLMs — or rather your LLM providers — in a much more efficient and effective manner.

With this AI Gateway, you can create several API keys and use them securely within endpoints. In an endpoint, you can have several different providers and several different models, and you can also do traffic splitting, the way you can see on your screen: I'm using OpenAI's GPT-4o model and Google Gemini's 2.5 Flash model, and I'm splitting 50% of the traffic to each. And apart from all of that, you also get one common interface to call these — if I can show you: I get this URL, which I can use without knowing how the internals of Gemini's or OpenAI's calling conventions work. So I feel this is much more powerful than individually coding the wrappers, or using each provider's SDK in your code. If you're excited about this video, then please continue. Thank you so much.

So first of all, let us quickly set up our virtual environment. I'll create a virtual environment really quickly and activate it. After activating it, I would like to install the MLflow package — I want to install it with the genai extra, so I'll specify that. It will take some time, although since I installed it previously it will mostly use the cached versions of the packages — let it complete.

Meanwhile, what we want to do is set up two different models from two different providers in the AI Gateway, so let's arrange the API keys for that. For one of them I'm going to use OpenAI, and for the other I will use a Gemini API key. For the Gemini API key, AI Studio is a great source — you can get API keys for free there without even entering your credit card. I'll go to the bottom left, click "Get API key", and then "Create API key"; I need to select a project — you can select any project or create one, free of cost — and then hit create. After that, I'll just copy the key and, for now, save it in a text file — api_keys.txt, maybe. Okay, so this is my Gemini API key, and then there is one more API key for my OpenAI account.

Now that we have the packages installed and the API keys set up, we want to start the MLflow server, because that's where we are going to create endpoints and register API keys — I'll tell you what endpoints are in a moment. So: mlflow server, and I'll specify the port — 5000, or any port that's not in use. Currently 5000 is free for me, so you want to make sure 5000 is something you can use too. Let's wait for the server to spin up. It will create a database file, because I'm on my local environment — you can see this mlflow.db file, a simple SQLite database that MLflow will use.

After this, I navigate my browser to 127.0.0.1:5000, the home route. If I hit enter, you see this interface. If you navigate to experiments, models, and so on, they're all empty — but we want to go to the AI Gateway here.

Now, MLflow is still developing this feature, so I wouldn't call it mature yet, but two parts of it that I have personally used are managing API keys and endpoints. How do they work? Endpoints are, you could say, a centralized place where you can register multiple LLM providers and split your traffic between them — and in this manner you can do pretty cool things, like seeing which model performs best for your use case.

I'll show you how to create endpoints, but let me first show you how to register API keys. I will register two. Let me create the first one — it's loading the providers — an API key for OpenAI, which I'll name something like my_api_key_openai. You might be in a situation, in a large organization, where multiple different API keys exist for OpenAI; you can manage all of that sort of thing here. So let me grab my OpenAI key, come over here and paste it. I don't need to provide a base URL, so I'll just create the API key. Then I'll come here again — this time I'll choose Gemini — name it my_api_key, grab the Gemini key that we got from AI Studio, paste it, and create the API key.

So now we have these two API keys, which we can use and load. There are interesting things you can do with this in your Python script — it acts a bit like your .env file: you no longer need to load from .env; you can load from here as well. Although I know that's not a very exciting use case, so let's go to endpoints, where MLflow's AI Gateway can really shine.

I will come here and create an endpoint. I need to specify the endpoint's name. You can think of an endpoint as a container — if by any chance you have worked with Azure real-time endpoints or SageMaker endpoints, you'll understand what I'm trying to say here: endpoints are a way to have different models deployed behind one name. The AI Gateway's endpoint is very similar — under one endpoint you can have several different LLM models, and it's just a wrapper. But the nice thing is that, now that there are thousands of LLM models, you don't have to manage the interface to all those different LLM providers; you remember and recall only one interface — MLflow's — and you can just forget about the rest.

Now you might be wondering: there are other wrappers available as well — LangChain, for example, provides this kind of functionality — so why MLflow? To that question I would probably say that MLflow has certain other features — experimentation, prompt management, model management — which your other tool might not have, so it can be integrated quite neatly into your AI experimentation, prompt evaluation, and related workflows. That is one plus point of this tool over the others. Every tool has its merits and demerits, and we could make a separate video covering just that.

So, when I provide the endpoint's name, why do I also have to provide a model? Because you can't have an empty endpoint — you need to provide at least one model.

endpoint. You need to provide at least one model. Okay. So what I'll do is I'll

one model. Okay. So what I'll do is I'll come here and I'll say uh let's say my uh in my chatbot uh endpoint. I'll say

chatbot endpoint.

Okay. Now this chatbot endpoint can have multiple models. Okay. Now let's start

multiple models. Okay. Now let's start with open's model. So I'll say openi openai. Now here in the models I need to

Now, in the models list I need to select which model I will be using, and I'll use one of the cheapest ones. GPT-4o mini is the one that I feel, at this moment, is the most cost-friendly. And you not only get model names but also what features they support: GPT-4o mini supports tools, caching, and structured outputs, it has a maximum input of 128k tokens, and you get the input and output token costs. You can see it's $0.15 per million input tokens and $0.60 per million output tokens. So I will select GPT-4o mini. Now I can create a new API key or use an existing one. Since I already have an API key, I can select that one; it automatically displays only the OpenAI API keys. You might recall that we created two API keys, but the Gemini one is not displayed here. So I'll select this, and now I can go ahead and create the endpoint.

It does certain things like masking your API key so that it will not be displayed here. So I'll go ahead and create this. The moment I create it, if I go inside the endpoints I get this chatbot endpoint, and if I open it, you see there is this thing called priority. What is priority? Priority lists all the models that you want to use. Currently priority one has only one model, and there is one more priority, and that second priority is basically a fallback. Suppose your API key expires, or a given model fails because of rate limiting; in that case your priority two will be executed. You can add a fallback, and the process of adding one is just like before. Now, within priority one I can have traffic splitting.

So let's do one thing: let's create one more model within this first priority. Here in model two I'll select Gemini this time, and let's use Gemini 2.5 Flash. We have the costing here as well, and the maximum input tokens are massive: 1 million. I can use the existing Gemini API key, and I can also specify the weight. The weight means how much traffic splitting I want to do. MLflow will handle all the load balancing between these two models.
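Conceptually, the weighted routing the gateway does is a weighted random choice per request. Here is a minimal plain-Python sketch of that idea; the weights and model names are just the ones configured in this demo, and this is not MLflow's actual implementation:

```python
import random

# 50/50 split between the two models configured on the endpoint.
WEIGHTS = {"gpt-4o-mini": 50, "gemini-2.5-flash": 50}

def pick_model(rng=random):
    """Pick a model for one request, proportionally to its weight."""
    names = list(WEIGHTS)
    return rng.choices(names, weights=[WEIGHTS[n] for n in names], k=1)[0]

# Over many requests the traffic roughly follows the weights.
counts = {name: 0 for name in WEIGHTS}
for _ in range(10_000):
    counts[pick_model()] += 1
```

This also explains why two identical requests in a row can hit two different providers: the choice is random per request, not round-robin.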

Okay. So what I can do is I can have something like I want to uh I want my 50% of the traffic to go here and I want

50% traffic to be um to be um basically handled by open EIS model. Now here uh you can think of uh this from a very you

know broader perspective. If you want to evaluate a given model in a real uh production scenario, you can possibly have a setup where you want to uh

evaluate each and every output and in fact you can connect this directly with uh your experimentation and in experimentation what you can do is you

can tie these you can add tags to each run each trace and that tag would would tell you uh which basically model was used to generate uh this output and then

you can basically evaluate which of uh the model is being um is is correctly giving you the output. So now

what we can do is we can simply save these changes. Okay. So so now our uh

these changes. Okay. So so now our uh endpoint is ready. Now we have got two providers here and uh and also u we have got two providers and we have got uh use

by and all these tags are there. But

But what I want to do is use this endpoint, right? I can come to the top right and click on Use. MLflow provides you with a URL; since my server is currently running locally, I get a local URL of the form /gateway/ followed by the name of my endpoint and then /invocations. Notice I'm not interfacing with OpenAI or Gemini anymore; I'm interfacing directly with MLflow's URL, and when I hit that URL it takes care of calling the appropriate model. Let's copy this, come to the terminal, and open a new one. I'll paste the curl command here and hit enter, wait for some time, and it gives me the response.

If you look at the response, it is telling you that it used the Gemini model. Why the Gemini model? With a 50/50 split, either model can be picked on any given request. Let's run the query again: this time it used GPT-4o mini. You can see that when it used Gemini 2.5 Flash we got "Hello, I'm doing well...", and here we get "Hello, I'm just a program...". So you get the point: the only interface I have to remember is this one, and MLflow takes care of the traffic routing for you.

All right, I think that would be it for this video, but here is something pretty interesting you can do in your production application if you are using MLflow heavily: add fallbacks. Obviously, if both of these models fail, then which model serves the request? In production you don't want a user's request to hang, so you want a fallback mechanism with which those requests can still be served. Also, two important features which are not there yet are usage tracking and rate limiting; these will be coming soon, and I will cover them as soon as they arrive. So in this video I just wanted to showcase how you can use MLflow's AI Gateway feature: how you can create endpoints and split your traffic between them.
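The fallback mechanism mentioned a moment ago can be sketched as an ordered try/except chain. This is a plain-Python illustration, not the gateway's actual implementation; the provider callables below are hypothetical stand-ins for real model calls:

```python
def call_with_fallback(prompt, providers):
    """Try providers in priority order and return the first success.

    `providers` is a list of (name, callable) pairs: priority-1 models
    first, then the priority-2 fallback.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # expired key, rate limit, timeout...
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky(prompt):
    # Hypothetical primary model that is currently rate limited.
    raise TimeoutError("rate limited")

def steady(prompt):
    # Hypothetical fallback model that answers.
    return f"echo: {prompt}"

used, reply = call_with_fallback("hi", [("primary", flaky), ("fallback", steady)])
```

The point of the design is that the caller never sees the primary's failure; only when every priority is exhausted does an error surface.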

Thank you so much for watching this video. Let me know in the comments what you think, and if you have any feedback, please do give it to me. Thank you so much. Have a nice day.

So I have a very simple chatbot-like app here. If I send messages to it, it will respond. I can ask various types of questions, like how is the weather, and it will respond. Now in this video

we will take this simple flask chatbot that calls an LLM and we will add real observability step

by step. And I'm not talking about just logs; I mean tracing, latency metrics, token usage, error tracking, and saving request and response artifacts, so that we can debug issues like slow responses and random failures.

So I hope you are excited for this video. I have set up most of the code already so that I don't spend time on setting up the chatbot. The chatbot is ready; we will first review its code so that you understand what we are getting into. It's a simple Flask-based application with two routes: one is home and the other handles the response. The response route takes care of the OpenAI call, and the home route takes care of rendering index.html.

In index.html I've got a simple div setup. You can see we have this app container and a history div; the history div holds the historical conversation, and the send section holds the input and the button. When we click the button, the handle-click function calls the API. The way it works is: we create a bunch of variables for the message and history, and we store the context in previous messages. After each click we build these JSON-like structures, push them into previous messages, append to the history, and then fetch the response from the endpoint, passing the user's message and the context. In fact, the user's message is not even required; we use the context to call the API. You can see here that we get the context value from the data we receive in the API request, and we call the API with it. So this is how the current chatbot looks. It's a very simplistic chatbot; the idea was to show you how to add observability to an existing application.

Now let's talk about the baseline problem that we have: we don't have any visibility. Visibility into what? Suppose I send a message and suddenly the responses become very slow. Right now our app is working, but if it becomes slow or fails in production, we have no idea why. Was the LLM slow? Was our app code slow? Was it some exception? We don't have any idea. This is exactly where observability comes in: it lets us answer "what happened?" very quickly.

All right, so how are we going to proceed? First, let's open the application code. What is the first thing we want to do? We want to add a request ID for correlation, so that each request can be tracked. Every user message gets a unique request ID (not a user ID), so you can find that exact trace or run later. How are we going to do that? The first rule of observability is correlation: your responses have to be correlated somewhere. If a user says "my request failed", you need one ID that connects the browser to the back end, the back end to the LLM call, and the LLM call to logs and traces. So we will add a request ID and return it to the front end. That is the idea, so let's do that.

First of all, we will import a package called uuid, which gives us a unique identifier. Then, inside the response handler, we can do something like request_id = request.headers.get("X-Request-ID"), and if it is not present, fall back to str(uuid.uuid4()). So I just try to extract an ID from the request headers and generate one otherwise. Now I need to return it, and I can return it here: right now I'm sending the assistant's message with only one key, so I will also add a request_id key carrying the request ID generated here. The idea is that it first tries to extract the ID from the headers; if the get returns None, it creates one, and then we send it back.
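Put together, the server-side piece is small. Here is a sketch of that logic, factored into a helper so it can be tested in isolation; the header and key names follow what we used above, and the commented route fragment is a rough shape, not the app's exact code:

```python
import uuid

def resolve_request_id(headers):
    """Reuse the caller's X-Request-ID header if present, else mint one."""
    rid = headers.get("X-Request-ID")
    return rid if rid else str(uuid.uuid4())

# In the Flask route this would look roughly like:
#   request_id = resolve_request_id(request.headers)
#   return jsonify({"message": reply, "request_id": request_id})
```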

Now, what do we need to change in the front end? If I go inside index.html, I need to pass a request ID from there too. How can I do that? I can generate a random ID in the browser as well, using the crypto API. So, after adding the server part in app.py, the next step is to send this request ID from the front end. For that, I can call crypto.randomUUID() in JavaScript and save the result in a requestId variable.

After we save this (actually, sorry, I need to define it per request, so I'll clear that and create the variable inside the handler), we need to create a header key called X-Request-ID and pass the request ID in it. After this step, when I process the response data: right now I'm just rendering the assistant message, but I'm not displaying any request ID, so I can embed the request ID as well. In the assistant element's innerHTML, I can embed a simple small tag and interpolate the request ID into it with ${requestId}.

After this step is done, every response will carry this request ID, and we will want to use it somewhere. But first, let's see how it looks in the app. Let's send a message: the response comes back with a request ID. Now let's send a different message, maybe "what is the capital of France?". And you see this request ID is different from the previous one, so each response comes with a unique request ID. Now we want to go back to the server code.

Now what we have to do is install MLflow. I have already installed MLflow and everything else; you can check out my other videos where I show the setup of MLflow and the other packages. So I can go ahead and start the MLflow server by running mlflow server. It's saying the port is already in use, so I can specify a port, maybe 5001. Now it should run; let's navigate to the UI at 127.0.0.1:5001.

So now you can see we have the MLflow UI: experiments, models, prompts, and AI gateways. Currently no experiments exist; we'll create one in a moment. Now what we want to do is create our first monitoring signal, and that is latency.

The idea is that we want to log the latency: the total request time, and the latency of the LLM call. So we want to track the latency of the request; how can we set that up? Within our server we have to import mlflow, and after importing it we need to set the experiment. We can either get these details, like the experiment name and the MLflow tracking URI, from environment variables, or set them manually. So I can call set_experiment and name the experiment whatever I want; I'll name it something like 'flask chatbot observability'.

So this is the experiment that we have. Now, how do we measure the total request time? I have already imported the time module, so I can record a start time with time.time(), and after the request ends, record the end time with time.time() again. After this we can log the metric: with mlflow.start_run() as the run, call mlflow.log_metric with a name like 'latency_total_s', passing end minus start; or we can define a variable called total_latency as end minus start and log that. Usually in production systems you log the total request time because you won't be making just one LLM call; you will have database calls, maybe Cosmos DB or DynamoDB calls, so you would want to log all of that. But let me show you this in action quickly. So I will just refresh this.
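The timing pattern just described is the usual "clock around the work" idiom. Here is a sketch with the metric sink made pluggable, so the shape is clear without a running MLflow server; in the real handler you would pass mlflow.log_metric (inside a with mlflow.start_run() block) instead of the stand-in dict:

```python
import time

def timed(fn, log_metric, *args, **kwargs):
    """Run fn, log its wall-clock latency via log_metric, return its result."""
    start = time.perf_counter()  # monotonic clock, preferable to time.time()
    result = fn(*args, **kwargs)
    total_latency = time.perf_counter() - start
    log_metric("latency_total_s", total_latency)
    return result

# Stand-in metric sink; swap in mlflow.log_metric in the real app.
metrics = {}
reply = timed(lambda: "hello", metrics.__setitem__)
```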

Okay, now let me also refresh this quickly. But before that, let's send some requests. Let me see if my server is up and running. Yes. So I will say "hey, let's wake the LLM up. What is the weather?" Let me see. Okay, I think I got some error. What is the error about? It's saying a node name is not found. Let's see. I think it is not able to find something, although our server is running. And do we have mlruns? Yes, we have the mlruns directory. So why exactly is it failing? What is it not able to find? It's saying node name or cell name not found.

In my previous videos we explored MLflow

and understood how powerful it is for experiment tracking, model management, and building production-ready ML systems. But in real-world companies, MLflow is rarely used in isolation or on localhost; it is usually deeply integrated with platforms like Databricks or maybe Azure.

And this is where things become incredibly powerful, because Databricks gives you MLflow built in: no installation, no setup, and even no configuration. So in this video, I'll create a brand new Databricks account using the free tier, show you what compute resources you get, what you can do with the free account, and most importantly, how MLflow is tightly integrated into Databricks. We will create a notebook, run an MLflow experiment, log parameters, metrics, and models, and then see exactly where everything appears inside Databricks. I'll also show you how collaboration works on Databricks: how to add users, create compute clusters, and assign compute to different users. By the end of this video, you will understand how MLflow actually works in real enterprise environments. So, I hope you are excited. Let's get started.

Okay. So, the first thing we want to do is create and set up our account on Databricks. I will navigate to databricks.com and go to the login page. Under the login page we will have to create a new account, and for that you can use your Gmail account or any other email you have access to. Let this page load; it is taking a surprisingly long time, but let's wait. Okay, I don't want to use my existing account, so I'll just log out. Now you can see I'm getting several options: I can continue with my email, with Google, or with Microsoft. I'll choose continue with Google, and I'll use a brand new account which I haven't used before. So I'll select this one. Now it will ask me a bunch of clarifications and confirmations. I'll just continue, because I trust Databricks.

Now, after this step is over, it will ask us to set up some clusters, because essentially Databricks runs on the cloud, right? You can attach your existing cloud, or you can work with the managed setup. You can see the question "what will you use Databricks for?": you can use it for personal use or for work. Now you can get the Free Edition, or you can start a trial with express setup. I'll select "start trial with express setup".

And let me also try changing my Wi-Fi. One second; I'll just change my Wi-Fi really quickly. Give me a sec. Yeah. So now it is going to ask me for the location, because in the back end Databricks is going to set up certain things like servers and compute options. I'll just select my nearest location; let's select India for now and continue. It's asking me to solve a puzzle, obviously to make sure that I'm human. But I am human, so I'll just click submit. Verification complete. Now, let's

wait for it to complete. Now, this 14-day trial gives you some access: you get serverless compute, it's a premium kind of access, and you can create notebooks, compute instances, all those things. So this is the interface you get. What I want to show you in this video is this lower-left section, because that's where the MLflow features are integrated. Although MLflow is tightly integrated with all of Databricks, if you come here you will see these experiments, you will see the features, models, and serving, and even the UI is basically taken from the open-source MLflow

which I have been showing you for the past couple of videos. As you can see, you can go to Experiments and see all the experiments you have created. Now the question arises: how do you log things here? Because in the local setup of MLflow, you would create runs and experiments in your scripts or notebooks. You can continue with that here as well.

But first I want to go to this Compute option, because to run your script locally you had local Python installed, while here you will be relying on the compute provided by Databricks. So I'll come to Compute, and you can see a serverless starter warehouse. This warehouse is provided for free for at least 14 days; you can try it out. This warehouse is something you can attach. What does it mean to attach it? Whenever you are running some script or notebook, you can attach this serverless warehouse to it. So I'll come to my workspace; a workspace is basically a container for all your files, notebooks, and everything. I'll go to Create and select the Notebook option. Once I select this, you get this really nice and sleek notebook interface, which I really love because it's quite user-friendly and looks minimal and professional.

Okay, so to be able to log things in MLflow you obviously need MLflow support, so I would like to import mlflow. To run this I have to attach the serverless compute, so I'll click on serverless GPU, and now I can run the script. Let me allow this; one second. Somehow this is not loading... yeah, now it's connected. So let's import mlflow. Okay, our MLflow is loaded.

Now, the common workflow is that you have a training script using some package such as scikit-learn or PyTorch and you log things from there, but for now we will just simulate what we can. Also, you can see the autocompletion that Databricks offers: an AI coding assistant is integrated into Databricks, so you can just use it, but for now I won't. I'll manually type with mlflow.start_run() as run. I just ran it, which is going to throw an error, so what I'll do is log a metric. And I could optionally have created an experiment first; let's create one in the top cell with mlflow.create_experiment, and I'll name this experiment 'databricks YouTube demo'. I'll just run this. And here it's saying an experiment name must contain an absolute path. Oh, I see, I forgot that; I'll just accept the suggestion and run.

And here, in place of this... one second, let's run this and see: it says the parent directory does not exist. I think we need to grab the directory here. Somehow it's not working, so I'll do one thing: I'll import os and call os.getcwd(). Okay, so we are under a workspace path; I'll just grab this path, which contains my email, and paste it in as the prefix of the experiment name.

Okay, so now if I run this, I should be able to get the experiment. After the experiment has been created, we use it, and now let's log something. I'll log a parameter called theta with the value 360. If I run this now, the moment start_run executes it automatically detects it and tells me I can go into the experiment in MLflow to see this run. So I can open the experiment in a new tab, and under experiments I should have my 'databricks YouTube demo' experiment. As in the local setup, if you have watched my earlier videos, it asks us to pick the experiment type, so I'll just go ahead with machine learning. And now, in the runs... one second, why is no run logged? Okay, one second. If I come here... oh, I see, I got it. What happened was, when I started this run I did not provide the experiment ID, and that is why the run was logged into the default experiment. So if I run this now and refresh, we can see the run in the databricks YouTube demo experiment. Okay. So this is not an MLflow tutorial; this is a tutorial about how you can use Databricks to do experiment tracking, to do whatever we have been discussing for the last bunch of lectures.

Now I want to briefly talk about some features of managed Databricks that I feel are cool. First of all, you can collaborate in a very seamless manner, and everything about user management is handled by Databricks. This can be done even with a local MLflow setup, but the problem there is a lot of overhead and operational expense. So you can navigate to Settings, and under Settings you will see Identity and Access. Currently, with the email you logged in with and set up this account, you are effectively a super-user: you are an admin. You can manage users and service principals. If you have worked with cloud, you would know that service principals let applications talk to software and authenticate in an automated manner. So I'll go to Users; I can add users by their email and invite them to work on this workspace. And not only that, I can have multiple workspaces; I can create more workspaces.

workspaces. Okay. So now if I go to users and if I go to add new, I can type out the email. Let me uh type out one email.

Maybe uh I'll just type out my personal Rahul.

Let's set the rategmail.com.

So if I add it, okay, I think uh since with this account I already have got one um uh one databicks account set up. So that's why

it's not a new user. So you get the idea. The moment I uh sent out the mail,

idea. The moment I uh sent out the mail, I will get uh this kind of u uh invite.

One second. Let me show you.

So this is the kind of invite I will be getting. Join your team on datab bricks.

getting. Join your team on datab bricks.

This person is inviting you and if you click on join, you will be able to uh join them on their uh basically uh on their workspace. Okay. So um with

that said uh these are the things that you want to do. Obviously the account that I have set up is uh basically comes with a 14 days trial. After 14 days you

won't be able to access anything but yeah so uh one important piece which I wanted to discuss was this user management and then there is this

compute option. Now there are in the

compute option. Now there are in the compute option you have got uh options to create computes. So you you can have let's say if I click on this manage what

you can do is you can manage uh access policy for the compute okay you can you can navigate to this compute and you can create a compute so I can go ahead and

create a SQL warehouse and in the SQL warehouse I can select the size and size directly uh is linked to the performance of the server or in the in the datab

bricks and snowflake terminology these are known as warehouses but they are essentially compute instances Okay.

Although warehouses uh when you when you hear the term warehouse you might think it's a storage thing but no it's not a storage but rather it's a it's it's compute cluster you can say and you can

say here scaling uh you can be you can set um what will be the minimum number of instances what will maximum number of instances you can set the uh cluster size and things like that. Okay. So you

can not only create it — let me create it really quickly: a small instance. After creating it, you can attach it to some user, so I can manage permissions and add people. Since currently I don't have any users added in my workspace I'm not seeing any results here, but you get what I'm trying to say: the moment you add a user, you can add them here, give them access to this compute, and they can then use it. Now, after talking about users and compute, the next thing I want to cover is how you install packages. So if I come

here in the workspace, to my untitled notebook: currently you were able to import MLflow because MLflow is already tightly integrated with Databricks. But what about other packages that you might not have access to? For those packages, you can insert a code cell and run something like pip install. But first let me show you what happens if I try to import OpenAI — or rather, from openai import OpenAI. Okay. So if

I run this, it's saying ModuleNotFoundError.

Although, if you notice, something interesting is happening: the moment I try to import openai, it suggests calling mlflow.openai.autolog() so that all the OpenAI traces will be automatically logged. Anyway, let's install this package. If I run this, it will be installed on my serverless compute

and after that I should be able to use it. So let's run this — and now I'm not getting any error, so I should be able to connect to the OpenAI client and use it. Okay. So there are tons of features; it is very difficult to cover all of them in a 16- or 20-minute video.
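A small, generic way to reproduce the check just shown — detecting whether a package is importable before installing it (package names here are illustrative, and the `%pip` cell is Databricks notebook syntax):

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if the package can be imported in this environment."""
    return importlib.util.find_spec(package) is not None

# In a Databricks notebook you would then run `%pip install openai`
# in its own cell if this returns False.
print(is_installed("os"))      # stdlib, always present -> True
print(is_installed("openai"))  # may be False until you pip-install it
```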

But I wanted to give you a taste of what is possible with managed Databricks. It's wonderful if you or your organization don't want to handle setting up MLflow locally or deploying the open-source MLflow yourselves. So yeah, I think that would be it.

Thank you so much for watching this video. There will be a lot of interesting stuff coming on my channel related to MLflow and Databricks, so stay tuned, subscribe for more content, and let me know if you want to see any specific topic covered. Thank you so much. Hi everyone. Let's start

talking about experiments and runs in MLflow using Databricks. Before I talk about experiments and runs, I would like to mention a few convention changes that we have to follow because we are now on Databricks.

What is your usual workflow when you are working as a solo developer with MLflow? You install MLflow on your local machine using the terminal, then you start the MLflow server and create a backend store and an artifact store.

Now there are a couple of options: you can either set these stores to a local location or to a remote location. For example, you can use a SQL database for the backend store. For those of you who don't know what the backend store and artifact store are: the backend store holds all the experiment and run metadata, whereas the artifact store holds large files such as models and Python files. Okay. So what we usually do

when running MLflow locally is set the artifact store to either a local location or some remote object storage such as S3 or Azure Blob Storage. Okay.

So with that said, let's get started with MLflow on Databricks. I come here and I see this New option; I can click on it and create all the different things I might want on Databricks. Now, Databricks is a very complex and comprehensive piece of software. So

rather than covering all of that in one video, let's go ahead and create a notebook that we want to use for our MLflow demonstration. I'll create this notebook, and I can rename it — the UI is quite minimal and seamless, and it works wonderfully. I'll name this notebook "experiments and runs".

Okay. So before you run anything, you need to attach a compute instance that you will use to run this notebook. I will select this serverless option, and I will select the serverless GPU. This seems like some sort of bug, where if you just keep clicking on it, it won't start but throws an error. But now you can see it has started, because we see this green icon here. Okay. So

what we do is import MLflow, and the workflow is that after importing MLflow you start one experiment. You can use an existing experiment or create one from scratch. So let's navigate to the left, to the Experiments tab. I'll open this Experiments tab in a new browser tab, and here I see the new experiments that just got created. So what I can do is create one experiment.

Now here, something is different from what we do in the local setup. In the local setup we can simply create something named like "demo experiment" — any name we wish. But here that won't create the experiment; rather, you need to provide the full path of where this experiment will

live. Now how do you get that path? I can import the os package and print my current working directory, and this gives me the path that I have to prepend. So I can get rid of this cell and grab that string — I'll copy it from here. Then, after this whole email path, I'll append the "demo experiment" name. And if I run this, it should

here. And if I run this, it should create the experiment. It will first try to locate uh experiment with this name.

If it is not able to find it, it will just create it. Let's wait for a couple of seconds uh for it to run. So you see this, please read through these logs because these are very important logs

and when things go wrong, you need to debug them. So if you come here, you can see: experiment with this name does not exist, creating a new experiment. And after that it has created

one experiment and set this thing called the artifact location. Now what exactly is that used for? The artifact store is essentially a store for large files and models, and Databricks has created this location for artifact storage. Okay.

So now that this experiment has been set, we can simply come here and log all types of things. What I can do is write something like `with mlflow...` — and you can see the AI assistant is starting its work.

It is basically giving me some suggestions. Now, after starting the run I don't strictly need to specify the experiment, although you can by providing the experiment ID. Where do I get this ID from? You can get it from the previous cell's output, or, if you don't have that output handy, go into the Experiments tab in Databricks, click on the experiment, and grab the experiment ID from there. So there are a couple of ways. Also, there is an important choice you need to make here:

in MLflow experiment can be of either two types. It can be geni based

two types. It can be geni based experiments or traditional machine learning based experiments. For now I'll just click on this.

Now, within an experiment you can have multiple runs. And where else can you get the experiment ID? If you look at the URL, after "experiments" you see this number — that is your ID. So you can get your ID from here as well. Okay.

All right, let's move on. You can specify the experiment ID here, and you can also specify the run name. I'll set the run name to "first run" — or rather, "first test run". This starts MLflow's run context, and within it we can log all sorts of things. What

kinds of things can we log? I have covered these things in detail in my previous MLflow videos, where we ran MLflow locally. Now, let's test this out really quickly. We can

log a param — we provide a key-value pair — and if you have a dictionary ready, you can pass it to log_params and it will log every key-value pair in it as a parameter. So with mlflow.log_param I can log, say, theta, which can be a string or a number. I can also use mlflow.log_params and pass a dictionary: say alpha is 0.33, and beta is 12. Okay. Now,

after logging params, I can log a metric.

And as you can see, the built-in AI autocompletion of Databricks is giving me all sorts of suggestions; if I wish, I can hit Tab and use them. I'll just log the accuracy, and then finally I would like to show you how to log an artifact. Now, since

an artifact is usually considered large compared to params and metrics — artifacts are models and files — before logging an artifact, you should have one in the first place. So let's create it here. If I click here, I can go to Create and create a text file. This will open the text editor, where I can create all sorts of text files. Let me create a Python file. What I'll do is

name this file sample.py, and in it I'll just add a couple of lines: import os, then print(os.getcwd()). Okay. Now

I can use this file to log the artifact. Currently the file is not displayed here, so I'll refresh — and you can see my sample.py is here. Now I can come here and log my artifact: in log_artifact I provide the path to the file. I can either give the full path or, if I'm running the notebook from the same directory, just the name of the file. Now if I hit Enter, it should create this run in the experiment called demo experiment. Notice here

a nice thing about Databricks: after running, it gives you a direct link to the experiment. I can click it right here and close the previously opened one. Once the experiment is open, I'll see the run "first test run". If I open it, I should see all the params and metrics that I logged. So we see accuracy,

we see the parameters, and if I come to Artifacts I will see the sample.py file logged as an artifact. I can open it and inspect it. Okay. I think that would be it for this video. In the next one we will look at how to do manual logging and auto logging using MLflow and scikit-learn. Thank you so much.

Hello everyone.

In this video, we will look at auto logging and manual logging. In fact, manual logging is something we already kind of covered, because in the earlier discussions we saw how to log things using the mlflow.log_* calls — log_artifact, log_params, and log_metric we have already seen. That is manual logging. Okay. Now imagine that you are

training a model using scikit-learn and you have to log everything using this method. That works if you want more control over your logging, but MLflow also comes with flavor-specific auto-logging functionality. When I say flavor-specific auto logging, what I mean is that different frameworks have different kinds of things that need to be logged — the things logged for scikit-learn models will differ from, say, PyTorch-based models. That is why MLflow's autologging comes in various flavors.

Now let me show you how we can take advantage of auto logging. Before doing anything, you might want to set up the experiment and so on — if you don't know how, you may want to watch the earlier discussion we had. So,

here, if you see, we are logging things manually. Now I need some sort of training script. So I can search for a scikit-learn basic training example, and it should give me some examples for training — I'll just grab one from here. Let's see, let's use the classifier comparison... not this one; I think this example I can grab. What it does is load the Iris dataset, so I can just grab this. Now if I come

back to the experiments-and-runs notebook and run this, you'll see all of the packages are pre-installed, so they are already available. You might be wondering how they are already included. You can come here, click on this button, then click on Configuration, and inside the configuration you see the dependencies — you get this pip requirements view. Sorry, not here, but rather here: you can look at everything that is already installed for you, and if you want more, you can add more packages, or even add your own requirements.txt file. Okay, let's

come back to the topic. What I can do is run this script cell by cell. Let's run this and load the data. Then run the third step, the train-test split. And then we can fit the model — notice that at this point I'm not doing any logging at all.

Okay. So now, if you see, I have achieved a model accuracy of 1.0. Let me navigate to the experiments. Here is my experiment — and I don't see any new run. Why? I was explaining auto logging, right? So why is it not auto logging? What is happening here? Let's do one thing.

Let's run all these cells again, but this time I'll first call mlflow.sklearn.autolog(). Okay, let's run this.

Okay, so now auto logging has been enabled. Let's run this again — now it should log, right? Let's see. We should be seeing the model accuracy as 1.0 after this step. Okay, we have got the K-nearest-neighbors classifier and the model accuracy. Let's refresh this page and see whether we have got any more runs or not.

Okay. So, we are seeing one run, right?

If I open it, it has created a run (with an auto-generated name, "efficient-crane"), and inside it you see a lot of information: training accuracy, testing accuracy, and params such as the metric, the number of neighbors, and so on. If I go to the model metrics, you get all these model metrics. Artifacts are also logged — and when I say artifacts, you get the model: the entire model is packaged into a format called an MLflow Model, and you get the pickled estimator file.

The model can be reproduced using different flavors: you get a requirements.txt file to reproduce the environment, you get the model pickle file, and you get the MLmodel file. You get all of these things with auto logging, but currently you don't have any control over the run — the run gets created

automatically. Now, what do you think you need to do to take control of the run? Let me do one thing: let me wrap the training in `with mlflow.start_run(...)`, and in there I'll pass the run name, maybe "auto logging run". Inside the block I'll call model.fit again — only the model fit, nothing else — and if I run this, let's see what happens. Note that I'm not logging anything manually here; all

the other steps are done before this run context. Let's see if we have got any other run here. Okay, it's loading — and we are seeing this auto-logging run. So even if I do just the training part inside the run context, with auto logging enabled I'm able to get all these artifacts. Okay,

cool. So this is how you do auto logging with MLflow. Please be aware that if you run it again with auto logging enabled, one more auto-logged run will get created — runs differ by their ID. So if you want to keep logging into the same run, you have to grab this run ID and pass it

to start_run as run_id; now it will use this run only. Okay. I can comment out the fit, because the model is already there, and do something like log_metric with an "extra_metric" and some random value. If I run this, it should log the extra metric into the same run. Let's refresh — and we are able to see the extra metric. This is an example of manual logging into an existing run: you see this extra_metric.

Okay. So in the next discussion I would like to cover the concept of nested runs. Let's continue with that. Thank you so much.

So let us talk about how to do nested runs in MLflow on Databricks. The concept is quite similar to how we do nesting locally in MLflow. Right now, the way we are making runs is: we create a run, open it, and see the outputs within that run. But there can be a use case where I have a parent run, and inside that parent run there are several different sub-runs. Usually the way this

experimentation works is: you have an experiment as part of something larger, you have some hypothesis you are testing, and you have various runs that test that hypothesis. But you can also have a situation where, under one hypothesis, you have multiple sub-hypotheses. I hope you are getting the point — you will get it in a moment. So I'll call mlflow.start_run.

I'll name this run "parent run" and assign it to a variable called parent. Now, within this run, I want to start one more run: `with mlflow.start_run(...)`, and I'd like to name it "child one". Here I need to specify the parent's run ID, which comes from the parent — parent.info.run_id. Okay, so that's child one, and

inside it I can call mlflow.log_params — you can do all the things we have already seen — and log my param one. Similarly, I can copy this a couple of times to make child one, child two, child three. I don't necessarily need to change the param names, because each child only runs after the previous one has finished.

In this manner you can create a hierarchical relationship, where a child run can itself contain further runs. Let me run this and show you what the output looks like. One second — why is the syntax wrong? Line three has a syntax error. Okay, I think the name argument is something I did not specify correctly.

Okay, let's simply run this again and see — start_run got an unexpected argument; sorry, it should be run_name. Yeah. Now it says I must end the current run first, or start this one as a nested run. Oh, okay. So, in addition to the parent run ID, I also need to specify the nested argument as True. I need to set nested=True in all the child run definitions.

Okay, now let's run this. It's still complaining — ah, there are a lot of little bugs here: log_params needs a dictionary, not a set. Okay. Now it will run.

After this, let's refresh. Okay — the only correct parent run is the latest one; all the other runs were created while we were hitting the errors, so let's delete those.

Now you get this parent run, and if I expand it you get child one, child two, child three. I can open the parent run, and I can open the child runs separately. So you get the idea: if you have sub-hypotheses, and within each one you want to try out a bunch of parameters, metrics, and models, you can use the nested-run functionality of MLflow. Thank you so much. Next we will move ahead and look at what the model registry and model serving look like on Databricks. Let's move on to that. Thank you.

Let us now look at how model registration works on Databricks — it is a little different from what we normally do in a local MLflow setup. What you are seeing on screen is not the MLflow section but rather the Catalog section. This Catalog section is where your model will eventually land. You can log your models in the experiments, but if you eventually have to register a model, you need to create it here. Okay.

What I have here is organization-level segregation, and inside the organization you have "workspace" and "system". What are these? These are known as catalogs in Databricks terminology. If I expand the workspace catalog, you see two different things we call schemas: a default schema and an information schema. Obviously, the information schema holds information related to Databricks itself, and then there is the system-specific catalog.

So I've got two catalogs here: workspace and system. If I expand the system catalog, you will see some options — for example, MLflow. Sorry, the schema here is MLflow, and if I expand it,

these are tables. If I open the "experiments latest" table, it will have all the details related to my experiments: the account ID, who ran the experiment, the workspace ID, the experiment ID. Now how can I query it? I can query it in the SQL editor — I can open the SQL editor in a new tab and explore

this. Why am I telling you all this? Because there is a shift in mentality here: in your local setup you have everything logged locally, but here these things are saved in the catalog. Okay. So now I can create a new SQL query and run anything that I wish to run. Okay.
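For reference, the kind of query run in the SQL editor can also be kept in Python and submitted via spark.sql on Databricks. The table path below follows the catalog.schema.table layout shown on screen, and the exact table and column names are assumptions based on what appears in the video:

```python
# system.mlflow is the system catalog / MLflow schema browsed above;
# the table and column names are illustrative assumptions.
query = """
SELECT experiment_id, workspace_id
FROM system.mlflow.experiments_latest
LIMIT 10
"""

# On Databricks you would run: display(spark.sql(query))
print(query.strip())
```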

And you can see I have got this users table. There are two users here: one is the user that I added, and there is one more user that I can clearly see. Okay. Now, if I open the SQL query, I can select the workspace catalog and a schema within it — in the workspace, I select the default schema.

Okay. So this is what I wanted to show you — what the catalog looks like. Now let me come to the model part. Within the catalog, if I expand the default schema, I can create some things. What can I create? Let me click on default and come over to the right: if I click here, you can see you can create several things — a table, a model, a metric, a volume. For now let's go ahead and create a model. I'll name this model "sample demo model".

Once I have this model name ready, I have this model available in the workspace catalog, default schema, under this name. The way I'm going to refer to this model is: workspace, dot, the schema name — which is default — dot, the model name. So catalog name, dot, schema name, dot, model name: this is the fully qualified name of my cataloged model. Now, what is the use of this?

Let me show you with this experiment. This is the notebook that I have. In this notebook I am setting up a run which creates a simple classifier model and then logs the model. Okay. Now,

you can log the model, and at the same time you can also register the model — these two are different things. Logging a model just logs it and associates it with the run. Let me show you what I mean. I'll set the run name to "demo registry", or maybe "demo catalog".

Okay. Let me do one thing: let me not register it, just log it. I'll run it — the run name is going to be "demo catalog".

Let's wait a bit, and meanwhile let's explore further. I want to take you to the system catalog, and inside it the AI section. Under AI you have got models. As I explained, you can create models, tables, and a variety of things. If I expand this models section, you see these models are available. What are they? These are LLM models, already registered by Databricks.

What can you do with them? You can serve these models. How? I will show you in later sessions how to serve them. But since these LLM models are very large, Databricks took the effort of logging them and providing them in the catalog, and what you can do is just take these models and create serving endpoints for them. Okay. So let's see —

okay, this run is completed. Now I can open this run in the experiment, and let's see what we have got. I see this experiment; in the overview we don't have much, but if I come to Artifacts, I will see this model. Right? Okay, so now I'm able to see this model. But if I go into the Models tab on the left,

I don't see my logged model there — I do see my "sample demo model" entry, but the model I just logged is not in this registry. So where is it? It is only associated with my experiment. Okay. Now if I open the sample demo model, it takes me to the catalog

The catalog where from where we created it. But currently this sample demo model

it. But currently this sample demo model has no no artifact associated with it.

Now how can I bring my artifacts to basically to this model? Now I I do we don't have anything like even we don't have the first version. So you see there

are no registered model versions yet. We

do have the model but we don't have the registered model versions to it. Okay.

So let me go to Experiments again, and under Experiments I will come into this one, not the demo experiment, because in this notebook I did not refer to any experiment.

Let's let it load first. Okay, so this is the experiment name: when you don't specify an experiment name, MLflow uses the notebook name. After that it loads the run.

We have demo catalog, and if I go to Artifacts I have the model. Now, you can either register it from here in the UI, or you can provide the registered_model_name argument when logging. And how are you going to provide that name? By specifying three pieces of information: first the catalog name, second the schema name, and third the model name. If you provide all three, it will take the model artifact associated with this run and put it inside that registered model. Let's do it from here, in the UI.

So I'll click on Register model, and then I need to select which model to register into. You have got several models here and can choose any one you wish, but we created the sample demo model for this, so I'll register it there. The moment I register it, it starts registering this run. Now if I come back and refresh, I can check whether a new version of my sample model has been created or not.
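The same registration can also be done in code while logging the model. Here is a sketch; the catalog, schema, and model names are placeholders for whatever exists in your workspace, and the Databricks-dependent part is kept in a separate function:

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Assemble the three-part Unity Catalog name: <catalog>.<schema>.<model>."""
    return f"{catalog}.{schema}.{model}"

def log_and_register(sk_model, name: str) -> None:
    """Log a model and register it into Unity Catalog in one call.

    Runs only in a Databricks/MLflow environment; mlflow is imported
    lazily so the helper above stays usable anywhere.
    """
    import mlflow
    mlflow.set_registry_uri("databricks-uc")  # point the registry at Unity Catalog
    with mlflow.start_run(run_name="demo-catalog"):
        mlflow.sklearn.log_model(
            sk_model,
            artifact_path="model",
            registered_model_name=name,
        )

name = uc_model_name("main", "default", "sample_demo_model")
```

Passing registered_model_name to log_model does the same thing as clicking Register model in the UI: the run's model artifact becomes a new version under that catalog entry.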

Okay, so currently we don't have any versions yet, but let's wait for some time and see how long it takes. This is the process I wanted to show you: how cataloging works, and how the catalog is different from how we register models locally.

So let's refresh it again and see. If it doesn't happen this time, I'll just pause this video and resume it once the registration is completed. Okay, so we have the registration complete. You can now see version 1, and if I click on version 1 I am able to see the model artifact paths. From here I could also click on serving, and from the artifacts I can look at the exact model files that got created. With that said, that would be it for this video. Next, we are going to look at how to serve a registered model using Databricks managed serving endpoints. Thank you.

Hello everyone. In this video we want to look at how we can take our registered models and create endpoints. I went ahead and already created one endpoint, because it takes some time for the container to spin up and actually start serving the model. Meanwhile, while that one is being created, I want to do a couple of things: first I will show you what it looks like when I click Create serving endpoint, then we will review the existing endpoint, and then we will try to use it. There are a lot of things to do, so let's get started.

There are two different ways to create these endpoints. First, we can navigate to Models and choose the model we want an endpoint for. For example, if you want to deploy Claude Sonnet 4, you can click on it and then on the Serve this model button at the top right. It will automatically take you to the serving page and ask for some parameters, like the name of the endpoint and the entity details. For your information, in Databricks an entity is anything that lives in the catalog. A catalog can have tables, models, and other things as well, and an entity follows the pattern catalog.schema.entity_name.

So this is how you create the endpoint. You can specify the name, and you see the URL format: it is https by default, so yes, your endpoints will be secure. Then there is a workspace ID, which depends on your workspace, followed by .cloud.databricks.com, then /serving-endpoints, then whatever name you specify here, and then /invocations. If you hit this endpoint, you can call whichever model you registered behind it.

So let's go inside Models again, open the sample demo model, and serve it. I can click Serve this model, and it takes me to the serving page as mentioned. I can name the endpoint, maybe sample-model. After that it automatically fills in the compute type, but depending on the kind of usage you are dealing with, you can select CPU or GPU: GPU small, GPU medium, or GPU large. Then there are the concurrency settings, where you can select whether you want to scale out or not, and what the minimum and maximum concurrency will be.

You can also enable scale to zero; I have enabled it on my previous endpoint that is currently being created. We can enable tracing as well, which tracks all the inputs and outputs and saves them into your catalog. Then there is route optimization, and under the AI Gateway section we have usage tracking: we can enable inference tables and select the table location, so I can pick the catalog, the schema, and the table prefix. We can also create tags, set a serverless usage policy, and configure alerts, for example email alerts on success or on failure. Finally, you click the Create button and it starts the serving process.

Now if I look at the endpoint I created previously, it is ready for usage. How can we use it? Before using it, I want to review some of the settings. You can see, as before, the endpoint is the https secured version: my workspace ID, then .cloud.databricks.com, then /serving-endpoints, then my-first-endpoint, and then /invocations.

After this, I can also look at how the endpoint is performing; for example, I can look at its latency. Databricks is a complete end-to-end monitoring and observability stack for you, with MLflow built in, so you get all of these things by subscribing to one tool. All right, once we are done we can obviously delete the endpoint, but first let's try to use it.

So how are we going to use it? We can use it via curl, or via Python. Let's try Python. I'll copy the generated code and come to my notebook. Let me also save this notebook, maybe as log-and-register-model. Let's use it right here. So we are importing os and requests, there is a create_tf_serving_json helper, and then we have the URL and the authorization header, which reads a Databricks token. The payload itself is sent in the dataframe_split format. But where exactly is my token going to come from? Let's see.

So I did some digging. We can get authentication tokens from the settings: I open my settings, go to the Developer panel, click Manage next to access tokens, and generate a new token. I'll set the lifetime to one day, name it, and generate it, then grab it and paste it here. I won't be using environment variables for it in this demo, but you probably should.

Okay, so now this is our token, and the content type will be JSON. Then the request body is built with dataset.to_dict(orient='split').
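To make the moving parts concrete, here is a minimal standard-library sketch of the same call; the URL and column names are placeholders, and the token is read from an environment variable rather than pasted into the notebook:

```python
import json
import os
import urllib.request

def dataframe_split_payload(columns, rows):
    """Build the 'dataframe_split' body that Databricks serving endpoints accept."""
    return {"dataframe_split": {"columns": list(columns), "data": [list(r) for r in rows]}}

def score_model(url, rows, columns):
    """POST a batch of rows to the endpoint and return the parsed response."""
    token = os.environ["DATABRICKS_TOKEN"]  # keep the PAT out of the notebook
    req = urllib.request.Request(
        url,
        data=json.dumps(dataframe_split_payload(columns, rows)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

payload = dataframe_split_payload(["x"], [[1.0], [2.0]])
```

The score_model function only runs against a live endpoint with a valid token; the payload helper shows the exact JSON shape the endpoint expects.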

Now I want to run this first, and then take my X and look at how it is shaped. What I want to do is call score_model and provide my X to it. Let's run it and see what we get. We are getting back predictions, and they come as a dictionary containing a list: it has scored everything, packed it into JSON, and returned it. If I come back to the endpoint page, the graphs should be updated. Let's refresh and see if we have got the latency numbers.

We did get the CPU usage, the memory usage, and the provisioned concurrency, but the latency numbers haven't arrived yet; maybe that takes some time. Anyway, the point of creating this endpoint was to show what it looks like to deploy a registered model via Databricks. And if this token is still valid by the time I upload this video, you can make calls to this URL yourself. This is really powerful: you are no longer creating these endpoints locally as we do with open source MLflow, you are creating them with Databricks managed serving. I think that would be it for this discussion. Next we will look at some of the GenAI capabilities in Databricks. Thank you so much.

In most machine learning tutorials, you learn how to train models. But in real industry projects, training is only 10%

of the job. The real challenge is deploying models reliably, serving them, versioning them, and making them accessible to applications.

In this video, I'll show you exactly how to take a real Hugging Face model, wrap it inside a custom MLflow PyFunc model, register it in Databricks, and deploy it as a production-ready endpoint.

Specifically, we will install PyTorch and Transformers, download a real embedding model from Hugging Face, wrap it inside a custom MLflow Python model, implement the load_context and predict methods, register the model in the MLflow Model Registry, and finally deploy it as a live serving endpoint in Databricks.

By the end of this video, you will understand one of the most important real-world skills in modern ML engineering: how to deploy transformer models using MLflow.

Let us start by making a plan of everything we need to do in order to create, deploy, and serve our model on Databricks. We'll follow these steps so that we have a clear guideline as we progress.

The first step is to install the PyTorch and Transformers libraries. We are not going to do anything on our local machine; the only thing we do locally is create this project 1.txt file, nothing else. So I'll open Databricks, quickly log into the platform with my Gmail account, create a new notebook, attach serverless compute to it, and name the notebook transformer model deployment.

Now, what do I mean by installing packages? We have Python, but here on serverless compute, how do I install my packages? First let me connect the notebook quickly. From here I can go to the environment configuration, and inside the configuration you can see what is already added and installed. The way you install new packages is to write the package name here, transformers for example. The warning it shows is because it wants me to use == to pin an exact version, but I don't need to do that. I also want to install the torch library. Let me first check whether these are already installed: transformers, no, and torch, I don't have that either. So I will add transformers and torch and click Apply. When I apply it, it takes some time to install these packages because it rebuilds the container and everything. So I'll pause the video right here, let it finish installing, and then we can resume.

Okay, as you can see it took 3 minutes to add these packages. Now let us quickly test whether they are ready. From transformers I will try to import a class called AutoTokenizer; if this succeeds, it means the package is working fine. I can also quickly check whether import torch works. Since it is loading, it most probably means our packages have been installed. Hmm, it says AutoTokenizer is not found in transformers. Why is that? Did I spell it incorrectly? Let me try again. Okay, this time it worked somehow, I don't know why. And now I have imported torch as well. So our package imports are complete, and we have been able to finish the first step.

Now let us download the model from Hugging Face. Which model am I trying to download? It's an embedding model provided as part of Sentence Transformers, and Sentence Transformers itself is built on top of the Transformers library. The exact model is paraphrase-MiniLM. The reason I chose it is that it's lightweight and yet powerful, a strong embedding model we can use. Let me navigate quickly to the Hugging Face platform.

From Hugging Face I can search all sorts of models. I have already logged in, although you don't need to log in or get a Hugging Face token for this video, because this model is openly available and does not require a token. I'll search for the model I just mentioned, paraphrase-MiniLM, and since there is an L3 version, let's go with L3. This is the model we want. If I go inside Files and versions, I can see the model files; model.safetensors is the largest file, and it's hardly 70 MB, which is good.

Now I need to grab these model files and make them available here in my Databricks workspace. How can I do that? We can certainly download them, but where will those downloads go? That is the one concern we might have. So let's create a folder to hold our model; I'll create a folder called model_files. Let me refresh. Okay, I've got this model_files folder, and now I can download my model into it. How am I going to download it?

The way to download it is with the Transformers library. From transformers I will import AutoModel; AutoTokenizer has already been imported. After this I want to create two variables. The first one is the tokenizer, and the editor is already suggesting the method I have to use, from_pretrained, whose key is the exact model ID we get from Hugging Face. Providing that key creates my tokenizer. Similarly I can create the model with AutoModel.from_pretrained and the same ID. If I run this, it gives us some warnings, like that we are not using an HF token, and it downloads the files. But where exactly do they go? For that we create a variable called model_download_directory pointing to our directory. I can copy the path from the folder's menu and paste it; you see it is under the workspace, in model_files. Now I can call tokenizer.save_pretrained(model_download_directory), and similarly model.save_pretrained(model_download_directory).
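Collected into one cell, the download step might look like the sketch below; the repo ID and target directory are assumptions based on the model shown above, and the heavy imports are deferred so they only run on the cluster:

```python
from pathlib import Path

# assumed Hugging Face repo id and workspace path; adjust to your setup
MODEL_ID = "sentence-transformers/paraphrase-MiniLM-L3-v2"
MODEL_DOWNLOAD_DIR = Path("/Workspace/model_files")

def download_and_save(model_id: str, out_dir: Path) -> Path:
    """Fetch the tokenizer and weights, then save them for later packaging.

    Requires the `transformers` package (and torch) installed on the cluster.
    """
    from transformers import AutoModel, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    out_dir.mkdir(parents=True, exist_ok=True)
    tokenizer.save_pretrained(str(out_dir))
    model.save_pretrained(str(out_dir))
    return out_dir
```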

This takes our model files and places them in that directory. And why are we downloading them? Because later we will reference them in the custom Python model we will be creating. If I run this, it kicks off the model download process; you can see the progress bars showing the download is in progress. Let's wait a couple of seconds for it to complete.

Is it complete? Yeah, it's complete. If I refresh now, in model_files you see the configs, model.safetensors, and all the other files are available. Now I don't need to use the tokenizer and model variables; I can just use these saved files. Let's see how to use them now.

Our next step is to learn to use the model we just downloaded. We will take the tokenizer and model variables and see whether we can run them on some text, so let's create a sample text: "This is a sample text."

Creating embeddings from text with the model we downloaded is a two-step process. First we call the tokenizer on the text to get inputs, and then we pass these inputs through the neural network. On the tokenizer call I set return_tensors to tell it in which format to return the tensors, PyTorch in this case, and I set padding to True. Why do I need padding? Because when you use texts of different lengths you need to align them: one text can be shorter, another longer, and you have to add extra padding tokens so they all end up the same length. So let's run the inputs now.

length. So if I run the inputs now, so you see we we got input ids and it was a much quicker process. Now we take these

inputs and we pass it to uh the uh torch. So with

torch. So with torch dot no gradient. So we run it uh in a context uh manager and we get

outputs. In the outputs we what we get

outputs. In the outputs we what we get is we we take our model where is our model defined our model has been defined here. Now if I take my model I just need

here. Now if I take my model I just need to pass my input uh input sorry but here inputs is a dictionary but rather I want to unpack it. So I'll just put a double

star in it. So don't get scared from this double star. It's just unpacking it. So input ids u and token type ids

it. So input ids u and token type ids and attention mask all these are unpacked. So if I run this it will take

unpacked. So if I run this it will take some time like little bit of time. So I

I get outputs. Now if I check my outputs what will I get? I'll get a detailed output of uh all the hidden layers information. But I want to um get only

information. But I want to um get only the last uh hidden state. So I can get last hidden state. I get a you know very

long uh tensor. Now if I check the shape of this hidden tensor, you will see it has a shape of 1 7 384. What is that? So

it's so one corresponds to the batch. If

let's say I add one more text here, one more text. And if I run through this again and if I check the shape one more time,

you will see now it will have two which means this dimension represents which text we are dealing with. And this seven 384 you can imagine seven different

arrays of length 384. Usually we mean it across this axis. So that what we get is we get a 2x 384 um uh matrix where each

uh of this dimension represents the text and 384 is the actual embedding. So I

hope you understand and this is how we use this model. Okay. So uh uh yeah. So

now what we need to do is we need to move to the next steps and next step would be to create a custom Pyunk model

and log that model in MLflow.
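The two-step embedding process just described can be collected into a helper like the following; the pooling step is also shown with a NumPy stand-in for the model output, since only the shapes matter here:

```python
import numpy as np

def embed(texts, tokenizer, model):
    """Tokenize, run the transformer, and mean-pool token vectors into one
    embedding per text (requires `transformers` and `torch` on the cluster)."""
    import torch
    inputs = tokenizer(texts, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1)

# shape walk-through with a stand-in array: 2 texts, 7 tokens, 384 dims
fake_hidden = np.zeros((2, 7, 384))
pooled = fake_hidden.mean(axis=1)  # one 384-dim vector per text
```

Averaging over axis 1 collapses the per-token vectors, which is exactly the (2, 7, 384) to (2, 384) step discussed above.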

Okay, so the next thing is to implement a custom PyFunc model. In it we will implement a class with two methods: one is load_context, and the other is the predict method. Let's navigate back to the transformer notebook. I'll import mlflow really quickly, and then create a class, maybe SBertCustomModel. This SBertCustomModel is going to inherit from mlflow.pyfunc.PythonModel. We need to inherit from this and define the two methods. So I'll quickly define load_context, which takes self and context; I'll just pass for its body for now.

Then we will define a predict method. This predict method again takes self, it takes context as well (I'll tell you how this context gets defined), and it also takes model_input.

Now, what do we do in load_context? We take the context that is passed in. How is it passed? I'll show you when we log the model: while logging, we define the artifacts, and when the model is loaded, we receive the context and load whatever we need from it. For now, just assume that this context has an artifacts key and that context.artifacts is a dictionary; we will see later how those artifacts get passed. From it I'll read a model directory key, which gives me the model directory, and by model directory I mean our model_files directory. Then, from this model directory, I want to run the same two from_pretrained calls, so I'll copy them and paste them here to build the tokenizer and model. But I no longer pass the Hugging Face model ID; instead I pass my model directory. After the tokenizer and model have been defined, I also need to attach them to this instance of SBertCustomModel, so I'll say self.model = model and self.tokenizer = tokenizer.

So what is the next step? Implementing the predict method. Where will the model input come from? That depends on you, on how you want it to be passed. Let's assume it will be passed as a pd.DataFrame; for that we need pandas, so import pandas as pd. We are forcing a DataFrame here; you could have just accepted a list, that is up to you. We also want to return a DataFrame, with a key called predictions holding the embeddings. For now I'm just setting up the input and output contract of this predict method.

For the model input I'll assume it has a text field that we want to reference. So I'll do something like model_input["text"] and convert it to a list. Why a list? Because when I tested the model earlier, I passed a list, right? That is the rationale; I'm telling you why I write each line of code. I'll name this texts.

Give me a minute. One second.

Okay. So now that we have the text ready, what we have to see what what we have to do, we have to do the exact same thing that we did here. So I'll just

grab my input here uh input from here and then I'll grab my width dot uh torch dot no uh grad.

Okay. So I will get my outputs and after outputs I get my uh hidden uh state. And

if you if you recall this hidden state was giving me uh a a entry which is of the shape. So I I have to mean it. So I

the shape. So I I have to mean it. So I

have to mean it some across one axis. So

I'll just try to uh mean it across first axis and see what happens. So you see I have mean it and now I'm only getting uh like uh entries like this. Okay. So now

what what I can do is I can just take this and return this as it is. So I'll

just take my uh last hidden state and get the mean get the axis along the axis one and I'll name this as embedding.

And this embedding is what I return as the prediction. So this is my custom SBert model. What do I need to do next? We have implemented the model, and the next thing is to register it; but first of all we need to log it, and we'll look at registration after that.

Anything that has to be logged with MLflow has to be associated with a run, and a run has to be associated with an experiment. So first we'll create the experiment really quickly. For that, let us check our path, because experiment naming looks different from what we do locally. I'll get my current working directory, and then set my experiment with set_experiment; I can reuse that path, or, to be fancy, append the experiment name to it. I'll name this experiment logging sbert model.
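On Databricks, experiment names are workspace paths, so a small helper like this (names illustrative) keeps the pattern explicit:

```python
import os

def experiment_path(name: str) -> str:
    """Append an experiment name to the current working directory,
    giving a workspace-style path such as /Workspace/.../logging_sbert_model."""
    return f"{os.getcwd()}/{name}"

def set_experiment(name: str) -> None:
    # requires an MLflow tracking environment (e.g. a Databricks notebook)
    import mlflow
    mlflow.set_experiment(experiment_path(name))
```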

Okay. So I'll just run this now. This

will create the experiment. So it says it is creating it. Okay. Now after that we can create a run context. So with

mlflow.start_run, and within this run I will be logging the model. You see all this code is already being provided, so I'll just run mlflow.pyfunc.log_model. Now for this model we need to provide four pieces of information. First is the artifact path; it will be a certain thing, and we will fill out these details in a moment. But let us first see what all we need to pass here. We need to pass the python_model argument; for now I'm just putting this as a string, but let's wait. After that I need to pass the actual artifacts; we'll see what has to be filled here. And then we can pass an input example as well. Now the input example

is not required, but it will help the autogeneration. Basically, what Databricks does is autogenerate the scoring script, so if you want that to succeed correctly, you can provide an input example too. Now, what is the artifact path? It can be anything; you can provide any string that you want, and I'll show you what exactly this does once the model is logged. Then python_model will be the actual class, instantiated. So I'll just take my SBertModel and call it. Then the artifacts will be the dictionary which we are fetching here, as context.artifacts.

So I need to provide a model directory here, under the model directory key. What will be my model directory? I'll just grab the path from here; this will be my model directory. Then what will be the input example? We saw that in the input we are expecting a text field to be there, right? We have to pass the same format in which we are consuming it, so I will create a data frame with a key called text, and the value can be a list. This is the input example.

Okay. So now it should log the model.

Okay, so let's log it, and I'll show you what this artifact path means. I'll just run this, but this time it is going to take some time. Why? Because when I call this SBERT model, it will run all these steps. So let's wait and see how much time it takes. It says mlflow has no attribute called log model. Why is that? Sorry, I think I need to use the pyfunc flavor: it's not a normal model but rather a pyfunc model, so the call is mlflow.pyfunc.log_model. So let's wait for some time and see how long it takes.

So it's loading; it's still doing its job, and you see it is loading weights. How is it loading weights? Because we have called these two methods: it is looking at this model directory and loading these things in here. Let's see. Yeah, so it's completed. Now let us open this run and I'll show you certain things. When I open this run and navigate to artifacts, you see this model name; that is what the artifact path refers to. Let's say I log one more model as model 2. If I run this again, this will come out as model 2. So let's wait for it to complete.

Okay, so now if I open this, not this experiment but this run, this will be model 2, and this model has all the things like artifacts, model files and so on. All right. In place of model files, let me try this and see if it solves the issue, because currently this directory is also being included somehow. I don't know why; we should not have this directory here.

Okay, let's open the run. Why was it including the directory? That's a good point. One second. Under artifacts, yeah, this time it's not including the directory; that is why I was concerned. Okay, so there are some warnings: failed to run predict on input example, and dependencies introduced are not captured. So what exactly is happening here? We pass this, and it is basically taking the input example and trying to run predict on it. But why is it failing?

Can't we call it? Let me do one thing. Let's take this data frame and access the text column on it. Yes, it's working. And if we call tolist on it, that's working too. So why is it failing then? We are able to load the list, and yet it says failed to validate serving input example data.

Okay, let me debug this. Give me a sec. So I saw one issue which could have produced this: we are directly calling the tokenizer and model, not self.tokenizer and self.model. Let's run this again and see if this solves the issue. So let's set the experiment and then run this again; I think that was the issue, but let's see. Let's wait for maybe 20 seconds and see if we are able to log the model correctly. It's still not working: failed to validate serving input example. Why is it failing here?

Okay, so the issue was not how we were passing the data but rather how we were returning it. Earlier, if you recall, I was using this as the return value, which is basically a PyTorch tensor, and MLflow was not able to understand it properly. So what I did was loop through it and create a list of embedding values. Now we'll be getting this clean embedding list, and we are able to run the log model method.
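The fix described here, sketched with NumPy standing in for the PyTorch tensor: instead of returning the pooled tensor directly, loop over it and return plain Python lists that MLflow's serving-input validation can handle:

```python
import numpy as np

# Stand-in for the pooled embeddings, shape (batch, hidden_dim); the real
# code has a torch.Tensor produced by last_hidden_state.mean(axis=1).
pooled = np.random.rand(2, 384)

# Returning the raw tensor failed serving-input validation, so convert
# each row to a plain Python list before returning it from predict.
embeddings = [row.tolist() for row in pooled]

print(len(embeddings), len(embeddings[0]))  # 2 384
```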

The next step is to register this model. Registering is a different thing: logging is associated with a run, while registration is associated with the model catalog, and the model catalog works differently in Databricks. You have got this workspace: workspace is the catalog name, default is the schema name, and inside this default schema you have to provide your model name. So I'll just define the model name here really quickly: workspace is the catalog name, dot default is the schema name, and sbert model new one is my model name. Then I will also have to get my

logged model URI, which follows the format runs, then the ID of the run, and then finally the model. So let's grab this from the run itself. I'll just open the run which logged the model; from here I can get the artifact, and you will see the model URI here. I can just grab the model URI from here and put it in. After that I can register my model. To register it, I can call mlflow.register_model and pass the logged model URI and the model name.
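Those two pieces of information can be assembled as a small helper. The run ID "abc123" is a hypothetical placeholder; in practice you copy it from the run that logged the model, and the final mlflow.register_model call (shown in the comment) runs on Databricks against the model registry:

```python
def registration_args(run_id: str,
                      catalog: str = "workspace",
                      schema: str = "default",
                      model: str = "sbert_model_new_one"):
    """Build the two arguments for mlflow.register_model."""
    # The logged-model URI follows runs:/<run_id>/<artifact_path>.
    logged_model_uri = f"runs:/{run_id}/model"
    # Unity Catalog model names are three-level: <catalog>.<schema>.<model>.
    model_name = f"{catalog}.{schema}.{model}"
    return logged_model_uri, model_name

uri, name = registration_args("abc123")
print(uri)   # runs:/abc123/model
print(name)  # workspace.default.sbert_model_new_one

# On Databricks:
# import mlflow
# mlflow.register_model(uri, name)  # creates version 1 (or the next version)
```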

So let's run this and wait for it to complete. Successfully registered model. Now we should be able to see the model. You will have a versioned model in the workspace: if I open the workspace and then the default schema, you will get sbert model new one. So this is our model. You can have multiple versions under the same name. Now, finally, we can serve the model. How are we going to serve it? Serving is quite easy, because all the hard work was in understanding the input and output format and creating a custom PyFunc model. That is where the real work was; serving is quite simple.

Okay, one second. Let me go back to that page again. I'll just come here to the catalog, open the workspace, open the schema, and open sbert model new one. For this model, I can click on serve this model, and the only detail I need to provide here is a name, say sbert new model or whatever you wish to name it. Then you will get this URL.

That URL is something you can use for calling it. If I go ahead and create this right now, it will take forever; it takes about 30 minutes to spin up. Last time it took 30 minutes. So instead I'll go to serving and use the already created endpoint that I have. It has the same configuration, so I'll just use it and show you how you can work with it.

Now that we have done all the hard work of creating these endpoints, you would want to use and consume them, right? Once your endpoint is ready, its status will go from updating to ready. If you click on it, you will be able to see all the details about the endpoint: the URL, the configuration, and besides those details the latency, requests per second, response rate, GPU or CPU usage (whichever you are using), and memory usage. So all the monitoring and observability you are getting for free. Now, how do we consume this model? To be able

to consume it, you would first need to come here and open the settings tab. Under settings, you want to grab a Databricks token. I'll go inside developer, navigate to access tokens, click on manage, and generate a new token. I'll name it testing and put the lifetime as 1 day. Let's generate it and grab this token. Okay, let's go back to Databricks and create a new testing notebook. One second; I'll just open a new tab and a new notebook. In the notebook, I'll paste this token. I'll name this token HF token

and make it a string. Now I will come back to my endpoint, go to use, and under use I'll go to Python and grab this code. Under this I see an option for the Databricks token, so I'll just remove the placeholder. Actually, not HF token, sorry; we are dealing with Hugging Face, so I thought of it as an HF token, but I'll call it the Databricks token and paste it right here. Now, you see all of this has been autogenerated, and the reason is that we provided all the context of what the input and output data will be in our custom PyFunc model. This is why Databricks is able to generate all these details; we did not have to provide the scoring script either. So

first of all, I'll attach the serverless compute to this notebook. It's ready. If I run this, it's now ready, and what I can do is score my model. For the scoring, I want to provide the input example, which will be a data frame. So I'll create an input example really quickly with pd.DataFrame, and I would like to pass a text field. This text could be "this is a test", maybe. If I run this, it says, okay, so I think we need to pass the input example. If I run this now, it will hit the SBERT embedding serving endpoint.
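The auto-generated scoring code boils down to posting the data frame in MLflow's dataframe_split JSON layout with the token as a bearer header. The URL and token below are hypothetical placeholders; Databricks generates the real snippet under the endpoint's use tab:

```python
import json

import pandas as pd

# Hypothetical placeholders; copy the real values from the endpoint's use tab.
ENDPOINT_URL = "https://example.cloud.databricks.com/serving-endpoints/sbert-embedding-serving/invocations"
DATABRICKS_TOKEN = "dapi-placeholder"

input_example = pd.DataFrame({"text": ["this is a test"]})

# Serving endpoints accept the dataframe_split JSON layout.
payload = json.dumps({"dataframe_split": input_example.to_dict(orient="split")})
headers = {
    "Authorization": f"Bearer {DATABRICKS_TOKEN}",
    "Content-Type": "application/json",
}

# import requests
# response = requests.post(ENDPOINT_URL, headers=headers, data=payload)
# embeddings = response.json()["predictions"]  # one embedding per input row
```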

Now it might take some time, because the endpoint I have been using here was an older one, and it will have to spin up the servers before it can do the scoring. So it may take 1 or 2 minutes for the instance to warm up again, because our concurrency is set to scale from 0 to 4; since the instance wasn't used for some time, it was switched off, and now it is waking up. Currently it has not completely woken up, otherwise we would have got the response. So it is taking some time.

Let me pause this video and resume once this is done. As you can see, the results have now been obtained. And now that our instances are warm, if I score the model once again, it will respond almost immediately. You see, it returned the results within seconds. Now, there are some not-so-recommended ways to keep these instances always warm, for example by setting up a server and running a cron job which periodically sends requests to this endpoint and keeps the instances warm.
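A minimal sketch of that (not recommended) keep-warm trick. The poster function, URL, and headers are hypothetical placeholders; in practice the poster would be requests.post against the serving endpoint with a tiny payload:

```python
import time

def keep_warm(post, ping_url, headers, interval_s=300, max_pings=None):
    """Naive keep-warm loop: ping the endpoint every interval_s seconds
    so scale-to-zero never kicks in. Not recommended for production."""
    sent = 0
    while max_pings is None or sent < max_pings:
        post(ping_url, headers=headers)  # e.g. requests.post with a small payload
        sent += 1
        if max_pings is None or sent < max_pings:
            time.sleep(interval_s)
    return sent

# Example with a stub poster (no real network call):
count = keep_warm(lambda url, headers: None,
                  "https://example/invocations", {},
                  interval_s=0, max_pings=2)
print(count)  # 2
```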

Or, if you have a use case with high-traffic activity going on throughout the day, then these instances will always be warm anyway. But even in those scenarios, it's not recommended to go for scale-to-zero; we usually keep at least some instances always warm so that we can cater to requests that come at uneven times. With that said, I think this is what I wanted to discuss in this video and this project: how we build this kind of serving system.

Okay, so that's it. You have successfully taken a Hugging Face model, wrapped it inside a custom PyFunc model, registered it in Databricks, and deployed it as a production endpoint. This is the exact workflow used in machine learning systems to deploy models reliably and at scale. If you found this video useful, consider subscribing to the channel; I regularly post real-world ML engineering tutorials like this. Thank you for watching, and I'll see you in the next video.
