Introduction to AI in Software Testing - Chapter 1 (Path to AI QA Engineering) 🤖
By Execute Automation
## Summary
## Key takeaways

- **2 Years to 2 Months Path**: As a test engineer I spent more than a couple of years learning all these AI-related tools and techniques; in this video I'm going to cover all these details as a crux, and we'll see how you can get to the point where I spent 2 years by spending around 2 or 3 months. [02:44], [03:04]
- **Skip Deep Math, Focus on Applications**: You can study large language models, designing transformer models and all those things, from the complete ground up, or you can just learn the applications of artificial intelligence and large language models. Today we are going to talk about the second half, the applications of large language models. [01:52], [01:58]
- **Transformer Breakthrough Paper**: All the models we are seeing today, like the GPT models, the Claude 3.7 Sonnet model, or the Gemini models, are also called transformer models, because they build on a paper called "Attention Is All You Need." [11:15], [11:37]
- **Pre-Transformer Models Were Specialized**: Even before that, large language models existed, but each was specific to a certain operation: for example grammar correction, translation, named entity recognition, or sentiment analysis. They were all doing things in a more isolated fashion. [12:46], [13:08]
- **AI Agents Bridge the Knowledge Gap**: The AI agent enables the large language model with external world knowledge. It acts like a bridge for the large language model, which can go and search online for the details and get you the response. [16:51], [17:08]
- **LLM-as-Judge Automates Evaluation**: This is an evaluation methodology where an LLM is employed to assess the output of other LLM-based applications, such as a chatbot, an AI agent, or a RAG system that you are building. Here you're not doing manual testing; you are testing your large language model application's output with a large language model itself. [25:51], [26:05]
## Topics Covered
- Skip Deep AI Theory, Master Applications
- Transformers Revolutionized LLMs
- AI Agents Bridge LLM Knowledge Gaps
- Use LLMs as Judges for Evaluation
## Full Transcript
So, welcome to the first lecture of the live stream of the Path to AI QA Engineer. You might have seen me just making this action so far, but you're now hearing me. So let's talk about this: the path to AI QA engineering. To get onto this path, as a test engineer, I spent more than a couple of years learning about all these AI-related tools and techniques, and about what is happening across the industry right now. It was not very straightforward for me to get to this point where I'm going to talk about AI, because AI itself is not a small thing, as you all know.
It involves quite a lot of different learnings. For example, I went down a path where I was learning algebra initially, and then linear algebra, and then I started learning more about neural networks, and then I went nowhere. I realized that this was not going to lead me to where the industry is going, because that particular path was not going to get me to a point where I could at least compete with the market today.
So, to get onto this path to AI QA engineering, there are many different paths you can take. You can go and study large language models, designing transformer models and all those things, from the complete ground up, or you can just learn the applications of artificial intelligence and large language models. Today we are going to talk about the second half, the applications of large language models, but not how you build a large language model and then test it. That's not the idea of this session. The idea of this session is to give you direction on how you can get into testing artificial intelligence applications and large language models. That is the whole idea of this session, and we are going to talk about the details of what this path is.
But most importantly, to talk about the details of becoming an AI QA engineer: as I told you, I spent about 2 years studying all the different details, and in this video I'm going to cover them as a crux. We'll see how you can easily get to the point where I spent 2 years, but now you can spend around 2 or 3 months to become a better QA engineer; you can tag it as AI QA engineer, it doesn't really matter, but you can do that. Let me see if everybody is really happy over here so far. Oh yeah, perfect. Awesome. So, we all know about the current QA engineer's role, so I'm not going to talk about it much. The current QA engineer's role has become even more of a burden, because right now QA engineers not only need to do manual testing, they also need to do automation testing. They need to learn various programming languages based on the tool they are using, and based on the company they go to, they have to again use different tools and techniques to make that happen. We know about that. That's the current state of the QA engineer.
So I'm not going to talk about that; I'm going to talk about this role, the AI QA engineering role. You can see this is evolving right now, and I have seen many job postings talking about QA engineer roles with AI in their tag. If you read through this one a bit more closely, it says that AI QA solution development will involve the design, development, and implementation of advanced AI-powered QA automation solutions for testing AI-based applications, models, and systems, and the use of innovative techniques to identify performance bottlenecks, inconsistencies, and model biases. So you can see what they're really talking about: they want us to create an advanced AI-powered QA automation solution, and these kinds of job descriptions are going to keep coming into the market. This is just one example. If you look at these kinds of job descriptions, they don't really give you specific guidelines on how you're going to achieve this part; they just say you need to implement an advanced AI-powered QA automation solution. But how are we going to do these things? Are there already-available patterns, or proven tools, like in Playwright, or Selenium with C#, or maybe in Cypress, where we know there is a tool which is also a framework and we can start from there? Over here, AI itself is a completely new field altogether, and now they're suddenly asking us to do these kinds of operations. It's going to be really crazy how a QA engineer can suddenly do this while the field itself is quite new. They also talk about test automation and framework building: build and maintain automated test frameworks for AI systems using a wide variety of automation tools. I don't know if any such tools even exist until you go and learn them. Similarly, AI model testing: you need to perform functional and non-functional testing of an AI model to assess accuracy, bias, scalability, robustness, and all those things. You also need to adapt to AI innovations with the latest advancements and research.
Actively explore and adopt emerging AI techniques, like GPT models, neural networks, and reinforcement learning, to continuously improve the quality assurance process. I'm not going to go through the entire job description, but you can see there are so many things that companies are starting to expect, and this is one such example that we are going to see more frequently and prevalently in the upcoming days, or maybe months. If you don't do it today, you'll only regret that you did not learn all these techniques a couple of years before. And this might have happened to you already, right? You might have been working in manual testing, and suddenly you saw all the posts about automation testing, and then everybody started jumping into Selenium, Cypress, Playwright. You might have already upgraded to Playwright and Cypress, and now you're very comfortable; suddenly this AI came in, and now we have to start learning AI and be masters in it, while this whole idea of AI itself is quite new. Becoming a master is not very straightforward or easy. So how do we get to this point while companies are expecting so many things? Maybe this job description is not a 2025 job description; it could probably be a 2027 job description, but you can see this is already happening. So how do we get onto this path, and what should a QA engineer learn right now to become an AI QA engineer?
So, the AI QA engineering role is essentially the same as the QA engineer role, but the work is more exciting. The way I see it, it's more exciting because we are going to learn a lot of new things this time, and you don't necessarily have to do a lot of programming like in Playwright or Selenium or things of that nature, because here the programming is almost straightforward: you're going to be doing a lot of evaluation, and you are going to learn quite a lot of different details and techniques for testing a new kind of system altogether. Then you will see how you can fuse all your learning so that this entire journey works as one single atomic unit. When I say atomic unit, I mean that all the knowledge, and the time you have put into learning this AI stuff, is going to feel as easy as using ChatGPT. So I'm telling you: in order to become an AI QA engineer, you need to have a basic working knowledge of generative AI tools and APIs. That's the first part. By basic working knowledge, I mean, let's say, ChatGPT, for that matter: that's one of the generative AI tools, and we all know that. And the APIs: for instance the Gemini API, or you can take OpenAI's API, or the Claude API. We can use these APIs to send some data in, get the response from the model, and see if we can format the response into JSON so that we get the response in the shape we are looking for. Again, prompt engineering comes into the picture there: how you send a request and how you get a response back from the large language model, from the APIs, and also from ChatGPT-style tools. You know how things work there.
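As a rough illustration of that API part, here is a minimal sketch of building such a request and parsing a JSON reply. The model name and the `response_format` field are assumptions modeled on OpenAI-style chat-completion APIs, not a definitive implementation; in a real test you would POST this body to your provider with your API key.

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Build a chat-completion style request body that asks the model for JSON."""
    return {
        "model": model,  # placeholder model name; use whichever API/model you have
        "messages": [
            {"role": "system",
             "content": "You are a QA assistant. Reply ONLY with valid JSON."},
            {"role": "user", "content": prompt},
        ],
        # Some chat APIs accept a response_format hint to force JSON output.
        "response_format": {"type": "json_object"},
    }

def parse_model_reply(raw_reply: str) -> dict:
    """Parse the model's text reply as JSON, failing loudly if it is not JSON."""
    return json.loads(raw_reply)
```

The point is simply that, as a tester, you treat the model like any other HTTP service: structured request in, parseable response out.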
But now the next question naturally comes up: how are large language models doing all these things? You just give an input; you ask it to go and identify the elements on a page, you give it the source code of an HTML page, put it into ChatGPT, and ChatGPT goes and parses the HTML file for you and gives you a response saying these are the different locators you can choose from this particular page. Then your question naturally becomes: how is this really happening? How is the magic of the large language model getting the IDs and locators for me automatically? That's when you start learning about the next stage, which is the basics of the transformer models.
All the models we are seeing today, like the GPT models, the Claude 3.7 Sonnet model, or the Gemini models, are also called transformer models, because there is a paper called "Attention Is All You Need." You can go and search Google for "Attention Is All You Need," and you will get to know how this transformer model has shaped things. So just go and search for "Attention Is All You Need."
Look at that: that's a PDF file. If you go to this particular paper, you can see this is one of the breakthroughs in the history of large language models. It made the large language model more aware, paying more attention to the details, and models started to learn things from there on. I'm giving you this at a very high level, but you have to go deep to understand what this really does. You can see this is one of the breakthroughs, and this is the place where the transformer model came to life. All the models you are seeing, like the GPT model for example, started becoming general-purpose models only after the innovation of "Attention Is All You Need." Even before that, large language models existed, but they were specific to certain operations: for example, grammar correction. There were models which did just translation, and models just for specific operations like named entity recognition or sentiment analysis. They were all doing things in a more isolated fashion. Every model did something completely different, and every model was very specific to that particular operation. But after the "Attention Is All You Need" paper, once the transformer model came in, all these operations were done by one single model, and the model started to learn even better. The way you train it and the way it learns, it becomes better and better; that's where this paper comes in. If you have time, just go ahead and read it. It's quite amazing.
So that's where the basics of transformer models come in, and you need to learn about the transformer model. It's quite an amazing journey, and I'm sure it's going to take a long time, but it's worth understanding.
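To make "attention" a little less abstract, here is a toy, dependency-free sketch of the paper's core equation, scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Real transformers add learned projections, multiple heads, positional encodings, and much more; this is only the kernel of the idea.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Toy scaled dot-product attention.
    Each argument is a list of equal-length vectors (lists of floats)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query "attends" to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted mix of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

A query that points in the same direction as a key gets a higher weight, so the output mixes in more of that key's value: that is the "attention to the details" the paper enabled at scale.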
Finally, you also need to have a good understanding of AI agents, as well as the craziness around AI agents, really. One example is the MCP server, which is currently driving the internet crazy: the Model Context Protocol from Claude's maker, Anthropic. It's good because you are now giving the large language model the capability to go and talk to the external world. I'll give you one example of an AI agent which I really wanted to show you. So you see, this is Ollama, which I'm running to run a local large language model.
If I hit enter, this is the Qwen model I'm currently running, and there we go, it has come up over here. I'm going to ask one of the questions I always ask in my Udemy courses, and I'm going to ask the same question over here: can you tell me who is the president of the USA in 2025? So what do you expect this large language model to answer right now? Do you think it's going to give me the answer Donald Trump? Well, the answer is no. It will tell you something like: as of my last update, I don't have specific information about the US presidential election or leadership in 2025; predicting future political figures is challenging and not within my capabilities, and things of that nature. That's what it's going to tell you, right? Because this is the cutoff of the large language model, the point up to which it has information. The exact same thing happened when ChatGPT first came in. You remember, back in 2021, every time we asked a question like this, it used to answer that it was not trained on the information we were asking about, because its knowledge cutoff was, I think, 2021 or something like that. So that's how it was. That's the shortcoming of the large language model.
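The local-model demo above can also be scripted. This is a sketch against Ollama's local HTTP API, assuming the default `localhost:11434` endpoint and a model you have already pulled; the model name `"qwen"` here is just a placeholder for whatever `ollama pull` gave you.

```python
import json
import urllib.request

def ollama_payload(model: str, prompt: str) -> bytes:
    """Request body for Ollama's local /api/generate endpoint."""
    return json.dumps({
        "model": model,    # e.g. whatever model you pulled with `ollama pull`
        "prompt": prompt,
        "stream": False,   # ask for one complete JSON response, not a stream
    }).encode("utf-8")

def ask_local_model(prompt: str, model: str = "qwen") -> str:
    """POST the prompt to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=ollama_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Asking this function the 2025-president question would reproduce the knowledge-cutoff answer you just saw in the terminal.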
So how can this large language model now be more aware of current trends and affairs? Well, you would need to train the model with all this information, like presidential election results, or, if there is any new technology or innovation in, say, fusion technology, you would need to go and fuse that into the large language model, which is going to be cumbersome; you can't give all the world's knowledge to the large language model all the time. That's when the AI agent comes in. The AI agent enables the large language model with external world knowledge. It acts like a bridge for the large language model, which can go and search online for the details and get you the response. Let me just open a simple example.
If I go to ChatGPT and say, "Who is the president of the USA in 2025?", you see that now it is searching the web, and it gives you the answer: the president of the USA is Donald Trump. This is happening because it is now searching the internet and then giving you the response back. The LLM by itself doesn't have that knowledge, but it actually went and searched online to get you the response. That's when the AI agent comes in.
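In code, the "bridge" idea boils down to a loop: ask the model, detect that it hit its knowledge cutoff, call an external tool, and return the tool's result. A deliberately tiny sketch; the `web_search` tool and the cutoff phrases below are hypothetical placeholders, not any real framework's API.

```python
def answer_with_agent(question, model_reply, tools):
    """Minimal agent sketch: if the model's reply signals a knowledge cutoff,
    bridge to the external world through a tool (here, a web-search function).
    `model_reply` stands in for a real LLM call; `tools` maps tool names to
    plain Python functions."""
    cutoff_signals = ("as of my last update", "i don't have", "cannot predict")
    if any(sig in model_reply.lower() for sig in cutoff_signals):
        # Knowledge gap detected: let the agent fetch fresh information.
        return tools["web_search"](question)
    return model_reply
```

Real agent frameworks let the model itself decide which tool to call and feed the tool output back for a final answer, but the bridging role is exactly this.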
So this is one application of the AI agent. This is something you need to know: a good understanding of the AI agent, how it works, how you can actually make use of it, and how applications are built using AI agents, like how large language model applications are built. Then you also need a good understanding of the concepts of RAG and chatbots, etc. I'm going to talk about RAG and chatbots later; just give me a second. But RAG is again another way for you to give knowledge to the large language model, by giving it some local data: for example, your company's data, details, or documentation, which you feed or fuse into the large language model using RAG. And chatbots are pretty much ChatGPT; we all know about that, so I'm just going to leave it as it is. These are things that we need to know as well.
well. And finally after we learn all these concepts we then need to learn about the
tools which is going to be helping us to uh evaluate an uh AI application. So that's when our QA
application. So that's when our QA engineers hats comes in. Right. We are
now going to learn about what are the different types of evaluations available and what are the different techniques available to evaluate a large language
model. So this is the place I mean after
model. So this is the place I mean after you learn all these information we are then going to focus on testing. So here
the testing is not as straightforward as testing an application in user interface or API or mobile app. You need to learn these concepts
first and then you need to learn about the testing of the large language model. And again testing is not like
model. And again testing is not like you're going to be selecting an element or doing things or maybe you're going to be doing uh testing uh of a large language model like uh just opening an
application and testing it. That's not
how the testing is going to look like over here. We always say evaluating
over here. We always say evaluating here. So basically while I say
here. So basically while I say evaluating we are going to actually uh uh do a lot of different metrices to ensure that the large language model
application works as expected. That's
the type of evaluation that we are going to be doing over here. Well as that said uh let's first talk about the evaluation of large language model. So we are going
to talk about this a bit and then we'll see an practical example and then we'll be done for today.
So again, why should we do large language model evaluation? We already touched on that, at least in here. Evaluation of the large language model is mainly done to ensure that the LLM actually returns the answer we are looking for; that is when evaluation of the large language model comes in. Once again, LLM evaluation is not as straightforward as testing usual software. There are a lot of different techniques involved in testing a large language model. For example, we need to evaluate against different sets of metrics to test an application built with an LLM, an application built with an AI agent, or one built on a fine-tuned large language model. There are different ways we can do the evaluation for applications in each of these categories.
So there are different types of metrics available. One is static evaluation metrics; another is LLM metrics. The static evaluation metrics, which we also call traditional metrics (as opposed to the non-traditional metrics), are these: we verify the large language model using exact match, BLEU, ROUGE, or F1 score, something like that. Then there are what are called the LLM metrics; this is where we verify the large language model for answer relevancy, prompt alignment, correctness, hallucination, and contextual relevancy. These are the things we actually do to verify a large language model. Just read through this while I take a sip of water, because I'm starting to get a cough.
Excuse me. So you can see, if we take the large language model over here: for answer relevancy, one example is, let's say you ask a question to your large language model and expect an answer, and it is relevant to what you are looking for. That's when answer relevancy comes in: you need to verify that the output of the LLM is actually in the manner you are expecting. Similarly, hallucination: you should always expect the LLM to give you the answer without hallucinating, without the imaginary answers an LLM can give. It should give the answer you're really expecting; it should not give you a fake answer, for that matter. That is what hallucination is. And similarly, contextual relevance is another very important thing. And there are many different metrics available.
Believe me, these are the different kinds of metrics we have, as you can see over here. And again, these are just some of the evaluation metrics; there are even more, depending on the tool you use to do the evaluation. We need to verify all these metrics to ensure that the LLM actually gives the response the way we are expecting. To do all these evaluations, I know that if you're doing it for the very first time, it's going to be a bit hard or tricky. But once you start learning about all these kinds of techniques and tools available in the large language model and AI application space, these things are going to get easier and easier for you to start working with.
So, as that said, with all this knowledge over here (I know there are so many things I have covered), one of the most important things: in order to do the evaluation of things like answer relevancy, faithfulness, and bias detection, if you do it manually, I'm telling you, it will take you ages to complete. That's why we have a concept called large language model as a judge. So what is this LLM-as-a-judge, really? It is an evaluation methodology where an LLM is employed to assess the output of other LLM-based applications, such as a chatbot, an AI agent, or a RAG system that you are building. So here you're not going to do manual testing; you are basically going to test the output of your large language model application with a large language model itself.
Let's say you have an application which is built using this model, the Qwen model. Say this is the least powerful model, built by Alibaba (I mean, they are very powerful now, but let's assume this is a very weak AI model), and it has not been trained on some data which other models know. Now if you want to test an application built using this model, if you want to evaluate the output of an application built on top of it, then you can use another, more powerful model, and use that as a judge to see whether the answer is relevant to the question you have given, and whether it is producing the correct answer or not. So you can now use a large language model as a judge to perform this operation. Think about this: now you are using an LLM's help to get all these metrics done over here, which is amazing, right? That's one way you can actually do things even better while you do the testing.
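In its simplest form, LLM-as-a-judge is just a carefully worded prompt to a stronger model plus a parser for its verdict. Here is a hedged sketch; the prompt wording, the JSON schema, and the 1-to-5 scale are my own illustration, while tools like DeepEval and Ragas do this far more rigorously.

```python
import json

def build_judge_prompt(question, answer):
    """Prompt a (stronger) judge model to grade another model's answer."""
    return (
        "You are an impartial judge. Rate the answer's relevancy to the "
        "question on a 1-5 scale and reply ONLY with JSON like "
        '{"score": 4, "reason": "..."}.\n'
        f"Question: {question}\nAnswer: {answer}"
    )

def parse_verdict(raw, threshold=4):
    """Turn the judge model's JSON reply into a pass/fail test result."""
    verdict = json.loads(raw)
    return verdict["score"] >= threshold, verdict.get("reason", "")
```

You would send `build_judge_prompt(...)` to the judge model, then feed its raw reply through `parse_verdict` to get a pass/fail result your test suite can assert on.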
So this approach will be very helpful, mainly to reduce the high cost associated with human evaluation of the LLM and LLM applications. Using this approach we can do rapid testing, of course, pretty much like the automation testing we do in classical, current applications (I won't say classical yet, because we're still not there). Similarly, LLM evaluation using LLM-as-a-judge will also be more consistent, and it's also going to be a more scalable assessment. This question always comes up, right: the LLM will give you a different answer every single time, so how do we know that the answer is always what we are looking for?
So if you ask ChatGPT the question, maybe, if I go back: see, I'm asking the same question one more time over here, it's looking at different sources this time, and see, the answer is not exactly the same text over here. They are entirely different, right? So every single time, the large language model is not going to generate the same answer, but if you look, they are actually relevant answers, right? The relevancy of the answer is pretty much exactly the same. Even though they are different, I mean, even though the text is different, the relevancy is correct. Now, a large language model can actually understand this relevancy when you give it the texts; that's when LLM-as-a-judge comes into the picture. LLM-as-a-judge can actually do all these things for you. I mean, humans would have to do text-to-text comparison, or else assess and understand how the answers are being created or generated; instead, you can use large language models as a judge to perform all these operations.
That's when the testing tools I'm talking about come into the picture: for example, DeepEval and Ragas. These tools use the LLM-as-a-judge technique to perform all these operations for you. So now you don't necessarily have to worry about how you are going to do all these evaluations that you are seeing over here. You can use tools like DeepEval or Ragas, or there is also something called Hugging Face Evaluate; there are many different tools available to be harnessed, and new tools are evolving every single day. So you can use all these tools to make this happen, with LLM-as-a-judge, to be honest.
So if I go over here into DeepEval, which uses the LLM-as-a-judge technique, you can see there are many different metrics available in DeepEval. One of those metrics is G-Eval. G-Eval actually comes from one of the published papers, like the "Attention Is All You Need" paper. DeepEval employs G-Eval, and over here you can verify all these metrics in an even better fashion, like answer relevancy, faithfulness, or context relevance; everything you can implement with G-Eval and make this happen.
I know there are so many different theories I keep giving over here, like DeepEval or Ragas, but I'm going to stop right now because we're already running out of time, and I'm going to show you an example demonstration of how we can actually achieve all these operations. But I also told you that there is a surprise gift I'm going to give you all before you start to leave (I see the counts reducing). I'm actually going to give out a coupon code; just do it right away, before you close the session. Use the coupon code FREE_4, and you should get the course which is available on Udemy for free, and I'll tell you which course that is.
is that. So if I just go if I go search
is that. So if I just go if I go search for Udemy Kartik uh and if I go to my course over
here which is the uh test AI uh LLM application with the DAL RAS and more using Olama. So if you are now available
using Olama. So if you are now available online uh just go and enroll the course using uh the coupon code free for so just go
apply the coupon code free_4. This is the coupon code that you
free_4. This is the coupon code that you got to use. Hit apply you get 100%age discount. Uh this is only for you guys
discount. Uh this is only for you guys who have joined. So please go ahead and do that uh immediately. after this
session I'm going to disable that coupon really so they won't be able to enroll it so yeah that's for you guys thanks for thank you so much for joining and whatever that I'm talking about today in
this particular session it's all there over here so you see that the deep eval is covered which does all the evaluation ragas which does all those evaluation over here so so every single thing that
we are talking about including hugging face evaluate is available in this course so please go ahead and watch there. But if
you wanted to really learn even more about the other techniques, like AI agents and applications such as RAG or chatbots, you can also take this course, which is even better for getting an understanding of how you do it; there is testing-related material in this course as well. And if you want to go even deeper into the models themselves, fine-tuning models and also testing a fine-tuned model, then you can go with this course on understanding, testing, and fine-tuning an AI model with the Hugging Face library. That is an NLP library; NLP stands for natural language processing. So you can use this course to learn that as well.
And finally, this is a lightweight course, to be honest: GenAI in software automation testing. You can learn quite a lot of different techniques from here too. This course talks about how you run a large language model on your local machine using Ollama, how you do automated UI testing and manual testing, how you can use the GenAI APIs to write intelligent test automation code, visual comparison in test automation, and things of that nature. There are so many things available. I think I have also covered a lot more detail about MCP, the Model Context Protocol, with Playwright, so you can watch those details in this course. All right, that's about my courses, and now you have a free coupon code, so please go ahead and enroll to start learning; it's going to be really, really helpful. Well, as I said, that was just a glimpse of the courses. We have seen all the different evaluations so far, and now I'm going to quickly show you one of the evaluations using the tool DeepEval.
So here in DeepEval, all these details you are seeing, like section one, section two and so on, are from the course I just gave you for free, and it has all the details you are seeing here as well. What we are going to test this time is answer relevancy. I think somebody's asking for the coupon code, so let me paste that: free_4 is the coupon code. There we go. Is it not working? Why are people saying it's not working? It is working; we tried it just now and it's all working fine. Let me just see why it's not working for you. Apply coupon free_4... yeah, it's working, see, it's 100% off. It is working. Sorry for the interruption and the deviation.
So as you can see in this code, we are using the DeepEval library, and we are going to be using a class called AnswerRelevancyMetric. This is one of the metrics we are going to verify. As I told you, if I just press Ctrl+Space, there are many different metrics available: answer relevancy is one of them, but there is also the base conversational metric, the base metric, the faithfulness metric, the hallucination metric, the image helpfulness metric, and you can just keep naming them. There are so many different metrics available. We are using the answer relevancy metric here, and in this code what I'm doing is creating an LLMTestCase. This is a built-in class in DeepEval, and here I'm asking: who is the current president of the United States of America? That's the input I'm giving as the user, and the actual output I'm expecting from the large language model is Joe Biden. So I have hardcoded it right now.
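The on-screen demo uses DeepEval's `LLMTestCase` and `AnswerRelevancyMetric` driven by a local Ollama model. Since running that requires the library and a live model, here is a dependency-free sketch of the same shape: the class names mirror DeepEval's, but they are re-declared locally, and the judge is a hardcoded stub rather than a real LLM call:

```python
# Dependency-free mirror of the DeepEval demo flow. In the real demo,
# LLMTestCase and AnswerRelevancyMetric come from the deepeval package
# and the score comes from a judge LLM; here the judge is stubbed.
from dataclasses import dataclass, field

@dataclass
class LLMTestCase:
    input: str                   # the user's question
    actual_output: str           # in reality: your chatbot/RAG/agent's answer
    retrieval_context: list[str] = field(default_factory=list)  # from your vector store

@dataclass
class AnswerRelevancyMetric:
    threshold: float = 0.5       # minimum score for the test to pass

    def measure(self, case: LLMTestCase) -> float:
        # Stub: a real metric sends input, output, and context to a judge LLM.
        return 1.0

def evaluate(cases: list[LLMTestCase], metric: AnswerRelevancyMetric):
    """Score every test case and report pass/fail against the threshold."""
    results = []
    for case in cases:           # any number of test cases can be passed
        score = metric.measure(case)
        results.append((case.input, score, score >= metric.threshold))
    return results

case = LLMTestCase(
    input="Who is the current president of the United States of America?",
    actual_output="Joe Biden",   # hardcoded here, as in the demo
    retrieval_context=["Joe Biden serves as the current president of America."],
)
for question, score, passed in evaluate([case], AnswerRelevancyMetric(threshold=0.5)):
    print(question, score, passed)
```

With the stubbed score of 1.0 against a 0.5 threshold, the single test case passes, which is exactly the result the demo shows.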
In this demonstration I have just hardcoded it, but in reality this output is going to come from your large language model application: a chatbot, a RAG system, or any other AI agent for that matter. Those are what feed the actual output; I have hardcoded it purely for simplicity, because this is an introduction section. Next, in the retrieval context, I'm saying that Joe Biden serves as the current president of America. That retrieval context is the context given to the large language model so it can run the validation metrics. You see that these inputs are very important for the large-language-model-as-a-judge to make its decision: you are giving an input, you have given the actual output, and in the retrieval context you are giving some supporting details. In reality this retrieval context will come from a RAG system, a vector database or any other vector data store, and the actual output will come from a large language model; the only thing you actually supply is the input. Now I'm going to take this test case and run it on the evaluation dataset. This can be any number of test cases; I'm passing just one, but if you have hundreds of test cases, you can pass those as well. Now let me just run this code over here. It just executes it. And
there is a beauty of this particular test. U basically this entire test is
test. U basically this entire test is running for me with a local large language model as well, which is amazing. So this is running using my O
amazing. So this is running using my O Lama. So I'm not using like a third
Lama. So I'm not using like a third party um uh cloud provider of the large language model like uh like OpenAI or
Cloud or Gemini. I'm actually using my local machine to do all these operations. So on the course that you
operations. So on the course that you have got, you will be learning everything um from uh from running everything from your local large language model instead of purchasing the
the the API u as an additional cost as well which is amazing. So you will learn how powerful and potential you your machine already has got with the local large language model. It is amazing. If
you can able to run the deepseek model, I'll tell you there is nothing like that. It is amazing. All right. So now
that. It is amazing. All right. So now
I'm going to run an evaluation of this answer relevancy metric. So now I have created the test case and done the setup to create a dataset. Because we have created the answer relevancy metric, I'm going to execute it this time. If I just call evaluate here, look at that: it says you are running a DeepEval answer relevancy metric using a Qwen 2.5 model. This is the local large language model I just showed you. It has executed the test for me, and it tells me the answer relevancy score is 1.0, which is basically 100%. I have given a threshold of 0.5, so even a score of 0.5 would count the answer as correct, but here it has gone beyond that and given 1.0, 100% accuracy in answer relevancy, and we got the answer output here. So a 100% passing rate; the test is already passing for us, guys. This is insane. And
now if I go and click this link; I'm going to copy it. Okay. If I go to the Arc browser, which is this one, Confident AI, look at that: we're starting to get the responses over here. This is the run that just executed. If I go to the test cases, you see it's 100% passing in the result, and there is a test case here: it says the input is 'who is the current president of the United States of America', the actual output is 'Joe Biden', and the retrieval context we have got is 'Joe Biden serves as the current president of the USA'. This is a very, very super
simple example that I'm showing, but in the course you're going to learn, these things get much crazier: we do answer relevancy, faithfulness, contextual precision, and contextual relevancy, and all sorts of verifications. And if you look at the test cases there, they will be much bigger than expected. Look at that: see how much actual output is coming up and what the retrieval context is. If you, as a manual test engineer, were going to do this kind of testing, I'm telling you it is not really possible for us to match every single thing; we would need to know the context and understand the English text to judge the relevancy. But here, these things are taken care of for us by the large language model. Look at how many details have been generated from the large language model's output and how the verdicts are coming in from the LLM-as-a-judge. Finally it gives us the response, and you can also see there are verbose details shown over
here. I know I have talked about a lot of things today around how large language models can be tested, but these are the things you really need to understand while testing a large language model. So that's it, guys. I have spoken quite a lot today; sorry about the glitch at the beginning, but I'm telling you, it is really worth learning all these concepts from the complete basics. AI is still in its infancy; it has not reached artificial superintelligence yet. But as AI moves toward artificial superintelligence or artificial general intelligence in another five to six years, maybe ten, that's what they claim, and they are already pretty close, if companies do achieve that level, we should have a basic understanding of how these large language models work and how things work behind the scenes. This knowledge will help us do better testing, or at least talk better about the things we use every single day, even on our phones. So that's the way you can learn these techniques to become an AI QA engineer. Having this knowledge is going to be very, very useful when you apply for a job, especially as an AI QA engineer, and I'm sure it will be very, very helpful. Once again, thank you so much for joining today's session. I'm
sure you guys might have liked all the discussion we've had, but I'll quickly check whether there are any questions. Do you guys have any questions? I can take a couple of questions right now before we wind up this session. I haven't received any questions, though, and I'm really having a hard time looking at my monitor. Any questions, guys, so far? I see MC; MC, can you just send me an email at kartik@techgeek.co.in? I will send you the coupon code. Looks like you are the only person who somehow couldn't apply it. All right. Awesome. So it looks like there are no questions, which makes me proud again, because I feel I have covered things clearly. But
anyways, thank you so much for joining this session and for your time; today is Friday already and I have to go and sleep. But thank you so much for joining once again. I'm sure exciting things are coming up pretty soon, and I'm very excited to share all these details in our upcoming YouTube live series. Thank you so much for making it today. I'll catch you in the next one. Until then, have a great weekend. Thank you.