Introduction to AI in Software Testing - Chapter 1 (Path to AI QA Engineering) 🤖
By Execute Automation
## Summary
## Key takeaways

- **2 Years to 2 Months Path**: As a test engineer I spent more than a couple of years learning all these AI-related tools and techniques; in this video I'm going to cover all these details as a crux, and we'll see how you can get to the point where I spent 2 years by spending around 2 or 3 months. [02:44], [03:04]
- **Skip Deep Math, Focus on Applications**: You can study large language models, designing transformer models and all those things, from the complete ground up, or you can just learn the applications of artificial intelligence and large language models. Today we are going to talk about the second half, the applications of large language models. [01:52], [01:58]
- **Transformer Breakthrough Paper**: All the models we are seeing today, like the GPT models, the Claude 3.7 Sonnet model, or the Gemini models, are also called transformer models, because they build on a paper called "Attention Is All You Need." [11:15], [11:37]
- **Pre-Transformer Models Were Specialized**: Even before that, large language models existed, but each was specific to a certain operation: for example grammar correction, translation, named entity recognition, or sentiment analysis. They were all doing things in a more isolated fashion. [12:46], [13:08]
- **AI Agents Bridge the Knowledge Gap**: The AI agent enables the large language model with external world knowledge. It acts like a bridge for the large language model, which can go and search online for the details and get you the response. [16:51], [17:08]
- **LLM-as-Judge Automates Evaluation**: This is an evaluation methodology where an LLM is employed to assess the output of other LLM-based applications, such as a chatbot, an AI agent, or a RAG system that you are building. Here you're not doing manual testing; you are testing your large language model application's output with a large language model itself. [25:51], [26:05]
## Topics Covered
- Skip Deep AI Theory, Master Applications
- Transformers Revolutionized LLMs
- AI Agents Bridge LLM Knowledge Gaps
- Use LLMs as Judges for Evaluation
## Full Transcript
So, welcome to the first lecture of the live stream of the Path to AI QA Engineer. You might have seen me just making this action so far, but you're now hearing me. So let's talk about this: the path to AI QA engineering. To get onto this path, as a test engineer, I spent more than a couple of years learning about all these AI-related tools and techniques, and about what is happening across the industry right now. It was not very straightforward for me to get to this point where I'm going to talk about AI, because AI itself is not a small thing, as you all know.
It involves quite a lot of different learnings. For example, I went down a path where I was learning algebra initially, and then linear algebra, and then I started learning more about neural networks, and then I went nowhere. I realized that this was not going to lead me to where the industry is going, because that particular path was not going to get me to a point where I could at least compete with the market today.
So, to get onto this path to AI QA engineering, there are many different paths you can take. You can go and study large language models, designing transformer models and all those things, from the complete ground up, or you can just learn the applications of artificial intelligence and large language models. Today we are going to talk about the second half, the applications of large language models, but not how you build a large language model and then test it. That's not the idea of this session. The idea of this session is to give you direction on how you can get into testing artificial intelligence applications and large language models. That is the whole idea of this session, and we are going to talk about the details of what this path is.
But most importantly, to talk about the details of becoming an AI QA engineer: as I told you, I spent about 2 years studying all the different details, and in this video I'm going to cover them as a crux. We'll see how you can easily get to the point where I spent 2 years, but now you can spend around 2 or 3 months to become a better QA engineer; you can tag it as AI QA engineer, it doesn't really matter, but you can do that. Let me see if everybody is really happy over here so far. Oh yeah, perfect. Awesome. So, we all know about the current QA engineer's role, so I'm not going to talk about it much. The current QA engineer's role has become even more of a burden, because right now QA engineers not only need to do manual testing, they also need to do automation testing. They need to learn various programming languages based on the tool they are using, and based on the company they go to, they have to again use different tools and techniques to make that happen. We know about that. That's the current state of the QA engineer.
So I'm not going to talk about that; I'm going to talk about this role, the AI QA engineering role. You can see this is evolving right now, and I have seen many job postings talking about QA engineer roles with AI in their tag. If you read through this one a bit more closely, it says that AI QA solution development will involve the design, development, and implementation of advanced AI-powered QA automation solutions for testing AI-based applications, models, and systems, and the use of innovative techniques to identify performance bottlenecks, inconsistencies, and model biases. So you can see what they're really talking about: they want us to create an advanced AI-powered QA automation solution, and these kinds of job descriptions are going to keep coming into the market. This is just one example. If you look at these kinds of job descriptions, they don't really give you specific guidelines on how you're going to achieve this part; they just say you need to implement an advanced AI-powered QA automation solution. But how are we going to do these things? Are there already-available patterns, or proven tools, like in Playwright, or Selenium with C#, or maybe in Cypress, where we know there is a tool which is also a framework and we can start from there? Over here, AI itself is a completely new field altogether, and now they're suddenly asking us to do these kinds of operations. It's going to be really crazy how a QA engineer can suddenly do this while the field itself is quite new. They also talk about test automation and framework building: build and maintain automated test frameworks for AI systems using a wide variety of automation tools. I don't know if any such tools even exist until you go and learn them. Similarly, AI model testing: you need to perform functional and non-functional testing of an AI model to assess accuracy, bias, scalability, robustness, and all those things. You also need to adapt to AI innovations with the latest advancements and research.
Actively explore and adopt emerging AI techniques, like GPT models, neural networks, and reinforcement learning, to continuously improve the quality assurance process. I'm not going to go through the entire job description, but you can see there are so many things that companies are starting to expect, and this is one such example that we are going to see more frequently and prevalently in the upcoming days, or maybe months. If you don't do it today, you'll only regret that you did not learn all these techniques a couple of years before. And this might have happened to you already, right? You might have been working in manual testing, and suddenly you saw all the posts about automation testing, and then everybody started jumping into Selenium, Cypress, Playwright. You might have already upgraded to Playwright and Cypress, and now you're very comfortable; suddenly this AI came in, and now we have to start learning AI and be masters in it, while this whole idea of AI itself is quite new. Becoming a master is not very straightforward or easy. So how do we get to this point while companies are expecting so many things? Maybe this job description is not a 2025 job description; it could probably be a 2027 job description, but you can see this is already happening. So how do we get onto this path, and what should a QA engineer learn right now to become an AI QA engineer?
So, the AI QA engineering role is essentially the same as the QA engineer role, but the work is more exciting. The way I see it, it's more exciting because we are going to learn a lot of new things this time, and you don't necessarily have to do a lot of programming like in Playwright or Selenium or things of that nature, because here the programming is almost straightforward: you're going to be doing a lot of evaluation, and you are going to learn quite a lot of different details and techniques for testing a new kind of system altogether. Then you will see how you can fuse all your learning so that this entire journey works as one single atomic unit. When I say atomic unit, I mean that all the knowledge, and the time you have put into learning this AI stuff, is going to feel as easy as using ChatGPT. So I'm telling you: in order to become an AI QA engineer, you need to have a basic working knowledge of generative AI tools and APIs. That's the first part. By basic working knowledge, I mean, let's say, ChatGPT, for that matter: that's one of the generative AI tools, and we all know that. And the APIs: for instance the Gemini API, or you can take OpenAI's API, or the Claude API. We can use these APIs to send some data in, get the response from the model, and see if we can format the response into JSON so that we get the response in the shape we are looking for. Again, prompt engineering comes into the picture there: how you send a request and how you get a response back from the large language model, from the APIs, and also from ChatGPT-style tools. You know how things work there.
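As a rough illustration of that API part, here is a minimal sketch of building such a request and parsing a JSON reply. The model name and the `response_format` field are assumptions modeled on OpenAI-style chat-completion APIs, not a definitive implementation; in a real test you would POST this body to your provider with your API key.

```python
import json

def build_chat_request(prompt: str, model: str = "gpt-4o-mini") -> dict:
    """Build a chat-completion style request body that asks the model for JSON."""
    return {
        "model": model,  # placeholder model name; use whichever API/model you have
        "messages": [
            {"role": "system",
             "content": "You are a QA assistant. Reply ONLY with valid JSON."},
            {"role": "user", "content": prompt},
        ],
        # Some chat APIs accept a response_format hint to force JSON output.
        "response_format": {"type": "json_object"},
    }

def parse_model_reply(raw_reply: str) -> dict:
    """Parse the model's text reply as JSON, failing loudly if it is not JSON."""
    return json.loads(raw_reply)
```

The point is simply that, as a tester, you treat the model like any other HTTP service: structured request in, parseable response out.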
But now the next question naturally comes up: how are large language models doing all these things? You just give an input; you ask it to go and identify the elements on a page, you give it the source code of an HTML page, put it into ChatGPT, and ChatGPT goes and parses the HTML file for you and gives you a response saying these are the different locators you can choose from this particular page. Then your question naturally becomes: how is this really happening? How is the magic of the large language model getting the IDs and locators for me automatically? That's when you start learning about the next stage, which is the basics of the transformer models.
All the models we are seeing today, like the GPT models, the Claude 3.7 Sonnet model, or the Gemini models, are also called transformer models, because there is a paper called "Attention Is All You Need." You can go and search Google for "Attention Is All You Need," and you will get to know how this transformer model has shaped things. So just go and search for "Attention Is All You Need."
Look at that: that's a PDF file. If you go to this particular paper, you can see this is one of the breakthroughs in the history of large language models. It made the large language model more aware, paying more attention to the details, and models started to learn things from there on. I'm giving you this at a very high level, but you have to go deep to understand what this really does. You can see this is one of the breakthroughs, and this is the place where the transformer model came to life. All the models you are seeing, like the GPT model for example, started becoming general-purpose models only after the innovation of "Attention Is All You Need." Even before that, large language models existed, but they were specific to certain operations: for example, grammar correction. There were models which did just translation, and models just for specific operations like named entity recognition or sentiment analysis. They were all doing things in a more isolated fashion. Every model did something completely different, and every model was very specific to that particular operation. But after the "Attention Is All You Need" paper, once the transformer model came in, all these operations were done by one single model, and the model started to learn even better. The way you train it and the way it learns, it becomes better and better; that's where this paper comes in. If you have time, just go ahead and read it. It's quite amazing.
So that's where the basics of transformer models come in, and you need to learn about the transformer model. It's quite an amazing journey, and I'm sure it's going to take a long time, but it's worth understanding.
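To make "attention" a little less abstract, here is a toy, dependency-free sketch of the paper's core equation, scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. Real transformers add learned projections, multiple heads, positional encodings, and much more; this is only the kernel of the idea.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Toy scaled dot-product attention.
    Each argument is a list of equal-length vectors (lists of floats)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query "attends" to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output = attention-weighted mix of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs
```

A query that points in the same direction as a key gets a higher weight, so the output mixes in more of that key's value: that is the "attention to the details" the paper enabled at scale.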
Finally, you also need to have a good understanding of AI agents, as well as the craziness around AI agents, really. One example is the MCP server, which is currently driving the internet crazy: the Model Context Protocol from Claude's maker, Anthropic. It's good because you are now giving the large language model the capability to go and talk to the external world. I'll give you one example of an AI agent which I really wanted to show you. So you see, this is Ollama, which I'm running to run a local large language model.
If I hit enter, this is the Qwen model I'm currently running, and there we go, it has come up over here. I'm going to ask one of the questions I always ask in my Udemy courses, and I'm going to ask the same question over here: can you tell me who is the president of the USA in 2025? So what do you expect this large language model to answer right now? Do you think it's going to give me the answer Donald Trump? Well, the answer is no. It will tell you something like: as of my last update, I don't have specific information about the US presidential election or leadership in 2025; predicting future political figures is challenging and not within my capabilities, and things of that nature. That's what it's going to tell you, right? Because this is the cutoff of the large language model, the point up to which it has information. The exact same thing happened when ChatGPT first came in. You remember, back in 2021, every time we asked a question like this, it used to answer that it was not trained on the information we were asking about, because its knowledge cutoff was, I think, 2021 or something like that. So that's how it was. That's the shortcoming of the large language model.
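The local-model demo above can also be scripted. This is a sketch against Ollama's local HTTP API, assuming the default `localhost:11434` endpoint and a model you have already pulled; the model name `"qwen"` here is just a placeholder for whatever `ollama pull` gave you.

```python
import json
import urllib.request

def ollama_payload(model: str, prompt: str) -> bytes:
    """Request body for Ollama's local /api/generate endpoint."""
    return json.dumps({
        "model": model,    # e.g. whatever model you pulled with `ollama pull`
        "prompt": prompt,
        "stream": False,   # ask for one complete JSON response, not a stream
    }).encode("utf-8")

def ask_local_model(prompt: str, model: str = "qwen") -> str:
    """POST the prompt to a locally running Ollama server and return the text."""
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=ollama_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Asking this function the 2025-president question would reproduce the knowledge-cutoff answer you just saw in the terminal.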
So how can this large language model now be more aware of current trends and affairs? Well, you would need to train the model with all this information, like presidential election results, or, if there is any new technology or innovation in, say, fusion technology, you would need to go and fuse that into the large language model, which is going to be cumbersome; you can't give all the world's knowledge to the large language model all the time. That's when the AI agent comes in. The AI agent enables the large language model with external world knowledge. It acts like a bridge for the large language model, which can go and search online for the details and get you the response. Let me just open a simple example.
If I go to ChatGPT and say, "Who is the president of the USA in 2025?", you see that now it is searching the web, and it gives you the answer: the president of the USA is Donald Trump. This is happening because it is now searching the internet and then giving you the response back. The LLM by itself doesn't have that knowledge, but it actually went and searched online to get you the response. That's when the AI agent comes in.
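In code, the "bridge" idea boils down to a loop: ask the model, detect that it hit its knowledge cutoff, call an external tool, and return the tool's result. A deliberately tiny sketch; the `web_search` tool and the cutoff phrases below are hypothetical placeholders, not any real framework's API.

```python
def answer_with_agent(question, model_reply, tools):
    """Minimal agent sketch: if the model's reply signals a knowledge cutoff,
    bridge to the external world through a tool (here, a web-search function).
    `model_reply` stands in for a real LLM call; `tools` maps tool names to
    plain Python functions."""
    cutoff_signals = ("as of my last update", "i don't have", "cannot predict")
    if any(sig in model_reply.lower() for sig in cutoff_signals):
        # Knowledge gap detected: let the agent fetch fresh information.
        return tools["web_search"](question)
    return model_reply
```

Real agent frameworks let the model itself decide which tool to call and feed the tool output back for a final answer, but the bridging role is exactly this.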
So this is one application of the AI agent. This is something you need to know: a good understanding of the AI agent, how it works, how you can actually make use of it, and how applications are built using AI agents, like how large language model applications are built. Then you also need a good understanding of the concepts of RAG and chatbots, etc. I'm going to talk about RAG and chatbots later; just give me a second. But RAG is again another way for you to give knowledge to the large language model, by giving it some local data: for example, your company's data, details, or documentation, which you feed or fuse into the large language model using RAG. And chatbots are pretty much ChatGPT; we all know about that, so I'm just going to leave it as it is. These are things that we need to know as well.
well. And finally after we learn all these concepts we then need to learn about the
tools which is going to be helping us to uh evaluate an uh AI application. So that's when our QA
application. So that's when our QA engineers hats comes in. Right. We are
now going to learn about what are the different types of evaluations available and what are the different techniques available to evaluate a large language
model. So this is the place I mean after
model. So this is the place I mean after you learn all these information we are then going to focus on testing. So here
the testing is not as straightforward as testing an application in user interface or API or mobile app. You need to learn these concepts
first and then you need to learn about the testing of the large language model. And again testing is not like
model. And again testing is not like you're going to be selecting an element or doing things or maybe you're going to be doing uh testing uh of a large language model like uh just opening an
application and testing it. That's not
how the testing is going to look like over here. We always say evaluating
over here. We always say evaluating here. So basically while I say
here. So basically while I say evaluating we are going to actually uh uh do a lot of different metrices to ensure that the large language model
application works as expected. That's
the type of evaluation that we are going to be doing over here. Well as that said uh let's first talk about the evaluation of large language model. So we are going
to talk about this a bit and then we'll see an practical example and then we'll be done for today.
So again, why should we do large language model evaluation? We already touched on that, at least in here. Evaluation of the large language model is mainly done to ensure that the LLM actually returns the answer we are looking for; that is when evaluation of the large language model comes in. Once again, LLM evaluation is not as straightforward as testing usual software. There are a lot of different techniques involved in testing a large language model. For example, we need to evaluate against different sets of metrics to test an application built with an LLM, an application built with an AI agent, or one built on a fine-tuned large language model. There are different ways we can do the evaluation for applications in each of these categories.
So there are different types of metrics available. One is static evaluation metrics; another is LLM metrics. The static evaluation metrics, which we also call traditional metrics (as opposed to the non-traditional metrics), are these: we verify the large language model using exact match, BLEU, ROUGE, or F1 score, something like that. Then there are what are called the LLM metrics; this is where we verify the large language model for answer relevancy, prompt alignment, correctness, hallucination, and contextual relevancy. These are the things we actually do to verify a large language model. Just read through this while I take a sip of water, because I'm starting to get a cough.
Excuse me. So you can see, if we take the large language model over here: for answer relevancy, one example is, let's say you ask a question to your large language model and expect an answer, and it is relevant to what you are looking for. That's when answer relevancy comes in: you need to verify that the output of the LLM is actually in the manner you are expecting. Similarly, hallucination: you should always expect the LLM to give you the answer without hallucinating, without the imaginary answers an LLM can give. It should give the answer you're really expecting; it should not give you a fake answer, for that matter. That is what hallucination is. And similarly, contextual relevance is another very important thing. And there are many different metrics available.
Believe me, these are the different kinds of metrics we have, as you can see over here. And again, these are just some of the evaluation metrics; there are even more, depending on the tool you use to do the evaluation. We need to verify all these metrics to ensure that the LLM actually gives the response the way we are expecting. To do all these evaluations, I know that if you're doing it for the very first time, it's going to be a bit hard or tricky. But once you start learning about all these kinds of techniques and tools available in the large language model and AI application space, these things are going to get easier and easier for you to start working with.
So, as that said, with all this knowledge over here (I know there are so many things I have covered), one of the most important things: in order to do the evaluation of things like answer relevancy, faithfulness, and bias detection, if you do it manually, I'm telling you, it will take you ages to complete. That's why we have a concept called large language model as a judge. So what is this LLM-as-a-judge, really? It is an evaluation methodology where an LLM is employed to assess the output of other LLM-based applications, such as a chatbot, an AI agent, or a RAG system that you are building. So here you're not going to do manual testing; you are basically going to test the output of your large language model application with a large language model itself.
Let's say you have an application which is built using this model, the Qwen model. Say this is the least powerful model, built by Alibaba (I mean, they are very powerful now, but let's assume this is a very weak AI model), and it has not been trained on some data which other models know. Now if you want to test an application built using this model, if you want to evaluate the output of an application built on top of it, then you can use another, more powerful model, and use that as a judge to see whether the answer is relevant to the question you have given, and whether it is producing the correct answer or not. So you can now use a large language model as a judge to perform this operation. Think about this: now you are using an LLM's help to get all these metrics done over here, which is amazing, right? That's one way you can actually do things even better while you do the testing.
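In its simplest form, LLM-as-a-judge is just a carefully worded prompt to a stronger model plus a parser for its verdict. Here is a hedged sketch; the prompt wording, the JSON schema, and the 1-to-5 scale are my own illustration, while tools like DeepEval and Ragas do this far more rigorously.

```python
import json

def build_judge_prompt(question, answer):
    """Prompt a (stronger) judge model to grade another model's answer."""
    return (
        "You are an impartial judge. Rate the answer's relevancy to the "
        "question on a 1-5 scale and reply ONLY with JSON like "
        '{"score": 4, "reason": "..."}.\n'
        f"Question: {question}\nAnswer: {answer}"
    )

def parse_verdict(raw, threshold=4):
    """Turn the judge model's JSON reply into a pass/fail test result."""
    verdict = json.loads(raw)
    return verdict["score"] >= threshold, verdict.get("reason", "")
```

You would send `build_judge_prompt(...)` to the judge model, then feed its raw reply through `parse_verdict` to get a pass/fail result your test suite can assert on.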
So this approach will be very helpful, mainly to reduce the high cost associated with human evaluation of the LLM and LLM applications. Using this approach we can do rapid testing, of course, pretty much like the automation testing we do in classical, current applications (I won't say classical yet, because we're still not there). Similarly, LLM evaluation using LLM-as-a-judge will also be more consistent, and it's also going to be a more scalable assessment. This question always comes up, right: the LLM will give you a different answer every single time, so how do we know that the answer is always what we are looking for?
So if you ask ChatGPT the question, maybe, if I go back: see, I'm asking the same question one more time over here, it's looking at different sources this time, and see, the answer is not exactly the same text over here. They are entirely different, right? So every single time, the large language model is not going to generate the same answer, but if you look, they are actually relevant answers, right? The relevancy of the answer is pretty much exactly the same. Even though they are different, I mean, even though the text is different, the relevancy is correct. Now, a large language model can actually understand this relevancy when you give it the texts; that's when LLM-as-a-judge comes into the picture. LLM-as-a-judge can actually do all these things for you. I mean, humans would have to do text-to-text comparison, or else assess and understand how the answers are being created or generated; instead, you can use large language models as a judge to perform all these operations.
That's when the testing tools I'm talking about come into the picture: for example, DeepEval and Ragas. These tools use the LLM-as-a-judge technique to perform all these operations for you. So now you don't necessarily have to worry about how you are going to do all these evaluations that you are seeing over here. You can use tools like DeepEval or Ragas, or there is also something called Hugging Face Evaluate; there are many different tools available to be harnessed, and new tools are evolving every single day. So you can use all these tools to make this happen, with LLM-as-a-judge, to be honest.
So if I go over here into DeepEval, which uses the LLM-as-a-judge technique, you can see there are many different metrics available in DeepEval. One of those metrics is G-Eval. G-Eval actually comes from one of the published papers, like the "Attention Is All You Need" paper. DeepEval employs G-Eval, and over here you can verify all these metrics in an even better fashion, like answer relevancy, faithfulness, or context relevance; everything you can implement with G-Eval and make this happen.
I know there are so many different theories I keep giving over here, like DeepEval or Ragas, but I'm going to stop right now because we're already running out of time, and I'm going to show you an example demonstration of how we can actually achieve all these operations. But I also told you that there is a surprise gift I'm going to give you all before you start to leave (I see the counts reducing). I'm actually going to give out a coupon code; just do it right away, before you close the session. Use the coupon code FREE_4, and you should get the course which is available on Udemy for free, and I'll tell you which course that is.
is that. So if I just go if I go search
is that. So if I just go if I go search for Udemy Kartik uh and if I go to my course over
here which is the uh test AI uh LLM application with the DAL RAS and more using Olama. So if you are now available
using Olama. So if you are now available online uh just go and enroll the course using uh the coupon code free for so just go
apply the coupon code free_4. This is the coupon code that you
free_4. This is the coupon code that you got to use. Hit apply you get 100%age discount. Uh this is only for you guys
discount. Uh this is only for you guys who have joined. So please go ahead and do that uh immediately. after this
session I'm going to disable that coupon really so they won't be able to enroll it so yeah that's for you guys thanks for thank you so much for joining and whatever that I'm talking about today in
this particular session it's all there over here so you see that the deep eval is covered which does all the evaluation ragas which does all those evaluation over here so so every single thing that
we are talking about including hugging face evaluate is available in this course so please go ahead and watch there. But if
you wanted to really learn even more about the other techniques, like AI agents and applications such as RAG or chatbots, you can also take this course, which is even better for getting an understanding of how you do it; there is testing-related material in this course as well. And if you want to go even deeper into the models themselves, fine-tuning models and also testing a fine-tuned model, then you can go with this course on understanding, testing, and fine-tuning an AI model with the Hugging Face library. That is an NLP library; NLP stands for natural language processing. So you can use this course to learn that as well.
And finally, this is a lightweight course, to be honest: GenAI in software automation testing. You can learn quite a lot of different techniques from here too. This course talks about how you run a large language model on your local machine using Ollama, how you do automated UI testing and manual testing, how you can use the GenAI APIs to write intelligent test automation code, visual comparison in test automation, and things of that nature. There are so many things available. I think I have also covered a lot more detail about MCP, the Model Context Protocol, with Playwright, so you can watch those details in this course. All right, that's about my courses, and now you have a free coupon code, so please go ahead and enroll to start learning; it's going to be really, really helpful. Well, as I said, that was just a glimpse of the courses. We have seen all the different evaluations so far, and now I'm going to quickly show you one of the evaluations using the tool DeepEval.
So here in DeepEval, all these details you are seeing, like section one, section two and so on, are from the course I just gave you for free, and it has all the details you are seeing here as well. What we are going to test this time is answer relevancy. I think somebody's asking for the coupon code, so let me paste that: free_4 is the coupon code. There we go. Is it not working? Why are people saying it's not working? It is working; we tried it just now and it's all working fine. Let me just see why it's not working for you. Apply coupon free_4... yeah, it's working, see, it's 100% off. It is working. Sorry for the interruption and the deviation.
So as you can see in this code, we are using the DeepEval library, and we are going to be using a class called AnswerRelevancyMetric. This is one of the metrics we are going to verify. As I told you, if I just press Ctrl+Space, there are many different metrics available: answer relevancy is one of them, but there is also the base conversational metric, the base metric, the faithfulness metric, the hallucination metric, the image helpfulness metric, and you can just keep naming them. There are so many different metrics available. We are using the answer relevancy metric here, and in this code what I'm doing is creating an LLMTestCase. This is a built-in class in DeepEval, and here I'm asking: who is the current president of the United States of America? That's the input I'm giving as the user, and the actual output I'm expecting from the large language model is Joe Biden. So I have hardcoded it right now.
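The on-screen demo uses DeepEval's `LLMTestCase` and `AnswerRelevancyMetric` driven by a local Ollama model. Since running that requires the library and a live model, here is a dependency-free sketch of the same shape: the class names mirror DeepEval's, but they are re-declared locally, and the judge is a hardcoded stub rather than a real LLM call:

```python
# Dependency-free mirror of the DeepEval demo flow. In the real demo,
# LLMTestCase and AnswerRelevancyMetric come from the deepeval package
# and the score comes from a judge LLM; here the judge is stubbed.
from dataclasses import dataclass, field

@dataclass
class LLMTestCase:
    input: str                   # the user's question
    actual_output: str           # in reality: your chatbot/RAG/agent's answer
    retrieval_context: list[str] = field(default_factory=list)  # from your vector store

@dataclass
class AnswerRelevancyMetric:
    threshold: float = 0.5       # minimum score for the test to pass

    def measure(self, case: LLMTestCase) -> float:
        # Stub: a real metric sends input, output, and context to a judge LLM.
        return 1.0

def evaluate(cases: list[LLMTestCase], metric: AnswerRelevancyMetric):
    """Score every test case and report pass/fail against the threshold."""
    results = []
    for case in cases:           # any number of test cases can be passed
        score = metric.measure(case)
        results.append((case.input, score, score >= metric.threshold))
    return results

case = LLMTestCase(
    input="Who is the current president of the United States of America?",
    actual_output="Joe Biden",   # hardcoded here, as in the demo
    retrieval_context=["Joe Biden serves as the current president of America."],
)
for question, score, passed in evaluate([case], AnswerRelevancyMetric(threshold=0.5)):
    print(question, score, passed)
```

With the stubbed score of 1.0 against a 0.5 threshold, the single test case passes, which is exactly the result the demo shows.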
In this demonstration I have just hardcoded it, but in reality this output is going to come from your large language model application: a chatbot, a RAG system, or any other AI agent for that matter. Those are what feed the actual output; I have hardcoded it purely for simplicity, because this is an introduction section. Next, in the retrieval context, I'm saying that Joe Biden serves as the current president of America. That retrieval context is the context given to the large language model so it can run the validation metrics. You see that these inputs are very important for the large-language-model-as-a-judge to make its decision: you are giving an input, you have given the actual output, and in the retrieval context you are giving some supporting details. In reality this retrieval context will come from a RAG system, a vector database or any other vector data store, and the actual output will come from a large language model; the only thing you actually supply is the input. Now I'm going to take this test case and run it on the evaluation dataset. This can be any number of test cases; I'm passing just one, but if you have hundreds of test cases, you can pass those as well. Now let me just run this code over here. It just executes it. And
there is a beauty of this particular test. U basically this entire test is
test. U basically this entire test is running for me with a local large language model as well, which is amazing. So this is running using my O
amazing. So this is running using my O Lama. So I'm not using like a third
Lama. So I'm not using like a third party um uh cloud provider of the large language model like uh like OpenAI or
Cloud or Gemini. I'm actually using my local machine to do all these operations. So on the course that you
operations. So on the course that you have got, you will be learning everything um from uh from running everything from your local large language model instead of purchasing the
the the API u as an additional cost as well which is amazing. So you will learn how powerful and potential you your machine already has got with the local large language model. It is amazing. If
you can able to run the deepseek model, I'll tell you there is nothing like that. It is amazing. All right. So now
that. It is amazing. All right. So now
I'm going to run an evaluation of this answer relevancy metric. So now I have created the test case and done the setup to create a dataset. Because we have created the answer relevancy metric, I'm going to execute it this time. If I just call evaluate here, look at that: it says you are running a DeepEval answer relevancy metric using a Qwen 2.5 model. This is the local large language model I just showed you. It has executed the test for me, and it tells me the answer relevancy score is 1.0, which is basically 100%. I have given a threshold of 0.5, so even a score of 0.5 would count the answer as correct, but here it has gone beyond that and given 1.0, 100% accuracy in answer relevancy, and we got the answer output here. So a 100% passing rate; the test is already passing for us, guys. This is insane. And
now if I go and click this link; I'm going to copy it. Okay. If I go to the Arc browser, which is this one, Confident AI, look at that: we're starting to get the responses over here. This is the run that just executed. If I go to the test cases, you see it's 100% passing in the result, and there is a test case here: it says the input is 'who is the current president of the United States of America', the actual output is 'Joe Biden', and the retrieval context we have got is 'Joe Biden serves as the current president of the USA'. This is a very, very super
simple example that I'm showing, but in the course you're going to learn, these things get much crazier: we do answer relevancy, faithfulness, contextual precision, and contextual relevancy, and all sorts of verifications. And if you look at the test cases there, they will be much bigger than expected. Look at that: see how much actual output is coming up and what the retrieval context is. If you, as a manual test engineer, were going to do this kind of testing, I'm telling you it is not really possible for us to match every single thing; we would need to know the context and understand the English text to judge the relevancy. But here, these things are taken care of for us by the large language model. Look at how many details have been generated from the large language model's output and how the verdicts are coming in from the LLM-as-a-judge. Finally it gives us the response, and you can also see there are verbose details shown over
here. I know I have talked about a lot of things today around how large language models can be tested, but these are the things you really need to understand while testing a large language model. So that's it, guys. I have spoken quite a lot today; sorry about the glitch at the beginning, but I'm telling you, it is really worth learning all these concepts from the complete basics. AI is still in its infancy; it has not reached artificial superintelligence yet. But as AI moves toward artificial superintelligence or artificial general intelligence in another five to six years, maybe ten, that's what they claim, and they are already pretty close, if companies do achieve that level, we should have a basic understanding of how these large language models work and how things work behind the scenes. This knowledge will help us do better testing, or at least talk better about the things we use every single day, even on our phones. So that's the way you can learn these techniques to become an AI QA engineer. Having this knowledge is going to be very, very useful when you apply for a job, especially as an AI QA engineer, and I'm sure it will be very, very helpful. Once again, thank you so much for joining today's session. I'm
sure you guys might have liked all the discussion we've had, but I'll quickly check whether there are any questions. Do you guys have any questions? I can take a couple of questions right now before we wind up this session. I haven't received any questions, though, and I'm really having a hard time looking at my monitor. Any questions, guys, so far? I see MC; MC, can you just send me an email at kartik@techgeek.co.in? I will send you the coupon code. Looks like you are the only person who somehow couldn't apply it. All right. Awesome. So it looks like there are no questions, which makes me proud again, because I feel I have covered things clearly. But
anyways, thank you so much for joining this session and for your time; today is Friday already and I have to go and sleep. But thank you so much for joining once again. I'm sure exciting things are coming up pretty soon, and I'm very excited to share all these details in our upcoming YouTube live series. Thank you so much for making it today. I'll catch you in the next one. Until then, have a great weekend. Thank you.