
Synthetic Data Generation using LLM: Crash Course for Beginners

By AI Anytime

Summary

## Key takeaways

- **Synthetic Data Solves Complex Problems**: Synthetic data generation can help solve complex problems like climate change and healthcare. Most people want to generate synthetic data in their own industries and domains. [00:26], [00:35]
- **Microsoft Phi Uses Synthetic Data**: Microsoft uses synthetic data to train their Phi family models because it gives better data quality. Recent LLMs like Phi rely on synthetic data for training. [00:47], [00:59]
- **Distillation vs Self-Improvement Types**: Distillation uses a larger LLM to teach a smaller one, producing data closer to human-created. Self-improvement creates a dataset with the very model you want to improve, used when no larger teacher exists, e.g. for a 70B model rather than a 7B one. [09:32], [11:02]
- **Distillation Techniques Hierarchy**: Distillation techniques include self-instruct for basic data, Evol-Instruct for more challenging tasks, LAB for diverse chatbot data, task-specific ones like OpenMathInstruct, and knowledge-specific QA. These can be achieved using frameworks like Distilabel and Prometheus. [12:55], [14:16]
- **Evol-Instruct Increases Prompt Difficulty**: Evol-Instruct separates the timing of prompt and answer generation and adds a process that slightly increases prompt difficulty; by repeating it, simple tasks like '1+1' evolve into complex ones like a proof framed around the Goldbach conjecture. This addresses self-instruct's lack of complexity for fine-tuning LLMs. [19:08], [20:12]
- **LAB Enhances Task Diversity**: LAB creates diverse datasets via hierarchical task classification (e.g. knowledge tuning and skill tuning), classifying tasks in advance and limiting examples to one classification to reduce bias. It can be layered on self-instruct by modifying the example-selection logic. [22:43], [25:10]

Topics Covered

  • Synthetic Data Solves Climate, Healthcare Crises
  • Distillation Mimics Human Data Quality
  • Evol-Instruct Escalates Prompt Complexity
  • LAB Ensures Diverse Chatbot Training Data
  • Distilabel Generates DPO Datasets Efficiently

Full Transcript

hello everyone, welcome to the AI Anytime channel. in this video we're going to explore synthetic data generation, one of the hottest topics right now in the GenAI

ecosystem, and most of the people I'm talking to all want to generate synthetic data in their own industries, domains, etc., and it's really fascinating

as well, guys, because synthetic data generation can help you solve some of the complex problems that exist currently in this world, okay. for example

climate change, you know, and also in healthcare and some other industries; we can solve some big and complex problems with synthetic data generation. if you look at some of the LLMs as well

you know, in recent times, like for example Phi-3, or even the entire Phi family by Microsoft: Microsoft always says that they have better data quality for the Phi models

because they also use synthetic data generation they use synthetic data to train these llms and of course there are multiple ways to generate synthetic data

and we are going to explore each and everything, guys, in this video. this is going to be a lengthy video, so if you are someone who does not want to spend at least an hour, this video is not for you. this video is for

the people who want to understand everything about synthetic data generation. we are going to start theoretically, we're going to look at how to generate data using both open-source

and closed-source models, different frameworks like Distilabel by Argilla, we have Prometheus, we have a lot of other tools that can help you generate synthetic data. on the closed-

source side we have Gretel, okay, a European company if I'm not wrong, I'm not sure, I think it's a European company, Gretel, however you pronounce it, but I have

already a video on Gretel from a few months back, when nobody used to know about them, and you can use that. you can also use LLMs standalone, you know, to

just create synthetic data if you have good prompt engineering skills. so we're going to cover every detail, this is for beginners, we're going to cover everything about synthetic data generation. let's jump

into it, guys. you know, for synthetic data generation, if you look here on my screen right now, I have a notebook which was not created by me, it is by Distilabel, you can see Distilabel

and Prometheus on DPO datasets. I'm going to cover this at the end, but let me first show you the magic of synthetic data generation. now, what you see on my screen is a

function called generate_reviews, and here I'm using an OpenAI model, GPT-3.5 Turbo, for this example. and if you look at this, what I'm saying is: I have a while loop, and in the while

loop, if the review is not generated yet, then I'm using the 3.5 Turbo model; I have a very basic system prompt, and then I have some code that basically gives me

the synthetic generation, we're going to look into this now. we have some threshold values, because when you work on these kinds of problems, when you work on generating data, then you have to do

a lot of business logic, you know, to generate good-quality data; you need to understand the domain for which you are generating the data. so I have a threshold value, and what I'm saying here is:

if the word count is between 15 and the threshold value, then count it and append those reviews; it should not be less than 15, you know. so what I'm trying to

generate here: I'm trying to generate reviews for products. now, say I want to build a model that can help me with sentiment analytics of products, finding out how many people like the

product, a classifier or generation; those kinds of use cases can be solved using this data if you do not have data handy, and of course you can generate data based on your own domains

as well. now I just have a prompt text, 'write a 25-word positive review for wireless earbuds highlighting its battery life', and number of

data points: five. so you can set these two values, and you can see what I'm doing: I'm passing prompt text and number of data points, and once you run this function it generates. so this is how it generates, let me show you that
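The loop described above can be sketched as follows. This is not the notebook's actual code: the `fake_llm` stub stands in for the real GPT-3.5 Turbo chat-completion call, and the 15–25 word threshold is an assumption based on what is shown on screen.

```python
import random

def fake_llm(prompt):
    # Stub standing in for the real GPT-3.5 Turbo chat-completion call.
    n_words = random.randint(10, 30)
    return " ".join(f"word{i}" for i in range(n_words))

def generate_reviews(prompt_text, num_datapoints, min_words=15, max_words=25):
    """Keep querying the model until enough reviews pass the word-count check."""
    reviews = []
    while len(reviews) < num_datapoints:
        review = fake_llm(prompt_text)
        word_count = len(review.split())
        # Business logic: only accept reviews inside the word-count threshold.
        if min_words <= word_count <= max_words:
            reviews.append(review)
    return reviews

reviews = generate_reviews(
    "Write a 25-word positive review for wireless earbuds highlighting its battery life.",
    num_datapoints=5,
)
print(len(reviews))  # 5
```

Swapping `fake_llm` for a real API call is the only change needed; the acceptance loop and threshold logic stay the same.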

now, if you look at this, it is generating review one, review two, review three, review four, review five, with other perspectives as well wherever required. now imagine if you have your own products and you want to generate

synthetic data out of them, or for reviews, you can also do that. you can also go and generate some transactional data if you want to build a system that can identify anomalies in transactions, some climate change

related data, some healthcare related data, and so on and so forth. now, I will give you all of this code, but this is not the main agenda of this video, guys; the main agenda is to help you understand how synthetic data works, what is

synthetic data, what are the frameworks and tools available, and then at the end we go through this code base. let's start first of all with our journey, guys. so let me just do one thing, I'll go

inside my Pictures and I have a folder called synthetic data generation; I'm also going to walk you through it, but here I have my tablet as well, we're going to do a tablet route today

where I'm going to explain a few things. let me bring up my other monitor here so I can see it... synthetic data generation... I think I am able to see that, cool guys

now we're going to jump into it quickly here, and if you look at this picture that I opened, this is how the synthetic data generation process starts: you have a

seed data / query input, then you have an LLM that basically takes that, and then your synthetic data gets generated, then you do post-processing. you know, you have to do both pre-processing and

post-processing to get better quality data. in post-processing you do refinement, filtering, you know, PHI or PII information sanitization, redactions, and so

on and so forth, it can all go in there. and then you have validation and testing, where you can again use an LLM as a judge. so, LLM as a judge, all the evals companies are doing it by the way, and we're going to talk about all of

this. but let's start with the definition first: what do we mean by this, and what are the different techniques we have? now imagine you are solving a problem

that involves identifying some personally identifiable information, so I'm going to write here PII. now you have PII information, for

example you have an Excel sheet with a column name, and then you have geography, which country, and then you have an SSN number, or I'll probably remove SSN because that's US-specific, and I'm going to

call it ID number. now you have this sample data, one of your clients has shared the data with you and they want you to train a classifier model, but they say,

look, you cannot use my data as it is because this contains PII information. now in that case, what you can do is generate synthetic data based on the

same schema. so your schema is name, your schema is geography, and then you have IDs. so there

are multiple ways. so for example, if one name is Sonu, country India, and some 123 ID number, and these are

all my confidential details, I do not want to share this with the company that I'm going to share my data with; of course they cannot use the same data, but they can use it as a sample, of

course, to generate new data out of it, or data that might look similar but is fake, okay, which is not real. so for example I can just replace this

with somebody XYZ, and I can just replace this with some country, you know, like Australia or something, and then I also replace this ID number, for example

you know, 89XYB or something, and I create this data. now, to create this kind of data standalone you can use a tool called Faker. so Faker is a Python

library that you can immediately use to create synthetic data of these kinds of patterns; basically it looks at patterns like regular expressions, that's the underlying concept, not rocket science guys, algorithms are always easy, trust me
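In practice you would reach for the Faker library (`pip install faker`) here, but the underlying pattern-based idea can be sketched with the standard library alone: generate replacements that match the shape of each column. The column names and the ID pattern below are assumptions for illustration, not the client's real schema.

```python
import random
import string

# Illustrative replacement pools (not real data).
NAMES = ["Somebody XYZ", "Alex Doe", "Priya K"]
COUNTRIES = ["Australia", "Germany", "Brazil"]

def fake_id(pattern="99XYB"):
    # Produce an ID with the same digit/letter shape as the pattern,
    # similar to what regex-style fake-data generators do.
    out = []
    for ch in pattern:
        if ch.isdigit():
            out.append(random.choice(string.digits))
        else:
            out.append(random.choice(string.ascii_uppercase))
    return "".join(out)

def anonymize(row):
    # Replace each PII field with a fake value that fits the same schema.
    return {
        "name": random.choice(NAMES),
        "geography": random.choice(COUNTRIES),
        "id_number": fake_id(),
    }

print(anonymize({"name": "Sonu", "geography": "India", "id_number": "123"}))
```

The shape of the data survives, the confidential values do not, which is exactly what the classifier-training scenario needs.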

coding and programming are all logical thinking, the way you think to solve a problem. if you look at these kinds of names and geographies, they can all be handled using regex, using regular

expressions, using basic natural language processing; that's what Faker does. now, this is not where we are going right away; let's look at synthetic data generation

types. so, if you talk about the types, which is very important, the entire industry is talking about it, and after we got LLMs, guys, things became more

easy. so the first type is called distillation and the second is called self-improvement; these are the two types, the high-level categorization of synthetic data

generation, how we generate it using an LLM. of course, my entire focus is to always keep an LLM in the loop for this video, okay, because I assume that you want to

use large language models to generate synthetic data, because it might be really beneficial; at the same time, the data that you generate might be a bit biased as well if you

don't have better logic while generating the data. but this is what distillation and self-improvement are. now, distillation is a

method of creating synthetic data using an LLM, which is very easy to understand, where the teacher model is said to have a larger number of parameters and higher capabilities. so first we're going to look at distillation here, so let me just write

quickly the definition, a very high-level view of distillation. so when you use distillation, the teacher basically has a large number of parameters, and the accuracy, or the understanding, is also

very high, you know, so its capabilities are higher. so understand distillation like this: one model is going to teach the other model, one model is teaching

somebody to generate something new; that's what distilled means, you are distilling something out of it. now, self-improvement means the model learns by itself, okay. there's a picture also, I can show that in a bit, but that's what

distillation is. now, when you use distillation, the synthetic data that we generate using it is closer to human-created data. so distillation

is closer to human-created data; when somebody creates data manually, we also do a lot of annotation, right, if you

remember. humans are always going to create better data, we have seen that again and again, but now we are also looking at the machines; for example, using a 7B model

you know, it can improve the performance. so this is on distillation. now, very high level on self-improvement, the other technique: if you use the LLM itself, it has to be

self-improvement. now, in self-improvement you create a dataset with the model you want to improve. so for example, if you want to improve Llama 7B, or Llama 8B, which is Llama 3, for example, if you

want to improve that, you have to improve it with its own dataset; so I'm talking about the model, okay. now, this is often used to train models when you

don't have a model larger than the model you want to improve; so not for a 7B, but for example for a 70B model. so let me just write it over here

quickly. so I'm going to write: create, not a model, create a dataset, so let me just

write it here: create a dataset with the model you want to improve. now, this is on

self-improvement. now let's focus on distillation first. this was high-level information about the two different famous types of synthetic data generation. now, if you talk about distillation, it's

one of the most famous techniques right now in industry; everybody is talking about data distillation, knowledge distillation, and so on and so forth. so let's talk about distillation

here, so I'm just writing distillation. now, distillation also has multiple techniques, guys, so we're going to cover a few techniques in distillation. at a very high level, if

you look at it, there are multiple techniques for distillation as well. so the first is self-instruct,

and these can all be achieved using frameworks like Distilabel by Argilla, we're going to talk about that, Prometheus also supports this, and different open-source and closed-source

frameworks as well. now, one is self-instruct, which is like a basic form of synthetic data generation. the second is Evol-

Instruct: so the second is Evol-Instruct, which gives a more challenging dataset, so it's a bit challenging, let me just write it over here. then the third is LAB, and I'm going to

talk about this, it's really interesting, it's more diverse, let me just write it, and this is good for chatbots. if you have a chatbot use case and you need data, you can

look at LAB, which gives the most diverse datasets for building a chatbot; for generic users, guys, your datasets need

to be really diverse. the next is more task oriented, so I'm going to write it: task-specific. so for example this can be OpenMath-

Instruct, if you want to generate that kind of data, so that is task-specific. and then the last one is knowledge-

specific QA, so the last one is a knowledge-specific QA dataset. so these are the five ways you can, you know, distill your

data, if you want to generate it. now, self-instruct is the easiest one to create, okay. let me show you some pictures of how self-instruct works, so let me just open

this, and I can make this a bit bigger, guys, I don't know if you can read this, so let me just make it bigger. I downloaded this from the internet, this was not created

by me, to be honest the credit goes to the creators. now, if you look at this, it has your step one, which is instruction generation, okay, your

instruction generation: you write 'give me a quote from a famous person on this topic' and then you give it to a language model, and it basically does classification and task identification,

so basically classifying the prompt over here. and then, if it says yes: 'find out if the given text is in favor of or

against abortion', it looks at the class label, and then the inputs; then it goes into filtering, and then into the seed task / instruction task pool, and then it creates the data for you. and if it says no, first

comes the output, then the input: 'The Importance of Being Honest', input and output, right. so this is how you create a self-instruct data-

set, and we're going to talk about this later. if you look at distillation, in this chart that I got from somewhere on the internet, this is a very good picture to understand distillation and self-improvement. if you look at here,

distillation is somebody supervising; it also works well for supervised fine-tuning, or SFT, because that is the kind of fine-tuning where you are generating data under somebody's supervision. when I say

somebody, of course it might not be a human directly; that's what distillation is, that you're teaching somebody. and self-improvement is to learn from itself, right, that's how a model can

learn to improve itself using some datasets. now this is what the picture shows, but we're going to go into a bit more detail. now, for self-instruct, the basic

process is to generate a dataset by sending a request to the LLM directly, for example like in this case that we did with OpenAI GPT-3.5 Turbo,

right. you can see what we are doing here, we are saying 'write a 25-word positive review for wireless earbuds highlighting its battery life'. now, this is nothing but

self-instruct synthetic data generation, that's what we are doing here in this case. now, if you go to this picture, I don't know if I have it opened, now here you

can find some seed instructions, but let me also just write something so you can go a bit deeper into it if you want to understand. this is how

the schema can look, okay, the template. now you have an instruction, and you can give any instruction okay over here, so I'm just

writing: 'put your instruction', okay, let me just paste that, 'put your instruction', and this is your instruction;

then you have an input, and you basically write your input, something like 10x + 5 = 10, you have a

math equation; and then you have an output. this is how you define the schema, you know. so if you solve it, it's 10x = 10 - 5, so 10x = 5, whatever

it comes to, okay, so x is 5 over 10, and it becomes 0.5. so this is how you

define an instruction. so these are the seed tasks, okay, so these are called seed tasks, what you see here. so a seed task consists

of a pair of instruction, input and output, that's what you see over here in the image on the left-hand side at the top; this is called a seed task, and it includes samples, you can see it says

175 seed tasks, because the model can learn from the patterns itself, okay. now, by sending the prompt to the LLM with some examples that are included in the

seed, the subsequent tasks will be generated, so it will generate the upcoming tasks; that's how self-instruct works. so I hope you understood the self-instruct part
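A seed task and the few-shot prompt built from it might look like the sketch below. The exact template wording varies by implementation; this is an illustrative assumption, not the Self-Instruct paper's literal format.

```python
# A seed task is a triple of instruction, input and output.
seed_tasks = [
    {
        "instruction": "Solve the equation for x.",
        "input": "10x + 5 = 10",
        "output": "10x = 5, so x = 0.5",
    },
    {
        "instruction": "Give me a quote from a famous person on this topic.",
        "input": "honesty",
        "output": "\"Honesty is the first chapter in the book of wisdom.\"",
    },
]

def build_prompt(examples, n_new=1):
    """Build a few-shot prompt asking the LLM to produce new tasks
    following the same instruction/input/output pattern."""
    parts = ["Come up with %d new tasks in the same format:\n" % n_new]
    for t in examples:
        parts.append(
            f"Instruction: {t['instruction']}\nInput: {t['input']}\nOutput: {t['output']}\n"
        )
    return "\n".join(parts)

print(build_prompt(seed_tasks))
```

Each generation round samples a few seeds, sends this prompt to the LLM, filters the results, and adds the survivors back to the task pool.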

now. and all of this you can achieve through Distilabel, which is the best framework to work with. now, let's go to a bit more difficult thing, which is

Evol. so the next is Evol-Instruct. now, self-instruct was effective in generating datasets, but it is difficult to use because the difficulty and complexity of the tasks are close to the

original seed tasks, and that's why, when you want to fine-tune an LLM for more difficult problems, you will need to manually create complex and difficult seed tasks. now, imagine somebody sitting

and creating 500 seeds, each having instruction, input and output; again, that's a tedious task to do, right. that's where we bring in Evol-Instruct. and you

know, self-instruct generally does not give you great accuracy or high-quality data, and that's why we adopt two different ideas in this case. so the first is, let me just

write it, the first is: separate the timing of generating the prompt and the answer. so let

me just write it over here. and the second one is: add a process to slightly increase the

difficulty of the prompt. now, by repeating this process you can increase the difficulty of the prompt. so let me show you this image so you

understand a bit better. now let me just make this a bit bigger, I hope this is clear, this looks good, I hope you can get this. now if you look at this, it has a lot of things

happening. the initial instruction is '1 + 1 equals what'. now it increases reasoning; if you look at this, first, increase reasoning:

'what is the value of x if x³ + 2x + 3 = 7', very related because it has numbers included, and you have a simple equation which is '1 + 1 equals what'. and if you go

deeper, it says 'in what situation does 1 + 1 not equal 2', so this is deepening, right, because if you look at how people evaluate a large language model, they always ask what is 2 plus 2;

the LLM says 4, but there are some scenarios that can also make other sense of 2 + 2. now, if you look at add constraints: 'how to prove 1 + 1 = 2 in the Goldbach conjecture'. now, these are the different

prompts, complex prompts, that have been created for the LLM to generate better quality data. now, you have other techniques you can look at over here: 'what is the speed of light in

a vacuum', this is one instruction, blah blah blah, 'how is the speed of light in a vacuum measured and defined', 'how many times faster is light than sound', so increasing the reasoning capabilities,

the thinking ability of a language model; of course thinking is the wrong word, it can't think, but the reasoning capabilities, if it can reason well. and so you can look at this image, right,

it can go really deep, it can create complex prompts as well, beyond the difficulty of the seed tasks. now we let the LLM increase the difficulty of this prompt
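The 'let the LLM increase the difficulty' step is usually implemented as a meta-prompt that wraps the current instruction together with one evolution strategy. The wording below is an illustrative approximation of the Evol-Instruct idea, not the paper's exact templates, and the `llm` callable is a stand-in for a real model call.

```python
import random

# In-depth and in-breadth evolution strategies, as described above.
STRATEGIES = {
    "add_constraints": "Add one more constraint or requirement to the task.",
    "deepening": "Increase the depth of the question.",
    "increase_reasoning": "Rewrite it to require multiple reasoning steps.",
    "complicate_input": "Replace the input with a more complex one.",
}

def evolve_prompt(instruction, strategy):
    """Build the meta-prompt sent to the LLM to produce a harder instruction."""
    return (
        "Rewrite the following instruction into a more difficult version. "
        f"{STRATEGIES[strategy]} Keep it answerable.\n\n"
        f"Instruction: {instruction}\nRewritten instruction:"
    )

def evolve(instruction, llm, rounds=3):
    # Repeat the process: each round the LLM returns a slightly harder instruction.
    for _ in range(rounds):
        meta = evolve_prompt(instruction, random.choice(list(STRATEGIES)))
        instruction = llm(meta)
    return instruction

print(evolve_prompt("1 + 1 equals what?", "deepening"))
```

Answers are generated separately afterwards, which is the 'separate the timing of prompt and answer generation' idea from above.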

and there are several ways to increase the difficulty: you can add constraints, you know, you can increase the depth or the breadth of the question. so you can increase the

depth, and that's what you see here: you have deepening, increase reasoning; and in breadth evolving, complicate input. so these are the different ways

you can create the logic to increase the complexity of the prompt; that's what it does. now, the third one

is LAB. so, LAB is interesting because, as I said, it gives you more diverse datasets. LAB's full form is

Large-scale Alignment for chatBots, okay, so let me just... so this is how you basically write it, so it's called

Large-scale Alignment for chatBots, if I'm not wrong. so LAB is taken from these

particular words, or letters, by the way: that's L, and then you have A, and then you have B; so this is how LAB came about. so basically it

creates detailed hierarchical classifications, and for example top classifications are like knowledge tuning and skill tuning, and probably you can find some pictures, I don't have

a picture for this, but you can get it. now, for chatbots: you would have heard that many of the chatbots in production have gone rogue, you know, they started responding with

stupid answers. of course, you can bring up some guards in the post-processing, or also in the pre-processing you can bring up moderation, content moderation, you can bring up different types of

payload splitting preventions, different types of token splitting preventions; these are all the guardrails that you can put in the system. I have a guardrails video, how to stop prompt injection, how to prevent prompt

injection, have a look at that video as well. but for LAB, it ensures task diversity, so, for more diversity, let me just write it over here, okay, I'm

not casting, but you can look here now: ensure task diversity, let's just do that, this is what LAB does. so, self-instruct and Evol-Instruct are

good, you know, but they can sometimes be biased as well, and to address this issue, just to cut the bias, what we do is we classify. so let me

just write it over here: we classify tasks in advance. so, classify tasks in advance, that's basically called hierarchical

classification, so you are basically classifying the tasks into a hierarchy; and limit the task examples, that's the second thing, so limit the task

examples given each time to only those from one task classification. so these are the two things that you do, and the good thing about this is that the prompt can remain the

same as in self-instruct, so you can use this as an extra layer on self-instruct; the only code you need to modify is the logic for selecting examples. so you can just use

self-instruct plus LAB together if you want to create a very high-quality dataset.
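Layering LAB on self-instruct then comes down to changing how the few-shot examples are picked: classify the seed pool in advance, and each round draw examples from a single classification only. A minimal sketch, with a made-up two-level taxonomy (the category labels are assumptions for illustration):

```python
import random
from collections import defaultdict

# Seed tasks classified in advance under a small hierarchical taxonomy
# (top level: knowledge vs. skills); the labels here are illustrative.
seed_tasks = [
    {"instruction": "Who wrote Hamlet?", "category": "knowledge/literature"},
    {"instruction": "Summarize this paragraph.", "category": "skills/writing"},
    {"instruction": "What is the boiling point of water?", "category": "knowledge/science"},
    {"instruction": "Rewrite this email politely.", "category": "skills/writing"},
]

def select_examples(tasks, k=2):
    """LAB-style selection: pick ONE classification, then sample examples
    only from that classification, to reduce bias toward dominant tasks."""
    by_category = defaultdict(list)
    for t in tasks:
        by_category[t["category"]].append(t)
    category = random.choice(list(by_category))
    pool = by_category[category]
    return category, random.sample(pool, min(k, len(pool)))

category, examples = select_examples(seed_tasks)
# Every selected example comes from the same classification.
assert all(e["category"] == category for e in examples)
```

Everything else in the self-instruct loop, including the prompt template, stays unchanged.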

as I said, you can keep the prompt the same, so the prompt remains the same, and what you need to

modify is the logic, the logic for selecting examples from the previous approach; so, the logic for selecting examples, let

me just write it over here: selecting examples. so this is how you can also combine them. now, the next two are very domain-specific, you know, if you want to

generate some specific QA; you have task-specific and knowledge-specific QA, and these are basically ways to generate high

quality specific data for different tasks. so if you look at OpenMathInstruct, for example, let me just write quickly, you know, so if you

look at OpenMathInstruct, let me just write it over here: OpenMathInstruct. now, these are again based on evaluation benchmark datasets, so you

have evaluation datasets for benchmarking, right. so for example, this uses the train data of the math benchmarks; if you look at

this, there's a MATH benchmark dataset, and one called GSM8K. if you look at it, I think this is

available on MMLU as well, if I'm not wrong. so the MATH benchmark and GSM8K, these are benchmark datasets, they contain both problems and answers, and then we evaluate LLMs on top of them. now, in

response to this, we create a dataset that contains problems, solutions and answers, by generating solutions using Python with an LLM. now, Mixtral was used to create the problems, okay, so you can use Mixtral; so

if you look at OpenMathInstruct, they use Mixtral, they use Mixtral here for this dataset to create the problems. but even with Mixtral it is

difficult to solve the problems with a perfect score, so you can use the following method, let me just write it over here: have a student solve the problem. so you have a

student model solve the problem as many times as possible, and use the correct answers from the student, and then evaluate. so basically you employ a technique to generate a solution when you already

have only the prompt and the answer. so now imagine RLAIF, guys, reinforcement learning from AI feedback; we have all been talking about RLHF and stuff, but if

you look at these kinds of synthetic data generation types, RLAIF might seem possible, right, it might be a good thing for the future. you can use an LLM as a

judge itself, the AI can give feedback to improve systems as well, and you can keep on generating; you can build an entire system using all of these techniques, but it might be a bit

costly as well. now, you can create knowledge-specific QA, and you can use multiple datasets for that, okay. so if you want to create a lot of knowledge-specific data, there is a lot of base

data like Common Crawl, Wikipedia, and you can use all of this. so then we have Wikipedia, etc.; you can take all this base data, and base data is nothing but,

guys, keep remembering what I'm saying, they're all seed tasks, so you can create seeds out of it for self-instruct; if you remember, we created a pair of instruction, input and output. so you can use Common Crawl,

Wikipedia, etc., and have a system to create question-answer pairs related to the text; it's very specialized, you can use that. now, these are ways of creating

distilled data, performing distillation. self-improvement is a bit different from what you do here, so I'm probably not going to cover it. now let's jump in and see, you know, the

different ways you can create this synthetic data. so if you look here, the first thing is a library called Distilabel, so let me just go a bit to the top and show you: it

says Distilabel is a framework for synthetic data and AI feedback for AI engineers that require high-quality outputs, full data ownership, and overall

efficiency. scroll down, it says synthesize data for AI and add feedback on the fly, blah blah blah. this is the library you have to use, you know, it's a very good library, guys, it supports

different LLMs, you can see: you can use Groq, which is free for now, if you want to generate with that, and you can use open-source and closed-source. now, what I'm going to show you is how we

can create a DPO dataset with a smaller model like Phi-3. so I'm not going to write the code, I'll just explain it; very detailed code is available already. and before this, let me just

go to Gretel. if you want to use an API-driven solution to generate synthetic data, then you can use Gretel. of course, they give it to you for free

as well for some trials and stuff, you can use it up to some number of rows and columns. this is a very good way to generate data if you're not working with really sensitive or confidential information; if

you want to try it out for your learning, you can use it. 'synthetic data for everyone', so you can see, 15 free credits per month and then $2 per credit. so

your 15 free monthly credits are enough for 100k+ highest-quality synthetic records, 2 million transform records, 2 million+ PII detection records; you can

understand what I'm saying. now, you can also use this for PII redaction as well, or sanitization; if you have some personally identifiable information you can use this as

well. and now the next thing... okay, I mean this is fine, you can reconnect it, this is okay. now, to use this library, you can see we

are using Distilabel, Argilla, and llama.cpp, so you basically install these. once you install them, I'm logging in with my Hugging Face Hub API token, and then if you

look at this, this is where you define your project name; the project name is the project that you want to create, because you can push it to Hugging Face, that's why they are asking for these parameters. you have

project name, you have input dataset repo ID, so give it an input dataset repo. now if you look at this, what it says is: which supervised fine-tuning dataset on the Hub should we start from; so basically the seed, guys, remember the

seed I was talking about, the seed tasks, the base questions, prompts, whatever you call it, now that dataset goes over here. and then: what are the instruction and response columns named in the dataset; so

whatever you select here, because you are generating a DPO dataset, remember. so if you select here UltraInteract SFT, we have to look inside the dataset, so let's go inside this, let me explain that to you

as well. Open it over here. Oh, excuse me. And if you open this... oh, the viewer is not available; it says the dataset viewer is taking too long to fetch the data. Let me just try to refresh and see if this works, though I think it will not. Okay, if you go to the viewer... something is wrong with the viewer, so I cannot show you the columns from here as of now. Let's see if they have given something below. Yeah, so if you go back, okay, here it says 'response' and 'instruction'. Now look at these two things: you have both 'response' and 'instruction'. If you select this dataset or any other dataset, you have to basically name those columns.
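As a side note (plain Python, not from the notebook), the column-name question matters because the pipeline reads exactly those two fields from every seed record. A quick sanity check might look like this, with made-up example rows:

```python
# Hypothetical seed records: a supervised fine-tuning dataset with an
# instruction column and a response column, as the notebook asks for.
seed_dataset = [
    {"instruction": "Translate 'hello' to French.", "response": "bonjour"},
    {"instruction": "What is 3 * 4?", "response": "12"},
]

def check_columns(rows, instruction_col, response_col):
    """Return True if every row has both named columns."""
    return all(instruction_col in r and response_col in r for r in rows)

print(check_columns(seed_dataset, "instruction", "response"))  # -> True
```

If this returns False, you have picked the wrong column names for the dataset you selected.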

Next is the evaluation rubric, which here is factual validation, because evals are important, and they also have an evaluation model. If you come down, let me show that: now they select a model. If you read what they're saying: define the quantized model that you want to use for generation and evaluation. They are using quantized models because with a foundational or base model it might take a lot of time and GPU compute to basically generate all the generations and then evaluate them; it might take up to a day as well. But having said that, if you look here, they take a GGUF model for evaluation, which is from Prometheus, and then for text generation they're using Phi-3 mini 4k, which is a very good model; you can see Phi-3-mini-4k-instruct. And then this is finally the

set of parameters that go here: inference params such as temperature, max tokens, and so on and so forth. And then you define a human feedback task in Argilla. Basically, you can look at the instruction generations, and then you have some questions, like 'How would you rate the quality of the answer?' You can also use an LLM as a judge here, you know, an LLM judge; there are different frameworks and libraries available. 'How would you rate the quality of the answer?': it says 1, 2, 3, 4, 5, values from 1 to 5 to rate the answer, and then give feedback, so there's feedback on the quality. And it has guidelines: 'Please read the question carefully and try to answer it as accurately as possible.'
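To make the rating task concrete, here is a minimal sketch of such a 1-to-5 feedback record; the field names are my own illustration, not Argilla's actual record schema:

```python
from dataclasses import dataclass

@dataclass
class FeedbackRecord:
    """One human-feedback item: an instruction, an answer,
    a 1-5 quality rating, and optional free-text feedback."""
    instruction: str
    answer: str
    rating: int          # 1 (worst) to 5 (best)
    feedback: str = ""

    def __post_init__(self):
        # Enforce the 1-5 scale described in the task guidelines.
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")

rec = FeedbackRecord(
    instruction="What is 2 + 2?",
    answer="4",
    rating=5,
    feedback="Correct and concise.",
)
print(rec.rating)  # -> 5
```

Whether a human annotator or an LLM judge fills in the rating, constraining it to the declared scale up front saves cleanup later.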

You can also push it using the push_to_argilla method as well; you can see the name, the project name, the workspace 'admin', and so on. And then we define the distilabel pipeline. This is the pipeline that we have defined; you can see here, in the steps, the instruction generations, the feedback results, and the model name, so basically everything goes here. It says 'DPO to RG'; it is a function you are using, and everything is over here: you are generating instructions, generations, prompts, chosen and rejected, because we are generating a DPO dataset, DPO, direct preference optimization. It accepts these formats of datasets, and you can look at chosen and rejected, and then it has some suggestions, and then it appends the records: you create a dataset, add the records, and yield the inputs.
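To illustrate the chosen/rejected format, here is a minimal sketch following the generic DPO convention (prompt, chosen, rejected), not distilabel's exact schema; the example prompt and ratings are made up:

```python
# A DPO row pairs a prompt with a preferred ("chosen") and a
# dispreferred ("rejected") response, typically selected by ratings
# from an evaluation model or human feedback.
def make_dpo_record(prompt, generations, ratings):
    """Pick the best- and worst-rated generations as chosen/rejected."""
    ranked = sorted(zip(ratings, generations), reverse=True)
    return {
        "prompt": prompt,
        "chosen": ranked[0][1],    # highest-rated answer
        "rejected": ranked[-1][1], # lowest-rated answer
    }

record = make_dpo_record(
    prompt="Explain overfitting in one sentence.",
    generations=["A model memorizes noise in the training data.",
                 "Overfitting is bad."],
    ratings=[5, 2],
)
print(record["chosen"][:7])  # -> A model
```

This is the shape a DPO trainer expects downstream: it learns to prefer the chosen completion over the rejected one for the same prompt.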

And then Prometheus etc. will again be used; you can see text generation with llama.cpp using Phi-3 over here, and this looks nice. Then, for evaluation, you can see this model for evaluation: for each answer that it generates, it has to evaluate it as well, and it has some parameters, and as for what it should evaluate on, it should evaluate on factual validation; we can have a look at that. And then you use the push_to_argilla method to push it to Argilla; you can also do it on Hugging Face. And then you just call the functions and run the distilabel pipeline; you can see, run it, and it takes a bit of time, so I'm not going too deep inside this. But you can use this notebook if you want to

generate a DPO dataset using Phi-3 mini; you can use it for that. You can use Gretel if you want to do it; they have very good documentation. If you go to the examples, there is a lot of documentation to read from, guys, and if you look here, there are different use cases, like synthesizing tabular data. If you want to create tabular data, go to the SDK: you can use the Python SDK, you can see it's called the Gretel client, and you can generate over here; it synthesizes the data. You can also redact PII, removing the PII I was talking about earlier, and you can see this is how you use it. And you can also see that they are using Faker; remember, we talked about Faker, we talked about Gretel, and we have also seen OpenAI here, we are generating using OpenAI as well. The self-instruct generation that you see here might not be that relevant, but yeah, this is how you generate, guys.
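Faker itself gives you ready-made providers like `fake.name()` and `fake.email()`; as a dependency-free illustration of the same idea, here is a toy fake-record generator using only the standard library (the names and fields are made up):

```python
import random

# Toy Faker-style generator: build fake tabular records from small
# vocabularies. The Faker library does the same idea with much richer,
# locale-aware providers.
FIRST = ["Ana", "Ben", "Chloe", "Dev"]
LAST = ["Ng", "Ortiz", "Patel", "Smith"]

def fake_row(rng):
    """One fake record with name, email, and age columns."""
    first, last = rng.choice(FIRST), rng.choice(LAST)
    return {
        "name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}@example.com",
        "age": rng.randint(18, 90),
    }

rng = random.Random(0)  # seed for reproducible output
rows = [fake_row(rng) for _ in range(3)]
print(len(rows))  # -> 3
```

Note that rule-based fakes like this preserve no statistical structure; model-based synthesizers like Gretel's are what you reach for when the generated table must mimic a real dataset's distributions.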

So you have Gretel, feel free to use it; go to Argilla, have a look, and generate, and see how you can do it. This is what I wanted to do in this video, guys: I wanted to create a bit of awareness about synthetic data generation and the different ways, self-instruct, Evol-Instruct, LAB, question answering, task-specific QA, and the different tools and frameworks you can use to do that. I'm not writing any code in this video because those notebooks are already available, so it doesn't make sense to copy-paste and just rewrite it from there. But if you have any questions, thoughts, or feedback, do let me know in the comment box, and I will put all of these code notebooks and the code base that we saw for OpenAI in the GitHub repository; I'll give the link in the description. If you like the content, please hit the like icon, and if you haven't subscribed to the channel yet, please do subscribe, guys; that motivates me to create more such videos in the future. Thank you so much for watching, see you in the next one.
