Fraud Detection Using Machine Learning – Full Python Data Science Project (94% Accuracy)
By Onur Baltaci
Summary
Topics Covered
- Transfer and Cash-Out Are the Only Fraud-Prone Transaction Types
- Higher Transaction Amounts Signal Higher Fraud Probability
- Class Imbalance: The Hidden Challenge in Fraud Detection
Full Transcript
Welcome back to a new data science project. In this video, we are going to train a machine learning model for fraud detection, and we are going to build a Streamlit web app for deploying our machine learning model. So we are going to be doing data analysis, training a machine learning model, creating a Streamlit web app, and much more. In this video, I'm going to be sharing the data set that I used. You can get the data set from the link in the description. You can see the app that we are going to create in this video on the screen right now. Let's start coding.
Okay, we are in the data card of the data set that we're going to be using for the fraud detection, and I'm going to add this link to the description of this video. You can just visit this link and use the download button at the upper right side of the Kaggle page to download the data set and code along with me in this video. So what I'm going to do right now is take a quick look at this Kaggle page, and then we are going to explore it more in a Jupyter notebook using Python.
Okay, we have a fraud detection data set. It says "detect fraud on the go with a comprehensive data set", and it says we have more than six million rows. Great. And it says some of these records were flagged by existing algorithms. Cash-in: it says it is the process of increasing the balance of the account by paying in cash to a merchant. Cash-out is the opposite process of cash-in. It means to withdraw cash from a merchant, which decreases the balance of the account.
Debit is a similar process to cash-out and involves sending money from the mobile money service, and payment is the process of paying for goods or services to merchants, which decreases the balance of the account and increases the balance of the receiver. And on the transfer side, this is the process of sending money to another user of the service through the mobile money platform. Great.
So here in the table we have the columns. I'm just going to take a look at them quickly. We have step, type, amount, name original, old balance, new balance, the name of the target account, the old balance of the target account, and the new balance of the destination account. And we have isFraud and isFlaggedFraud. So we have this isFraud column in here, and we are going to try to predict that.
Great. So again, you can use this download button to get the data set, and you can just put that CSV file in the directory where you're going to create your Jupyter notebook and Python script. And I'm going to be recording in the code editor in a second.
Right now I'm in VS Code, but you can use any code editor that you want. Firstly, I'm going to create a file like analysis and model. We are going to start with a Jupyter notebook. And let's select our kernel as Python 3.11.4. In the next step, I'm going to create a code cell. And here's the data file that you're going to download from Kaggle, AI machine learning data set CSV, and I'm going to do my imports.
Firstly, I will say import pandas as pd, import numpy as np. For the data visualization, we will import matplotlib.pyplot as plt, and also we will say import seaborn as sns. Also, at the start I want to set the seaborn style as whitegrid, and I want to import warnings and filter the warnings, like warnings.filterwarnings ignore, like this. So after this we can start our analysis.
Firstly, I'm going to say data frame, and it's going to be equal to pd.read_csv, and I'm going to take the data set name in here. Next up, I'm going to paste it in here.
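A minimal sketch of the loading step, assuming the CSV uses the usual PaySim-style column names (step, type, amount, nameOrig, oldbalanceOrg, newbalanceOrig, nameDest, oldbalanceDest, newbalanceDest, isFraud, isFlaggedFraud). The three rows below are made up so the example runs without the real file, and `load_transactions` is a hypothetical helper name, not something from the video.

```python
import io
import warnings

import pandas as pd

warnings.filterwarnings("ignore")  # silence library warnings, as done in the video

# Made-up stand-in for the Kaggle CSV; the column names are an assumption.
SAMPLE_CSV = """step,type,amount,nameOrig,oldbalanceOrg,newbalanceOrig,nameDest,oldbalanceDest,newbalanceDest,isFraud,isFlaggedFraud
1,PAYMENT,250.0,C100,1000.0,750.0,M200,0.0,0.0,0,0
1,TRANSFER,900.0,C101,900.0,0.0,C201,0.0,0.0,1,0
2,CASH_OUT,400.0,C102,1200.0,800.0,C202,50.0,450.0,0,0
"""


def load_transactions(source):
    """Read the transactions CSV from a file path or a file-like object."""
    return pd.read_csv(source)


df = load_transactions(io.StringIO(SAMPLE_CSV))
print(df.head())
```

With the real download, the same helper would be called with the path of the CSV file placed next to the notebook.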
So after this, we are going to have our data frame loaded, and when it finishes, I'm going to just check with DataFrame.head. And here we have our data frame. So once again we have the step, type, amount, name original, old balance, new balance, name destination, old balance destination, new balance destination, isFraud, and isFlaggedFraud columns in our Jupyter notebook right now. Great, let's start analyzing it.
Firstly, I want to get the general information about the data types and also the memory usage. So we have five floats, three integers, and three object types. And the memory usage is 134 in here.
Okay. So I'm just going to take the column names here. And I will say data frame isFraud dot value_counts for seeing the count of the fraud and non-fraud records. So we have about 8,200 frauds and 6,354,000 non-frauds. Also, I want to see the same statistics for isFlaggedFraud. Then I'm just going to run this. And here we have a really low number on the flagged side.
Okay, firstly let's see if we have any NA values in our data set. So I'm going to call data frame isnull, and this returns cell-wise booleans. But if we go with sum, it's going to return the column-wise NA value count. And if we just go with sum again, we can see the total number for the data set. Okay, we don't have any NA values.
And let me quickly check the shape of the data set. Here we have 6,362,000 rows with 11 columns. Okay, at this point I want to see the percentage of the frauds in the total data. So I will say data frame isFraud value_counts, and let me see the output of this one more time. Okay, we are going to take index one of this, which is our fraud count, and we are going to divide it by data frame shape zero. So this is going to give us this number. To get our percentage here, we can multiply it by 100.
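The checks in this part can be sketched as follows. This is only an illustration on a made-up eight-row frame, assuming the target column is named `isFraud`; the rounding to two decimals is the one the video settles on.

```python
import pandas as pd

# Made-up frame: seven normal rows and one fraud row.
df = pd.DataFrame({"isFraud": [0] * 7 + [1], "amount": [10.0] * 8})

counts = df["isFraud"].value_counts()    # rows per class, indexed by label 0 / 1
missing_total = df.isnull().sum().sum()  # column-wise NA counts, then the grand total
n_rows, n_cols = df.shape

# Label 1 holds the fraud count; divide by the number of rows and scale to percent.
fraud_pct = round((counts[1] / n_rows) * 100, 2)
print(counts, missing_total, fraud_pct)
```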
We can add parentheses like this. Maybe necessary, maybe not. I prefer using them. So the fraud percentage we have in our data is 0.12%. And if you round it, it's 0.13, which is maybe clearer. We can round it to, let's say, one, and we are going to see, actually, my bad, two decimals. Okay, 0.13. Great. So if we train the model directly like this, we are going to have class imbalance. We are going to handle that situation later in this tutorial.
Firstly, let's analyze our data by visualizing the important things about fraud detection. I'm going to start with visualizing the transaction types. So data frame type value_counts, and then I will say plot, kind is going to be bar. Next up, title. We are going to give a title like Transaction Types, and we can specify a color. I love sky blue, so I will pass that. Next up, I'm going to say x label is going to be Transaction Type and y label is going to be Count. And I will call plt.show after that.
And here we have our plot. So cash-out is the leader, and on the left side you can see the axis uses scientific notation. Payment is second, and the last one is debit.
Okay. Now let's find fraud rates by type. So I will say fraud by type, and it's going to be data frame groupby. I will pass type, and I'm going to go with isFraud. On the aggregation side I will use mean. Next up, I will say sort_values, ascending false. Next up, I will say fraud by type dot plot, kind is going to be bar, and title is going to be Fraud Rate by Type. Next, let's say color, let's use another one like salmon. And next we can directly say y label will be Fraud Rate, and we can call plt.show. So here it is.
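The fraud-rate-by-type aggregation can be sketched like this, on made-up rows and assuming the columns are named `type` and `isFraud`. Because `isFraud` is 0 or 1, its mean per group is the fraud rate for that transaction type. The bar chart itself is left out here so the example needs only pandas.

```python
import pandas as pd

# Made-up transactions; only the two columns needed for the aggregation.
df = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH_OUT", "CASH_OUT",
             "CASH_OUT", "CASH_OUT", "PAYMENT", "PAYMENT"],
    "isFraud": [1, 0, 1, 0, 0, 0, 0, 0],
})

# Mean of a 0/1 column per group = share of fraud rows in that group.
fraud_by_type = (
    df.groupby("type")["isFraud"]
    .mean()
    .sort_values(ascending=False)
)
print(fraud_by_type)
# In the notebook, fraud_by_type.plot(kind="bar", ...) would draw the chart.
```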
So our fraud rate in general is 0.13%, and here we can see the distribution. Transfer is leading and cash-out is in second place on the fraud rates by type, and we can see nearly zero, maybe they are zero, for cash-in, debit, and payment. So let's quickly check if they are zero or not. Okay, they are zero. Okay, it makes sense.
Now let's go with the amount statistics. So I will say data frame amount, and I will call describe like this. Next up, we are going to see we have numbers in scientific notation. We can go with maybe astype and integer, like this. Here we can see the numbers. The minimum amount is zero. Makes sense. We have no negatives on that side. And the maximum is 92 million. And we can see the quantiles. So we have an outlier on this side, in my opinion. We have the mean of 179,000, and our standard deviation is large since the difference between the maximum value and the quantiles is large.
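A small sketch of the summary step, and of why a log transform helps with a column this skewed. The five amounts are made up, and `np.log1p` (the log of amount plus one) is an assumption about which NumPy function is used for the histogram that follows; it is chosen here because it is defined for the zero amounts.

```python
import numpy as np
import pandas as pd

# Made-up amounts with one very large value, mimicking the skew in the data.
amounts = pd.Series([0.0, 50.0, 1_000.0, 75_000.0, 92_000_000.0], name="amount")

# describe() prints in scientific notation for large floats; casting to int
# makes the summary easier to read.
summary = amounts.describe().astype(int)
print(summary)

# log1p(x) = log(1 + x): zero stays zero and the huge maximum is compressed.
log_amounts = np.log1p(amounts)
print(log_amounts.round(2))
```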
Okay. Now let's get the histogram of this. I will say seaborn histplot, and with numpy I will use a log transform to get a smoother histogram. I will pass data frame amount, I'm going to pass bins as 100, I will pass kde as true, and I will pass the color green. Next up, I will say title, Transaction Amount Distribution, and I will add log scaled, and let's say, maybe log scale is better. plt.xlabel, log of amount plus one, and I will say plt.show. So here we are going to have our histogram in a second with this code. We can see our histogram in here, and we can see the skew. Great.
Next up, let's look at the relationship between fraud and amount. So I will say seaborn boxplot, and I will say data is data frame. I'm going to filter the amount, which is going to be less than 50,000, and I will say x is isFraud, and next up I'm going to say y will be amount. And we can give it a title like Amount versus isFraud, and we will say filtered under 50k. I will call plt.show, and we are going to have our plot ready. Here it is. So from here we can see that, for amounts filtered under 50k, the fraud side has higher amounts and the non-fraud side has the lower median, closer to 10,000.
Okay, let's see the balance changes and the anomalies. So firstly, I'm just going to create two different columns as the balance difference of the origin account and the destination account, and I'm going to try to see if we have any negative balances on that side. So firstly, I will say data frame balance difference original equals data frame old balance original minus data frame new balance original.
Next up, I will say, actually, let me quickly check the column names here. Okay, we have this. Next, we are going to do the same for the destination. For the destination, we will say data frame new balance destination minus data frame old balance destination, like this.
Okay. So now we can check if we have any negative values on that side. I will say balance diff original, and we could go with values close to zero, but actually let's not take the values close to zero. I will directly compare that with zero, and we are going to have a pandas Series like this, and we can just put this into parentheses and call sum, like this. So we are going to have, oh, we have a lot of balance differences with a value less than zero, which means negative, on the original side. That's interesting. Let's see this for the destination. So I'm just going to copy this and change it. Here I will just pass destination, and we can see that we have a lower value, but this is a big value again.
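The two balance-difference columns and the negative-value check can be sketched as below. The rows are made up, the four balance column names are assumed to follow the PaySim naming, and `balanceDiffOrig` / `balanceDiffDest` are illustrative names for the new columns.

```python
import pandas as pd

# Made-up balances before and after three transactions.
df = pd.DataFrame({
    "oldbalanceOrg": [1000.0, 0.0, 500.0],
    "newbalanceOrig": [400.0, 250.0, 500.0],
    "oldbalanceDest": [0.0, 900.0, 100.0],
    "newbalanceDest": [600.0, 650.0, 100.0],
})

# Sender side: how much left the account. Receiver side: how much arrived.
df["balanceDiffOrig"] = df["oldbalanceOrg"] - df["newbalanceOrig"]
df["balanceDiffDest"] = df["newbalanceDest"] - df["oldbalanceDest"]

# A boolean Series sums to the number of True values.
negative_orig = (df["balanceDiffOrig"] < 0).sum()
negative_dest = (df["balanceDiffDest"] < 0).sum()
print(negative_orig, negative_dest)
```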
Okay. So in the data, I will just call DataFrame.head, we have a column named step, and it increases over time. So I'm just going to create a plot. Then I'm going to drop the step. So I'm just going to prepare the dropping code. I will pass columns, and I will take the step. I will set inplace equal to true, so it will modify the original data frame. And after visualizing the number of frauds by step, I'm going to drop the column. So let's do that.
Frauds per step is going to be data frame, where data frame isFraud equals one, and I will take the values of step. I will use value_counts, actually with lowercase letters. I will take the value counts and I will say sort_index, like this. Next up, I will use plt.plot, and I will pass frauds per step dot index. Next up, I'm going to pass frauds per step dot values, and I will say label is going to be Frauds per Step. And we can customize our plot: x label is going to be Step, which means time in our data frame, y label will be Number of Frauds, and title is going to be Frauds over Time. I'm going to set the grid as true, and I will use plt.show, like this. So here it is. Actually, it seems not to depend on time.
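A sketch of the frauds-per-step count and the column drop, on made-up rows and assuming the columns are named `step` and `isFraud`. The plotting call is only mentioned in a comment so the example depends on pandas alone.

```python
import pandas as pd

# Made-up rows: step is the time index, isFraud the 0/1 label.
df = pd.DataFrame({
    "step": [1, 1, 2, 3, 3, 3],
    "isFraud": [1, 0, 0, 1, 1, 0],
    "amount": [900.0, 20.0, 35.0, 5000.0, 7000.0, 12.0],
})

# Keep the fraud rows, count them per step, and order the result by step.
frauds_per_step = df[df["isFraud"] == 1]["step"].value_counts().sort_index()
print(frauds_per_step)
# In the notebook: plt.plot(frauds_per_step.index, frauds_per_step.values)

# The step column is not used for modeling, so drop it in place.
df.drop(columns="step", inplace=True)
```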
Okay, we are not going to use the time on the modeling side. So I'm just going to drop that. And when I call DataFrame.head, we are not going to see the step from now on. Great.
So let's go customer-wise. I want to find the customers who make the highest number of transactions, like the top senders and top receivers. Firstly, I will start with top senders, and it's going to be name original. We are going to go with this column. I will use value_counts, and let's take the top 10. So on the top sender side, we have three for each, generally. Let's see if we have the same equality on the top receivers side too. Top receivers is going to be data frame name destination value_counts, head of 10, and after that I'm going to show top receivers. So we don't have equality on this side, and customer-wise we can see who receives the highest number of transactions.
So let's see the fraud-making customers. I will say fraud users, and I will pass data frame, where data frame isFraud, of course we are going to filter this as true, which means one, and I will pass name original value_counts, and I will take the top 10, and I will show the fraud users like this. Okay, we have equality on this side. Great. Let's analyze the transfer
and cash-out types, because in our plot, let me quickly go to that, we saw that these two transaction types are the most open ones to fraud. So let's just come here, and I will say fraud types, and it's going to be data frame, where data frame type isin, and I'm going to pass the list as transfer and cash-out, like this. So after this, we are going to have the fraud types data frame like this, which has only two values on the type side. Let me quickly show you with type value_counts: we are going to have cash-out and transfer only. And we can create a count plot on this, like seaborn countplot: data is going to be fraud types, x is going to be type, hue is going to be isFraud. Next up, I'm going to give it a title, Fraud Distribution in Transfer and Cash-Out, like this, and I will use plt.show after that. So we are going to see this distribution in here. Okay, since we have really low fraud values, we can't even see the orange, but we can see cash-out dominates this chart.
Now we can go with a correlation matrix. So we are going to filter the numeric columns, like
data frame amount, old balance original, new balance original. Next we have old balance destination, we have new balance destination, and we have isFraud. And I've got one in here that I'm going to delete. I will call corr like this, and I'm going to run this. Now on the correlation side we are going to have a matrix like this, and we can visualize it using a seaborn heatmap. I will say seaborn heatmap, I will pass the correlation matrix we have, and annot will be true. I'm going to set the color map as coolwarm. Actually, I will pass it like cmap coolwarm. And next up, I'm going to say fmt, two-decimal float, I'm going to format it like this. I will say plt.title, and it's going to be Correlation Matrix, and I will use plt.show. So let's see our correlation matrix.
Here we can see the high correlations in our data. The ones are the intersection of a column with itself, and we have 0.98, which means high correlation, between new balance destination and old balance destination, which is kind of normal, and we have a correlation of 0.46 between new balance destination and amount, not that strong but not that weak.
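The correlation step can be sketched as follows, on six made-up rows and assuming PaySim-style column names. `DataFrame.corr()` computes pairwise Pearson correlation by default, so the diagonal is 1 and every value lies between -1 and 1. The heatmap call is left as a comment.

```python
import pandas as pd

# Made-up numeric columns; values are for illustration only.
df = pd.DataFrame({
    "amount": [100.0, 2500.0, 900.0, 40.0, 7000.0, 300.0],
    "oldbalanceOrg": [500.0, 2500.0, 1000.0, 40.0, 7000.0, 800.0],
    "newbalanceOrig": [400.0, 0.0, 100.0, 0.0, 0.0, 500.0],
    "oldbalanceDest": [0.0, 1000.0, 50.0, 300.0, 0.0, 2000.0],
    "newbalanceDest": [100.0, 3500.0, 950.0, 340.0, 7000.0, 2300.0],
    "isFraud": [0, 1, 0, 0, 1, 0],
})

numeric_cols = ["amount", "oldbalanceOrg", "newbalanceOrig",
                "oldbalanceDest", "newbalanceDest", "isFraud"]

# Pairwise Pearson correlation between the selected columns.
corr = df[numeric_cols].corr()
print(corr.round(2))
# In the notebook: sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
```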
So just to recap, correlation can take values between minus one and one, where minus one means a strong negative relationship, one means a strong positive relationship, and zero means they don't really have a relationship, or not a strong one.
Okay, now I want to filter the customers who have a balance that goes to zero after the transfer. So I will say zero after transfer, and it's going to be data frame. We are going to add three filters on this side. Firstly, data frame old balance original is going to be greater than zero, and data frame new balance original equals zero, and we will say data frame type isin, actually like this, transfer and cash-out. So we can just say zero after transfer, and we can take the length of this to see the number of customers, like this. Here we have more than a million. These records, this list, are just suspicious records, and we can check them if we want to find out whether there is an issue.
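The three-condition filter can be sketched like this, on made-up rows and assuming PaySim-style column names. Each condition needs its own parentheses because `&` binds more tightly than the comparisons.

```python
import pandas as pd

# Made-up rows covering each way a transaction can fail the filter.
df = pd.DataFrame({
    "type": ["TRANSFER", "CASH_OUT", "PAYMENT", "TRANSFER"],
    "oldbalanceOrg": [500.0, 0.0, 300.0, 800.0],
    "newbalanceOrig": [0.0, 0.0, 0.0, 200.0],
})

# Sender had money, ended at exactly zero, and the type is TRANSFER or CASH_OUT.
zero_after_transfer = df[
    (df["oldbalanceOrg"] > 0)
    & (df["newbalanceOrig"] == 0)
    & (df["type"].isin(["TRANSFER", "CASH_OUT"]))
]
print(len(zero_after_transfer))
```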
Great. So now we are going to get to the feature engineering, but we have an important thing: in the isFraud value counts, we have a class imbalance situation. So for that, let's do feature selection and preparation. We are going to make some imports from scikit-learn. We will say from sklearn.model_selection import train_test_split, from sklearn.preprocessing import StandardScaler, from sklearn.linear_model import LogisticRegression for the modeling side, and from sklearn.metrics import classification_report with confusion_matrix, from sklearn.pipeline import Pipeline, from sklearn.compose import ColumnTransformer, and from sklearn.preprocessing import OneHotEncoder.
Okay, so these are going to be our imports, and in here we are going to use train_test_split for splitting our data into training and testing sets. StandardScaler for scaling our data. LogisticRegression for the modeling side. The scikit-learn metrics classification_report and confusion_matrix, these are for model evaluation. Pipeline for training the model and doing the transformation operations together.
ColumnTransformer and OneHotEncoder for the data transformation. So I'm going to call DataFrame.head again. And we have type, amount, name original, old balance original, new balance original, name destination, old balance destination, new balance destination, isFraud, isFlaggedFraud, balance difference original, and balance difference destination.
So what we are going to do is, firstly, we are going to drop some columns. Data frame model, we will say, is data frame drop, and I'm going to drop the ones that I'm not going to use in the modeling: name original, name destination, and isFlaggedFraud, and I'm going to set the axis as one. So I'm going to run this, and let's see the data frame model head. Okay, it seems great.
Next up, we are going to set the categorical and numerical columns. Categorical is going to be type, and only this one, and numeric is going to be amount, old balance original, new balance original, old balance destination, and new balance destination.
Okay, seems cool. Now I'm going to set the target. y is going to be data frame model, and the target is going to be isFraud, and X is going to be data frame model dot drop. We are going to drop isFraud like this, and axis is going to be one. Okay. So it's going to be a capital X. It's the general usage, and I will use it like this too.
Now we are going to make the train test split: X train, X test, y train, y test. We will use train_test_split. We will pass X, y, a test size of 0.3, and stratify y. So in here we put 30% of our data in the testing set and 70% in the training set: X train and y train hold 70% of the data, and X test and y test hold 30% of the data.
Next up, we are going to do preprocessing. So we will say preprocessor, and it's going to be ColumnTransformer. We will pass transformers, and I'm going to give a list. For the numerical ones, it will use StandardScaler, and I will pass numeric too. Also, I need to use parentheses for the correct initialization. And for the categorical, we are going to use OneHotEncoder, and I will say drop first, and also I'm going to pass categorical, like we defined. Next up, after this I will say remainder drop, and after that we are going to create a model pipeline. So let me fix the typo first: transformers.
Okay, now it's cool. Now we are going to create the model pipeline. So I will say pipeline, and I will use the Pipeline class from scikit-learn, with prep as the preprocessor, and the classifier is going to be LogisticRegression. This part is really important. We are going to set the class weight as balanced because we want to handle the class imbalance situation: more than 99% of our data is not fraud. So our model is going to lean toward predicting non-fraud for every input we pass if we don't set this class weight to balanced. We are handling the class imbalance situation in here. And after this, I'm going to say max iterations is going to be a thousand.
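Putting the split, the preprocessor, and the pipeline together, a sketch could look like the one below. The data is synthetic, generated with a made-up fraud rule purely so the example runs; the column names follow the PaySim convention, and the step names "prep" and "clf" as well as `random_state=42` are assumptions added for illustration and reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the real data; the fraud rule below is invented.
rng = np.random.default_rng(0)
n = 400
types = rng.choice(["PAYMENT", "TRANSFER", "CASH_OUT"], size=n)
amount = rng.uniform(10, 10_000, size=n)
old_orig = amount + rng.uniform(0, 5_000, size=n)
is_fraud = ((types != "PAYMENT") & (amount > 8_000)).astype(int)

df_model = pd.DataFrame({
    "type": types,
    "amount": amount,
    "oldbalanceOrg": old_orig,
    "newbalanceOrig": np.where(is_fraud == 1, 0.0, old_orig - amount),
    "oldbalanceDest": rng.uniform(0, 5_000, size=n),
    "isFraud": is_fraud,
})
df_model["newbalanceDest"] = df_model["oldbalanceDest"] + df_model["amount"]

categorical = ["type"]
numeric = ["amount", "oldbalanceOrg", "newbalanceOrig",
           "oldbalanceDest", "newbalanceDest"]

y = df_model["isFraud"]
X = df_model.drop("isFraud", axis=1)

# stratify=y keeps the fraud share the same in the train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numeric),
        ("cat", OneHotEncoder(drop="first"), categorical),
    ],
    remainder="drop",  # any column not listed above is discarded
)

pipeline = Pipeline([
    ("prep", preprocessor),
    # class_weight="balanced" reweights classes inversely to their frequency.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(y_pred[:10])
```

Because scaling and encoding live inside the pipeline, the same transformations are applied automatically whenever `predict` is called on new rows.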
So next up, actually, the points need to be at the end, like this. Okay, next up, I'm going to train my model together with the preprocessing. So we will use pipeline.fit, and I will pass X train with y train. So when I do that, it's going to train the model, and I'm going to be recording again when this finishes.
Okay, our model is ready. Here we can see the pipeline steps: StandardScaler, OneHotEncoder, and LogisticRegression, like this. So we can make predictions using the predict method right now with our machine learning model. We can say predict and pass X test, and we are going to have a prediction array like this. So I'm going to save them as y prediction, and I'm going to compare them with y test right now. I will say classification_report. I will pass y test and y prediction, like this. I will also wrap it in print. It's not looking pretty. Now it's going to be better like this. Great. And we can also get the confusion matrix for our model with y test and y prediction. Here it is: true negatives, false positives, false negatives, and true positives.
Okay. So in here we can say that our model is good at catching the fraud, and precision is not that good. In here we could try other methods for class imbalance, like SMOTE or undersampling, or we could try different models, but I'm not going to do that. I'm going to use this model. It's okay for me, and it can be used, in my opinion. With pipeline.score, X test with y test, we have 94% accuracy, and I think it's pretty cool. I'm going to continue with this model. Again, we can improve this precision, it would 100% get better, but I'm not going to make this video too long. You can go with the methods that I shared for fixing this situation. But for this tutorial, I think this accuracy is going to be more than enough. Great.
Now it's time for exporting this pipeline. So we will say import joblib first, and then we are going to say joblib.dump. We are going to export the pipeline, and we can name it like fraud detection pipeline.pickle.
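The evaluation and export steps can be sketched on a small stand-in pipeline. Everything here is illustrative: the data is synthetic, the pipeline is simpler than the one in the video, and the file name `fraud_detection_pipeline.pkl` is an assumed spelling of the name spoken above. For binary labels, scikit-learn's `confusion_matrix` is laid out as true negatives, false positives, false negatives, true positives when flattened.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-feature data with a minority positive class.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

# Per-class precision, recall, and F1; print() keeps the table formatting.
print(classification_report(y_test, y_pred))

# Rows are true labels, columns are predictions: [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = pipeline.score(X_test, y_test)  # plain accuracy for classifiers

# Save the fitted pipeline and load it back, as the web app will do.
path = os.path.join(tempfile.mkdtemp(), "fraud_detection_pipeline.pkl")
joblib.dump(pipeline, path)
loaded = joblib.load(path)
```

With fraud at roughly 0.13% of the rows, accuracy on its own says little; the per-class precision and recall in the report are the more informative numbers.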
And here in my file directory, I don't have a pickle file like this. And when I run this, we are going to have fraud detection pipeline.pickle in here. And we are going to be using this for getting predictions from our model in our Streamlit app.
So let's summarize what we did in here. Next, we can move to the Streamlit app creation. Firstly, we imported pandas, numpy, Matplotlib, and seaborn. Then we loaded our data, did some analysis and some data visualization, and analyzed the data set like this with different chart types. And after that, we have a correlation matrix in here. Next up, we started to do feature engineering. First, we made our imports: ColumnTransformer and OneHotEncoder on the transformation side, Pipeline for creating a data pipeline with preprocessing and modeling, classification_report and confusion_matrix on the model evaluation side, LogisticRegression on the modeling side, StandardScaler for scaling, and train_test_split for data splitting. We selected the features that we are going to be using in the model. Then we defined the columns with their data types like this, and we saved X and y with the target, and we used train_test_split with 0.3, which means that 30% is going to be in the testing set. We set our column transformer for numerical and categorical. Next, we set our pipeline and trained our model. We got predictions and we evaluated our model.
At the end, we exported our pipeline, and we are ready to create our web app. So now I'm going to create a file like fraud detection.py, and we are going to create our web app for using our model in here. So first we will say import streamlit as st, import pandas as pd, and import joblib, and we are going to load our model: model is going to be joblib.load. Let me copy the exact name in here. I'm going to copy this and paste it in here. So now, after we load it like this, we can just say model.predict, and it's going to work on new predictions. So let's set a title for our Streamlit app: Fraud Detection Prediction App. Next up, I will say st.markdown, and please enter the transaction details and use the predict button.
Next up, I will say st.divider for a better look. And I will say transaction type is going to be our first input, and it's going to be st.selectbox. We will say Transaction Type, and we are going to pass the transaction types like payment, transfer, cash out. Actually, they need to be uppercase, like PAYMENT, TRANSFER, CASH_OUT, and also DEPOSIT inside the list, like this. Next up, we are going to set the amount, and it's going to be st.number_input. We will pass it a label like Amount, and let's say minimum value is going to be 0.0, so we will take a float as input, and let's set the default value as 1,000. Next up, I'm going to take the old balance original like this: st.number_input, Old Balance, and I will say Sender. Next up, I'm going to say minimum value is going to be 0.0 and value is going to be 10,000, like this.
Okay, next up I'm going to say new balance original, and it's going to be st.number_input again, New Balance Sender, and we will say minimum value, let's say 0.0 again, and value, let's set this to 9,000.
Next up, we are going to do the same for the old balance destination and new balance destination. So old balance destination is going to be st.number_input. We will say Old Balance, and I will pass Receiver. We will say minimum value will be 0.0, and value is going to be, let's say, 0.0.
Also, we are going to make a last one. New balance destination is going to be st.number_input, New Balance Receiver, we will say minimum value is going to be 0.0, and let's set the value to 0.0.
Okay. So now we are going to define the prediction button. So if st.button, which means that if this button gets clicked, I will label it Predict, and we will say input data is going to be a pandas DataFrame. We will pass a dictionary inside a list, like in here. And we are going to say type is going to be transaction type, amount is going to be amount, old balance original is going to be old balance original, and we are going to do the same for the others: new balance original is going to be new balance original, old balance destination is going to be old balance destination, and new balance destination is going to be new balance destination, like this. So here our data frame is ready, and now we will say prediction is going to be model.predict input data, and we are going to take the first index. This is going to return zero or one.
So we will say st.subheader, and we are going to use an f-string, Prediction, and it's going to be integer prediction, like this. And we could say in here what zero means, but actually let's not go for that. Next up, this is going to be better: if prediction equals one, st.error, this transaction can be fraud, and else we will say st.success, this transaction looks like it is not a fraud. And here it is. Great.
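The part of the app that does not depend on Streamlit can be sketched as two small functions. `build_input_frame` and `describe_prediction` are hypothetical helper names, and the column names are assumed to match the ones the pipeline was trained on. In the app itself, they would be called inside the `if st.button("Predict"):` block, with the returned message passed to `st.error` or `st.success`.

```python
import pandas as pd

# Assumed feature names; they must match the training columns exactly.
FEATURE_COLUMNS = ["type", "amount", "oldbalanceOrg", "newbalanceOrig",
                   "oldbalanceDest", "newbalanceDest"]


def build_input_frame(transaction_type, amount, old_orig, new_orig,
                      old_dest, new_dest):
    """Wrap one transaction in a single-row DataFrame for model.predict."""
    return pd.DataFrame([{
        "type": transaction_type,
        "amount": amount,
        "oldbalanceOrg": old_orig,
        "newbalanceOrig": new_orig,
        "oldbalanceDest": old_dest,
        "newbalanceDest": new_dest,
    }])


def describe_prediction(prediction):
    """Turn the model's 0/1 output into the message shown in the app."""
    if int(prediction) == 1:
        return "This transaction can be fraud"
    return "This transaction looks like it is not a fraud"


# The default values from the form: a 1,000 payment from a 10,000 balance.
input_data = build_input_frame("PAYMENT", 1000.0, 10000.0, 9000.0, 0.0, 0.0)
print(input_data)
print(describe_prediction(0))
```

Keeping the DataFrame construction in one function makes it easier to spot a mismatch between the form fields and the columns the pipeline expects.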
So we are going to save this Streamlit file and run this app. Then, after showing you how this app looks and how it works, I'm going to summarize the code in here. So let's go to Streamlit. I'm going to open the terminal, and from here we are going to say streamlit run fraud detection.py, the file name that you set when you were creating your Python file. And when you press enter on that, oh, we got a typo. Let me quickly fix that. In which one? Old balance original. Okay, min value. Now it's going to be saved. And I'm just going to open a new one. streamlit run fraud detection.py. Now it's working smoothly.
Okay. In here, a Streamlit page has just opened at this address, the local URL. You can just use control or command plus click, or you can copy this address and paste it into your browser. You are hosting this app locally and viewing it in your browser right now. So just go to this address, and I'm going to be recording again on my browser right now.
And here I am at the Streamlit page, and I'm just going to set this to dark mode quickly, like this. And now we can test our app. So we have Fraud Detection Prediction App, and here, transaction type payment and amount. Here we can see the values. I'm just going to try predict with the default values. Here we see that it says this transaction looks like it is not a fraud. So let's make the new balance zero and increase the old balance like this. And let's see the result. It's not a fraud again. Okay, our model is doing a great job on predictions. So let's say that we are going to do cash-out, and amount is going to be a thousand. Let's skip this, and I'm just going to make this a huge number, and I'm just going to set this to zero. So when I say predict, it says prediction is one, a fraud detection, and it says this transaction can be fraud. Okay, it's working nicely.
Let's also see this on the transfer side. I will say predict, and it says this transaction can be fraud. Amazing. Great. So let's make another example with payment. I will say 5,000 in here, like this. And I will say 10,000. And I will say 5,000 on this. And let's see our prediction. It says this transaction looks like it's not a fraud. So our model and our web app are working really, really well. I'm going to be recording back on my code editor right now.
So here on the Streamlit side, we started with importing streamlit, pandas, and joblib. Then we loaded our pipeline in here as model. Then we gave our app a title and we wrote a markdown as please enter the transaction details. Actually, there was a typo. I'm just going to delete that. Doesn't matter. We take the inputs from the user in here.
One select box for the categorical one. The others are number inputs. And we define the st.button Predict, and in here we just set the input data like this in a DataFrame. Then we pass this into model.predict and get the prediction with this. Next up, we write the prediction, and we display whether this is a fraud or not. That was it for the coding part. Let's get to the outro.
Thanks for watching this tutorial. I have a playlist named data science and machine learning projects where I have more than 40 videos just like this one. You can reach that playlist from the cards of this video or from the links in the description. Also, I'm sharing a new data science video every week on my channel. You can subscribe for more videos like this. Have a great day.