
Fraud Detection Using Machine Learning – Full Python Data Science Project (94% Accuracy)

By Onur Baltaci

Summary

Topics Covered

  • Transfer and Cash-Out Are the Only Fraud-Prone Transaction Types
  • Higher Transaction Amounts Signal Higher Fraud Probability
  • Class Imbalance: The Hidden Challenge in Fraud Detection

Full Transcript

Welcome back to a new data science project. In this video, we are going to train a machine learning model for fraud detection, and we are going to build a Streamlit web app for deploying it. So we are going to be doing data analysis, training a machine learning model, creating a Streamlit web app, and much more. I'm going to be sharing the dataset that I used in this video; you can get it from the link in the description. You can see the app that we are going to create on the screen right now. Let's start coding.

Okay, we are on the data card of the dataset that we're going to be using for the fraud detection, and I'm going to add this link to the description of this video. You can just visit the link and use the download button at the upper right side of the Kaggle page to download the dataset and code along with me. So what I'm going to do right now is take a quick look at this Kaggle page, and then we are going to explore it more in a Jupyter notebook using Python.

Okay, we have a fraud detection dataset. It says "detect fraud on the go with a comprehensive dataset," and it says we have more than six million rows. Great. And it says some of these records were flagged by existing algorithms. Cash-in is the process of increasing the balance of an account by paying in cash to a merchant. Cash-out is the opposite process of cash-in: it means withdrawing cash from a merchant, which decreases the balance of the account. Debit is a similar process to cash-out and involves sending money from the mobile money service. Payment is the process of paying for goods or services to merchants, which decreases the balance of the sender's account and increases the balance of the receiver. And on the transfer side, this is the process of sending money to another user of the service through the mobile money platform. Great.

So here in the table we have the columns, and I'm just going to take a quick look at them. We have step, type, amount, the name of the origin account with its old and new balance, the name of the destination account with its old and new balance, and then the isFraud and isFlaggedFraud columns. So we have this isFraud column in here, and that is what we are going to try to predict. Great. So again, you can use this download button to get the dataset, and you can just put that CSV file in the directory where you're going to create your Jupyter notebook and Python script.

And I'm going to be recording in the code editor in a second. Right now I'm in VS Code, but you can use any code editor that you want. Firstly, I'm going to create a file like analysis_and_model. We are going to start with a Jupyter notebook, and let's select our kernel as Python 3.11.4. In the next step, I'm going to create a code cell. Here is the data file that you're going to download from Kaggle, the machine learning dataset CSV, and I'm going to do my imports. Firstly, I will say import pandas as pd and import numpy as np. For the data visualization we will import matplotlib.pyplot as plt, and also we will say import seaborn as sns. At the first place I want to set the seaborn style as whitegrid, and I also want to import warnings and filter the warnings with warnings.filterwarnings("ignore"). So after this we can start our analysis. Firstly, I'm going to say df, and it's going to be equal to pd.read_csv, and I'm going to take the dataset name in here. Next up, I'm going to paste it in here.
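The setup cell described above could look like the minimal sketch below. The filename is a placeholder for whatever you named the Kaggle download, and the tiny synthetic fallback frame (with PaySim-style columns) is my stand-in so the later cells still run without the real 6.3M-row file:

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so plots render headlessly
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set_style("whitegrid")
warnings.filterwarnings("ignore")

try:
    # Placeholder filename -- use whatever you named the Kaggle download.
    df = pd.read_csv("fraud_dataset.csv")
except FileNotFoundError:
    # Tiny synthetic stand-in with the PaySim-style columns.
    df = pd.DataFrame({
        "step": [1, 1, 2, 3],
        "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "CASH_IN"],
        "amount": [9839.64, 181.0, 181.0, 229133.94],
        "nameOrig": ["C1231006815", "C1305486145", "C840083671", "C2048537720"],
        "oldbalanceOrg": [170136.0, 181.0, 181.0, 15325.0],
        "newbalanceOrig": [160296.36, 0.0, 0.0, 244458.94],
        "nameDest": ["M1979787155", "C553264065", "C38997010", "M1230701703"],
        "oldbalanceDest": [0.0, 0.0, 21182.0, 0.0],
        "newbalanceDest": [0.0, 0.0, 0.0, 0.0],
        "isFraud": [0, 1, 1, 0],
        "isFlaggedFraud": [0, 0, 0, 0],
    })

print(df.head())
```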

So after this, we are going to have our data frame loaded, and when it finishes I'm going to check it with df.head(). And here we have our data frame. Once again we have the step, type, amount, origin name, old balance, new balance, destination name, old balance destination, new balance destination, isFraud, and isFlaggedFraud columns in our Jupyter notebook. Great, let's start analyzing it. Firstly, I want to get the general information about the data types and also the memory usage, so I'll call df.info(). We have five floats, three integers, and three object-typed columns, and the memory usage reads 134 here.

Okay. So I'm just going to take the column names here, and I will say df["isFraud"].value_counts() to see the counts of fraud and non-fraud. So we have around 8,200 frauds and about 6,354,000 non-frauds. Also, I want to see the same statistics for isFlaggedFraud, so I'm going to run that too, and here we have a really low number on the flagged side. Okay, firstly let's see if we have any NA values in our dataset. I'm going to call df.isnull(), and this returns cell-wise booleans. But if we follow it with sum(), it returns the column-wise NA counts, and if we chain sum() once more, we can see the total for the whole dataset. Okay, we don't have any NA values. And let me quickly check the shape of the dataset: we have about 6,362,000 rows with 11 columns. Okay, now I want to see the percentage of frauds in the total data. So I will say df["isFraud"].value_counts(), and let me see the output of this one more time. Okay, we are going to take the fraud count from this, and we are going to divide it by df.shape[0]. So this gives us the ratio, and to get the percentage we can multiply it by 100.
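As a minimal sketch of those checks, on a tiny synthetic frame standing in for the real data (so the toy percentage will differ from the video's 0.13%):

```python
import pandas as pd

# Tiny stand-in for the 6.3M-row Kaggle frame (column names as in PaySim)
df = pd.DataFrame({
    "amount": [100.0, 250.0, 80.0, 9000.0, 120.0],
    "isFraud": [0, 0, 0, 1, 0],
})

# Cell-wise booleans -> column-wise NA counts -> grand total
total_na = df.isnull().sum().sum()

fraud_counts = df["isFraud"].value_counts()
fraud_pct = fraud_counts[1] / df.shape[0] * 100

print(total_na)             # 0
print(round(fraud_pct, 2))  # 20.0 on this toy frame; ~0.13 on the real data
```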

We can wrap it in parentheses like this; maybe necessary, maybe not, but I prefer it. So the fraud percentage we have in our data is 0.12%, and if you round it, maybe 0.13 is clearer. We can round it to, let's say, one decimal, actually, my bad, two decimals. Okay, 0.13. Great. So if we train the model directly on this, we are going to have class imbalance; we are going to handle that situation later in this tutorial. Firstly, let's analyze our data by visualizing the important things about fraud detection. I'm going to start by visualizing the transaction types.

So: df["type"].value_counts(), and then I will say .plot(kind="bar"). Next up, a title: we'll give it a title like "Transaction Types," and we can specify a color; I love sky blue, so I will pass that. Next up, the x label is going to be "Transaction Type" and the y label is going to be "Count," and I will call plt.show(). And here we have our plot. So cash-out is the leader (on the left side you can see the axis uses scientific notation), payment is second, and the last one is debit. Okay. Now let's find the fraud rates by type. So I will say fraud_by_type, and it's going to be df.groupby("type"), I will take "isFraud", and on the aggregation side I will use mean(). Next up I will say sort_values(ascending=False). Then fraud_by_type.plot with kind="bar" and the title "Fraud Rate by Type"; next, let's use another color like salmon, and the y label will be "Fraud Rate," and we call plt.show(). So here it is.
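Both plots, sketched on a toy frame (the fraud rates below are from the toy labels, not the real data; on the real frame the mean of the 0/1 isFraud column per type is exactly the per-type fraud rate):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in frame (columns as in the Kaggle data)
df = pd.DataFrame({
    "type": ["CASH_OUT", "PAYMENT", "CASH_OUT", "TRANSFER", "DEBIT", "CASH_IN"],
    "isFraud": [1, 0, 0, 1, 0, 0],
})

# Transaction-type counts as a bar chart
df["type"].value_counts().plot(kind="bar", color="skyblue",
                               title="Transaction Types")
plt.xlabel("Transaction Type")
plt.ylabel("Count")
plt.show()

# Mean of the 0/1 isFraud label per type == fraud rate per type
fraud_by_type = df.groupby("type")["isFraud"].mean().sort_values(ascending=False)
fraud_by_type.plot(kind="bar", color="salmon", title="Fraud Rate by Type")
plt.ylabel("Fraud Rate")
plt.show()

print(fraud_by_type)
```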

So our fraud rate in general is 0.13%, and here we can see the distribution: transfer is leading and cash-out is in second place on the fraud rates type-wise, while cash-in, debit, and payment are nearly zero, maybe exactly zero. So let's quickly check whether they are zero or not. Okay, they are zero. It makes sense. Now let's go over the amount statistics. So I will say df["amount"] and call describe() on it. We are going to see numbers in scientific notation, so we can cast the stats to integers with astype to read them more easily. Here we can see the numbers. The minimum amount is zero; makes sense, we have no negatives on that side. And the maximum is 92 million. And we can see the quantiles. So we have outliers on this side: we have a mean of 179,000, and our standard deviation is large, since the difference between the maximum value and the quantiles is large.
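A quick sketch of that describe-and-cast step, on toy amounts with one deliberate outlier standing in for the real column:

```python
import pandas as pd

# Toy amounts with one large outlier, standing in for the real column
amounts = pd.Series([0.0, 50.0, 120.0, 300.0, 92_000.0], name="amount")

stats = amounts.describe()
print(stats.astype(int))  # cast away the scientific notation for readability
```

The outlier drags the mean and standard deviation far above the median, which is the same pattern the transcript observes on the full data.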

Okay. Now let's get a histogram of this. I will use seaborn's histplot, and I will apply a NumPy log transform for a smoother histogram of df["amount"]. I'm going to pass bins as 100, kde as True, and the color green. Next up, the title: "Transaction Amount Distribution," and I will add "(Log Scaled)"; let's say log-scaled reads better. The x label will be log(amount + 1), and I will say plt.show(). So we have our histogram with this code, and we can see the skew. Great. Next up, let's look at the relationship between fraud and amount. So I will use a seaborn boxplot, and I'm going to filter the data frame to amounts less than 50,000. I will say x is "isFraud" and y is "amount," and we can give it a title like "Amount vs. isFraud (filtered under 50k)." I will call plt.show(), and we have our plot ready. Here it is. So from here we can see that within this under-50k slice, higher amounts come with higher fraud rates, and the mean on the non-fraud side is lower, closer to 10,000.

Okay, let's see the balance changes and the anomalies. So firstly I'm going to create two new columns: the balance difference of the origin account and of the destination account. And I'm going to try to see if we have any negative balance differences on that side. So firstly I will say df["balanceDiffOrig"] equals df old balance origin minus df new balance origin. Next up, actually, let me quickly check the column names here. Okay, we have them. Next, we are going to do the same for the destination: df["balanceDiffDest"] equals df new balance destination minus df old balance destination. Okay. So now we can check whether we have any negative values on that side. I will take balanceDiffOrig, and actually let's not bother with values close to zero, I will directly compare against zero. That gives us a pandas series of booleans, and we can just wrap it in parentheses and call sum() on it. So, oh, we have a lot of balance differences with a value less than zero, which means negative, on the origin side. That's interesting. Let's see this for the destination. So I'm just going to copy this and change it to destination, and we can see that we have a lower count, but it is a big number again.
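The two derived columns and the negative-value counts, as a sketch on a toy frame (the column names follow the Kaggle data; the new balanceDiff names are the video's additions):

```python
import pandas as pd

# Toy frame with PaySim-style balance columns
df = pd.DataFrame({
    "oldbalanceOrg":  [1000.0,  500.0, 0.0],
    "newbalanceOrig": [ 800.0,  700.0, 0.0],   # middle row *gained* money
    "oldbalanceDest": [   0.0,  100.0, 50.0],
    "newbalanceDest": [ 200.0,    0.0, 50.0],  # middle row's dest *lost* money
})

# How much each side's balance moved during the transaction
df["balanceDiffOrig"] = df["oldbalanceOrg"] - df["newbalanceOrig"]
df["balanceDiffDest"] = df["newbalanceDest"] - df["oldbalanceDest"]

# Count the anomalous negative differences on each side
neg_orig = (df["balanceDiffOrig"] < 0).sum()
neg_dest = (df["balanceDiffDest"] < 0).sum()
print(neg_orig, neg_dest)  # 1 1 on this toy frame
```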

Okay. So in the data, I will just call df.head(), we have a column named step, and it's increasing by day. So I'm going to create a plot with it, and then I'm going to drop step. So I'm just going to prepare the dropping code: I will pass columns and take "step", and I will set inplace equal to True, so it will modify the original data frame. And after visualizing the number of frauds per step, I'm going to drop the column. So let's do that. frauds_per_step is going to be df filtered to df["isFraud"] equal to one, I will take the values of "step", I will use value_counts(), in lowercase letters, and I will say sort_index(), like this. Next up I will use plt.plot, and I will pass frauds_per_step.index and then frauds_per_step.values, and the label is going to be "Frauds per Step." And we can customize our plot: the x label is going to be "Step," which means time in our data frame, the y label will be "Number of Frauds," and the title is going to be "Frauds over Time." I'm going to set the grid to True and call plt.show(), like this. So here it is; actually, it seems non-time-dependent.
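That per-step plot plus the drop, sketched end to end on a toy frame:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Toy frame: step is a time index, isFraud the label
df = pd.DataFrame({
    "step":    [1, 1, 2, 2, 3, 4],
    "isFraud": [1, 0, 1, 1, 0, 1],
})

# Fraud count per step, in chronological order
frauds_per_step = df[df["isFraud"] == 1]["step"].value_counts().sort_index()

plt.plot(frauds_per_step.index, frauds_per_step.values, label="Frauds per Step")
plt.xlabel("Step")
plt.ylabel("Number of Frauds")
plt.title("Frauds over Time")
plt.grid(True)
plt.legend()
plt.show()

# The plot showed no clear time pattern, so drop the column before modeling
df.drop(columns=["step"], inplace=True)
print(list(df.columns))  # ['isFraud']
```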

Okay, we are not going to use the time on the modeling side, so I'm just going to drop it, and when I call df.head(), we won't see step from now on. Great. So let's go customer-wise. I want to find the customers who make the highest number of transactions, like the top senders and top receivers. Firstly I will start with the top senders: it's going to be the origin name column, I will use value_counts(), and let's take the top 10. So on the top sender side we have these, the first few roughly tied. Let's see if we have the same equality on the top receivers side: top_receivers is going to be df["nameDest"].value_counts().head(10), and after that I'm going to print top_receivers. So we don't have that equality on this side, but customer-wise we can see who receives the highest number of transactions. So let's see the customers committing fraud. I will say fraud_users, and I will pass df, of course filtering df["isFraud"] as true, which means one, and I will take the origin name, value_counts(), the top 10, and print fraud_users, like this. Okay, we have equality on this side. Great.

side. Great. Let's analyze the transfer and cash out because in our plot, let me quickly go to that. We saw that these

two transaction types are the most open ones for the fraud. So let's just come here and I will say for

types and it's going to be dataf frame data frame type is in and I'm going to pass the

list as transfer and cache out like this. So after this we are going to have

this. So after this we are going to have f types data frame like this which has on the type side only two values like

let me quickly show you type value counts we are going to have cash out and transfer only and we can create a count

plot on this like seaborn count plot data is going to be fraud types x is going to be

type hu is going to be is fraud. Next up, I'm going to give it a

fraud. Next up, I'm going to give it a title fraud distribution in transfer and cash

out like this and I will use plt.show

after that. So, we are going to see this distribution in here. Okay, since we really have a low fraud values, we can't

even see the orange, but we can see cash out dominates this chart. Now we can go with a correlation matrix. So we are going to filter the numeric columns like

So we are going to select the numeric columns: df amount, old balance origin, new balance origin, then old balance destination, new balance destination, and isFraud. And I've got a typo in here, I'm going to delete that. I will call .corr(), like this, and I'm going to run it. Now on the correlation side we have a matrix, and we can visualize it using a seaborn heatmap. I will say sns.heatmap, I will pass the correlation matrix we have, and annot will be True. I'm going to set the color map as coolwarm; actually, I will pass it as cmap="coolwarm". And next up, I'm going to format the annotations to two decimals. I will say plt.title, it's going to be "Correlation Matrix," and I will call plt.show(). So let's see our correlation matrix. Here we can see the high correlations in our data. The ones sit at the intersections of each column with itself, and we have 0.98, which means a high correlation between the new balance destination and the old balance destination, which is kind of normal. And we have, for example, a correlation of 0.46 between new balance destination and amount: not that strong, but not that weak.
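The heatmap step, sketched with synthetic columns where the destination pair is deliberately constructed to be strongly correlated (so the toy matrix echoes the 0.98-style relationship the transcript describes, without using the real data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 200

# Toy numeric columns with a deliberately correlated destination pair
old_dest = rng.uniform(0, 1e5, n)
amount = rng.uniform(0, 5e4, n)
df = pd.DataFrame({
    "amount": amount,
    "oldbalanceDest": old_dest,
    "newbalanceDest": old_dest + amount,   # strongly tied to oldbalanceDest
    "isFraud": rng.integers(0, 2, n),
})

corr = df.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Matrix")
plt.show()

print(corr.loc["newbalanceDest", "oldbalanceDest"])
```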

So just for a recap: correlation can take values between minus one and one, where minus one means a strong negative relationship, one means a strong positive relationship, and zero means there is no real relationship. So that 0.98 is a strong one. Okay. Now I want to filter the customers who had a balance and went to zero after the transfer. So I will say zero_after_transfer, and it's going to be df with three filters on it. Firstly, df old balance origin is greater than zero; secondly, df new balance origin equals zero; and thirdly, df type isin, like this, transfer and cash-out. So we can just take the length of zero_after_transfer to see how many records match, like this. Here we have more than a million. These records are just suspicious ones, and we can check them if we want to investigate that situation.
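The three combined filters, as a sketch on a toy frame (only the first row satisfies all three conditions):

```python
import pandas as pd

# Toy frame: row 0 empties the origin account via TRANSFER
df = pd.DataFrame({
    "type":           ["TRANSFER", "PAYMENT", "CASH_OUT", "TRANSFER"],
    "oldbalanceOrg":  [5000.0, 300.0, 0.0, 100.0],
    "newbalanceOrig": [   0.0, 250.0, 0.0, 100.0],
})

# Accounts that had money and were drained to zero by transfer/cash-out
zero_after_transfer = df[
    (df["oldbalanceOrg"] > 0)
    & (df["newbalanceOrig"] == 0)
    & (df["type"].isin(["TRANSFER", "CASH_OUT"]))
]
print(len(zero_after_transfer))  # 1
```

Note the parentheses around each condition: `&` binds tighter than the comparisons, so pandas boolean filters need them.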

Great. So now we are going to get to the feature engineering, but we have an important thing: from the isFraud value counts, we have a class imbalance situation. So with that in mind, let's do feature selection and preparation. We are going to make some imports from scikit-learn: from sklearn.model_selection import train_test_split; from sklearn.preprocessing import StandardScaler; from sklearn.linear_model import LogisticRegression for the modeling side; from sklearn.metrics import classification_report and confusion_matrix; from sklearn.pipeline import Pipeline; from sklearn.compose import ColumnTransformer; and from sklearn.preprocessing import OneHotEncoder. Okay, so these are going to be our imports. In here we are going to use train_test_split for splitting our data into training and testing sets, StandardScaler for scaling our data, LogisticRegression for the modeling side, classification_report and confusion_matrix for model evaluation, Pipeline for chaining the transformation operations and model training together, and ColumnTransformer with OneHotEncoder for the data transformation.

So I'm going to call df.head() again. We have type, amount, origin name, old balance origin, new balance origin, destination name, old balance destination, new balance destination, isFraud, isFlaggedFraud, balance difference origin, and balance difference destination. So what we are going to do first is drop some columns: df_model, we will say, is df.drop, and I'm going to drop the ones that I'm not going to use in the modeling, the origin name, the destination name, and isFlaggedFraud, and I'm going to set axis to one. So I'm going to run this, and let's see df_model.head(). Okay, it looks great. Next up, we are going to set the categorical and numerical columns: categorical is going to be type, and only this one, and numeric is going to be amount, old balance origin, new balance origin, old balance destination, and new balance destination.
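The column selection step, sketched on a toy frame with the same Kaggle-style column names:

```python
import pandas as pd

# Toy frame with the Kaggle-style columns used in the video
df = pd.DataFrame({
    "type": ["TRANSFER", "PAYMENT"],
    "amount": [181.0, 9839.64],
    "nameOrig": ["C1305486145", "C1231006815"],
    "oldbalanceOrg": [181.0, 170136.0],
    "newbalanceOrig": [0.0, 160296.36],
    "nameDest": ["C553264065", "M1979787155"],
    "oldbalanceDest": [0.0, 0.0],
    "newbalanceDest": [0.0, 0.0],
    "isFraud": [1, 0],
    "isFlaggedFraud": [0, 0],
})

# Drop the identifier columns and the near-constant isFlaggedFraud flag
df_model = df.drop(["nameOrig", "nameDest", "isFlaggedFraud"], axis=1)

# Feature groups for the ColumnTransformer later on
categorical = ["type"]
numeric = ["amount", "oldbalanceOrg", "newbalanceOrig",
           "oldbalanceDest", "newbalanceDest"]

print(list(df_model.columns))
```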

Okay, seems cool. Now I'm going to set the target: y is going to be df_model["isFraud"], and X is going to be df_model.drop, where we drop "isFraud" like this, with axis one. Okay. So it's going to be a capital X; that's the general usage, and I will use it like that too. Now we are going to make the train/test split: X_train, X_test, y_train, y_test equals train_test_split, and we will pass X, y, a test_size of 0.3, and stratify equal to y. So here we put 30% of our data in the testing set and 70% in the training set: X_train and y_train hold the 70% of the data, and X_test and y_test hold the 30%. Next up we are going to do the preprocessing. So we will say preprocessor, and it's going to be a ColumnTransformer. We will pass transformers, and I'm going to give a list: for the numerical ones it will use StandardScaler, and I will pass the numeric list too; I also need parentheses for the correct initialization. And for the categorical ones we are going to use OneHotEncoder with drop set to "first", and I will pass the categorical list we defined. Next up I will say remainder is "drop", and after that we are going to create the model pipeline. Let me fix the typo first: transformers. Okay, now it's cool. Now we are going to create the model pipeline. So I will say pipeline, and I will use the Pipeline class from scikit-learn, with the steps being our preprocessor and a classifier, which is going to be LogisticRegression. This part is really important: we are going to set the class weight as balanced, because we want to handle the class imbalance situation. Around 99% of our data is not fraud, so our model is going to predict non-fraud for every input we pass if we don't set this class weight to balanced. That is how we are handling the class imbalance here. And after this I'm going to say max iterations is going to be a thousand. Okay, and actually the parentheses need to be at the end, like this. Next up, I'm going to train my model together with the preprocessing. So we will use pipeline.fit, and I will pass X_train with y_train. When I do that it's going to train the model, and I'm going to be re-recording when this finishes.
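A compact, end-to-end sketch of the split, preprocessing, pipeline, and fit steps. The synthetic frame (and its roughly 10% fraud rate) is my stand-in for the real 6.3M-row data, so the printed metrics will not match the video's numbers:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(42)
n = 400

# Synthetic imbalanced stand-in: frauds get systematically larger amounts
is_fraud = (rng.random(n) < 0.1).astype(int)
amount = np.where(is_fraud == 1,
                  rng.uniform(4e4, 9e4, n),
                  rng.uniform(0, 2e4, n))
df_model = pd.DataFrame({
    "type": rng.choice(["TRANSFER", "CASH_OUT", "PAYMENT"], n),
    "amount": amount,
    "oldbalanceOrg": rng.uniform(0, 1e5, n),
    "newbalanceOrig": rng.uniform(0, 1e5, n),
    "oldbalanceDest": rng.uniform(0, 1e5, n),
    "newbalanceDest": rng.uniform(0, 1e5, n),
    "isFraud": is_fraud,
})

y = df_model["isFraud"]
X = df_model.drop("isFraud", axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

numeric = ["amount", "oldbalanceOrg", "newbalanceOrig",
           "oldbalanceDest", "newbalanceDest"]
categorical = ["type"]

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(drop="first"), categorical),
], remainder="drop")

# class_weight="balanced" re-weights the rare fraud class during training
pipeline = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(pipeline.score(X_test, y_test))
```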

Okay, our model is ready. Here we can see the pipeline steps: StandardScaler, OneHotEncoder, and LogisticRegression, like this. So we can make predictions using the predict method right now with our machine learning model. We can say predict and pass X_test, and we get a prediction array like this. So I'm going to save it as y_pred, and I'm going to compare it with y_test right now. I will say classification_report, and I will pass y_test and the predictions like this. I will also wrap it in print; it's not looking pretty otherwise, and now it's better. Great. And we can also get the confusion matrix for our model from y_test and y_pred. Here it is: true negatives, false positives, false negatives, and true positives. Okay. So here we can say that our model is good at catching the fraud, but the precision is not that good. At this point we could try other methods for class imbalance, like SMOTE or undersampling-type approaches, or we could try different models, but I'm not going to do that. I'm going to use this model; it's okay for me, and it can be used, in my opinion. With pipeline.score on X_test and y_test we end up with 94% accuracy, and I think that's pretty cool. I'm going to continue with this model. Again, we could improve the precision, it could always be closer to 100%, but I'm not going to make this video too long; you can go with the methods that I mentioned for improving this situation. For this tutorial, I think this accuracy is going to be more than enough. Great. Now it's time to export this pipeline. So we will say import joblib at the first place, and then we are going to say joblib.dump. We are going to export the pipeline, and we can name it something like fraud_detection_pipeline.pickle.
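The dump-and-reload round trip, sketched with a tiny stand-in model in place of the fitted pipeline (the filename matches the one used in the video):

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Tiny stand-in for the fitted pipeline from the previous step
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])

# Persist to disk...
joblib.dump(model, "fraud_detection_pipeline.pickle")

# ...and reload it, exactly as the Streamlit app will do
restored = joblib.load("fraud_detection_pipeline.pickle")
print(restored.predict([[0.9]])[0])  # -> 1
```

joblib serializes the whole pipeline object, so the reloaded model carries its preprocessing along with the classifier; no refitting is needed in the app.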

And here in my file directory I don't have a pickle file yet; when I run this, we are going to have fraud_detection_pipeline.pickle in here, and we are going to be using it to get predictions from our model in our Streamlit app. So let's summarize what we did in here, and then we can move on to the Streamlit app creation. Firstly, we imported pandas, numpy, matplotlib, and seaborn. Then we loaded our data, did some analysis and data visualization, and explored the dataset with different chart types, and after that we built a correlation matrix. Next up, we started the feature engineering. At the first place we made our imports: ColumnTransformer and OneHotEncoder on the transformation side, Pipeline for creating a data pipeline with preprocessing and modeling, classification_report and confusion_matrix on the model evaluation side, LogisticRegression on the modeling side, StandardScaler for scaling, and train_test_split for data splitting. We selected the features that we are going to use in the model, defined the columns with their data types, saved our features and target as X and y, and used train_test_split with 0.3, which means 30% goes to the testing set. We set up our ColumnTransformer for the numerical and categorical columns, then set up our pipeline and trained our model. We got predictions and evaluated our model. At the end, we exported our pipeline, and we are ready to create our web app.

now I'm going to create a file like fraud detection.py and we are going to create

detection.py and we are going to create our web app for using our model in here at the first place. So at the first place we will say

import streamllet as st import import pandas as pd and import job lip and we are going to load our model like model

is going to be job lip.load we will say let me copy the exact name in here in here I'm going to copy this and

paste it in here. So now after we load it like this, we can just say model.predict and it's going to be

model.predict and it's going to be working on new predictions. So let's set a title for our streamllet app fraud

detection prediction app. Next up I will say streamlet markdown and please enter

the transaction details and use the predict button.

Next up, I will say simmit divider for a better look. And I will say transaction

better look. And I will say transaction type is going to be our first input and it's going to be streamllet select

box. We will say transaction

box. We will say transaction type and we are going to pass transaction types like payment,

transfer cash out. Actually they need to be uppercase

out. Actually they need to be uppercase like payment transfer cash out and also the

posit inside the list like this. Next up we are going to set the amount and it's going to be stream number input. We will pass it

like amount and we are going to say let's say minimum value is going to be 0.0. So we will take float as input and

0.0. So we will take float as input and let's set the default value as 1,000. Next up I'm going to take all the

1,000. Next up I'm going to take all the balance original like this streamlet number

input old balance and I will say sender.

Next up I'm going to say minimum value is going to be 0.0 and value is going to be 10,000 like this.

Okay, next up I'm going to say new balance original and it's going to be streamlet number input again new

balance sender and we will say minimum value let's say 0.0 again and value let's set this to 9,000.

Next up, what I'm going to do is we are going to do the same for the old balance destination and new balance destination.

So, old balance destination is going to be streamlet number input. We will say old balance and I

input. We will say old balance and I will pass receiver. We will say minimum value will be 0.0 and value is going to be let's say

0.0.

Also we are going to make a last one.

New balance destination is going to be stream number input. New balance receiver we will say

input. New balance receiver we will say minimum value is going to be 0.0 and let's set the value like 0.0. Okay. So

now we are going to define the prediction button. We will write if st.button("Predict"), which means that the code inside runs if this button gets clicked. Inside it, we will say the input data is going to be a pandas DataFrame, and we will pass a dictionary inside a list, like here. We are going to say type is going to be the transaction type, amount is going to be the amount, and old balance original is going to be the old balance original. We are going to do the same for the rest: new balance original is going to be the new balance original, old balance destination is going to be the old balance destination, and new balance destination is going to be the new balance destination.

So our data frame is ready, and now we will say the prediction is going to be model.predict on the input data, and we are going to take the first index. This is going to return zero or one. So we will say st.subheader, and we are going to write an f-string like "Prediction: {int(prediction)}". We could explain here that zero means... actually, let's not go that way; this next approach is going to be better. If the prediction comes out as one, st.error will show "This transaction can be fraud", and else we will say st.success, "This transaction looks like it is not a fraud". And here it is. Great.
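The button handler described above can be sketched in plain Python. Here the Streamlit widget values are replaced with fixed example numbers, and the column names follow the Kaggle PaySim dataset; if your trained pipeline used different names, adjust them to match:

```python
import pandas as pd

# Example form values (in the app these come from the Streamlit widgets).
transaction_type = "TRANSFER"
amount = 10000.0
oldbalance_org = 10000.0
newbalance_orig = 0.0
oldbalance_dest = 0.0
newbalance_dest = 0.0

# A one-row DataFrame: a dictionary inside a list, one key per feature.
input_data = pd.DataFrame([{
    "type": transaction_type,
    "amount": amount,
    "oldbalanceOrg": oldbalance_org,
    "newbalanceOrig": newbalance_orig,
    "oldbalanceDest": oldbalance_dest,
    "newbalanceDest": newbalance_dest,
}])

def fraud_message(prediction: int) -> str:
    """Map the model's 0/1 output to the message shown in the app."""
    if prediction == 1:
        return "This transaction can be fraud"
    return "This transaction looks like it is not a fraud"

# In the app: prediction = model.predict(input_data)[0]
print(fraud_message(1))  # -> This transaction can be fraud
print(fraud_message(0))  # -> This transaction looks like it is not a fraud
```

In the real app the `fraud_message` branch is replaced by `st.error` and `st.success` calls, which render the same two messages as colored alert boxes.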

So we are going to save this Streamlit file and make the app run. Then, after showing you how the app looks and how it works, I'm going to summarize the code here. So let's run it with Streamlit: I'm going to open the terminal, and from here we are going to say streamlit run fraud_detection.py, with the file name that you set when you created your Python file. And when you press enter on that, oh, we got a typo. Let me quickly fix that. In which one? Old balance original. Okay, it was in the min value. Now it's saved, and I'm just going to open a new terminal and run streamlit run fraud_detection.py again. Now it's working smoothly.

Okay, in here a Streamlit page just opened at this local URL address. You can use control or command plus click, or you can just copy this address and paste it into your browser. You are hosting this app locally right now. So just go to this address; I'm going to continue recording in my browser. And here I am on the Streamlit page, and I'm just going to set this to dark mode quickly, like this. Now we can test our app. So we have the fraud detection prediction app, with the transaction type, payment, and the amount here.

Here we can see the values. I'm just going to try the predict button with the default values. We see that it says this transaction looks like it is not a fraud. So let's make the new balance zero and increase the old balance, like this, and let's see the result. It's not a fraud again. Okay, our model is doing a great job on these predictions. So let's say we are going to do a cash out, and the amount is going to be a thousand. Actually, let's skip that; I'm just going to make this a huge number and set this one to zero. So when I press predict, it says prediction is one, and it says this transaction can be fraud. Okay, it's working nicely. Let's also try this with the transfer type. I press predict, and it says this transaction can be fraud. Amazing.

Great. So let's make another example with a payment. I will say 5,000 in here, like this. And I will say 10,000 here, and 5,000 on this one. Let's see our prediction. It says this transaction looks like it's not a fraud. So our model and our web app are working really, really well. I'm going to switch back to recording on my code editor now.

So here, on the Streamlit side, we started by importing streamlit, pandas, and joblib. Then we loaded our pipeline here as model. Then we gave our app a title, and we wrote a markdown saying please enter the transaction details.
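The pipeline-loading step at the top of the script is just a joblib round trip. As a sketch, the snippet below stands in a dummy classifier for the real trained pipeline, and the filename is hypothetical; use whatever name you passed to joblib.dump when you saved your model:

```python
import joblib
from sklearn.dummy import DummyClassifier
from sklearn.pipeline import Pipeline

# Stand-in for the real trained fraud pipeline, just to show the
# save/load cycle end to end.
pipeline = Pipeline([("clf", DummyClassifier(strategy="constant", constant=0))])
pipeline.fit([[0.0], [1.0]], [0, 1])

joblib.dump(pipeline, "fraud_detection_pipeline.pkl")  # hypothetical filename
model = joblib.load("fraud_detection_pipeline.pkl")

print(model.predict([[5000.0]])[0])  # -> 0
```

Because the whole preprocessing-plus-model pipeline is serialized together, the app can feed it raw form values without repeating any feature engineering.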

Actually, there was a typo there; I'm just going to delete it. It doesn't matter. We take the inputs from the user here: one select box for the categorical feature, and the others are number inputs. Then we define the predict button with st.button, and inside it we set up the input data as a one-row DataFrame, like this. We pass this into model.predict and get the prediction from it. Next up, we write the prediction and display whether this is a fraud or not. That was it for the coding part. Let's get to the outro. Thanks for

watching this tutorial. I have a playlist named Data Science and Machine Learning Projects where I have more than 40 videos just like this one. You can reach that playlist from the cards of this video or from the links in the description. Also, I'm sharing a new data science video every week on my channel. You can subscribe for more videos like this. Have a great day.
