[ML2021] HW1 & Pytorch Tutorial 1
By Hung-yi Lee
Full Transcript
Okay, I'll first introduce the first homework to you. Next, I'll give a PyTorch tutorial, and last, I'll tell you how to do the homework and run the sample code we provide. First, this is the outline of my slides. I'll talk about the objectives of this homework, what task you're solving, what data you're using, how to evaluate your model's performance, and how to submit your homework.
First of all, the objective of this homework is to solve a regression problem with deep neural networks. Another important part is that we hope you can understand some basic tips for training deep neural networks. You should also get familiar with PyTorch, because the later homeworks will use PyTorch as well. Okay, the task is to predict COVID-19 cases.
The data is from a group at Carnegie Mellon University. It is a daily survey conducted since April of last year, and they used Facebook to run the survey. A warning: you should not download data from the internet other than the data we provide. If you use additional data or pre-trained models, your final grade will be multiplied by 0.9.
The task is this: we give you three days of data for a specific state in the US, and you have to predict the percentage of new tested positive cases on the third day. On the first day you have the survey data and the positive cases, on the second day you have the same things, and on the third day you have to predict the positive cases.
How was the data collected? The surveys are conducted using Facebook, and the group collects them every day in every state in the US. The survey covers COVID-19 symptoms, whether people are getting tested, whether they are social distancing, their mental health status, and other indicators as well. For example, if this is the total population of a certain state in the US, the group samples some of the people, maybe several hundred, and they take the survey on Facebook. The group then uses the survey results to estimate the figures for the total population, and this is the data we're using.
We provide the data like this. We used 40 states in the US, and they are encoded as one-hot vectors; I'll explain what a one-hot vector is in a moment. Next, there are four indicators of COVID-like illness: people may have some illness such as influenza, and the group estimates the share of the total population that has the illness. Other indicators, such as behavior (whether people are wearing masks or traveling outside the state) and mental health indicators, are provided as well. The most important part is that you have to predict the tested positive cases, and these features are presented as percentages.
What is a one-hot vector? One-hot vectors are vectors with only one element equal to one while the others are zero, and this kind of vector is usually used to encode discrete values. For example, if the state code is AZ, Arizona, and we encode it as a one-hot vector, it looks like this: the vector consists of zeros except for one element equal to one, and that one represents Arizona.
Okay, this is the training data. It is a CSV file with 2,700 samples. The first 40 columns are the one-hot encoding of the states, the next 18 features are the features of the first day, followed by the second day and the third day, and the last column is the tested positive cases, which is the target you are going to predict.
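A one-hot encoding like the Arizona example can be produced with torch.nn.functional.one_hot. This is a minimal sketch; the four-state list and its ordering are hypothetical stand-ins for the dataset's 40 states.

```python
import torch
import torch.nn.functional as F

# Hypothetical, shortened state list; the real data uses 40 states.
states = ["AL", "AK", "AZ", "AR"]
state_to_index = {code: i for i, code in enumerate(states)}

# Arizona becomes a vector of zeros with a single 1 at Arizona's index.
az = F.one_hot(torch.tensor(state_to_index["AZ"]), num_classes=len(states))
print(az)  # tensor([0, 0, 1, 0])

# Encoding a batch of state indices gives one one-hot row per sample.
batch = F.one_hot(torch.tensor([0, 2, 3]), num_classes=len(states))
print(batch.shape)  # torch.Size([3, 4])
```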
One row is one sample. For the testing data, there are 893 samples, and for the third day we only have 17 features, because we removed the tested positive cases: you're going to predict that answer with your model.
The evaluation metric is root mean squared error (RMSE). What are the symbols here? First, f is your model, your neural network, and it takes a feature vector x as input; this is the testing data we provide. Your goal is to minimize this error, where y is the ground-truth label. You don't have the labels, but we will calculate the RMSE for you on Kaggle.
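Written out, the metric is RMSE = sqrt((1/N) * sum over i of (f(x_i) - y_i)^2), where N is the number of samples. Since the test labels are hidden, a local check is only possible on data you do have labels for, such as a validation split. A small sketch with made-up numbers:

```python
import torch

def rmse(pred: torch.Tensor, target: torch.Tensor) -> float:
    """Root mean squared error between predictions f(x_i) and labels y_i."""
    # Average the squared errors, then take the square root.
    return torch.sqrt(torch.mean((pred - target) ** 2)).item()

pred = torch.tensor([10.0, 12.0, 9.0])
target = torch.tensor([11.0, 12.0, 7.0])
print(rmse(pred, target))  # sqrt((1 + 0 + 4) / 3), about 1.291
```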
This is the link to the Kaggle competition. It has already launched, so you can ignore this part. Your display name should be in this format: your student ID first, then an underscore, and then anything you want. If you're auditing, you should not put a student ID in your display name. The submission format is a CSV file, and we already wrote the code for producing it in the sample code, so you don't have to worry about it.
For Kaggle submissions, you may submit up to five results each day. Days are counted in UTC, and the time zone in Taiwan is UTC+8, so in Taiwan a new round of five submissions starts at 8 a.m. every day. Before the competition ends, you should choose two submissions for the private leaderboard: you check two results to be counted on the private leaderboard.
About grading: we set three baselines, a simple baseline, a medium baseline, and a strong baseline, and each baseline has a public and a private version. You get one point for each baseline you pass, so you get six points if you pass all of them. Remember that you also have to upload your code to the NTU COOL platform to get four more points, so the total is 10 points. On Kaggle, the leaderboard will look like this, with the simple baseline, the medium baseline, and the strong baseline. If the strong baseline turns out to be too hard, we might change it to an easier one.
For bonus points: if you got 10 points, that is, you passed all baselines and submitted your code, we will make your code public to the whole class. If you also submit a PDF report that briefly describes your methods, you get a bonus of 0.5 points, and your report will also be available to all students. This is the report template: you put your scores here and describe your methods here.
Okay, back to the slides. About code submission: you have to submit to NTU COOL, and the format should be like this. It should be compressed into a zip file, and the file name should be your student ID followed by "_hw1". We can only see your last submission, so make sure your last submission is the one you want us to see. Don't submit your model or datasets, because the files would be too large. We might check your code, and if your code is not reasonable, your semester grade will be multiplied by 0.9.
You should also specify the source of your code: if you use the sample code from the TAs, add this line in a reference section at the bottom of your code. The zip file should include your code, in .py format or as an IPython notebook, and, if you pass all baselines, your report as well. For example, here are the report and the source code. If you use Google Colab, you can download your code by clicking File in the upper left-hand corner and downloading the IPython notebook.
How do you compress your files? If you use Windows, you can right-click your folder, choose "Send to", and pick the compressed (zipped) folder option. Mac users can also right-click the folder and compress it; you probably know how. If you would like to use the command line, you can also use zip -r to compress your folder.
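The lecture uses zip -r on the command line. To keep all examples in one language, here is a Python sketch of the same step using the standard library's shutil.make_archive. It assumes the archive should be named <student_id>_hw1.zip; the helper name, student ID, and file names are hypothetical.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def make_submission_zip(folder: str, student_id: str, out_dir: str = ".") -> str:
    """Hypothetical helper: zip everything under `folder` into <student_id>_hw1.zip."""
    base = Path(out_dir) / f"{student_id}_hw1"
    # make_archive appends ".zip" and stores paths relative to root_dir, like zip -r.
    return shutil.make_archive(str(base), "zip", root_dir=folder)

# Demo in a temporary directory with a hypothetical student ID and files.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hw1"
    src.mkdir()
    (src / "hw1.py").write_text("print('hello')\n")
    (src / "report.pdf").write_bytes(b"%PDF-")
    archive = make_submission_zip(str(src), "b09901000", out_dir=tmp)
    with zipfile.ZipFile(archive) as zf:
        names = sorted(zf.namelist())
    print(Path(archive).name, names)  # b09901000_hw1.zip ['hw1.py', 'report.pdf']
```

Remember that the model files and datasets should stay out of the folder you compress.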
The most important part is to remember the deadlines: the Kaggle deadline is three weeks from now, and the code submission to NTU COOL is due two days after that. We don't allow any late submissions, so you should submit early.
Here are some hints. For the simple baseline, you just have to run our sample code. For the medium baseline, we recommend a very simple feature selection: use the 40 one-hot state columns and the tested positive cases from the past two days. Later I'll demonstrate how to do this. For the strong baseline, there are some hints here: think about what other features are useful, change the DNN architecture, change your training hyperparameters, or use regularization. There are also some mistakes in my sample code, so you might want to look deeper into the code.
As a reminder, you should finish your homework on your own. Do not modify your prediction files manually; the files you upload to Kaggle should be the ones your model produced. You should not share your code or prediction files with any other living creatures. You should only submit five times a day and should not use any approach to submit more than five times. And you should not use additional data or pre-trained models. If you have any questions, you can ask us on NTU COOL, by email, or during TA hours.
Here are some useful links: if you're interested in regularization or network training, you can click on these links, and for PyTorch you can look at this link. Okay, any questions about homework one?
No? If there are no questions, I'll go on to the PyTorch tutorial. PyTorch is a very important machine learning framework, and we will use it in every homework in this course. In these slides, I'll tell you what PyTorch is and how to use it to train your deep neural network. As prerequisites, we assume that you are already familiar with Python 3 and NumPy.
What is PyTorch? PyTorch is an open-source machine learning framework with two features that are very useful for deep neural network training. The first is tensor computation like NumPy, except that tensors can be computed on GPUs (graphics processing units) for acceleration. The second is that PyTorch can calculate gradients for you, which is a very important part of DNN training.
Here's a simple comparison between PyTorch and TensorFlow. TensorFlow is also a machine learning framework, developed by Google Brain, while PyTorch is from Facebook. TensorFlow is more compatible across platforms; for example, you can use it from JavaScript or Swift. Debugging is much easier in PyTorch, although TensorFlow might be easier to use since its second version. PyTorch is usually used in research, and TensorFlow is used for production. So, how do we train a deep neural network?
The training procedure is like this. First, we load data. Next, we define our neural network, our loss function, and the optimizer; the optimizer is the algorithm that updates your neural network. Then we train the neural network with data, and, for example after each epoch, we validate: we use data from the training set other than the data used for training to check whether the network improved. This procedure repeats several times, and after training, the trained network is used for testing, for which we also load data. PyTorch provides torch.nn and torch.optim for the neural network and optimizer parts, and for loading data it provides Dataset and DataLoader.
Okay, first I'll talk about tensors. Tensors are high-dimensional matrices or arrays. A one-dimensional tensor looks like this, a two-dimensional tensor is like a matrix, and three-dimensional or higher-dimensional tensors are like cubes or something like that.
What data is stored in tensors? There are two common types: floating point and integer. If you want to construct a float tensor, you can call FloatTensor, and for integers you can call LongTensor.
Next, the shapes of tensors. For this one-dimensional tensor, the length of the first dimension is five. For a two-dimensional tensor like this one, the first dimension has length three and the second has length five, so its shape is written like this, and a three-dimensional tensor works the same way. Remember that the first dimension has index zero, the second has index one, the third has index two, and so on. If you are familiar with NumPy, a dimension in PyTorch is the same as an axis in NumPy.
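A short sketch of the two data types and of how dimensions are indexed. FloatTensor and LongTensor are the constructors mentioned in the lecture; torch.tensor with an explicit dtype is the equivalent, more common spelling today. The tensor sizes are arbitrary.

```python
import torch

# Float and integer tensors with the constructors named in the lecture...
x_float = torch.FloatTensor([1, 2, 3, 4, 5])
x_long = torch.LongTensor([1, 2, 3, 4, 5])
# ...and the equivalent spelling with an explicit dtype.
y_float = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float32)
print(x_float.dtype, x_long.dtype)  # torch.float32 torch.int64

# Index 0 of .shape is dimension 0, index 1 is dimension 1, and so on.
a = torch.zeros(5)         # 1-D, shape (5,)
b = torch.zeros(3, 5)      # 2-D: dim 0 has length 3, dim 1 has length 5
c = torch.zeros(4, 5, 3)   # 3-D (sizes chosen arbitrarily)
print(a.shape, b.shape, c.shape)
print(b.shape[0], b.shape[1])  # 3 5
```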
Okay, how do we construct a tensor? There are several methods. The first is to construct a tensor from a list: you just call tensor and put a list inside. Another is to construct a tensor from a NumPy array, like this. If you want a tensor containing only zeros, you can use the function torch.zeros, and another method constructs a tensor of all ones, like this; for these two, you have to specify the shape of the tensor you want. For example, the tensors constructed by the first two methods look like this, the zero tensor here has shape two by two and looks like this, and the all-ones tensor looks like this.
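The four construction methods can be sketched as follows; the values and shapes are illustrative.

```python
import numpy as np
import torch

# From a Python list.
x = torch.tensor([[1, -1], [-1, 1]])
# From a NumPy array (torch.from_numpy shares memory with the array).
y = torch.from_numpy(np.array([[1, -1], [-1, 1]]))

# Constant tensors: you pass the shape you want.
zeros = torch.zeros([2, 2])
ones = torch.ones([1, 2, 5])
print(x)
print(zeros.shape, ones.shape)  # torch.Size([2, 2]) torch.Size([1, 2, 5])
```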
Next, I'll introduce some common operators in PyTorch. The first is squeeze, which removes a specified dimension of length one. In this example, dimension zero has length one, so to remove that dimension we use squeeze(0), where 0 means dimension zero. As illustrated here, dimension zero has length one, and after squeezing, that dimension is gone.
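A minimal squeeze example with an illustrative shape:

```python
import torch

x = torch.zeros([1, 2, 3])
print(x.shape)  # torch.Size([1, 2, 3])

# Remove dimension 0, which has length 1.
y = x.squeeze(0)
print(y.shape)  # torch.Size([2, 3])

# Squeezing a dimension whose length is not 1 leaves the shape unchanged.
print(y.squeeze(0).shape)  # torch.Size([2, 3])
```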
Next, unsqueeze. Unsqueeze is the opposite of squeeze: if you want to add a new dimension, you use unsqueeze. For example, to add a new dimension at dimension one, we specify 1 here, and as illustrated, we get a new dimension there.
Next, transpose. If you already know how to transpose a matrix, this is almost the same thing: here we transpose dimension 0 and dimension 1, so the shape becomes like this.
Concatenation: the function cat concatenates multiple tensors. For example, we have three tensors x, y, and z with the same shape except in one dimension, and we want to concatenate them along that dimension. We call torch.cat, pass x, y, and z in a list, and specify dimension one. After concatenation, the lengths of dimension one, which are one, three, and two, are added together, so the result looks like this.
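The three operators above, sketched with illustrative shapes (the cat example uses lengths 1, 3, and 2 along dimension 1, as in the description):

```python
import torch

x = torch.zeros([2, 3])

# unsqueeze: insert a new dimension of length 1 at position 1.
u = x.unsqueeze(1)
print(u.shape)  # torch.Size([2, 1, 3])

# transpose: swap dimension 0 and dimension 1.
t = x.transpose(0, 1)
print(t.shape)  # torch.Size([3, 2])

# cat: shapes must match except along the concatenation dimension (here dim 1).
a = torch.zeros([2, 1, 3])
b = torch.zeros([2, 3, 3])
c = torch.zeros([2, 2, 3])
w = torch.cat([a, b, c], dim=1)
print(w.shape)  # torch.Size([2, 6, 3]), since 1 + 3 + 2 = 6
```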
Next, for other operators, you can use addition or subtraction, or calculate powers; this example computes x to the power of 2, that is, x squared. For summation or mean, you can use these two functions. There are many more operators, and you can find them at the link below. If you're familiar with NumPy, several parts are the same in PyTorch and NumPy, such as shape and dtype. To manipulate the shape of a tensor, you can use reshape or view in PyTorch and reshape in NumPy, and squeeze is the same in both. For unsqueeze, PyTorch is slightly different from NumPy.
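A sketch of the common operators and their NumPy counterparts. The pairing of unsqueeze with NumPy's expand_dims, and of PyTorch's dim with NumPy's axis, is the usual correspondence; the values are illustrative.

```python
import numpy as np
import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = torch.ones(2, 2)

z = x + y        # element-wise addition
d = x - y        # element-wise subtraction
p = x.pow(2)     # element-wise square: [[1, 4], [9, 16]]
s = x.sum()      # sum of all elements: 10
m = x.mean()     # mean of all elements: 2.5

# Many names match NumPy: .shape, .dtype, reshape, squeeze.
r = x.reshape(4)            # tensor([1., 2., 3., 4.]); x.view(4) also works here
arr = x.numpy()
print(x.shape, arr.shape)   # torch.Size([2, 2]) (2, 2)

# PyTorch says dim where NumPy says axis.
col = x.sum(dim=0)          # tensor([4., 6.])
col_np = arr.sum(axis=0)    # [4., 6.] as a NumPy array

# Adding a dimension: unsqueeze in PyTorch, expand_dims in NumPy.
print(x.unsqueeze(1).shape, np.expand_dims(arr, 1).shape)  # torch.Size([2, 1, 2]) (2, 1, 2)
```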
Next, as mentioned earlier, PyTorch supports computing tensors on GPUs, but you have to move tensors to the GPU before you can calculate on it. By default, tensors are computed on the CPU, so if you want to use a GPU, you have to move them to cuda. What is CUDA? CUDA comes from NVIDIA, so if you want to run your code on GPUs, you have to use an NVIDIA GPU. You also have to check whether a GPU is available to you, which you can do with the function torch.cuda.is_available. If you have multiple GPUs, you can specify cuda:0, cuda:1, cuda:2, and so on.
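A common pattern for choosing the device and moving a tensor to it, falling back to the CPU when no GPU is present:

```python
import torch

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(3)        # tensors are created on the CPU by default
print(x.device)          # cpu
x = x.to(device)         # move the tensor to the chosen device
print(x.device)

# With several GPUs, you can name one explicitly, e.g. "cuda:0" or "cuda:1".
if torch.cuda.device_count() > 1:
    y = torch.ones(3).to("cuda:1")
    print(y.device)      # cuda:1
```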
Why GPUs? Matrix or tensor operations can be split into many small operations, and because these small operations are independent, they can be calculated separately. A GPU has lots of small cores, and each core can calculate a part of the operations, so with many small cores you can compute large tensor operations in parallel, which accelerates the computation.
Another feature of PyTorch is calculating gradients. For example, suppose we have a matrix x like this, and we define the output z as the sum of the squares of all the elements of x. We can calculate the partial derivative of z with respect to each element and combine the results back, which gives the gradient of z like this. How do we do this in PyTorch? First, we construct a tensor x, the same as this one, and specify that it requires gradients so that PyTorch can calculate them. Next, we calculate z: we take x to the power of two and sum the elements. Then we calculate the gradient with the function backward. Finally, the gradient is stored in x.grad, and if you print it out, you can see that the gradients are the same as the ones we computed earlier.
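Since z is the sum of x_ij squared, each partial derivative is dz/dx_ij = 2 * x_ij, so the gradient is simply 2x. A sketch that checks this with autograd; the matrix values are an assumption for illustration.

```python
import torch

# requires_grad=True tells PyTorch to track operations on x.
x = torch.tensor([[1.0, 0.0], [-1.0, 1.0]], requires_grad=True)

# z = sum of the squared elements of x.
z = x.pow(2).sum()

# Backpropagate: fills x.grad with dz/dx.
z.backward()
print(x.grad)  # equals 2 * x: [[2., 0.], [-2., 2.]]
```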
For the DNN training procedure, I'll first tell you how to load data with Dataset and DataLoader. You construct your own dataset as a subclass of PyTorch's Dataset class, and you have to implement these functions. The first is the initialization, where you read and preprocess the data. Next, you define __getitem__, which returns one sample at a time; __getitem__ is used by the DataLoader, which I'll introduce later. The last thing you have to write is __len__, which returns the size of the dataset.
After you construct your dataset, you put it in a DataLoader. For the DataLoader, you have to specify the batch size, because it returns one batch at a time, whereas the dataset returns one sample at a time. You also have to specify whether to shuffle your data. Shuffling is only for the training stage; for testing or validation, you should not shuffle the data, because we want the validation or testing results to be the same every time rather than random.
This is an illustration of DataLoader and Dataset. We first put the dataset in the DataLoader, and the DataLoader calls the dataset's __getitem__ function like this. If you set the batch size to five, the DataLoader calls it five times to get five samples from the dataset and then combines them into one mini-batch, so the number of samples in a batch equals the batch size you specified.
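A sketch of batching with a batch size of five, using PyTorch's built-in TensorDataset as a stand-in for a custom Dataset; the toy data is made up.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy dataset with 12 samples of 3 features each.
features = torch.arange(36, dtype=torch.float32).reshape(12, 3)
targets = torch.arange(12, dtype=torch.float32)
dataset = TensorDataset(features, targets)

# Training: shuffle so each epoch sees the samples in a different order.
train_loader = DataLoader(dataset, batch_size=5, shuffle=True)
# Validation / testing: no shuffling, so results are reproducible.
test_loader = DataLoader(dataset, batch_size=5, shuffle=False)

for x, y in test_loader:
    print(x.shape, y.shape)
# torch.Size([5, 3]) torch.Size([5])
# torch.Size([5, 3]) torch.Size([5])
# torch.Size([2, 3]) torch.Size([2])   <- last batch holds the 2 leftover samples
```

By default the smaller final batch is kept; DataLoader's drop_last=True would discard it instead.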
Okay, next, after you process your data, you have to define your neural network. The most common layer of a neural network is the fully connected layer, and it is used like this: you use the Linear module from torch.nn and specify the dimension of the input features and the dimension of the output features. For example, if your linear layer takes input tensors of dimension 32 and outputs tensors of dimension 64, the last dimension of the input tensor should be 32, and the last dimension of the output tensor will be 64. The other dimensions of these tensors can be anything; only the last dimension is constrained to 32 or 64.
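A quick check that only the last dimension is constrained; the leading shapes are arbitrary examples.

```python
import torch
import torch.nn as nn

layer = nn.Linear(in_features=32, out_features=64)

# Only the LAST dimension must equal in_features; leading dimensions are free.
for shape in [(32,), (10, 32), (10, 5, 32)]:
    x = torch.randn(*shape)
    y = layer(x)
    print(tuple(x.shape), "->", tuple(y.shape))
# (32,) -> (64,)
# (10, 32) -> (10, 64)
# (10, 5, 32) -> (10, 5, 64)
```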
To illustrate the fully connected layer, I drew this figure. The input is a 32-dimensional vector, it is passed through the neurons, and the output is a 64-dimensional vector. It can also be written like this: the input vector x is 32-dimensional and is multiplied by W, the weight matrix of this layer, whose size is 64 by 32; then a bias term is added, and we get the output. Using the slide from Professor Lee, the neural network also looks like this: the input x is multiplied by a weight matrix and the bias term is added. We can look at the weight and bias of the defined layer by accessing weight and bias, and we can see that the weight has shape 64 by 32 and the bias has size 64.
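A sketch that inspects the parameters and checks that the layer computes W x + b for a single input vector:

```python
import torch
import torch.nn as nn

layer = nn.Linear(32, 64)
print(layer.weight.shape)  # torch.Size([64, 32])
print(layer.bias.shape)    # torch.Size([64])

# For a single vector x, the layer computes W x + b (for batches: x W^T + b).
x = torch.randn(32)
manual = layer.weight @ x + layer.bias
print(torch.allclose(layer(x), manual, atol=1e-5))  # True
```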
torch.nn also has lots of other modules for deep neural networks, and commonly used activation functions like sigmoid and ReLU are implemented as well. Okay, next, after defining your neural network, you have to define your loss function. One loss function for linear regression is the mean squared error loss, MSELoss, and for classification, which will be taught later, there is the cross-entropy loss.
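A short sketch of the activation functions and loss functions just mentioned, with made-up inputs:

```python
import torch
import torch.nn as nn

x = torch.tensor([-1.0, 0.0, 2.0])
print(nn.Sigmoid()(x))  # element-wise 1 / (1 + exp(-x))
print(nn.ReLU()(x))     # element-wise max(0, x): tensor([0., 0., 2.])

# Regression: mean squared error.
mse = nn.MSELoss()
pred = torch.tensor([2.5, 0.0])
target = torch.tensor([3.0, -0.5])
print(mse(pred, target))  # ((-0.5)**2 + 0.5**2) / 2 = 0.25

# Classification (covered later in the course): cross-entropy on raw scores (logits).
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0]])  # one sample, three classes
label = torch.tensor([0])                  # index of the correct class
print(ce(logits, label))
```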
Okay, with these components you can build your own neural network. Your model should be a subclass of nn.Module from PyTorch. First, you initialize your model and define your layers; here, nn.Sequential lets you list the layers: a linear layer, a sigmoid layer, and another linear layer. The other function you have to implement is forward, which takes an input x and returns the output of the network. If we illustrate this neural network, it looks like this: the input is a 10-dimensional tensor, it is passed through the first linear layer and becomes a 32-dimensional tensor, then it goes through a sigmoid activation function, and last, it is passed through the second linear layer, which outputs a tensor of dimension one.
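The 10 -> 32 -> sigmoid -> 1 network described above, as a sketch (the class name is hypothetical):

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers: Linear(10 -> 32), sigmoid activation, Linear(32 -> 1).
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # Compute the network output for input x.
        return self.net(x)

model = MyModel()
out = model(torch.randn(4, 10))  # a batch of 4 ten-dimensional inputs
print(out.shape)  # torch.Size([4, 1])
```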
Next, we define the optimizer. The optimizer is the optimization algorithm for gradient descent, and one of the common optimizers is stochastic gradient descent (SGD). When defining your optimizer, you have to pass your model parameters to it so that it can optimize them, and you also have to specify the learning rate.
Next, for training, validation, and testing, you first construct your dataset, your DataLoader, and your model, and move your model to the specified device, which can be cpu or cuda. Then you define your loss function and your optimizer, putting your model parameters into the optimizer.
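A sketch of this setup step. The random data, layer sizes, batch size, and learning rate are all hypothetical placeholders.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical random data standing in for a real dataset: 100 samples, 10 features.
dataset = TensorDataset(torch.randn(100, 10), torch.randn(100, 1))
tr_set = DataLoader(dataset, batch_size=16, shuffle=True)

# Build the model and move it to the device.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1)).to(device)

# Loss function and optimizer; the optimizer receives the parameters it updates.
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```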
For neural network training, the code looks like this. First, you loop over the epochs; you have to specify how many epochs you are going to run. Next, you set your model to training mode and get data from your DataLoader. Before calculating the gradients for your model, you should zero the optimizer's gradients, because gradients accumulate if you don't reset them. Next, you move your data to the same device as your model and calculate the output of your model with x as the input. Once you have the prediction, you calculate the loss with the criterion you defined earlier, then compute the gradients with the backward function, and finally use the optimizer to update your model's parameters.
For validation, you first set your model to evaluation mode and then get data from your DataLoader. You also use this code to disable gradient calculation: you don't need gradients for evaluation, and without them the model runs faster in the inference stage. You get a prediction, calculate the loss, and accumulate it, and finally you compute the average loss.
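The training and validation loops can be sketched end to end on a toy problem. The data (y = 3x + 1 plus noise), the one-layer model, the learning rate, and the epoch count are hypothetical; averaging the validation loss weighted by batch size is one reasonable choice among several.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical toy regression data: y = 3x + 1 plus noise.
x_all = torch.randn(200, 1)
y_all = 3 * x_all + 1 + 0.1 * torch.randn(200, 1)
tr_set = DataLoader(TensorDataset(x_all[:160], y_all[:160]), batch_size=16, shuffle=True)
dv_set = DataLoader(TensorDataset(x_all[160:], y_all[160:]), batch_size=16, shuffle=False)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1, 1).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
n_epochs = 20

for epoch in range(n_epochs):
    # ---- Training ----
    model.train()                          # set model to training mode
    for x, y in tr_set:
        optimizer.zero_grad()              # clear gradients from the previous step
        x, y = x.to(device), y.to(device)  # move data to the model's device
        pred = model(x)                    # forward pass
        loss = criterion(pred, y)          # compute loss
        loss.backward()                    # compute gradients
        optimizer.step()                   # update parameters

    # ---- Validation ----
    model.eval()                           # set model to evaluation mode
    total_loss = 0.0
    for x, y in dv_set:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():              # no gradient tracking needed here
            pred = model(x)
            loss = criterion(pred, y)
        total_loss += loss.item() * len(x) # accumulate loss weighted by batch size
    avg_loss = total_loss / len(dv_set.dataset)

print(f"final validation MSE: {avg_loss:.4f}")
```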
Then, for testing, you don't have the correct answers. You again set your model to evaluation mode and get data, but this data has no output labels. You disable gradient calculation, predict the output, and collect the predictions. Finally, you write the predictions to the prediction file and upload it to Kaggle to find out how well your model performs.
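A sketch of the testing step and of writing the predictions to a CSV file. The untrained model and random test data are stand-ins, and the "id,tested_positive" header is an assumption: check the competition's sample submission for the exact format.

```python
import csv
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: a model and a test set with features only (no labels).
model = nn.Linear(3, 1)
tt_set = DataLoader(TensorDataset(torch.randn(10, 3)), batch_size=4, shuffle=False)
device = "cpu"

model.eval()                         # evaluation mode
preds = []
for (x,) in tt_set:                  # test batches contain inputs only
    x = x.to(device)
    with torch.no_grad():            # disable gradient calculation
        pred = model(x)
    preds.append(pred.squeeze(1).cpu())  # collect predictions
preds = torch.cat(preds).numpy()

# One row per test sample; the header names are an assumption.
with open("predict.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "tested_positive"])
    for i, p in enumerate(preds):
        writer.writerow([i, p])
```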
Next, after training your model, you have to save it, and when you want to use the model again, you can load it with torch.load.
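One common way to do this, sketched below, is to save only the model's state_dict and load it back into a freshly built model of the same architecture; the file name is hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)

# Save only the learned parameters (the state_dict).
torch.save(model.state_dict(), "model.pth")

# Later: rebuild the same architecture, then load the saved parameters into it.
restored = nn.Linear(3, 1)
restored.load_state_dict(torch.load("model.pth"))
```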
For more about PyTorch, you can look at the PyTorch website, and there are also some useful GitHub repositories written in PyTorch. Okay, any questions about PyTorch? No? Okay.
Then I'll tell you how to run the code. If you have your laptop, you can see the sample code on this page, and to run it, you can open it in Colab. First, you have to connect to a runtime that can run your code. Then I'll explain what each part of the code does; actually, let me run the code first. You can run everything by clicking Run all here. First, we download the data from Google Drive and import some packages, and you can set a random seed here for reproducibility. Then there are some utilities for plotting learning curves and predictions; you can ignore this part.
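A sketch of what seeding for reproducibility typically involves; the helper name and seed value are hypothetical, and the sample code may set seeds differently.

```python
import random
import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Hypothetical helper: fix random seeds so repeated runs match as far as possible."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    # Ask cuDNN for deterministic algorithms (may be slower).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(0)
a = torch.randn(3)
set_seed(0)
b = torch.randn(3)
print(torch.equal(a, b))  # True
```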
So first, we define the dataset, COVID19Dataset. Here we first read the data from the CSV file we downloaded. The TODO part here is for the medium baseline; I'll show you how to modify it later. In testing mode, we just convert the data to a float tensor, but for the training data, we have to split the data into the target and the input features. We also split the training data into two parts, the first for training and the second for validation: we use ten percent of the training data for validation and ninety percent for training. Then we convert the data into PyTorch tensors and normalize it. Here is the __getitem__ function: in training and validation mode it returns the input features and the target, and in testing mode it returns only the input features. __len__ returns the size of the dataset.
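A sketch of a 90/10 split and normalization, under stated assumptions: random numbers stand in for the training CSV (after dropping the header and id column, 93 input columns plus the target), every 10th row goes to validation, and the non-one-hot columns (index 40 onward) are standardized. Here the statistics come from the training part only and are reused for validation; the sample code may do this differently.

```python
import numpy as np
import torch

# Hypothetical stand-in for the training CSV: 2700 rows, 93 inputs + 1 target.
rng = np.random.default_rng(0)
data = rng.random((2700, 94)).astype(np.float32)
features, target = data[:, :-1], data[:, -1]

# Every 10th sample goes to validation (10%), the rest to training (90%).
idx = np.arange(len(data))
train_idx, dev_idx = idx[idx % 10 != 0], idx[idx % 10 == 0]

x_train = torch.tensor(features[train_idx])
x_dev = torch.tensor(features[dev_idx])
y_train = torch.tensor(target[train_idx])

# Standardize the non-one-hot columns (40 onward) to zero mean, unit variance.
mean = x_train[:, 40:].mean(dim=0, keepdim=True)
std = x_train[:, 40:].std(dim=0, keepdim=True)
x_train[:, 40:] = (x_train[:, 40:] - mean) / std
x_dev[:, 40:] = (x_dev[:, 40:] - mean) / std
```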
After constructing the dataset, we construct a DataLoader by passing in the dataset and the batch size. Then we construct our deep neural network, a two-layer DNN with ReLU activation. The input dimension has to be specified; the first layer projects the input to a 64-dimensional tensor, which goes through a ReLU activation function and then through another linear layer to the final one-dimensional output. Because we're solving a regression problem, our output should be one-dimensional. We also specify our loss function, which here is the mean squared error loss. In forward, we calculate the output of the network. To calculate the loss, we pass in the model's prediction and the target; if you want to implement L2 regularization, you can do it here.
Next, the training, validation, and testing code is similar to the code I showed in the PyTorch tutorial. You can see here that if the validation result indicates that your model improved, the model is saved to a specified path.
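The "save only when validation improves" logic can be sketched on its own; the per-epoch validation losses here are made up, and the file name is hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical validation losses from successive epochs.
dev_losses = [1.20, 0.95, 0.97, 0.90, 0.93]

model = nn.Linear(3, 1)
min_loss = float("inf")
best_epoch = None
for epoch, dev_loss in enumerate(dev_losses):
    # (training for this epoch would happen here)
    if dev_loss < min_loss:
        # Validation improved: remember it and save the current parameters.
        min_loss = dev_loss
        best_epoch = epoch
        torch.save(model.state_dict(), "best_model.pth")

print(best_epoch, min_loss)  # 3 0.9
```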
Another important part is setting your hyperparameters, such as the maximum number of epochs to train, the batch size, which optimizer you're using, and the hyperparameters of that optimizer.
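One convenient pattern, sketched here with hypothetical values, keeps the hyperparameters in a dictionary and looks up the optimizer class by name in torch.optim.

```python
import torch
import torch.nn as nn

# Hypothetical hyperparameter values; tune them for your own runs.
config = {
    "n_epochs": 3000,
    "batch_size": 270,
    "optimizer": "SGD",
    "optim_hparas": {"lr": 0.001, "momentum": 0.9},
}

model = nn.Linear(93, 1)
# Look up the optimizer class by name and pass its hyperparameters.
optimizer = getattr(torch.optim, config["optimizer"])(
    model.parameters(), **config["optim_hparas"]
)
print(type(optimizer).__name__, optimizer.param_groups[0]["lr"])  # SGD 0.001
```

Switching to another optimizer then only means editing the dictionary, for example "Adam" with {"lr": 0.001}.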
Okay, for training, you first load your data, define your model, move it to the device, and then start training. We get a final loss of 0.759 on the validation set. We can plot the learning curve of our model: the training loss is still decreasing, but the loss on the development set, that is, the validation set, has saturated. Here we load our best model back and plot its predictions on the validation set like this: the x-axis is the ground-truth value in the validation set, and the y-axis is the value predicted by our model. We can see that our model predicts the COVID-19 cases quite well.
In the last part, we predict the outputs for the testing set and save the results to a CSV file. You can find it by clicking this folder: here is predict.csv, and you download this file. After downloading it, wait a minute.
Next, we submit our results to Kaggle. If you have a Kaggle account, you can log in and join this competition. To submit your results, click Submit Predictions, then upload your prediction file, predict.csv, and you can add a description of this submission here. After uploading, click Make Submission, and you immediately get your score on the public leaderboard. Then you can click My Submissions to see your public score. Let me see: the score of the simple baseline is 2.03, so simply by running the sample code, you can pass the simple baseline.
The next thing you can do is modify the code. Let me look at the hints: to reach the medium baseline, you can use feature selection, keeping only the 40 states and the two tested positive features from the past two days. To do this, we modify the data-loading code here. The TODO here is to specify that we only use 42 features: first the 40 states, and then the two tested positive features, which are in column 57 and column 75, like this. Then you have to change target_only to True here at the bottom, in the hyperparameter settings, and run your code again.
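Where do 57 and 75 come from? Assuming the id column has been dropped, that each day contributes 18 columns after the 40 state columns, and that tested_positive is the last of each day's 18 (index 17 within the block), day 1's value sits at 40 + 17 = 57 and day 2's at 40 + 18 + 17 = 75. A sketch with a hypothetical helper:

```python
N_STATES = 40          # one-hot state columns
N_PER_DAY = 18         # columns per day in the training CSV
TESTED_POSITIVE = 17   # assumed position of tested_positive within a day's block

def select_features(target_only: bool) -> list:
    """Hypothetical helper returning the input column indices to use."""
    if not target_only:
        return list(range(93))  # all 93 input columns
    # 40 states + tested_positive on day 1 and day 2.
    day1 = N_STATES + 0 * N_PER_DAY + TESTED_POSITIVE  # 57
    day2 = N_STATES + 1 * N_PER_DAY + TESTED_POSITIVE  # 75
    return list(range(N_STATES)) + [day1, day2]

feats = select_features(target_only=True)
print(len(feats), feats[-2:])  # 42 [57, 75]
```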
Oh, feats. Oh yeah, here: feats. Okay, I'll run it again. It seems the training has finished, so we download the prediction file again and submit it again: Submit Predictions, and submit here. Let's look at My Submissions: the score is 1.057, and the medium baseline is 1.28, so just by using feature selection, you can pass the medium baseline.
Before the Kaggle competition ends, you should choose two submissions for your final score. You click this one and this one; you can only select two, and those will be counted for the private leaderboard. Right now you can only see your scores on the public leaderboard, not on the private leaderboard. Okay, are there any questions? No? Okay.