[ML2021] HW1 & Pytorch Tutorial 1

By Hung-yi Lee

Summary

Topics Covered

  • Part 1
  • Part 2
  • Part 3
  • Part 4
  • Part 5

Full Transcript

okay i'll first introduce the first homework to you and next i'll

give a pytorch tutorial to you and last i'll tell you how to do the homework and run the sample code

we provide okay first this is the outline of my slides first i'll talk about

the objectives of this homework what task you're solving what data you're using and how to evaluate

your model's performance and then i'll tell you how to submit your homework and how

it will be graded first of all the objective of this homework is first

to solve a regression problem with deep neural networks and another important part is that we hope you can understand

some basic tips for training deep neural networks and also you should get familiar with pytorch because

the later homeworks will use pytorch as well okay the task is to predict covid-19 cases

and the data is from a group at carnegie mellon university and the data is

a daily survey running since april last year and they used facebook to conduct the survey and

a warning is that you should not download data from the internet other than the data we provide so if you

use additional data or other things like pre-trained models your final grade will be multiplied by

0.9 okay the task you're doing is this we will give you the data from the past three days

in a specific state in the u.s and you have to predict the percentage of

new tested positive cases on the third day so on the first day you have the survey data and positive cases and on the second day you have the same thing as well

and on the third day you have the survey data and you have to predict the positive cases

and how is the data collected the surveys are conducted using facebook

and the group collects the surveys every day in every state in the u.s and the survey

consists of questions about covid-19 symptoms

whether people are getting tested whether they practice social distancing

their mental health status and other indicators as well for example if this is the

total population of a certain state in the u.s the group

samples some of the people maybe several hundred

people and they take the survey on facebook and then the group uses the results of the survey to estimate

the statistics of the total population and this is the data we're using and we provide data

like this we used 40 states in the u.s

and they are encoded as one-hot vectors i'll tell you what a one-hot vector is in a moment next we have four indicators

of covid-like illness for example people may have an illness like

influenza and the group estimates the share of the total population having the illness

and other indicators like behavior whether people are wearing masks or traveling outside the state and others like mental health indicators

are provided as well and the most important part is that you have to predict the tested positive cases and

these features are given as percentages so what is a one-hot vector one-hot vectors

are vectors with only one element equal to one while the others are zero and this kind of vector is

usually used to encode discrete values for example if a state code is az arizona and we encode it as

a one-hot vector it will look like this the vector consists of zeros except for one element that is one
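As a minimal sketch of the one-hot idea just described, the snippet below encodes a state code as a vector with a single 1. The five-state list is a shortened, hypothetical stand-in for the 40 states in the homework data, and `one_hot` is a helper name chosen here for illustration.

```python
# Hypothetical, shortened list of state codes (the homework data uses 40 states).
STATES = ["AL", "AK", "AZ", "AR", "CA"]


def one_hot(state, states=STATES):
    """Return a list with a single 1 at the position of `state` and 0 elsewhere."""
    vec = [0] * len(states)
    vec[states.index(state)] = 1
    return vec


print(one_hot("AZ"))  # [0, 0, 1, 0, 0]
```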

so this one represents arizona okay this is the training data your training data is a csv file and there are

two thousand seven hundred samples and the first 40 columns

are the one-hot encoding of the states and the next 18 features are the features of the first day and then come the second day

and the third day and the last column is the tested positive cases and this is the target you are going to predict

and one row is one sample and for the testing data there are 893 samples and for the

third day we only have 17 features because we removed the tested positive cases because you're going to predict the

answer with your model and the evaluation metric is root mean squared error and what

are the symbols here first f is your model your neural network and your neural network takes as input

a feature vector x from the testing data we provide and your target is to minimize this

error and y is the ground truth label you don't have that but we will calculate the

rmse for you with kaggle okay kaggle
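The metric can be sketched in a few lines of plain Python. This is only an illustration of the standard root mean squared error formula, sqrt(mean((f(x) - y)^2)); the function name `rmse` and the example numbers are assumptions made here, and the actual scoring is done by Kaggle.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up predictions and ground truth values, for illustration only.
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```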

this is the link to the kaggle competition and it is already launched so you can ignore this and

your display name should be in this format your student id first then an underscore and then anything you want

and if you're auditing you should not put your student id in your display name and the submission format is a csv

file and we already wrote the code for producing the csv file

in the sample code so you don't have to look at this okay next for kaggle submission you may submit

up to five results each day and the day is counted in utc and the time zone in taiwan is utc plus eight

so every day at 8 a.m in taiwan you can start a new round and submit five results so

you're limited to five submissions each day and before the competition ends you should

choose two submissions for the private leaderboard so you should check two results

for the private leaderboard and about grading we set three baselines simple baseline medium baseline and strong baseline and for each baseline there are public and private

baselines and for each baseline you pass you get one point so you get

six points if you pass all baselines and remember you have to upload your code to the ntu cool platform

to get four points so the total is 10 points and for kaggle the leaderboard will look like this there are the simple baseline the medium

baseline and the strong baseline if it is too hard we might change this to an easier baseline

for bonus points if you get 10 points that is you passed all baselines and you submitted your code

we will make your code public to the whole class and if you also submit a pdf report

that briefly describes your methods you get a bonus of 0.5 points and your report will also be available to all students

and this is the report template it looks like this you have to put your

scores here and describe your methods here okay back to the slides about code submission you have to submit

to ntu cool and the format should be like this it should be compressed into a zip file and the file name should be your student id

underscore homework one and we can only see your last submission so you have to make sure that the last submission is

the submission you want us to see and don't submit your model or data sets because the files would be

too large and we might check your code and if your code is not reasonable your semester grade will be multiplied by 0.9

for code submission you should specify the source of your code if you use the sample code from

the ta you should add this line at the bottom of your code as a reference

and the zip file should include your code either in py format or as an ipython notebook

and a report if you pass all baselines you should also include your report here so for example there's a report and

your source code and if you use google colab you can download your code

by clicking file here on the upper left-hand side and then download ipython notebook

and how to compress your files if you use windows you can right click your folder choose send to and use

the compressed zip folder option and mac users can also right click

the folder i think you know how to compress and if you would like to use

the command line you can also use zip -r to compress your folder and the most important part is to

remember the deadlines the kaggle deadline is three weeks later and the code submission to

ntu cool is two days later and we don't allow any late submissions

so you should submit early okay there are some hints for the simple baseline you just have to run our sample code and for the medium baseline

we recommend you perform a very simple feature selection use the 40 states one-hot encoding

and the two tested positive cases from the past two days and later i'll demonstrate how to do this

and for the strong baseline there are some hints here like thinking about what other useful

features there are or you might change the dnn architecture or change your training hyperparameters or use regularization

and there are some mistakes in my sample code so you might look deeper into

the code and as a reminder you should finish your homework on your own

and do not modify your prediction files manually so the files you upload to kaggle should be

what your model produced and you should not share your code or

prediction files with any other living creatures and you should only submit five times a day

you should not use any approaches to submit more than five times and you should not use additional data

or pre-trained models okay and if you have any questions you can ask us using ntu cool or email or ta hours

and there are some useful links if you're interested in regularization or network training you can click on

these links and for pytorch you can look at this link okay any questions about homework one

no if there are no questions i'll go on to the pytorch tutorial okay pytorch is

a very important machine learning framework and we will use it in every homework in this course so the outline

of these slides is i'll tell you what pytorch is and how to use pytorch to train

your deep neural network okay and as prerequisites we assume that you are already familiar with python 3 and numpy

so what is pytorch pytorch is an open source machine learning framework and there are two features in pytorch that are very useful for deep neural network training

the first one is tensor computation like numpy but the tensors can be computed on gpus graphics processing units

for acceleration and another feature is that pytorch can calculate gradients for you

which is a very important part of dnn training and here's a simple comparison

of pytorch and tensorflow tensorflow is also a machine learning

framework but it is developed by google brain and pytorch is from facebook tensorflow is more compatible

with multiple platforms like you can use javascript or swift to use tensorflow but for debugging pytorch is much

easier and tensorflow might be easier in its second version

and pytorch is usually used in research and tensorflow is used for production so how do we train a deep neural network

the training procedure is like this first we load data and next we define our neural network we define our loss function

and the optimizer the optimizer is the algorithm that updates your neural network and next

we train our neural network with data and maybe after training for one epoch we validate

we use other data from our training set that is not used for training to check if

our neural network improved and then this procedure repeats several times and after training the trained network will be used for

testing and we also load the data for testing and pytorch provides torch.nn and torch.optim

for the neural network and optimizer parts and for loading data pytorch provides

dataset and dataloader okay first i'll talk about tensors tensors are high-dimensional matrices or arrays

a one-dimensional tensor looks like this and a two-dimensional one is like a matrix and three-dimensional or higher-dimensional ones are like

cubes or something like that and what data is stored in tensors

there are two common types stored in tensors one is floating point and another is integer and if you want to

construct a float tensor you call the floattensor function and for integers you can call

longtensor next i'll tell you about the shapes of tensors for this one-dimensional tensor

the length of the first dimension is five and for the two-dimensional tensor

like this one the length of the first dimension is three and the second is five so the shape is written like this and the same goes for three

dimensions and you have to remember that the first dimension has index zero and the second is one and the third is

two and so on and if you are familiar with numpy dim in pytorch is the same as axis in numpy

okay how about constructing a tensor there are several methods to construct a tensor the first one is to construct

a tensor from a list so you just call torch.tensor and put a list inside and another method is to construct a tensor from

a numpy array like this and if you want to construct a tensor with only zeros in it you can use the

function torch.zeros and another method is to construct

a tensor with all ones in it like this and you have to specify the shape of the tensor you want

for example here for the first two methods the constructed tensors look like this and for the zero tensor the shape is

two by two and it looks like this and the tensor with only ones in it will look like this for example
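The four construction methods mentioned above can be sketched as follows. The concrete values and shapes are picked here for illustration and are not necessarily the ones on the slides.

```python
import numpy as np
import torch

# From a Python list.
a = torch.tensor([[1, -1], [-1, 1]])

# From a NumPy array.
b = torch.from_numpy(np.array([[1, -1], [-1, 1]]))

# Filled with zeros or ones; the arguments give the shape.
zeros = torch.zeros(2, 2)
ones = torch.ones(1, 2, 5)

print(a.shape, b.shape, zeros.shape, ones.shape)
```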

next i'll introduce some common operators in pytorch the first one is squeeze squeeze removes the specified dimension if it has

length one so like this the length of dimension zero is one

so if we want to remove this dimension we use squeeze zero and the zero means dim equals zero

as illustrated here the length of dimension zero is one and after squeezing the dimension is gone

next unsqueeze unsqueeze is the opposite of squeeze so if you want to expand a new dimension we use unsqueeze so for example if we want

to unsqueeze at dimension one dim equals one we have to specify one here and

as illustrated here we have a new dimension next transpose if you already know about transposing a matrix

then this is almost the same thing so you transpose 0 and 1 the two dimensions dimension 0 and dimension 1
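A short sketch of the three shape operators described above. The tensor shapes are assumptions chosen to make the effect of each call visible.

```python
import torch

x = torch.zeros(1, 2, 3)
squeezed = x.squeeze(0)         # removes dimension 0 because its length is 1
print(squeezed.shape)           # torch.Size([2, 3])

y = torch.zeros(2, 3)
unsqueezed = y.unsqueeze(1)     # inserts a new dimension of length 1 at index 1
print(unsqueezed.shape)         # torch.Size([2, 1, 3])

transposed = y.transpose(0, 1)  # swaps dimension 0 and dimension 1
print(transposed.shape)         # torch.Size([3, 2])
```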

so the shape will be like this next concatenation the function cat concatenates multiple tensors

so for example we have three tensors x y z and the three tensors have almost the same shape but in one dimension

they're different so we want to concatenate the three tensors along that dimension so we use torch.cat

and we put x y z in a list

and pass it to cat and we specify dim equals one so after concatenating the three tensors it will be like this

the lengths along dimension one which are one three and two are added together so it'll be like this next for other operators

you can use addition or subtraction or you can calculate a power like this one calculates

x to the power of 2 and for summation or mean you can also use these two functions
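The concatenation and arithmetic operators can be sketched like this. The shapes follow the lengths one, three and two mentioned for the concatenated dimension; the other dimensions and the values of `a` and `b` are assumptions made for the example.

```python
import torch

# Three tensors that differ only in the length of dimension 1.
x = torch.zeros(2, 1, 3)
y = torch.zeros(2, 3, 3)
z = torch.zeros(2, 2, 3)
w = torch.cat([x, y, z], dim=1)
print(w.shape)  # torch.Size([2, 6, 3])

a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])
total = a + b        # element-wise addition
diff = a - b         # element-wise subtraction
squared = a.pow(2)   # element-wise power
s = a.sum()          # sum of all elements
m = a.mean()         # mean of all elements
print(total, diff, squared, s, m)
```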

and there are more operators you can use see the link below and if you're familiar with numpy there

are some similar parts between pytorch and numpy like the shape or data type

they are the same thing and for manipulating the shape of tensors you can use reshape or view in pytorch and reshape in

numpy and squeeze is the same in both and for unsqueeze pytorch is slightly different from numpy

next as mentioned earlier pytorch supports computing tensors on gpus so

you have to move tensors to gpus so that you can compute on gpus and by default a tensor is computed on

the cpu so if you want to use a gpu you have to move it to cuda and what is cuda cuda is from

nvidia so if you want to run your code on gpus you have to use an nvidia gpu and you have

to check whether there is a gpu for you to run on so you can use this function torch.cuda.is_available

to check it and if you have multiple gpus you can specify cuda 0 or 1 or 2
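A minimal sketch of the device handling described above. It falls back to the CPU when no NVIDIA GPU is visible, so it should run on any machine; the variable names are chosen here.

```python
import torch

# Pick "cuda" when a GPU is available, otherwise stay on the CPU.
# With several GPUs, a specific one can be named, e.g. "cuda:0" or "cuda:1".
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.ones(2, 2)   # created on the CPU by default
x = x.to(device)       # moved to the chosen device
print(x.device)
```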

and why gpus because matrix operations or tensor operations can be split into

lots of little operations and because the little operations are independent they can be calculated separately

and a gpu has lots of little cores and each core can calculate a part of

the operations so with a lot of small cores you can calculate large tensor

operations in parallel so it can be accelerated and another feature of

pytorch is calculating gradients for example if we have a matrix x like this and we define the output

z as squaring all the elements of x and summing them together

then we can calculate the partial derivatives of z like this and then we combine the results so we get the gradient of z like

this so how do we do this in pytorch first we construct a tensor

the same as this x and you have to specify that it requires gradient so pytorch can calculate the gradient

next you calculate z so you take the power of x and sum

the elements together then you calculate the gradient with the function backward finally you can see the gradient

stored in x.grad so if you print it out you can see that the gradients are the same as the ones we computed earlier okay and
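The gradient example can be sketched as below. The entries of `x` are an assumption chosen here; for z equal to the sum of squared entries, the gradient with respect to each entry is two times that entry.

```python
import torch

# requires_grad=True tells PyTorch to track operations on x.
x = torch.tensor([[1.0, 0.0], [-1.0, 1.0]], requires_grad=True)

z = x.pow(2).sum()   # z = sum of x_ij squared
z.backward()         # computes dz/dx and stores it in x.grad

print(x.grad)        # equals 2 * x
```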

for the dnn training procedure i'll first tell you how to load data with dataset and dataloader

okay for dataset you have to write a subclass of

the dataset class from pytorch and the functions you have to implement are first you read data and preprocess

it in the initialization stage next you have to define another function getitem to return one

sample at a time and the getitem function is for the dataloader i'll introduce that

later and the last thing you have to write is len the size of the data set
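A minimal sketch of such a subclass, with the three methods named above. `ToyDataset` and its synthetic data are assumptions standing in for reading and preprocessing a real file; the last lines wrap it in a `DataLoader` with a batch size of five to show how samples get grouped into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyDataset(Dataset):
    """Hypothetical dataset holding synthetic features and targets."""

    def __init__(self, n_samples=10, n_features=3):
        # Stand-in for reading and preprocessing data from a file.
        values = torch.arange(n_samples * n_features, dtype=torch.float32)
        self.x = values.reshape(n_samples, n_features)
        self.y = self.x.sum(dim=1)

    def __getitem__(self, index):
        # Return one sample at a time.
        return self.x[index], self.y[index]

    def __len__(self):
        # Size of the dataset.
        return len(self.x)


dataset = ToyDataset()
loader = DataLoader(dataset, batch_size=5, shuffle=False)
for features, targets in loader:
    print(features.shape, targets.shape)
```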

so after you construct your dataset you put your dataset in a dataloader and for the dataloader you have to specify the batch size

because it returns one batch at a time while the dataset returns one sample at a time

and you also have to specify whether to shuffle your data shuffling is only for the training stage for testing or validation

you should not shuffle the data because for validation or testing we want the results to be the same every time we don't

want them to be random

so this is an illustration of the dataloader and dataset we first put the dataset in the dataloader

and the dataloader will call the getitem function in the dataset like this if you set your batch size

to five the dataloader calls it five times and gets five samples from the dataset and then

the dataloader combines the five samples into one mini-batch and the number of samples equals

the batch size you specified okay next after you process your data the next thing you have to do is to define your

neural network the most common layer of a neural network is the fully connected layer

and it is used like this you call torch.nn and

use the linear module and then you have to specify the dimension of the input features and

the output features so for example if your linear layer takes

input tensors of dimension 32 and outputs tensors of dimension 64 the input

tensor should look like this the last dimension of the input tensor should be 32

and the last dimension of the output tensor should be 64 and the shape of

these tensors can be any shape but

only the last dimension is constrained to 32 or 64

okay to illustrate the fully connected layer i drew this figure so the input

is a 32-dimensional vector and it passes through the neurons and the output is

a 64-dimensional vector and it can be illustrated like this the input vector x

is a 32-dimensional vector and it is multiplied by a matrix w the weight matrix of this

layer and its size is 64 by 32

and a bias term is also added and we get the output

so using the slides from professor lee the neural network looks like this the input x is multiplied

by a weight matrix and added to the bias term and we can look at

the weights and bias in the defined layer by accessing weight and bias and we can

see that the shape of the weight is 64 by 32 like this and the bias has size 64
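The shapes discussed above can be checked with a short sketch. The leading dimensions of the input (10 and 5) are arbitrary assumptions; only the last dimension has to be 32. The `manual` line recomputes the layer output as W x + b from the stored weight and bias.

```python
import torch
import torch.nn as nn

layer = nn.Linear(32, 64)        # 32 input features, 64 output features

x = torch.randn(10, 5, 32)       # any leading shape, last dimension 32
out = layer(x)
print(out.shape)                 # torch.Size([10, 5, 64])

print(layer.weight.shape)        # torch.Size([64, 32])
print(layer.bias.shape)          # torch.Size([64])

# The same computation written out by hand.
manual = x @ layer.weight.T + layer.bias
print(torch.allclose(out, manual, atol=1e-5))
```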

and in torch.nn there are lots of modules for deep neural networks and some commonly used activation functions like

sigmoid and relu are also implemented okay next after defining your neural network you

have to define your loss function one of the loss functions for regression is the mean squared error loss

like this mseloss and for classification which will be taught later there is the cross entropy loss

okay next with these components you can build your own neural network your model should be a subclass of nn.module

from pytorch and first you have to initialize your model and define your layers

so your model can be like this nn.sequential and you can specify the layers a linear layer

a sigmoid layer and another linear layer and another function you have to

implement is the forward function it takes an input x and it returns

the output of the network so if we illustrate this neural network it will be like this

the input is a 10-dimensional vector or tensor and it is passed through the first linear layer

and then it becomes a 32-dimensional tensor and then it is passed through a sigmoid activation function and last

it is passed through the second linear layer which outputs a tensor of dimension one next we will define the optimizer
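The network just described (10 inputs, a 32-dimensional hidden layer with sigmoid, one output) might be written as the sketch below. The class name `MyModel` and the batch of four random inputs are assumptions made for the example.

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers are defined once, when the model is initialized.
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        # Takes an input tensor and returns the output of the network.
        return self.net(x)


model = MyModel()
criterion = nn.MSELoss()            # mean squared error, for regression

batch = torch.randn(4, 10)          # 4 samples, 10 features each
out = model(batch)
loss = criterion(out, torch.zeros(4, 1))
print(out.shape, loss.item())
```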

the optimizer is an optimization algorithm based on gradient descent one of the common optimizers is

stochastic gradient descent sgd and when you're defining your optimizer you also have to pass

your model parameters to the optimizer so that the optimizer can optimize your

model's parameters and you also have to specify the learning rate of the optimizer and next for training

validation and testing you have to first

construct your dataset and construct your dataloader and construct your model

and move your model to the specified device the device can be cpu or cuda and you have to define your loss function

and define your optimizer and you have to pass your model parameters into the optimizer and for neural network training

the code looks like this first you iterate with a for loop over n epochs you have to specify

how many epochs you are going to run and next you should set your model to training

mode then you get data from your dataloader and before calculating the

gradients of your model you should set the optimizer's gradients to zero because the gradients accumulate if you don't

reset them so you have to clear the gradients and next you should move your data to the same device as your model and then

you can calculate the output of your model by passing x as the

input next once you have your prediction you can calculate the loss with the criterion you defined earlier

and then you calculate the gradients with the backward function and finally you use the

optimizer to update your model's parameters and for validation you should set your model to evaluation mode first

and then you get data from your dataloader and then

you also have to use this code torch.no_grad to disable gradient calculation because for evaluation you don't need to calculate gradients and

without calculating gradients the model runs faster in the inference stage

and you also get a prediction and calculate the loss and you accumulate your loss and finally

you compute the average loss and then for testing
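The training and validation steps listed above can be put together in one sketch. The synthetic data, the 80/20 split, the learning rate and the number of epochs are all assumptions made so the example runs on its own; it is not the homework's sample code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression data standing in for a real dataset.
x = torch.randn(100, 10)
y = x.sum(dim=1, keepdim=True)
train_loader = DataLoader(TensorDataset(x[:80], y[:80]), batch_size=16, shuffle=True)
valid_loader = DataLoader(TensorDataset(x[80:], y[80:]), batch_size=16, shuffle=False)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1)).to(device)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

n_epochs = 5
valid_losses = []
for epoch in range(n_epochs):
    model.train()                              # training mode
    for xb, yb in train_loader:
        optimizer.zero_grad()                  # clear accumulated gradients
        xb, yb = xb.to(device), yb.to(device)  # same device as the model
        pred = model(xb)                       # forward pass
        loss = criterion(pred, yb)             # compute loss
        loss.backward()                        # compute gradients
        optimizer.step()                       # update parameters

    model.eval()                               # evaluation mode
    total_loss = 0.0
    for xb, yb in valid_loader:
        xb, yb = xb.to(device), yb.to(device)
        with torch.no_grad():                  # no gradient tracking needed
            pred = model(xb)
            loss = criterion(pred, yb)
        total_loss += loss.item() * len(xb)    # accumulate the loss
    valid_losses.append(total_loss / len(valid_loader.dataset))
    print(f"epoch {epoch + 1}: validation loss {valid_losses[-1]:.4f}")
```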

you don't have the correct answers so you also set your model to evaluation mode and get data but your data

doesn't have the output labels and you also have to disable gradient calculation and

you predict the output and then you collect the predictions and finally you write the

predictions to the prediction file and upload it to kaggle to find out the performance of your model and next after training your model you

have to save your model and then when you have to use your model again you can load your model

with torch.load and for more about pytorch you can look at the pytorch website
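A sketch of the testing loop and of saving and loading a model. Saving the `state_dict` and restoring it with `load_state_dict` is one common pattern and is an assumption here, as are the untrained model, the random test inputs and the temporary file path.

```python
import os
import tempfile

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1))

# Test data has inputs only, no labels.
test_x = torch.randn(20, 10)
test_loader = DataLoader(TensorDataset(test_x), batch_size=8, shuffle=False)

model.eval()
preds = []
for (xb,) in test_loader:
    with torch.no_grad():
        preds.append(model(xb))    # collect the predictions batch by batch
preds = torch.cat(preds, dim=0)
print(preds.shape)                 # torch.Size([20, 1])

# Save the parameters, then load them into a fresh model of the same shape.
path = os.path.join(tempfile.mkdtemp(), "model.pth")
torch.save(model.state_dict(), path)

restored = nn.Sequential(nn.Linear(10, 32), nn.Sigmoid(), nn.Linear(32, 1))
restored.load_state_dict(torch.load(path))
restored.eval()
```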

and there are also some useful github repositories written in pytorch okay any questions

about pytorch no okay then i'll

tell you how to run your code so first if you have your laptop you can

see the sample code on this page and to run the sample code you can click open in colab

okay first what you have to do is connect to

a runtime that you can run your code on okay and then i'll tell you what the parts of this code do

first we download the data from google drive wait i'll run the code first so

you run your code by clicking run all here so we first download data from google drive and import some packages here and you

can set a random seed here for reproducibility and then there are some utilities for plotting

learning curves or plotting predictions you can ignore this part then we define the dataset the covid-19 dataset

in the dataset we first read the data from the

csv file we downloaded and then this part the todo is for the medium baseline i'll show you how to modify

this part later okay and for testing mode we just convert the data to

a float tensor but for training data we have to split the data into targets and input features

and we split the training data into two parts the first part is for training and the second part is for validation here

and we can see that we only use ten percent of the training data for validation and ninety percent for training

and then we also convert the data into pytorch tensors and then we normalize our data
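One way to get such a 90/10 split and a normalization step is sketched below. Taking every tenth row for validation and standardizing both splits with the training mean and standard deviation are assumptions of this sketch; the sample code may split and normalize differently.

```python
import torch

torch.manual_seed(0)
data = torch.rand(50, 6)   # hypothetical data: 50 samples, 6 features

# Every tenth row goes to validation, the rest to training (a 90/10 split).
valid_idx = [i for i in range(len(data)) if i % 10 == 0]
train_idx = [i for i in range(len(data)) if i % 10 != 0]
train, valid = data[train_idx], data[valid_idx]

# Standardize each feature using statistics from the training split only.
mean = train.mean(dim=0, keepdim=True)
std = train.std(dim=0, keepdim=True)
train_norm = (train - mean) / std
valid_norm = (valid - mean) / std

print(train_norm.shape, valid_norm.shape)
```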

and here is the getitem function for training and validation mode we return the input features and

the output target and for testing we only return the input features and len here is the size of the data set

and for the dataloader after we construct a dataset we construct a dataloader by passing in the dataset

and the batch size so after we construct a dataloader we now construct our deep neural network

the network is a two-layer deep neural network with relu activation

and the input dimension should be specified and in the first layer the

input is projected to a 64-dimensional tensor and then passed through a relu activation function and then passed through another linear layer

to the final one-dimensional output because we're solving a regression problem so our output should only be one

dimensional and we also have to specify our loss function the mean squared error loss here and we define forward to

calculate the output of our neural network we can write

this and for calculating the loss we pass in the prediction of our model and the target and then

we calculate the loss and if you want to implement l2 regularization you can implement it here next for
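As one possible way to add L2 regularization to the loss calculation, the sketch below adds the sum of squared parameters, scaled by a coefficient, to the mean squared error. The helper name `loss_with_l2`, the coefficient `l2_lambda`, and the small model are hypothetical. A related alternative is the `weight_decay` argument of optimizers such as `torch.optim.SGD`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(5, 8), nn.ReLU(), nn.Linear(8, 1))
criterion = nn.MSELoss()


def loss_with_l2(pred, target, model, l2_lambda=1e-4):
    """MSE loss plus an L2 penalty on all model parameters."""
    mse = criterion(pred, target)
    l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
    return mse + l2_lambda * l2_penalty


x = torch.randn(4, 5)
y = torch.randn(4, 1)
loss = loss_with_l2(model(x), y, model)
loss.backward()   # gradients now include the penalty term
print(loss.item())
```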

training the training validation and testing code is similar to the code

i showed in the pytorch tutorial and you can see here

if the validation result indicates that your model improved

then you save your model to a specified path and another important part is

to set your hyperparameters like the maximum number of epochs your model will be trained for and the batch size

and what kind of optimizer you're using and the hyperparameters of your optimizer okay for training you

first load your data here and then define your model and move it to the

device and then start training and we get a final loss of 0.759

on the validation set and we can plot the learning curve of our model we can see that the training loss is

still decreasing but the loss on the development set the validation set has saturated and here we

load our best model back and then we plot the predictions of our model on

the validation set like this the x-axis is the ground truth value in the

validation set and the y-axis is the predicted value from our model so we can see that

our model performs quite well at predicting the covid-19 cases

so in the last part we predict the outputs for the testing set and then

save the results to a csv file you can see it here by clicking this folder and here is

predict.csv you download this file okay after downloading this wait a minute

okay next we submit our results to kaggle so if you have

a kaggle account you can log in and join this competition and to submit your results you click submit

predictions and then you click here to upload your prediction file

and you can add some description of this submission you can write it here

and after uploading you click make submission and then you immediately know

your score on the public leaderboard and if you click leaderboard

and then my submissions you can see that your public score is this and the simple baseline

is here let me see the score of the simple baseline is 2.03 so

by simply running the sample code you can pass the simple baseline so the next thing you can do is to modify

the code here let me see the hints so if you want to pass

the medium baseline you can use feature selection and only use the 40 states and the two tested positive cases from the

past two days so we have to modify the data loading part so here

the todo here is to specify that we only use

42 features so first we use the 40 states and then the two

tested positive features are in the 57th column and the 75th column

so like this and then you have to change target only to true so here and

at the bottom here in the settings of your hyperparameters you can change this to true

and then you run your code again
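The feature selection can be sketched as picking 42 column indices. The sketch assumes zero-based column indices over the 93 input features (40 state columns, 18 features for each of the first two days, 17 for the third), so that indices 57 and 75 are the last feature of day one and day two. The name `feats` and the random data are illustrative.

```python
import torch

n_states = 40
# 40 one-hot state columns plus the tested positive columns of days 1 and 2.
feats = list(range(n_states)) + [57, 75]

data = torch.rand(8, 93)    # hypothetical: 8 rows, 40 + 18 + 18 + 17 columns
selected = data[:, feats]   # keep only the chosen columns

print(len(feats), selected.shape)   # 42 torch.Size([8, 42])
```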

oh feats oh yeah here feats

okay i'll run it again okay and it seems that we

finished the training so we download the prediction file again and then submit again submit predictions and

submit here

okay and then see

my submissions the score is here 1.057 so we can see that

the medium baseline is 1.28 so by simply using feature selection you can

pass the medium baseline and before the kaggle competition ends you should

choose two submissions for the final score so you can click this and this

you can only select two so that they will be counted for the private leaderboard so

for now you can only see your score on the public leaderboard but you cannot see the scores on the private leaderboard okay so are there any questions

no okay
