Transfer Learning (C3W2L07)
By DeepLearningAI
Summary
Topics Covered
- How Transfer Learning Works: Repurpose a Neural Network in Two Steps
- Pre-Training vs. Fine-Tuning: The Two-Phase Transfer Learning Protocol
- Low-Level Features Transfer Across Domains: Edges, Curves, and Shapes
- Transfer Learning Needs Data Asymmetry: Source Task Must Have More Data
- Why Transfer Learning Fails When Your Target Task Already Has Enough Data
Full Transcript
one of the most powerful ideas in deep learning is that sometimes we can take the knowledge the new network has learned from one toss and apply that knowledge to a separate task so for
example maybe kind of a new network learn to recognize objects like cats and then use that knowledge or use part of that knowledge to help you do a better job reading x-ray scans this is called
transfer learning let's take a look let's say you've trained in your network on image recognition so you first take a
neural network and train it on XY pairs where X is an image and Y is some object in the image as a cat or a dog or bird or something else if you want to take
this new network and a gap or we say transfer what is learn to a different tasks such as radiology diagnosis or
meaning really reading x-ray scans what you can do is take this loss output layer of the neural network and just delete that and delete also the waste feeding into that loss output layer and
create a new set of randomly initialized ways just for the last layer and at that now output radiology diagnosis so to be
concrete during the first phase of training when you're trading on an image recognition task you train all of the usual parameters within your network all the ways all the layers and you have
something that now learns to make image recognition predictions having trained that neural network what you now do to
implement transfer learning is swap in a new data set X Y where now these are
radiology images and why are the diagnosis you want to predict and what
you do is initialize the last layers ways is called a WL + BL random being and now we train the neural network on
this new data set on the new radiology dataset we have a couple options of regions in your network with radiology data you might if you have a small radiology data set you might want to
just retrain the weights of the last layer just W LPL and keep the rest of parameters fixed if you have an up data you could also retrain all the layers of
the rest of the neural network and the rule of thumb is maybe a bit of a small data set then just retrain the one loss layer at the output layer or maybe even lost one or two layers there's a lot of
data then maybe you can retrain all the parameters in the networks and if you retrain all the parameters in your network then this initial phase of
training on image recognition is sometimes called pre training because you're using image recognition data to pre initialize or really pre train the
weights of the neural network and then if you are updating all the ways afterwards and trading on the radiology data sometimes that's called fine tuning so if you share the words pre training
and fine tuning in a deep learning context this is what they mean when they refer to pre training and fine tuning ways in a transfer learning cost and what you've done in this example is
you've taken knowledge learn from image recognition and applied it or transferred it to radiology diagnosis and the reason this can be helpful is that a lot of the low-level features
such as detecting edges that encourage detecting positive objects learning from that from a very enlarged image recognition data set might help your learning algorithm do better in
radiology diagnosis it's just learned a lot about the structure and the nature of how images look like and some of that knowledge will be useful so having learn
to recognize images it might have learned enough about you know just what parts of different images look like that that knowledge about lines dots curves
and so on may be small parts of objects that knowledge could help your radiology diagnosis Network learn a bit faster or learn what less data here's another example let's say that you've trained a
speech recognition so now X is inputs of audio or your snippers and Y is some in transcript so
you're trained in speech recognition system to output your transcripts and let's say that you now want to build a
waik word or a trigger word detection system so recall that they wake whether the trigger words are the words we say
in order to wake up speech control devices in the houses such as saying Alexis and we're going Amazon echo or okay Google to waken Google device or a series with an Apple device or saying
they hope I do to wake up up my to device so in order to do this you might take out the last layer of the neural
network again and create a new output note but sometimes another thing you could do is actually create not just a single new output but actually create
several new layers to your neural network to try to predict the labels Y for your wake word detection problem then again depending on how much data
you have you might just retrain the new layers of the network or maybe you could be trained you're even more layers of this neural network so when does
transfer learning makes sense transfer learning makes sense when you have a lot of data for the problem you're transferring from and usually relatively less data for the problem you're
transferring to so for example let's say you have a million examples for your image recognition tasks so that's a lot of data to learn a lot of low-level features or to learn a lot of useful
features in the earlier layers in your network but for the radiology tasks maybe you have only 100 examples so you're very low data for the radiology
diagnosis problem you have only 100 x-ray scans so lot of knowledge you learn from image recognition can be transferred and can really help you get going with radiology recognition even if
you don't have an all the data for radiology or speech recognition maybe you've trained the speech recognition system on $10,000 of data so you have learned a lot about what human voices sounds like
from that $10,000 of data which really is a lot but for your trigger word detection maybe you have only one hour of data so that's not raw data to figure out parameters so in this case a lot of
what you learn about what human voices sound like what are components of human speech and so on that can be really helpful but building a good wake word detector even though you have a
relatively small data center he's a much smaller data set for the weak word detection task so both of these cases are transferring from a problem with a
lot of data to a problem with relatively little data one case where transfer learning would not make sense is if the
opposite was true so if you had a hundred images for image recognition and you had a hundred images for radiology diagnosis or even you're a thousand
images really for radiology diagnosis one would think about it is that to do well on radiology diagnosis assuming what you really want to do well on is radiology diagnosis having radiology
images is much more valuable than having cat-and-dog and so on images so each example here is much more valuable than each example there at least for the purpose of building a good radiology
system so if you already have more data for radiology is not that likely that having 100 images or your random objects of cats and dogs and calls and so on
would be that helpful because the value of one example of images from your English recognition terms of cats and dogs is just less valuable than one
example of an x-ray image for the task of building a good radiology system so this would be one example where transfer learning well it might not hurt but I wouldn't expect it to give you any
meaningful gain either and similarly if you built a speech recognition system on 10oz of data and you actually have 10 hours or maybe even more say 50 hours of
data for week word detection you know it won't merely not hurt maybe it won't hurt to include that 10 hours of data to do transfer but you just couldn't expect to get a meaningful game
so to summarize when does transfer learning make sense if you're trying to learn from some task a and transfer some
of the knowledge to sometimes be then transfer learning make sense when toss a and B have the same input X in the first
example a and B both images as input in the second example both had audio codes as input it tends to make sense when you have one more data for toss a then toss
B all this is under the assumption that what you really want to do well on this toss video and because data for tossed B is more valuable for toss B usually you
just need a lot more data for toss a because do each example from toss a is just less valuable photos B in each example for toss B and then finally
transfer learning will tend to make more sense if you suspect that low-level features from toss a could be helpful for learning times B and in both of the earlier examples maybe learning image
recognition teaches you a number about images to hover radiology diagnosis and maybe learning speech recognition teaches you about human speech to help you with trigger words on record
detection so to summarize transfer learning tends to be most useful if you're trying to do well on sometimes be usually a problem where you're relatively little data so for example in
radiology you know difficult to get that many x-ray scans to build a good radiology diagnosis system so in that case you might find it related by different tasks such as image recognition where you can get maybe
million images and learn a lot of low-level features from that so that you can then try to do well on toss be on your radiology task despite not having damage data for it
when transfer learning makes sense it does help the performance of your learning algorithms significantly but I've also seen sometimes seen transfer learning applied in settings where toss
a actually has less data than toss B and in those cases you kind of don't expect to see much of a game so that's it's a transfer learning where you learn from one toss and try to transfer to a different task there's
another version of learning from multiple toss which is called multitasking which is when you try to learn from multiple tasks at the same time rather than learning from one and then sequentially or after that trying
to transfer to a different task so in the next video let's discuss multitasking
Loading video analysis...