Unlocking Personalization: A Deep Dive into Modern Recommendation Algorithms | Sarang Gupta

By Data Science Conference

Summary

## Key takeaways

- **Long Tail Enables Niche Discovery**: Online platforms have unlimited inventory, allowing niche products, unlike the limited shelf space in brick-and-mortar stores. Recommendation systems make these long tail items discoverable, helping users find obscure content and generating significant revenue for businesses. [07:41], [08:35]
- **Jam Experiment: Paradox of Choice**: In the tasting booth experiment, 60% of shoppers stopped at the booth with 24 jam varieties but only 3% purchased, while 40% stopped at the 6-variety booth but 30% purchased, showing that too many choices paralyze decision making. [10:32], [11:12]
- **Explicit vs Implicit Feedback**: Explicit data is direct user feedback like ratings and reviews, which is high quality but hard to gather, while implicit data from behaviors like clicks and viewing time is abundant but noisier and requires more cleaning. [19:04], [20:09]
- **Bayesian Average Fixes Popularity Bias**: Simply ranking by average rating favors items with few perfect scores, like 5.0 from 10 reviews, but Bayesian average uses a prior from the global mean and a minimum ratings count to pull low-review items toward the dataset average. [25:04], [26:19]
- **Collaborative Filtering Learns Embeddings**: Collaborative filtering decomposes the sparse user-item rating matrix via matrix factorization into user and item latent feature matrices, automatically learning embeddings without hand-engineered features to capture complex patterns. [33:45], [34:59]
- **Two-Tower Nets Combine Approaches**: Two-tower neural networks have a user tower and an item tower where neural nets process user and item features to generate embeddings, then a dot product gives the interaction probability, combining content and collaborative strengths. [35:42], [37:34]

Topics Covered

Long Tail Unlocks Niche Revenue
Too Many Choices Paralyze Buyers
Bayesian Average Fixes Sparse Ratings
Two-Tower Nets Fuse All Signals

Full Transcript

great um so yeah I hope you all are able to see my screen but um yeah again welcome to our deep dive into modern recommendation systems um the title of

this tutorial is unlocking personalization and in this tutorial we'll explore how AI systems help match users with relevant content

and we'll go and do a deep dive into some of the modern recommendation algorithms um we'll cover both theoretical foundation and practical implementation of some of the

recommendation algorithms and I hope that by the end of the tutorial um you are able to have a better idea about like you know how recommendation systems

work and if you want to build those you can do that um for your company or as part of your side project um so I think I'm sharing the wrong screen let me

share the right screen okay there we go okay so uh my name is Sarang and I am a lead data scientist at um a collaborative work management tool

called Asana I am based out of um Vancouver and today I'll be your tutor so this is the uh table of contents

so the presentation and the tutorial is split into two parts um there is the motivation section uh where we'll talk about a couple of hypotheses and a couple of like studies that relate to

recommendation systems and in particular why are recommendation systems important and why are we even here um and then the second part is the tutorial

where I'll cover um a bunch of like you know different methodologies we'll start with the input data so looking at like you know what are the different types of input data that go into recommendation

systems um we'll then talk about heuristics um how you can use business logic to power like really simple recommendations we'll then talk about three methodologies uh content based

collaborative filtering and hybrid recommendation and the tutorial is structured in such a way that we'll start with like you know basic recommendations and gradually we'll move

on to developing something more and more complicated and hybrid recommendations use deep learning and neural nets so I hope you're still with me at that point in the

tutorial great so let's talk about the motivation for this tutorial um so recommendation systems are ubiquitous they are everywhere um in

our daily lives so some of the key examples are listed here um so if you go on Amazon to shop for things or any other e-commerce website you would see a

page uh something similar to this here that shows your recommended items or if you purchase something it'll show you items similar to what you've already purchased um if you watch TV shows and

movies um this is a screenshot from Netflix and it recommends your top picks um based on your past history it also recommends things like because you watched

something uh what are like some of the things that you might like uh LinkedIn I use it for jobs and career search so essentially um if you're looking for

jobs it'll suggest you your top job picks based on your profile based on what you've interacted with in the past um Spotify um it has a Made For You

section so it curates a daily playlist based on your listening history and it profiles your taste in music and so

on um Airbnb shows you some recommended vacation rentals based on your past stays and preferences and finally if you are on online dating even Tinder has a

recommendation system which matches you to relevant profiles based on things that you mentioned in your profile or your past swipe history um so again each platform uses

a different approach to personalize content we'll go into some of the approaches in today's tutorial and presentation and in general um recommendation systems are again a very

important part of our daily life they shape how we discover new content and make choices so I think it's very important to understand how they work especially if you are a data scientist

and machine learning engineer great uh so what is a recommendation system um the definition is simple um it's a type of information

filtering system that is designed to predict or suggest items or content that a user is likely to be interested in or generally prefers or is likely

to interact with on a given platform um the whole recommendation system can be broken down into three important components so there's a

component for um the data the input data or the user preferences um again it can be explicit or implicit we'll talk a bit more about this in the tutorial

but explicit data is when a user gives direct input like ratings reviews and likes implicit is when we use like you know users' observed behavior like

clicks viewing time watch history and so on um okay so again this brings us back to the question why do we need

recommendation systems and um I think uh the audience of this tutorial because you all are here I guess you already realize the importance of recommendation systems in our daily lives in terms of

how we make purchases and make decisions but I want to um I want to bring forth this idea through two interesting hypotheses and two interesting studies

that were like you know conducted and how they relate to recommendation systems so uh study number one um is something called the long tail um so

essentially the way that we shop has changed over the past decade past two decades and so on uh there used to be traditional stores the traditional brick

and mortar stores where you used to go to purchase um DVDs or books and so on um and they could hold only limited inventory because the shelf space was

limited they could only hold so much inventory and also um they could only hold mainstream products so basically only the products that are

really really popular that are profitable they would hold that but products that cater to like a specific interest or specific group of audience um those tended to not be like

you know included on the shelf because they were not super profitable they catered to a small audience and um again there was limited shelf space uh

we've moved from that brick and mortar concept to the concept of online distribution so with online distribution online platforms there's technically

unlimited inventory there's no limit on how many movies or books a website can hold um and this gives the uh

website the ability to essentially like you know offer niche products to

the customers and so on um and recommendation systems essentially help users navigate this vast array of choices because um again the inventory has become

unlimited there are niche products that like you know a company can hold and so on uh great so

uh okay so this brings us to the long tail concept so if you look at this chart which plots like you know items

over popularity uh there is the head so essentially these are items which are like really high impact popular they are fewer in number but they're mainstream

so these are the things that traditional brick and mortar stores were likely to like you know hold um and then there's these long tail

items which are low impact niche many in number and obscure and given we have now transitioned from a brick and mortar like you know way of distributing things to

an online way of distributing things recommendation systems make these long tail items discoverable and they help users to find things that they wouldn't discover through traditional browsing

so if we just show like you know really popular items they would not discover these niche items which again for a business can generate a significant amount

of revenue and this creates opportunities for both consumers and content creators so for consumers they're able to discover things that they like content creators they're able

to cater to and make these niche products that might be targeted to a particular segment of customers great so that was the first

study the second study uh that's really interesting and relates to recommendation systems is the tasting booth

experiment and uh the setup of this experiment was basically so this was an experiment that was conducted by two Stanford researchers uh Iyengar and Lepper and they set out to investigate

the effect of choice on customer behavior uh so for the setup of this experiment this experiment took place in a gourmet grocery store so an upscale grocery

store near the Stanford University campus in California and the setup of this experiment was that they set up two booths uh booth one and booth two uh

booth one contained um 24 varieties of jams and booth two contained like six varieties of jams basically um and they wanted to test the impact of the

choice um quantity on customer behavior so how does like the number of choices that a customer has available to them impact like customer

behavior um so they tried to control all the other factors so like you know they placed the booths strategically so that they had like similar foot traffic and other things like that the only thing that was different between the booths was

the number of samples that they had and the findings for the study were really interesting so about 60% of

shoppers stopped at booth one whereas if you look at booth two which held like a fewer number of samples only 40% of

uh shoppers stopped at booth two so uh there was a lot more foot traffic at booth one but if you drill a bit deeper

and look at like the subsequent purchases uh in tasting booth one um there were only 3% of consumers out of

those 60% of consumers that made a purchase but in booth two even though it had much smaller foot traffic 30% of those

consumers made a purchase um so the key Insight from this was too many choices can paralyze decision making among customers um and

this is um really um well depicted by this chart on the left which shows happiness of the customers plotted against choices so the happiness of or

customer satisfaction increases until a certain point so up until like you know a certain number of choices are given to the customer they are satisfied and happy but if you overwhelm them with

like too many choices the customer satisfaction and happiness uh decreases and it is stressful for the customer and this is where recommendation systems come

into play given that with online distribution there are a lot more choices recommendation systems can improve customer satisfaction they can increase conversion rates by um simplifying

decision making and by recommending only a limited curated set of products to the user it helps eliminate this paradox of choice so hopefully helping uh like you

know move away from these many choices where the customer satisfaction is really low um so great so we've talked about

why um recommendation systems are important from the perspective of the long tail theory and the tasting booth experiment so before we

jump into some of the theory for our tutorial I want to mention a brief history of recommendation systems um so the history of

recommendation systems traces back to the late 1970s and it has significantly evolved over the decades into a key technology that is now used by a variety

of industries um across different sectors um so the first known recommendation systems the early beginnings were actually developed in the

late 1970s by a scientist called Elaine Rich um at UT Austin the University of Texas at

Austin and she developed a system that was called Grundy and what Grundy was was a computer-based librarian which was designed to recommend books to

um to like the students based on their user preferences so Grundy was a really simple recommendation system what it did was it worked by asking users questions

and classifying them into certain stereotypes based on their responses and the system then recommended books uh that matched the preferences of the users or based on

like you know what they had filled in a survey it classified them into like certain categories and gave the same recommendations to like you know all the users within the

category um in the early 1990s was when the collaborative filtering approach became really popular so this was uh an approach that was developed by

Xerox and it's again uh in today's world it's a widely used approach uh Xerox developed this approach um for one of the products that they were launching

and it was called Tapestry which was a document management system and Tapestry introduced collaborative filtering and allowed users to manually rate documents share their opinions with

others on things that they received in their inbox and this method or Tapestry basically laid the foundations for future automated recommendation

algorithms it uh brought collaborative filtering into the picture which is still a very widely used um recommendation system

algorithm um in late 1990s um Amazon started using collaborative filtering for product recommendations and what their system

did was it analyzed user behavior and preferences to suggest items that other similar users uh purchased and it was hugely successful for Amazon and their

success um in increasing sales through personalized recommendations spurred widespread adoption of recommendation

systems particularly in e-commerce oops um in the 2000s uh Netflix started using um recommendation systems again

Netflix is one of the pioneers of recommendation systems you would often hear uh a lot of cool technology that Netflix ships uh in terms of like you

know the way that they recommend movies to the users but uh they launched a famous contest called the Netflix prize in 2006

um and the prize was that like you know whichever team or whichever research group is able to improve the recommendation algorithm by 10% they would receive a

prize of $1 million uh which was great back at that time and the winning solution uh pioneered hybrid recommendation so it used an ensemble of 107 different

recommendation algorithms and um it demonstrated the blending of like you know multiple approaches and how multiple approaches when applied together can

enhance recommendation accuracy um and now we are here 2010 and beyond um recommendation systems have

become a ubiquitous part of our daily lives we see them on YouTube Spotify Facebook a lot of different online platforms that we interact with there have been a lot of advances in

recommendation systems particularly with the advent of deep learning and deep learning becoming mainstream and they have become an integral part of our lives so this is a brief history

um as to how recommendation systems evolved across the years and how they've become an integrated part of how we consume content online and even

offline so let's jump into the tutorial now um enough of theory um and in this tutorial we'll cover four important aspects of

recommendation systems so we'll start with a heuristic based approach which uses predefined rules and business logic to make straightforward recommendations then we'll jump into

content based recommendation so it basically recommends items similar to what users have previously liked by analyzing features of like you know other items

and then recommending what items are similar to the items that the user has interacted with uh we'll then go into collaborative filtering which is again a very very widely um used approach and a

very popular approach which suggests items based on preferences of users with similar taste and similar taste patterns and behavior and lastly we'll go into

hybrid recommendation systems which combine multiple recommendation approaches together and leverage the strengths of each of the methods and this is a deep learning approach um this is

complicated so um again we'll go into this and hopefully you're still with me at that point of time as we talk through these different approaches I have structured the tutorial in such a

way that we start with something that is uh simple so heuristic based and gradually we'll look into more and more complicated approaches so to get you warmed up as we look at like you know

more complicated and more sophisticated approaches of like recommending items to users so uh let's talk about the input

data uh I briefly mentioned this in one of the previous slides but there are two main types of input data there is uh the

explicit data and there is the implicit data so explicit data is data where some sort of rating is given by the user so

not rating but in general like you know it's basically a direct user feedback so in case of let's say Netflix is developing a recommendation system it has thumbs up thumbs down users can rate

movies 1 to 5 uh they can give reviews similarly on Amazon a user can rate the product from 1 to 5 they can give reviews and things like that so this is direct user

feedback it is really really high quality data because a user has given their feedback or their preference for the

item this data is hard to come by so it's difficult uh to gather this data this data might not always be available as users might not spend time to rate

items or you know it might not even be applicable to the recommendation system that you're developing the other type of data that we have is called the implicit data and

this data is basically the behavioral data that we gather from like you know users' interaction with a given product with a given website and things like that users do not give any specific

ratings but essentially we look at like you know specific actions that a user performs on a given platform so let's say we are building a recommendation system for Spotify um it looks at like

you know how many times a user has played the song um if you're building a recommendation system let's say for Netflix it might look at like you know

what movies you click on um which movies you navigate to if you're building a recommendation system for Amazon or other e-commerce websites we

might look at what items a user clicks or adds to their cart and so on implicit data is a lot more abundant than explicit data because technically you don't require the user to give a

direct feedback you are looking at their direct interaction with the website but it's also a lot noisier than the explicit data and it might not

directly like you know express a user's preference and there is a lot more uh data cleaning and wrangling that someone needs to do in order

to use that in the recommendation Systems Great so we've talked about our input data Let's uh talk about uh the different methodologies that we'll cover

in our tutorial so I'll start with some basic theory on each of them and I think this would help you to understand um like you know the code better as

we look into like the tutorial and as we go through the code so heuristic based recommendations are the simplest form of recommendations they use predefined

rules and business logic to make straightforward recommendations based on behavior and patterns and again there can be different heuristics that you

might use for recommendation systems and those depend on like your business logic and on your product uh some of the common heuristic based recommendations

include recency based or popularity based so for example popularity based recommendations recommend the most popular items to users
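As a rough illustration of a popularity-based heuristic, the sketch below ranks items by how many interactions they received. The `interactions` list and the `most_popular` helper are made up for this example and are not from the talk's tutorial code.

```python
from collections import Counter


def most_popular(interactions, k=3):
    """Return the k items with the most interactions (hypothetical helper)."""
    # Count how often each item appears, ignoring which user interacted with it.
    counts = Counter(item for _user, item in interactions)
    return [item for item, _count in counts.most_common(k)]


# Toy (user, item) interaction log, invented for illustration.
interactions = [
    ("u1", "shrek"), ("u2", "shrek"), ("u3", "shrek"),
    ("u1", "memento"), ("u2", "memento"),
    ("u3", "harry_potter"),
]

# Every user, including a brand-new one, gets the same list.
top_items = most_popular(interactions, k=2)
print(top_items)
```

Because the ranking ignores who the user is, it needs no per-user history, which is why this kind of rule is a common fallback for new users.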

IMDb for example has something called IMDb's Top 250 movies this is also a very naive or like you know simplistic recommendation system which basically is

a popularity based recommendation system so it ranks movies based on how popular they are and so on heuristic based recommendation systems are really simple to set up um

so that's a big pro they only require metrics like sales or views and it makes it like really really quick to deploy them they also do not suffer from cold

start problems so let's say if a user is joining your platform you do not need to gather any data about the user you can directly recommend the most popular movies or most popular products based on

other users' interactions so it does not suffer from a cold start problem uh but the cons are pretty obvious it lacks personalization um so essentially it

does not reflect individual users' preferences you're giving generic recommendations to everyone so that's a pretty common con this is again like a very simplistic way to recommend things but it is still used across a bunch of

different products specifically for solving cold start problem when you don't have data for your user um so I want to talk a bit more about the popularity based

recommendations um in terms of how the number of reviews are accounted for so let's say you have three items let's say you're developing an online e-commerce

recommendation system you have three items users have rated these three items uh there's item A which has a rating of five stars but it has only received 10

reviews there's item B which has a rating of 4.8 but it has only received 100 reviews and then there's item C which has the lowest rating among these three but it

has received 1,000 reviews so a lot more compared to item A and B um so the question here is can we just simply rank these items based on the

rating so should we rank item A above item B and C because it has a rating of 5.0 much higher than item B and C but it only has 10

reviews um and the answer is no um generally we want to see the item have at least a certain number of ratings

before we can have them on the top and there's a very popular approach that is used to do this and this approach is called Bayesian average the IMDb Top 250 list that

we actually saw in the previous slide actually does this so what it does is it is a Bayesian approach so it develops a prior which is C and it's essentially

the average rating that all your items receive from all your users um across all your data set so it's essentially used to set

a prior and then you have the minimum number of ratings that are required so again you use this to um set your prior and then finally you have your mean rating for the item that you're interested in

calculating the Bayesian average for and the number of ratings for that item and what this formula for Bayesian average does is it takes a weighted average of your

prior and your item and the rating for your item eventually gets pulled towards the average rating in your data set if there are very few ratings for it so

if let's say for an item there are very few ratings what it would do is its rating would eventually get pulled towards what the mean of all your ratings in the data set

is so if we apply Bayesian average we'll see that item A is pushed to the bottom of the list in terms of

recommendations for popularity and item B and C are boosted up there and this makes a lot more sense because again we do not want to recommend an item that's just

rated like you know one time or like 10 times because a user that generated that content might themselves rate the item so a very small number of ratings is not

that reliable so as you're developing a popularity based recommendation system you would want to apply this technique of Bayesian average um again used very widely

across the industry um for example the IMDb Top 250 great so the next type of recommendation system that we'll talk about is content based filtering
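Before moving on, here is a minimal sketch of the Bayesian average from the popularity discussion, applied to the three-item example. The formula is the usual weighted average of the item's own mean and the global mean. The talk does not give item C's rating, the global mean, or the minimum ratings count, so the values 4.5, 4.0 and 50 below are assumptions chosen only for illustration.

```python
def bayesian_average(item_mean, item_count, prior_mean, min_ratings):
    """Weighted average of an item's mean rating and the global prior.

    score = (v * R + m * C) / (v + m)
    where R = item mean, v = item rating count,
          C = prior (global mean rating), m = minimum ratings required.
    """
    return (item_count * item_mean + min_ratings * prior_mean) / (item_count + min_ratings)


# (mean rating, number of ratings); item C's 4.5 is an assumed value.
items = {"A": (5.0, 10), "B": (4.8, 100), "C": (4.5, 1000)}

PRIOR_MEAN = 4.0   # assumed global mean rating (C in the formula)
MIN_RATINGS = 50   # assumed minimum number of ratings (m in the formula)

scores = {
    name: bayesian_average(mean, count, PRIOR_MEAN, MIN_RATINGS)
    for name, (mean, count) in items.items()
}

# Sort items from highest to lowest Bayesian average.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With these assumed numbers item A drops below B and C, because its ten ratings carry little weight against the prior.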

again it's a step above the popularity based recommendation system it takes into account user preferences and what it does is it uses

item features to recommend items that uh are similar to the item that a user has liked um and it looks at like you know particularly explicit feedback and it

tries to look at the features and characteristics of the items that a user has already liked so let's take an example for um a movie recommendation

system let's say we had a bunch of different movies uh Shrek Harry Potter The Dark Knight Rises Memento and The Triplets of Belleville and we

can place them on two axes so one of the axes is um the genre so whether it's a children's movie or it's an adult movie and then we have another axis

whether it was a blockbuster or an arthouse movie um and we can place the movies in this like two dimensional space we know

about what these movies are we know their plot we know their ratings and so on so we can place these movies um what we do then is let's say

if a user has watched Shrek and given that movie a rating of five stars um we would recommend Harry Potter to that user because Harry Potter is similar to Shrek it's a

children's movie it was a blockbuster and so on um so this is a simple content based filtering system it places items in an n-dimensional space based on

certain features and if a user has a high interaction with one item it recommends items that are similar or nearest to that particular

item uh a major limitation of this approach is that feature representations need to be hand engineered so you need to know what features you want to encode

for a given item so you need to know these different features for example in this movie recommendation system you need to know whether the movie is a Blockbuster movie or it's a children or

an adult movie um and the system can only make recommendations based on existing interests of the users so it does not

include that component of serendipity so um it's not able to expand a user's taste because what it does is it just looks at the movie that a user has watched and it

just recommends similar uh movies that are closer to it based on like you know certain characteristics and so on and so forth so again very simple

recommendation system just looks at um user preferences and recommends items that are similar to the user preference um and you would have seen this in a lot

of systems this is again like widely used for example if you see um on Amazon it has a section that says people who've bought this have

also bought this so it's basically based on like you know content based filtering um where it like you know tends to analyze the features and

characteristics of a product that a user interacts with great um the next approach again the most popular approach um I guess I

would say in today's recommendation systems is the collaborative filtering approach and it basically uses similarity between users and items

simultaneously to provide recommendations the key component of collaborative filtering is a matrix

which looks like this um it's also called the user and item Matrix and what it does is it essentially uh the input

data to this approach is this matrix where we have users as our x-axis and the items or movies if you're developing a movie recommendation system

as the y-axis and the values in this Matrix are the interaction or the preference of a user with a given item so over here if you're developing a

movie recommendation system based on explicit feedback the values in this matrix are the stars that a given user gives to a given movie so let's say user

one rated these three movies um and this matrix is very sparse because generally you tend to have a lot of movies or a lot of items in your catalog

and a lot of users but users in general tend to interact with a very limited set of items so it's a very very sparse matrix when I say sparse it's basically a

lot of values in this Matrix are empty um so what collaborative filtering does is basically it looks at interactions of a given user and then it

tries to recommend item to a given user based on interest of another user that the model or the approach thinks is

similar to a given user so let's say let's look at user one um and user n so we see that user one and user n have given a very high rating to this movie

called Shrek um they've given a very low rating to this movie called Memento so what the model does is it figures out that user n and user one are similar to

each other and because user one has watched or given a very high rating to Harry Potter it would recommend Harry Potter to user n

because it thinks that user n and user one are similar to each other based on how they've interacted with like you know different products so one of the big advantages of

collaborative filtering is that like embeddings can be learned automatically so as compared to content based approaches you do not need to rely on hand engineered features so you do not

need to know particular features about your users and about the movies so you do not need to know whether Shrek and Harry Potter are children's movies or adult movies whether they're

blockbuster or arthouse movies um the collaborative filtering approach basically automatically figures those latent features or those features out based on

users interaction with the movie and basically it can capture uh these complex patterns and relationships by learning these features automatically based on how users interact with

different movies so how collaborative filtering works again um collaborative filtering uses something called Matrix

factorization which decomposes this user item Matrix um again like you know this is the user item Matrix that we looked at in the previous slide into two

matrices which are user cross latent feature and movie cross latent feature and there are multiple approaches to factorize a given Matrix there is the

SVD singular value decomposition there's non-negative matrix factorization and so on uh we'll cover one of the approaches in our tutorial but there are

multiple approaches and it decomposes that matrix into two matrices that capture these hidden patterns or these latent features so basically again you are able

to figure out certain features or certain characteristics of your users and your items but you do not need to like you know hand engineer them or you

do not need to know these particular latent features because what this approach does is it automatically figures these latent features out based on how users interact with uh with items in

your catalog and this Matrix factorization is the key component of collaborative filtering uh once you have user cross latent features or you have movies cross

latent features you can do a bunch of things you can calculate what movies are similar to a given movie um you can figure out what users are similar to a

given user and so on so it's able to capture these hidden patterns without actually um are specifying or hand coding these specific features um for a

given movie or a given user so really really cool approach uh really simple um and again we'll talk about this in a tutorial a bit

more. Cool, so the last approach that we'll talk about is hybrid recommenders, and in particular we'll talk about one particular type of hybrid

recommender, which is called two-tower neural networks. It's one of the most popular forms of hybrid recommenders, and what it does is it combines the goodness of all the

different approaches that we've talked about. So one of the drawbacks of collaborative filtering is that we specify user-item engagement, but we are

not able to specify item characteristics and user characteristics. So let's say we have really good features or really good item characteristics that we want to input in

our model. We can do that in content-based approaches, but we cannot do that in collaborative filtering. But at the same time we want

to include users' interaction with the items, and other similar users and things like that. We can do that in collaborative filtering, but we cannot do that in content-based

filtering. Hybrid recommenders, two-tower neural nets, help us get the best of both worlds. So essentially in these hybrid recommenders there are two towers, as the

name suggests: there is the query tower, also called the user tower, and then there's the candidate tower, also called the item

tower. Within these towers you can feed in your user features and item features. They are neural nets; basically you can design the neural net

architecture however you want to design it, and as we train the model it eventually generates user and item embeddings. We then

can take a dot product of these embeddings and generate a similarity score, which essentially tells how likely a user is to interact with a given item, or basically a user

preference. Our source of truth (again, this is a supervised learning approach) is a given user's interaction with a given product,

but at the same time we can also incorporate the user characteristics and the characteristics of our items through these neural networks or through

these separate towers. So again, a very powerful approach. This is also used very widely in

recommendation systems that have a lot of user data or item data, or item metadata and user metadata, that

could be really useful in refining recommendations. Great, okay, so that's enough of theory; let's jump into a tutorial.
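To make the two-tower idea concrete, here is a minimal forward-pass sketch in plain Python. It is not the notebook's model: the single-layer towers, the `tanh` activation, the feature sizes, and the random (untrained) weights are all assumptions chosen for illustration. In a real system the towers would be trained on observed user-item interactions.

```python
import math
import random

def tower(features, weights, bias):
    """One dense layer with tanh: maps raw features to an embedding."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, bias)]

def make_tower(n_in, n_out, rng):
    """Random weights standing in for a trained tower (assumption)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def interaction_probability(user_features, item_features, user_tower, item_tower):
    """Dot product of the two embeddings, squashed to (0, 1) with a sigmoid."""
    user_emb = tower(user_features, *user_tower)
    item_emb = tower(item_features, *item_tower)
    score = sum(u * v for u, v in zip(user_emb, item_emb))
    return 1.0 / (1.0 + math.exp(-score))

rng = random.Random(0)
# The towers may take different numbers of input features,
# but both must output embeddings of the same size (4 here).
user_tower = make_tower(n_in=3, n_out=4, rng=rng)
item_tower = make_tower(n_in=5, n_out=4, rng=rng)

user_features = [1.0, 0.0, 0.25]            # hypothetical encoded user attributes
item_features = [0.0, 1.0, 1.0, 0.0, 0.0]   # hypothetical multi-hot genre vector

p = interaction_probability(user_features, item_features, user_tower, item_tower)
print(f"predicted interaction probability: {p:.3f}")
```

The key design point is that user and item features never mix until the final dot product, so item embeddings can be computed once and reused for every user.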

So in this tutorial we will particularly talk about the MovieLens data set. We'll try out these different approaches on the MovieLens

data set. Basically the MovieLens data set is the Titanic data set of recommendation systems. If you've played around with the Titanic data set, it is the standard data set

that's used to teach machine learning classification algorithms and so on, and similarly the MovieLens data set is the very standard industry data set for recommendation systems. The MovieLens

data set was developed for research purposes; it's non-commercial, developed at the University of Minnesota, and there are different versions of the MovieLens

data set based on the number of ratings, or number of rows, that are available. So we'll particularly look at

the MovieLens data set that has 1 million ratings. Again, the 10 million and larger rating data sets would require more compute, and the 100K one I

felt was too small of a data set for us to experiment with these different methodologies, particularly neural nets, so we'll be looking at this data set;

you can access it through this link that's mentioned here at the bottom. Sorry, yeah, you can access this

data set through this link mentioned here at the bottom, and if I click on this link it takes me here, to

this page, which is the page for MovieLens. There are different data sets available, as I mentioned with the types of

data set on the previous slide. In particular we'll be looking at this data set, so if you want to download that, you can download that. This data set has about 1 million ratings

from about 6,000 users and 4,000 movies. It was released in 2003, so it's fairly outdated; it has movies that are quite old, but it's really

good for learning and educational purposes. So if you could download this, just download this zip

file for this tutorial. Also, all the material is available at this GitHub link, so if you click on this link

it will direct you to GitHub. It's a public repository which looks like this, and you essentially can follow through all the notebooks that we'll

go through right now. In here there's also the data set, so if you don't want to download it directly from the MovieLens website you can essentially download it from this

GitHub page. I recommend that you download these CSV files; the original files contain certain data that's not pre-processed

yet, so I processed that data and you can find it in the CSV files. Great, so that brings us to the start of

our tutorial, and before I jump into it I want to see if there are any

questions so far. Great, okay, so: can you share the link? Yes, so let

me send the link on the chat. This is the link to the GitHub. Great, any other questions?

Oh cool, so there's a question from Belal that says: is there any recommender system which does

synchronicity? I'm not totally sure what you mean by synchronicity here. Belal, would you clarify

that? Okay, so if I'm understanding the question correctly, synchronicity probably means that

it's able to use both content-based features as well as user engagement. We talked about the two-tower neural networks

that do that. Okay, so Belal says that synchronicity means the

simultaneous occurrence of events which appear significantly related but have no discernible causal connection. Yeah, that's a good point, Belal. So hopefully

two-tower neural networks are able to handle it. Let's say in collaborative filtering a user ended up rating two

movies that appear to be related to each other, but maybe it's an anomaly. Because two-tower neural networks are really powerful and encode a

bunch of other features, hopefully they help deal with that problem as well. So hopefully that answers your

question. Not sure, but feel free to ask a follow-up question if it does not. Okay, how about serendipity? Yeah,

great point. So for folks who don't know what serendipity is, serendipity is basically about the fact that

recommendation systems tend to focus particularly on things that users have previously interacted with. So let's say I watch a lot of movies that

relate to thriller, action and so on; I would be recommended a lot of movies that are very similar, very thriller- and action-

oriented and so on. It is oftentimes useful to give users the ability to explore new content, to

broaden their taste, because sometimes users might like things that they've not previously interacted with, or things that are not similar to the taste that they have indicated through their

previous interactions. So yeah, we include serendipity in recommendation systems. One of the ways to include serendipity is basically

to do something hacky or heuristic-based, which is: you can try to infuse some things that are really popular into the

recommendations of a given user, alongside those based on their previous interactions. So let's say, again, I interact with thriller or action-based movies and we want to introduce that

component of serendipity. What we could do is give, let's say, the top three movies that are not thriller or action based. So let's say

I'm recommended a movie that is a comedy but is really highly rated, and then we look at the user's interaction with that new component. If the user gives that

comedy movie a very high rating, this would be incorporated in our recommendation system because they've rated it really high, and

hopefully in the next iterations of the recommendations that they're presented, movies related to comedy are shown. And if the user did not

like it, that would eventually be eliminated. Okay, so great, there's another question from Belal: I did my master's thesis based on POI recommender systems

which use serendipity. My result was good based on only serendipity, not with novelty. Also I used only

those users for which data existed, but not the new users. It was a statistical algorithm, not AI. How

to do it for those users which are new? Hm, so I guess your question is around how to use

serendipity for users which are new. Okay, so serendipity for users which are new, I guess,

again, I think it would be really interesting to look at your master's thesis, but from my understanding and from my experience working in industry, I think for new

users what we generally try to do is give things that are popular. But again, you can tweak your popularity-based recommendations in a certain

fashion. You can try to segment your popular recommendations, so instead of going through all your popular recommendations, which might all be thriller-based in terms of a movie

recommendation system, you can give popular recommendations from different genres and see what users interact with.

There's obviously a bias component here, where you would see that movies that are up there at the top are interacted with more, but as your user

gets accustomed to your platform they tend to become your existing user, so you can eventually use the same approach that you used with your existing users to introduce

that component of serendipity. So if a user tends to interact a lot more with comedy movies but you want to introduce thriller to that user, you could try

to sneak that into your recommendations for the user. So that's one of the ways to do it. There might be some things that are more scientific, but in my industry experience that's

one of the things that we've used previously, and it has tended to work really well in terms of figuring out what are the things that a user might be interested

in. Great, okay, so I want to jump into the tutorial now.
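The two heuristics from this Q&A (popular picks spread across genres for new users, and sneaking a highly rated out-of-taste title into an existing user's list) can be sketched as below. This is an illustrative sketch only: the function names, the dictionary fields, and the toy catalogue with placeholder titles are all assumptions, not code from the tutorial repository.

```python
from collections import defaultdict

def top_per_genre(movies, per_genre=1):
    """Cold start: take the most-rated movies from each genre
    instead of one global popularity list."""
    by_genre = defaultdict(list)
    for movie in movies:
        by_genre[movie["genre"]].append(movie)
    picks = []
    for genre in sorted(by_genre):
        ranked = sorted(by_genre[genre], key=lambda m: m["num_ratings"], reverse=True)
        picks.extend(ranked[:per_genre])
    return picks

def inject_serendipity(personalized, movies, liked_genres, n_extra=1):
    """Append the best-rated movies from genres the user has not engaged with."""
    already = {m["title"] for m in personalized}
    outside = [m for m in movies
               if m["genre"] not in liked_genres and m["title"] not in already]
    outside.sort(key=lambda m: m["avg_rating"], reverse=True)
    return personalized + outside[:n_extra]

# Made-up catalogue with placeholder titles (not MovieLens data).
movies = [
    {"title": "Thriller A", "genre": "Thriller", "num_ratings": 900, "avg_rating": 4.1},
    {"title": "Thriller B", "genre": "Thriller", "num_ratings": 800, "avg_rating": 3.9},
    {"title": "Comedy A", "genre": "Comedy", "num_ratings": 300, "avg_rating": 4.6},
    {"title": "Comedy B", "genre": "Comedy", "num_ratings": 500, "avg_rating": 3.2},
    {"title": "Drama A", "genre": "Drama", "num_ratings": 400, "avg_rating": 4.0},
]

new_user_recs = [m["title"] for m in top_per_genre(movies)]
thriller_fan_recs = [m["title"] for m in
                     inject_serendipity(movies[:2], movies, liked_genres={"Thriller"})]
print(new_user_recs)
print(thriller_fan_recs)
```

Whether the injected title stays in later recommendations would then depend on how the user rates it, as described in the answer above.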

Yeah, great, so looks like the link was incorrect; thanks for catching that, John

Baptist. Great, so yeah, the link's in the chat; feel free to look at this link. Again, the way that this tutorial is structured is... okay, so I'll quickly walk

through what this repository looks like. Basically there's this data 1M folder, which essentially has the data from

this MovieLens website. I suggest that you download the CSV files, again because those are pre-processed; the .dat files are not pre-processed. So

yeah, I recommend downloading those. And there are a bunch of IPython notebooks here. All notebooks come in two

versions: there are notebooks with _T and there are notebooks without the T. In this tutorial we'll go through the notebooks that have T in the name.

These are basically a version of the notebook that I've created that has some fill-in-the-blanks, so I think that'll make things easier as we walk through it, but all the solutions

are available. Oops, I realized that I'm not sharing my screen, and I apologize for that. Okay, yeah, so this is what the repository looks like. There are

two versions here. The T version is the tutorial version, the one that we'll walk through here. It has some fill-in-the-blanks, which makes things easier; you can

practice as you're learning through it. And the solutions are basically available in the ones without T; they're available here,

so if you want to look at the solutions you can directly look here. Great, so let's jump into it, and I want to start

with the introduction and EDA notebook, and I will share my screen. Great, one second, hold on, let me

just set my environment up. Okay, so the way that this tutorial is... let me add another cell

here. Great, okay, so the way that this tutorial is structured is it's broken down into seven

steps. Actually no, I'm wrong, it's not this one. Yeah, let's use this one. Okay, so it's broken down into seven steps. I'm using

the _T file, so again I recommend you use the T file, but basically it's broken down into seven steps. We'll

start with loading the data and doing some exploratory analysis. We'll look at heuristics, so we'll look at popularity-based

recommendations; we'll look at content-based filtering and collaborative filtering, the approaches that we talked about earlier; and then we'll go into deep learning. In the deep

learning section itself we'll cover some large language models, so how you can incorporate large language models in your recommendation systems

and so on. That will again be part of our deep learning tutorial itself. So this is the flow, and I have structured all the

notebooks through this, so all the notebooks are essentially labeled by the different steps that we'll be going

through. Great, so we'll walk through the MovieLens data set. Okay, so let's go into it. First chapter: loading

the data. Again, you would find all the data in the data folder, the data 1M folder, and here I'm just

loading the data. In the CSV files the data is separated by tabs, so it's technically not a CSV, it's a TSV, tab-

separated values, so you can just add the tab separator here, and let's run this and let it load the

data. Cool, our data is loaded. Let's quickly print the first rows of our movies data set, so I'm going to do movies.head().

There are three data sets. There is the movies data set, which contains movie IDs, titles and genres, so what the

movie is and what the type of the movie is: animation, children's, comedy and so on. There's a users data set; we'll look

into it, but let's quickly investigate the movies data set a bit more, so I'm going to print the

info for the movies data set. Great, so looks like we have a total of 3,883 movies. Looks like there are no

nulls, but basically there's a movie ID, there's a title, and then there are the genres in there. I want to investigate this data

set, so what I'll do is build a word cloud. What a word cloud does, essentially, is

look at all the words in the titles of the movies, and that will give us a picture of,

generally, in the 1980s and 1990s, which is when this data set is from, what the most common words in movie titles were. So there's this code for

generating a word cloud; basically there's a library that I'm using called the wordcloud library. It has stop words in it, so basically in

order to remove words like "the", "is", "and" and things like that, we generally remove those words from the titles. We

pre-process the titles a bit here, and then we can use the matplotlib library to quickly plot the title word cloud that we

generate here. So I'm going to run this and then... whoops, I also need to generate the title corpus

here. It takes a few minutes to run, and there we go, this is our word cloud. So basically what it does is it

sizes each word based on how often it is mentioned in the titles of the movies, so you would see that,

back in the late '90s, a lot of movie titles mentioned man, love, night, day, dead and so on. So this is just pretty cool as you're doing exploratory

analysis for your data set. We can also plot the genres, so in order to look at what the genres or categories of the

movies were, we can quickly look at what the counts were per genre. So what I've done is I'm using pandas to split

the genres. If you look at our movies' genres, they are actually separated by this pipe

character, so I've split the genres by this pipe character and I'm going to do a

quick count of how many movies there are per genre, and then I use my standard plotting libraries to plot this out.

Basically I plot the values on the x-axis and then I plot the index. Once you do the genre count, it generates a

pandas Series that has the genre as the index and, as the values, the number of times it was mentioned. So cool, this is the

distribution of our movie genres. Looks like a lot of movies back in those days were drama, so we have about 1,600 movies that had a genre tag of

drama, a lot of movies in comedy, action and thriller, and very few movies that were fantasy, western, animation and so on. Pretty

interesting. The second data set that we have is called users, so let's look at the users data set. Pretty standard:

if I do a head(), it has user IDs and a bunch of information about our users, so the gender, age, a bunch of

demographic characteristics of our users. Let's do a users.info() here, and it

shows we have 6,040 users, and it also shows that there are basically no nulls, which is

pretty good. Great, let's quickly plot... maybe in the interest of time I'm going to skip the plotting here, but you

can take a look at the solution, which essentially shows the demographics plotted for our users. So basically we have a

lot more males in our data set; it shows what the age distribution is, what the top 10 occupations are, and again the age distribution for different occupations. So this is really

interesting; it's not directly applicable to the things that we will be talking about next, so I'm going to skip that in the interest of time, but you can take a look at the

solution notebook and it will show this. Great, okay, so the third data set that I want to quickly talk about

before we go into the methodologies is the ratings data set, so I'm going to print ratings.head(),

and I see... we're going to ignore these two columns for now, but basically I see, for a given user and a given movie, what the rating is and the timestamp

for when they gave the rating. There are some noisy columns in the data set, so let's just drop them quickly,

just using my standard pandas drop function to drop these.

Oh yeah, sorry, they've already been dropped, so let's just do a ratings.head(). Cool, so this is what the ratings data

set looks like: user ID, movie ID, what rating a given user has given a movie, and what the timestamp is for

that. Let's do a quick ratings.info(), and yep, looks like there are no nulls.

Great, let's also plot the distribution of ratings. What I'm going to do is just use my standard matplotlib library and plot the

distribution of ratings. As you can see here, looks like the ratings are skewed towards three, four

and five, so users tend to give a lot more high ratings than low ratings. It looks like there's quite a bit of bias in our data set, but that's fine, let's play around with

this bias. That's very common: if you're looking at any recommendation system or any ratings, you would see a lot of users tend to give ratings at the extremes,

because generally it's when users are super happy with the product or super dissatisfied with the product that they tend to rate a given product; if it's

average they tend to ignore it and so on. So that's a very common problem across all the data sets that are in recommendation systems. Cool.
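As a quick illustration of checking for rating skew without any plotting library, the sketch below counts how often each star value occurs and what share of ratings are 4 or 5. The `ratings` list is a small made-up sample, not the MovieLens data, and the text histogram is just a stand-in for the notebook's matplotlib chart.

```python
from collections import Counter

# Made-up sample of star ratings (assumption, for illustration only).
ratings = [5, 4, 4, 3, 5, 4, 2, 1, 3, 4, 5, 4]

counts = Counter(ratings)
total = len(ratings)

# Text histogram: one row per star value.
for star in range(1, 6):
    share = counts[star] / total
    print(f"{star} stars: {counts[star]:2d} ({share:.0%}) " + "#" * counts[star])

# Share of "high" ratings, a simple one-number summary of the skew.
high_share = sum(counts[s] for s in (4, 5)) / total
print(f"share of 4-5 star ratings: {high_share:.0%}")
```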

So I'm also going to do a quick describe of the data set. Again we have about a million ratings; that's the 1M

data set that we're looking at. For the rating, the minimum is one, and looks like the 50th

percentile is four, which is very interesting, so again our data set is skewed towards three, four and five ratings. The 25th percentile is three, but again I

think this is all rounded up; I think the 50th percentile was about 3.5 or something when I investigated it

last. Cool, so I'm going to combine my data sets: basically I'm going to merge my movies and ratings data sets and put them in data set

combined. In the data 1M folder there's already a data set combined, but I am just going to combine them because it's good to look at everything together, and that's

the main data set that we'll be looking at. Cool, so this notebook has a bunch of EDA; again, sorry for rushing through it, but feel free to take a look at the

solutions. I've also written some notes and things like that that talk about the general EDA that can be

done. Great, okay, so let's jump into our second tutorial, where we'll talk about...

in this tutorial... one second, let me just open up my notes. Great, okay, so in this tutorial we'll quickly talk about popularity-based recommendations. Again, these are very simple,

nothing too complicated, so this is a pretty short notebook. Basically I'm going to import my standard libraries, so

import pandas, import numpy as np. Oops, not connected to my kernel. I'm going to read my data using the read_csv

function. This is the data set combined that I'm looking at; basically it combines users, movies and ratings into one data frame, so for a given movie we also have its description

as well as the rating that different users have given, as well as the demographics for the user. Again, in popularity-based recommendation systems, basically what we

do is look at which movies are the most popular ones. So what I'm going to do is group my data frame by the

title and then look at the rating. I'm going to look at the median, so what is the median rating, as well as how many users have given a rating. So I'm looking at both median and count, and then I'm going to

sort my values by the median. Looking at this, looks like To Live, a 1994 movie, has

the highest median; actually not the highest median, a bunch of movies have a median of 5. But again, as I mentioned in the slides, we should not directly use popularity-

based recommendations like this, because we need to take into account how many times a given movie has been rated. So I use the Bayesian average, essentially, so

this is what we talked about. The Bayesian average basically establishes a prior: it looks at what the global rating is

across all your movies. You can set up a minimum threshold here, so essentially what is the minimum number of ratings that you want to look at, and so

on. This is a common formula that is used for calculating a Bayesian average. I've written down

a function here. Basically we calculate C times m, where C is our global mean rating (I'm taking a

median here), and then I multiply that with the count. Then, to calculate

the threshold that I want to apply for my minimum number of ratings, I can basically take a quantile. So basically

I'm looking at a quantile of the number of ratings that users have provided on a given movie. So let's say if you have

five movies and the quantile is, say, 10 ratings, let's establish that as the minimum number of ratings that we require. So I'm

just going to run this cell and then again sort the values by

Bayesian average and then generate a head here. Great, so I see the ratings have completely changed from what I saw here. I see American Beauty, a very

popular classic movie, high up there; it has a very high count and a very high Bayesian average rating. Star Wars, Saving Private Ryan, The Matrix, The Silence of the

Lambs: very classic, very popular movies from the '90s, all being populated here. And if I

look at this movie called Schlafes Bruder (Brother of Sleep), that has a count of one... if I click on this, oh, I see it has a

Bayesian average of three, so this has been downgraded from five to three because, again, it has a very low rating count. Great, so

this is our heuristic-based recommendation, very simplistic, but if you're using popularity-based heuristic recommendations I encourage you to use a

Bayesian average to account for the count of ratings. Awesome, now let's jump into

our next approach, and that is content-based filtering. Before I jump into content-based filtering let me take a look at the chat quickly to see

if there are any questions. Looks like no, I think we are good to proceed. Let me go back and share my screen.
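The Bayesian-average idea from the popularity notebook can be sketched as follows. This uses one common form of the formula, (C·m + sum of the item's ratings) / (m + n), where C is the global mean rating, m is the minimum-ratings threshold acting as the prior weight, and n is the item's rating count. The exact function in the notebook may differ (the talk mentions a median and a quantile-based m); the fixed `min_ratings=5` and the toy ratings with placeholder titles are assumptions.

```python
def bayesian_average(item_ratings, global_mean, min_ratings):
    """Shrink an item's mean rating toward the global mean.

    Items with few ratings stay close to global_mean; items with many
    ratings stay close to their own average.
    """
    n = len(item_ratings)
    return (global_mean * min_ratings + sum(item_ratings)) / (min_ratings + n)

# Made-up ratings with placeholder titles (not MovieLens data).
ratings_by_movie = {
    "Obscure": [5],                              # one perfect rating
    "Classic": [5, 4, 5, 4, 5, 4, 5, 5, 4, 5],   # many high ratings
    "Middling": [3, 3, 2, 4, 3],
}

all_ratings = [r for rs in ratings_by_movie.values() for r in rs]
global_mean = sum(all_ratings) / len(all_ratings)
min_ratings = 5  # assumed threshold; the talk derives it from a quantile of rating counts

raw_mean = {t: sum(rs) / len(rs) for t, rs in ratings_by_movie.items()}
bayes = {t: bayesian_average(rs, global_mean, min_ratings)
         for t, rs in ratings_by_movie.items()}

for title in sorted(bayes, key=bayes.get, reverse=True):
    print(f"{title:9s} raw={raw_mean[title]:.2f} bayesian={bayes[title]:.2f}")
```

With the raw mean the single-rating title ranks first; after shrinkage the heavily rated title overtakes it, which is the effect described above.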

And again, this might feel a bit rushed, and I apologize for that. I suggest that you look at these notebooks and the

solutions. These are all provided at the GitHub link; all of them have really detailed descriptions and comments, so hopefully that helps you understand the code as I'm walking

through it. Great, okay, so let's look into our next one, the content-based recommendations. So again,

doing the standard stuff: loading my libraries, pandas, numpy, matplotlib, reading my data

sets, so reading the ratings data set, the users data set, the movies data set and the combined data

set. Cool, all read in. And again, content-based recommenders: we talked about the theory a bit, but basically if a user watches a given movie we find

similar movies based on genres, actors, director, storyline and so on, and then recommend those similar movies to the user, so you need to hand-code those

features. Great, okay, so there are multiple steps that one goes through for building a content-based recommender system; there is a flowchart here.

Basically we'll extract features of different items. In our case we don't have a lot of features because this is an educational

data set, but we will mainly look at the genre of a movie; that's one of the features that we have for our movies. So if you look at our movies data set,

the only feature that we have is genre, so we're going to find movies similar to a given movie based on the genres that are encoded, and a given

movie can have multiple genres, so that is important information that we will use. So let's start with

pre-processing our genres a bit. Again, genres are currently encoded in such a way that there is a pipe character between

all the genres that a movie is part of, so I'm going to split this up and put it into an array. I'm going to take my movie genres column,

split it by this pipe character, and then fill any nulls. There might not be a null;

in general this data set does not have nulls, but I'm having this just in case there's an anomaly or something. So I'm running this

pre-processing code, and when I run that I see that my genres are split into an array, and this array is

indicative of what genres a given movie title is encoded into. And for computing the features

of a given movie I'm going to use TF-IDF. Again, TF-IDF is a very popular, industry-wide approach. Basically,

in our case, it considers every movie as a separate document and then it looks at what

the different encodings are for every movie in terms of its genres. So let's say a lot of movies in our data set have a genre of children's; it would down-weight the

children's genre. And if a movie has a specific genre like animation, and very few movies have animation as their genre, it would weight that genre more heavily.

So this is a way to essentially encode features. It's mainly used in natural language processing, but this is

a methodology that we can use on our data set as well. scikit-learn has a TF-IDF vectorizer, so I'm going to

use the TfidfVectorizer here from scikit-learn, and I'm going to

remove stop words, the English stop words, here, and then I will just fit my data set

on a TfidfVectorizer. So what it has done is, for every movie in my data set, it has built 127 features. It has built 127 features because I've also

used n-grams, and n-grams basically look at words that might be co-occurring together. So it has 127 features, which are

basically the genres of a given movie, and once I have this matrix I can calculate cosine similarity. Again, there's a bunch of different similarity metrics; some of you

might be aware of cosine distance, Euclidean distance, Manhattan distance and so on. I'm going to use cosine similarity here. Again, scikit-

learn has cosine similarity built in, so I'm going to calculate the cosine similarity of my TF-IDF matrix with

itself. If you look at our TF-IDF matrix with .todense()... I think, oops, it does have that

attribute. Okay, so this is what the matrix looks like: it has all the movies, and all the movies are encoded based on the genres. It shows zeros because, again,

there are 127 values in here, so some might be populated in between. But yeah, so I calculate cosine similarity, and then I have a function

here that, given a movie title, finds the most similar movies to that movie. So, running this function, I'm going to

find similar movies to this movie called Toy Story. If I run this, it gives me the

similar movies to Toy Story, and as you can see, I think this makes sense to me: Aladdin, again a kids' movie, an animated movie; An American

Tail, I haven't seen it so I don't know what kind of movie it is; Rugrats; A Bug's Life; Toy Story 2. Very similar to the movie here. So using this methodology, using

TF-IDF and calculating similarity scores, if we input a movie we can calculate what the similar movies are. Similarly, if I look at The Matrix,

it gives me very similar movies to The Matrix. So Nemesis, you know, Nemesis is a movie that I've not seen. I've seen Terminator.
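The content-based lookup described here can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the six-movie `movies` table, the `ngram_range=(1, 2)` setting, and the helper name `get_similar_movies` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy catalogue; the workshop uses a much larger movies table.
movies = pd.DataFrame({
    "title": ["Toy Story", "Toy Story 2", "Aladdin",
              "The Matrix", "Terminator 2", "Heat"],
    "genres": [
        "Animation Children Comedy",
        "Animation Children Comedy",
        "Animation Children Musical",
        "Action Sci-Fi Thriller",
        "Action Sci-Fi",
        "Action Crime Thriller",
    ],
})

# Each movie's genre string is treated as one document.
# English stop words are removed; unigrams plus bigrams are an assumed setting.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(movies["genres"])

# Cosine similarity of the TF-IDF matrix with itself: one row and column per movie.
similarity = cosine_similarity(tfidf_matrix, tfidf_matrix)


def get_similar_movies(title, top_n=3):
    """Return the top_n movies most similar to `title`, excluding the movie itself."""
    idx = movies.index[movies["title"] == title][0]
    scores = pd.Series(similarity[idx], index=movies["title"]).drop(title)
    return scores.sort_values(ascending=False).head(top_n)


print(get_similar_movies("Toy Story"))
print(get_similar_movies("The Matrix"))
```

On this toy data, movies that share genre tokens rank first, which mirrors the Toy Story and Matrix results shown in the talk.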

But yeah, movies along similar themes. Great. So once we have generated these content-based recommendations, what

you can basically do is, if a user has watched a movie and rated it five stars, you can just recommend all the similar movies to that, based on

this content-based recommendation system. Great. So there is another way to calculate recommendations for a

user using a content-based approach. One of the ways, as I mentioned before, is that you can just take the latest movie that a user has given five stars to and surface all the

similar movies to that. The other way to do it is using something called user profile creation, and what user profile creation does is

that, with these TF-IDF vectors that we've created, for a given user you take all the movies that the user has rated

highly, let's say four or five, and average those vectors together. So let's say I'm user one and I've given movies five and six a

really high rating, a rating of four or five. I can take a simple mean of those vectors. So let's say I've given these two movies a very

high rating: I can just average them and then calculate the similarity of each movie to that average

vector. This is good because, instead of relying on just one movie, you are generating a user profile using the average of all the vectors. So the user profile is all the

movies that they've rated highly, just averaged, and this basically depicts a given user's taste, because every movie that the user has watched, we take

as a component of this user profile. The code here basically walks through it; I'm going to skip that, again in the interest of time.
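Since the walkthrough is skipped in the talk, here is a minimal sketch of the user-profile idea under stated assumptions. The toy `movies` and `ratings` tables, the `min_rating=4.0` threshold, the function name `recommend_for_user`, and the choice to exclude movies the user has already rated are all illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data standing in for the workshop's movies and ratings tables.
movies = pd.DataFrame({
    "movieId": [1, 2, 3, 4, 5, 6],
    "title": ["Toy Story", "Toy Story 2", "Aladdin",
              "The Matrix", "Terminator 2", "Heat"],
    "genres": [
        "Animation Children Comedy",
        "Animation Children Comedy",
        "Animation Children Musical",
        "Action Sci-Fi Thriller",
        "Action Sci-Fi",
        "Action Crime Thriller",
    ],
})
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2],
    "movieId": [1, 3, 6, 4, 1],
    "rating": [5.0, 4.0, 2.0, 5.0, 1.0],
})

# One TF-IDF row per movie, in the same order as the movies table.
tfidf_matrix = TfidfVectorizer().fit_transform(movies["genres"])


def recommend_for_user(user_id, min_rating=4.0, top_n=3):
    """Rank unseen movies by cosine similarity to the user's averaged profile."""
    user_ratings = ratings[ratings["userId"] == user_id]
    liked_ids = user_ratings.loc[user_ratings["rating"] >= min_rating, "movieId"]
    liked_rows = np.flatnonzero(movies["movieId"].isin(liked_ids).to_numpy())
    if liked_rows.size == 0:
        return pd.Series(dtype=float)  # no highly rated movies, so no profile

    # User profile: simple mean of the TF-IDF vectors of the highly rated movies.
    profile = np.asarray(tfidf_matrix[liked_rows].mean(axis=0)).reshape(1, -1)

    # Similarity of every movie in the dataset to that profile vector.
    scores = cosine_similarity(profile, tfidf_matrix).ravel()
    result = pd.Series(scores, index=movies["title"])

    # Assumption: drop movies the user has already rated.
    seen = movies["movieId"].isin(user_ratings["movieId"]).to_numpy()
    return result[~seen].sort_values(ascending=False).head(top_n)


print(recommend_for_user(1))
```

Averaging means a single unusual favourite has less influence than it would if recommendations were seeded from one movie alone.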

Basically, it creates a user vector: it averages a user's profile based on the movies that they've watched, and then it calculates those similarities. So

for every movie in our dataset, it calculates the movie's similarity with the profile that we've created for a given user, and then there are simple standard functions for

calculating recommendations for the movies. So again, the solutions are here in

the notebook version, and once you create a profile you can just enter any user ID and it will output the list of movies that we should recommend to that given

user. Great, okay, so we have five minutes. I'm going to talk about

collaborative filtering quickly, because I think this is the most popular approach. To be honest, in industry I've used this the most,

even over neural networks. This is really simple, really intuitive, and that's why I want to spend the next five minutes talking about the

collaborative filtering approach. Great, so, opening this notebook, 04, collaborative filtering, let's load a bunch

of libraries, and I'm going to just copy-paste things from my solution in the interest of time. Let's load the

datasets, and I'm going to do a quick head here. I have my combined dataset, okay, cool. So if you remember from the theory

that we talked about, the key component of collaborative filtering is the user-movie matrix. The values in this matrix are the users' ratings, and basically

we need to generate this matrix from this dataset.
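One common way to build such a user-movie matrix is a pandas pivot. The sketch below assumes a combined table with `userId`, `title`, and `rating` columns; those column names and the toy rows are hypothetical, and the notebook may construct the matrix differently.

```python
import pandas as pd

# Hypothetical stand-in for the combined ratings + movies dataset.
combined = pd.DataFrame({
    "userId": [1, 1, 2, 2, 3],
    "title": ["Toy Story", "The Matrix", "Toy Story", "Aladdin", "The Matrix"],
    "rating": [5.0, 3.0, 4.0, 5.0, 2.0],
})

# Rows are users, columns are movies, values are ratings.
# Movies a user has not rated show up as NaN, which is what makes the matrix sparse.
user_movie = combined.pivot_table(index="userId", columns="title", values="rating")

# Fraction of user-movie pairs with no rating.
sparsity = user_movie.isna().to_numpy().mean()

print(user_movie)
print(f"sparsity: {sparsity:.2f}")
```

The NaN cells are the entries that collaborative filtering tries to predict.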
