How I Mastered Data Modeling Interviews
By Afaque Ahmad
Summary
Topics Covered
- Data Modeling Structures Business Info into Tables
- Master Kimball's Fact and Dimension Tables
- Star Schemas Simplify Queries Over Snowflake
- Handle Changing Dimensions with SCD Strategies
- Practice Uber, Netflix Data Models Extensively
Full Transcript
hey everyone welcome to the interview preparation Series where I'll be sharing with you how you can crack interviews at top tech companies I'll be sharing with
you my own interview experiences and details about preparing and acing each of these round so data modeling has been a key in helping me land offers at
companies like uber Apple atlan and data break so I'll be sh sharing with you how I prepared and data modeling for data engineering interviews so let's get
started so we are going to discuss about the following in details number one exactly is data modeling the kind of questions asked in interviews what are
the key Concepts that you need to master what should be your approach towards problem solving what what are the key things that the interviewer is
trying to test you on what are some of the most commonly asked interview questions and last of all my final advice on interview preparation what exactly is data modeling so data
modeling in very simple terms is a way of taking business requirements and information and structuring organizing that information in the form of table so
that they can be queried to find out answers to several business questions so basically you take all the information design a set of tables with relationships and finally query those
set of tables to find answers to business question what are the kind of questions asked in data modeling interviews so there are two types of most frequently asked questions in data
modeling interviews the first one is a direct data model design which simply means that the interviewer will directly tell you to create a data model for something now this may be an app that we
use on a day-to-day basis something like a ride healing or a food delivery or a social media app or even something like a library management system so an example question could look something
like design a data model for U the second type of question is a product sense club together with a data modeling question so in a good number of companies I face questions which start
with a flavor of production which you need to answer in the form of metrics and then you would be asked to create and calculate those metrics using a data model and you have to build that data
model right so there's a question you answer that question in the form of metrics and then you finally build a data model that is going to help you calculate those metrics right so let's
take an example let's simplify this and understand this with an example let's say you're opening a secondhand bookstore app what are some of the metrics that is going to help you
understand if your bookstore is really doing well so if I were to think about this at the top of my head some of those metrics could be the amount of sales that I'm making on a day-to-day basis
the daily active users monthly active users the number of visitors and the duration that they are spending on my app on a day-to-day basis right so in order to calculate those metrics I would
have to first structure my data in the form of tables right so I would have to build a design a data model and let's say we have a few tables like fact orders which is going to store the
orders uh fact sessions or fact logins which is going to store details about the login and the time at which the user logged in and so on now using these two
table we'll be able to Simply write SQL queries and calculate the metrics that we discussed metrics like da ma U the amount of sales that we are making on a day-to-day basis right so this is how
we'll be able to answer this question now apart from this there will also be questions on top of your data model for which you would need to write SQL queries in order to answer them right so
I would say that the first type of questions are more common but I would still stress to be thoroughly prepared for both having a good grasp on product and going ahead what are the key
Concepts that you need to master so you really need to understand and be good at Kimball's dimensional modeling concept I would really recommend reading the first three chapters of the book the dataware
housing toolkit so here are some of the key Concepts that you need to master and then explain them taking a few quick examples the first
one fact table basically contains events or transaction it contains numbers that you can measure count or aggregate so
for example let's say you're placing order for grocery on a grocery app right now this would be recorded in a fact order table an order is basically a
transaction here and that's the reason why it's being stored in a fact table right Dimension tables are table that
provide context to facts table right and in this case context to fact orders will basically mean who what when and where who would be who played the order and it
would be a customer what was placed for the order it was a product that was placed for the order right when was the order placed so there is going to be a date and time when the order was placed
where was the order placed so the order was let's say placed from a store so all of these four details who what when and where is going to be stored in
respective Dimension table and they're going to be dim customer dim product dim date and dim store right dimm customer
for who placed the order dim product for what was placed for the order dim date for when was the order placed and dim store for where was the order placed
from so there is a central fact table and then there are four dimension tables which provide context to the fact table
yeah now there are other Concepts that you should also keep under your belt and they are basically periodic snapshot
fact table accumulating SN snapshot fact tables and factless fact table so the next one is a star and snowflake schema a star schema basically has a central
fact table surrounded by dim table so if you were to look at the previous example there was a central fact orders table and it was basically surrounded by four
dim tables dim customer dim store dim date and dim product right and this Central fact table is going to have foreign keys to all of the DM table so
that they can be joined and queried right now a snowflake schema is almost the same as a star schema just that the only difference is that the dim tables
are further normalized or if I were to say it in simple terms the dim tables are further broken down or branched out so let's say we had the dim product table right and it had two kind of
information the first one is the product details and the second one is product category information right product detail is going to be things like the
size the color the weight and then the second one product category is going to be about the title the title of uh the category or let's say the subcategory and then the other metadata that is
going to have now if we were using a star schema both of the details are together going to be clubbed into one table which is the dim product table but
if we are using a snowflak schema there are going to be two separate table the first one which is going to be dim product is going to store all of the details about the product and then the second one which is going to be the
category information table right dim product category which is going to store details about the title and other metadata that we just discussed the third one is relationship among staes
these can be one to one one to many or many to many right and this is pretty simple to understand so if we take the previous example fact orders and dim customer they are going to have a many
to one relationship right an order is going to belong to One customer but a customer can place many orders so it's going to a many to one relationship the
next one is slowly changing Dimensions so slowly changing dimensions are Concepts that allow us to handle changes in our attributes of our Dimension
tables over time right so let's say if my address keep changing year over year how is this data going to be stored in the dim customer table that we just
discussed right so there are a few strategies that we could follow in order to decide how is my changing address going to be stored in the dim customer
table so the first approach that we could simply follow is there is going to be a address column we could simply take
the new address and overwrite it on top of the old address right so now the old address is going to be completely wiped off and the address column is going to have the new address yeah that is the
first approach the second approach is we want to keep history so we want to have the old address that is the old row and the new address yeah so the old row is
going to remain as it is and we are going to insert a new row which is going to store the new address um what we can also do is have two date time columns which is basically going to say that my
old address was active from X to Y period and then the new address is active from maybe Zed to a null period saying that is still active right now
and the third approach that you could possibly follow is that create a column in order to store the new address so the old address column is going to remain as it is and we create a new column which
is going to be called new address and we store the new address in that column next up is what should be your approach towards problem solving so here I'm
going to share a step by step approach towards building the final design so the first one is identify the business process it's really important to clarify
if we want to talk about the complete business process or just a part of it so let's say in the ride healing example should we discuss only about the ride
aspect of things like the trips payments drivers or do we also want to consider other process like the driver verification and the registration when the when the driver actually comes in to
register himself and his vehicle yeah so it's really important to narrow down this aspect of the problem because you don't want to end up solving unwanted or not so important parts of the problem
yeah the second one is ask clarifying questions it's really important to ask clarifying questions because this will affect your overall model design so for
example does the ride healing platform support ride sharing or does it only support single right the answer to this question will significantly affect your data model design consider another
question should the data model support future bookings now questions like these will significantly affect your data model Design This is really important to ask clarifying question third one is
identify the events and entities associated with those events right so this is going to help you identify relevant fact and dimension table so if you think about the ride process in the
ride healing example there is going to be a rider who's going to make a request the driver is going to accept it and upon completion a payment is going to be made both the driver and the rider can
leave reviews at the end of the trip right so this statement is going to help you identify some of the events and entities involved so the rides payments
reviews these are the events and they going to make up the fact table the entities involved in those events are basically the driver the rider the vehicle itself in which the ride was
taken these are going to make up Dimension tables yeah so now once you identify the fact and the dimension table it's time to go through the fourth point which is listing down the
attributes of those tables yeah so for the for the fact ride stable the attribute could look something like the driver ID the rider ID requested at
arrived at the pickup location the drop off location and many other attributes right so don't worry if you are missing some of the attributes you can always come back and add those attributes to
your data model if you need them for any of your queries right so because I've also been on the other side of the table I can tell you what are the things that the interviewer is trying to test you on
so understand if you can ask relevant questions if you're able to gather business requirements and then translate those requirements into a set of tables which can then be used to answer
business questions using SQL yeah they are also trying to see how strong are your dimensional modeling concept if you are able to appropriately convert those requirements into relevant fact and
dimension tables um identify things like slowly changing Dimensions being able to appropriately Define the relationship between tables right um the data model
is finally evaluated by deciding how simple or complex are the query write against your data model does it enable you to write simple queries how does your data model behave when a sudden
change is implemented right so let's say there is a country table and a country wants to change its name how is your data going to behave for example turkey
wants to change its name to Turkey something like that right um how are those changes handled within your uh tables so the are these are some of the things that the interviewer is trying to
test you on some of the most commonly asked question that you should practice are building data models for right hailing companies like uber Ola food delivery companies like s and zato video
streaming service like Netflix YouTube cloud storage plat form like Dropbox and social media app like Instagram and Facebook so this is going to give you an overall complete idea about how you can
build data models how do you define relationships uh what are the different kind of scenarios that you could face summarizing all that we've discussed we've discussed about the kind of
questions asked in interviews the key Concepts that you need to master the approach towards problem solving the key thing that an interviewer is trying to
test you on and some of the most frequently asked questions so my final advice to you is to practice as many examples as possible take up a diverse set of apps and build data models for
them right as mentioned earlier you can take up any of ride healing food delivery video streaming or social media apps think about various queries that you would end up writing against the
data model evaluate your model by understanding how simple or complex are those queries that you write against your data model how is your data model going to be behave to certain changes
right so I hope that this video is going to serve as a guide to help you master your data modeling interviews please don't forget to like and share this video subscribe to my channel thank you
again for watching
Loading video analysis...