MIT web.lab (6.962) - Day 5: Databases
By MIT Web Lab (6.962)
Summary
Topics Covered
- Server Memory Fails on Crash
- Text Files Kill Query Speed
- MongoDB Beats SQL Flexibility
- Atlas Replicas Ensure Fault Tolerance
- Mongoose Enforces Document Schemas
Full Transcript
okay hi everyone hope we all had a very productive morning debugging catbook um I saw a lot of really good thinking going on um anyways today we're going to
be talking about databases um I'm Annabelle and I'm Sophie yeah so let's start with a quick introduction to what databases are so currently the way that
we are storing data in catbook is that when a client sends a request and they ping like the API story endpoint for example like trying to write a new story this is sent to the server and all our
data is stored directly on an array called stories in our server so this is going to add a new story directly on our server so here's like the code in api.js right now right we have like this data
object that has the array of all of our stories with Fields like ID Creator name and content but there's a couple of issues with storing data in our server just as
a variable and you might have already encountered some of them or have already like an idea of what some of the issues might be so let's say everything is fine and we're all good right we're posting
stories on our catbook you guys are like busy debugging or whatever the clients are pinging the API story endpoint um and then our stories object
is being updated accordingly right so we're adding the hi hello but let's say oh no our server crashes so our server is down for some number of
unbearable hours and let's see what happens once our server comes back up all of our data inside the stories array is going to be lost so everything
that was just sent is now gone and obviously we don't want this right like we don't want all of our data to just be gone if the server crashes and it's not just if the server crashes if we close our terminal or if like your laptop runs
out of battery then all of your data will be gone and all the stories that you just posted are not going to persist on catbook and you can imagine that with any web application you want your data to persist and you want to be able to
retrieve and display the data that your users have previously submitted even if something crashes um the other thing that happens is let's say our server never crashes
and it's some like super powerful like Mega server but catbook has a lot of users right like all of you guys are already using it in class right now and this is going to lead to a lot of ram issues because that's going to be a lot
of memory and we're storing like hundreds of gigabytes of stories and comments so this is also kind of an issue so we want to store our data permanently um and so one of the other
ways the Alternatives that we have instead of storing the data directly on the server is that we can use a text file and we can store this text file on our hard disk instead of storing everything directly on the server so
this seems more permanent right we also have like functions that we can use to access and send data to our text file so if we want to load data we can use
read data from file which if you guys have taken like 6008 before you've probably seen it's like a very basic like read data function um and basically
this is just taking in what we have in our data.txt file and reading that and then if we want to save data to data.txt we can use the function write data to
file um so here's how we would implement it for example in like our um API um yeah in our API endpoint this is if we wanted to like
load the data from the file so you might be thinking yay we solved the problem now we are like storing everything in this text file on our hard drive and it's all good cuz now it's like permanent but there's still a
couple of issues with this so what's wrong with data.txt does anyone want to take a stab at just like one problem that they can potentially think of yes
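A minimal sketch of the text-file approach described above, assuming the data is kept as JSON in a single file. The helper names follow the lecture's "read data from file" and "write data to file"; the file location, the JSON format, and the `addStory` helper are assumptions made for illustration, not the course's actual code.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Assumed location for the lecture's data.txt; any writable path would do.
const DATA_FILE = path.join(os.tmpdir(), "catbook-data.txt");

// Load the whole file and parse it; start empty if nothing has been saved yet.
function readDataFromFile(file = DATA_FILE) {
  if (!fs.existsSync(file)) return { stories: [] };
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Overwrite the whole file with the current data.
function writeDataToFile(data, file = DATA_FILE) {
  fs.writeFileSync(file, JSON.stringify(data));
}

// Hypothetical helper: what a "post story" handler would do on every request.
// It reads everything, appends one story, and writes everything back.
function addStory(story, file = DATA_FILE) {
  const data = readDataFromFile(file);
  data.stories.push(story);
  writeDataToFile(data, file);
  return data.stories.length;
}
```

Every write rereads and rewrites the entire file, which is one reason this approach gets slower as the number of stories grows.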
yeah exactly so one of the big issues is if we're looking up things it's going to be extremely slow um so that's like one of the issues I'll talk about later but some of the other issues we have are
also like kind of like adjacent to what you mentioned is if we're saving stories and comments the write speed is also going to be very slow because we have to call that function every single time and you can imagine that's going to be very
tedious and slow other thing is you can imagine we're still keeping everything in our Ram so we can still run out of storage on our hard drive so it's kind of similar to the problems we had before like um if you run out of storage in
your RAM this is still like an issue also this is like what you were saying right the query speed is very slow because like if we want to just return one story posted by like a given user ID right we have to linear search
through every single one of the stories and this can be very very slow also your laptop hard drive can still break it's not immune you can like drop your laptop
in a pool or something like this is not um failproof also if two users post at the exact same time we don't actually know what ends up getting stored in data.txt because we don't know like
which user's request was actually fulfilled to um like write data to the file so that's kind of scary if we don't know what's being saved in data.txt
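The "two users post at the exact same time" problem can be sketched as follows. This is a simulation under stated assumptions: an in-memory string stands in for data.txt, and the small delays imitate slow disk access. The function names are hypothetical.

```javascript
// In-memory stand-in for data.txt so the sketch needs no real file.
let fileContents = JSON.stringify([]);

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated slow disk operations.
async function readFile() {
  await delay(5);
  return fileContents;
}

async function writeFile(text) {
  await delay(5);
  fileContents = text;
}

// Each request reads the whole file, appends its story, and writes it all back.
async function postStory(content) {
  const stories = JSON.parse(await readFile());
  stories.push({ content });
  await writeFile(JSON.stringify(stories));
}

// Two users post at the same time: both read the same old contents,
// so whichever write lands last silently discards the other story.
async function demo() {
  fileContents = JSON.stringify([]);
  await Promise.all([postStory("hi"), postStory("hello")]);
  return JSON.parse(fileContents);
}
```

Two stories were posted but only one is left in the file afterwards, because each request overwrote the file with a copy that did not include the other request's story.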
now how do we fix all these issues well we have this really cool thing called databases yeah so there's like SQL databases um
and then also like mongodb which is what we're going to be using for our web application which Sophie is going to talk more about later um but basically SQL databases are like really good for like relational data and like very
structured data but we're going to use mongodb because it's a little more flexible for our purposes um basically a database is just a really organized collection of data
and it's a way to store all of the data that you're going to be handling in your application in like an organized manner a database management system um dbms is basically a collection of functions that lets you perform
operations on your data from your database like retrieving data adding data modifying it deleting it Etc so our database is basically like our substitution for data.txt but much
better and we'll talk about the reasons why in a bit um and the database management system is very similar to the read write data from file functions that we just talked about earlier but also better and we'll also talk about why so
there's a lot of different types of databases but we're not really going to get into it if you're interested you can like um do more reading on your own but some cool examples I think are like graph-based databases which show kind of
like really emphasize the relations between data and also time series databases like InfluxDB which shows like timestamped data um so it's really useful if you're trying to track things
over time and then hierarchical databases like IBM IMS which shows everything in kind of like a tree structure so um it'll show relationships
using like nodes so that's cool anyways now Sophie is going to talk more about how we can perform different operations with our databases yeah so we just went over um how we're going to replace data.txt
with a database and we're going to replace our read and write functions with a dbms so let's talk a bit more about the reads and writes through the dbms so yesterday in Workshop 3
we talked about how we can make a get request from our front end to the /api/comments endpoint to request comments from our server and by doing that our
backend server is going to send the comments back to the front end so what actually happens in the back end if we now have a database
okay so let's talk about what actually happens in the backend
server so after we make the API request to the backend server our backend server can call a read function through the dbms and the dbms is going to go to our
database gather all the stories from the database send it back to the server and the server can now send that back to the frontend client
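The read path just described (server calls a read function, the DBMS gathers matching documents, the server sends them back) can be sketched as below. `TinyDbms` is a hypothetical in-memory stand-in written for this example; its method names are assumptions and are not the API of MongoDB or any real package.

```javascript
// Hypothetical stand-in for a DBMS package, for illustration only.
class TinyDbms {
  constructor() {
    this.collections = new Map(); // collection name -> array of documents
  }

  // Write: add a copy of the document to the named collection.
  insert(collection, doc) {
    if (!this.collections.has(collection)) this.collections.set(collection, []);
    this.collections.get(collection).push({ ...doc });
  }

  // Read: return copies of all documents whose fields match the query.
  // An empty query matches everything.
  find(collection, query = {}) {
    const docs = this.collections.get(collection) || [];
    return docs
      .filter((doc) => Object.entries(query).every(([key, value]) => doc[key] === value))
      .map((doc) => ({ ...doc }));
  }
}

const dbms = new TinyDbms();
dbms.insert("stories", { _id: 1, creator_name: "Annabelle", content: "hi" });
dbms.insert("stories", { _id: 2, creator_name: "Joyce", content: "hello" });

// What the backend does for a GET request: ask the DBMS, then return
// the result so it can be sent back to the frontend client.
function handleGetStories(query = {}) {
  return dbms.find("stories", query);
}
```

Calling `handleGetStories()` returns every story, while `handleGetStories({ creator_name: "Joyce" })` returns only the stories whose `creator_name` field matches.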
so similarly for writing to the database if you remember in Workshop 3 yesterday we learned how we can make a post request to /api/comment in the backend server and what this does is we write a
new comment to the backend server and now the backend server is going to store it in the data variable and tell our front end that it has been successfully posted so now let's take a look at what
happens in the backend with our dbms database system now so similarly our server is going to call a write function and through this write function the dbms is going to take the
new story and add it to our database yeah let's take a look at some pseudocode
for read and write functions through dbms so typically our dbms is going to be a package that we'll want to import into our backend
server and one operation we can perform is a read we could read all of the stories in the database okay um another read we can do is with a
query we could filter the database for certain stories that we want so in this pseudocode we'll be filtering for all the stories with ID 4 and that'll just be
one story since an ID is a unique identifier for each object in our database we can also do a write as we're familiar with
yesterday and we could also delete and delete by a query so this example might delete all the stories posted by a certain user named Joyce we could also update a story
that's already been written to the database and this would make um an update to stories with the author
J uh but yeah this isn't real code so don't try writing it in your backend server so let's talk about what types of databases we might use so the common
example of storing data you might be familiar with is through like Excel and Google Sheets and usually you'll be storing data like in a row column format
um so in a relational database each spreadsheet is called a table and you'll have rows and
columns for categories of data so for our catbook example you might want a users table showing all the users using your app and also maybe a stories table
showing all the stories being posted for your um catbook app and you'll have relations between the users and stories for example in the stories you might
want a column showing the user ID so for each story which user is posting the story but for a relational database you'll need to write
overhead code for dictating the relations between the users and stories table as shown here so an example of relational databases is SQL which stands
for structured query language so we have some problems with this in the table shown earlier the data for one user is spread out across
multiple tables and as your web application gets larger and you'll want to be storing more data this can get really complicated because you'll have to write overhead code for any relationship between
tables so this can make our code hard to understand and as your web app gets larger it'll make adding features a lot more difficult and as I mentioned earlier you'll need overhead code for
dealing with the relation between tables so as a programmer we'll have to write all the code to accommodate the structure of our database and that's bad we want our database to adjust to the
needs of the programmer and have it be flexible for adding new features and data so instead of thinking about our
data in terms of rows and columns we can think about a piece of data as an object as we're already familiar with so this is called a document database instead of
storing your objects in rows and columns we can think of an object with all the fields as a document and this is very similar to the JSON objects that you've been
working with in catbook already and documents don't need to have the same fields so this allows for a lot of flexibility so for example this is
what um a document database might look like for the example earlier but in this example
Annabelle's user object doesn't have any content so you can have very flexible Fields but as we'll see later you might want a structure for your fields and we'll talk about how we
can do that in the backend so back to yesterday's Workshop three we were working with comments that look like this and as you see this has
the same object layout that um a document database will be using so it's a lot more natural and intuitive to work with as a programmer rather than working in rows and
columns so the specific document database we'll be using is called mongodb and this is a document database that stores data in JSON-like
documents so you have documents which represent each object in a collection of data and maybe you'll want very similar looking documents that live in a
collection so you might want all your comment objects which are now called documents to live in a comments collection and similarly you might want a stories collection containing all of your
stories documents which are like Json objects and you want all of your collections to live in a database which will correspond to all the data you want to hold for your web application so we
might have a catbook database so now instead of having a text file that lives on our hard drive we could use MongoDB and the
corresponding database management system so this optimizes our write speed because a lot of engineers at MongoDB have thought about this and thought about ways to optimize
it we're also optimizing our memory usage because we're no longer storing all of our data in a variable in our server query speed and concurrency issues are also solved thanks to
engineers at mongodb um yeah just as a reminder earlier we were storing all of this data as a variable in our server so that takes up a lot of ram but there's still one more
issue our hard drive could still break we want to make sure that our web app is foolproof so that any fault we might encounter is tolerated and doesn't interfere with
how our web app operates and how our users use our web app so how do you make our database storage fault tolerant does anyone have an
idea yeah yes the naive solution is redundancy and that's exactly what MongoDB does it duplicates data across different hard drives so that in case
any one fails we can just access a different hard drive that contains all of our data so MongoDB Atlas does this for us instead of storing our database on our
computer's hard drive we can use MongoDB Atlas and store it somewhere over the cloud and this makes our life easier we no longer have to deal with running the database on our laptop and dealing with
problems in case it breaks the database is managed for you by MongoDB Atlas and you can also share your data with your teammates as you're working on your um web lab
project and this is a lot more reliable than your laptop's hard drive because it replicates your data so now your backend server is going to store your data over the
cloud and what this looks like is it'll be accessing a primary MongoDB instance on one hard drive and you have replica sets of this
instance so in case the primary MongoDB dies the hard disk breaks you can access a replica
set so let's take a look at the MongoDB user interface so here's a catbook database and in it I have uh comments stories and users collections
right now I have the stories collection clicked and inside the stories collection I have different documents these look like Json objects and they have different fields and if you're
working on making some data like Mickey Mouse data for your web application later on for your web lab project you can just edit fields directly from
the MongoDB interface and add new documents so let's put everything together when we make a get request the
front end code is going to send that request to the back end oh and it's going to make it to the API stories endpoint if you're trying to read
stories data and now the back end instead of sending data from the stories file it's going to call the dbms which is going to go to
the database find all the stories the database is going to send this back to the back end which can send it back to the front end as we went over yesterday similarly we can now make post
requests to create update or delete data so the front end maybe your user is making an edit to their story or adding a new story the front end will make a post
request to the /api/stories endpoint the back end receives this request it's going to call maybe some write function or delete function via the dbms
and now it can perform the operation in the database so in summary rather than storing data on our server or in a text file we'll be using a database to store
our data and the specific database we'll be using is a document database called mongodb and specifically instead of running mongodb on our hard disk to make
our database more fault tolerant we'll run it over the cloud using Atlas because Atlas automatically has redundancy for us in case um the primary
mongodb hard disk breaks so after lunch we'll talk about how we can okay so now we'll talk a bit about mongodb
specifics uh feel free to leave some feedback for the intro to databases lecture yay okay yeah okay now we'll talk about
MongoDB specifically what is MongoDB so as we just mentioned it's a document database that allows our JSON
objects to be stored directly in the database like we're familiar with so here's an example of a Json object there's three Fields a name field for my
name Sophie my age which is a number and my hobbies which is an array of strings so these are examples of objects that might be stored in the database so why are we using MongoDB so it's efficient
when we need to write a lot to the database a humongous amount so Mongo actually is an abbreviation for humongous and it's also useful when
the structure of our data is very prone to changes because just as we talked about the um fields allow us a lot of flexibility because we can add new fields or change the number of fields in
a document and it's relatively easy to use as a programmer because it's very intuitive in the object oriented
nature so let's talk a bit more about the structure of mongodb so in each object we might have fields to describe
the object so maybe we'll have a field describing the color length or how poofy or how angry our object is um I've been calling it a corgi but apparently this
is a Shiba yeah okay and we'll have documents which correspond to each object so inside the document we have fields and
this document might correspond to a corgi and we might have a collection of corgis which might look like oh oh by the way I'm trying to make
a um parallel to a warehouse so paralleling to a warehouse a collection might be a crate of corgis and a database might be a bunch
of crates which are collections so your database might hold a bunch of different types of collections which each have different objects in them a MongoDB instance might be like a
warehouse of storage units so to reiterate in words a mongodb instance is a group of databases that live on one server and remember that we
have duplicate replica sets of MongoDB instances a database lives inside your instance and it usually corresponds to one web application and holds all the
data you want to store for that web application and a database might have a bunch of different Collections and each collection is a group of very similar
pieces of data so maybe we'll want a collection for our stories a collection for our users a collection for our comments etc inside each collection we're going
to hold many different documents which we want to have similar structures so maybe in our stories collection we want all our documents to share similar
Fields such as name content ID and yeah so this here is a diagram of the structure of our storage facility and
also this is a sneak peek of the prizes we'll have at our hackathon or giveaway prizes so make sure to come and this corresponds to these
MongoDB objects yeah okay so if you have any questions feel free to drop them in weblab.is/questions so remember we talked about
why we're using MongoDB and it's because it allows our databases to be very flexible compared to a SQL database so this means that different documents in the collection can have different
fields as we just went over so here's an example so say we have a corgi collection with these three documents um a document for three corgis named Sophie Annabelle and Abby all three of
these documents have the name field but you notice that the other fields that they have they don't share but if we're working on a web application we might
want all of our documents to share the same fields in a collection so what if we want to enforce the same structure for all documents in a
collection so that all of these corgi documents have all the same five fields with the same type for their value we can use Mongoose Mongoose is an
object data modeling JavaScript library that we'll use in our back-end server that will allow us to enforce a structure for the documents in our
collection so fun fact a mongoose is also um a mammal and the plural is mongooses not mongeese okay so what is Mongoose it's a
JavaScript library that we'll use in our backend server and it allows us to interact with MongoDB it allows us to enforce a structure for our collections from the very
beginning so once we start making post requests and get requests to our database they'll already have the structure so what does Mongoose do it connects to the MongoDB cluster
which we'll cover code for in the workshop it enforces schemas and models which dictate the structure of the documents in the collection it allows us
to create documents by having functions that kind of parallel the dbms and that allows us to interact with
the database with create read update delete functions and more and I think we'll end here yeah okay
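To make "enforcing a structure for the documents in a collection" concrete, here is a hand-rolled sketch in plain JavaScript. It is illustrative only and is not Mongoose's API: the `corgiSchema` fields, `validate`, and `createDocument` are assumptions invented for this example. In Mongoose itself, schemas and models play this role, as covered in the workshop.

```javascript
// Assumed schema for the corgi example: field name -> required JavaScript type.
const corgiSchema = { name: "string", color: "string", length: "number" };

// Return a list of problems; an empty list means the document fits the schema.
function validate(schema, doc) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (!(field in doc)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== type) {
      errors.push(`${field} should be a ${type}`);
    }
  }
  for (const field of Object.keys(doc)) {
    if (!(field in schema)) errors.push(`unexpected field: ${field}`);
  }
  return errors;
}

// Only documents that pass validation are added to the collection.
function createDocument(collection, schema, doc) {
  const errors = validate(schema, doc);
  if (errors.length > 0) throw new Error(errors.join("; "));
  collection.push({ ...doc });
  return doc;
}

const corgis = [];
createDocument(corgis, corgiSchema, { name: "Sophie", color: "tan", length: 60 });
```

A document such as `{ name: "Abby", length: "long" }` would be rejected by this sketch, because `color` is missing and `length` is not a number, so every document in `corgis` ends up with the same fields and types.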