
Delta Live Tables Databricks (Full Course) | Crack Data Engineer Interviews in 2025

By Ansh Lamba

Summary

Topics Covered

  • Delta Live Tables Automates ETL Overhead
  • Streaming Tables Require Append-Only Sources
  • Expectations Enforce Pipeline Data Quality
  • Rename DLT Tables Without ALTER Commands
  • Apply Changes API Handles SCD Types

Full Transcript

[Music] Do you know Delta Live Tables is the new player in the world of data engineering? In this four-hour-long video you will learn everything related to Delta Live Tables from scratch. This course covers all the complex concepts, such as slowly changing dimensions using the Apply Changes API, append flow to incrementally combine data, and integrating Auto Loader with Delta Live Tables to work with semi-structured data. Not only this, we will also cover some real-time scenarios in the world of Delta Live Tables, such as applying data quality checks using expectations in DLT pipelines. So if you want to stand out from the crowd, you should add this skill to your resume, and I will help you learn it. So, are you ready? What's up, my fam!

Did you miss me? First of all, happy Sunday. Another Sunday, another video, and it's not just a normal video. Before I tell you why, hit the subscribe button right now. For those who are new to this channel: this channel is purely based on real knowledge in the world of data engineering, covering all the latest tools and technologies, all the fundamentals, all the latest architectures, and all the projects. Hit the subscribe button, explore the channel, and you will see why I'm saying this.

So let me tell you why today's video is special. First of all, kudos to Databricks for launching this new feature, or you could say a new player in the world of data engineering: Delta Live Tables. People are going crazy over Delta Live Tables, and there are so many confusions, so many doubts, so many questions, and so many gaps. All of them will be answered and filled today, trust me, because today we will be discussing Delta Live Tables in detail, and yes, we will be

covering both fundamental concepts and practical sessions, so you will actually be building DLT pipelines. "Ansh Lamba, you have already given us the hint that DLT, or Delta Live Tables, is a kind of ETL framework." Yes, and I'm really excited to start today's session, because Delta Live Tables is a great addition to the world of data engineering. Trust me, you will see so many job descriptions where companies are demanding Delta Live Tables, and you should have that knowledge. Companies have started asking about Delta Live Tables in interviews as well, and it is very hard to find the right resource to learn it. But now you have landed on my channel, so you will know everything that you need to know in the world of Delta Live Tables, everything from scratch. So are you

excited? Because I'm really excited. Before actually starting this video, there are some prerequisites, so let's discuss them. First of all, you should have a laptop or PC (or an iPad if you use it as a laptop; I'll just call it a laptop), with a stable internet connection. If you do not have a stable internet connection you will be watching that loading circle rotate, and I don't want that. Then you should have a Databricks account. Do not worry if you do not have one; we will be creating a free Databricks account. Then the third and most important thing, and if you have watched my previous videos you know why I consider this important: you should have the excitement to learn Delta Live Tables, because without that excited vibe you cannot learn it. Trust me, this is a very creative concept, and you need to be creative and excited to actually understand how these things are done. So that is the list of prerequisites, and I think you already have these things. If you do, it's time to actually start this video, and we will be starting

with Databricks account creation. Here's the thing: if you already have an Azure account, it will be really easy to create a Databricks account. If you do not have an Azure account, don't worry, I will tell you how to create one as well, because first we create an Azure account and then we create the Databricks account. And why is an Azure account necessary? Because you are preparing to crack interviews, or you are learning Delta Live Tables to do the things you will be doing in the real world, and in the real world we work with the cloud: cloud storage, cloud destinations, basically everything runs in the cloud. That is why I have chosen Azure: Azure and Databricks go hand in hand, and Azure Databricks is available as a first-party service, which is a big thing. You can use a different cloud provider if you want, I have no grudges against other cloud providers, but today we'll be using Azure. Do not worry, we will not be exploring Azure much; we are just creating an account so that we can create a Databricks workspace, that's it. We are not going to do any other fancy stuff. So without wasting any time, let's get started, and bro, just take

out your notebook right now. Pause the video and take notes. In order to create an Azure free account, go to Google and search for "Azure portal account", or simply "Azure free account"; that will take you directly to the right web page. Click on the first link, then select the free-trial button instead of "Pay as you go", because pay-as-you-go is a paid account which charges you based on how much you use Azure services. For now we do not need to pay anyone, so simply pick "Try Azure for free". Click on it and it will take you to a page that asks for your credentials, like your email ID.

So here's a catch, let me show you. We are finally on this page, and now you need to provide a Microsoft account. If you do not have one, just click on "Create one"; do not sit there thinking "hey, I do not have an account, what should I do?", the answer is simply to create a new one. Once you have the account, type the email ID here and click Next. You will land on a form where you fill in your personal details: your name, your phone number, your friend's name... no, no, I'm just kidding. Just your personal details, like your address and so on. Then click the sign-up button at the bottom of the screen, and it will take you to the payments page.

Here a lot of people get confused: "it is asking for card details, will it deduct some money?" No, it will not charge you anything. It asks for those details only to confirm that it is really you who will be using those services, because Azure provides 200 US dollars of credit to use its services for free for one month. So here's the question: what happens after one month? After one month it will ask if you want to upgrade your account, and even if you do not reply to that email or do not opt in to upgrade, it will simply suspend your services and will not charge you. This practice is followed by all the cloud providers: AWS does it, GCP does it, and Snowflake does it as well. It's the common pattern, so do not get confused and do not

worry. Once you have that Azure account ready, we will land on the Azure portal; let me show you how to log in. The Azure portal is your landing page and your connection with Azure, so you have to go there. Simply open incognito mode (or whichever mode you want), type portal.azure.com, and hit Enter. It will ask you to provide your Azure account email ID, so type the email ID that you just created, the same one. After entering your credentials you will see this page. Wow, what a UI! This is your Azure portal homepage. Do not worry if the services you see don't match mine; these are just some of the services I was using before. So this is your homepage.

And what can you do once you land on this page? If you are already familiar with Azure you would know, but here is a quick overview if you are new. First of all, as you can see, these are the services (you can also call them resources, same thing) provided by Azure. Then the main thing is these three bars: when I click on them, a menu opens that holds all your go-to tabs. For example, to see all your resources there is "All resources"; yes, we have it. The menu also lists things we use on a daily basis, such as SQL databases and storage accounts; these are the basics. Then Microsoft Entra ID, where we manage users; then Monitor, one of the most important things on any cloud platform; then Cost Management + Billing (you're on a free account, so no need to worry, but in the real world, at a startup you can monitor costs there, while in a big company you may not be allowed to view it; you should still be aware of it); then Help + Support, where you can get a lot of assistance. Then we have the search bar: in order to create or use anything you have to find it first, and this is where you can search for anything in the world of Azure. Now, you know we want to create Azure Databricks, and this is the one. So what is that thing that you need to

create before creating an Azure Databricks instance? It is called a resource group. Why do you need a resource group? Because we have to follow the hierarchy that Azure provides: when we create a resource, we have to put it inside a resource group, and a resource group can hold more than one resource. You can put a storage account, a Databricks workspace, an ADF instance, anything in it. So first of all I will search for "Resource group", because I need to create one. As you can see, I can see resource groups; click on it, and don't worry, these are some resource groups I created earlier. Click "+ Create" and give it a name: I will say "rg" (an abbreviation for resource group) followed by "delta-live-table", then click "Review + create" and then Create. Simple. Now it says it is created, so I will click "Go to resource group", and it takes me there. As you can see it is empty, because we have not created anything inside it yet.
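As an aside, the same portal clicks can be scripted. A minimal sketch with the Azure CLI, assuming an authenticated `az login` session and using the (hypothetical) name and region from this walkthrough:

```shell
# Create the resource group that will hold the data lake and the
# Databricks workspace. Name and region mirror the video's choices.
az group create \
  --name rg-delta-live-table \
  --location uksouth
```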

If you want to create any resource, whether it is Databricks, Data Factory, Synapse, or a storage account, you click on "+ Create". We will be creating two resources: one is Databricks, the second is a data lake. Why? Because we will be working with Unity Catalog and an external data lake, so we have to create our own external storage account, and that is the real-world scenario: we do not work with managed storage accounts. So click "+ Create" and it will take you to the marketplace. Don't worry, this is not Facebook Marketplace, it is the Azure Marketplace. Search for "storage account" and you will see many options, because anyone can sell services in this marketplace; there are many other providers here, but we will use the service provided by Microsoft Azure. Click on it, click Create, and it takes you to the configuration page.

First of all you need to name your storage account, and you have to use a name that is unique across the entire Azure network. I'll try "dltdatalake"; is it available? No, this name is already taken, someone has already used it, so I cannot use it. I will instead say something like "anshdldtdatalake" (my name plus a suffix); this one is unique, so I can take it. You can put your own name in it instead. Then for primary services, leave the default. Then you have two options, Standard or Premium, and we will simply use Standard.
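Before moving on, a quick aside on the naming rules above: Azure storage account names must be 3 to 24 characters, lowercase letters and digits only (no hyphens), and global uniqueness can only be confirmed against the live service. A small illustrative helper, with a hypothetical account name:

```python
import re

def is_valid_storage_account_name(name: str) -> bool:
    """Check Azure storage account naming rules: 3-24 characters,
    lowercase letters and digits only. Global uniqueness still has
    to be verified by Azure itself."""
    return re.fullmatch(r"[a-z0-9]{3,24}", name) is not None

print(is_valid_storage_account_name("anshdldtdatalake"))  # True
print(is_valid_storage_account_name("DLT-Data-Lake"))     # False: uppercase and hyphens
```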

Then for redundancy, in order to keep the cost at a minimum, pick LRS (locally redundant storage), which keeps the replicas of your data within the same data center. Then click Next. Here you need to check one box which is very important for creating a data lake, otherwise it will just create blob storage: "Enable hierarchical namespace". Check this box, because with the hierarchical namespace you can actually save your data within folders, in a hierarchical way; otherwise you cannot create folders and everything sits flat inside containers. Then you can click "Review + create", because the remaining settings are related to networking, and since we are not using a virtual network they are fine as they are. Click Create, and it will create our storage account; it should take just a few seconds.
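For reference, the same storage-account configuration can be sketched with the Azure CLI; `--hns true` is the flag equivalent of the hierarchical-namespace checkbox, and the names are the hypothetical ones from this walkthrough:

```shell
# StorageV2 account with hierarchical namespace (ADLS Gen2) and
# locally redundant storage to keep the cost at a minimum.
az storage account create \
  --name anshdldtdatalake \
  --resource-group rg-delta-live-table \
  --location uksouth \
  --kind StorageV2 \
  --sku Standard_LRS \
  --hns true
```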

By the way, you should always work with external storage accounts, external data lakes, because that is what we do in the real world, and what is the best advantage? You have to configure all the permissions yourself, and that tests your credibility in interviews too: have you actually worked in those areas?

So, it is created. As you can see, I have two options to open it: either click "Go to resource", or click Home, then Resource groups, and search for the resource group you created; mine is this one, and there I can see my resource, the data lake. I will click on it just to show you what it looks like. This is the overview page; ignore that and click on "Data storage", then "Containers". Basically we get four services here. Containers, which is the data lake itself. File shares, the area where everyone can drop their files; it acts as a one-stop solution for uploading files that can be used by anyone. Queues, a kind of messaging service that stores messages for streaming. And Tables, which deals with semi-structured data; say you have data in the form of key-value pairs, you can use the Tables service. We simply need Containers, because that is the data lake.

So I will create one container called "metastore". What is this? I will tell you, don't worry: we will be working with Unity Catalog, and this is the container we will keep for the Unity metastore. It is not our working container; it belongs to Unity Catalog. By the way, if you have not watched my other video, which is purely based on Unity Catalog, you can definitely watch it (it's coming on the screen). It is not a prerequisite, but it gives you a detailed understanding of Unity Catalog; we will discuss Unity Catalog in this video as well, but if you want to follow my guidance, save that detailed video. Now I will create one more container and call it "raw". Perfect. So now our containers are ready and our data lake is ready; it's time to create the Databricks workspace.
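For the record, the two containers can also be created from the CLI; on an account with the hierarchical namespace enabled, `az storage fs create` creates an ADLS Gen2 filesystem (container). Names are the ones chosen above:

```shell
# "metastore" is reserved for Unity Catalog; "raw" is our landing zone.
az storage fs create --name metastore --account-name anshdldtdatalake --auth-mode login
az storage fs create --name raw --account-name anshdldtdatalake --auth-mode login
```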

So click the "+ Create" button, the same way we clicked when creating the data lake, and in the same marketplace search for "databricks"; then select the one provided by Microsoft. Do not select the other one; that is an access connector, and we will be creating one of those as well later, so do not worry. Click on it, then click Create.

Now we need to give the workspace a name; I'll say "databricks-dlt". For the region I will pick UK South. Then we have a few pricing-tier options for the workspace. We can create Premium, and do not create Standard, because Unity Catalog doesn't work with the Standard tier. You have a free account, so you can pick Premium, or if you want to go with Trial you can pick that. What is the difference? The Trial tier is exactly the same as Premium, but it is only available for 14 days, so I will pick Trial. Then there is the managed resource group name; I will say "rg-delta-live-table-managed". This is a managed resource group where Azure puts the VMSS (virtual machine scale sets) and the managed data; it is a kind of compute plane, or data plane. Then click through Networking, Next, Next, Next, then "Review + create", then click Create one more time.
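As a sketch, the workspace deployment can be scripted too. This relies on the `databricks` extension of the Azure CLI (`az extension add --name databricks`), and the names are the hypothetical ones used in this walkthrough:

```shell
# Premium (or trial) tier is required for Unity Catalog; Standard won't work.
az databricks workspace create \
  --resource-group rg-delta-live-table \
  --name databricks-dlt \
  --location uksouth \
  --sku premium \
  --managed-resource-group rg-delta-live-table-managed
```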

Now it will deploy the Databricks workspace. Databricks is actually a separate company, but it is available as a first-party service in Azure, which is why it is called Azure Databricks; Databricks itself is exactly the same as the Databricks offering on other cloud providers, so do not worry at all. It should be deployed in just a few seconds, and then we can also create the access connector.

Why do we need an access connector? Let me quickly tell you. Let me go to Home and open my Delta Live Tables resource group. Say this is your Databricks workspace; its logo looks like a benzene ring (the chemical formula is C6H6, six carbons in a ring; we are not here to learn chemistry, although chemistry was my favorite subject in 11th and 12th, and I scored around 95 in chemistry in my 12th boards). So this is your Databricks workspace,

and this is your ADLS Gen2, which is our data lake. We have created our containers inside it, so how will Databricks access this data lake? By default it is not allowed to. Say Databricks goes to ADLS and says, "hey, I want to use this data." ADLS will say, "who are you?" Databricks will say, "I'm Databricks," and ADLS will reply, "I do not care, go back." So we need to introduce one entity, a kind of ID card, a kind of permit: the access connector. Say this connector has access to the data lake (it doesn't yet, but we will grant it). Now Databricks says, "hey, access connector, come with me"; the connector attaches to Databricks, and this time, when Databricks goes to access the data sitting in the data lake and ADLS asks "who are you?", it can say, "see, I have the ID card," and now it can use the data. So we will use an access connector and grant it access to the data lake.

Click "+ Create" and search for "access connector" (no, I do not want your private offers, why do you keep showing me these?). Pick "Access Connector for Azure Databricks" and click Create. Then we just need to name it; I will say "access-dlt". Click "Review + create" and then Create; it will take a few seconds to create this one as well. Now the question is how we assign the role on the data lake, and it is very simple: it is called RBAC (role-based access control).

Oh wow, it is created. I'll go to Home, then to my resource group, and now we have three resources ready. Our next step is to allow this access connector to read and write the data sitting in the data lake. First select the data lake, then go to "Access control (IAM)", then add a role assignment and pick the role called "Storage Blob Data Contributor". Click Next, select "Managed identity", then "Select members", search for the access connector, pick your access connector by name, click Select, and then "Review + assign". Perfect, it is assigned. Now we can finally use that connector.
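The whole connector-plus-role setup can be sketched end to end with the CLI as well, again assuming the `databricks` Azure CLI extension and the hypothetical names from this walkthrough:

```shell
# 1. Access connector with a system-assigned managed identity.
az databricks access-connector create \
  --resource-group rg-delta-live-table \
  --name access-dlt \
  --location uksouth \
  --identity-type SystemAssigned

# 2. Look up the connector's principal ID and the storage account's scope.
PRINCIPAL_ID=$(az databricks access-connector show \
  --resource-group rg-delta-live-table --name access-dlt \
  --query identity.principalId --output tsv)
STORAGE_ID=$(az storage account show \
  --name anshdldtdatalake --resource-group rg-delta-live-table \
  --query id --output tsv)

# 3. Grant the connector read/write on the data lake.
az role assignment create \
  --assignee "$PRINCIPAL_ID" \
  --role "Storage Blob Data Contributor" \
  --scope "$STORAGE_ID"
```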

Now it's time to actually go inside Databricks. Click on the resource and then click "Launch Workspace"; it will launch your workspace. Finally, welcome to our lovely Databricks! This is your Databricks workspace homepage. You can see a lot of things in the left-hand pane, so let me quickly give you an overview of what they are and why we use them. These are bread and butter for you if you're using Databricks. You will be working with Workspace, which is a kind of folder structure. Then Recents, which shows all the notebooks, workspaces, and workflows you have opened recently. Then Catalog; this is important because we are working with Unity Catalog (it was recently renamed from Data Explorer, it is the same thing). Then Workflows, where you build ETL pipelines. Then Compute, where we create the cluster; do we really create the cluster? Not really, we just configure it, and Databricks creates the cluster for us, all the VMs and VNets, so thank you, Databricks, for making our lives easy. Then Marketplace; there is a marketplace in Databricks as well, because it also offers third-party tools and software to use within Databricks. Then we have the SQL pane, dedicated to data analysts: the SQL editor, which works just like a traditional SQL workbench, and Dashboards; yes, you can build dashboards the same way you build them in Power BI or Tableau. It is amazing, and I will try to add some small pieces of this at the end of the video, because the focus of this video is Delta Live Tables. Then we have the Data Engineering section, which is our area: Jobs, Data Ingestion, and Delta Live Tables (we are coming to you, bro, hold on). Then we have Machine Learning, where you build models and everything. By the way, this button used to be down here at the end, but they moved it to the top; maybe they have plans to promote it. So that is all about the Databricks workspace fundamentals, and don't worry, we will cover everything as we go along. Now, in order to start

working with Databricks, we first need to create a metastore: a Unity metastore, not just any metastore. And why do we need to create a Unity metastore? So that we can enable Unity Catalog. "Really, Ansh?" Yes, really. Now if I click on Catalog you will see something already: this is your hive_metastore. "Ansh Lamba, we already have a metastore, why do you want to create a new one?" Because we do not use the Hive metastore; we work with the Unity metastore and Unity Catalog. And by the way, this is not actually your metastore: it is a catalog named hive_metastore, because with legacy metastores we did not get the ability to create multiple catalogs, so Databricks named the catalog after the metastore. So this is not a metastore, it is just a catalog, as you can see in the Catalog Explorer; these are the legacy objects. It is also looking for an active compute resource, because you cannot see much when you have no active compute, so do not worry. Databricks has launched a feature where it creates a Unity metastore for you by default, but we do not use that; we create our own, because the metastore is responsible for everything: it covers all your catalogs and all your workspaces. So it is very important.

How can we do that? For that you need to go to the Databricks account console. Simply click on this drop-down and you will see "Manage account". If you're using Databricks for the first time, you will not see "Manage account" here, because by default you are not allowed to access the console page. So what do you need to do? I will provide you the link and show you right now, don't worry. You need to open that link, and only after you log in there will you be able to see "Manage account". Or, again, watch my Unity Catalog video, because I have talked about this in detail there; I will show it in this video as well, but if you cannot troubleshoot it, watch the relevant chapter of that video, in the initial part where I create the Unity metastore and get into the manage account page.

To go to the console page, search on Google for accounts.azuredatabricks.net. When you click on it you will land on a page that asks you to sign in with your Microsoft Entra ID; if you provide your Gmail account, it will not accept it. So go to Microsoft Entra ID: click Home, search for "Microsoft Entra ID", go to Users, pick your Gmail account, and it will show you an email ID containing "#EXT#" (external). Copy that and paste it on the sign-in page. If your password works, great; otherwise click "Forgot password" and set a new one. Only then will you be able to get inside the console, and you will be logging in with that "#EXT#" email, not your Gmail account. While creating the Unity metastore you need to keep a few things in mind, which I will tell you; if you can follow along, fine, and if not, check the Unity Catalog video I mentioned so you can do this easily. So this is your Databricks account console page, and as you can see, when I click here, I am logged in with my "#EXT#" email account, not my

about this page this is very like obvious is when you just see the name it is like self-explanatory so you do not need to just say hey an can you please give us some brief it is very very very simple bro just like workspaces catalog

and these are the two areas where we just need to do the main stuff one thing that you can do simply go to user management now I will just tell you how you can just enable that manage Account

button in your normal Gmail account so currently as you can see many accounts do not worry in your case you will be just seeing this account like hashtag account then you can simply click on add

user then you you can simply uh put your email account which is like normal Gmail account and once you put that Gmail account then you will be able to see that manage Account button that I can

see right now here yes because only admins can see this okay so once you put this click on add user and it's done now you do not need to explicitly go every

time okay now then you can simply click on this drop down click on manage account sorted sorted sorted sorted okay

Okay, click Cancel. First of all we'll create a catalog. These are the workspaces, just so you know; now click Catalog. This is where we need to create the metastore first, so click "Create metastore". In your case you may see a metastore already created, because Databricks creates one for you by default, but we won't use that; you can either delete it or just create a new one. Remember one thing, though: you can only create one metastore per region, so either create yours in a different region or pick the same region I'm picking right now.

I'll name it "DLT metastore", and for the region I'll pick UK South. Then it asks for the ADLS Gen2 path. Remember we created a metastore container? That's exactly what goes here. The container name is metastore, then @, then the storage account name. Let me just check: it's anshdltdatalake. (By the way, you can also copy the path from the storage account; up to you.) Then .dfs.core.windows.net. Perfect. And add a slash at the end; that's important, very important.
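The path being typed here follows one fixed pattern, so it can be sketched as a tiny helper. The storage-account name below is illustrative, matching the one used in this walkthrough; the metastore field takes the same pattern minus the `abfss://` scheme.

```python
# Pattern: abfss://<container>@<storage-account>.dfs.core.windows.net/<path>
def abfss_path(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 abfss URL; the root form keeps its trailing slash."""
    base = f"abfss://{container}@{account}.dfs.core.windows.net/"
    return base + path.lstrip("/")

# Metastore root -- note the trailing slash the console asks for:
print(abfss_path("metastore", "anshdltdatalake"))
# abfss://metastore@anshdltdatalake.dfs.core.windows.net/
```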

Access connector ID: it's asking us to provide the access connector ID, and now you know why it was required. To access the data lake, the metastore needs the access connector. Why does it need the access connector ID and the storage account? Because whenever you create managed tables, where we don't manage the data and Databricks manages it for us, it will use this location, this container, the metastore one. Simple. So I just need to pick the access connector ID: open the access connector, copy its Resource ID, and paste it here. Then click Create.

By the way, you'll say: "Ansh Lamba, this location is optional. Why is it optional, and if it's optional, why are we giving it, and what happens if we don't?" Bro, hold on, I'll answer all your questions. The thing is: when you don't provide a location for your metastore, then whenever you create a catalog you have to provide a location at the catalog level, which isn't good practice. We should always provide the location at the higher level. If you set it at the metastore level, you then have a choice whether to give each catalog its own location or not; otherwise you're forced to provide it at catalog level. That's the difference, and that's why you should provide the location at the metastore level every time.

Now our metastore is created, and it's saying "Assign to a workspace". So I'll assign this metastore to this workspace: tick the box, click Assign. It asks, "Do you want to enable Unity Catalog?" Obviously, bro, that's why we created it. Click Enable, and congratulations, your Unity Catalog is enabled. Wow. That's it; that's all we need to do in the console. Close it, refresh your Databricks workspace, and I'm in the Catalog tab right now, and

you can see a lot of stuff now, man. You can see this plus sign as well, which means I can now create as many catalogs as I want. (If you can't see "Add a catalog" yet, it's because our cluster isn't ready.) But before creating a cluster we need to create an external location. Why? Because we want to access data. Let me explain; it's simple, these are just fundamentals, and if you don't know them, don't worry, I'll explain everything.

The thing is: we know our access connector has the privilege to read and write data across the whole data lake. Very good. Now Databricks wants to read and write data in this particular container, so we need to define an external location, where we say: "Hey Databricks, this is your external location; you can freely read and write data to this container." We scope it down to the container level only. Sorted? Clear? Very good.

So now we'll create the external location, and what will we use as the credential? The credential is your access connector. Simple. Go to Catalog, then you'll see External Data; click on it. But instead of clicking "Create external location" straight away, first click on Credentials, because we need to set up the credential first. Click Create credential.

For the credential name I'll say ansh_creds. The access connector ID we just copied, so I'll go grab the Resource ID again, paste it here, and that's it: click Create. What we're doing here is putting a gift wrapper on top of the access connector and saying, "Hey, this is your credential, keep it safe." Simple. Now that it's created, click Catalog Explorer, then External Data, and now we'll create the external location. Click "External location".

What will the name be? I'll just say something like ext_datalake. And what will the URL be? This is the important part. The URL is abfss:// (oops, I typed a triple s), then the container name, which is raw, then @anshdltdatalake.dfs.core.windows.net. Simple. Now you'll ask, where did I get this location from? I just remember the syntax, but you can copy the location from your data lake as well: check the storage account's endpoints and you'll see everything listed there. The link there is in https form, but Databricks accepts the location in abfss form, and the syntax is simple: this part is your container name, this part is your storage account name, and the rest is constant. Then for the storage credential, pick the ansh_creds credential we just created.
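For reference, the same UI flow can also be expressed in Databricks SQL, run from a notebook with `spark.sql(...)`. A sketch, assuming the illustrative names ext_datalake and ansh_creds from this walkthrough:

```python
# Equivalent of the "Create external location" UI step (names illustrative).
# Shown as a string so the pieces of the abfss URL stay visible.
external_location_sql = """
CREATE EXTERNAL LOCATION IF NOT EXISTS ext_datalake
URL 'abfss://raw@anshdltdatalake.dfs.core.windows.net/'
WITH (STORAGE CREDENTIAL ansh_creds)
"""
# In a Databricks notebook you would run: spark.sql(external_location_sql)
```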

Perfect, then simply click Create. It's creating… wait: "User does not have CREATE EXTERNAL LOCATION on metastore DLT metastore." What? Oh, I see. This is a small thing we need to fix. Let's go back to the console page. Really? Yes, really.

So, on the console page: remember I told you this #EXT# account is the admin of this Unity metastore, but we're working in the workspace with our normal Gmail account. Go to Workspaces and select your workspace; there you'll see Permissions, the user and the admin, and that workspace admin part is fine. But if you click on Catalog you'll see "DLT metastore"; click the Edit button, and you'll see only the #EXT# account is the admin there, bro, while we're using our normal Gmail account. So what I'll do is simply make myself an admin. Perfect, save. Oh, oops, I picked the wrong account; wait… okay, this one. Perfect, now click Save.

Now you can create it: click Create again, and you should be able to. See, the error is gone. By the way, that wasn't really an error, just a permission we needed. Okay, now this is done. Now we can go and create our cluster, because that's the most

important piece. Click on Compute; we'll create a compute resource. Click "Create compute" and pick Personal compute, because you don't need much to process this data. For the Databricks Runtime (DBR) version, pick 15.4, the long-term support one. For "Terminate after", 20 minutes is fine. Then just click "Create compute".

Now it's creating the cluster behind the scenes. What's happening? Remember the managed resource group we created? In that managed resource group, Databricks will create the virtual machines for us, which will act as the worker node and driver node. That's all handled by Databricks, so say thanks to Databricks: it takes on all that overhead and actually improves the efficiency of our work, because we don't need to manage those machines. We just need to manage our data, our processing, our transformations, and that's it. And that's how it should be; why would we take care of all those machines ourselves? That's why we're paying Databricks so much. Okay, sorted. I think it will take a few minutes, since it's creating and starting the virtual machines, so meanwhile we can drink some water, and you can grab something too. Once it's done I'll get back to you. So, our

cluster is ready. As you can see, it's "Ansh Lamba's Cluster"; in your case the name will be different. Now that the cluster is ready, we can do anything inside Databricks. Anything? Yeah, anything. Click on Workspace, because now we'll set up our working area. Click Workspace once more, then Create, then Folder. I'll create a folder called "DLT tutorial". Perfect. Then, inside it, we'll create a notebook, which opens straight away. So this is your Databricks notebook. (And we know our trial ends in 14 days; banner, just get lost.)

First, a quick overview of notebooks; it's very simple. This is your notebook name, the default one you get on creation, and I'll rename it to "tutorial one". This is the "Run all" button, which is very handy when you want to run the whole notebook in one go. This is the Connect button: it asks whether you have any available cluster; select your cluster and it will attach the notebook to it. This is a cell, as you'd know, and over here are the Workspace and Catalog panes. I'll click the Catalog pane because it makes it really easy to see all of this stuff, and close the other panel. Now it's spacious, this looks cool, and I can

also drag cells around as well. So: in order to create anything, first we need a catalog, and then within that we'll create a schema. There are basically two ways to create one: the UI, or code. Via the UI: go to Catalog, hit the plus / New button, and this time choose "Add a catalog". Catalog name? I'll say dlt_catalog; makes sense, yeah? Then it asks for a storage location. As you know, we already created the Unity metastore with a location, so this is optional; I'll just leave it and click Create. Configure catalog: everything looks fine. It asks me to grant access, and we only have one user, so no need to worry; save. Perfect.

Let me go back to my notebook; this time you can use the Recents tab, since you already have it open. First I'll show you how to easily change the language within a cell. As you can see, the default language is Python, and if you click on it you'll see lots of options. My personal favourite combination is Python plus SQL, and I honestly don't know who is using that other language. By the way, markdown: I love markdown, and let me show you why. Say I want to create a heading. First I write %md, the magic command for markdown, then a single hashtag (# means an H1 heading), then the text, "Delta Live Tables", and I'll make it bold. Let me run it: there's my heading. That's why I love markdown; I can make my notebook beautiful, and it becomes so easy to read the code with comments, headings and subheadings.

Okay, now I'll add a markdown cell saying "create a schema". By the way, this should be my configuration: if I refresh the catalog pane I can see my dlt_catalog, and if I click the drop-down I get options like Open, Catalog path and so on. For now we just have the two default schemas, default and information_schema, so it's time to create the actual one. How? Simply write CREATE SCHEMA, and this schema will be my raw schema. And note: you can't just say raw; you have to qualify it with the catalog as well, so I'll say dlt_catalog.raw. Let me run the code, and it creates the schema for me. Obviously I haven't given the schema any location, which means it's a managed schema; and again, all of these things, like managed versus external schemas, are discussed in detail in that Unity Catalog video. I can delete this extra cell, since I don't need it. Sorted.
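In code form that's a single statement. Unity Catalog uses a three-level namespace (catalog.schema.table), which is exactly why the catalog prefix is required; names here are the ones used in this walkthrough:

```python
# Schema DDL, fully qualified with the catalog (three-level namespace).
create_schema_sql = "CREATE SCHEMA IF NOT EXISTS dlt_catalog.raw"
# In a notebook cell you'd run it with: spark.sql(create_schema_sql)
```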

So what I'll do now is create my source for Delta Live Tables. Obviously we need a source, and ours will be a Delta table, so it's time to create one, and it will be an external Delta table. Let's create it. I'll add a markdown cell saying "creating source", then write CREATE TABLE, and the table name will be raw_customers. Makes sense? Then I'll define the schema: id, the customer ID, as INT; name as STRING; salary, let's also make that INT; and email as STRING. Perfect.

Then I'll write USING DELTA. This is optional, by the way, but I like using it; it promotes readability. Then I'll define a LOCATION as well, because I love working with external tables, so that I can actually see the data. (You can see the data for managed tables too by digging into the managed data lake, but I don't like hunting for table folders by their IDs. Yuck.) So the location will be abfss://, then my container name raw, then @anshdltdatalake.dfs.core.windows.net, and then a folder called raw_customers. Makes sense, makes sense. Let me just run this.
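Assembled in one place, the statement being dictated here looks roughly like this (the storage-account name is illustrative). Note the fully qualified catalog.schema.table name: leaving the catalog prefix off creates the table in the default catalog instead, which is exactly the error hit in a moment.

```python
# Source-table DDL for the walkthrough (run with spark.sql in a notebook).
create_table_sql = """
CREATE TABLE dlt_catalog.raw.raw_customers (
    id     INT,
    name   STRING,
    salary INT,
    email  STRING
)
USING DELTA
LOCATION 'abfss://raw@anshdltdatalake.dfs.core.windows.net/raw_customers'
"""
# spark.sql(create_table_sql)
```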

Let me just run this… okay, what happened? "Failure to initialize configuration for storage account"? I thought we defined the external location, right? Let me check what's wrong with anshdltdatalake… Finally, the error is resolved, and it was just a silly mistake. I was confused; I thought I'd misspelled the location, and then I realised I hadn't written the catalog name, dlt_catalog.raw. When I left that off and just wrote raw_customers, it was actually trying to create the table under the default catalog, and obviously we don't have an external location for that default catalog. Just a silly mistake, and once I fixed it and ran it again, our table is created. Obviously we don't have any data yet, so I'll insert some. I'll add a heading, "inserting data", and obviously you can put in any data, bro.
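Any rows matching the four-column schema will do. An assembled INSERT might look like this (names and values are illustrative):

```python
# Sample rows for the source table (run with spark.sql in a notebook).
insert_sql = """
INSERT INTO dlt_catalog.raw.raw_customers VALUES
    (1, 'John', 50000, 'john@gmail.com'),
    (2, 'Sara', 60000, 'sara@gmail.com'),
    (3, 'Tom',  55000, 'tom@gmail.com')
"""
# spark.sql(insert_sql)
```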

So: INSERT INTO, the table name, and this time, Lamba, add the catalog and everything, okay. Then VALUES; you can give any values. Oh, and this is the thing I love about Databricks: it monitors your code in real time and gives you suggestions. Should I take them? Let me check which columns I have: four. So what I'll do is provide one row by hand, and Databricks will mimic the pattern; that's why I love Databricks. So: 1, then a name. Let's say John… no, man, what name should we give? Should I pick a random name from my subscribers? Ah, come on, do I remember any name? Okay, let's keep John; John is a good man. So: 1, John, and for salary we can say 50. By the way, that's like 50k, so let's put 50,000. And for email we can pick john@gmail.com. (Do not actually email, man. Do not email.) Perfect, now it should mimic it; thank you so much, that's why I love Databricks. Just mimic a third value… perfect. I want more, bro… perfect, one more… perfect, one more… who, Sara? One more: Tom. Okay, that's it.

Before running it I need to change the language of the cell, so I'll pick SQL; obviously I could have used the magic command instead, I just wanted to show you the option. Now it's inserting the data. Let me close some extra tabs. Actually, I was double-checking the syntax, because I prefer writing the whole syntax on my own, but obviously I can also make mistakes, so I was confirming things like dfs.core… and then I realised: Ansh Lamba, what are you doing, man. So our data is inserted, and let me show you the data itself. Go to Home, then the resource group, then open the storage account. Oops, not this one; this one, the Delta Live Tables one. This is the storage account, this is the raw container, and inside raw we have a folder called raw_customers. Open it.

Perfect. This is my data, and the table name is raw_customers. As you can see, the data is already in Delta format, and obviously you should know what Delta format is: this is your data, which is actually stored as Parquet files, and this is the _delta_log, which holds all of your transactions. Again, if you want to learn Delta Lake in detail, definitely watch the video coming up on the screen, the Delta Lake full course, which covers everything related to Delta Lake. I referred to this book, "Delta Lake: Up and Running"; that video is based on it, and you'll find a lot of knowledge there.

So this is your data; our source table is ready. I can show the data from the notebook too: SELECT * FROM dlt_catalog.raw.raw_customers. Oh, come on, man… run it, and you'll see we have seven records. You're all set, man: your source is ready, and this is your source table. I'll rename this notebook to something like "1_source". So: we've prepared our source, and our source is a Delta table. Now let's create our first DLT pipeline, and I'll show you what that is as well.

So now it's actually time to implement the DLT solution, time to actually look at Delta Live Tables. The short form is DLT, which stands for Delta Live Tables. And just to remove any confusion: DLT, aka Delta Live Tables, isn't exactly tables; it's a kind of framework. (It's cold here; that's why I've put on my cap.) It's a declarative ETL framework. For now, forget "declarative" and just hold on to "ETL framework". You know what an ETL framework is: something where we define lots of activities, tasks related to extract, transform, load. Similarly, DLT is also an ETL framework.

But what's so special about this ETL framework, aka DLT, when we already have Workflows within Databricks? Why do we need DLT? Genuine question. The thing is, when we use Databricks Workflows, we need to manage everything ourselves. Everything. DLT looks similar to Workflows, but it's not the same thing, and it's not an alternative to them: Databricks Workflows are one thing, Delta Live Tables are another. When we want to build an ETL pipeline and we don't want to take care of the steps performed behind the scenes, when we just want to tell the pipeline, "do this task, this task, this task", and that's it, that means we're interested in defining the tasks, not in actually implementing or maintaining them. That's what's called a declarative framework: we only declare what we want, and we don't take care of how the tasks run. So who takes care of it? Obviously, Databricks. Obviously. I know you haven't fully got DLT yet, because I've only just started; this is

just the definition, bro. So let me complete the definition. The docs also say: simply define the transformations, declare the definitions of the transformations to perform on your data, and let DLT pipelines automatically manage everything for you. Orchestration, task management, cluster management, monitoring, data quality: everything, bro. This is amazing. Data quality checks too, yes; it takes care of data quality checks as well, the constraints we used to apply in traditional databases. We can apply constraints here too, and I'll show you everything. I just wanted to use this visual because it's the best definition I could find for Delta Live Tables. And obviously Delta Lake is the backbone of Delta Live Tables, because DLT is a framework, but at the end of the day what you get out of it is tables, Delta tables. DLT is built purely on top of Delta Lake. There isn't much else on this page; it's the documentation page, which you can refer to.

Now let me show you what exactly Delta Live Tables gives you compared with plain streaming. This is the comparison of DLT pipelines with Spark Structured Streaming. We have Structured Streaming as well, so why use DLT pipelines? The core capabilities are the same, but data flow orchestration is automatic: we don't need to build the pipeline ourselves; the arrows, the orchestration, the workflow, all of it is done automatically by DLT. Then data quality checks, as I mentioned: we get the option to apply constraints on top of the tables, also automated. Error handling and failure recovery: also automated. And CI/CD and version control: also automated. All of these manual tasks are eliminated by DLT, and that's why everyone is talking about it right now; you'll get to see all of it practically as well. Don't worry, don't

worry. There's not much more on this page, so that's the theory behind Delta Live Tables. Now, in DLT we have two language options: Python and SQL. I personally prefer Python; if you personally prefer SQL you can make the switch, it's not difficult, but Python is easier to follow while you're learning Delta Live Tables, so we'll continue our journey with Python and handle everything in Python from now on.

Okay, so now let me tell you what we actually have inside Delta Live Tables. Basically, there are three things; let me show you. I'm on another documentation page that lists the components of Delta Live Tables, and there are three. First, the streaming table. A streaming table is just what it sounds like: whenever you stream something, say you're using streaming with Delta Lake, we'd call those Delta streaming tables. If you already know streaming tables in Delta Lake, you know what this is; if not, check my Delta Lake video, which gives you all that background, and even if you don't, just read it as "a streaming table" and we'll go through the syntax, no need to worry. The second thing is the materialized view. If you don't know what materialized views are: a materialized view is a view that holds the query and also the result of that query at that moment. It's a view, but it stores the data as well, and it's very common in the SQL world too. And the third thing is views: normal views, which just hold the query, that's it.

These are the building blocks of Delta Live Tables: streaming tables, materialized views and views. We do everything in the world of Delta Live Tables with these three objects; there's nothing else. (By the way, there are two kinds of views, normal views and streaming views, but views are views: they just hold the query. We'll discuss everything in detail, don't worry.) Then there's the pipeline; don't worry, we'll create the pipeline shortly. And "How do Delta Live Tables datasets process data": as I said, it's these three objects. Then, at the end of the page, it's Delta Live Tables using SQL and Python, and we'll be covering the Python side, because we want to do it with Python.

And if you want to learn it with SQL, it's the same thing; there's no difference in performance. Just click on the Python language reference and it takes you to the documentation page where you'll see everything. Don't feel overwhelmed looking at the code; I'll explain everything in detail. These are just some examples from Databricks of how to use Delta Live Tables, and I know it looks messy right now. I just wanted to show you the documentation page so you can go and refer to it if you want. Let me go back to my workspace.

Okay, so to create a new Delta Live Tables pipeline… oh, my cluster has stopped; let me start it. I'll go to Workspace, then into my workspace, into "DLT tutorial". Here's 1_source, and I'll create another notebook and call it 2_DLT. Perfect, this is my second notebook, and my cluster is spinning up right now, so it will take a few minutes; meanwhile I can just prepare my notebook.

do not need to worry because the thing is uh actually let me just tell you so I I think there's no need to just start it let me just terminate it why why anlama

why I will tell you I will tell you don't worry don't worry so the thing is in Delta life tables when we just need to run the notebook because obviously we'll be creating a notebook right we'll

be just writing all the code in the notebook and run that notebook we cannot see the output using allpurpose cluster allpurpose cluster that means like this

regular cluster that we use really yes really we cannot use this cluster if you want to see the output we have to create a new cluster it is called job

cluster okay so we will be creating a job cluster and before that we just need to write the code and then we can actually use that okay so yeah it is

good if it is terminated so I'm happy okay so now we can just write the code so first of all I will simply create a markdown and I will simply say

Let's say "DLT Pipeline" — we are going to create our first pipeline, and it will be very simple. Let me show you the flow of what exactly we need to build. We have a source: raw_customers, the table we just created. From this source I will create three objects: one for bronze, one for silver, and one for gold. Sorted. What will this do? In the bronze layer we do not make any changes — we just take the data as-is. In the silver layer I will make some changes, or add a column if I want. And in the gold layer I will create an aggregated version of it. Simple — those are the fundamentals of ETL.

But here is the twist: in bronze I will create a streaming table. Then, on top of that streaming table, I will create a view in silver — a normal view, not a materialized view. And in gold I will create a materialized view, because I know you will say "hey, we want to see a materialized view too." (We could create a materialized view at the silver step as well — don't worry, we will cover all of these combinations.) So we will be covering all three different object types — streaming table, view, and materialized view — within a single pipeline. Let's do that, and I will show you how to build it.

So first of all I write the heading "DLT Pipeline". Now it's time to write the bronze layer, and this will be a streaming table. In order to create a streaming table, what do we need? We need a Python decorator: we simply write @dlt.table. Now we have an option to provide a name for our bronze table. If you look at the documentation above, you will see @dlt.table with a bunch of other parameters — comment and so on — but do not worry about those; the main one is name. If you provide the name parameter, that becomes your table's name — obviously, when creating any table you need to give it a name, makes sense, right? I will call it bronze_customers, so I know this is the customers table in the bronze layer. Perfect.

Now I have created my decorator — and hey bro, no spoilers, just go away. Next we create a function in Python. I simply write def — this is Python fundamentals; if you do not know def, brush up on that first — and I could give it any function name, say my_function. But here's a twist, so buckle up and stay focused, because now you cannot afford to be distracted. If I write def my_function() and I do not pass anything to the decorator — say I remove the name argument entirely — then whatever my function name is becomes my table name. And I follow that strategy: I do not provide a name in @dlt.table; I prefer to name the function itself, say bronze_customers. Both ways work, so the choice is yours. Very good. Now, within this function, we create a DataFrame — the same one we create every time.
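To see why the function name can double as the table name, here is a tiny stand-in decorator — plain Python, no Databricks required, and only a toy model of what the real @dlt.table does: it registers a function under an explicit name if one is given, or under the function's own __name__ otherwise:

```python
# Hypothetical stand-in for the naming behavior of @dlt.table:
# an explicit name= wins; otherwise the function's own name is used.
registry = {}

def table(name=None):
    def wrap(fn):
        registry[name or fn.__name__] = fn  # register under the chosen name
        return fn
    return wrap

@table(name="bronze_customers")
def my_function():          # explicit name given -> "bronze_customers"
    return "bronze df"

@table()
def silver_customers():     # no name given -> the function name is used
    return "silver df"

print(sorted(registry))     # ['bronze_customers', 'silver_customers']
```

Either convention ends up in the same place; naming the function after the table just saves one parameter and keeps the code self-describing.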

And what will the code be? Simply spark.readStream — why readStream? Because this is a streaming table. And on which source are we creating it? I use spark.readStream.table(...) — .table because I want to refer to the source by name, so I don't need to deal with paths — and I write spark.readStream.table("dlt_catalog.raw.raw_customers"). That is my table name; let me just show you: this is my catalog, dlt_catalog, this is my raw schema, and this is my raw_customers table. I want to create my streaming table on top of it — simple, this will create the streaming table. Then the function simply returns the DataFrame. Why? Because in the bronze layer we do not want to make any transformations — that is not advised by anyone. If I run this cell, what will happen? Nothing, because we do not have a cluster. And by the way, even if we had one, an all-purpose cluster would not run it — it would just say "hey, your code is fine, but I cannot show you the output, just go away." So we can interrupt it; we do not need to run this cell.

Now, the silver view — because this one is a view. In order to create a view you use a different decorator: @dlt.view. Same pattern: create a function that returns a DataFrame, and name it silver_customers (thank you for the autocomplete, Databricks). Now, this is interesting. You know what my sources are: for bronze, raw_customers is my source, but for silver, the bronze table should be my source — and that table is not created yet; it is only being created by this very pipeline. So how do you refer to it? Let me show you, it is very simple: write df = spark.read.table(...) — this time I will not use readStream, because this is a plain view, not a streaming one — and then use a special keyword: live. live means we are referring to a table that is being created from this particular notebook, this particular DLT pipeline. When I use live, I can refer to any table defined inside this pipeline. So I simply say spark.read.table("live.bronze_customers"). Wow, perfect.

Now, as I mentioned, I will add something just to show you a transformation — and the autocomplete is reading my mind, bro. I thought about casting a column, but the types look fine already, and by the way we do not have a customer_id column here. So what I will do is add one column called flag, and its value will be lit("new"). And if you know PySpark, you know we cannot use lit directly — we have to import it — so we quickly write from pyspark.sql.functions import * and from pyspark.sql.types import *. (This cell will not be running either, so just write it — do not run it.) So I add this column with withColumn and then simply return df. Perfect. Again, just write the cell; obviously it will not run. Now, the next piece of code — and I know you are now recognizing the things we discussed earlier in the definitions.

Soon you will see the pipeline as well, don't worry. So I will simply say gold, and this will be a materialized view. And this time I will again use @dlt.table. Now, why @dlt.table instead of @dlt.view, when this is a view? I know it is a view — but materialized views actually hold the data as well, so they are declared with @dlt.table.

Simple, sorted. So I will define the function and call it gold_customers, and this time — you know what you need to use — yes, the live keyword again: df = spark.read.table("live.silver_customers"). Do not use readStream here: if you use readStream this becomes a streaming table, and if you use a plain read it becomes a materialized view. When writing the syntax, that is the only difference between a materialized view and a streaming table — so just be very careful. (And by the way, we cannot use a view or a materialized view as the source for a streaming read — I will cover all those scenarios, do not worry; let me finish this code first, and then I will make so many tweaks and scenarios that you will understand everything in the world of Delta Live Tables.) Now let me add an aggregation. What columns do we have? I think we have id and salary — hmm, but the ids are unique numbers, so fine, the result will be the same; I just want to show an aggregation: df = df.groupBy("id").agg(max("flag").alias("max_flag")). Then simply return df. Perfect — obviously I cannot run this, but our pipeline code is ready.
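Putting the three cells together, the notebook developed above looks roughly like this. This is a sketch, not the exact on-screen code: it assumes the dlt module and the implicit spark session that exist only inside a Databricks DLT pipeline, plus the dlt_catalog.raw.raw_customers source table, so it is a pipeline definition rather than a standalone script:

```python
import dlt
from pyspark.sql.functions import lit, max as max_

# Bronze: a streaming table reading incrementally from the raw source table.
@dlt.table(name="bronze_customers")
def bronze_customers():
    return spark.readStream.table("dlt_catalog.raw.raw_customers")

# Silver: a plain view on top of bronze, referenced via the `live` keyword.
@dlt.view(name="silver_customers")
def silver_customers():
    df = spark.read.table("live.bronze_customers")
    return df.withColumn("flag", lit("new"))  # small illustrative transform

# Gold: a materialized view -- declared with @dlt.table, read WITHOUT readStream.
@dlt.table(name="gold_customers")
def gold_customers():
    df = spark.read.table("live.silver_customers")
    return df.groupBy("id").agg(max_("flag").alias("max_flag"))
```

Note the asymmetry: the object type is decided by the decorator (@dlt.table vs @dlt.view) together with how the source is read (readStream vs read), not by any explicit "streaming table" keyword.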

Yes, it is ready — oh, nice, nice, nice. So now, in order to run this pipeline, you simply click on the three bars, then go to Delta Live Tables, and click Create pipeline. Very good. Now you need to name the pipeline — this is our first pipeline, so I will call it first_dlt, so we know which is our first and which is our second. For product edition we pick Advanced, and for pipeline mode, Triggered — we do not need it running continuously. Just do it exactly as I'm doing, otherwise you will be facing errors and saying "hey, I'm facing this error, I'm facing that error" — no bro, this part is really critical. Then for the source code path, we do not need to type anything: just click the folder button and it will pick everything up.

I'll click on the DLT tutorial folder and pick this notebook, 2_DLT. (I know this is the first pipeline, but I named the notebook 2 because it sits in the second position.) Then for the storage option we pick Unity Catalog, because that is the future and we do not want to work with legacy code. For catalog, obviously, we have dlt_catalog. And the target schema? We do not have one, so I will create a new schema and call it schema_2.

Why? Because we will be creating a new schema every time, since I want to include so many different scenarios. Then compute type — this is important. Cluster policy: none. Enhanced autoscaling: no, we will pick fixed size, with only one worker node. And do not turn on Photon acceleration here — though, by the way, Photon is worth knowing about: it is a C++-based engine Databricks added to the runtime that makes your code run faster, so I just wanted to highlight it. Then go to the worker type — this part is really, really important. Why, bro?

Here is the thing: if you are using a free account, or a normal personal account, you have a limited quota on the number of CPU cores — normally we get 10 cores — and each node usually has a minimum of four cores. So this pipeline will take eight cores in total. How? One worker node plus one driver node, four cores each, so eight of your cores will be busy. Now here's the catch: after running this DLT pipeline, if you want to use your all-purpose cluster (or any other cluster), you first have to turn this job cluster off. Have to, have to, have to. How? I will show you, it is very easy — you do not need to kill the cluster, just turn it off. Why? Because that all-purpose cluster also has one machine with four cores, so together that would be 12 cores running. Do you have 12 cores? No, you just have 10. This is the drawback of accounts created earlier — I don't know whether they have increased the quota for new accounts, but I have this limitation of 10 cores. I can increase it by raising a request, and I will do that soon; but if you also have just 10 cores, this is the workaround: turn the job cluster off first, then switch back. It is fine — we are learning, so do not worry. And for the worker type, you have to select the small one — because if you pick the eight-core machine, your DLT pipeline will not run: eight cores for the worker plus eight for the driver is 16, and you do not have 16 cores, bro. So pick the four-core v2 machine, and the same v2 for the driver; only then will this work.
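The core arithmetic above is worth writing down, since it trips people up. The 10-core quota and 4-cores-per-node figures are the ones assumed in this walkthrough; your account's quota may differ:

```python
# Core-quota arithmetic assumed in this walkthrough (not universal figures):
quota = 10          # cores allowed on the account
cores_per_node = 4  # smallest v2 worker/driver size used here

# DLT job cluster: 1 driver node + 1 worker node
job_cluster = 2 * cores_per_node                 # 8 cores
# Adding a single-node all-purpose cluster on top of the running pipeline
with_all_purpose = job_cluster + cores_per_node  # 12 cores

print(job_cluster <= quota)        # True  -> the pipeline alone fits
print(with_all_purpose <= quota)   # False -> both at once exceed the quota
```

Hence the workaround: run one at a time, turning the job cluster off before attaching the all-purpose cluster.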

Simply click Create — everything is fine — and it will automatically create a pipeline for you. Now we need to pick which mode to use: development or production. I will pick development. Why? Because in production mode it turns your cluster off as soon as the flow completes, but in development mode the cluster keeps running so you can keep making changes — and I do want to make changes. What changes? I will tell you after running the pipeline, so I can clear all your doubts; there are so many scenarios, don't worry, it is just the beginning. So I will keep it in development mode and turn it off myself when I need to show you something. Okay, sorted. I'll simply click Start. Initially it takes at least five to six minutes to create the cluster — job clusters take time — so you have to wait, and I will wait too, and once it is done I will show you what we get. Meanwhile, I will have my shake.

We'll be back soon. ... So, five minutes are over — let me check my screen. You can see it is waiting for resources; it actually took a while to provision them, and I'm sure they will be ready in a few more seconds. Once the resources are ready, I will show you how it creates the pipeline, and then you need to take out your notebook, because I will be covering many questions that can be asked in your interviews. And it is not just about interviews — it is about your understanding, because when you actually create this pipeline for the first time, your mind will be full of questions. Trust me bro, when I was learning this I was like "hey, how did that happen? What happens if we do this? What if we do that?" I know, I feel you — you will have all those questions, and I'm going to cover all of them. And as you can see, my resources are ready — it is initializing, so now you should expect an automated flow.

I will not draw any pipeline myself — it will create the pipeline graph for me. By the way, what is the biggest advantage of DLT over plain streaming, or over workflows? We are creating streaming tables, but did you notice — did I provide a checkpoint location anywhere? A schema location? Anything? No. I do not even need to manage my checkpoint location; DLT will manage everything for me. Everything. Now it says "setting up tables", so it will create the flow — bro, we have the flow! Now it is running the first one, and if we see any errors, do not feel sad, because obviously we wrote our code without executing anything. We will fix it; do not feel sad.

By the way — we don't have any errors! Wow, man. That first one ran successfully, now the next is running. "Hey Ansh, how are you seeing all these things?" Bro, it is right in front of me — see, now it is running this one. Please run successfully... actually, I don't care, just give me the errors if there are any. And by the way — my pipeline ran successfully! You can see it first ran bronze_customers, then silver_customers, then gold_customers. And if I zoom in a little, you will see the icons as well: the streaming table shows a little stream icon, the view shows a goggles icon, and the materialized view shows a thunderbolt icon. Just remember these symbols. So this is my pipeline — wow!

Now you will say "hey, we want to see the output!" Wait, wait — I will show you everything, but before that I want to show you some amazing stuff. Obviously, to browse my tables I would need to terminate this job cluster first, and that round trip takes two or three minutes, so I want to make every change first, then turn it off, then go look at the tables and show you everything. As you can see, my cluster is still on — why? Because it is running in development mode. If I go back and click Show events, it shows me all the pipeline events, obviously. Then if I go back to my notebook — Recents, then this DLT notebook — now I can attach a cluster. Really? I'm not talking about the all-purpose cluster — I'm talking about the job cluster: click the compute selector and you will see "Delta Live Tables" in green. Wow — that is the pipeline's compute, and you can use it: simply click Connect. This connects my job cluster to this notebook, and now I can run these cells one by one and actually check the code by clicking the Validate button. Okay, sorted.

First of all — let's say this is the flow, and now you will observe the power of DLT. Let's say you created this gold_customers table, and now you say "hey, I do not like the name of this table, can you please rename it?" I would say: what?! Because outside of DLT pipelines this is such a mess — renaming a table is possible, but it takes a lot of steps: ALTER TABLE this, ALTER TABLE that. But do you know what you need to do in DLT? Can you even imagine? Let me show you — bro, I'm serious, it is so easy, and after this you are going to love DLT. So: you asked me to change the table name — in the real world such a hectic task — and let's say you told me to rename it to something like aggregated customers. What do I do? I simply go to the decorator and change the name to gold_agg_customers. And that's it.
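In other words, in this declarative world the rename is nothing more than editing the declaration. A sketch of the one-line change (assuming, as before, the dlt module and the live.silver_customers source that exist only inside the pipeline):

```python
import dlt
from pyspark.sql.functions import max as max_

# The "rename": just change the declared name -- no ALTER TABLE needed.
@dlt.table(name="gold_agg_customers")   # was: name="gold_customers"
def gold_customers():
    df = spark.read.table("live.silver_customers")
    return df.groupBy("id").agg(max_("flag").alias("max_flag"))
```

On the next pipeline update, DLT reconciles the declared objects with the catalog, so the table simply appears under its new name.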

"You're lying — it cannot be so easy! It cannot!" Bro, it is easy — let me click Validate and show you. It will not take much time, don't worry, because our cluster is already on, so it skips the second step, waiting for resources — see, done. Now it initializes everything, and then you will see the magic. This validation has not run the pipeline; it is just saying "bro, you do not have any errors." In that case, let's simply start it, because my data fam wants to see the data — not in a new table, just the same table with a different name, right? "We didn't know it could be so easy!" Yes, it is that easy. So it is initializing, and then obviously it will set up the tables. That is why it is so important to run your DLT pipeline in development mode while you are developing: you can make changes in real time and you do not have to wait six or seven minutes every time. I know it's a hectic task, but do not worry.

So see — now what is the table name? Just tell me: what is the table name? It is called gold_agg_customers. What?! It just renamed it? Yes, it just renamed it — that's why this is called a declarative ETL framework: you just need to declare it. "Ansh Lamba, we want proof!" Okay sir, ma'am, I will give you the proof: I'll go to dlt_catalog, open schema_2 — the schema for this DLT pipeline — click the dropdown, and boom: you have bronze_customers, and you have gold_agg_customers. It has just renamed it, bro. (By the way, you cannot see the view here, because a view is not stored data — it is just a stored query.)

"This is insane — what else can we do?" I will show you. Let's say you ask me "hey, I do not want to perform any aggregation, I do not want any transformation — just remove that column, I do not want it." What do I do? I simply comment that line out. That's it. You do not need statements like DROP COLUMN, you do not need to apply any ALTER command. "No, really? I doubt it — you cannot do this, this is a table!" I can simply click Validate and show you — and by the way, this is just the beginning, bro, let's see what else we have. See, it is validated, so I'll simply click Start now and show you.

"Ansh, this is unreal." I know, bro — but it is not unreal, it is now real, and that is why DLT is the future. By the way, DLT is built on top of Apache Spark, so you can think of it as a new player in the market. And see — now it is running my flow: this one, then this one, and now gold_agg_customers. Run, run, run... perfect. That was amazing. I know, I know. So now let's talk about some insider details.

Really? Really. What do I mean? Obviously this isn't magic — the data is being written somewhere, we know that. So, Ansh, we need to learn what exactly is happening behind the scenes. Let me tell you. Keep everything as it is and go to the Catalog Explorer. This is my Catalog Explorer: I click on dlt_catalog, then on schema_2, and you will see the two tables we know about — gold_agg_customers and bronze_customers. That makes sense. Now, what I will do is click on Details, and there I get something very useful: the schema ID. When I go into that schema's storage, I will actually see these two tables; likewise, if you click on bronze_customers and then Details, you get the table ID. Just remember those — let me go to the storage account so you can see what I'm talking about. I'll go to the resource group RG-delta-live-tables, open the data lake we are using, go to Containers, and open the metastore container, because Unity Catalog uses this container to store everything. Let's go inside — let's raid it.

So this is your catalog, and these are your tables — wow. Let me show you something special. These are the real tables we are referring to right now. And do you know what DLT actually did behind the scenes when I renamed that table? What do you think? It actually created a new table for me. What?! Yes — it created a brand-new table and called it gold_agg_customers. So what did it do with the previous table? It basically tombstoned it — marked it as unused. "Really? How do you know this? Did Databricks call you and tell you?" Oh, come on, man. Okay, let me show you how I know. These are your normal tables, and I want to check, let's say,

the schema ID — this is my schema ID — and if I go to bronze_customers, this is my table ID. Now I will go into the Databricks internals — that is where the reality is. Click into the schema folder, and inside the schema we have the tables. But wait — why do we have three tables here? Yes, that is exactly what I was talking about. This is your bronze table; this is the gold_customers table you created earlier; and when you renamed it, it created a new table for you, baby. It did not literally rename the old one — it created a new table and then mapped that new table to the renamed object in the schema that we see normally. Wow. And if you click on that new table and then on Details — by the way, I know I'm explaining this in depth, but I want you to learn everything; this is the real knowledge you should get, and now you're getting it — you will see its table ID.

And you will find that exact table ID as a folder name here. So that one is your gold_customers table — the actual table you created earlier — and this one is the new table: see its details, its ID, and the timestamp as well — 3:21, the latest one, so this is the new table. If you click into it and open its _delta_log, you should see everything in the latest log file. If I expand it, open the JSON file, and click edit — yes, as you can see, when I check the changes, this first file contains that particular change. Here you can see everything that was written when we created — or rather, renamed — this table; these are all the transactions recorded in it. So now you know where it is actually making the changes. It is doing all the stuff you would otherwise do yourself with your own Delta Lake code, but now it says "hey, you do not need to do anything — just tell me, and I will do it for you." You are not actually seeing those steps, but they are exactly the same steps you would perform if you were in its place. Now you've got it — very good. This is just the beginning, bro.
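Conceptually, what we just saw in the metastore container can be modeled as a tiny name-to-ID mapping: the physical folders are keyed by table ID, and a "rename" registers a brand-new ID under the new name while the old folder is left in place, tombstoned. This is only a toy model of the behavior observed above, not Databricks' actual implementation:

```python
import uuid

# Toy model of the observed behavior: a schema maps logical table names
# to physical folder IDs; renaming creates a NEW folder and remaps the name.
schema = {}   # logical name -> table id
folders = {}  # table id -> {"data": ..., "tombstoned": bool}

def create_table(name, data):
    tid = str(uuid.uuid4())
    folders[tid] = {"data": data, "tombstoned": False}
    schema[name] = tid
    return tid

def rename_table(old_name, new_name):
    old_id = schema.pop(old_name)
    folders[old_id]["tombstoned"] = True  # old folder kept, marked unused
    return create_table(new_name, folders[old_id]["data"])  # fresh folder + id

old_id = create_table("gold_customers", data=[("id1", "new")])
new_id = rename_table("gold_customers", "gold_agg_customers")

print(len(folders))                     # 2 -- the old folder still exists
print(folders[old_id]["tombstoned"])    # True
print("gold_customers" in schema)       # False
```

That is why three folders showed up in the container after a single rename: the catalog view only ever shows the logical names, while the storage layer keeps every physical table directory.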

You have already fallen in love with DLT. "So move on, move on." No bro, don't move on — move-ons are hard, really hard. Now focus on this next thing. I will ask you one simple question, and take notes on these questions, because now it's time to actually discuss them — I wanted to show you the real stuff before discussing anything. So I will go to Recents and open my notebook — where's my notebook? Ah, here it is.

I hope my cluster is still alive — yes, thank you. Now I want to make one change. What change? I want to reverse the first two steps: instead of a streaming table, I want to create a view first — actually, let's make it a materialized view — directly on top of my raw data, and then create a streaming table on top of that materialized view. What will happen? Let me show you. "Ansh Lamba, why do you want to do this?" So that you can learn, bro. So instead of creating a streaming table first, I will declare a materialized view reading my raw_customers source. And obviously this time we need to rename it, because this is a new table, so I will call it raw_customers_mat — this will now be my materialized view. And instead of the view, I want the second object to be a streaming table. Can I do that?

"Just do it!" Okay. So for the second object I will use @dlt.table, because this is a streaming table — makes sense. "And by the way, Ansh Lamba, you need to change the name here too." Oh yes, baby — the first function gets the _mat name, and the second one is the stream; everything else stays the same. Do not get confused, bro, this is very simple: I'm just swapping the object types — a materialized view first, then a streaming table on top of that materialized view. The rest of the stuff is the same, and the gold materialized view is now built on top of the streaming table. It's fine, is it fine? Fine. And then, obviously, the streaming table needs readStream this time. So now I will click Validate, and you will see something.

Actually, I prepared all these combinations of questions in advance, before recording anything, so that you can learn from them. Wait — the first one is a materialized view; how can it be a materialized view when I just wrote a streaming read? Oh, I had forgotten to fix the stream part — that's why I was confused about how it could run; bro, just throw us some errors. Okay, fixed, and now it is validating. Yeah, as I was saying, I made these combinations out of curiosity — can we do this, can we do that — and now I want to show you that curiosity. And as you can see: there are no errors. Hmm.

Interesting. So let me click Start. Obviously it will run fine this time — "but Ansh, you said we would see errors!" I will show you some errors, wait, wait — this is just the first iteration, and first iterations are easy going. This run will work: it will create a materialized view, then a streaming table, and then a materialized view on top of the streaming table. Makes sense. So it is setting up my tables — quick, baby. The first step is running now, the materialized view, and it is bringing in all the records — seven, I think. Makes sense. And the streaming table will read seven records — by the way, where is that written? Here, see: seven records here, and the gold materialized view reads seven records, obviously, because that is all it has. Perfect, everything ran fine. "Okay, Ansh Lamba, just rerun this pipeline one more time." Okay — now you will see something.

something I didn't make any change by the way I didn't make any change I just rerun my Pipeline and I want to just show you something I didn't make a change I'm

just repeating that I didn't make any change it is just initializing it okay then then

then setting up tables nice nice nice so now it is running Matt view it will run fine it will just bring seven records again then it will failed this

step streaming table show us some yeah wooo astrologer an I'm not an astrologer bro so the

thing is why did it happened and a wait wait wait hold on hold on hold on it ran fine in the first run it failed in the

second run if there would be some error why it didn't catch in the first run okay okay okay bro bro bro bro bro hold on hold on hold on this is the condition that's why I told you just take out your

notebook and start writing, because Ansh Lamba is about to say something important. Whenever you use a streaming table, you have to keep one thing in mind: your source must be an append-only source. I'm repeating: if you want to use a streaming table, your source should be an append-only source. A materialized view is not an append-only source — you cannot treat it as append-only, because every run it recomputes and returns all the records; that is the property of a view, it simply queries everything in the data. So it is not an append-only source.

What is the best fix here? There are two ways: either you make your first step a streaming table as well, or you create a streaming view. Yes, you can create a streaming view — it is still defined by a query, but it does not re-read all of the data every time; it reads incrementally. So you can use a streaming view as your first step, as the source for your streaming table. Sorted. So I will just make the fix here, and then you will see that it works. What I will do: I'll change the bronze table, and I

will simply turn it into a streaming view — so I'll rename it with a stream_view suffix. In the silver table I'll keep everything the same, but since that table was reading from the old view and I have changed the source, I also need to rename it: this time it is a stream table, something like silver_customers_stream_table, and the source it reads is the streaming view. Then in the gold layer we just change the name as well — say gold_customers_new — and its source will be the new stream table. Perfect.

Let me first validate it. "Ansh Lamba, now you have to run this pipeline two times, because we need to confirm it." Okay, no worries, I will run it twice. It validated, so let's click on Start. I know you are loving DLT a lot — and you should. By the way, you need to play a lot with DLT; only then will you understand it.
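The fix just described — a streaming view feeding a streaming table — can be sketched as a small DLT notebook fragment. This is a hedged sketch, not the exact notebook from the video: the names (raw_customers_stream_view, silver_customers_stream_table, and the dlt_catalog.raw.raw_customers source path) are assumptions based on this demo, and the code only runs inside a Delta Live Tables pipeline, where spark and the dlt module are provided.

```python
import dlt

# Streaming VIEW: defined over spark.readStream, so downstream steps
# consume it incrementally -- it behaves as an append-only source.
@dlt.view(name="raw_customers_stream_view")
def raw_customers_stream_view():
    return spark.readStream.table("dlt_catalog.raw.raw_customers")

# Streaming TABLE on top of the streaming view: valid, because the
# view is read incrementally instead of being fully recomputed.
@dlt.table(name="silver_customers_stream_table")
def silver_customers_stream_table():
    return spark.readStream.table("live.raw_customers_stream_view")
```

A materialized view in the same spot would fail on the second run, because it recomputes all of its rows each time and is therefore not an append-only source.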

And I have specifically put in a lot of effort to collect all the curious questions I had at the time, and I'm trying to explain and cover all of them so that you can learn everything — because in interviews you can expect anything, and only with deep knowledge can you excel.

Perfect. So now it is running... hey, why is it creating a materialized view? Oh man, I didn't change the definition — my bad. I just need to type view here and add readStream. Let me make that change, don't worry — and then I need to change the names again: stream_view_new here, and new on the others as well, stream_table_new. Okay, let me validate it. That's the best thing about this development cluster: if we have any error, it shows up right here in this DAG — and we can see we do not have any errors. Woohoo, simply hit Start.

Okay, so this time we have a streaming view — yes, because we have a view defined with readStream, so it is a streaming view now instead of just a plain view — and then it creates the streaming table. Don't worry, I will run this pipeline two times. Let me just check what other questions I need to cover before jumping to the next topic... okay, all the questions are covered. After this we will discuss something special, and it is called append flow.

Append flow, yes. So this time you can see this is a streaming view instead of just a normal view, and then it creates a streaming table. Obviously the first run works fine — what we actually want to see is the result when we run it a second time. Okay, it ran successfully, so now I'll click on Start one more time. It is initializing, and it should run fine without errors — and if we do get errors, maybe we made a mistake in the table definitions and we'll just tackle it.

Yeah — see! Now you should observe something. Let me give you three seconds... no, just take your time. What is the thing I want you to observe? This time it didn't read any data. Why? Because this is a streaming view and we do not have any new data in the source, so it brought zero records. "Ansh Lamba, we got this — but why do we still see seven records here?" The answer is very simple: this is your streaming table, so it is actually storing the data somewhere, and it is the property of a view — a normal view and a materialized view both — that it always queries all the data sitting in the table. We know this table has seven records, because the initial run processed seven records; so the view brings those seven records again and again writes seven records into the gold table.

So, did you get the concept? If yes, I'm really proud of you. Now it's time to cover the second concept, called append flow, and before covering that particular

concept, I need to prepare the data — prepare the source for it. Let me quickly do that. We go to the Delta Live Tables page and click on our pipeline — by the way, I can kill this run, because it is done and we do not need to save it — and I will click on Delete, because I will create a new pipeline. So: delete pipeline 2_DLT... and it is deleted. Now turn your compute back on: go to Compute and start it, and it will spin up the cluster. Once the cluster is on we will prepare our source, and then I'll show you append flow — actually, in this upcoming session I need to show you quite a lot of stuff.

Our cluster is on, so it's time to jump into the next session, and this one is really interesting. For that I need to create the source — one table is already created, I just need to create a second one. Whoa, we are working with two tables! Yeah. So what I will do: I'll go to Workspace, into the DLT tutorial folder, and create a new notebook, 3_DLT. Perfect. I attach the cluster to it — and hey, what is this console? I don't want your console, go away. But even before preparing this notebook, I want to prepare my source, so I will go to

Recents and open 1_source. Now I'll prepare the next table. As you know, I already have the raw_customers table; let's say my source has one more table, raw_customers_new — say new customers land in a new table — and I want to apply a union between both. So I'll copy the create-table cell, paste it, name the table with the new suffix, and create a new folder for it as well, raw_customers_new. I'll run this — oh man, wait wait wait, otherwise it will throw errors, because I didn't change the name here... oh, I did change it. Come on, just run.

"Cannot create table: the location is not empty." Oh man, I think it ran the previous cell, so let me drop the table first — drop table, this one. By the way, if the table did get created, I have no issue with that; I can just insert some data. So I'll copy the insert statement here, put the new suffix on it, and this time, just to keep the IDs unique, I'll use values with a zero added after every digit — 10, 20, 30, 40, 50, 60, 70 — because I want these IDs to stay unique. This will add my data to the new table.

What's wrong now? "Cannot be found" — oh, I think it created something but messed it up, so no need to worry: I can go here and drop the data if it exists — yes, it does — so I'll drop the table and the metadata as well: drop table, run it. "Cannot be found" again? Oh, I got it — it created the folder but didn't write the table name. Wow. Perfect, now it can create the table. That's why you should not interrupt a running cell! Okay, the table is created; now I insert the data — not this cell, this one. So now this table has 10, 20, 30, 40, 50, 60, 70. That's all I wanted to do on the source side.
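The source prep above can be sketched in a notebook cell. This is a sketch only: the catalog/schema path and the two-column (customer_id, name) schema are assumptions for illustration, not the exact table definition from the video — adjust both to your actual source.

```python
# Assumed names: dlt_catalog.raw.raw_customers_new with a simple
# (customer_id, name) schema -- swap in your real catalog and columns.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dlt_catalog.raw.raw_customers_new
    (customer_id INT, name STRING)
""")

# IDs 10, 20, ..., 70 -- a zero after every digit -- so the new rows
# never collide with the original IDs 1-7.
spark.sql("""
    INSERT INTO dlt_catalog.raw.raw_customers_new VALUES
    (10, 'Customer_10'), (20, 'Customer_20'), (30, 'Customer_30'),
    (40, 'Customer_40'), (50, 'Customer_50'), (60, 'Customer_60'),
    (70, 'Customer_70')
""")
```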

Okay, perfect. Let me go back via the sidebar, open Compute, and turn the cluster off — why? The same cores-quota issue again — so I'll click Terminate and confirm. Now let's create a new DLT pipeline; we all know how to do that by now. Is it terminated? Yes. Go to Workspace — we should have a third notebook, because I just created it; here it is. And bro, we know the trial is ending in 14 days, so let's move. I'll add a markdown cell: append flow. What's that? Hold on, I'll tell you everything about append flow. So now, what I want to

do: let me draw the plan again, it will be good for you. These are my two sources: this is customers, and this is customers_new — my two tables. I want to create a streaming table on top of this table, and another streaming table on top of that table — these are my bronze layer. Then I want one materialized view — this is my silver. What is this silver table doing? It applies a union; union means it appends the rows of the two tables together, so this bronze connects here and that bronze connects there. My silver is ready. Then at the end I'll create a streaming table for my gold. Makes sense? So what is so special about this — "we can already do that"? I know you can do that, but now I want to apply some optimization techniques in Delta Live Tables. "And we would love to know them." I know — let's do it.

So first of all we create the bronze layer — the streaming customers tables. You know how to create these, so do it on your own: @dlt.table — this time let's pass the name through the decorator, name="bronze_customers", so we explore that syntax as well. Then I pick a function name — this is my function — then df = spark.readStream.table(...), and you know which table: dlt_catalog.raw.raw_customers. Then just return df, because we do not want to apply any transformations. That is table number one.

Now table number two, the bronze streaming table for the new customers: @dlt.table(name="bronze_customers_new") — not silver, sorry, bronze_customers_new. I copy the same code, paste it, and this time the source table gets the new suffix. Sorted.

Now the transformation. I'll add a markdown cell: silver — union materialized view. This time I need @dlt.table without streaming — a function returning a df — and I'll call the table silver_customers_union. And what is the df this time? I need to read two dataframes. So df1 = spark.read.table(...) — instead of readStream we use spark.read.table, because — because, because — we are creating a materialized view, not a streaming view. And we read live.bronze_customers. Then df2 = spark.read.table("live.bronze_customers_new"). Now, applying a union is very simple: df = df1.union(df2), and return df. Simple — this is my silver table. Perfect.

Finally, we want a streaming table on top of the materialized view for gold — quick question: can we do that? The answer is no. Very good, I wanted to confirm that: we cannot create a streaming table on top of a materialized view, as we just discussed in the previous section. So what can we create? A materialized view, obviously. So: a markdown cell, gold, and a materialized view — just copy the silver code, drop the union, and read spark.read.table("live.silver_customers_union"). Perfect. So this is fine — this was easy. "What is different? We did all of this in the past section." Oh, really? Let me show you what is different. First of all, I will create my DLT pipeline.
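The three-layer notebook just assembled can be summarized in one sketch. Again a hedged sketch, not the exact notebook: the table, function, and catalog names are assumptions based on this demo, and it only runs inside a Delta Live Tables pipeline where spark and dlt are provided.

```python
import dlt

# Bronze: one streaming table per source, read incrementally.
@dlt.table(name="bronze_customers")
def bronze_cust():
    return spark.readStream.table("dlt_catalog.raw.raw_customers")

@dlt.table(name="bronze_customers_new")
def bronze_cust_new():
    return spark.readStream.table("dlt_catalog.raw.raw_customers_new")

# Silver: a materialized view, so batch reads (spark.read), then a union.
@dlt.table(name="silver_customers_union")
def silver_customers_union():
    df1 = spark.read.table("live.bronze_customers")
    df2 = spark.read.table("live.bronze_customers_new")
    return df1.union(df2)

# Gold: also a materialized view -- a streaming table here would be
# invalid, because the silver materialized view is not append-only.
@dlt.table(name="gold_customers")
def gold_customers():
    return spark.read.table("live.silver_customers_union")
```

The catch the video demonstrates next: because silver is a materialized view, every run recomputes the full union, even when no new data arrived.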

Go to Delta Live Tables and click Create pipeline. This will be my new pipeline: I'll name it 3_DLT_pipeline, set the notebook path to 3_DLT, and use Unity Catalog — the same catalog, and for the target schema let's create a new one, schema_3, because I don't want any mess. Then, again: no autoscaling, a fixed size of one worker, and pick the small v2 worker type. By the way, if you just leave the worker type empty it will not throw any error here, but it will automatically pick the bigger v4 version and fill up your cores quota, and then you'll say "hey, I cannot do this, I cannot do that" — so just be careful. Click Create.

Same as before, it takes some time to create the cluster; just click Start. Once it is up I'll make some tweaks, and then I'll show you append flow — because we haven't actually seen append flow yet.

Okay, so we have a red mark — an issue. What's that? Let me click on it: it points at the df = spark.read.table("live. ... line. Hmm, something is wrong with the notebook; let me check. The good thing is the cluster has started, so we can easily debug. I'll go to Recents, open the 3_DLT notebook, and attach it — no, no: Connect. Okay, where is the issue? I think it's here — oh man, come on: you cannot use the same name again; the gold function has to be named gold, and the gold table should just be customers. My bad, sorry. Let me validate again... oh, and it will throw one more error — I forgot to add the parentheses on the decorator. Don't worry, I'll fix it: click Stop, remove this, add the parentheses... perfect. Now let's see — yeah, now it is done. Oh nice, a flow made by Ansh Lamba. Okay, so what's the issue now, Ansh Lamba? Everything is

fine? Everything is fine — okay, nothing is fine. Let me click Start. Don't worry, it will not give us any errors this time, so don't expect any; but I want to show you what I actually mean and what append flow is in Delta Live Tables. By the way, all of this is written in the documentation, but it is difficult to understand everything from the text alone — which is why I explain everything with examples and a practical lab session. If you search for DLT pipelines and then search for "append flow", you'll find the documentation page for append flow — and next to it, "expectations", which is for data-quality handling; don't worry, we'll cover that too.

So here is the view, and you can see I expect seven records coming from each side — this session is purely about the number of records being processed. See: seven records processed, seven records processed, and the view processes fourteen records. Makes sense. Now the materialized view reads those fourteen records. Perfect. But if I just rerun this pipeline — let me click Start again — what will happen? Tell me. You'll say: those streaming tables will not re-process the data, because they are streaming tables — they only bring new data. Okay, so what's the big deal then? I'll show you, and it is exactly the issue we are trying to fix — and why we have a concept called append flow. This streaming table should read no records — perfect — and this one should read no records — zero, zero. But — but but but — this view will read the fourteen records all over again. "Oh, now I see what you're talking about, Ansh: those fourteen records are again added to the materialized view." That I do not want: I have not added any data, so why is it re-adding the same data again and again? Valid point. So what is the fix for

this — what is the exact, real fix? We have something called append flow. With append flow, instead of a materialized view we create a streaming table, and we append the data from this source and from that source into it. That way it reads the data incrementally, because it is a streaming table. Let me show you how.

So I'll edit this union step — markdown: append flow streaming table. Oops, it started running because I hit Shift+Enter; we're not running it right now, so just ignore that. First, let me remove everything here, because to build an append-flow streaming table, the first thing we create is an empty streaming table. We say dlt.create_streaming_table — and by the way, for the import: what was the module name again? import dlt, that's it — we don't write from anything, we just write import dlt. So: import dlt, then dlt.create_streaming_table with the name of the streaming table; I'll call it silver_append_flow_table. This is my table — an empty streaming table, nothing in it yet.

Then, in a new cell, we use something called @dlt.append_flow. With it we say: I want an append flow whose target is that table — simply copy

the table name — so that is my target. Now, what do I want to append into it? A dataframe. The function name here can be anything, because whatever it returns gets appended into the target table, so I'll just call it bronze_cust. Then df = spark.readStream.table(...) — readStream, because this is a streaming table (and by the way, even if you don't want a streaming source here, that is fine). And what is our source? We all know: live.bronze_customers — this bronze table here. So I am taking this source and appending its data into that target.

Simple — just write it. What this does: it returns the df into the target, because we have added a decorator on top of this function. These are Python decorators, by the way. What are decorators? I'm not teaching Python here, just giving you a hint: decorators wrap your functions — they add behavior on top of a function without changing its code. We obviously don't know what Databricks has written inside this decorator, but it takes our function as input, returns something else, and does all the wiring in the background. Nice logic — I love that. Okay, that is my first flow; now I'll do the same with the other source, the new customers table, as well.
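The decorator mechanics described here can be illustrated in plain, runnable Python. This is not the real dlt implementation — just a sketch of how a parameterized decorator like @dlt.append_flow can register a function against a target table without changing the function's code; the names (flows, append_flow, the table and row strings) are all hypothetical.

```python
# Registry of flow functions per target table -- stands in for DLT's
# internal bookkeeping.
flows = {}

def append_flow(target):
    """Decorator factory: takes arguments (like target=...), returns
    the actual decorator that registers the wrapped function."""
    def decorator(func):
        # Register the flow against its target without altering func.
        flows.setdefault(target, []).append(func)
        return func
    return decorator

@append_flow(target="silver_append_flow_table")
def bronze_cust():
    return ["rows from bronze_customers"]

@append_flow(target="silver_append_flow_table")
def bronze_cust_new():
    return ["rows from bronze_customers_new"]

# The "framework" can now run every registered flow for the target:
result = [row for f in flows["silver_append_flow_table"] for row in f()]
print(result)  # ['rows from bronze_customers', 'rows from bronze_customers_new']
```

This is exactly why the function body just returns a dataframe in DLT: the decorator, not the function, decides where the result goes.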

And the target will be the same, because we want the union — I'll just suffix the second function with new. Perfect. What will this do? It will populate that one streaming table with the union of both source tables. So this becomes the table name I use for gold: live.silver_append_flow_table. And obviously I need to change the gold table's name too — I'll add an append_flow suffix so we can distinguish it. That is my append flow setup. Now let's validate it.
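The append-flow pieces just assembled can be summarized in one sketch — assuming this demo's names, and runnable only inside a Delta Live Tables pipeline where spark and dlt exist.

```python
import dlt

# 1) An empty streaming table that will act as the union target.
dlt.create_streaming_table(name="silver_append_flow_table")

# 2) One append flow per source. Each reads its bronze table
#    incrementally and appends into the shared target.
#    Note: flow function names on the same target must be unique.
@dlt.append_flow(target="silver_append_flow_table")
def bronze_cust():
    return spark.readStream.table("live.bronze_customers")

@dlt.append_flow(target="silver_append_flow_table")
def bronze_cust_new():
    return spark.readStream.table("live.bronze_customers_new")
```

Compared with the materialized-view union, the target here is a streaming table, so each rerun appends only new rows instead of recomputing the full union.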

You will not see that issue this time. Oh — one error, what is that? Event log... let me read it: dlt.append_flow... oh, I got it — just remove this and validate, because I think we didn't add the @... no, wait, we did add it. Hmm — and why didn't it pick up the data from bronze_customers? Why, man? It should pick the data from both tables. What's wrong? I can see bronze_customers, so why isn't it picking that one up? This is my append flow, the target equals this... no, the decorator is there, so the issue is something else. Let me debug it... ah, here: "cannot have multiple queries named bronze_cust — for silver_append_flow_table, additional queries on that table must be named" differently. I see, I got it! We didn't change the function name: df bronze_cust, and then again df bronze_cust. That is the issue — obviously we cannot have two flows with the same name, so the second one becomes bronze_cust_new. That's why, whenever you duplicate a cell, make sure you change the names. And that's how you learn — by debugging these small issues. Perfect, this is the view that I wanted to see. Now we can

even enhance this flow further, because the gold table should also be a streaming table, if you've got the fundamentals: even though my silver streaming table pulls incremental data, gold is still a view, so it will re-query the same stuff and re-add all the records every time. Let me quickly change that — I left it that way on purpose to test your knowledge. Now I'll validate again, and this eliminates the problem end to end.

Okay, this is nice. Now I'll run it — let me run it first. Don't worry, we have not added any data to the source, but the first run will still load the fourteen records, because the silver table is new: by the fundamentals, it writes all the data on the initial run. In the second run it should not pull any new data, and I will show you both runs to validate what I'm saying. Just hold on... one sip...

Perfect. Now you can see this table brought zero records — why? Because we have no new data. But here I still see fourteen records — why? Because this table is new: even though no new data is coming in, it still needs those initial records. And the gold table will have fourteen records as well. But — but but but — when I run this pipeline one more time, I should not see any new records anywhere. And that is the concept of exactly-once processing: our tables should process each record exactly once — not twice, not thrice, exactly once. This keyword is really important; keep these technical keywords in your mind. Now the testing run: we should not see any new data in any of the tables... perfect — and gold, no new records either. Perfect, man. This is the magic of append flow; our pipeline is ready to be deployed. That is the power of append flow.

Now, how do we work with parameters? Let's

say I have, obviously, two tables here. What if I want to split my end result into multiple streaming tables based on the value of some column — some parameter? I'll add one column, and based on the value of that parameter I want one streaming table per value. How can we do that? I'll show you.

To work with parameters, first let's go to the top of the notebook and add a markdown cell: parameters. (Hey — whenever you hit Shift+Enter it runs the whole notebook, so I had to stop it.) Now, to work with parameters we would normally create a new DLT pipeline, because we need to add some configuration — or we can just edit this pipeline; it's not a big issue. For now, imagine in your head that we have a parameter. I'll imagine I have one, and first I will import it into the notebook: my_var = spark.conf.get(...) — get, not set. I can use any parameter name, so I'll call it p_names, because I want to pull the

names and let me just share you the scenario as well so what I'm doing if you know we have names in our table right we have names such as John Jane or

let's say Bob or Allies allies yeah John Jane analyze these three names are confirmed so what I'll be doing I will pull the names in this parameter okay this parameter I will create in DT don't

worry but you can imagine for now that we have this parameter and this parameter has these values such

as p_names equals to John okay then

Jane then Alice okay Alice whatever you want to say so this is the value I will give to the parameter but you can imagine for now okay we imagined okay

nice nice good good good so what I will do once I have this value so this is in the form of string so first of all I will convert this these values in the in

the form of a list so I can how I can just do that so

my_var_list equals my_var dot split based on comma first of

all so when I will do this when I will write this so it will create a list for me based on the delimiter bro come on these are the fundamentals of python so

it will just create the list for me once I have the list I good because now I can just use this list so what I will do let me just show you what I will do I will
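The split step the video describes is plain Python, so it can be tried outside the pipeline too. A small sketch — the key name `p_names` and the three values are just the ones used in this walkthrough; inside a DLT pipeline the string would come from `spark.conf.get("p_names")` instead of being hard-coded:

```python
# Inside a DLT pipeline this string would come from the pipeline configuration,
# e.g. spark.conf.get("p_names"); here we hard-code it to show the split alone.
p_names = "John,Jane,Alice"

# split() on the comma delimiter turns the single string into a Python list
p_names_list = p_names.split(",")

print(p_names_list)  # ['John', 'Jane', 'Alice']
```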

simply go to my goal table okay and I will run a loop on the whole table I will simply say for I in

my_var_list perfect and obviously according to the indentation rule I have to use

tab perfect so now it will just run a loop perfect it will just run a loop so what I want to run like what I want to do when I want to run this Loop so obviously it will be running the loop on

top of every value so what I will say I will say name because I cannot use the static name I have to use dynamic name so I will use name here name

equals F string because I want to use variable in the name because I want to create a new table for every name we I'm taking three names for now so I want to

create three tables like I want n number of tables which are available in this list wow and I want unique name for that as well so what I will do I will simply

use f-string and this is like f then double quotes then I will simply say gold and then customers okay and then underscore and then I can

just simply use variable I perfect perfect done and this name will be nullified because we have given the name here perfect

then once I read this table I will make one filter as well so I will simply say DF equals DF do filter where name equals

to equals to I perfect so what this will do this will just return the DF data frame which have names equals to this

equal to this wow perfect let's do it man let's do it but before that it will not work because I have not put the parameter inside this so I will simply

go to Delta Live Tables and I will just try to edit it so go to the DLT pipeline and then

settings and then here in the uh uh uh where's configuration where is configuration where is configuration just find configuration with

me where is configuration man source code add source code okay then cluster tags notification oh here

it is ADD configuration so add configuration this is my parameter name and it was I think p_names and value equals to

Alice John Jane perfect save save save save save okay let's validate this let's let's let's validate

this click on start okay even if we have some errors I think maybe some naming issues so we can just correct it don't worry maybe the

parameter that I have used it was P name or P names I don't know I will see so I just run this obviously resource was active so we do not need to kill the

cluster and restart it okay we have an error very good let's see what's the error uh gold customers Alice was not added to the catalog because its name is

invalid huh oh I got it because it is using these double quotes as well oh then what we

need to do we need to first of all remove these let's say double quotes or I can use the enumerate

function wow the enumerate function yeah I can use the enumerate function as well this is like all Python I'm not discussing anything fancy in Delta Live Tables this is out of scope don't worry

this is pure python so what I can do I can simply run my pipeline where is the pipeline where is that notebook 3dt yeah here so what I will do I will just do a

simple stuff like I can just remove those double quotes that's not a big issue but that will take some time and I do not want to do that so

what I will do I will simply say i and basically not here here j comma i in enumerate

perfect so now I can use j and it should work fine so if you do not know about the enumerate function this will give me the position of that particular iterated value and I think this should work fine

now because this was the issue related to those double quotes let's see let's see if we have any issues we will just fix it not a big deal man that's how you learn how to

debug it that's how you learn that's how you learn right oh it worked fine lovely Ansh Lamba see I can see the table names 0 1 2 this is my zeroth value this is my

first value this is my second value so one way of doing it was removing the double quotes so if I just go here you can see this variable

will be having the list but in that list it will be also adding those double quotes but in order to remove that we have to just use like some replace function

and all so I didn't want to do that so I just simply used the enumerate function and the enumerate function just gave me the position of that iterated value 0 1 2 and it added it as

a suffix perfect man perfect so now you can see I can now click on start and this time this will create
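The quoting problem and the enumerate fix can be seen in miniature in plain Python. A sketch — the quoted string mimics how the configuration value looked in the video:

```python
# If the configured value keeps its double quotes, splitting yields items like
# '"John"', which are invalid inside a table identifier.
raw_value = '"John","Jane","Alice"'
names = raw_value.split(",")  # ['"John"', '"Jane"', '"Alice"']

# enumerate() sidesteps the problem: use the position (0, 1, 2, ...) as the
# table-name suffix instead of the raw quoted value.
table_names = [f"gold_customers_{j}" for j, name in enumerate(names)]
print(table_names)  # ['gold_customers_0', 'gold_customers_1', 'gold_customers_2']

# The alternative mentioned in the video: strip the quotes from each value.
cleaned = [n.strip('"') for n in names]  # ['John', 'Jane', 'Alice']
```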

Dynamic streaming tables dynamic dynamic streaming or Delta live tables

Delta Live Tables perfect perfect perfect this is awesome man I know dynamic DLT streaming tables did you like it did you like it this was my personal

experiment okay so just click on start and this will just start your DLT pipeline perfect perfect and this is done once it is completed this is done
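Putting the pieces together, the dynamic-table experiment looks roughly like the sketch below. This only runs inside a Databricks DLT pipeline (the `dlt` module and the `spark` session are provided by the runtime, so it is not runnable standalone); the configuration key `p_names` follows the video, while the source table name is a stand-in. The `name=name` default argument is added here deliberately: it pins each iteration's value, since a plain closure would make every table filter on the last name in the list.

```python
import dlt
from pyspark.sql.functions import col

# Read the key added under Pipeline settings -> Configuration, then split it.
p_names_list = spark.conf.get("p_names").split(",")

# One streaming table per value; enumerate() gives a clean numeric suffix
# so quoted values cannot break the table identifier.
for j, name in enumerate(p_names_list):

    @dlt.table(name=f"gold_customers_{j}")
    def gold_customers(name=name):  # default arg captures this iteration's value
        df = spark.readStream.table("dlt_catalog.raw.raw_customers")
        return df.filter(col("name") == name)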

now after this we will be learning slowly changing Dimensions slowly changing Dimensions the favorite topic of every data engineer slowly changing Dimensions because obviously at the end

we need to just create dimensions and fact tables so I love Delta live tables and slowly changing Dimensions like the way it manages slowly changing

Dimensions is one of the biggest reasons I love DLT and you will see the reason why it is so so so good man it is so good you do not need to do anything

really trust me bro you do not need to do anything it will obviously take care of everything from scratch it will create like slowly changing Dimension type one type two automatically for you

automatically and you will love that so now as you are seeing that it is bringing Zero Records Zero Records but obviously these tables are new so I should see two two records each because

in The Source table we have two records each in each name oh maybe like uh it didn't pick any new data because of that particular name like equals to equals to this thing

equals to equals to I this one DF do name equals to equals to I so it didn't like remove the double quotes but in the data we do not have double

quotes So this is just like small issue you do not need to worry this is the main thing that you were supposed to learn that we can just create Dynamic and DT pipelines as well so so so so so

this is this is done this is done so now it's time to actually look at the slowly changing Dimensions so now it's time to cover slowly changing Dimension and we

have a new API for it it's called apply changes so similarly we had uh we just had discovered and not no we didn't discover so we just got to know like we

have an API for union it's called append flow so now we have apply changes for slowly changing Dimensions this is really really insane

I love this API and you will also love it once you know the powerful features of it so let me just show you the code and not just the code I will just show you show you everything that we need to

do so this is just basically the code that we will be referring when creating the slowly changing Dimension so now I think you would be like you will feel familiar with this kind of code because

now you know a lot of stuff but do not worry what are all these parameters what are all these things we will just discuss everything in detail so you do not need to worry at all so let's

quickly jump on to our source yes because for it we just need to prepare one new table which will be our source okay so let's quickly prepare one table

or let's use the existing code because we can just create a customers dim okay but I want to name it as dim okay sir

sorted so I'll simply say raw_customers_dim okay simple sorted now I also want to attach one more column and it will be

let's say date column okay so I'll simply say date and the data type for it will be date and I will just save it in a new column so I simply say raw

customers dim okay sorted I will simply get the values for insert as well okay so meanwhile it is running so I can simply paste the code here and I

can just modify the values I will simply remove all these values and I will simply add the date so it will be let's say

2025-01-01 let's leave it for now okay let's say then just give me some suggestions Databricks please just fill some values

for me uh maybe it will be just helping me once it is done okay this is done so it has created the data now I will just insert some

data so why it is not giving me some suggestions why why why why why so I'll simply say Jane and salary would be let's say

60,000 email will be jane@gmail.com and then let's say 2025-01-

01 okay now are you giving me some suggestions no it is not in a mood okay no problem I can simply copy the above values okay no worries and this time I

will simply say Bob and Bob perfect so this is my table okay so let me just insert this insert

into values uhuh so now it is inserting the values okay writing the Delta table migration wait what happened it is

saying schema mismatch why why why why a schema mismatch detected oh I see because it is just treating it as a string but

not as a date oh I think I forgot to add them

so let's try one more time if it takes otherwise we can just use Python data frame to insert the values not a big deal but I think it should take okay so it

worked fine okay okay okay so let me just simply quickly query the data select star from dlt dot

catalog not dot it's underscore okay so dlt_catalog then raw then dim okay let me just query it you will get to

know why I have added this column date don't worry don't worry don't worry don't worry and I can even add one more column and

it's called action the action that we want to take okay because it will be very very very handy so what we can do we can insert one column or we can say

we can create a new column it's called with double M okay it will be easier and I can perform I can add one column it's called

action and it will be string okay perfect perfect perfect perfect and double m double m and then obviously

we need to add action it will be I action will be here I action here will be I don't worry you will understand everything why I'm doing

all this stuff okay let me just recreate this table let me just add the data to the table okay

perfect let me just query the table okay perfect perfect perfect perfect double M perfect perfect now you can see action

as well so this is my source table this is my source table now I want to create slowly changing Dimension type one slowly changing Dimension type two so I want to create both the things like two

different tables and I want to use this table as my source why I want to create both the tables like both the types so that you can actually feel the difference between two and you will see

you do not actually need to do anything so our source is ready okay so now let's terminate this cluster and let's create our Delta live tables okay create

pipeline by the way I just deleted the previous one so you can also delete that if you haven't okay so I simply say dim DLT okay perfect dim

DLT then I'll simply pick that dim uh by the way we do not have so we first need to create the notebook okay so just ignore this then go to workspace

and then DLT tutorial then create one create a notebook perfect for

DLT perfect perfect perfect so we do not need to attach anything so I will simply say percentage

MD slowly changing Dimensions okay slowly changing Dimensions oh

Dimensions slowly changing Dimensions that's it so now what we'll be doing I will create a view so this

time we will just create two layers one layer will be our silver layer and the second layer will be our gold layer why because we are treating our source as our bronze layer that is raw layer Okay

so so I'll simply say silver and this will be my streaming view okay perfect

so I will simply say at the rate dlt dot

view then def okay and I will simply say silver_customers perfect then I can simply say DF equals

spark. read

spark.readStream.table and table will be

stream.table and table will be my dlt_catalog dot raw dot

raw customers dim with double M if you remember okay so I'll simply say return DF my streaming view is ready perfect

now now is the thing now we will simply say gold SCD type one okay SCD type 1 so now we need to

create an SCD type 1 table so the fundamental is the same fundamental is we will create an empty streaming table first so let me first quickly create

that DLT and we need to import dlt import dlt okay then I will simply say dlt dot

create streaming table okay create streaming table and I will name it as dim customers type one okay make sense yeah

this is my empty table this is my empty table now I will use the apply changes function so in order to use that I can simply say dlt dot apply

changes this is my API now we need to give so many things first of all target as we all know what is the target Target is this

one very good now second thing is Source what is Source what is source source is this one silver customers we have just created it

right because this is our source okay then we need to give keys then we need to give keys by the way I can simply bring all the things so this

let me just zoom it so this is your code that you need to use so as you can see Target is done

source is done next thing is keys keys will be your let's say business key key will be your primary key on which it will decide whether it needs to insert

the data or it needs to update the data that is called upsert statement upsert let me just write it for you so if you don't know upsert is also known as or let's say type one SCD type 1 is also

known as upsert upsert means update plus insert okay just keep this thing in your

mind so these are keys so what is my primary key my primary key is ID yes so I will simply say keys equals to

ID obviously if you have multiple keys that is also called composite primary key so you can simply write multiple keys but I have one so I will simply write one then I want to use sequence by

what is the sequence by what is the sequence by so now let me just tell you what is the sequence by let me just remove it first so the sequence by will be our order in which we decide to use

the value so let's talk about the fundamentals in slowly changing Dimension type one we always use the latest value we always use the latest

value so I will give the sequence by equals to column of date that's why I created that column that's why I created that column

column of date so whatever value will be the latest one according to the date column it will take that value simple very good then what is this except

column list what is this what is this so basically let me first write it here let me just remove it here except column list it is saying hey bro when you will

be creating your slowly changing Dimension table I am giving you the authority that you can pick the columns so in most of the cases we do not keep

the date column in the slowly changing Dimensions we just pick the columns we just pick the columns which have like contextual values related to the fact

table that's it we do not keep the date column that is coming from the source do not get confused it is not the date column of the slowly changing Dimension no it is not the create date or update

date no this is the date which is coming from the source this is this can be your order date this can be your customer dat can be your anything but we do not want

to keep it okay so I will simply remove it I will simply say I do not want date plus I also do not want one column called action it's called action by the

way what is that uh column let me just tell you so that particular column is this one apply as deletes apply as

deletes so in the source I have the option on which I can apply deletes as well so by default I gave the value as as I

let me just write it for you the column was action okay do not mind my handwriting so for now we have given action equals

to I so it will just insert the data that is just an abbreviation do not feel like it is a fixed value no but I want whenever I will add action equals to

D that means I want to delete that data just delete it just delete it so I will simply say except column list it is

fine then I will simply say apply as deletes apply_as_deletes equals expr and my column name is action

equals to D equals to D not delete because I I will be using D so it is not a fixed value it is up to me then this is the thing that we need to do which type of slowly changing Dimension you

want to create either it is type one or type two I want to create type one so bro so bro your slowly changing Dimension is
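Collected in one place, the SCD type 1 definition the transcript builds up looks roughly like the sketch below. It only runs inside a DLT pipeline (the `dlt` module and `spark` session come from the runtime); the target, source, and column names follow the video, and `target`, `source`, `keys`, `sequence_by`, `except_column_list`, `apply_as_deletes`, and `stored_as_scd_type` are the documented arguments of `dlt.apply_changes`.

```python
import dlt
from pyspark.sql.functions import col, expr

# Empty streaming table that apply_changes() will keep up to date.
dlt.create_streaming_table("dim_customers_type1")

dlt.apply_changes(
    target="dim_customers_type1",           # the empty table created above
    source="silver_customers",              # the streaming view defined earlier
    keys=["id"],                            # primary/business key for the upsert
    sequence_by=col("date"),                # latest record per key wins (type 1)
    except_column_list=["date", "action"],  # source-only helper columns to drop
    apply_as_deletes=expr("action = 'D'"),  # rows flagged 'D' are deleted
    stored_as_scd_type=1,
)
```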

created your slowly changing Dimension is created do you know do you know how long it takes to actually create a

slowly changing Dimension and it is just type one it is just type one do you want do you want to know how long it will take to create a slowly changing Dimension type

just Google it you have to write at least at least I think 50 to 60 lines of code to just create slowly changing Dimension type

one and it's not just about writing 50 to 60 lines of code bro it is about managing those errors managing those updates managing those inserts and I'm

just talking about slowly changing Dimension type one type two is way more difficult to manage because we need to take care of update date create date

expiry date effective date everything and do you know how we can create slowly changing Dimension type two in Delta Live Tables let me just show you let me just show you bro so this is

your slowly changing slowly changing Dimension type one let me just create type two let me just create the empty table for type two okay first of all I can also write

the heading okay s uh gold

SCD type two okay let me just show you this is your type two let me just rename it

okay and and and and and and this is your slowly changing Dimension type two that's it that's it you have created
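The type 2 table the video creates is the same `apply_changes` call pointed at a second target, with `stored_as_scd_type=2` as the only real change; DLT then maintains the `__START_AT` and `__END_AT` history columns for you. Again a pipeline-only sketch with the same assumed names:

```python
import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("dim_customers_type2")

dlt.apply_changes(
    target="dim_customers_type2",
    source="silver_customers",
    keys=["id"],
    sequence_by=col("date"),
    except_column_list=["date", "action"],
    apply_as_deletes=expr("action = 'D'"),
    stored_as_scd_type=2,  # keep full history; DLT adds __START_AT / __END_AT
)
```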

slowly changing Dimension type two can you imagine how long it takes and how hectic it is to actually create an manage

it bro that's why I love DLT because of this feature like because of the overhead that it has removed to manage

slowly changing Dimensions amazing amazing so now it is ready okay so now I can simply say create DLT for me I will

simply create a new pipeline don't worry we will validate this data as well we will perform multiple iterations so do not worry so I'll simply say dim DLT

okay then path then DLT tutorial oops then for DLT then select then Unity Catalog then catalog will be the same and

I will say dim schema okay perfect then Pi size and

one then okay this thing this thing perfect let's click on create so let me click on

start so this is really really really really cool and really powerful I love it so meanwhile it is starting our cluster so we can just wait yes so we

have one error and let me see what is that error sequence by column oh column is not defined oh Ansh made a mistake so actually we didn't import the

library to use the column object so we have to we have to we have to do that so we can simply import the code here like

from pyspark.sql.functions import star import all the functions okay so now I

can simply debug my pipeline from here as well because it is on okay and then I will simply say validate simply say validate validate

validate and they should validate the information y yeah yeah yeah yeah yeah this should validate the information so initializing let's see

let's see let's see let's see okay perfect so our two tables are validated I have not run that so now I will simply click on start so now it will just run my Pipeline and it will

actually create two different dimensions for me and I'm really really really happy I will just show you the validation and all everything do not

worry do not worry at all and we will just insert some new data as well so you will see the changes accordingly okay so now it is

initializing setting up table so I will simply run this pipeline for one more time just to show you that if we do not have any data in the source that means it will not do anything and it will just show Zero Records because both are

streaming tables and this is streaming View okay obviously initially it will just pull seven records each because we have I think seven records in the view

or three I guess yeah we have three not seven because we created the new table yeah yeah yeah yeah yeah yeah okay so now it is running it may

take a longer than the previous pipelines because obviously is now creating everything behind the scenes for you and trust me bro it is not an easy task to perform all those stuff

behind the scenes that it is performing right now so as you can see 3 three now I'll simply click on start for one more time and I should not see any records this time I should just see 0 0 that's

it that's it that's it that's it that's it that's it that's it what do we have after this we have one most the

most important function or let's say API for it and it's called expectations and it is basically used to apply the quality checks on top of your Delta Live Tables so after discussing the slowly

changing Dimension we will be discussing expectations as well in detail so this is running the pipeline and perfect perfect perfect perfect this time you are seeing Zero

Records Zero Records Zero Records Zero Records and let me just show you that dim now so I will simply I will simply go to Delta live

tables click on it I'll simply put this on production so that it can just uh close the cluster turn off the cluster then I'll simply go to my notebook okay

dem4 DLT you can just open any notebook because I just need to query the data so I will simply say turn on this cluster that is allp purpose cluster so now it is turning on

my all-purpose cluster and I will simply write here select star from what was the schema

name inside DLT it's called dim schema okay dlt_catalog okay dot dim

schema dot customers type 1 okay and in the second cell I will simply copy this and we'll paste it here and I will

simply say two perfect perfect so once it is turned on I will simply run these two commands and you will see your Dimensions ready without even doing anything without even doing anything trust me bro

what like we haven't done anything right so what we'll be doing after this we will insert some new data as well in the table and then we will see the behavior

of it okay and and and we also need to discuss expectations yes so let's wait once it is turned on so now our cluster is on let me just run this let me just

run both the commands because I want to see my tables you talked a lot about it it creates Dimensions like a pro it creates that it creates this let me just show

you bro let me just show you so so so so so by the way we will just make some um additions as well in the data so that you can just validate

these Dimension are working exactly fine and we will just test all the things like deletion and expiry date effective date everything by the way in this

slowly changing Dimension type two it will create two additional columns which will be your start date and end date and if you are familiar with slowly changing Dimension type two you should know what are these two columns and this is the

only thing that is the most hectic thing in slowly changing Dimension type two so let's see how it has managed everything okay so it is running slowly

changing Dimension type one it is fine wow wow wow wow wow wow what's so wow in this what's so wow in this because this is the initial run

there's nothing but in the second run it didn't mess up with any data so it upserts the data it updated all the data but in the slowly changing Dimension type

two this is the thing wow as you can see this is my start date end date is null because these all are in use right now so these are like

fundamentals of slowly changing Dimension type two you should know all these now what we will be doing we will simply go first to the source and I will just make

some tweaks in the data to just test my Dimensions to test the DLT slowly changing Dimensions okay so now

in order to do that what I will do H obviously we are just inserting new data okay and this time uh I will just remove

these two records okay so in Dimensions we expect that we will get new data obviously and we expect that we will also get some old data which will be

updated and some data that can be deleted so what I will do I will only remove this row so I want this second

record should be deleted so I will make this thing as D okay and this third uh customer name I want to change it from

Bob to Tom so this is the test for my updates and now I want one more thing uh

huh so for this if you see we also want to change the date because this is coming on the second day okay okay because we have just given the

sequence by column then I will add one more record which will say four and let me just copy

this and let me just give Jerry okay sorted so now what it will do it

will first of all delete this record from my Dimension it should delete then it will update this record because name is changed and it will

simply add this record so do you know actually operations will be so so so different in both the dimensions because in slowly changing Dimension type two we

keep the history as well so what it will do it will not delete the data it will simply give the end date as our this date don't worry I'll just show you

first of all let me just insert this data to my table okay let me just insert this

data H it is running it is running it is running it is running so it will just push some data so finally it is done so now what I will

be doing I will simply go to my Delta live table first of all I will just terminate this because I do not need this I will simply go to my DT and I

will simply run it in development mode or production mode is also fine because we do not need to make any changes we just need to test our pipeline so now it will just create the resources

obviously and once it is done I will show you the changes it is it is so so so good so as you can see this pipeline is completed and I am really excited to

just show you something so if you just see if you just notice it has inserted two records why two records because one was

updated and one was new obviously and one is deleted as well perfect but in this case it just upsert four records it didn't delete anything it didn't delete

anything because it will not delete anything it will not delete anything so now I will just show you the output like what exactly it has done to our tables

so for that what I can do I can simply switch it to production okay and then I will simply go to my recent Tab and then I can simply query the tables again to

just show you what it has done to our dimensions and that is the power of the this table bro and I think we just did it in four yeah dt4 I guess yeah there

so we here we have queries okay so let me just start the cluster and then I can simply say start once it is turned on then I will just

show you what it has done to our slowly changing dimensions and that will be the end of this particular session and then we will be just learning about expectations in Delta Live Tables so meanwhile I can

just show you as well expectations or expect all same thing man okay it is called manage data quality with expectations it is like the constraints

let me just tell you don't worry don't worry don't worry so this is a kind of diagram that they have built so if you just try to understand it it is basically saying

this is your raw data okay this is your expectation expectation means you are expecting that this table will be following this set of rules simple right

simple then if it it pass if it passes the expectation then you will keep that record if it fails you have actually

three options this diagram is really really good one is warning what is that what is that so warning will

actually will not do anything warning will just give you the warning and it will say hey bro your table is not

following all the rules your few records are not following the rules I'm giving you the warning but I will insert these

records and I will add these records to the main but this will just throw the warning perfect then another thing is it

will drop all the records which will not be following the expectation or the rules but it will add rest of the

records sort it then this is the dangerous one man this will say hey bro if even your single record is not

following the law it is not following the rules I will break the pipeline simple so these are basically three options that we get if the expectation

is not met so we will be covering this feature right now after the slowly changing Dimension what it once it is ready we will just figure out how we can just work with this as well okay okay

okay okay sorted sort it sorted sorted very good very very very very very good and this is our last topic of today's video that is expectation and this is the best feature I would say because you

can actually work with the quality checks these are like constraints that you can add on your tables okay perfect perfect it is very simple trust me it is very simple so I can just show you the

code as well so what you need to do after writing dlt.table and before writing your function you just need to add

this what is this this is basically a rule this is basically a rule this has obviously provided the rule directly here but I would suggest you to create a

dictionary don't worry I'll will just tell you how you can just create a dictionary because in that particular dictionary you can just Define multiple rules and then you can just Define like

pass the whole dictionary to this to this to this expectation okay okay okay okay

so this is the thing by the way we have this is like just using one single thing that is expect by the way we have dlt dot

expect and then we have expect or fail warn is like the default I will just show you don't worry don't worry don't worry so I just wanted

to tell you that we have expect and expect all if we just have one rule then we write expect otherwise we'll write expect all okay so don't worry let me
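The rules-in-a-dictionary idea can be sketched like this. It is pipeline-only code (the `dlt` module and `spark` session come from the DLT runtime), and the rule names, column names, and source table here are placeholders, not the video's exact ones. `dlt.expect_all` takes the whole dictionary and warns by default, while the `_or_drop` and `_or_fail` variants drop failing rows or fail the update — matching the three options in the diagram:

```python
import dlt

# A dictionary of rules: any key name you like -> a SQL boolean expression.
bronze_rules = {
    "rule_1": "id IS NOT NULL",
    "rule_2": "name IS NOT NULL",
}

@dlt.expect_all(bronze_rules)            # warn: keep bad rows, record metrics
# @dlt.expect_all_or_drop(bronze_rules)  # drop: discard failing rows
# @dlt.expect_all_or_fail(bronze_rules)  # fail: abort the pipeline update
@dlt.table(name="bronze_customers")
def bronze_customers():
    return spark.readStream.table("dlt_catalog.source.raw_customers")
```

For a single rule the singular forms `dlt.expect`, `dlt.expect_or_drop`, and `dlt.expect_or_fail` take one name and one expression instead of a dictionary.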

just check my dt4 pipeline okay this is ready let me just run these two commands let me just show you how Delta Live Tables slowly changing Dimensions look like

and I'm really excited to see those tables see those tables okay H quick quick quick

okay okay okay perfect perfect perfect so so so so so so first of all it has removed ID number

two because I passed the operation as D then it has also updated the name to Tom from

Bob love you test passed slowly changing Dimension type one test passed now this is the main thing bro okay what exactly happened let me

just tell you nothing special just following the fundamentals of slowly changing Dimension type two first of all these two records are fine because we

didn't change anything then we said change the name from Bob to Tom so as per the fundamentals of slowly changing

Dimension type two what it will do it will say the start date of this name is this and end dat is this end date is

this why because it is Chang changed now this is its new start date and end it is obviously null because it is not ending

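What just happened with Bob → Tom can be sketched in plain Python. This is an illustration of the SCD2 mechanics only — in DLT the apply_changes API does this for you (its history columns are conventionally named `__START_AT`/`__END_AT`; here I use plain `start`/`end` for readability), and the dates below are made up:

```python
# SCD type 2: an update never overwrites; it end-dates the current row
# and inserts a new current row (end = None means "still active").
dim = [{"id": 1, "name": "Bob", "start": "2024-01-01", "end": None}]

def scd2_update(dim, key, new_name, change_date):
    for row in dim:
        if row["id"] == key and row["end"] is None:
            row["end"] = change_date            # close the old version
    dim.append({"id": key, "name": new_name,    # open the new version
                "start": change_date, "end": None})

scd2_update(dim, 1, "Tom", "2024-06-01")
# Two rows now: Bob (closed) and Tom (current) — history is preserved.
print([(r["name"], r["end"]) for r in dim])
```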
Then, when I said remove this ID — why didn't it remove the row? Because in slowly changing dimension type 2 we do not delete anything; instead it added the end date, which means this particular ID will not be referenced anymore, because its end date is fixed. Sorted. Just search on Google and see how difficult it is to actually create and manage slowly changing dimensions by hand, and then you will say thanks to Databricks.

Okay, so our dimensions are ready, and we are all set to work with expectations. In order to work with expectations we already have a

table — let me just show you which one: raw customers, so we'll be using an existing table. Now, without wasting any time, let's terminate this — oops, not start, just terminate — and create our new notebook, the fifth and last one, in which we will be learning about expectations. I'll simply create a new notebook and call it Expectations — which are data quality checks. Okay sir — so I'll write a %md cell: "Expectations / Data Quality Checks". Okay,

now let's do that. First of all I'll write %md and say "Bronze — streaming table". Perfect. Now I will write the rules for my bronze table: I'll call the variable bronze rules and create a dictionary. The dictionary concept is very simple — you define key-value pairs. First the key, rule one — this name can be anything, rule one or rule xyz, whatever you want — then the rule itself. Now I pick a column to apply the constraint on: in bronze I know I have an ID, so I'll say "id IS NOT NULL". Rule number two — I think I have salary as well — "salary > 50000". Okay, these are my two rules for my bronze table.

Okay, let me just add a cell. Then I'll create my streaming table on top of the source, the normal way we create it: @dlt.table, and this time I'll call the function df bronze customers. Then I read the data — spark.readStream.table on the raw customers table in the raw schema of my DLT catalog — and return the DataFrame. But this time I also want to define

@dlt.expect — see, it has already suggested expect_all. And if you check the documentation, it uses expect where there is only one rule; if we have more than one rule we use expect_all, otherwise expect. Don't worry, I'll show you expect as well. So this is my expect_all decorator.

Now, I told you we have three behaviors — warn, drop, and fail — so where do we define that? When we don't define anything and just write expect_all, it takes warn as the default; otherwise I would write it as expect_all_or_drop or expect_all_or_fail. If you check the documentation it shows expect_or_drop and expect_or_fail, and it's written there that the default is warn. Simple.

Let me just fix the name — expectation — perfect. So what are the rules? These. And what is the dictionary? This one. Perfect. Now, when I run this, if the bronze customers table does not follow these rules it will warn me. And we know we have at least one record with salary less than or equal to 50,000, so at that time it will send me a warning. Very good. Once we have bronze, we will simply create the

silver table. Okay — I'll write "Silver", and this time I'll write one more rule. Let me create a new one instead of copying: silver rules. Here I could say "id IS NOT NULL", which makes sense, but this time I want to make it strict. First I'll rename the function to df silver customers; and although we could write expect here because there's only one rule, I'll say expect_all_or_drop, so it will simply drop the offending row. And to actually show you that, I'll change the rule to "salary > 50000", so it will drop the record having salary equal to 50,000. Let's do that. The source here is live.bronze_customers — remember, tables inside the same pipeline are referenced with the live prefix. Return the DataFrame. Perfect.

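To make that drop behavior concrete, here is a plain-Python sketch of what expect_all_or_drop does to the rows — the salary values are made up, and DLT itself evaluates the constraint as SQL rather than with a list comprehension:

```python
# expect_all_or_drop keeps only rows satisfying every constraint;
# failing rows are silently dropped (but counted in data-quality stats).
silver_rules = {"rule_1": "salary > 50000"}

rows = [
    {"id": 1, "salary": 60000},
    {"id": 2, "salary": 50000},   # equals 50,000 -> fails "salary > 50000"
    {"id": 3, "salary": 75000},
]

kept = [r for r in rows if r["salary"] > 50000]   # the surviving silver rows
dropped = len(rows) - len(kept)

print(len(kept), dropped)  # 2 1
```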
Perfect. Now we will create our final one — the gold streaming table; this is also a streaming table. Then let's define a rule, and this should be a very strict one. Actually, what I'll do is keep it lenient first and make the rule stricter afterwards. By the way, you might have a question: can we define only one rule per table? No — you can define multiple rules, and you can even stack multiple @dlt expectation decorators one after another to add on many constraints. Okay, so I'll simply pick this

one and copy it here, and for gold I'll create gold rules, with rule one: "id IS NOT NULL". I'll also show you that you can create multiple rule dictionaries — gold rules two — but first I'll run the pipeline without it; then I'll add one strict rule and intentionally fail my pipeline, just to show you all the options, because so far we have only covered two. So for this run I'll use expect_all_or_drop with gold rules — though by the way, this one should eventually be fail, because this is the gold layer, so be strict. Okay sir, I think this is fine — the source is live.silver — yeah,

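And the fail behavior, sketched the same way — expect_all_or_fail aborts the whole pipeline update as soon as any row violates a constraint. Again, this is a plain-Python illustration of the semantics, not the DLT engine, and the rows are made up:

```python
gold_rules = {"rule_1": "salary > 50000"}

def write_gold(rows):
    # With expect_all_or_fail, one violating record stops the whole update:
    # nothing is committed, and the pipeline run is marked as failed.
    for r in rows:
        if not r["salary"] > 50000:
            raise ValueError(f"expectation 'rule_1' violated for id={r['id']}")
    return rows

try:
    write_gold([{"id": 1, "salary": 60000}, {"id": 2, "salary": 50000}])
    outcome = "update committed"
except ValueError as err:
    outcome = f"pipeline failed: {err}"

print(outcome)
```

warn lets everything through and logs it, drop filters rows, fail stops the run: same rules dictionary, three escalating actions.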
everything is fine, so let's create our pipeline. Simply go to Delta Live Tables pipelines, delete the existing one, create pipeline, and name it expectations; select the notebook. By the way — if you love Delta Live Tables, and if you love me as a teacher, as a bro, then simply write "love" in the comments: love the video, or love the way Databricks built DLT. For me, for Databricks, for DLT — for all of us. Okay, so the target schema: I'll create a new one, expectations schema — just name it like this, bro, it's our schema, we are the owners. Then fixed size, one worker, V2, create. Now it will create the pipeline and I'll simply hit start, and wait until it is turned on.

By the way, I want to show you one more thing after this, something very special related to Auto Loader — it is really important. After confirming this, we'll first validate everything here, then we'll set up Auto Loader; it is not new as such, but we need to know how and why to use Auto Loader in DLT. This was not planned, it just came to mind, but I have to tell you because it is really important. Okay — let me have some water while it turns on; I'll be back in a few minutes.

So, our DLT cluster is

turned on. By the way, there's a small fix — I've made it already, just to show you both cases. If you remember, we were using expect here instead of expect_all; the thing is, if you want to pass a dictionary you have to use expect_all. It used to accept this, but now it asks for expect_all. And I think that's a good thing, because it promotes standardization: you don't need to follow two different syntaxes; you follow the same syntax whether you have one rule or multiple rules. I love this feature now. Even if you intentionally still want to use expect — I don't know why, but if you do — it's not a big deal: instead of passing the dictionary you hardcode the single rule, the same way they've done in the documentation. But I personally prefer expect_all because it promotes standardization, and now we don't need to juggle expect and expect_all as two different things. Okay, let me just click on

validate and see if everything is fine. Okay — and now I want to show you the three outcomes: the warning, the dropped data, and the failure. For that I've clicked on start, and we can go into the Delta Live Tables UI as well, and click through just to see the data. It is running now — let's see what it brings. Perfect, it's running; let me zoom out a little. This is my bronze customers, and if you remember, we applied a warning for records where the ID is null or salary is less than or equal to 50,000, so I should see one warning if I click on this and scroll down to the logs. Let me just wait until it completes.

Okay — got it, this is very nice. It's completed, and as you can see we have seven records in bronze: it didn't drop anything, because we used warn, so even though one record had salary equal to 50,000, it let it through with just a warning. But in my other expectation I specifically said: if it doesn't follow the rule, drop the record — and you can see one record was dropped, and similarly only six records actually

entered the gold customers table. Very nice. Now, how can I see the warning? It's not shown where I first clicked — I think it should be somewhere else; they love changing the places of these things in the UI, so let me just find that warning, give me a sec. Not here — let me go back to the pipeline; it used to appear on the right-hand side. These are the events, so I can click through and see the stats — data quality — and yes, here it is.

It says "failed records". It is not saying the record was dropped; it is showing you the warning that this particular rule was broken but the action was "allow" — I allowed it. It also shows the percentage of records that didn't follow the rule: 14.3%, which is one failed record out of seven. And it has shown us the proof that it has not dropped anything — it wrote all the records, but take care of these things next time. So: simply click on the table, then on stats, then data quality. Earlier this used to appear on the right-hand side whenever you clicked; they love playing with the UI, so now there's a little button for the stats. So now you know. Next, I want to

intentionally break my pipeline — why? Just to show you. I go to my recents, scroll down, and add another rule — a harsh rule. What is it? Rule one: "salary > 50000". If salary is not greater than 50,000, I'll show you what happens: I write @dlt.expect_all — because I love expect_all — or_fail, and pass gold rules two. This time I click start, and it should fail the pipeline, because one record has salary equal to 50,000. So I'll

simply click on the Delta Live Tables pipeline, and I should see the pipeline failing. Let's see — perfect, for the first time we're happy with an error: "cannot redefine expectation rule one for gold customers". Note what this message actually means: both of my gold dictionaries used the same key, "rule one", and expectation names must be unique per table, so the pipeline failed when defining the expectations. Either way, the strict behavior is demonstrated: the pipeline stops instead of quietly writing bad data. Lovely.

Okay, so this was all about expectations. Let me just remove that harsh rule — we don't need to worry about it anymore. I hope you now know everything about expectations as well. So now, yes, I

know I want to cover Auto Loader as well. What will we be doing with Auto Loader — and first of all, why do we need it? So far we have been using a Delta table as our source, and in most scenarios in the real world you'll use Delta as the source too. But in that remaining fraction of cases, your source is not in Delta format — it's CSV, JSON, Parquet, some other format. In that scenario you can still run streaming, but in a different way: using Auto Loader instead of Delta Lake streaming. So far we were using Delta Lake streaming to pull the data, but this time we'll use Auto Loader. For that I first need to — oh man — but still, it's worth it,

so let's quickly turn the cluster off. Why did I turn it off? Because I'll be writing some new code: I want to add some files, and instead of creating a table I'll create a volume where I can just dump them. I don't strictly need a volume, but volumes are easier, and this way you'll also learn how to create one. So I'll go to my DLT tutorial folder, create a source notebook, and say: hey, turn my cluster on — confirm, confirm, confirm. Perfect.

Now, volumes hold the unstructured or semi-structured data that you can govern under Unity Catalog — that's the cool thing about volumes in Databricks. Creating one is very simple: you say CREATE VOLUME and then the volume name, and the name is defined exactly the same way as a table name:

my DLT catalog, dot raw, dot — let's say — my volume. Okay, but that would be a managed volume, and I want an external volume, so I'll write CREATE EXTERNAL VOLUME and then define the LOCATION. It will create a volume at that location, and whenever I drop files within that folder, they will be governed by the volume — governed by Unity Catalog as well. That is the power of volumes, and we'll be using this volume as our source for Auto Loader. Got it?

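Summarizing the DDL being typed here — the catalog/schema/volume names follow this walkthrough (normalized to valid identifiers), and the LOCATION is a placeholder for your own external storage path, not something from the video:

```python
# In a Databricks notebook you would run this with spark.sql(ddl).
# EXTERNAL + LOCATION pins the volume to your own storage path;
# omit both and you get a managed volume instead.
ddl = """CREATE EXTERNAL VOLUME dlt_catalog.raw.my_volume
LOCATION 'abfss://<container>@<storage-account>.dfs.core.windows.net/my_volume'"""

print(ddl.splitlines()[0])
```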
Good. By the way, there's one hectic thing with

CSV files: they do not actually carry any schema. So normally you need to define the schema explicitly, and in Auto Loader you can use schema hints to provide it — otherwise Auto Loader just infers the schema. But don't worry: we're only covering Auto Loader as a source here, so you don't need to worry about the schema at all. So what I'll

do is go to the catalog. For now we don't have anything created regarding the volume, but once the cluster is turned on it will create one folder here named my volume. Then I'll dump one CSV file inside it — I have the file with me, and I'll upload it to my GitHub repository so you can download it from there. You can actually use any CSV file, because this is not DLT-specific; we are not building anything on top of it, so even a CSV file from my previous videos works. I'm thinking I'll just download one from my existing GitHub repository — oh, by the way, the cluster is turned on, so let me run this. I'll simply type

GitHub, then my repositories, then I'll pick any file — which one should we pick? Let's say fabric data engineering — huh, what achievement is this, man? "Starstruck: created a repository that has many stars" — oh wow, thank you, I hadn't noticed that, thanks! So I'll pick a file from the fabric tutorial repo; under data I have many files, so let's take sales 2016.csv. You can download this too — you should have the link from my previous videos, and if you don't, just search my name and GitHub and it will take you to my repositories, where you can simply download it. Simple. So this is completed — let me just refresh.

Perfect, now I have my volume, and I can show you here as well — just refresh it... wait, it's hanging... it's fine. So the schema is raw, then under Volumes I have a volume named my volume. Obviously we don't have anything inside it yet, so I can go here and dump my file: I've selected it, click on upload — uploaded. Now my volume is ready; let me refresh — yes, my file is here. Perfect. Now I can use this as my source in my DLT pipeline, so I'll

simply go there and set this up as my Auto Loader source, and you'll see that we can feed a streaming table with Auto Loader instead of Delta Lake streaming. By the way, trust me: in 99% of cases you'll use Delta as the source, but you should still know how to use Auto Loader when working with DLT — because it makes no sense to be working with Delta Live Tables and be stuck the moment the source is, say, Parquet. You can get any requirement, bro; that isn't in your hands. So simply terminate this cluster — how many times have I terminated this cluster, man? Just for you — so just hit the subscribe button right now.

Then what I'll do is go to the workspace — not to Delta Live Tables — and create a new notebook. I'll call it autoloader, add an %md heading, "Auto Loader", and then add a table with Auto Loader as the source. Simple — the process is exactly the same; we just need to change the

DataFrame reading statement, that's it. So I'll write @dlt.table, because I'm creating a streaming table, then define df bronze customers. If you're familiar with Auto Loader, you simply say df = spark.readStream — yes, because Auto Loader also uses streaming — then .format("cloudFiles"), then the header option set to true; the header doesn't matter much, it's fine. Then I was about to say .load — hold on, baby, we're not loading directly yet. We've successfully given the cloud files format, which is CSV, but we still need one more option — option, basically, not options — cloudFiles.schemaLocation. And what will my schema location be? I can simply

pick my location from the volume. There's a longer syntax for volume paths, and I can show you here as well — the editor can insert it for me. So this is the location. Why is it long? Because there's a constant prefix, then your catalog name, then your schema name, then your volume name — and then, if you have more folders within it, you have to define the folder path as well,

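The path shape being described follows Unity Catalog's /Volumes convention. A quick sketch — the catalog/schema/volume names are the ones from this walkthrough, normalized, and the data/schema folders are the ones created a moment later:

```python
# /Volumes/<catalog>/<schema>/<volume>/<any/sub/folders>/<file>
catalog, schema, volume = "dlt_catalog", "raw", "my_volume"

data_dir   = f"/Volumes/{catalog}/{schema}/{volume}/data"    # CSV files land here
schema_dir = f"/Volumes/{catalog}/{schema}/{volume}/schema"  # Auto Loader's schemaLocation

print(data_dir)    # /Volumes/dlt_catalog/raw/my_volume/data
```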
because you can hierarchically store as many folders as you want — that's a good thing about volumes. So now I need to create a folder here — and actually that is a big deal, because the folder will act as a parent directory. The volume root is currently where my file sits, and Auto Loader would read that whole folder, so I want the data in its own subfolder. I'll create a data directory, upload the CSV file into it again — uploaded — then go back to my volume, delete the stray copy at the root, and create one more directory, schema, for the schema location. Perfect — so now I have the folders

available. Okay — by the way, I don't like this options call, and hey, cloudFiles, where is your "s", man? Let me fix that: I'll use option instead of options — oh, it just removed that one, not a big deal. Then I was going to write inferSchema true — or rather header, let's say header, because it automatically infers the schema so we don't need to spell it out: header equals true. Then everything is fine; now we just need .load to say where it will load the data from. The path is almost the same as the schema location — we just change the folder name from schema to data. Perfect — so this is my source, and on top of it I'll create one view. I'll simply say

dlt.view, and then df silver customers: df equals spark.read... format... and then live.bronze_customers — return df. Perfect. Let me just turn

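Pulling the pieces together, the notebook being typed here roughly amounts to the following. It's a hedged sketch: it assumes the volume paths created above, and since the dlt/spark lines only run inside a DLT pipeline they are shown as comments around a plain options dict. (Note the commented code uses `.table`, which — spoiler — is what the typed `.format` call should have been; that bug surfaces in a minute.)

```python
# Auto Loader = spark.readStream.format("cloudFiles"); everything else is options.
autoloader_options = {
    "cloudFiles.format": "csv",    # source files are CSV, not Delta
    # where Auto Loader persists the schema it infers across runs:
    "cloudFiles.schemaLocation": "/Volumes/dlt_catalog/raw/my_volume/schema",
    "header": "true",              # first CSV row holds the column names
}

# Inside the DLT notebook this becomes:
#
# import dlt
#
# @dlt.table(name="bronze_customers")
# def bronze_customers():
#     reader = spark.readStream.format("cloudFiles")
#     for key, value in autoloader_options.items():
#         reader = reader.option(key, value)
#     return reader.load("/Volumes/dlt_catalog/raw/my_volume/data")
#
# @dlt.view(name="silver_customers")
# def silver_customers():
#     return spark.read.table("live.bronze_customers")  # .table, not .format

print(sorted(autoloader_options))
```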
on my Delta Live Tables. Let me delete the expectations pipeline — deleted (man, what's wrong with you — yeah, deleted). Create pipeline, name it autoloader, set the notebook path, Unity Catalog, my DLT catalog — oh man, I'm tired — fixed size, one worker, V2 (oh bro, I know it's not available), click create, then start. Now let's wait for our resources and validate this Auto Loader pipeline together; once it's done, bye-bye — don't worry, we'll validate it together first, and then we'll pack up. Trust me, you now have end-to-end knowledge of Delta Live Tables, and I

will encourage you to play with it a lot, so you gain more and more knowledge. It is new — you can see even the Unity Catalog integration is in preview, and I think for now they don't have support for external streaming tables, so we can only create managed tables. There are many developments coming in the near future, so meanwhile you can play, take advantage of DLT, and build your own projects on it. And don't worry — in my upcoming projects I'll try to involve DLT pipelines as well, whenever we create a star schema or deal with facts and dimensions; we'll do a lot of stuff. For that, just hit the subscribe button right now.

Okay — we have an error, let's check that: option cloudFiles.format

equals "= csv" — who put an equals sign in there, bro? Who put it? I didn't write it; I just clicked on the auto-suggestion from Databricks. Do not trust it blindly — who writes an equals sign inside an option value? Oh man. Let me connect the cluster again and click validate — an equals sign inside the CSV format string, oh

man. I am really, really hungry — what should I eat today? What do you want to eat? ... Oh, "you can eat water" — I'm hungry and this thing keeps throwing errors; I don't know why. What do you want, man? What is the issue? Oh, we have multiple errors: "failed to resolve flow bronze customers" — let me click on it — "cloudFiles is not defined". Okay. What's the next error? "Query defined in function silver returned the wrong type; it must return a Spark DataFrame" —

in silver customers. Okay — return df, spark dot... oh man, I got it: see what I have written — spark.read.format instead of spark.read.table. Wow. "Ansh, eat something, man." Okay, what is the third error? By the way, the other two errors are not mine —

they were created by Databricks' suggestion — it gave me the wrong code; instead of putting the value in double quotes, it put nothing there. So I'll click validate one more time and let's see what we get. One more error — what's that, man? "DataFrameReader option requires one value" — which one, and where, and what's wrong with the second option? Oh really — DataStreamReader option... okay, and it's near bronze, I

guess. Why did I even allow that? See — instead of load, I have written option. Oh man. It's fine, ignore it: silly mistakes, don't take stress. So it's validating now — show us the results, show us the errors. What is the error now, man? "Query defined in function bronze customers returned NoneType; it must return a Spark DataFrame." Who would drop the return, bro? I'm not kidding — that particular query was suggested by Databricks; this is a mess and I won't allow it again. See, now it ran successfully, because I have corrected everything. So finally, thank you so much, Databricks — what are you doing, man, what

are you doing, Databricks, giving me false suggestions! So now this is done. Obviously we haven't run it yet, so I can simply click start and it will run.

So now we can actually load the data using volumes. It could be tables as well, but for that you'd first need to create a table on top of the CSV file, which is itself a task. So I just created a volume — with a volume we don't need to create a table, yet we can still query the data inside that unstructured or semi-structured source, the CSV. And this time we leveraged a file format other than Delta: it can be CSV, Parquet, JSON, ORC — any format other than Delta.

This is the code we use for Auto Loader every time. We obviously don't need to worry about writing here, because we are not writing this data anywhere — that's why I didn't write writeStream or anything like that; we just read the data through Auto Loader. Everything else is the same: this is the cloudFiles schema location — with Auto Loader you have to provide a schema location, whereas with Delta Lake you don't, because Delta maintains the schema in its Delta log. And this is the load path, where it reads the data from: the volume. So finally — thank you so much, Databricks — our pipeline is completed, and as you

can see, we have successfully loaded data using Auto Loader. So what are your next steps right now? You have two videos you should watch: first, the Unity Catalog one, if you want to learn Unity Catalog in detail; and second, if you want to know everything in the world of Delta Lake — which is the backbone of modern data engineering solutions and architectures — watch the video coming on the screen, and I will see you there.
