
Learn Databricks in Under 2 Hours

By Alex The Analyst

Summary

Key takeaways

  • **Databricks Free Edition signup**: Sign up for the Databricks Free Edition using just your email (no credit card required) and get immediate access to serverless compute, SQL warehouses, notebooks, and all the AI tools, completely free. [02:14], [11:08]
  • **Query CSV files directly**: You can query CSV files directly by their file path with the `csv.` notation, without converting them to tables first, making CSV one of the easiest file formats to work with in Databricks. [13:43], [14:17]
  • **Connect external data sources**: Use Databricks connectors to easily sync data from Google Drive, Salesforce, Google Analytics, and more, with a sync timestamp column added automatically to track when data was ingested. [17:14], [19:26]
  • **Dual SQL editor and notebooks**: Use the SQL editor for simple querying and saving queries, but switch to notebooks when you need to mix SQL, Python, R, or Scala with markdown notes and share a comprehensive analysis with your team. [20:54], [21:08]
  • **Genie AI natural language queries**: Genie lets business users ask natural-language questions about their data, like "What zip code has the most pickups?", and automatically generates SQL queries, visualizations, and narrative insights. [06:58], [47:36]
  • **AI assistant fixes code errors**: The AI assistant diagnoses errors such as thousands separators in numeric columns, suggests fixes like using REPLACE before CAST, and can convert code between SQL and Python or add comments automatically. [52:02], [01:14:07]

Topics Covered

  • Full Video

Full Transcript

What's going on, everybody? Welcome back to another video. Today, we're going to be learning Databricks in under two hours.

Now, for the past five weeks, we've been diving into Databricks, and in this video we're putting all of that together so you can follow along really easily. We're going to start by taking a quick walkthrough of Databricks and seeing everything that it has to offer.

Then, we're going to start importing data, not just from a flat file but also by connecting to a data source. Then, we're going to start using their SQL editor as well as their notebooks, and visualizing data. After this, we'll be using their AI tools, like Genie and their AI assistant. And finally, we'll be building a full project at the end.

The best thing about all of this is that it is completely free. Databricks has something called the Databricks Free Edition. I will leave a link in the description. You can make an account by just using your email; that is it. So, be sure to go and create your free account so you can follow along with this entire video, because we have a lot of things to cover.

With all that being said, I hope you learn a ton, because Databricks is a fantastic platform. So let's jump right into it.

Now, Databricks is an amazing platform. I've been using it for many years, and it's built on top of Apache Spark, which means it is very good at handling large amounts of data. It's designed for entire teams of data engineers, data analysts, and data scientists to work with data collaboratively. That means ingesting the data, analyzing the data, and visualizing the data all in one place. In this lesson, we're going to dive into Databricks and look at some of the core functionality and features that Databricks Free Edition has to offer. Then, in the next several lessons, we're going to get really hands-on, dig into these features, and start building things out. With that

being said, let's jump on my screen and get started.

Before we jump into the platform itself, I just want to show you that this is where you sign up. I'm going to have this link down below so you can just click on it, create an account, and sign in. There is no credit card information that they take; you literally sign in and you are good to go. On this page, you can sign up for the free edition or log in, which we'll do in just a second, but you can also learn a lot more about Databricks Free Edition and everything it has to offer. You can see that they have AI built in, including built-in AI agents, and that is something we're going to cover in this series. You can also visualize your data and create full dashboards. And of course, you can interact with your data with Python and SQL, either through notebooks or through a code editor.

Let's go back up and sign up for the free edition. All we have to do is log in; you can do that with Google, with a Microsoft account, or with just an email. I'm going to sign in with my Google account, and all I have to do is say where I'm from. So this is Alex, we have the Free Edition, and I'm from the United States. Let's click Continue. And just like that, we are signed up for the Databricks Free Edition. It was about as seamless as it can possibly be.

Now, there's a lot to cover. Databricks has so many different features, so many different things you can do, from building interactive dashboards to writing SQL queries, sharing them, and working with your team. You can even build AI agents. There is a lot you can do, and some of it might be a little intimidating if you've never used a platform like this before; if you're used to working in just something like SQL or R or Tableau, this is a little more advanced than that. There are a lot of things all combined into one place, and we're going to walk through much of what you see on this left-hand side to see exactly what you can do in Databricks.

Now, I'm going to start at the very top, with the workspace. A workspace is basically the place where you work, and you can collaborate in it as well: you can give your teammates access to your workspace, so if you have data, code, or visualizations in here, whatever it is, you can share it with them and they get access to all of your work. Next, let's

go to Catalog. You can think of a catalog like a schema: if you have a schema in a database, you'll have all of your tables, views, stored procedures, and everything else underneath it, and this is very similar. If I go into a workspace, or down here into the samples, we have different databases, and we can click into these databases and look at all the different tables. This is where you'll have access to view all of your data, files, and tables within Databricks. Next, let's

take a look at Jobs and Pipelines. This is where we start getting into some automation. Here we have an ingestion pipeline, an ETL pipeline, and a job, and each of these does a slightly different thing. If you create an ingestion pipeline, that's like the extraction part of the ETL process: you're extracting data to bring it in. Then we have our ETL pipeline, which is the entire process of transforming the data, loading it, and actually putting it into a table within Databricks. And then we have our jobs, which orchestrate it all: we can say here's when we actually run these pipelines, so we can schedule them daily, weekly, monthly, whenever we want. We don't have any created yet, but when we do, they'll be down here at the bottom. Next, let's look at compute.

Now, compute is very simple in the Free Edition because you don't have any options here; we just have our serverless starter warehouse. This is what you get for free with the Free Edition. If you go to the full Databricks product, you'll be able to customize your compute to your needs. For this free edition, we have the owner, we have the size, which is a 2X-Small, and then we have whether it's active or not. Right now, we're not running anything; we're not looking at any tables or running any queries, so it's not active at the moment. But the second we open something up and start working with real data, it will activate this serverless compute, and we'll be able to use it for free. One small nuance: there actually is one other type of compute, you just can't see it, and that's for when you run Python in a notebook. It's called generic compute, and it's just for that one specific use case. For the most part, you're going to be using this serverless compute, but I thought it was worth mentioning. Notice that when I clicked on Compute, it also took me right down here to the SQL warehouse. This is the SQL warehouse we are using. Next, let's

go down to the Marketplace. The Marketplace is basically a list of companies that Databricks partners with and has connectors to, which lets you work with those tools a lot more easily. So, here we go: Partner Connect integrations. If you want to connect to Fivetran, Power BI, Tableau, dbt, or Prophecy, you can do so easily. I know at my previous job I connected it with Power BI a lot. These are great connectors where you can just search and see, "hey, do they connect to this tool that we use?", and most of the time they do have that connector. Now, the Marketplace has other things as well. If we come over here to products, we have things like tables or files, which is data. You can also search for models, notebooks, and all sorts of other things. So you can come in here and search for files, and these are all free resources and files you can use. If you can't find data here in the Marketplace, you can always go to something like Kaggle, get free data, and bring that into Databricks as well. Now, let's

come over here and go to the SQL Editor. You can see we have these larger overarching sections up here: SQL, then Data Engineering, then AI and Machine Learning. Right now, we're in the SQL section, so right in here we can write SQL just like on any other platform. The neat thing, though, is that they have AI integrated into it, which helps a lot with writing some of the base queries you're going to write. We're going to have a whole lesson on Genie, their AI system within Databricks, where we'll see how you can generate code and do a lot of different things with it; Genie lets you get insights from your data using just natural language. But you'll be able to write SQL right here in this editor, have multiple tabs, and choose which workspace you're working in. Right now, we're just on the default, but if you had different workspaces, you'd be able to click on those and select where you want your queries to be pointed at. Next, let's

take a look at our queries. Once we get in and start developing all these queries, we want to save them, and organization is a big piece of that. I know I used to have 20 or 30 different queries that I was saving in my file explorer back at my old job. Then I started working in Databricks and realized, hey, I can just save those in here, and I don't have to email someone my query so they can pull it up and copy and paste; it's all here within Databricks, which is really great.

Next, let's go look at dashboards. These are some sample dashboards, and we're going to have a whole lesson on analyzing and visualizing data within Databricks, so we'll be able to build something like this. Let's click into one of these dashboards. Right here, it says my warehouse is starting; that's because it's connecting to the data to be able to visualize it. Let's get rid of this filter really quick. So, this is what our dashboard looks like. You're able to create all these different visualizations and drill down into the data, and this is all connected to data that is already in Databricks, so it's all in one place.

Next, let's go take a look at Genie. They have a whole section just for this, and they also have an entire section devoted to this warehouse. Now, this is something a lot of platforms are trying to build toward, but Databricks already has it: they have all the data, and they have the ability to query, analyze, and visualize that data all in one place, while using AI to get a lot of those things done faster. You can ask it questions, and it's going to prompt you, because this is kind of a starter section, and it's going to run queries and get you that information really quickly. In one of our future lessons, we're going to dive into just Genie, focused only on what it can do, how it helps you analyze your data faster, and how it can be really useful to you as a user. The

next thing we're going to take a look at is alerts. An alert is basically like a trigger: when a condition is met, it does something. You can customize these alerts and set these conditions, and they'll email or message you when the conditions are true. Now, this section up here is where we're going to be spending the majority of our time. There are other things, like job runs, which is where you can create your jobs (part of jobs and pipelines), and then of course data ingestion as well, which allows you to connect to your data. So if you have a data source you want to point at, maybe Google Analytics, you can just click on this and connect to your data, or if you just want to upload a file, you can do that as well. Down

here we have our AI and machine learning section. You can click on the Playground; this is another area where you can interact with the AI within Databricks, and you can select custom parameters as well as the prompts you are using. We can also look at Experiments. This is where you can build out AI agents and machine learning models and actually test them, and it's a fantastic place to learn hands-on how to use a lot of these foundational models: you can take the data, deploy it, and work through the issues. Naturally, they have AI integrated into all of this, so you can work with the Databricks Assistant to get help with all of your coding. As you're going along, if you get stuck, you can always get help with your code.

As I said before, though, we are going to be spending most of our time right up here. Of course, we'll need to ingest our data, but after that, we're going to spend a lot of our time visualizing, seeing how we can work with our data, and using the integrated AI to do our work faster. At the end of this entire series, we'll be building a full project using a lot of the things we're going to look at along the way. I think building and getting hands-on is the best way to learn, so I cannot wait to get started on that.

There are a lot of ways to work with data in Databricks, and we're going to cover a lot of them in this lesson. We're going to upload data like a CSV and a JSON file. Then,

we're going to connect to an external data source and bring that data in. Then, we're going to see how we can actually use our data, working in the SQL editor as well as in a notebook. The best way to follow along is to create a Databricks account; I will leave a link in the description. It's the Databricks Free Edition, so it is completely free. All you have to do is create an account and sign in, and you'll have access to everything. You don't have to put in a credit card at all. It is completely free, which is amazing.

With all that being said, let's jump on my screen and get started. We are starting out fresh: we haven't uploaded or ingested any data into Databricks yet, so you are right where you need to be. Now, there are actually a lot of different ways you can bring data into Databricks. Right now, we're in our workspace, but let's go over to our catalog and into our workspace. Let's go into default. Right now, we have no data in our default schema. What we're going to do is click into this default, come right over here, and go to Create. We have a few different options, and these two are the only ones really relevant to what we're doing right now. So, let's create a new volume here; I'm just going to call this "YouTube series," and then I'm going to create it.

Now, you'll see right over here under default, we have this folder called "YouTube series." If I click into it, we can upload to this volume, and we can do that right here; we don't have to go outside of it or to any of these other options. We just click on "Upload to this volume."

So, let's click Browse. I have all these different files. Now, I created these; they're very simple files, we're not getting crazy in this lesson. I have a customers CSV and a customers JSON, so one is a CSV file and one is a JSON file. I also have an orders CSV, an orders JSON, a products CSV, and a products JSON. We're not going to use all of these, although I will have all of these files in a GitHub repo; you can find them in a link below and download these exact files if you want to work alongside me.

Let's bring in the customers CSV. I'm going to click on this customers CSV, and you can see this is the destination path we are using. Let's upload this file. And very quickly, in about one second, we now have our customers_csv.csv. I just named it that way so that even if the .csv extension weren't showing, we'd still be able to see that it's a CSV. Let's click into the CSV really quickly. We have our customer ID, first name, last name, country, and signup date, and you can see our data is all separated by commas. Now, this data is just being stored as a file; it isn't actually being stored as a table. You can see it just sits in here as a CSV, and if we go over to our files, it sits there as an actual CSV file. Now, we can still query off of

file. Now, we can still query off of this data just as it is as a file. We

don't actually have to have this in a table format. Let's come right over here

table format. Let's come right over here and let's copy this path. And then we're going to go down to our SQL editor and let's come over here to a SQL query.

Now, if we want to access this data, we can say select everything and then we'll say from and I'm going to put this path in here. I should be able to use back

in here. I should be able to use back tick just like this. And let's try running this. And actually, I need to

running this. And actually, I need to put CSV dot right here. Now, let's try running this. And as you can see, we

running this. And as you can see, we were able to read in our data without it actually being in a table. A CSV file is one of the easiest files to work with.

It's just values separated by commas, and when we say csv. and then provide the path, we are reading in this data as if it were in a table. Now, if you come from a pure SQL background, this may seem very unintuitive to you, and that's okay. Let's go back to our catalog. And

we're going to come over here to Workspace and go to default. But instead of going into our volume right here, we're going to come back to our default schema, and now we're going to create a table. When we go to create a table, we're still using our serverless starter warehouse, but now we have the ability to connect to a data source or upload a specific file format. Let's click Browse. We just uploaded the customers CSV, but let's go down to the orders JSON, because JSON is a totally different format. Let's open it up. It's going to read in that JSON and make it tabular, which is fantastic, because JSON is not a tabular format.

And so being able to read that data in easily and put it into columns and rows is very helpful. So now we're going to create a new table. We can name this table anything we want; I'm just going to remove the "json" part of the name, because once it's read in, it's a table anyway. So, we're going to keep it as "orders." Then, we're going to create this table. Now, you'll notice over here under our default schema, we have Tables and we have Volumes. They are separated out because they're two totally different things: one stores files in a folder-like format, and the other holds the tables underneath our default schema.

So, let's come down here to orders; that's actually right over here. Under our orders table, we can see customer ID, order date, order ID, product ID, quantity, and total amount, and we can get a little bit of metadata on this table. If we come up here, we can click Create query. It opens up a new query, and now we are just selecting everything from our orders table. It's already hitting our default schema in our workspace, so we don't have to fully qualify it as workspace.default.orders; I'm just going to hit Ctrl+Z here. Let's run this. And now we can see all of our data in this really nice view. So far, we've ingested a CSV file and read that data in while it was still just a CSV file, which is really fantastic. But we're also able to create tables and then read the data in like any other SQL database. But now let's come over here

database. But now let's come over here to data ingestion. There's a lot of different ways you can access data that is not just sitting in a file. Right off

the bat we have our data bricks connectors. Things like Salesforce,

connectors. Things like Salesforce, Workday, Service Now, Google Analytics, Azure SQL Server. These are a lot of the things that I've used in my actual work.

Almost all companies are going to have at least one of these. And so connecting to that data source bringing it in is really common. We can also bring in data

really common. We can also bring in data from a file like we did before or create a table from Amazon S3. Then we have our five connectors right down at the bottom. I know personally I've worked at

bottom. I know personally I've worked at different companies. I've consulted with

different companies. I've consulted with companies. They use Google Drive as like

companies. They use Google Drive as like their store of information. That's where

they keep everything. So let's go over here to Google Drive and I'm going to actually connect this. So I'm going to say I want to put it in my workspace.

Let's go ahead and click Next. Really quickly: I have a file over here, orders_csv.csv, sitting in a Google Drive; that's what we're going to try to connect to. Let's click Next. This is the email I'm using for this Google Drive; we're going to connect it to Fivetran. Since this is a new Fivetran account, I need to create a password, so I'm going to do that really quick. Then we're going to come right down here to the share settings, and all we have to do is make sure the folder is not restricted. So, we're going to say anyone with the link, which includes Fivetran. Then we're going to copy this link and paste it into the folder URL field. Let's save and test this. And it looks like our connection test passed. Let's

go ahead and click Continue. Now all we're going to do is sync our data. We'll start the initial sync, and it should be very quick because I do not have much data in this folder. It looks like our connection was successful. If we come over here to our schema, the one file we have in our Google Drive folder is synced up, so we should be good to go.

Now, let's come back to Databricks. We're going to go into our catalog and into our workspace, and now we have default and we also have a Google Drive schema. Let's click on our Google Drive schema, and we have this orders CSV. You'll notice that we now have it in here as a table. Let's create a query for it so we can look at our data. So now it's connecting to a different schema, workspace.google_drive, and we have select everything from the orders CSV table. Let's run this.
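For reference, that generated query can also be written out with the full three-level name, catalog.schema.table. The identifiers below follow what appears on screen in the video and may differ in your workspace:

```sql
-- Fully qualified as catalog.schema.table; the google_drive schema
-- was created by the sync, and your names may differ.
SELECT *
FROM workspace.google_drive.orders_csv;
```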

And now we have our data. It also adds in a _fivetran_synced column, which is a really, really useful column, because if you're syncing this data on a schedule, you really want to know when the data got put into this table. I promise you, it's really helpful that they put that in there. And then we have all of our data. So that's how we can connect to outside data; in this case we used Fivetran, but sometimes you'll do a direct connection, depending on your data source.

Now, this SQL editor works like any other SQL editor. You're going to be able to do joins and aggregations and all sorts of different things, but there is a different way to interact with your data. Let's come right over here and go to New; we can also use a notebook.

We can add code and we can add text, and you can also use an AI assistant to help you with these things. We have a markdown cell right here; I can type "this is my first text." Then I have my code cell down here, so I can start writing and typing my code. I can specify right here whether I want the cell to be SQL, Scala, R, or Python, and for this notebook you can use any of these. In this cell right here, I have Python, but let's add another one, and I can use SQL in the next one. And in the one after that, I could use R. So you don't have to use just one; you can use multiple.

Now, it depends on what you're doing whether you want to use the SQL editor or a notebook. When I'm just querying data, just looking at it, not doing a lot of transformations, I don't need a big programming language; I'm just querying the data, so I'm going to use the SQL editor most of the time, and I can always save those queries and pass them along. But if I'm really digging in and I need to break things out, leave notes, and share the work with my team, a notebook is the way to go. So let's look at SQL really

to go. So let's look at SQL really quick. We already have a query for this,

quick. We already have a query for this, albeit a very simple query. But let's

come over here and let's run this SQL query right here. So let's go ahead and run this. And so now if we scroll down,

run this. And so now if we scroll down, we're going to be able to see our data just like we did in the SQL editor. But

now let's go write the same thing but in Python. In order to do that, let's come

Python. In order to do that, let's come up here. And this is already in Python.

up here. And this is already in Python.

So, we'll just say spark.t.

And we need to read in uh the appropriate table. And all that is is

appropriate table. And all that is is orders. And let's go ahead and run this.

orders. And let's go ahead and run this.

And it is reading it in as a dataf frame. Let's call this uh dataf frame.

frame. Let's call this uh dataf frame.

And let's come right down here. And

we'll say display dataf frame. And let's

run this.
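In a Databricks notebook, `spark` and `display` are predefined for you. Outside of Databricks, the same read-a-table-then-show-it pattern can be sketched with Python's built-in sqlite3 standing in for the warehouse; the table name and rows below are invented for illustration:

```python
import sqlite3

# Invented stand-in for the "orders" table read by spark.table("orders").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, product TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "sourdough", 12.5), (2, "bagels", 9.0)])

# SQL-cell equivalent: query the table directly.
sql_result = conn.execute("SELECT * FROM orders").fetchall()

# Python-cell equivalent: load the table into a DataFrame-like structure
# (a list of dicts here) and "display" it.
columns = ["order_id", "product", "total"]
dataframe = [dict(zip(columns, row)) for row in conn.execute("SELECT * FROM orders")]
for row in dataframe:
    print(row)
```

Both cells hit the same table; the notebook just lets you pick whichever interface fits the task at hand.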

And now we get our data in a table, just like before. Of course, the notebook saves this work, and you can always come in here to rename it, export it, or put it into Git. You can share it with your friends, because all your friends really care about the notebooks you're writing. Or you can create a new notebook and start from scratch. But this is a totally different way to interact with your data in Databricks, and it's where most of your work is going to be done: right here querying data, whether it's a SQL query or, over in a notebook, a bunch of code in Scala, Python, R, or SQL itself.

Now, I hope that was really helpful, because next we're going to be analyzing data with SQL and then building out visualizations in Databricks. In the last lesson we looked at the SQL editor as well as notebooks. So what we're going to do now is look at some data, analyze it, and then put it into a dashboard with different visualizations. This is actually the dashboard we're going to create in this lesson. And I know what you're thinking: Alex, you put the exact same visualization twice, and that makes no sense. But it will make sense once we get to it. I highly recommend following along; all you need is a Databricks Free Edition account, and I'll leave a link in the description so you can make that account and follow along. With that being said, let's jump on my screen and get started.

All right, so we're getting started right here on the dashboards page. All we're going to do is come up here and create our own dashboard. We start with a blank slate that has all these little arrows, and I recommend you read through them quickly: this is how you add filters; this is how we get our data, which is right up here (I'll show you that in just a second); and this is how we add visualizations to what they're calling the canvas.

Let's come up here and go to our data. Within a dataset, we're able to write SQL queries, and I'll say this is one of my personal favorite things: you can write a query and then use that query to create a visualization. We're going to do that in this lesson, because when I'm doing some type of GROUP BY in SQL, I like being able to see my aggregations; I like to see what the actual output is, and you can do that really easily in here. You can also come down here and just select some of your data. We're going to use a sample dataset that everyone should have; it's right here, called the bakehouse. So we can come in and select one of those datasets. When we select the sales transactions table, it starts our warehouse, so now it's connected to our data, our serverless warehouse is running, and we have access to this data. If I click on these three dots, I can click Add to dashboard, and it reads this in as a SQL query. So if we go back up here, we now have sales transactions as one of our datasets. So we can either come in and write the query ourselves, or go into the catalog, select a dataset, and have it run for us.

Now, before we start actually diving in and analyzing and visualizing this data, there are other ways to bring data in, and we looked at one in a previous lesson. As it pertains to connecting it to a dashboard, here's what we can do. Let's come up here and open a notebook, and say we want to run that exact same query. First I'll go back to the dashboard and rename it really quick: I'm going to call it our transaction dashboard, if I can spell that right. So, if I go to my data, I can just copy this query, and then right over here we have access to our notebook. This is a fresh notebook we haven't used. I'm going to make this cell SQL, just so you can see how easy this is, and we're going to run the exact same query and get our output right down here. Now, let's say this is the output we want (it isn't, but pretend it is). If we come over to these three dots and scroll down, I can click Add to dashboard. I'll say add to existing dashboard, click on the transaction dashboard, and import it. The data is then imported as an untitled dataset, which we can rename; I'm just going to call it "from notebook." So this is our data from the notebook: the exact same query and, as you can see, the exact same data. The point is that you don't have to create your datasets only from Create from SQL inside the dashboard tab.

Now let's start actually working on our dashboard and analyzing the data as we go, beginning with a look at the data itself. We're working with the sales transactions table: we have a transaction ID, customer ID, and franchise ID, and if we scroll over to the right-hand side, we have the date and time the transaction went through, the product that was purchased, the quantity, the unit price, the total price, the payment method, and the card number. Now, this is not real data, so don't try to steal any of these card numbers, but we are going to use this data to create our visualizations.

Typically, when I'm analyzing and then visualizing data, I have an end goal in mind. I know what I'm going to do with the data, so I know how I need to analyze it: whether I need a GROUP BY or a window function, or whether I need to clean or pivot the data, those are things I'll know ahead of time. In this lesson we'll keep it fairly simple and just learn how to build these things. In the last lesson of this series we'll build a full project with really messy data that we need to dig into; here, we're going to learn a lot of the fundamentals. Let's start with that first bar chart we saw earlier on.

We're going to come right here and go to Create from SQL, and let's rename this one before we actually write it: we'll call it product sales descending. What that means is we're going to take the product, right over here, and this total price, and calculate the sales for each product. This is actually going to be quite easy; if you know SQL, it should be pretty straightforward, because we're just going to use a GROUP BY. I'm going to come over here and select the product (I use Tab for autocomplete here), then add a comma and the SUM of the total price, again hitting Tab. Now that we have the product and the sum of total price, all we have to do is come down here and say GROUP BY product. We should also ORDER BY, so we'll order by total price descending. So all we're doing (and let's run this) is taking the product, grouping by it, and summing up that total price. And I actually need to order by the SUM of total price; I just took the column itself, but in this query we're using the sum. Let's try running that one more time. Listen, it happens to the best of us. So now we have the products right here, ordered by the sum of total price. I'm going to leave it just like this, although typically I would use an alias, something like AS total_price. I'm not doing that here because I want to show you how easily we can rename the field in the dashboard; we don't have to use this column name. So now this data is ready for us to visualize, and we can use it. Now let's come over to our untitled page and give it a title.
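The query just built, including the fix of ordering by the aggregate rather than the raw column, can be run end to end with Python's built-in sqlite3 as a stand-in for the Databricks SQL warehouse; the table layout and rows here are invented for illustration:

```python
import sqlite3

# Invented stand-in for the bakehouse sales_transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_transactions (product TEXT, totalPrice REAL)")
conn.executemany(
    "INSERT INTO sales_transactions VALUES (?, ?)",
    [("Golden Gate Ginger", 60.0), ("Outback Oatmeal", 25.0),
     ("Golden Gate Ginger", 40.0), ("Orchard Oasis", 10.0)],
)

# Group by product, sum the total price, and order by the AGGREGATE,
# not the raw column -- the same correction made above.
rows = conn.execute("""
    SELECT product, SUM(totalPrice) AS total_price
    FROM sales_transactions
    GROUP BY product
    ORDER BY SUM(totalPrice) DESC
""").fetchall()

print(rows)  # highest seller first
```

Ordering by the bare `totalPrice` column instead of `SUM(totalPrice)` is the exact slip made in the video; most engines either reject it or sort by an arbitrary per-group value, so always order by the aggregate (or its alias).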

We're just going to name this page "dashboard." We can get rid of these global filters for now, but we will need them in a little bit. You have all these options down at the bottom: move, add a visualization, add a text box, add a filter, undo, and redo. Let's focus on adding a visualization first; we'll worry about the text box later, when we give the page a header of "transaction dashboard." You don't have to, but we will for this dashboard. Now, when you first create a visualization, you get this widget on the right-hand side.

This is where you build out your visualizations. We're going to come in here and select the data we want: we just built product sales descending, so let's click on it. We do want this bar chart, but first let's look at all the options. There are a lot of different visualization types here, and they cover what you'll use 99% of the time. So we'll click on the bar chart. Now we have to select the x-axis and the y-axis of our chart: we have two separate axes, and we need to choose what data goes on each. For the x-axis we'll take the sum of the total price, and for the y-axis we'll select the product. Now, this looks perfectly fine, right? I'm going to expand it a little bit for a second. It looks fine as is, but there are a lot of little things we can do to make it a lot better.

The first thing I'm noticing is that I'm having a tough time reading it. I'm asking, okay, how much is this one exactly? It's a little over 11,000, and if I hover over it, it tells me the exact number, but that's not good for a customer or whoever else is looking. So I'm going to add data labels, and now we can see the exact number right away instead of guesstimating. The next thing I'm noticing is that the bars are kind of all over the place. It looks like alphabetical order, but if you remember, back in our data we had it in descending order based on the sum of total price, highest to lowest, and I want that in our dashboard as well. All we have to do is click on the chart, go down to the three bars right above "product," click sort by x-axis, and then choose descending. So now we've ordered our data from highest to lowest; it's no longer alphabetical.

Now, there are some other things I want to highlight. We don't have to add these, but we absolutely can. First, let's add a title; we'll just call it "total price by product" and keep it super simple. We can also add a description, for example, to add some context or to highlight that this is our biggest seller. We could say "Golden Gate Ginger is our highest-selling product eight years in a row." Now, I'm just making that up; it isn't real. But say we had some historical data showing, hey, this is still our best seller, and I wanted to add that as context. You can definitely do that. You don't have to, but you can.

One other thing: let's take a look at these colors, because we don't have to keep them. We can use a custom color, whatever color you think is best for your dashboard. We can also click on this plus sign and create some other options. If we color by product (and this is one I don't necessarily recommend), each product gets its own color on its row, which is a little redundant. So let's go back, get rid of product, and select total price instead. Now we have a gradient scale from blue to white. That isn't my favorite; I prefer to come up here and pick the green-blue gradient. This is a great way to analyze and visualize at the same time: sometimes you already know what you're going to build, so you can work with the data while you're building your dashboard. And now I can very easily see: Golden Gate Ginger, man, that is our product; we're killing it with that one. But Orchard Oasis, nobody likes that. It's doing okay, but it is our worst seller.

Now, just remember, we used this product sales descending dataset, and if we take a look at it, that's the only data available to that visualization. Let's come over and look at our sales transactions instead: this is all of our data. So there are times when we don't even need to write a custom query for each visualization. Let's use the sales transactions to create a visualization in our dashboard. We'll come down here, create a new visualization, and the next thing we'll look at is payment type, so let's change this dataset to sales transactions. Now we're using two different datasets; just note that for future reference, because it will come into play. We're going to come down here and create a pie chart.

For the angle, we're going to use the payment method right here, and it gives us a count. We don't have to do a count distinct; in fact, we probably shouldn't, because if we hover over it we'd just see three, since there are only three payment options. If we do a plain count and hover over it, we see the total number of transactions instead. So now we can see how many transactions use each payment method.
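The count versus count-distinct difference just described can be checked with a quick sqlite3 sketch; the payment methods and row counts here are invented to mirror the demo's three methods:

```python
import sqlite3

# Invented sample of the payment_method column: three methods, many rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_transactions (payment_method TEXT)")
conn.executemany("INSERT INTO sales_transactions VALUES (?)",
                 [("visa",)] * 5 + [("mastercard",)] * 3 + [("amex",)] * 2)

# COUNT(DISTINCT ...) collapses to the number of unique methods: 3.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT payment_method) FROM sales_transactions").fetchone()[0]

# Plain COUNT gives the number of transactions: 10.
total_count = conn.execute(
    "SELECT COUNT(payment_method) FROM sales_transactions").fetchone()[0]

# Per-method counts, which is what each pie-chart slice represents.
breakdown = conn.execute("""
    SELECT payment_method, COUNT(*) FROM sales_transactions
    GROUP BY payment_method ORDER BY COUNT(*) DESC
""").fetchall()
print(distinct_count, total_count, breakdown)
```

For a pie chart you want the per-slice transaction counts, not the number of unique methods, which is why the plain count is the right choice here.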

Now, for the color, this is actually really important for a pie chart. Let's come in here and break it out by payment method, so we have Mastercard, Amex, and Visa. We can see the split from the colors alone, but I really like labels; I think they're just super important, so I'm going to add them right here. This is also a case where the chart says "count of payment method" by default, because it just takes the name of the field, and I don't like that. So I'm going to come in and change the display name to "payment method breakdown." That looks a lot better; it no longer seems like I just tossed it in there, it seems intentionally named.

So, we were able to go into this sales transactions dataset, which has a lot of different fields, a lot of different columns, and just pick out the one we wanted to use. So far, our dashboard is looking great. We're going to build one last visualization, and then we'll work on filtering. Let's come down here and create that last visualization. We're going to use sales transactions again, but this time we want to see our transactions over time.

Right? We have this date column, and we want to use it. Let's pick a line chart. If we go back to our data, we have this datetime column, and it's really useful: we want to see how many transactions we're having over time. Maybe there's a certain day of the week when people make a lot of transactions; that's really useful for someone to know. So let's go back to our dashboard and come down to the x-axis; for this, we want our date column, so we'll choose the datetime. By default it selects monthly, but you can come in here and transform it: it takes that datetime column and can automatically bucket it to basically anything you want. Let's choose daily for now; we can change it later. We just have to choose our y-axis, so let's click the plus sign. To see our sales over time, let's take the quantity; it's going to do the sum of quantity here. And if we go back just to check the data, the quantity is the actual amount we sold: we sold eight, we sold 36, we sold 40. So it's aggregating that data for us, which is really nice.

If we come right over here, we can hover over the chart and see each day and its sum of quantity. Again, we need to tweak the display a little, because "sum of quantity" and "date, time" don't read the best, and we want to customize it. We'll just say "date" to keep it simple, and for the sum of quantity we'll say "quantity sold" and keep it just like that. And for the title, because I think this one needs one, we'll say "quantity of sales over time."

Right, there we go. Now we have our dashboard built. It looks great, and I'm super happy with it. One last thing we should add, and this is optional: I'm going to add a title text box really quick and call it our transaction dashboard (let me spell that right). I'll format it just a little bit: increase the size, like that. You can change it to a different color, but it just adds a little bit of finesse, right? We can put that at the top and make the other elements smaller if we want to go that route, but it's up to you. The next thing I want to show you, though, is adding a filter. Filters are quite important: customers, clients, managers, whoever is using this dashboard, are going to want to filter in some way.

They're going to want to say, "Oh, I want to filter on this product, or this month, or this year," or whatever it is. So adding a filter is really important. We can add a global filter right here. Now, let's come over to the widget, because we can change the type of filter. It can accept multiple values or just a single value; it could be a date picker or a date range picker (a range of dates instead of a single date); it could be a text entry where someone searches for something, maybe a product; or it can be a range slider.

Let's click on this range slider, come in here, go down to the sales transactions, and choose the quantity. What this means is we've created a global filter on quantity. Quantity is a numeric data type, so we get a slider where we can basically say, hey, I only want to see rows where the quantity falls in a certain range. Maybe we only want transactions with a quantity greater than 30, for example; we'd be able to do that in this dashboard. We can click in here and customize the bounds; let's say the minimum is 0 and the maximum is 100 for now. Then let's come over here and look only at quantities greater than 30.

So, I'm going to click on that. You'll notice that this first chart doesn't change at all, but these two did. Let's highlight that a little more and slide the range around: only these ones are changing, while this one is not. The reason for that is the dataset we chose.

So let's come back here. Right up here we have the product sales descending dataset, and down here we have our two sales transactions datasets. But what are we actually filtering on? We're filtering on the quantity. And if you remember, if we go back to product sales descending, we don't have the quantity in there at all. That column isn't in that dataset, so when we apply this filter to the dashboard, it isn't connected to that visualization.

Now, the way we can fix that is to replicate this exact chart, and this is what I was talking about earlier: we're going to create the exact same visualization, but using the other dataset. So now we're in sales transactions. We'll create our bar chart: for the x-axis, the sum of total price; for the y-axis, the product (let's find it right here and scan all the way over). Then we'll go down, add our labels, create our custom colors based on the total price, and change the coloring to the green-blue gradient. You can already see that this one is filtered based on our global filter over here. So let's get rid of this and make it match, and we need to do one more thing: set the sort, like this.

So now we have the exact same visualization, but this one is connected to our filter. This is a really important thing to understand when you're building out these dashboards: people really like their filters. They want to be able to slice the data, and this is going to be a big request once you build your dashboard. You'll build it out, it'll be great, and then they'll say, "Hey, I want to be able to filter on this, and I want to be able to filter on this." So you're going to have to build that out, and you may have to go back and connect your data in certain ways to accommodate certain filters. That's just something to think about and to know. But now this one is connected, so I would actually replace the old visualization with this new one, because I want it connected to all of our other data for these global filters.

There are, of course, some other things we could do: we could come in and change some of these axes, or add more context. But this is our sample dataset, and this is as far as we're going in this lesson. Like I mentioned earlier, in the last video in this series we're going to build out a full project using real raw data, so we'll have to do some data cleaning and really dig in to analyze our data, visualize it, and create our dashboard.

The first thing we're going to look at is Genie, which is built for business users to get insight from their data using natural language. Next, we'll use the AI assistant to help us code; we'll use it in the SQL editor as well as the notebooks, where it helps generate code and also helps diagnose and fix issues if we run into them. Then we'll use the AI assistant in the dashboards tab, where it helps us create visualizations. What's so amazing is that you can try all of these completely for free, because in Databricks Free Edition you can use all of their AI tools at no cost. Be sure to use the link in the description to create your account so you can follow along, practice, and actually use all these AI tools. With that being said, let's jump on my screen and get started. When you first pull up Databricks, we need to come right down here, within the SQL section, to Genie; let's go ahead and click on it.

And this about sums up Genie: you can ask questions about your data in natural language. That's what it is, just a way to converse with your data. Let's come over here, go to New, and connect to a data source. We're going to use a sample dataset, so let's go to All; it's within the samples right down here, this New York City taxi dataset. This is the data we'll be using for this video. Let's go ahead and create it. It spins up our SQL warehouse, because that is what we'll use to interact with this data.

While that's starting up, we can dismiss this, and we now have the data sitting right over here. I'm just going to give you an introduction to the interface. We're here in the Configure tab, where we can create custom instructions for whatever we want this Genie to be. If this Genie is meant for a specific team or a specific project, you can give it those guidelines, because you can then share it with other people and they can interact with it. If it's people on your team working with a specific data source, everyone can interact with that AI in the same way, which is pretty cool. If we come over to Settings, we can also name it; I'm just going to call this one "databricks genie taxi." I could add a description, but I'm not going to at the moment. Now, right over here in this space, there's a little bit of built-in prompting: hey, here are some things you can click on, and we can give you that information. We can also add a sample question, so if you want to add one for your team, whoever you're working with, or just for yourself, you can add a question right here. Let's go ahead and save this and get out of here. So now we have this Genie interface right here.

Now, as I mentioned before, Genie is really great for a business user, someone who's just going to use natural language to ask questions about the data. If you're a more technical user, you're most likely going to use the AI assistant, which we'll look at in the SQL editor and the notebooks; there you can edit and dive into the code and get into the more programming-oriented side of Databricks.

Let's just go ahead and ask it to explain this data set, because we haven't taken a look at it at all. In fact, I don't even know what it looks like; it's just a New York City taxi data set. So, it says here's where it's located. It contains taxi trips, including pickup, drop-off, trip distance, fare amount, and pickup/drop-off zip codes, and this is the only table in here. Now, we can actually come over here to Configure, click on this table, and see a very quick sample of it. But I'm going to ask it to show me a sample of the data, please. I like to be polite when working with AI. You just never know, right? I just want to make sure I'm being respectful here. So, now it's going to give me a sample of our data.

Now, you'll notice right over here we have this Show code option, and this is what it's going to do for basically anything: it shows us what it's doing under the hood, which I really like. You can also edit in here. So if we want, we can come in here and say, okay, let's limit it to 20, because we only have a small sample down here.

Now, we can run this again, and it's going to keep all this information, but we're rerunning and changing the code as we go, if that's what you want to do. So, now let's take a look at our data really quickly. This is taxi data, so we have a pickup time, a drop-off time, the trip distance, how much it cost, the pickup zip, and the drop-off zip. Fairly simple data.

Now, there are a lot of things I could ask about this data set, but I'm going to keep it pretty simple. As we go further along in this video, we're going to get a little more technical and a little more challenging for the AI, to really see what it can do. So, I'm just going to ask: what zip code are people being picked up at the most? I just want to know where most people are being picked up. Maybe there's a specific zip code where 99% of these users are getting picked up, and that would be really useful information. So, let's see what Genie comes up with. It says it right here: the most common pickup zip is 10001, with 1,227 pickups recorded. Now, you can see that right down here too, but again, you have to remember that this is mostly for business users, so I think it's good that they include this information up top as a narrative you can read.

Let's go look at the code. It looks like they're looking at the pickup zip, which is exactly what we would want to do. They're doing a count of everything, grouping on that pickup zip, then ordering descending on the pickup count and limiting to one. This is how I would have written it as well. It isn't anything crazy complex, but I think it did a really good job answering this question. Now, as I've been testing and working within Genie, it is pretty good at answering a lot of questions about the data. Let's get a little more difficult with this one.
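For reference, the shape of the query Genie showed — count per pickup zip, order descending, limit to one — can be sketched outside Databricks too. Here's a minimal, self-contained version using Python's built-in sqlite3; the table contents below are made up for illustration, not the actual Databricks sample data:

```python
import sqlite3

# In-memory database with a tiny, made-up trips table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_zip TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?)",
    [("10001",), ("10001",), ("10001",), ("10013",), ("10013",), ("07030",)],
)

# Same shape as the generated query: count per zip, highest first, keep one
row = conn.execute(
    """
    SELECT pickup_zip, COUNT(*) AS pickup_count
    FROM trips
    GROUP BY pickup_zip
    ORDER BY pickup_count DESC
    LIMIT 1
    """
).fetchone()

print(row)  # ('10001', 3)
```

The GROUP BY / ORDER BY ... DESC / LIMIT 1 combination is the standard top-N pattern, and it's exactly what the generated Databricks SQL did as well.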

So, let's come down here and ask, "What time of day do most people get a ride?" Now, this is how I would imagine someone who's not very technical might ask this question, right? It's not very specific, and I'm curious how it's going to handle it. So, let's see what code it writes and the output we get. It looks like it used the pickup hour. It says the highest number of rides occur at 18:00, which is 6 p.m., with ,455 rides during that hour. Let's look at the code really quick. It's using the hour from our datetime, running a count, filtering a little bit, grouping on the hour, and doing much the same thing it did in our previous query.
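The generated query boils down to: extract the hour from the pickup timestamp, count rides per hour, and take the top one. A standalone sketch with sqlite3, using strftime('%H', ...) in place of Spark's hour() function; the column name and rows below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column name is an assumption, loosely modeled on the NYC taxi schema
conn.execute("CREATE TABLE trips (tpep_pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?)",
    [
        ("2016-02-14 18:05:00",),
        ("2016-02-14 18:40:00",),
        ("2016-02-14 07:15:00",),
    ],
)

# Bucket rides by hour of day and surface the busiest hour,
# mirroring the hour()/GROUP BY query Genie produced
busiest = conn.execute(
    """
    SELECT strftime('%H', tpep_pickup_datetime) AS pickup_hour,
           COUNT(*) AS ride_count
    FROM trips
    GROUP BY pickup_hour
    ORDER BY ride_count DESC
    LIMIT 1
    """
).fetchone()

print(busiest)  # ('18', 2)
```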

Now, Genie does have the ability to create visualizations as well. Let's see if it can visualize this. I'm going to say: can you visualize this?

All right, it did exactly what I asked it to do. I asked it to visualize it, and it literally is visualizing the pickup hour and the count. This is not exactly what I had in mind, though; I actually wanted to see all the pickup times over time. So I'm going to ask if it can show me all the pickup hours and the counts and visualize it. I just went ahead and wrote that: write a SQL query to show me all the pickup times and the counts, and then visualize it. Let's see if it's able to do this one, because the previous visualization was not exactly what I was hoping for, although it gave me exactly what I asked for.

So it looks like it did this properly. Let's just look at it. They really are doing the same query, but now they're not ordering it descending and limiting to one; now we have all of them. And whoops, let's go right down here. Now we have a visualization of this data. As you can see right here, this is our highest one, our 18:00 or 6 p.m. So now we have this visualization.

Now, the thing about Genie is it's kind of its own standalone thing. If we want, we can come up here and copy this, copy all of it, and bring it over to the SQL editor. But again, that's not really the use case here. The use case is that a business user can come in here, ask questions about the data, and get their questions answered without having to go into the SQL editor or a notebook or build a dashboard.

So, with that being said, let's come right over here to our SQL editor, and in a new query window, let's take a look at the AI assistant they have. Now, they have one right here, but you can also see a button right up here. They are both the assistant, but they are used in different ways. I'm going to demonstrate that in just a little bit.

We're going to generate some code, but then we can ask larger context questions about the code and the data and revisions and insight into our data with our assistant up here. But we can't really do that with this one because

this is purely for generating code.

Let's go ahead and click on Generate code. Now, we don't have this connected to our data, so I'm just going to say, show me our New York City taxi data, and ask it to generate this. Now, this is very generic; I don't even know if it's connected to this data. Let's just see what it does. It does not know exactly what data I'm talking about, and that's because we're in this workspace.default schema. I just wanted to see if it would be able to pick up on it, but it's not doing that. Let's click over here and go to our samples, and for our schema, let's come down to New York City. I'm going to accept this really quick because I want to demonstrate something.

Let's go ahead and try to run this. It's not going to work, and that's okay. What we can do is come right down here and diagnose this error. So let's click on Diagnose error, and the assistant in the top right is now going to be activated and run this /fix. Now it's going to say, okay, we need to use the proper schema. And even if we hadn't switched over, honestly, it would have picked this up; this assistant up here is slightly more context-aware, I've noticed, than the actual code generation down here. So now we can try running this, and we can run it right over here, and we're going to be able to see our data at the bottom. And I'm going to click this button; it says replace active query content, and now it pastes all this code right into this window.

Let's get rid of this for now. What I'm going to do is edit this code. As you can see, it highlights all of our code, and I'm going to do two things. One, I'm just going to type this forward slash, and we're going to get some options here, things like /doc and /explain. We have all these options, and it tells you what they do on the right-hand side: explain the code, improve the formatting, optimize the code, replace parameters, fix errors in our code. There are a lot of different options, and they're just defaults, things you can use really easily. We just fixed our code, so we're not going to do that right now. What I'm going to ask it to do is something totally different.

Let's actually run our code really quick, because I want you to be able to see the data in here. We have two really useful columns, a pickup time and a drop-off time, and I'm going to ask it to calculate the difference between them and give us the average of those times. What's the average length of a ride? The data gives us the trip distance, but it doesn't tell us how long the trip took. So, I'm going to go into the code and say: I want the average time it took between pickup and drop-off. Now, this is a little more difficult, more difficult than anything we did in Genie. Let's see if it's able to write this code correctly.

Now, we can see what it's getting rid of with the red lines, and with the green lines we can see what it's actually keeping. We can accept this either by clicking Accept or pressing Tab. I'm going to go ahead and press Tab. Basically, what it's doing is taking the drop-off time, subtracting the pickup time, and then taking the average. So, we're just going to go ahead and run this.

And it's in seconds; that works perfectly, but I don't want it in seconds, I want it in minutes. So I'm going to edit this and say: I want the average to be in minutes, not seconds. It's going to rewrite this and divide it by 60, and that's really all it needs. Let's go ahead and accept this with Tab and run our code. Now we have the average trip duration in minutes; it's around 15 minutes.

Now, this is really great, but somebody who doesn't know what this is might be very confused. So, I'm going to come up here, use one of these forward-slash commands, and say: add comments to code. I'll hit Enter, it starts commenting on this code, and I can click Accept. It says: calculates the average trip duration in minutes by subtracting pickup from drop-off timestamps. Very simple, very straightforward, but it is really helpful for somebody who's going to come behind you and read this code.
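Under the hood, this is just a timestamp subtraction, averaged and then converted to minutes. Outside Databricks, the same idea can be sketched with sqlite3's julianday(), which returns days, so the difference gets multiplied by 24 * 60; the column names and rows below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical column names standing in for the taxi pickup/drop-off times
conn.execute("CREATE TABLE trips (pickup TEXT, dropoff TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [
        ("2016-02-14 18:00:00", "2016-02-14 18:10:00"),  # 10-minute ride
        ("2016-02-14 07:00:00", "2016-02-14 07:20:00"),  # 20-minute ride
    ],
)

# Average trip duration: subtract timestamps (in days), convert days -> minutes
avg_minutes = conn.execute(
    """
    SELECT AVG((julianday(dropoff) - julianday(pickup)) * 24 * 60)
    FROM trips
    """
).fetchone()[0]

print(round(avg_minutes, 1))  # 15.0
```

In Databricks SQL the assistant expressed the same calculation as a timestamp difference divided by 60; the julianday() arithmetic here is just sqlite's way of doing the unit conversion.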

Now, as I mentioned before, this is for coding, but if you have other questions about the data, you can also come over here. I'm just going to say: tell me a bit about this data. Now, this is just a really simple overview of the columns and what's in them. It's pretty straightforward, but now let's ask it: give me a simple data dictionary for each column. I could have said for the whole data set; I don't know why I said column, but let's see what it gives us.

And just like that, it is going to give us a data dictionary, basically a description for each column. This is really useful: it reads the data and gives you a little bit of information about what's in it, as well as the data type and the column name.

Now, as I mentioned before, this isn't only in the SQL editor; we can also do this within a notebook. So let's come over here and take a look at the assistant within a notebook. You can see we have this Generate option. It's going to look very similar to the SQL editor, and it can do a lot of similar things. It's going to generate code, but it can do it in different languages now: we can do Markdown, Python, SQL, and R.

By default though it is in Python. So

let's come over here. Let's go ahead and ask it to give us the sample New York City taxi

data and just see what it comes up with.

It looks like this one is going to be perfect. Let's go ahead and run it, and it gives us exactly what we want; we have our data right here. Now, one of the things I want to highlight is that when you're working within these notebooks, you can use whatever language you'd like, and you can convert code to other languages. So let's come up here to our code. I'm going to say: can you convert this to SQL? It's going to take this, which was a DataFrame, a spark.sql DataFrame that it displayed, and convert it to SQL right here, and then we're reading in just the SQL. Let's go ahead and accept this with Tab and run it, and we should get the exact same output. That's one of the things you can do in notebooks that, of course, you're not going to do in a SQL editor, because there you're just using SQL.

So that's how the assistant works within a SQL editor as well as a notebook. But let's head over here to our dashboards and create a new dashboard. What we're going to do is pull in that data we were using before in our samples. So let's go to our catalog, go to our samples, New York City, click on this, and say add to dashboard.

So now we have this data right up here. This is our trips data. What we're going to do is come right over here and add a visualization. The first thing that comes up is the assistant; it says to ask the assistant to create a chart or anything we want. So let's ask it to create something really simple. I want to create a card with the average fare amount. Let's go ahead and run this. I just wanted it to show us the average price of one of the taxi fares. And there we go. Super easy, super simple.

Now, let's ask it a slightly more difficult question: create a line chart with the count of pickup times. Let's go ahead and run this. It did do it, and what's nice is we have it right over here. It did this based on the month; I'm going to change it manually to the day so we can see a little more easily which days have a lot and which have a little. We're just doing a simple count on the pickup dates. That's it. Now, with both of these, I didn't click the Accept button, but once you do — and you can of course change it afterwards — it is going to stay there as if you had just created it as a normal visualization.

Now, let's say I want to make a change to this. Let's come up here and say: change the daily — and I need to spell this right — daily to hourly. Let's go ahead and run this. There are going to be a lot of people who don't know how to come in here, or don't want to, and they can do this by just writing "change the daily to hourly." I'm going to accept this. Now this looks wild, but let's look at it. This is looking correct to me. Take away that global filter, and now we're looking at it hourly. So we have a very accurate and specific view of the pickup times, hour by hour.

And so this is how we can create different visualizations just by asking. We can of course still come in here manually, because it's using the exact same settings we would use to create these ourselves. So if we want to add labels, I'm going to say: add labels. I wouldn't on this data set.

That's going to look terrible, but it's going to turn on the labels for you. And this looks horrible, but it did exactly what I requested; it's just too granular to add labels. We would have to change this to weekly, probably, and then add the labels for it to even make some sense. There we go. This looks a lot better. So we can do a lot of the things we'd normally do manually just by using this. Now, if you already know how to create these visualizations from memory, you won't need to use this all the time. But let's say you just can't figure out how to do it yourself; you can always ask. It's there to assist if you need it. You don't have to use it, but it can be useful when you're stuck thinking, I just can't figure out how to create this — let me ask the assistant and see how it would do it.

As you can see, there are a ton of ways you can use AI in Databricks. You can use Genie, you can use the assistant in the SQL editor, we can use the assistant up top, which has a little larger context, and we can use it in the notebooks. It's integrated into a lot of the different places you'd work as an analyst, a data scientist, or a data engineer.

Now, within this entire Databricks series, we've learned a lot of things, and we're going to apply all of that in our final project. If you've never done one of my projects on this channel before, they can be a little long, because I leave everything in. I don't cut out a lot of stuff, so if I make mistakes, you're going to see them and can learn from them as well. I will leave a link in the description so you can go to the Databricks Free Edition and create an account to follow along with this project. I will also have the data set we're going to be using in a GitHub repo, so all you have to do is download that data set, import it, and use it just like I'm doing.

We are going to be working with real, raw data, so we will encounter some issues and some things we'll have to clean up along the way. But we're going to learn a lot, and this is what we're going to end up with as the final project: we're creating this United States emissions breakdown dashboard. It's going to be an awesome project, so I cannot wait to do this with you. Let's go ahead and jump onto my screen and get started.

Now, before we jump into Databricks and actually start building out our project, I want to take a look at the data, because there's a lot of it. And when I say a lot, I mean, you know, there are only 3,000 rows, but there's a ton of different fields.

So, if we start scrolling over, we're going to see there are so many different things we could work on, and I don't want to focus on everything, because it's going to be impossible to get everything in there. I have highlighted in yellow the ones we're going to be working with. This is the emissions, in megatons of CO2. We have our population, latitude, longitude, the county name, county, state name, and state abbreviation. We're going to be focusing on emissions, although there are so many other things we could look at in this data set; we just don't have that much time, and we don't have days and weeks to comb through it and build everything out for all the different scenarios and dashboards we could create. So, this is the data set we're going to be working with. Again, you can get it from the GitHub repo.

Let's get out of this. We are in our Databricks Free Edition account, and I'm going to go ahead and create a new catalog. I'm going to call this one our emissions catalog. I'll go ahead and create it, and let's get out of this. Now we have our emissions catalog, and under my default schema I'm going to come right over here and create a table. We're going to drop this entire file into it.

Let's go ahead and browse, and we're going to come up here to emissions data 2023. Now, this data is large, so it may take just a second to bring it all in. But, you know, it's not like it's 10 gigabytes or anything, so we should be able to create this just fine. I'm going to rename it to just emissions data, because I think it looks cleaner. You can keep the 2023 if you'd like, but I'm just going to go with emissions data.

All right, so our data is in here, and this all looks good. We are not going to transform any of it yet, although we may need to transform some of it, right? It's going to auto-detect all of our data types, so if we need to change something, we'll do that after the fact; we're taking the raw data as it is. Now, all we have to do is come down here to Create table, and it's going to create our emissions data table for us. It gives us an AI-suggested description, which looks good to me, so I'm going to accept it just to have it on this data set, which is nice to have.

Now, let me give you the scenario. You were hired by the EPA to create a dashboard, and they want to focus specifically on emissions, but they want a breakdown: where in the United States the emissions are actually coming from, which states and counties are emitting the most, and, in terms of population, how much emissions we have per person per area. These are the things we're going to have to dive into, and we're going to write some SQL queries to do it. Let's come over here and go to our SQL editor.
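One of the asks in that scenario — emissions per person — is just a ratio of two columns. As a rough sketch of the kind of SQL we'll be heading toward, here it is in self-contained form using Python's sqlite3; the table layout and every value below are made up for illustration, not the real EPA data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the EPA table: emissions (megatons CO2) and population per county
conn.execute(
    "CREATE TABLE emissions_data (county TEXT, state TEXT, emissions REAL, population INTEGER)"
)
conn.executemany(
    "INSERT INTO emissions_data VALUES (?, ?, ?, ?)",
    [
        ("County A", "OH", 10.0, 500_000),
        ("County B", "OH", 2.0, 1_000_000),
    ],
)

# Emissions per person: a simple ratio, ordered to surface the biggest emitters
rows = conn.execute(
    """
    SELECT county, emissions / population AS emissions_per_capita
    FROM emissions_data
    ORDER BY emissions_per_capita DESC
    """
).fetchall()

print(rows[0][0])  # County A
```

County A comes out on top because 10.0 / 500,000 is a larger per-capita value than 2.0 / 1,000,000, even though both counties are in the same state.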

Now, this is from a previous lesson, so I'm going to get rid of all these previous queries, and we'll get started with a fresh new query. We'll do this by selecting emissions, going to default, and it should be our emissions data. Let's go ahead and run this.

Now, the dashboard is going to be our final product, the thing we hand off, and I want to start with our location data, because I personally am super interested in it. We of course have the state name and the county name, but that's not going to be as specific as something like latitude and longitude, so I want to use that data along with our emissions data. Let's come right over here and find it. It's right here: GHG emissions, mtons CO2e. It's a long name, but that is the column we want. So instead of taking all the data, I'm going to select latitude, pressing Tab to autocomplete, then longitude, and let's see if it can find the CO2 column. There it is. I'm going to hit Tab, and I'm just going to alias it as emissions; I feel like that's going to be easier. Let's go ahead and run this. And there we go.
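The query we just built is a plain projection with an alias, shortening that long GHG column name to emissions. Here's the same pattern in a self-contained sqlite3 sketch; the column names and values are stand-ins, not the real dataset's headers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Approximation of the dataset: coordinates plus a long emissions column name
conn.execute(
    "CREATE TABLE emissions_data (latitude REAL, longitude REAL, ghg_emissions_mtons_co2e REAL)"
)
conn.execute("INSERT INTO emissions_data VALUES (39.1, -84.5, 12.3)")

# Project just the three columns we care about, aliasing the long name
row = conn.execute(
    """
    SELECT latitude, longitude, ghg_emissions_mtons_co2e AS emissions
    FROM emissions_data
    """
).fetchone()

print(row)  # (39.1, -84.5, 12.3)
```

The AS emissions alias only renames the output column; the underlying table column keeps its long name.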

Now, in terms of actually analyzing this data, latitude and longitude tend to be a little more difficult to work with than something like a county or a state, because they're so specific; you can't really aggregate on them. They're more of a visual thing. So, let's actually take this over: I'm going to copy this and bring it over to my dashboard, and we're going to create our new dashboard.

Let's rename it really quickly before we get into the data. I'm going to call this our emissions dashboard. As we build things out, we'll come back and add to it. We don't have to do it that way — we could get all the data up front and go from there — but I'm going to create from SQL, and I'm going to run this. We actually need to specify where this is coming from, so let me get rid of this. I'm going to type emissions.default.emissions_data; I was just tabbing there. It got our catalog, then our schema, and then our table, and I just tabbed along the way, which makes it pretty easy to work with.
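That three-part name — catalog, then schema, then table — is how every table gets addressed here. A trivial sketch of building one; the helper function is hypothetical, and the exact table identifier is an assumption (a table created from "emissions data" would typically end up as emissions_data):

```python
# Three-level namespace: catalog.schema.table
# The helper is purely illustrative; the identifiers come from this walkthrough.
def qualified_name(catalog: str, schema: str, table: str) -> str:
    return f"{catalog}.{schema}.{table}"

fq = qualified_name("emissions", "default", "emissions_data")
query = f"SELECT * FROM {fq}"
print(query)  # SELECT * FROM emissions.default.emissions_data
```

Spelling out all three levels is what lets the same query run correctly no matter which catalog or schema the editor happens to have selected.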

Let's go ahead and run this, and now we have our data right down here. Now, like I said, this isn't the easiest data to work with if you don't really know latitude and longitude data, but it is really easy to visualize. So, let's come up here; I'm going to get rid of our filters, but I'm going to add a visualization. And the data we'll connect to is apparently called our untitled data set.

Let me change this: I'm going to rename it as our location data. We will use some other location data later, but that's state and county, so we should be good. Let's use our location data. What we need is actually a map, and we didn't do a map when we were creating and building dashboards in a previous lesson, so this is a new one for us. Let's come right down here and go to a point map. Now, you can see we already have the coordinates available for us, longitude and latitude, so this should be really easy. For the longitude, we'll use longitude, and for the latitude — believe it or not, this is going to sound insane — we're going to use latitude.

Now, this is a little tricky to work with; sometimes it takes kind of a double-click. So I'm going to double-click in, and you can kind of scroll back out. There are some things we need to customize, because this is just super blurry. I'm going to go to the size setting and decrease the size a little bit. I think that makes it a lot better.

We can always double-click and zoom out a little, just to have enough in view. I will say, if we zoom out more, you'll see some information right here. I think this is Hawaii; don't cancel me if I'm wrong about that, geography is not my best area. Then we have Alaska up here. So maybe we want to include all of that, and that would be fine if we do, but I'm going to come right in here to what I want to keep, which is the mainland United States. So we're going to keep this.

This, to me, is really useful, and it makes a lot of sense. Some of the most densely populated areas are over here in the Midwest and the East, and then right over here, based on emissions, right? Emissions are much lower. Even in California — there are a lot of people in California and on the West Coast, but not as many emissions. So, just looking at our visualization, most of our emissions are over here on the right-hand side, kind of centralized.

What is this? Like Ohio. Chicago is in Illinois; it's like the Illinois, Ohio area. I'm not a geography expert, don't hate — I just don't know geography that well. So, this is really interesting. I would not have guessed it, but this is official EPA data, so it's pretty useful.

Now, real quick, I'm going to come in here and create our title. We're going to call this the United States Emissions Breakdown, and I need to spell it properly. Let's put it in the middle.

Let's make it bold, and we'll make it larger. Not going to get crazy with it. Maybe we'll do one more — let me see. That's too big. All right, let's go back to 24. Underneath it, though, I'm going to add some notes and make them a lot smaller. We'll say this data was collected by the EPA — I'm going to write out Environmental Protection Agency. It's always tough to watch people watch me spell, by the way; I'm not good at spelling "environmental." And then I'm going to say: in 2023. I think this is fine for now. If we want to add something to it later, we can, or make it larger, whatever we want to do. But I'm going to keep it just like this for now. Actually, I think I do want to make this part, at least, bigger. It seems like it should be; that just feels better. I don't know why. Don't get on to me.

Now, we can add more visualizations to this, and of course we will. But this first one is probably the easiest, because we're not really digging into the data itself.

We're more just trying to visualize it, because latitude/longitude data is pretty specific. Now, let's come back over here to our SQL editor. I'm going to try using some AI on this next one, because I think the next one is going to be a little bit trickier. Let's just create a new query, because I think that's perfectly fine. But I want to make sure it's in the right schema, and it is not. Let's go to emissions, then to default. In fact, I could have just copied this over, honestly. So I'm going to come over here and paste this in. Now I'm going to edit this, because I want to focus on a new thing.

I want to take a look at some data. I'm going to do this because it's what I personally do: I'm going to duplicate this tab and bring it back to the catalog, go to emissions, go to default, go to emissions data, and look at the sample data. The reason I'm doing this is that I want to be able to one-click over and take a look at my raw data. You can also just create a query and tab over, but that's not my workflow; I like having a separate tab for it. You can do that if you'd like; just do a SELECT of everything from emissions data.

Now what we're going to do is use the AI, and what we're going to be looking at is this county name as well as the population. So for each county, and this one is in Alabama, this is the county-state name, I want to look at this county and this population. Let's take a look at our emissions, but more specifically, I want to look at the emissions per person. So we're going to have to do a calculation here. I'm going to ask the AI: I want to look at the emissions per person in each county. Let's go ahead and generate this and see if it gets exactly what I want. It may or may not. And let's accept this; I'm just going to hit tab. So we're casting this as a double.

And then we're casting the population as a double, and then we're saying that's the emissions per person. Now, this theoretically should work, right? But let's go over to our GHG emissions right here. Here's the thing, and this is not a Databricks-specific thing; this happens all the time on almost any platform: this is a number. This is 100% a number, but it has a comma in it. It should be read as a number, but you can see right here, and I can just tell you right now, this query is going to fail, because it needs to be changed. Let's go ahead and try to run this.

Now, the reason why this is going to fail is that this is a string, right? We have letters, we have characters that are not numeric in this column, and that is an issue. So what we need to do is convert it, or we can also ask the assistant in here to diagnose the error. I'm going to see if the assistant can do it for us, because if it can, that would be fantastic. Let's see if it's able to do this. This is kind of a specific fix; it needs to look into the data.

And it looks like it got it perfect. It says you need to remove the thousands separator before you cast it. This looks excellent. Let's actually replace this and get rid of that. And now let's try running this. And wait a second, we have latitude and longitude here.

Let's go back, because I didn't want latitude and longitude in the first place. I was just looking at those and got mixed up. Okay, that's fine, we'll keep this. But before we even run it, I'm going to say: I don't want latitude (did I spell it right?) and longitude; I want the county and the population for those columns. I'm just using AI here. Normally I would just fix this myself; I just want to test it, because I think it's an interesting thing.

All right, let's accept this, try running it, and see what happens. All right, we're running into some issues, and I'm just going to tell you: that's going to happen, and that's okay. Let's take a look. One, I don't think we even have a county column. Let's go right over here. We have a county name column, which actually should be fine since we're not grouping on anything, but I actually want this column right here. So I'm going to copy this as the column name, and we'll come right back here. Population should be correct. And then we're working with a REPLACE. I believe we need backticks for this, not brackets. Let's just go ahead and try that and run it. All right, now this is working. Listen, AI is not perfect. Sometimes we've got to step in and do what we do. So this looks correct to me.
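To make that fix concrete, here is a minimal sketch of the corrected pattern using Python's built-in sqlite3, which also happens to accept backtick-quoted identifiers. The column names and sample rows are illustrative assumptions, not the actual EPA data; in Databricks you would run the equivalent SQL against emissions.default.emissions_data.

```python
import sqlite3

# Illustrative stand-in for the EPA table: numbers stored as text with
# thousands separators, column names containing spaces (assumed names).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emissions_data (
        `County State Name` TEXT,
        `Population`        TEXT,
        `GHG Emissions`     TEXT
    )
""")
conn.executemany(
    "INSERT INTO emissions_data VALUES (?, ?, ?)",
    [
        ("Baldwin County, AL",     "231,767",   "1,204.7"),
        ("New England, ND",        "5,160",     "30,702.0"),
        ("Los Angeles County, CA", "9,721,138", "4,860,569.0"),
    ],
)

# The assistant's fix: REPLACE the comma before CAST-ing, and quote the
# spaced column names with backticks (Databricks SQL style), not brackets.
rows = conn.execute("""
    SELECT `County State Name`,
           CAST(REPLACE(`GHG Emissions`, ',', '') AS REAL)
             / CAST(REPLACE(`Population`, ',', '') AS REAL) AS emissions_per_person
    FROM emissions_data
    ORDER BY emissions_per_person DESC
    LIMIT 10
""").fetchall()
for name, epp in rows:
    print(f"{name}: {epp:.2f} per person")
```

Note that without the REPLACE, SQLite would silently mis-parse "30,702.0" as 30.0, while Databricks raises a cast error; either way, the comma has to go before the CAST.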

Let's take a look really quick, because we are breaking it down by the county-state name, which is what I would do. I wouldn't want to use the plain county name, because what if another state also has a county named Baldwin County, right? When we start trying to visualize this data, it's probably going to try to group that data together, and that will cause issues.
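That ambiguity is easy to demonstrate. In this tiny sqlite3 sketch, two rows stand in for the real Baldwin Counties in Alabama and Georgia (the emissions numbers are invented): grouping on the bare county name silently merges them into one row, while the county-state name keeps them separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (county TEXT, county_state TEXT, emissions REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("Baldwin County", "Baldwin County, AL", 10.0),  # invented numbers
    ("Baldwin County", "Baldwin County, GA", 7.0),
])

# Grouping on the bare name merges two different counties into one row...
merged = conn.execute(
    "SELECT county, SUM(emissions) FROM t GROUP BY county").fetchall()
# ...while the county-state name keeps them apart.
split = conn.execute(
    "SELECT county_state, SUM(emissions) FROM t GROUP BY county_state"
).fetchall()
print(merged)  # one combined row
print(split)   # two separate rows
```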

Now, what we're going to do with this is look at the top 10, based on the emissions per person. I think that's all we should do. So I'm just going to add it this time; I'm not going to ask the AI. I'm going to say ORDER BY emissions per person, descending. Actually, I should also limit it, by like 10 or something, so I'm going to say LIMIT 10. There we go. Now, let's run this. And I'm super interested, because this is real data, so I'm really curious.

So, for emissions per person: the population here is only 5,000, but they have a lot of emissions. That's in New England, and this one is in North Dakota. New England, North Dakota. Very interesting. So based on the emissions per person, these are the places, and this is not what I would have guessed. I would have guessed like New York or, I don't know, North Carolina or something like that. But we've got New England and North Dakota. And then I think MO is Missouri; don't quote me on that. But this looks good to me. This query was a little bit more challenging than our last one.

I think you would agree. Let's bring this over. We're going to go back to our dashboard. I could also make this another tab, and it looks like the layout reset; we'll fix that at the end and place things how we want them then. Now, what we actually need to do is put this in our data. So I'm going to come in here and place this in there. Again, it's right here, the emissions; we can change this, so we don't have to do that. Let's go ahead and run this.

And our data is working. Even though it's underlined, it's still reading it in fine. We do have it done correctly. Like we did before, we can always fully qualify the table as emissions.default.emissions_data just to get rid of those red lines; we can do that. Now we have our data down here. This is really useful. We don't necessarily have to even use this column, but we do have the population alongside the emissions. And let's get rid of our limit, because I want to make a specific type of visualization with this: a scatter plot. Scatter plots are really great, because we're going to be able to put the population and the emissions per person on a scatter plot and take a look.

Hey, as the population increases, does the emissions per person increase as well, or does it decrease? A scatter plot would be great for this. Let's call this one emissions per person; that's what we're going to call it.

Let's come over here, and we're going to choose our emissions per person. On our x-axis we can choose the emissions per person; we can switch this around and see which way works better. Then we'll also choose the population. Let's scroll down a little bit. Now we need to change this from a bar chart, because that does not look right; we're going to change it into a scatter plot. Now, for the scatter plot, we want the raw data, we don't want any aggregation, so let's say none here, and for the sum of population, none as well. That should be good. We can also really quickly change the point size, because it's a little blurry and I want a slightly more granular view of what we're looking at. We have this one up here: the emissions per person is 0.5. Okay, so that's low, but the population is 10 million. I'm curious where this actually is. Let's come down here to the tooltip and add in the county-state name. So now when I hover over it, and it's so small, this is Los Angeles County. This is actually really good: that's a very low level of emissions for that number of people.

And you can see that, because it's over on this left-hand side. If we had something over here, that's not going to be good. This is one of our New England ones: they have emissions per person of 5.95, and I think that's in megatons, so that's a lot of emissions. Their population is about 5,160. That's not a lot; it's a little over 5,000, and they have a lot of emissions. So as you can see, the areas that actually have a higher population tend to have lower emissions per person overall. I mean, these are Cook County, Harris County in Texas, Maricopa County in Arizona, and a lot of these are actually fairly low per person. The areas that don't have a lot of people actually tend, it looks like, to have higher emissions for the lower population.

I mean, some, of course, are over here with very low emissions per person. It's just really interesting; I'm kind of looking into this as we go. Let's add a title. Instead of emissions per person, we'll say emissions versus population. And we could add some type of description. We don't have to, but I'm just going to say: higher populations tend to have lower emissions per person. We could go more in depth and add more notes, and that would be useful, but we're not going to. But I think that's really interesting; I'm kind of fascinated by that myself. So we have this emissions-versus-population one. Really quick, I'm going to add a title to the other one too. I'm going to say emissions per location, and I'm just going to keep it like that. I think that's perfectly fine.

Let's scroll down really quick. We're going to add in some more visualizations. I'm just going to put these here so that we can scroll down a little easier, but now we have these ones in here with us. Let's come back to our SQL editor and add a new tab.

So, I'm going to say a new query here.

Now, what I want to look at is the total emissions by state, because I want to see which states are just doing the worst. They have so many emissions; they're the top 10 worst states in America. Actually, I wouldn't say that; that's a bit extreme. We will need to reuse this REPLACE really quickly, so I'll just copy it over. In fact, I feel like I can write this quite quickly. If we're in the emissions default here, what we need to do is come over here and, let's see, I don't want to have to break this out by the state by hand. Over here we have the state abbreviation, so let's use it. Let's copy this column name and bring it back. And all we're going to do is keep the state abbreviation; we don't need to do any of these calculations, so we should be able to get rid of all of that. And let me see what I'm doing wrong here. Okay, it looks like I have an extra parenthesis. Then we're just going to say total emissions.

That should be fine. Then we need to group by. So we're going to say GROUP BY, because if we look at our data, we have a lot in Alabama, right? We have a lot of counties in each state, so we have to group on this state abbreviation. And then, I'm not going to go over to it, but we're also going to do our sum right up here; we'll actually have to do the sum in just a second.

So, we're going to say group by the state abbreviation.

And there we go. And we'll take this. Then right up here we just need to do the sum. So I actually did need that extra parenthesis; I got rid of it because I was shortsighted and got too excited. And we'll keep the top 10, actually, because I know that when we start visualizing this, we're not going to need all of them. Although, let me just comment the limit out for a second, because I am curious to look at all the states and see which ones are highest. So for total emissions it's Texas, Florida, Ohio, Illinois, Georgia. That's a lot of the South and Southeast, and not a lot from the West Coast. Let's come down here: we have Vermont, Rhode Island, AK (is that Arkansas?), DC, and ME, Maine. So some of the Northeast is actually quite good, along with New Hampshire. But the West Coast and kind of the Midwest tend to be really low on emissions, probably because their populations are lower, though maybe not, given what we looked at earlier. Anyway, let's limit this to 10. Then I'm going to copy this over, and we'll go back to our emissions dashboard.
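Before moving on, the state rollup we just wrote follows a simple shape: clean the text column, SUM it, and GROUP BY the state. Here is a sqlite3 sketch with invented rows; the column names are assumptions based on what appears in the video.

```python
import sqlite3

# Illustrative mini table: one row per county, emissions stored as text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE emissions_data (
        `State Abbreviation` TEXT,
        `GHG Emissions`      TEXT
    )
""")
conn.executemany(
    "INSERT INTO emissions_data VALUES (?, ?)",
    [("TX", "1,500.0"), ("TX", "2,500.0"), ("FL", "2,000.0"), ("AL", "900.0")],
)

# One row per county means we must GROUP BY the state and SUM the
# cleaned, cast emissions to get a single total per state.
rows = conn.execute("""
    SELECT `State Abbreviation`,
           SUM(CAST(REPLACE(`GHG Emissions`, ',', '') AS REAL)) AS total_emissions
    FROM emissions_data
    GROUP BY `State Abbreviation`
    ORDER BY total_emissions DESC
    LIMIT 10
""").fetchall()
print(rows)  # Texas first, with its two county rows summed
```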

Let's scroll down.

And actually, we've got to go to the data first. Let's create this and we'll run it. I need to have our emissions and our default. That should be good. Let's run this data. And then I'm going to rename this: I'm going to say total emissions per state.

That should be good. So now we have these; these are our top 10. What I actually want to look at this for is the percentages. We're going to have to do some other analysis, because I want to add something that you can't necessarily visualize. We're just going to have to dig into that in the data really quickly and perform a little calculation. It's nothing crazy, but we may use AI for it; we'll see if it can work for us. We'll do total emissions by state, and we're going to do a pie chart right here. Now, for the angle, all we have to do is use the total emissions, but we want to break out our color by the state abbreviations. We definitely need to add the labels here, so let's add our labels. There we go. And it has the legend right down here. We have Texas; that's 20% within our top 10. These are the total emissions. If we tried to add all of them, and you know what, let's just go try it, we can get rid of the top-10 limit. Let's run this, and we'll have all of our data.

Let's go back to our dashboard.

And now we have all of them. So it looks like Texas is 10%. But that is a lot, right? We have a lot of different colors, and it's really hard to see almost anything, if I'm being honest.

If we want to, and maybe this is a good thing to do, we can add a filter right here. I haven't done this at all in this video yet, but maybe a range slider would be good. For the field, we can do, let's see, total emissions per state; we'll look at the total emissions. And let's see if I can just kind of scroll up and do something like this. That might work. But then again, it's then only doing the calculation, or the percentage, based off of that selection. So we're just reinventing the wheel here; I'm just demonstrating it can be done. But I don't think we need this, so all that being said, we're actually going to delete that.

But that's okay. Let's go back. We're going to add the LIMIT 10 back. Listen, I'm here for education purposes, right? I want you to know what you can do. If you want to do that, you can, but we're not going to. Let's go back here. Now we have this. This looks good, but I don't like this title, actually. Right here, I'm just going to say emissions percentage, or something like that is fine; I feel like that's good enough for a title, even. We can, and I think we should, come in here and add some type of description, some information. So I'm just going to say: the top 10 states account for X amount of emissions for all of the US. Now, I'm saying X amount because we don't have that information yet; we actually need to write that query. Let's go back to our SQL editor and dig into this.

I'm going to try to use this assistant, because I feel like it might be able to understand what I'm doing. I'm going to paste this in, and I'm just going to say: I want to know (I need to write this right) what percentage of emissions the top 10 states account for, for the whole country. And let's see if it understands that. I don't know if I even wrote that well, but let's see what it does with it. It's writing a CTE; I'm just going to watch what it writes. It's calculating the top 10, and then it's selecting the sum of the total emissions. Okay, this could work. Let me take this query real quick, and I'm actually going to open up a new tab, or a new query.

And let's make sure. Oops. I need to get all this.

Oh, it's doing it for a notebook. That's all right. I'm going to get rid of that, because I'm going to come down here to emissions, to default. Let's paste this in here. And this may work. Let's run it.

Okay. So this is the top 10 emissions, and this is the top 10 percentage. I believe this is correct; you can validate it. What it's doing is, we don't need this SQL here, it's creating a CTE, calling it the top 10, and in it selecting the state abbreviations; we have this query taking the sum for the top 10, and then we're using that later on. So now we're doing a SELECT, and this is the sum of the total emissions from up here, and then we're taking the sum of the total emissions divided by this number right here, which is all of them.

Now, wait a second, I may be wrong on this, because we're only pulling from the top 10. So maybe it's just looking at the total emissions from up here as the top 10 emissions, taking the sum of the total emissions, and then selecting the sum from emissions data. No, you know what? No, it's doing a subquery in here; I missed that. This subquery is actually selecting all of the emissions from the emissions data. This is great. I'm going to say it accounts for 51%.
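The pattern the assistant generated, a CTE holding the top 10 plus a subquery over the whole table, can be sketched like this. For brevity, the sketch starts from invented, pre-aggregated per-state totals; the generated query computed those totals from the county rows first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE state_totals (state TEXT, total_emissions REAL)")
# Twelve invented states, so a genuine "top 10" subset exists.
conn.executemany(
    "INSERT INTO state_totals VALUES (?, ?)",
    [(f"S{i:02d}", float(100 - i)) for i in range(12)],
)

# CTE: the top-10 states. Subquery: the grand total over ALL rows.
pct = conn.execute("""
    WITH top10 AS (
        SELECT state, total_emissions
        FROM state_totals
        ORDER BY total_emissions DESC
        LIMIT 10
    )
    SELECT SUM(total_emissions) * 100.0
           / (SELECT SUM(total_emissions) FROM state_totals) AS pct_of_total
    FROM top10
""").fetchone()[0]
print(f"top 10 account for {pct:.1f}% of all emissions")
```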

This is a great job, AI; I'm glad it did well. I think I am going to make a duplicate of this; that's just my personal workflow. Okay, so let's come in here, and we're going to say that the top 10 account for 51%. Let's make sure I spell this right. The top 10 states, actually I'm going to say these 10 states: these 10 states account for 51% of all emissions. I'm going to say of... what am I doing... of all emissions in the US. I'm just having a tough time writing. That is a really useful statistic. So I think this one is done. I think this is a really good visualization. Look, pie charts have their place; percentages like this, I think, are really useful.

Let's come back to our SQL editor. I'm going to have that over here, so I'll have my SQL editor, the emissions dashboard, and the data. Now, what I want to do is name and shame a little bit, okay? I want to take a look at the county-state name: which specific county? We kind of looked at this earlier when we were getting our population data, because we had to group it with a GROUP BY, but now we're actually using the county-state name. We're going to take a look at the top 10 counties within the US, and we want to create a bar chart for this. So let's use this and come right back here. That was by far the hardest query we've written today. AI did well; we gave it a lot of the context, but I think it did a good job.

Let's bring in some of our previous work. This was a really simple one here; actually, let's use this one, because it already has a lot of the information we want. Now, all we're doing is looking at the total emissions, not per person, so let's get rid of this part; we're literally just looking at it like this. So we have the county-state name. Let's go to emissions, default. We have the county-state name from emissions data. And we don't have to do this... actually, we do need to do it.

Let's close this parenthesis and call this total emissions. Then we have to use this alias to order by it down here. Well, we don't have to, technically; we could just copy this whole CAST-and-REPLACE expression, but we're not going to. So now we want to look at the top 10, and this should work. Now we have the county-state names in Arizona, Texas, Illinois, Florida. Geez, just so many emissions. Think of the world; think of your country. Let's come in here. No, we need to go back to the data first. Let's create from SQL, and we're going to put this in here. Let's go back to our emissions. Let's run this. This should be good. Awesome. So now we have the highest total emissions right here.

I mean, listen, even though Los Angeles County is in the top 10, it's because their population is four times as much as some of these. It's really interesting. That's why we did the emissions per person earlier: because there are outliers, and the total population does matter. Let's come in here, and we're just going to rename this county shaming. All right, that's what we're going to call it, because that's what we're doing; let's be honest. All right, we're just going to make a bar chart. We're going to go to the county shaming data. For our x-axis, let's use total emissions. Then for our y-axis, we can use the county-state name. And I do want to look at that from highest to lowest, so let's set it like that. We can change the coloring if you want. I definitely want tooltips, or not tooltips, sorry, labels; I definitely want labels on this. This is just our total emissions in, like, megatons. So I'm going to add a title. I'm going to say total emissions; I've got to spell it right. Emissions, by metric ton, I think, of CO2. And I want to make sure that's correct, because, listen, I'm going to send this dashboard to the EPA right away. So we have metric tons of CO2, or maybe CO2e; I don't know if I should be putting that in here. Let's add that.

All right. This looks great. Let's adjust this real quick. I'm actually going to come over this way. Double click. Double click. Come out a little bit. This resizing is something I just have not gotten the hang of; I don't know if I'm doing something wrong, but let's click in on this and have that right there. We need to bring back our emissions versus population and bring it up.

Now, I do... oh, what did I do here? Oh, I think I might have clicked on... yeah, I zoomed in. Whoops. Don't mind me; listen, I make mistakes all the time. I want to clean this up just a little bit, things like the axis labels. I just want this one to say total emissions, right? I'm cleaning up these x and y axes. I'm going to call this one the county name. Maybe I should say county-state name, I guess, to be more accurate. That looks good. Emissions: this one looks good. Here we need... oops, let me zoom out. We need to rename this one too: emissions per person. We're going to call it emissions per person. Oops, did I spell that right? It does not look right, but maybe it is. And then I'm going to change this population label to be capitalized, Population; I just think it looks better.

And let's see: emissions per location, emissions versus population. Where's the title here? Oh, I've got to do it up here. I'm going to say Emissions for Continental US. Did I spell that right? No. Continental US. My spelling is so terrible. But we're only looking at the continental US here, so that might just be worth noting. And we are done.

We are done here. We have our United States emissions breakdown, and I think this looks great. We dug into some of the data a little deeper than we probably needed to, but I think this was great, and we were able to use AI and build out our dashboard. I am super happy with how this turned out. Remember, this is a real data set, and there's so much other data in here. So if you want to, you can use the assistant in here and say, "Hey, what other things can I look at for emissions?" and it'll give you suggestions. But this is what I wanted to build, because I was super interested in it. And emissions is just something that everybody understands; people are curious about it. So visualizing it, being able to see it, and kind of shaming some of these counties that just have so many emissions: it's worth building out. I hope that you enjoyed this and found it helpful.

Before we go, I want to give a huge shout-out to the sponsor of this entire series, and that was Databricks. Databricks has been so awesome to work with, because I already love Databricks; it was an easy fit. And what's even better is they have this Databricks Free Edition. If you have not already, if you just watched through this without following along, create a Databricks Free Edition account. Go ahead and do that. I will leave a link in the description so you can build something like this completely for free. It is an amazing, amazing deal. With that being said, thank you guys so much for following along through this entire project. I had a lot of fun, and I hope you did too. If you have not already, be sure to like and subscribe, and I will see you in the next video.
