Data Analysis With AI In 21 Minutes

By Tina Huang

Summary

Key takeaways

  • **AI aids human coordination and reduces errors.** AI can help bridge communication gaps and minimize mistakes in collaborative tasks by summarizing discussions and acting as a verification layer for human input. [01:35], [03:23]
  • **Use AI for tedious tasks, not creative ones.** AI excels at automating repetitive and mundane data cleaning and visualization tasks, freeing up humans for more complex problem-solving and strategic thinking. [02:09], [04:16]
  • **The DIG framework guides AI data analysis.** Approach AI data analysis using the DIG framework: Describe to understand the data, Introspect to uncover patterns and potential issues, and Goal-set to define clear objectives for the analysis. [07:31], [10:15]
  • **AI enables complex data filtering beyond traditional tools.** AI can intelligently filter and analyze data based on nuanced criteria, such as job preferences for location and specific industry skills, which would be extremely difficult with standard tools. [13:45], [14:24]
  • **AI can automate multimedia analysis and file organization.** AI can process various media formats, extract frames from videos, apply transformations, and even organize and rename large collections of files within zip archives. [17:05], [18:19]
  • **Transform AI analyses into downloadable software.** Complex sequences of AI-driven data analysis steps can be automated by instructing the AI to generate a Python script, which can then be downloaded and run as an executable program. [19:13], [19:49]

Topics Covered

  • The ACHIEVE Framework: When to Use AI for Data Analysis.
  • Don't Skip Steps: The DIG Framework for AI Analysis.
  • AI Filters Data Intelligently Beyond Traditional Tools.
  • AI Automates Analysis and Ensures Reproducibility.
  • Build AI Applications from Analysis, No Code.

Full Transcript

I learned how to do data analysis with

AI for you. I guess we can call it vibe

analyzing. But really though, I took 11

courses on this topic. What can I say? I

used to be a data scientist at Meta. I

love data. I use data every single day.

I'm going to save you the time and money

that I spent buying these courses and

give you the CliffsNotes version of what I

learned. As per usual, there'll be

little quizzes throughout this video.

So, pay attention. All right, let's go.

A portion of this video is sponsored by

LTX2. The outline of today's video is

first I'm going to cover when is it

useful to use AI for data analysis. Then

we'll talk about the dig framework for

how to approach analysis. But of course

to make it all concrete we need some

examples. So I'll then be showing you

lots of examples. And finally I'll

explain how to take this even further

and take your data analysis and build it

out into dashboards or even AI

applications. I do want to make a note

that a lot of the courses and examples

are focused on using ChatGPT as the tool

for data analysis. But that is not to say

that you have to use ChatGPT. In

fact, you can switch it out, and Gemini

and Claude would work the same way. In

fact, sometimes they actually work

better. So, don't feel like you need to

be married to a single tool. And I'll

actually call out if there is a tool

that I think would work even better.

Okay, let's start off with when we

should be considering using AI for data

analysis. Well, from the course ChatGPT

Advanced Data Analysis, Dr. Jules

White, the instructor from Vanderbilt

University has an acronym for this

called ACHIEVE. He explains that there

are five areas where AI is

useful for data analysis: aiding

human coordination, cutting out tedious

tasks, helping provide a safety net for

humans, inspiring better problem

solving, and enabling great ideas

to scale faster. ACHIEVE. Aiding human

coordination refers to helping people

work better with each other because

people actually tend to be quite messy,

you know, and there's a lot of

miscommunications. There's a lot of like

back and forth between people. So,

there's a lot of room for improvement

here that AI can help with. Say, for

example, you're in a meeting with a

bunch of people and you have this

meeting transcript. You can actually put

it into AI and say, "Act as my assistant.

Read the following meeting transcript

and provide me a summary of the key

points of discussion." This is just an

example. I'm sure you can think of a lot

of other scenarios where there is a

bunch of data that can be analyzed such

that you can provide more clarity for

humans. The second part of the framework

is to cut out tedious tasks. Just as the

name suggests, it's best to let AI

do things that are very repetitive

and boring for people. For example, say

you're hosting a workshop and you ask

people to sign up for the workshop and

provide different types of information

like what their name is, what their

occupation is, which department that

they're in, what their interests are.

Instead of having to go through this and

manually analyze it, you can tell the AI

that this is the list of people that

registered for my workshop on prompt

engineering and ChatGPT. Describe the

data in this file. By the way, asking AI

to describe data is a best practice

which we'll cover a little bit later.

But yes, so the AI will be like, okay,

like you know, this file contains this

type of information. It's a CSV file and

it contains like timestamp, name, the

email, the department, um this is a

university workshop, how they're using

ChatGPT and what tools they use already, the role

that they hold, etc. You might notice

that people are filling their department

names in a lot of different variations.

So you can actually ask the AI, there

seems to be a lot of overlap between

departments with alternate spellings.

Can you list out all the departments and

then do some intelligent grouping of

them? And then you can ask it to create

a bar chart showing the total number of

registrations per department. This is

the kind of data cleaning and

visualization that is pretty mundane and

AI is able to do this much more quickly.
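
To make that concrete, here is a rough sketch of the kind of Python such an AI typically writes behind the scenes for this cleanup. The column name, the department variants, and the canonical mapping are all invented for illustration:

```python
import pandas as pd

# Hypothetical signup data; the real rows come from the registration CSV.
df = pd.DataFrame({
    "department": ["Comp Sci", "Computer Science", "CS",
                   "Biology", "Bio", "biology", "History"],
})

# "Intelligent grouping": map each observed variant to one canonical name.
# In practice you would ask the AI to propose this mapping from the data.
canonical = {
    "comp sci": "Computer Science",
    "computer science": "Computer Science",
    "cs": "Computer Science",
    "biology": "Biology",
    "bio": "Biology",
    "history": "History",
}
df["department"] = df["department"].str.strip().str.lower().map(canonical)

# Total registrations per department; counts.plot.bar() would draw the
# bar chart if matplotlib is installed.
counts = df["department"].value_counts()
print(counts.to_dict())
```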

Third part of the framework is to help

provide a safety net for humans. You see

people often say like oh AI has a lot of

hallucinations and that is true. AI does

hallucinate but people hallucinate too.

People make a lot of mistakes, like some

really really dumb mistakes. I make dumb

mistakes like literally constantly.

Wrote my name wrong on a form yesterday

for example. So that is why having AI as

a backup is actually a really great

idea. Say for example, you're on a

business trip and you need to ask for a

reimbursement. So you need to like

generate some invoice thingy and then

make sure you have all the fields

covered. If you're anything like me, I

am not very detail-oriented. I probably

will make a really dumb mistake. So, as

a safety net, you can actually upload

this invoice into the AI along with the

business expense policy and ask it to read

each page of the attached business

expense policy and see if the attached

receipt complies with it. So many other

examples of this. Every time you need to

submit like an insurance claim, you need

to like read some sort of document uh

looking at like travel policy whatever

like yeah so many examples of this. The

next part of the framework for when to

use AI for data analysis is the I,

which is inspire better problem solving

and creativity. This is another thing.

People always feel like AI is going to

make people less creative and is going

to take over creative things, but that

is not the case. It's all about asking

the right questions. Say, for example,

you have a PowerPoint presentation that

is really, really important. You can

actually upload those slides into AI.

Ask it to quickly summarize the deck and

then ask it to act as a skeptic of

everything I say in this presentation

and find flaws in my assumptions,

assertions, and other key points and

then generate 10 hard questions for me.

This is a way for you to actually force

yourself to think about the questions

that you could potentially be asked and

come up with better ways and better

solutions for answering them. People are

actually very rigid creatures. We tend

to think in a very specific way and it's

very hard for us to actually expand past

that. So using AI as a tool to help us

expand our creativity is actually

really really helpful. And finally the

last part of the framework for when you

should be using AI for data analysis uh

is the E, which is enable great ideas to

scale faster. Let's go back to that

workshop example. You're doing a

workshop on prompt engineering and you

have people coming from all types of

different backgrounds um who are

interested in all types of different

things and all types of different

levels. After you give the AI the signup

form and analyze all the data about your

participants, you want to create a cheat

sheet for each of them that is most

relevant to them. What you can actually

do is ask the AI to map each attendee

with their specific domain of interest

and then generate a column called ideas

that includes the corresponding

idea/prompt

to put on their cheat sheet. This

way now your CSV file for each attendee

also contains a very specific prompt

idea that is specific for that attendee.
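
If the AI wrote that mapping step out in pandas, it might look something like this sketch. The column names and the interest-to-prompt table are invented for illustration:

```python
import pandas as pd

# Hypothetical attendee data standing in for the real signup form.
attendees = pd.DataFrame({
    "name": ["Ana", "Ben", "Cleo"],
    "interest": ["marketing", "teaching", "research"],
})

# Interest -> tailored prompt idea; in practice the AI drafts these.
prompt_ideas = {
    "marketing": "Draft three ad copy variants and critique each one.",
    "teaching": "Turn this lesson plan into a quiz with an answer key.",
    "research": "Summarize this paper and list its weakest assumptions.",
}

# Add the 'ideas' column so each row carries a personalized prompt.
attendees["ideas"] = attendees["interest"].map(prompt_ideas)
attendees.to_csv("cheat_sheets.csv", index=False)  # one row per attendee
```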

And then after the workshop, you can

actually send them an email with their

specific little cheat sheet. I'm sure

you can see prior to AI, this would have

been so hard to do if you have more than

just like 10 attendees to be able to

come up with like a custom cheat sheet

for each person. So, whenever you're

thinking about if you should use AI to

do a certain analysis, you can think

back to this acronym. Of course, I

haven't yet covered exactly how

you should approach these

analyses, which is what I'm going to be

covering next.

But first, let's do a little quiz.

Please put your answers in the comments.

This portion of the video is sponsored

by LTX2, the new AI video engine for

creative workflows. And this one

honestly blew me away. What stood out to

me isn't just the quality, it's how LTX2

can finally tell a story. And if you

can't guess by my career choice,

storytelling is what I live for. Most

AI video models just give you short

looping clips. A few seconds that look

great but don't really say much. LTX2

though can generate up to 15 seconds of

continuous video with synchronized audio

which means full monologues.

>> Super villains also have feelings.

>> Dialogues or even short scenes with

music and natural sounds.

[Music]

It's smooth, coherent, and feels

cinematic. The kind of storytelling

range that's missing from AI video

tools. As someone who's been creating

content for over 5 years now, this one

really impresses me. You can build short

narratives, generate B-roll and

snippets, or even cinematic transitions,

all with a single prompt. You can now

try out LTX2 to create your own AI

videos. The link is in the description.

Thank you so much LTX2 for sponsoring

this portion of the video. Now, back to

the video. From the course, ChatGPT Plus

Excel, master data, make decisions, and

tell stories. There is a nice little

acronym for how it is that you should be

approaching data analysis using AI

called DIG, which stands for

description, introspection, and goal

setting. By the way, if you do have a

little bit of background in data, like

data science or data analysis, um it's

basically the same as EDA, exploratory

data analysis, but specifically for

using AI. The first step of DIG, which

is describe, is a way for you and the

AI to explore the data together. This

step is very important because it helps

both you and the AI gain familiarity with

the data and to also notice if there's

any issues with that data. This is going

to help a lot with hallucinations and

issues down the road. So very similar to

normal EDA processes, after you upload

like a spreadsheet or whatever data it

is that you give to the AI, like let's

just say like a spreadsheet in this

case, you would ask the AI to list out

the columns in the attached spreadsheet

and show me a sample of the data in each

column. For example, if your spreadsheet

contains data about different roles um

and different salaries, the AI would be

able to output and say, "Here are the

columns for your spreadsheet along with

a sample of the data for each column."

So the column name you could have like

salary ID, job ID, max salary, med

salary which is median salary, min

salary, pay period, currency and

compensation type. You can already

notice that under max salary and min

salary you have NaN, which means not

available. This is important to note

because hallucinations tend to happen

when you have things like missing data

or incorrectly formatted data. So if you

see something like this, the first thing

you actually want to do is confirm

whether the AI is just not parsing

your data correctly. So you actually

want to go in and see like is it

actually not available or is there like

a parsing problem? And if there actually

is a parsing problem, you want to then

tell the AI, hey, you are parsing this

incorrectly. This is actually how you

should be parsing it. Or maybe the data

is just not available. Then you just

want to make a note of this for the

future. We will get back to that. But

first, you actually want to do a few

more random samples. Just ask your AI

like take a couple more random samples

of the data for each column. Make sure

you understand the format and type of

information in each column. What we're

basically doing here is verifying that

the data is being parsed correctly and

the AI has correct understanding of each

of these columns. You can even ask it

what do you think each of these columns

represent and it might tell you that

salary ID appears to be a unique

identifier for each entry. Job ID is

likely a unique identifier for jobs or

positions. For max salary, min salary and med

salary, not all entries have complete

salary data, etc. So this is a way

for you to validate that the AI

understands what's happening and also that

you understand what's happening. The

best way to think about this is that

your AI is a very competent but still

very junior developer or data scientist

or data analyst. So you need to make

sure it understands what data

it's actually receiving. Otherwise

any analysis that you do on top of this

could potentially be wrong. So after you

do this, you want to move on to the next

step of the dig framework which is

introspection. This is when you want the

AI to start looking at the data that it

finished describing and think about the

patterns and relationships that exist in

the data. This is also another great way

to catch any misconceptions that the AI

may have. Notice we're just being

very skeptical all the time. Very

important. You can just ask: tell me some

interesting questions that could be

answered with this data set and why they

would be interesting. And it might come

up with some questions like: is there a

relationship between compensation type

for example base salary and variability

in salary ranges. It says why it's

interesting: understanding

whether certain compensation types like

bonuses or equity are more likely

associated with higher salary ranges can

help employees and employers make more

informed decisions about how they

structure pay packages. Here's a

question it came up with that could be

a red flag. The question is, are there

any noticeable patterns in salary data

for different currencies if additional

currencies exist? So, when you see this,

you want to think to yourself, oh, like,

are there actually other currencies that

exist? And you might want to double

check yourself to see if there's

actually data that is not USD. In this

case, there actually are no other

currencies. So, that's when you want to

tell the AI all currencies are actually

listed in USD. So, it knows that

information moving forward. If you catch

your AI asking questions like these,

questions that you know cannot be

answered by the data set, correct it and

ask it to generate some more questions

that can be answered from the data set. In this way

you're really helping the AI and

yourself make sure that you really

understand what's happening in the data

set. It's also actually quite a good

exercise because sometimes AI will come

up with different things and different

analyses that you might not have

thought of doing. I know this might seem

a little tedious and you just want to

skip to like making graphs and charts

and doing analysis, right? But do not

skip these steps, okay? Trust me.
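
Those describe-and-introspect checks map onto a few lines of pandas that you can also run yourself to double-check the AI. The little salary table below is a made-up stand-in for the spreadsheet in the example:

```python
import io
import pandas as pd

# Hypothetical slice of the salary spreadsheet from the example above.
csv = io.StringIO(
    "salary_id,job_id,max_salary,med_salary,min_salary,currency\n"
    "1,101,90000,75000,60000,USD\n"
    "2,102,,55000,,USD\n"
    "3,103,120000,100000,80000,USD\n"
)
df = pd.read_csv(csv)

# Describe: list the columns, their types, and a sample of the data.
print(df.dtypes)
print(df.sample(2, random_state=0))

# Introspect: where is data missing? Here the NaNs are real gaps, not a
# parsing problem, because the source cells are genuinely empty.
missing = df.isna().sum()
print(missing[missing > 0])
```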

Because like that's the thing with data.

It's one of those things where if you

mess up, it will just propagate

throughout your entire analysis. So it

is really worth the effort to actually

make sure that everything is understood

correctly. And this is actually the same

with humans too. It's not like an AI

problem. Even when I was a data

scientist, I spent a significant amount

of time doing exploratory data analysis,

making sure that I understood exactly

what was happening in data and

clarifying all that information as well

because I knew if I just rushed into

things, I'd probably end up making a

mistake and be very embarrassed when

someone catches my mistake or worst case

scenario get fired because I made a

really dumb mistake and then, you know,

lost the company a lot of money or

something like that. Anyways, so after

you do this, the third step of the

framework is goal setting. It's very

important that your AI understands what

it is that you're trying to achieve.

Like if you just went like analyze this

data, your AI is going to be like, what

the heck, like, you know, what does

that even mean? What am I supposed to

analyze? What's the result? So it's the

same. You got to be like very clear

about what the goal actually is. So you

could tell the AI, my goal is to answer

a couple of these questions, the

questions that, you know, it generated

previously and turn them into a really

exciting interesting report to post on

LinkedIn. This really helps provide

context because then your AI is able to

do this analysis and then is able to do

things like give you ideas for how

you can put it together in

the form of a LinkedIn post. This is going to

be very different if you were actually

analyzing this data in order to, you

know, like do something very serious

like generate like a report for your

boss. This is also just part of good

prompt engineering practices. So, if you

do feel like you want to brush up your

prompt engineering a little bit to be

able to be more clear about what it is

that you want, I do recommend that you

check out a video that I have over here

which covers the foundations of how to

do good prompt engineering. So, check it

out over here. Anyways, whenever you

have a data set that you want AI to help

you analyze, it would be really helpful

for you to go through this dig

framework. It is a really great

foundation and you can build on top of

this as well. Now all the stuff that we

talked about earlier uh is pretty

standard for if you're doing any type of

data analysis using like Excel, Python,

SQL or whatever. It's just like maybe

more convenient doing it in a

conversational fashion with AI. But

there are certain things that you can do

with AI that would be extremely

difficult for you to do just using these

traditional tools. Like for example, if

you're job hunting right now and you

have access to this data set, you could

be thinking like, oh, like I'm looking

for a job that is between like 50 to

$80,000 based on the East Coast and

specifically works with wood. I don't

know, something like that, right? So in

this data, there is no specific section

that's like works with wood/doesn't work with

wood, you know, or like materials that

you're working with. And it also doesn't

specify like is it east coast or west

coast. It's just like the location like

Chicago, right? But because of GenAI's

capabilities, you're able to like filter

through this data um in a way that's far

more intelligent to be able to find the

roles that you could potentially be

interested in. This would be so hard to

do if you didn't have GenAI. And later

on in the video, I have a lot more

examples which I'll show you, like

GenAI-specific really cool data analysis

things that you can do. There's also one

more thing that I thought was really

cool in this module of the course, which

is the idea of traceability and

replication. I thought this was like

super clever because a major issue that

people face when doing traditional data

analysis is that they would like come up

with some sort of thing and then it

would be stuck in like a Jupyter

notebook or like whatever and it would

actually be very difficult for other

people to reproduce that analysis. But

with AI, you can actually ask AI to come

up with a traceability document that

allows other people to be able to

perform the same analysis to validate

the results. You can ask let's create a

traceability document to make sure that

others can one know what data was used

two how the analysis was performed and

three threats to validity. We want a

guide for someone else to be able to

replicate and know the limitations of

the analysis. You can save this

traceability information as like a

readme.md. Uh don't worry if you don't

know what that is. It's just very common

for software engineers to

store it this way, but you can also

store it as a Word document; it

doesn't actually matter. And then for

each analysis and visualization, you can

actually ask it to write a single Python

script that performs the full analysis

to produce the visualization and the

results. I think it's a really really

clever idea and really smart thing to do

if you're doing any type of data

analysis using AI.
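
As a minimal sketch, the "single Python script per analysis" idea might look like this, reusing the salary example from earlier. The inline data, file names, and the toy median-per-job analysis are all illustrative, not from the course:

```python
"""Reproducible analysis: median salary per job.

Traceability lives in README.md: what data was used, how the analysis
was performed, and threats to validity, so others can replicate it.
"""
import pandas as pd

def run_analysis(df: pd.DataFrame) -> pd.Series:
    # The full analysis in one place: median med_salary per job_id.
    return df.groupby("job_id")["med_salary"].median()

# Hypothetical data standing in for the uploaded spreadsheet.
df = pd.DataFrame({"job_id": [101, 101, 102],
                   "med_salary": [60000, 80000, 90000]})
run_analysis(df).to_csv("median_salary_by_job.csv")

with open("README.md", "w") as f:
    f.write("# Traceability\n"
            "1. Data used: the salary spreadsheet (see repository)\n"
            "2. Analysis: median med_salary per job_id (this script)\n"
            "3. Threats to validity: rows with missing salaries\n")
```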

All right, time for our next little

quiz. Please answer the questions on

screen in the comments. Yay. Okay, we

can move on to some examples. I'm really

excited for this section. The first example

is super simple but actually really

helpful: for pretty much any

type of small document, you can just

directly upload it, do DIG on it, and then

have the AI proceed to analyze that

information and transform it in whatever

way. For example, if you have structured data,

say a CSV file that has like

all the different types of inventory

throughout the past few months. You can

ask it to filter for different types

of inventory. You can ask it what are

the trends in the inventory over time:

are there certain items that are

becoming more popular, certain items that

are becoming less popular? Maybe you want

to remove those if there's too much

of that in the inventory. You can even

ask to come up with a predictive model

to see what are the inventories that you

should be stocking for the next few

months so that you're able to optimize

the amount of inventory so you don't

have too much and you also don't have

too little. You can also do

visualizations, like static

visualizations. You might want to make a

bar chart that's ranking all the

different types of inventory that you

have. You might want to make a time

series graph showing the

changes in inventory over time. And you can also make

dashboards as well, interactive

dashboards. For this specific one, Claude,

as of the filming of this video, is a

lot better than the other models to

create interactive dashboards. It also

usually is the best at writing the code

as well in order to do the analysis and

it tends to hallucinate less. Again,

this is at the time of this filming, so

I don't know if this is going to change

in the future, but just FYI. AI data

analysis with different media forms is

also a really cool application. For

example, you can have like a video and

you can ask it to extract 10 frames from

this video evenly spaced 1

second apart. Then you can ask it to

take these images, resize them,

make them like 300 pixels wide, and say

convert them to grayscale and increase

contrast by 30%. You can ask it to do

things like combining the images into

animated GIFs that flip to the next

image at 1-second intervals. You can ask

it to turn the images into PowerPoint

presentations and then catalog all of

the images into a CSV file with the name

of the image and the name of the movie

file that the image was extracted from

and the operations applied to it. So

really really cool that you can

manipulate multimedia using AI. Prior

to AI, this would have been so difficult

to do. Another example of data analysis

using AI is by automating things using

zip files. Zip files are very, very

convenient and they're amazing because

not only are you able to put a lot of

different files into a single zip file,

you're also able to maintain folder

hierarchy in the zip files themselves,

which means that you can actually zip

together a bunch of different files

and ask the AI to mass analyze

them all together. So you can have

multiple Excel files that you're telling

it to combine and search and do whatever

with it. Then afterwards, it can bundle it

all back together and then send it back

to you. Also great: if you need help

organizing different files, you can ask

the AI, one, I want you to help figure

out what is in them by opening and

reading each one to create a summary.

Two, I want you to propose a folder

structure that would better organize the

files. Three, I want you to propose

better naming for each file using just A

to Z and 0 to 9, keeping the extensions.

And four, when you have all of this

done, show me your proposed folder

structure and name. Then it's going to

do that. And then finally, once you're

happy with it, you can ask it to zip

everything up again and then send it

back to you and voila, everything is

well organized and beautiful. And the

final example I'm going to show from the

course is a little bit more advanced,

but so cool. This is when you can

actually turn conversations into

software programs. Let me explain. So,

say for example, you have like a

sequence of analyses that you did,

right? Like for example, maybe you have

like some sort of movie and then you ask

it to get like 10 frames from this movie

spaced 1 second apart, maximized them,

I don't know, did some photo

manipulations on them, and then combined

them together and then generated some

descriptions for them and then put them

all together into a CSV file. You can

then ask the AI, turn this process into

a Python program that I can download and

run on my computer and provide the path

to the documents as command line

arguments. Zip up the program for me to

download and then it can literally go

and actually like write a script that

performs all of these different steps

and then puts them all together in an

executable program. That is so cool. You

can literally do this for any sequence

of analyses that you do. You can just

like automate it like that. At least for

me, that blows my mind. That is so cool.
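
To show the shape of the program the AI hands back, here is a stripped-down sketch. The real frame extraction would need a video library, so extract_frames is only a hypothetical placeholder; the point is the command-line arguments and the CSV catalog:

```python
import argparse
import csv

def extract_frames(path):
    # Placeholder: a real script would pull 10 evenly spaced frames
    # with a video library here; we just name the would-be outputs.
    return [f"{path}.frame{i}.png" for i in range(10)]

def catalog(frames, movie, out_csv):
    # Record each image, its source movie, and the operations applied.
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image", "movie", "operations"])
        for frame in frames:
            writer.writerow([frame, movie, "resize;grayscale;contrast+30%"])

def main(argv=None):
    # The movie path arrives as a command line argument, as in the prompt.
    parser = argparse.ArgumentParser(description="Frame pipeline")
    parser.add_argument("movie", help="path to the input movie file")
    parser.add_argument("--out", default="catalog.csv")
    args = parser.parse_args(argv)
    catalog(extract_frames(args.movie), args.movie, args.out)

# From a terminal: python frame_pipeline.py movie.mp4 --out catalog.csv
```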

All right. I can literally go

on forever, but I'm gonna stop for this

section for now and I'm going to put the

next little quiz onto the screen. Please

answer these questions and put them in

the comments. Okay. So, I wanted to

include this final section because I

wanted to make sure that you understand

that after doing the analysis using AI,

you don't need to stop there. There's

actually so much more that you

can do on top of that. From the examples

that we've already seen, we can take this

data analysis and then use it to

generate emails, use it to make like

social posts, use it to generate

reports, PowerPoint slides, even build

software programs and dashboards. But

that's not all. You can even build

full-on applications based upon these

analyses. And no, you don't actually

need to know how to code. You can just

use vibe coding. Say you've analyzed a

lot of traffic data. You can actually

make this into an application that

analyzes real-time traffic data and then

like, I don't know, gives alerts to

people, or generates reports based

upon traffic incidents. You can have

an application that's able to take videos

and blur out people's faces or like

different identifying information within the

video. Here's an example of an

investment research AI agent that

people who join our AI agents boot camp

will build that has an entire database

with information about investments and

it has this interface where the user is

able to ask it specific questions and it

will analyze that data to generate

certain types of responses,

conversations and reports. Yeah, there

is so much that you can do. Data truly

is power and being able to analyze and

harness that power by using AI just

opens up so many possibilities. All

right, I'm going to stop here. If you do

want to dive into how to actually build

out these like applications, agents, and

things like that, I'll link a few videos

in the description that you can check

out that go into a lot more detail

about how to do this. But for this

video, I'm going to end it for now. I

really hope that this was a very helpful

video for you and you have lots of ideas

for how to analyze your data now using

AI. Vibe data analysis. As promised,

here is the final little assessment.

Please answer the questions on screen

and put them into the comments. Thank

you so much for watching until the end

of this video and I will see you in the

next video or live stream.
