Building an AI Social Listening App with Bright Data
By notJust.dev
Summary
## Key takeaways
- **Build Custom Social Listener**: Couldn't find a tool for general internet brand monitoring across social media, Google, and forums, so built one to track mentions, complaints, and competitor moves. [00:20], [00:52]
- **Bright Data SERP API Scrapes Google Easily**: Use the Bright Data SERP API with one HTTP POST request to scrape Google results as JSON, including organic links, ranks, titles, and descriptions; enable the CAPTCHA solver for reliable results. [07:21], [12:13]
- **Webhooks Beat Polling for Scrapers**: Trigger Bright Data web scraper jobs via API and receive data instantly via webhook instead of spamming status checks; store the snapshot ID in the DB to map incoming data to its dataset. [25:10], [25:43]
- **Chain Scrapers into a Data Pipeline**: Scrape a YouTube channel; the webhook stores it and auto-triggers the video scraper; the videos webhook stores results and triggers the comments scraper, creating an automated multi-stage pipeline. [34:30], [35:03]
- **AI Transcript Prompts Yield JSON**: Prompt OpenAI with a video transcript and an exact JSON schema for summary and key topics; automate via a database insert webhook to analyze every new video. [42:21], [45:07]
- **Cron Automates 24/7 Tracking**: A Supabase cron job every 5 minutes invokes an edge function to scrape tracked items (channels, SERP) if not updated in 24h, enabling autopilot brand monitoring. [54:14], [55:11]
Topics Covered
- Build holistic brand monitors
- Webhooks beat polling
- Chain scrapers into pipelines
- Prompt LLMs for structured insights
- Autopilot tracking frees founders
Full Transcript
Hey there, and welcome to this exciting webinar brought to you by notJust.dev in collaboration with Bright Data. My name is Vadim, and I'm thrilled to guide you through today's session on building an AI-powered social listening application that can help us monitor our brand and get market intelligence using Bright Data tools. Before we get started, let me quickly share a little pre-story on how I ended up building this project. A couple of months ago I was looking for a tool to help me monitor my brand on the internet, and I couldn't find exactly what I was looking for. While there are a ton of specialized tools that can give you insights into a specific social media platform or into Google Search, what I needed was more of a general overview of the health of my business across the whole internet. I wanted to know when someone is mentioning our work, when someone is complaining, or when our competitors are doing something different. Because I couldn't find such a tool, and because I'm an engineer, I decided to build it myself. In today's session I'm going to share with you the whole process and the whole project, giving you a better look at how to implement these features in your business, and I'm going to share all the insights and learnings I got from building this social listening application. Before we do that,
let's start with understanding why social listening matters. I'm going to start with a quick question: how often are you checking what's being said about your brand online? If the answer is not "always," that most probably means you're missing out on a lot of insights from your customers and competitors. This is where social listening comes into play. By using specific tools and monitoring what's being said or what's happening on the internet, you can better understand customer sentiment, you can identify emerging trends in your community or in the general market, you can spot potential PR crises early and take action to prevent them, and you can stay ahead of the competitors. The challenge of getting a good understanding of what's happening out there is that information nowadays is so decentralized that we cannot find it in one single place. We have to check all the social media platforms, different forums, Reddit, search engines, blogs, and even some random forums that you might not even know about. To address this challenge, in this session I'm going to show you how to use Bright Data tools in combination with large language models to get access to any public data available on the internet and then extract key insights using these LLM tools. This is an exciting time to build these kinds of tools, because while we always had access to large amounts of data, we were not able to easily extract insights. Now, with the advancement of LLMs, we can easily process large amounts of data and get the exact information that we need out of it, and that's exactly what we're going
to do today by building a scalable social listening application from scratch. We're going to cover the following topics: how to collect and process data using Bright Data APIs; how to store this structured data in a database so it's easier and faster to access later; how to use it with large language models to extract actionable insights; and how to automate the entire workflow for continuous monitoring. When it comes to the tech stack, I personally use the following combination of tools; however, feel free to replace, change, and integrate whatever makes sense for your project, because a lot of the things I'm going to share today are very applicable to other tech stacks as well. In my case, for the client side I've used React Native and Expo to build a cross-platform application that is going to be used by the user, or in this case by the companies that want to monitor their brands. By using React Native and Expo I managed to build both a mobile application and a web application at the same time, from the same source code. On the back end I used Supabase, which uses Postgres as the database, and for the logic I use Supabase Edge Functions. This back end is the most flexible part here, because it's up to you, or up to your project, to decide how you're going to build it; you can easily replace it with a Node.js backend or with other serverless frameworks as well. For the scraping infrastructure and APIs I use Bright Data, and we're going to get into that in a moment, and for the AI, the large language models, I use the models provided by OpenAI. When it comes to architecture, I'm going to show you a
quick overview of how everything interacts, and in a moment we're going to dive deeper into every single step of the process to understand it better. So now, just so you understand how the parts connect together, here it is. Everything starts with a client-side application; in our case this is a web or mobile application built with React Native. On the other side, when we think about how we're going to get the data, we're going to use the Bright Data APIs to scrape data from Google Search, from YouTube, Instagram, and other social media platforms as well. In order not to have to scrape this data every time we need it, we're going to store it somewhere in a database that we have fast and easy access to. For that I'm going to use Supabase with Postgres, and Edge Functions for the logic of interacting with external services. With the data that I have in the database, in Postgres, I'm going to use another edge function to integrate with OpenAI for the AI analysis. Finally, I'm going to add a cron job in order to run these flows periodically, so we have automation and automatic brand monitoring. So from a high-level perspective, this is the architecture that we're going to cover today, and now we're going to get into every single step and understand how they work together to build this app. To make the most out of this session, I encourage you to take notes and also think about how you can apply these concepts to your own use cases, your own business, and your own brand, and most importantly, have fun. If you want to have access to this
presentation, the source code of the project, and additional materials, make sure to download the files that I prepared for you at assets.notjust.dev/sociallistening. Now that we have a good plan, I think it's time to get started. Let's take it step by step, and the first
objective that I want to cover here is retrieving brand mentions and brand information from Google search results. I want to have a clear understanding of how my brand is performing on Google, how we are ranking for the specific searches that my brand is targeting. To accomplish this objective we're going to use the SERP API provided by Bright Data, which makes it super easy to scrape search engine results. Let me show you how easy that is. We're going to go to Bright Data, and we're going to go to the dashboard. Let's go ahead under Proxies and Scraping, and here, under More, we're going to see the SERP API. On this page we see a playground that we can test the SERP API with, but before we start playing with the playground, let's go ahead and configure a zone. To do that I'm going to press the Add button, and I'm going to press SERP API. This is going to open a page to create a new zone; the new zone will allow our scraper to run and contains its configuration. So give it a name; I'm going to name it SERP API 2. And make sure to have the CAPTCHA solver enabled: with this enabled, Bright Data will automatically solve CAPTCHAs whenever it detects one, whenever Google shows a CAPTCHA. This is crucial if you want good results from scraping the web, and without a tool like Bright Data, CAPTCHAs are one of the most challenging things to overcome. So let's take advantage of Bright Data automatically solving them: leave
it enabled, and let's go ahead and create it. After we have this, we are already greeted with a curl request that we can use to test this out. As we can see, the curl request is towards the api.brightdata.com/request endpoint, and here we are asking Google to search for the query "pizza". So if we take this curl request, open a terminal, and paste the request in, after a bit of time, as we can see, we have the HTML code for the Google page for that query. This doesn't look very user friendly, so I'm going to go back and go into the live playground. The playground is here, and as we can see, we can search for something. I'll search for my brand, "not just dev", and I'm going to press Search. Here I want to mention that we can search on different engines like Google Search, Google Maps, Trends, reviews, and so on, and we can also search on Bing and Yandex. We can also specify a specific location, for example Google Switzerland, or if I'm in Europe I could choose something like ES for Spain as the country, and so on. We can also specify the geolocation that we want Bright Data to simulate when running this search, and this is crucial if you want a better understanding of the global reach of your brand in different countries, regions, and situations, or even devices. With my query,
if I scroll down, we see that this is the page that Bright Data has scraped; it's an actual Google page. But on the right is where things get interesting: we have a JSON with all the data that Bright Data scraped from this search result. At the top we have general information, and we have navigation, which represents the links here for videos, images, and news. But if I search for "organic", this is where we have all the organic links in our search result. We have information like the link, the rank, the global rank, the description, the title, and so on, and as you can see, some of them also have images, like this one from Twitter. This is amazing because it gives us all the data we need from search results, and in our case you can also use pagination to get the next data, because by default we're going to have 100 rows per page. All right, so now that we've seen how we can do this manually, let's have a look at how we can automatically run this request from our application, from our own backend. I'm going to reference back the curl request that we were using here. This is a simple HTTP request: it is a POST request to the URL api.brightdata.com/request. So what we need to do is simulate the same request, send the same request, from
our system. Let's have a look at how I implemented that. I'm going to start by defining a function that receives a query that I want to perform a search on, and it's going to return a structured object with all the links that I'm interested in. The next step is to send a fetch request to api.brightdata.com/request, and we have the following options. For the options, I'm going to specify that this is a POST request; under Authorization I'm going to use my Bright Data API key; under Content-Type I'm going to specify JSON; and the body is the data that we want to send. Here, under the zone, make sure to use the name of the zone that you're using; in my case that's SERP API 1. The URL is going to be the link that you want to scrape, so in my case that's going to be
google.com/search. For the parameters, the important one is q, and q represents the query, what we are searching for. I'm also encoding it as a URI component, because if there are spaces I want them properly encoded in the URL. Another important parameter here in the Google link is brd_json=1, which tells Bright Data not to send the result back as HTML, but rather as JSON; together with line 18, where I say format: json, I'm going to get this in JSON format. The next step: after I make the request, I take the data from the JSON response, and I get the Google data by parsing the body of that scraped data. What this Google data is going to be in my application is exactly the data that we see here in the terminal: it will have the general information, the input, the navigation, the organic results, and so on, all the data here. If you want to learn more about the structure of this data, go ahead and press the documentation link here and learn more about the SERP API. Here you can go into the parsed JSON results, and you're going to see more information about the JSON result, together with the general and input fields; if you scroll down, you'll see the explanation of other fields. For example, the organic field is the main search results in the organic search, and here is the JSON example of how they look.
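As a quick aside, the request just described can be sketched in TypeScript. This is a sketch based only on this walkthrough: the zone name, the function names, and the exact response shape are assumptions, not the author's exact code.

```typescript
// Sketch of the SERP scrape described above. The zone name ("serp_api1")
// and the response shape are assumptions based on this walkthrough.
type OrganicResult = { link: string; rank?: number; title?: string; description?: string };

export function buildSerpRequestBody(query: string, zone = "serp_api1") {
  return {
    zone,
    // brd_json=1 asks Bright Data to return the parsed page as JSON, not HTML
    url: `https://www.google.com/search?q=${encodeURIComponent(query)}&brd_json=1`,
    format: "json", // also ask for the response envelope itself in JSON
  };
}

export async function scrapeGoogle(query: string, apiKey: string): Promise<OrganicResult[]> {
  const res = await fetch("https://api.brightdata.com/request", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildSerpRequestBody(query)),
  });
  const data = await res.json();
  // the parsed Google page is delivered in the body of the envelope
  const google = JSON.parse(data.body);
  return google.organic ?? [];
}
```

The pure request-builder is split out so it can be reused and tested without touching the network.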
So now that we have this data scraped from Bright Data, which was super easy to do (it was a matter of sending one HTTP request), what we have to do, as a tip, as a recommendation, is cache these results for easier and faster access. Even though we saw how fast this request was (if I send it, it takes less than 3 seconds), having it stored in a database is going to allow us to make even faster requests for the data at any point in time when we need it. Because data on Google doesn't change that often (it might change day by day, but it's not going to change every second), it would be better to have this data cached in a database in our system, which can later be used to feed it into a large language model or show it to the user, and only update it periodically based on our needs, on how fresh the data should be. For that reason I'm using a Postgres database where I'm storing all of the scraped data, and the
Postgres is hosted on Supabase. For this SERP feature I have two tables. One of them is the SERP search table, which represents one search query: it has the query that we searched and the search engine. Now, a SERP search will return us some links, so for that I have another table that links to this search, and this table will store the links, the search results from Google. For example, it has the link, the title, the description, the global rank, and most importantly the ID of the SERP search. Now that I have the tables, back in my backend code, after I get the Google data from scraping a Google search based on a query, I'm inserting the SERP search into the database, and then, for all the organic links that I receive back, I'm storing them in the SERP links table. You don't have to stop here; you can store more information based on the data that you receive. You can store the ads, if you also want to track your ads or your competitors' ads; you can store videos, images, maps, flights, and so on. So, depending on your business cases, go ahead and store the information that you get by scraping Google. You can see all of the data here in the playground.
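The two tables just described can be filled with a small mapping step. This is a sketch: the table names (serp_search, serp_links) and column names are assumptions based on this walkthrough, not the actual schema.

```typescript
// Maps Bright Data's organic results to rows for the links table described
// above. Table and column names are assumptions based on this walkthrough.
type Organic = { link: string; title?: string; description?: string; global_rank?: number };
type SerpLinkRow = {
  serp_search_id: string;
  link: string;
  title: string | null;
  description: string | null;
  global_rank: number | null;
};

export function mapOrganicToRows(searchId: string, organic: Organic[]): SerpLinkRow[] {
  return organic.map((o) => ({
    serp_search_id: searchId, // ties every link back to the search that produced it
    link: o.link,
    title: o.title ?? null,
    description: o.description ?? null,
    global_rank: o.global_rank ?? null,
  }));
}

// Usage with the Supabase client (sketch):
// const { data: search } = await supabase
//   .from("serp_search").insert({ query, engine: "google" }).select().single();
// await supabase.from("serp_links").insert(mapOrganicToRows(search.id, google.organic));
```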
It will help you understand better what exactly you need. Now, having this information in the database, we can easily connect it with our client-side code, with our application. In this case I have an input for analyzing a Google search, and after I do that, I see the links with their global ranking, with their website, and so on. I can also track them or star them, but more on that a bit later in the tutorial. So that was our first objective accomplished: finding information about a specific brand on Google Search. Again, this can be extended to other search engines as well, and you can also extend this with the exact data that you need from Google Search, because the SERP API gets us a lot of data in a very easy way, in a matter of one HTTP request. Now let's go ahead and cover our second objective,
and our second objective is to retrieve brand information and brand mentions from social media. For that we are going to use another powerful tool from Bright Data called the Web Scraper APIs. When we have to get data from different tools, from different applications and web applications on the internet, we have a couple of options. One option would be to integrate with their API. The challenge is that not all applications provide an API, not all applications provide you with all the data that you need through the API, and not all APIs are free; some of them have insane costs, and some of them are not even available. The next option would be: okay, if we don't have an API, how do we get the data then? We can build a scraper, try to scrape the web, try to parse the data, and get the right information that we need. The challenge here is that it's time consuming and things are always changing, so you always have to come back to your scraper to update it anytime these websites change themselves, and another downside is that you have to take care of a lot of the plumbing and bottlenecks yourself; here I'm talking about CAPTCHAs or being blocked by the website. The third alternative, which looks like a combination of the previous two, is to use the Web Scraper APIs. The Web Scraper API is a tool by Bright Data; behind the scenes it's a scraper, but for us developers it looks like an API. We simply send a request and we get the data that we need. Let me actually show you how it works, and a lot will make more sense. Let's go back to the Bright Data
dashboard, and here under Web Scrapers, if this is the first time you're using the web scraper, go to the Scraper Marketplace. The Scraper Marketplace is a list of more than 270 scrapers, pre-built and kept up to date, for a lot of popular applications out there, such as LinkedIn, Instagram, Facebook, TikTok; you can even see marketplaces like Amazon. I built an application with that web scraper as well. And if I check Social Media, you can also see YouTube, Twitter, Reddit, Pinterest, and a lot more. For example, I'm going to choose YouTube, because this is a platform that I have a lot of knowledge and experience with. We're going to see that all of these categories have a lot of scrapers in them; for example, we can scrape profiles by having a URL, we can scrape comments by having a video URL, we can scrape videos if we know a channel URL, and so on. So, depending on your use case, we can choose one of these scrapers. Let's have a look at how one of these web scrapers works. I'm going to choose the first one, YouTube Profiles, which, knowing a URL, is going to get us information about that profile. I'm going to choose the scraper API, click Next, and here is the documentation for the scraper. If we look at the input, we see that the only thing we need to give is the URL; in the output, Bright Data will give us a JSON with this information: the handle, profile picture, name, subscriber count, videos, account description, and so on, everything that you can find on a profile page on YouTube. Let's go ahead to the second tab,
the Data Collection API, and test it out to see how it works. This is a nice playground that we can experiment with. The first thing, to test it out: if you don't have an API token yet, there will be a button to create and generate one; after that, put it here and it's going to be automatically filled in in your request below. Now, the first step is to trigger one of these scrapers. To do that we have a trigger data collection API, and we can set the different URLs here; as you can see, we can also send batch requests to handle multiple channels. I'm going to leave this at the defaults, and on the right I'm going to copy the bash script, the curl command, to send this request. I'm going to open the terminal, do a clear, and paste this request here. If I press Enter, we see the snapshot ID. This snapshot ID is the ID of a job that Bright Data started. Scraping sometimes takes time; it's not instant, and usually it's displayed here how much time a scraper takes; this one says 7 seconds. That means that knowing this ID, this snapshot ID that I'm going to copy from here, we can scroll down in this playground to the delivery options. The delivery options are the way we can get back the data that we just scraped. I'm going to paste the ID that I got from the terminal into the snapshot ID field, and I'm going to copy this command to get the data that was scraped. I'll do a clear, paste it, and in a second we should have a big JSON file. I can also pipe it through a tool called jq to see it better, and as you can see, this is the output: we have first the MrBeast channel, with his handle, his subscribers, description, and so on, and we have another channel below, because we've done a batch request for multiple channels. If you want to check the logs of your jobs, make sure to open the Logs tab here, and we can see all the jobs that have been performed for a specific dataset. There are also some management API URLs, for example to get the list of different snapshots and to monitor the progress. In some cases this is useful, but in a second I'm going to show you a better way to implement our scraping, without needing to monitor the progress.
So let's have a look at how things worked in this situation. First, we sent a request to trigger a scraping job; we waited a bit of time, and then we sent a request to get the data. In our case the scraper took 7 seconds, and by the time we made the second request it was ready, but in some cases sending the second request will tell you that, hey, the job is not ready yet. If we look at the diagram to understand how this would work: from the back end we would trigger a scraping job by sending a request to the /datasets/v3/trigger endpoint, and Bright Data will go and do the scraping, but then we need the result back. What we can do is, from the same function in our backend, at different intervals of time, send requests to the datasets snapshot endpoint; this is the second request that we sent, the delivery options one. The problem is that we don't know if it's ready, and in most cases it's not going to be ready, so we're going to have to spam Bright Data with requests over a period of time to ask for the data back, and this can take multiple requests back and forth until Bright Data says, okay, the job is finished, here is your data, and our backend can process it further. This is the flow that I showed you just now, manually. A better approach, which we are going to integrate in our application, is to use webhooks. With webhooks we are not spamming Bright Data to ask for data; instead, we are letting Bright Data call a function on our API whenever the job is ready, and this way it's super efficient, because from the back end we just trigger a job and tell Bright Data: hey, when you're ready, give me the data and call the webhook endpoint, or URL, with that data. This has a couple of advantages: we don't have to spam Bright Data with a lot of requests, and we also get the data as soon as it is available, without having to wait extra time. So let's have a look at the
implementation of this process in code, in our application. Again, in the playground we saw how we can do this by simply sending a curl request, which is an HTTP request, to the URL https://api.brightdata.com/datasets/v3/trigger. So what we have to do is replicate the same request in our code. Let's go here and see how we're going to do that. We have a function that will trigger a collection API. The function will get an input, and the input is an array of objects. We're going to have a dataset ID, because every dataset on Bright Data is invoked through the same URL; the only difference is the dataset ID parameter. If I copy this URL and look at another dataset from the marketplace, for example LinkedIn, and search for this URL in the documentation, you're going to see that the curl request goes to the same /datasets/v3/trigger; the only difference is that the dataset ID is different. That's also a very good thing, that Bright Data has standardized how we interact with these web scrapers, because by implementing it once we can reuse it for any dataset, and we're going to see that in a second. So now that we have an input and the dataset ID, what we need is the Bright Data trigger URL, which is that /datasets/v3/trigger I showed you a moment ago. Then we need a webhook URL; this URL is the endpoint that contains the function that is going to be ready to receive data from Bright Data. Back in our documentation, that is the webhook that Bright Data will call to send the data when it's ready.
Now, having this information, we're going to start building our fetch request. We're going to start with a base URL for Bright Data; we're going to specify the dataset ID, which is dynamic and changes for different requests; we're going to specify the endpoint that Bright Data is going to invoke when the job is done; and we say that the format should be JSON and uncompressed. This is to simplify things; you can also have it send compressed data, just make sure that you decompress it when you receive it. And that's it. The next step is to provide headers, with your Bright Data API key as authorization; the method is going to be POST, and for the body we stringify the input and send it as the body of this POST request. Then what we get back is the data, and the data is going to contain the snapshot ID, because the response to the first request that we sent is an object with a snapshot ID. This is important, because this snapshot ID is going to be the key to the data, and what I'm going to do is store the snapshot ID in the database, in a table called scrape jobs. Why do I do that? Well, because I'm going to have access to the snapshot ID, but I also need to know what dataset that request was for. So in the table I have the ID, which is this snapshot ID, and I also have the dataset, with a status. This helps me in multiple ways: by subscribing in real time from my client-side application, I can show the status change of a job, so as soon as it is ready I update the UI. But the more important reason for this table is for me to be able to map a snapshot ID to the dataset, because the webhook that I have will receive data, and I need to know what dataset it was for: was it YouTube data, was it LinkedIn, or was it Instagram? The only thing that I will know is the scrape job ID, the snapshot ID.
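Putting the trigger together, here is a sketch under the assumptions of this walkthrough. The query parameter names mirror what the talk describes (dataset ID, webhook endpoint, JSON, uncompressed), and the scrape jobs table is the one just mentioned; treat the exact names as illustrative.

```typescript
// Sketch of the trigger described above. The query parameter names and the
// scrape jobs table are assumptions based on this walkthrough.
export function buildTriggerUrl(datasetId: string, webhookUrl: string): string {
  const params = new URLSearchParams({
    dataset_id: datasetId, // the only thing that changes between datasets
    endpoint: webhookUrl,  // Bright Data calls this URL when the job is done
    format: "json",
    uncompressed_webhook: "true", // plain JSON, nothing to decompress on receipt
  });
  return `https://api.brightdata.com/datasets/v3/trigger?${params}`;
}

export async function triggerCollection(
  datasetId: string,
  input: Array<Record<string, unknown>>,
  webhookUrl: string,
  apiKey: string,
): Promise<string> {
  const res = await fetch(buildTriggerUrl(datasetId, webhookUrl), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  const { snapshot_id } = await res.json();
  // store the snapshot ID so the webhook can later map incoming data to its
  // dataset, e.g.:
  // await supabase.from("scrape_jobs")
  //   .insert({ id: snapshot_id, dataset: datasetId, status: "running" });
  return snapshot_id;
}
```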
So now that we are done with the first part of our equation, triggering a scrape job, let's have a look at the code for the webhook that is going to receive the data. On the receiving end we have this webhook, and because I'm using Supabase Edge Functions, it is a Deno function, a Deno serverless function; in your case it can be a Node.js endpoint in your REST API. What we have to do is take the data from request.json(), which is basically the body of the request, and from the headers, where there is going to be a header with the snapshot ID. So take these two pieces of data; knowing the snapshot ID and the data that we received (and the data is going to be exactly this, nothing more, nothing less: an array of information), the next step is, knowing the snapshot ID, to fetch the scrape job from my database. This is going to give me all the information about that scrape job, but what's most important for me in that case is to know the dataset ID of the scrape job that we just received data for. Knowing that dataset ID, I can have a list of functions, a switch case, where I handle every single data response differently. For example, if the scrape job had the dataset equal to YouTube channels, that means this data belongs to a channel, so I'm going to call a special function called save channel; if it's about videos, I'm going to call save videos; if it's YouTube comments, save YouTube comments; if it's LinkedIn, it's going to be save LinkedIn posts, and so on. All other datasets, like Instagram, TikTok, LinkedIn and so on, are going to be different cases here. This is one way to implement this, and I did it this way in order to have only one webhook; however, this is not the only way. You can also have different webhooks for different datasets, and each of them will know and handle the data of one specific dataset; for example, you can have a YouTube channels webhook, a YouTube videos webhook, a YouTube comments webhook, and so on. So it really depends on you how you want to structure it: one endpoint with a switch case calling different functions, or completely different endpoints. It depends on you and how you architect this; both of them are going to work perfectly. The only change that you're going to have to make is, when we trigger this endpoint with the webhook URL, you need to specify the correct endpoint, the function that you want to call for the result, based on the dataset ID.
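The single-webhook dispatch just described can be sketched as a lookup table instead of a literal switch. The dataset keys, handler names, and header name are hypothetical placeholders, not Bright Data's real identifiers.

```typescript
// Sketch of the single-webhook dispatch described above. Dataset keys and
// handler names are hypothetical placeholders.
type Handler = (data: unknown[], snapshotId: string) => void;

function saveChannels(_data: unknown[], _snapshotId: string): void {}
function saveVideos(_data: unknown[], _snapshotId: string): void {}
function saveComments(_data: unknown[], _snapshotId: string): void {}

const handlers: Record<string, Handler> = {
  youtube_channels: saveChannels,
  youtube_videos: saveVideos,
  youtube_comments: saveComments,
};

export function resolveHandler(dataset: string): Handler {
  const handler = handlers[dataset];
  if (!handler) throw new Error(`No handler for dataset: ${dataset}`);
  return handler;
}

// Webhook entry point (Supabase Edge Function / Deno style, as in the talk):
// Deno.serve(async (req) => {
//   const data = await req.json();                      // the scraped records
//   const snapshotId = req.headers.get("snapshot-id")!; // header name assumed
//   const job = await getScrapeJob(snapshotId);         // look up the dataset
//   resolveHandler(job.dataset)(data, snapshotId);
//   return new Response("ok");
// });
```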
So now that we have this data here, the next step is again to cache the results for easier and faster access. For the YouTube data I have mapped exactly the data that I need to work with, that I'm going to need for the large language models and for my user interface, and I have three tables: information about the YouTube channel, information about YouTube videos that belong to a channel, and information about YouTube comments that belong to a video. If we go back to this switch case and focus on save channel: the save channel function will receive the data and the snapshot ID; the data, I will remind you, is this array with data about the channel, if we're talking about the save channel webhook. Now, if we look at that function, what's happening there: in save channel we are simply taking the data, parsing and formatting it correctly, and saving it in our YouTube channels table. I'm using an upsert operation instead of an insert, in order to insert the row if it's not already there, or update the data if I already have the same channel with the same ID. So if I run this function again, it's not going to error, and it's not going to ignore the new request; it's going to update the information about the channel, giving me up-to-date information in my system. After I save the channel, I also update the scrape job to set the status to ready, and this helps me on the client side to update the UI and get the data that was fetched. And finally, at the end, after I store the information about the channel, I also trigger another scraping job by invoking that trigger collection API function with the videos dataset ID. So basically what I'm doing here is saying: now that I know the channel, I want to know the videos of that channel.
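The save-channel step just described, with its upsert, status update, and chained trigger, can be sketched as follows. Column names, dataset IDs, and the helper names are assumptions based on this walkthrough.

```typescript
// Sketch of the save-channel step described above. Column names, dataset
// IDs, and helper names are assumptions based on this walkthrough.
type ChannelRow = { id: string; handle: string; name: string; subscribers: number; url: string };

export function formatChannel(raw: Record<string, unknown>): ChannelRow {
  return {
    id: String(raw.id),
    handle: String(raw.handle ?? ""),
    name: String(raw.name ?? ""),
    subscribers: Number(raw.subscribers ?? 0),
    url: String(raw.url ?? ""),
  };
}

// async function saveChannels(data: any[], snapshotId: string) {
//   // upsert: insert new channels, or refresh ones we already track
//   await supabase.from("youtube_channels").upsert(data.map(formatChannel));
//   // flip the job to ready so the client UI can update in real time
//   await supabase.from("scrape_jobs").update({ status: "ready" }).eq("id", snapshotId);
//   // chain the next pipeline stage: scrape this channel's videos
//   await triggerCollection(YOUTUBE_VIDEOS_DATASET_ID,
//     data.map((c) => ({ url: c.url })), WEBHOOK_URL, API_KEY);
// }
```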
all the data about a specific channel in this way I'm creating a data pipeline so from the client side application when I search for a channel I start a data
Pipeline with blue we have a BRI data web scrapers starting with scraping the channel information then Bri data is going to call a function with green here
in my system on my back end this web Hook is going to store Channel data and automatically trigger a new bright data video scraper to scrape videos for that
channel once that is done I'm going to receive them in the same web hook and I'm going to store videos in my system and I'm going to trigger another scraper to fetch the comments of all the videos
that we just received and this scraper is going to call the web hook to store the comments and you can link these stages as much as you need in order to have the information that you're looking
for and having them linked together this way is going to help us later whenever we have to rec scrape a specific Channel we're not going to have have to think
about multiple stages we're only going to say hey I need updated information about this Channel and I'm going to start the same data Pipeline and it's going to follow through the same steps
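As a rough sketch of that chaining idea (the dataset names below are placeholders, not the actual Bright Data dataset IDs from the project), the "what do I trigger next" decision inside such a webhook might look like this:

```typescript
// Hypothetical sketch of the multi-stage pipeline: after storing one stage's
// data, look up which dataset should be scraped next. The keys and values
// below are placeholder names, not real Bright Data dataset IDs.
const nextStage: Record<string, string | null> = {
  "youtube_channels": "youtube_videos",   // channel stored -> scrape its videos
  "youtube_videos": "youtube_comments",   // videos stored -> scrape their comments
  "youtube_comments": null,               // last stage, nothing left to trigger
};

// Given the dataset the webhook just received data for, return the dataset
// that should be triggered next (or null when the pipeline is complete).
function nextDatasetId(currentDatasetId: string): string | null {
  return nextStage[currentDatasetId] ?? null;
}
```

Each webhook then stores its data and, if there is a next dataset, calls the same trigger function again; that is what makes re-scraping a whole channel a single entry point instead of a multi-step chore.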
From the UI perspective, this allowed me to build the following screens: a YouTube channel analyzer where I can paste the channel URL at the top; after that, Bright Data is going to scrape the data, I'm going to store it in the database and then display the channel information together with the videos and the comments of the videos, and inside each video we have info about that specific video. This was an example of how we can integrate one data source, in this case YouTube, but I encourage you not to stop here: go ahead and integrate other datasets as well. As I said, there are more than 270 datasets in Bright Data's Web Scraper APIs, so go check the others and try to integrate them, because the code is going to be the same. The way we built it in the previous steps is very simple, because triggering a collection, or triggering a scrape, is done with the same function, zero code changes; the only thing that changes is the input: we're going to send a different dataset ID and a different input. For example, we can go into the LinkedIn scraper, go to the overview, and see that for the input I need a URL for the LinkedIn profile, and I need the ID of this dataset. You can take it from the request shown here (search for the dataset ID field and copy it from there), or you can also take it from the Bright Data URL at the top. So you're going to send this dataset ID to our trigger collection API function, together with the input of LinkedIn URLs. Triggering that doesn't require any code changes; the only code change you're going to have to do is to
add another case here to check which dataset ID the data came from: you're going to add the LinkedIn case and implement a function for how you want to handle the data that you receive from Bright Data for this specific dataset. For that, I encourage you to go into the overview and look at the example output; it will tell you the format of the data so you can map it to your database. For example, you have a name, a country, a position, a city, and so on.
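As a hedged sketch of that mapping step (the field names follow the example-output fields just mentioned, but treat the exact shape as an assumption to verify against the real example output for your dataset):

```typescript
// Hypothetical sketch: map one record of a scraper's example output to a row
// for our own table. Field names are assumptions based on the fields
// mentioned above (name, country, position, city).
interface ProfileRow {
  name: string;
  country: string;
  position: string;
  city: string;
}

function toProfileRow(raw: Record<string, unknown>): ProfileRow {
  return {
    name: String(raw["name"] ?? ""),
    country: String(raw["country"] ?? ""),
    position: String(raw["position"] ?? ""),
    city: String(raw["city"] ?? ""),
  };
}
```

The webhook case for the new dataset would run every received record through a mapper like this before upserting it into its own table, exactly as the YouTube handlers do.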
So the output and the data structure are different for every single scraper, because all of the platforms have different data, so make sure to look at what data is available and map it to your database schema. That's how you're going to get data from different sources, from different social networks, even from Reddit: for example, we can go to the scraper marketplace for social media and see Reddit there, a very important source of information as well. All right, at this point we should have lots of data in our database, but we don't just need data, we need insights: easy-to-digest information about this data. That's why in the next chapter we're going to cover how to use AI and large language models to extract insights from this data. We're going to sprinkle some AI into this project, using OpenAI models to extract insights. I'm going to show you a couple of use cases, and later you can experiment and implement this in many other ways, because the fundamentals are the same; it's only the prompt that changes for your specific case. The first use
case that I want to cover is AI video transcript analysis. From Bright Data, when I scrape the video information from YouTube, I also get the transcript, and this is super valuable because with the transcript we can know what is happening in the video and what the creator is talking about. So what I want to do is summarize the transcript and extract the key topics that are being discussed, the discussion points, and maybe even find brand mentions: for example, if you're working with different creators, you can build a scraper that finds the brand mentions, finds the timestamp when they happen, and so on. Let's have a look at the code and how we implement this. First of all, we need to install the openai package; it's available for Node.js, Python, and a lot of other languages as well. We need an API key that we can generate on OpenAI, and using that key we initialize the OpenAI client. Next, we need to start a chat completion. A chat completion, in simple terms, is the same thing that happens when you go to ChatGPT and ask a question: by pressing send on a question you're generating a chat completion; in other words, the model is going to complete the chat with the next reply. We do that by calling openai.chat.completions.create and providing a list of messages, and this list of messages, if we think back to the ChatGPT example, is the history of messages that
you have in that chat thread. The first message is a bit special when we use the API, because we can provide the developer role here, and with the developer role we can specify predefined rules that we want the large language model to follow; we're giving it context about what we expect to happen and how we expect it to respond. In this case I'm telling it: you are an AI assistant specialized in analyzing and summarizing video transcripts; your task is to extract concise and meaningful summaries from the provided transcript; provide the output in JSON format. The JSON format is important, and we're going to see why in a second. So this is the first message, the developer-role message. The next one is a message with the user role, and this is, in a way, what we are typing as a user. What I'm typing here is a message saying "here is the video transcript," and I'm putting in the video.transcript that I have in my database, so the whole transcript of the video, and I'm finishing the message with the following text: "please provide the summary and key topics in JSON format." So I remind it again that it needs to respond in JSON format, and to get the results that I expect, I provide the exact JSON structure that I want back: I tell it what fields I want, for example an AI summary of type string, with more details about it, and AI topics, an array of strings, with more information about what I mean and what I expect in each field. With
this JSON structure, OpenAI will try to answer in JSON, and if I set the response format type to json_object, OpenAI will also try to validate that the reply is valid JSON. It might not be exactly the format that you need, but it will be valid JSON, so I encourage you to add some try/catches on your end as well, because we never know when the AI will hallucinate. In most cases, though, if you are upfront that you expect JSON and you provide the data types you expect, like this, you're going to get it this way. I'm also specifying the model; in this case I'm using GPT-4o mini, but you can use other models as well depending on your case. What we get back is the chat completion, so I'm taking the reply from the first choice and converting it to an object using JSON.parse; this is a part I recommend wrapping in a try/catch because, as I said, the JSON might have issues. Now, having this JSON reply, I can access the AI summary and the AI topics on it, because that's what I kindly asked the AI to give me, and I can store them by updating my video in the database. That gives me an AI summary of a video with the key topics that were discussed in it. Again, this is just an example of what's possible; the sky is the limit, and
I encourage you to think about what exact data you want to extract from the transcript. What I did here, to make it easier to run this process without having to remember to trigger it in specific cases, is automate the AI analysis for every single insert into my YouTube videos table. On Supabase there is a concept called database webhooks: when the YouTube videos table has an insert event, I call a webhook, a URL for the edge function that runs this AI workflow. In my edge function I receive the type of operation, the table, and the new record, the video that has been inserted. With this automation in place, I make sure to generate AI summaries every time a new video is inserted into my database. If you're building this on Node.js, what you have to do is make sure that whenever you do an insert operation for a YouTube video, you also invoke the AI function that does the analysis, and that is going to work fine as well. Now, moving on to the next use case in our AI analysis, I also want to
show you how we can analyze not only data that the brand is posting, but also data from the community, from customers: comments, reviews, what people are saying. I think this use case is even more valuable, because for a company it's crucial to know what the customers are saying out there. So the next use case is AI analysis of public opinion. For this example I'm going to work with the comments from YouTube, but you can extend the same approach to social media posts that mention your brand, to posts on Reddit, reviews, blogs, forums, and so on. What you can do in terms of analyzing public opinion: you can analyze sentiment, to understand whether a post or a group of posts is positive or negative; you can extract key conversation topics; you can extract pain points or very common requests that your users are asking for; you can find the most common issues or challenges they're going through, and this way prioritize your next steps; you can get ideas for content or for improving your product or service based on public opinion; and for more specific use cases, you can detect a potential PR crisis early: if you find hints of a PR crisis, of your brand being attacked, you can catch it early and address it accordingly. The possibilities here are also endless. I'm going to show you how we can do sentiment analysis based on YouTube
comments, and based on this you can extend it to other use cases as well. The main part of this function remains the same: we are still using OpenAI, still using a chat completion with the same model; the only different part is the messages. A lot of this AI work nowadays is prompt engineering, writing the correct prompt. Here is what I came up with; maybe it's not the best, and I'm pretty sure you can write better prompts than me, but this is what works well for my use case. I'm telling it in the developer message: you are an AI assistant specializing in analyzing YouTube comments; your task is to determine the sentiment and extract common topics; and I also say that I need JSON format. In the main message from the user, I tell it "here are the comments for a YouTube video," and what I do is merge all the comments under a YouTube video that I have in my system and put them as a bullet-point list in the text. Finally I say "please analyze these comments and provide the following format": I need the sentiment, with three values it can choose from (positive, negative, neutral), the score of the sentiment from 0 to 1, an explanation giving me a brief justification of that sentiment, and a list of common topics as an array of strings. Having this
completion, I also save it back in the database, and this allows me to show this sentiment analysis under channels and videos in the application I've built. We can see the label (one is positive, one is neutral), the score (0.85, 0.5), and a little explanation, so we understand better the overall tone of the comments and the common topics. Again, you can extend this to give you things like commonly suggested ideas, the most common bugs being mentioned, or other things you might need to extract from the comments. So the question I want to ask you now is: what analysis and what data would make sense for your business? Think about that and go try it out with a prompt based on this example, because, as I said, the process remains the same. Here is a high-level overview of an AI data pipeline. Whenever we have to extract insights using large language models, first we need some data; using Bright Data we can get any publicly available data from the internet. The next step is to clean that data, following the rule of garbage in, garbage out: we need to make sure we are working with the correct data. The next step is to provide more context for that data, so the LLM can make better recommendations and understand better what's happening: for example, for a list of comments we can provide the transcript of the video, so the LLM knows what the video is about and what the comments mean.
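As a hedged sketch of the LLM step of this pipeline, here is roughly what the transcript-analysis call described earlier could look like. The prompt wording paraphrases the one above, the field names (ai_summary, ai_topics) are illustrative, and safeParseInsights is a hypothetical helper implementing the try/catch guard I recommended:

```typescript
// Hypothetical sketch of the LLM step: build the messages and parse the reply
// defensively. The JSON field names are illustrative, not the project's exact ones.
interface Insights {
  ai_summary: string;
  ai_topics: string[];
}

function buildTranscriptMessages(transcript: string) {
  return [
    {
      role: "developer",
      content:
        "You are an AI assistant specialized in analyzing and summarizing " +
        "video transcripts. Provide the output in JSON format.",
    },
    {
      role: "user",
      content:
        `Here is the video transcript:\n${transcript}\n` +
        `Please provide the summary and key topics in JSON format: ` +
        `{ "ai_summary": string, "ai_topics": string[] }`,
    },
  ];
}

// Wrap JSON.parse in a try/catch: the model may reply with valid JSON that is
// not our shape, or with no JSON at all.
function safeParseInsights(reply: string): Insights | null {
  try {
    const parsed = JSON.parse(reply);
    if (typeof parsed.ai_summary === "string" && Array.isArray(parsed.ai_topics)) {
      return parsed as Insights;
    }
    return null;
  } catch {
    return null;
  }
}

// With the openai package the call would look roughly like this (not executed here):
// const completion = await openai.chat.completions.create({
//   model: "gpt-4o-mini",
//   response_format: { type: "json_object" },
//   messages: buildTranscriptMessages(video.transcript),
// });
// const insights = safeParseInsights(completion.choices[0].message.content ?? "");
```

The same two helpers, with a different prompt, cover the sentiment use case as well, which is why only the messages change between the two examples.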
After we have the data and more context, we run it through an LLM with the right prompt; here you're going to get better results by building a better prompt. After we get the result, we store it back in our system, in our database, so we can later display it to the user. That's how you can use AI and large language models to extract key insights from large datasets. Now let's move on to the last chapter of today's video, where I want to talk about automation and scheduling, because at this moment what we have is a system that does this research ad hoc, basically when the user asks for it. The whole system so far is powered by an input, for example a channel URL, a Google search query, a LinkedIn URL, an Instagram handle, and so on. After the user presses the button, the whole pipeline starts: the data scraping pipeline, going through channel scraping, video scraping, comment scraping, or search links and details about a specific link, and so on. After the data scraping pipeline is done, it automatically invokes the AI pipelines that will, for example, extract key insights based on the data that was just collected, and finally, after all of this, which can take some minutes, the user has the information in the application. This is good for research purposes, for example when we
try to research something new, a competitor, or a new topic. However, what I want to build in my application is a way to track different channels, social media profiles, SERP searches, brand mentions, and to track them over time, on autopilot. I want this system to work for me every single hour, to make sure I have up-to-date data about my brand and that I'm notified if something changes out there. This can give a founder the peace of mind they need to disconnect a bit from social media and do the hard work, without the risk of something happening out there without us knowing. In terms of user interface, what I want is a dashboard where I can pin or track different data: different YouTube channels, different search queries, Reddit posts, and so on. By tracking these items, what needs to happen in our data-flow diagram is the possibility to trigger a channel scraper or a data scraper not only from the client side, when the user presses a button, but also periodically over time. If we zoom in on this part, we want to add a function, let's call it the auto tracker, and we want to schedule a cron job. A cron job is a process that does something at intervals of time that we specify; for example, we can say "hey cron job, can you invoke this auto tracker function every 10 minutes?"
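A minimal sketch of the auto tracker's core check, assuming each tracked item carries a tracked flag and a last-updated timestamp (both field names are placeholders, not the project's actual schema): an item is due for scraping only if it is tracked and stale.

```typescript
// Hypothetical shape of a tracked item; field names are assumptions.
interface TrackedItem {
  isTracked: boolean;
  updatedAt: string; // ISO timestamp of the last successful scrape
}

// An item needs scraping only if it is tracked and its last update is older
// than the platform's interval (24 hours by default).
function needsScrape(item: TrackedItem, now: Date, intervalHours = 24): boolean {
  if (!item.isTracked) return false;
  const ageMs = now.getTime() - new Date(item.updatedAt).getTime();
  return ageMs >= intervalHours * 60 * 60 * 1000;
}
```

The cron-invoked edge function would filter its tracked items through a check like this before triggering any scrapers, so running the cron every few minutes doesn't mean scraping every few minutes, and the interval can differ per platform.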
Now, what the auto tracker function is going to do on its end: it has a couple of responsibilities. It's going to get information from the database about the items that are being tracked, and specifically the items that need to be scraped, because some of them can be scraped at larger intervals of time, and it needs to start the different scrapers, for example the YouTube channel scraper, the SERP scraper, Instagram, Reddit, and so on. The cron job can be set up on any server you're running, on Linux, with Node.js, and so on; in my case I'm using Supabase, and Supabase has a cron integration that can invoke an edge function at predefined intervals of time, so I scheduled this cron job to call the edge function every 5 minutes. Now let's have a look at what's happening in that edge function. In the edge function we go step by step through different phases and try to invoke the necessary scrapers. The first phase is about SERP, the search engines: I look in the database for the different SERP searches, specifically the ones where tracked is equal to true (the ones we marked with the star in our application) and that have not been updated in the last 24 hours. Instead of scraping the searches every 5 minutes, I decided to scrape them at most once a day, every 24 hours. Different platforms would require different intervals depending on how fast the data changes: on a search engine it doesn't change that much, while on Reddit it might change every minute, especially if a post is trending, so you might want to scrape that every minute, for example. Now that I have all the tracked
searches, for each of them I invoke a function to start collecting SERP data, to start the scraping process, and I do this for the other data points as well. For example, the next phase is YouTube scraping, and I do the same: I look for YouTube channels that are tracked and not updated in the last 24 hours, and I invoke the trigger collection API with a list of inputs. Because the trigger collection API from Bright Data can do batch requests, I put them all into one single request: all the URLs, with the dataset ID of the YouTube channels dataset. You do the same for all the tracked items: Instagram pages, Reddit, mentions, different forums, and so on. With this, what we get is a machine that is working on autopilot for us 24/7, without us having to do anything. All
right, so this was our AI-powered social listening application. Let's quickly recap what we learned today. We learned how to collect and process data using Bright Data's tools; more specifically, we learned about the SERP API, which can help us easily scrape the search engines that contain a lot of valuable data for our businesses. We learned how to use the Web Scraper APIs to get data from a lot of social networks like YouTube, Instagram, LinkedIn, Facebook, Reddit, and so on, and we know there are more than 270 pre-built web scrapers that we can invoke by simply sending a POST request; the only thing we need to do is look at the data, parse it, and store it in our system. We also learned how to parse this data and store it structured in our database, so that we can later use it with the OpenAI LLMs to extract actionable insights. We also learned how to automate the whole workflow for continuous monitoring, running 24/7 on autopilot. In the end we have a social listening application that can help us better understand our brand across the internet. Before we end, I want to leave you with a couple of ideas of
how you can improve it and what steps you can take next. Here are some ideas for improving a social listening application. You can implement alerts: specify some metrics you want to track and some thresholds, and when the automatic data scraping runs, if a metric is above a threshold, notify yourself via email, Slack, or any other application you use for communication. You can send weekly, daily, or otherwise periodic reports over email with the most important data; here you'll have to decide what the most important data for your business is. You can track competitors, to show a comparison between you and a competitor, for example across different Google searches, or on social media to compare the sentiment of your mentions versus the sentiment of the competitor's mentions on Twitter; this is just an example. We can show progress over time, and for that you're going to have to adjust the database design to store updates as new values instead of overwriting old data; with a little change to the database design you can have historical data collected and displayed in the application. You can also implement AI recommendations: so far we have analyzed the data and extracted key insights with AI; now we can feed these key insights back into the AI and ask it to generate a list of recommendations, and this can be implemented quite easily with the system we have developed today. Moreover, you can add a RAG system, retrieval-augmented generation; I don't have a lot of time to explain what a RAG system is, but you can quickly
search it on the internet; it is a powerful tool that can take our system to the next level by giving the AI knowledge about our datasets, about our database, for example. By implementing a RAG system, which is not super hard to do, we can have features like searching for similar companies, not only based on the name and the data, but also based on the topics they discuss in their videos and on social media, topics they research, and so on. We can implement a chatbot for our data: for example, I can go to a YouTube video in my application and chat with a bot, asking "what are the key topics here?" or "at what timestamp did we mention this?" You can find creators or companies based on their discussion topics, and a lot more. Basically, having this information in your system, you can power it up and build many new things. In fact, if you're interested in RAG systems, I'm planning, in March of 2025, a tutorial on our channel where I'm going to implement a RAG system, a chatbot on top of YouTube videos based on their transcripts, so if you're interested in that, make sure to follow me on YouTube at notJust.dev so you don't miss it. That's all I had for today, and I hope
you learned something new and had fun following this tutorial. If you still have questions, feel free to reach out to me on social media; you can find me at Vadim nodev or on our YouTube channel, and I'll be more than happy to help with your questions. If you decide to implement these systems in your business, in your own cases, let me know how you adjusted them, let me know your case studies; I'd be more than happy to hear what you have built with this system. Thank you, Bright Data, for providing these tools, without which it would be impossible to do this kind of project, and I'll see you later. Bye-bye!