Building an AI Social Listening App with Bright Data
By notJust.dev
Summary
## Key takeaways
- **Build Custom Social Listener**: Couldn't find a tool for general internet brand monitoring across social media, Google, and forums, so built one to track mentions, complaints, and competitor moves. [00:20], [00:52]
- **Bright Data SERP API Scrapes Google Easily**: Use the Bright Data SERP API with one HTTP POST request to scrape Google results as JSON, including organic links, ranks, titles, and descriptions; enable the CAPTCHA solver for reliable results. [07:21], [12:13]
- **Webhooks Beat Polling for Scrapers**: Trigger Bright Data web scraper jobs via API and receive data instantly via webhook instead of spamming status checks; store the snapshot ID in the DB to map incoming data to its dataset. [25:10], [25:43]
- **Chain Scrapers into a Data Pipeline**: Scrape a YouTube channel; the webhook stores it and auto-triggers the video scraper; the videos webhook stores results and triggers the comments scraper, creating an automated multi-stage pipeline. [34:30], [35:03]
- **AI Transcript Prompts Yield JSON**: Prompt OpenAI with a video transcript and an exact JSON schema for summary and key topics; automate via a database insert webhook to analyze every new video. [42:21], [45:07]
- **Cron Automates 24/7 Tracking**: A Supabase cron job every 5 minutes invokes an edge function to scrape tracked items (channels, SERP) if not updated in 24h, enabling autopilot brand monitoring. [54:14], [55:11]
Topics Covered
- Build holistic brand monitors
- Webhooks beat polling
- Chain scrapers into pipelines
- Prompt LLMs for structured insights
- Autopilot tracking frees founders
Full Transcript
Hey there, and welcome to this exciting webinar brought to you by notJust.dev in collaboration with Bright Data. My name is Vadim, and I'm thrilled to guide you through today's session on building an AI-powered social listening application that can help us monitor our brand and get market intelligence using Bright Data tools. Before we get started, let me quickly share a little pre-story on how I ended up building this project. A couple of months ago I was looking for a tool to help me monitor my brand on the internet, and I couldn't find exactly what I was looking for. While there are a ton of specialized tools that can give you insights into a specific social media platform or into Google Search, what I needed was more of a general overview of the health of my business across the whole internet. I wanted to know when someone is mentioning our work, when someone is complaining, or when our competitors are doing something different. Because I couldn't find such a tool, and because I'm an engineer, I decided to build it myself. In today's session I'm going to share with you the whole process and the whole project, giving you a better look at how to implement these features in your business, and I'm going to share all the insights and learnings I got from building this social listening application. Before we do that,
let's start with understanding why social listening matters. I'm going to start with a quick question: how often are you checking what's being said about your brand online? If the answer is not "always," that most probably means you're missing out on a lot of insights from your customers and competitors. This is where social listening comes into play. By using specific tools and monitoring what's being said or what's happening on the internet, you can better understand customer sentiment, you can identify emerging trends in your community or in the general market, you can spot potential PR crises early and take action to prevent them, and you can stay ahead of the competitors. The challenge of getting a good understanding of what's happening out there is that information nowadays is so decentralized that we cannot find it in one single place. We have to check all the social media platforms, different forums, Reddit, search engines, blogs, and even some random forums that you might not even know about. To address this challenge, in this session I'm going to show you how to use Bright Data tools in combination with large language models to get access to any public data available on the internet and then extract key insights using these LLM tools. This is an exciting time to build these kinds of tools, because while we always had access to large amounts of data, we were not able to easily extract insights. Now, with the advancement of LLMs, we can easily process large amounts of data and get the exact information that we need out of it, and that's exactly what we're going
to do today by building a scalable social listening application from scratch. We're going to cover the following topics: how to collect and process data using Bright Data APIs; how to store this structured data in a database so it's easier and faster to access later; how to use it with large language models to extract actionable insights; and how to automate the entire workflow for continuous monitoring. When it comes to the tech stack, I personally use the following combination of tools; however, feel free to replace, change, and integrate whatever makes sense for your project, because a lot of the things I'm going to share today are very applicable to other tech stacks as well. In my case, for the client side I've used React Native and Expo to build a cross-platform application that is going to be used by the user, or in this case by the companies that want to monitor their brands. By using React Native and Expo I managed to build both a mobile application and a web application at the same time, from the same source code. On the back end I used Supabase, which uses Postgres as the database, and for the logic I use Supabase Edge Functions. This back end is the most flexible part here, because it's up to you, or up to your project, to decide how you're going to build it; you can easily replace it with a Node.js backend or with other serverless frameworks as well. For the scraping infrastructure and APIs I use Bright Data, and we're going to get into that in a moment, and for the AI, the large language models, I use the models provided by OpenAI. When it comes to architecture, I'm going to show you a
quick overview of how everything interacts, and in a moment we're going to dive deeper into every single step of the process to understand it better. So now, just so you understand how the parts connect together, here it is. Everything starts with a client-side application; in our case this is a web or mobile application built with React Native. On the other side, when we think about how we're going to get the data, we're going to use the Bright Data APIs to scrape data from Google Search, from YouTube, Instagram, and other social media platforms as well. In order not to have to scrape this data every time we need it, we're going to store it somewhere in a database that we have fast and easy access to. For that I'm going to use Supabase with Postgres, and Edge Functions for the logic of interacting with external services. With the data that I have in the database, in Postgres, I'm going to use another edge function to integrate with OpenAI for the AI analysis. Finally, I'm going to add a cron job in order to run these flows periodically, so we have automation and automatic brand monitoring. So from a high-level perspective, this is the architecture that we're going to cover today, and now we're going to get into every single step and understand how they work together to build this app. To make the most out of this session, I encourage you to take notes and also think about how you can apply these concepts to your own use cases, your own business, and your own brand, and most importantly, have fun. If you want to have access to this
presentation, the source code of the project, and additional materials, make sure to download the files that I prepared for you at assets.notjust.dev/sociallistening. Now that we have a good plan, I think it's time to get started. Let's take it step by step, and the first
objective that I want to cover here is retrieving brand mentions and brand information from Google search results. I want to have a clear understanding of how my brand is performing on Google, how we are ranking for the specific searches that my brand is targeting. To accomplish this objective we're going to use the SERP API provided by Bright Data, which makes it super easy to scrape search engine results. Let me show you how easy that is. We're going to go to Bright Data, and we're going to go to the dashboard. Let's go ahead under Proxies and Scraping, and here, under More, we're going to see the SERP API. On this page we see a playground that we can test the SERP API with, but before we start playing with the playground, let's go ahead and configure a zone. To do that I'm going to press the Add button, and I'm going to press SERP API. This is going to open a page to create a new zone; the new zone will allow our scraper to run and contains its configuration. So give it a name; I'm going to name it SERP API 2. And make sure to have the CAPTCHA solver enabled: with this enabled, Bright Data will automatically solve CAPTCHAs whenever it detects one, whenever Google shows a CAPTCHA. This is crucial if you want good results from scraping the web, and without a tool like Bright Data, CAPTCHAs are one of the most challenging things to overcome. So let's take advantage of Bright Data automatically solving them: leave
it enabled, and let's go ahead and create it. After we have this, we are already greeted with a curl request that we can use to test this out. As we can see, the curl request is towards the api.brightdata.com/request endpoint, and here we are asking Google to search for the query "pizza". So if we take this curl request, open a terminal, and paste the request in, after a bit of time, as we can see, we have the HTML code for the Google page for that query. This doesn't look very user friendly, so I'm going to go back and go into the live playground. The playground is here, and as we can see, we can search for something. I'll search for my brand, "not just dev", and I'm going to press Search. Here I want to mention that we can search on different engines like Google Search, Google Maps, Trends, reviews, and so on, and we can also search on Bing and Yandex. We can also specify a specific location, for example Google Switzerland, or if I'm in Europe I could choose something like ES for Spain as the country, and so on. We can also specify the geolocation that we want Bright Data to simulate when running this search, and this is crucial if you want a better understanding of the global reach of your brand in different countries, regions, and situations, or even devices. With my query,
if I scroll down, we see that this is the page that Bright Data has scraped; it's an actual Google page. But on the right is where things get interesting: we have a JSON with all the data that Bright Data scraped from this search result. At the top we have general information, and we have navigation, which represents the links here for videos, images, and news. But if I search for "organic", this is where we have all the organic links in our search result. We have information like the link, the rank, the global rank, the description, the title, and so on, and as you can see, some of them also have images, like this one from Twitter. This is amazing because it gives us all the data we need from search results, and in our case you can also use pagination to get the next data, because by default we're going to have 100 rows per page. All right, so now that we've seen how we can do this manually, let's have a look at how we can automatically run this request from our application, from our own backend. I'm going to reference back the curl request that we were using here. This is a simple HTTP request: it is a POST request to the URL api.brightdata.com/request. So what we need to do is simulate the same request, send the same request, from
our system. Let's have a look at how I implemented that. I'm going to start by defining a function that receives a query that I want to perform a search on, and it's going to return a structured object with all the links that I'm interested in. The next step is to send a fetch request to api.brightdata.com/request, and we have the following options. For the options, I'm going to specify that this is a POST request; under Authorization I'm going to use my Bright Data API key; under Content-Type I'm going to specify JSON; and the body is the data that we want to send. Here, under the zone, make sure to use the name of the zone that you're using; in my case that's SERP API 1. The URL is going to be the link that you want to scrape, so in my case that's going to be
google.com/search. For the parameters, the important one is q, and q represents the query, what we are searching for. I'm also encoding it as a URI component, because if there are spaces I want them properly encoded in the URL. Another important parameter here in the Google link is brd_json=1, which tells Bright Data not to send the result back as HTML, but rather as JSON; together with line 18, where I say format: json, I'm going to get this in JSON format. The next step: after I make the request, I take the data from the JSON response, and I get the Google data by parsing the body of that scraped data. What this Google data is going to be in my application is exactly the data that we see here in the terminal: it will have the general information, the input, the navigation, the organic results, and so on, all the data here. If you want to learn more about the structure of this data, go ahead and press the documentation link here and learn more about the SERP API. Here you can go into the parsed JSON results, and you're going to see more information about the JSON result, together with the general and input fields; if you scroll down, you'll see the explanation of other fields. For example, the organic field is the main search results in the organic search, and here is the JSON example of how they look.
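As a quick aside, the request just described can be sketched in TypeScript. This is a sketch based only on this walkthrough: the zone name, the function names, and the exact response shape are assumptions, not the author's exact code.

```typescript
// Sketch of the SERP scrape described above. The zone name ("serp_api1")
// and the response shape are assumptions based on this walkthrough.
type OrganicResult = { link: string; rank?: number; title?: string; description?: string };

export function buildSerpRequestBody(query: string, zone = "serp_api1") {
  return {
    zone,
    // brd_json=1 asks Bright Data to return the parsed page as JSON, not HTML
    url: `https://www.google.com/search?q=${encodeURIComponent(query)}&brd_json=1`,
    format: "json", // also ask for the response envelope itself in JSON
  };
}

export async function scrapeGoogle(query: string, apiKey: string): Promise<OrganicResult[]> {
  const res = await fetch("https://api.brightdata.com/request", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildSerpRequestBody(query)),
  });
  const data = await res.json();
  // the parsed Google page is delivered in the body of the envelope
  const google = JSON.parse(data.body);
  return google.organic ?? [];
}
```

The pure request-builder is split out so it can be reused and tested without touching the network.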
So now that we have this data scraped from Bright Data, which was super easy to do (it was a matter of sending one HTTP request), what we have to do, as a tip, as a recommendation, is cache these results for easier and faster access. Even though we saw how fast this request was (if I send it, it takes less than 3 seconds), having it stored in a database is going to allow us to make even faster requests for the data at any point in time when we need it. Because data on Google doesn't change that often (it might change day by day, but it's not going to change every second), it would be better to have this data cached in a database in our system, which can later be used to feed it into a large language model or show it to the user, and only update it periodically based on our needs, on how fresh the data should be. For that reason I'm using a Postgres database where I'm storing all of the scraped data, and the
Postgres is hosted on Supabase. For this SERP feature I have two tables. One of them is the SERP search table, which represents one search query: it has the query that we searched and the search engine. Now, a SERP search will return us some links, so for that I have another table that links to this search, and this table will store the links, the search results from Google. For example, it has the link, the title, the description, the global rank, and most importantly the ID of the SERP search. Now that I have the tables, back in my backend code, after I get the Google data from scraping a Google search based on a query, I'm inserting the SERP search into the database, and then, for all the organic links that I receive back, I'm storing them in the SERP links table. You don't have to stop here; you can store more information based on the data that you receive. You can store the ads, if you also want to track your ads or your competitors' ads; you can store videos, images, maps, flights, and so on. So, depending on your business cases, go ahead and store the information that you get by scraping Google. You can see all of the data here in the playground.
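The two tables just described can be filled with a small mapping step. This is a sketch: the table names (serp_search, serp_links) and column names are assumptions based on this walkthrough, not the actual schema.

```typescript
// Maps Bright Data's organic results to rows for the links table described
// above. Table and column names are assumptions based on this walkthrough.
type Organic = { link: string; title?: string; description?: string; global_rank?: number };
type SerpLinkRow = {
  serp_search_id: string;
  link: string;
  title: string | null;
  description: string | null;
  global_rank: number | null;
};

export function mapOrganicToRows(searchId: string, organic: Organic[]): SerpLinkRow[] {
  return organic.map((o) => ({
    serp_search_id: searchId, // ties every link back to the search that produced it
    link: o.link,
    title: o.title ?? null,
    description: o.description ?? null,
    global_rank: o.global_rank ?? null,
  }));
}

// Usage with the Supabase client (sketch):
// const { data: search } = await supabase
//   .from("serp_search").insert({ query, engine: "google" }).select().single();
// await supabase.from("serp_links").insert(mapOrganicToRows(search.id, google.organic));
```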
It will help you understand better what exactly you need. Now, having this information in the database, we can easily connect it with our client-side code, with our application. In this case I have an input for analyzing a Google search, and after I do that, I see the links with their global ranking, with their website, and so on. I can also track them or star them, but more on that a bit later in the tutorial. So that was our first objective accomplished: finding information about a specific brand on Google Search. Again, this can be extended to other search engines as well, and you can also extend this with the exact data that you need from Google Search, because the SERP API gets us a lot of data in a very easy way, in a matter of one HTTP request. Now let's go ahead and cover our second objective,
and our second objective is to retrieve brand information and brand mentions from social media. For that we are going to use another powerful tool from Bright Data called the Web Scraper APIs. When we have to get data from different tools, from different applications and web applications on the internet, we have a couple of options. One option would be to integrate with their API. The challenge is that not all applications provide an API, not all applications provide you with all the data that you need through the API, and not all APIs are free; some of them have insane costs, and some of them are not even available. The next option would be: okay, if we don't have an API, how do we get the data then? We can build a scraper, try to scrape the web, try to parse the data, and get the right information that we need. The challenge here is that it's time consuming and things are always changing, so you always have to come back to your scraper to update it anytime these websites change themselves, and another downside is that you have to take care of a lot of the plumbing and bottlenecks yourself; here I'm talking about CAPTCHAs or being blocked by the website. The third alternative, which looks like a combination of the previous two, is to use the Web Scraper APIs. The Web Scraper API is a tool by Bright Data; behind the scenes it's a scraper, but for us developers it looks like an API. We simply send a request and we get the data that we need. Let me actually show you how it works, and a lot will make more sense. Let's go back to the Bright Data
dashboard, and here under Web Scrapers, if this is the first time you're using the web scraper, go to the Scraper Marketplace. The Scraper Marketplace is a list of more than 270 scrapers, pre-built and kept up to date, for a lot of popular applications out there, such as LinkedIn, Instagram, Facebook, TikTok; you can even see marketplaces like Amazon. I built an application with that web scraper as well. And if I check Social Media, you can also see YouTube, Twitter, Reddit, Pinterest, and a lot more. For example, I'm going to choose YouTube, because this is a platform that I have a lot of knowledge and experience with. We're going to see that all of these categories have a lot of scrapers in them; for example, we can scrape profiles by having a URL, we can scrape comments by having a video URL, we can scrape videos if we know a channel URL, and so on. So, depending on your use case, we can choose one of these scrapers. Let's have a look at how one of these web scrapers works. I'm going to choose the first one, YouTube Profiles, which, knowing a URL, is going to get us information about that profile. I'm going to choose the scraper API, click Next, and here is the documentation for the scraper. If we look at the input, we see that the only thing we need to give is the URL; in the output, Bright Data will give us a JSON with this information: the handle, profile picture, name, subscriber count, videos, account description, and so on, everything that you can find on a profile page on YouTube. Let's go ahead to the second tab,
the Data Collection API, and test it out to see how it works. This is a nice playground that we can experiment with. The first thing, to test it out: if you don't have an API token yet, there will be a button to create and generate one; after that, put it here and it's going to be automatically filled in in your request below. Now, the first step is to trigger one of these scrapers. To do that we have a trigger data collection API, and we can set the different URLs here; as you can see, we can also send batch requests to handle multiple channels. I'm going to leave this at the defaults, and on the right I'm going to copy the bash script, the curl command, to send this request. I'm going to open the terminal, do a clear, and paste this request here. If I press Enter, we see the snapshot ID. This snapshot ID is the ID of a job that Bright Data started. Scraping sometimes takes time; it's not instant, and usually it's displayed here how much time a scraper takes; this one says 7 seconds. That means that knowing this ID, this snapshot ID that I'm going to copy from here, we can scroll down in this playground to the delivery options. The delivery options are the way we can get back the data that we just scraped. I'm going to paste the ID that I got from the terminal into the snapshot ID field, and I'm going to copy this command to get the data that was scraped. I'll do a clear, paste it, and in a second we should have a big JSON file. I can also pipe it through a tool called jq to see it better, and as you can see, this is the output: we have first the MrBeast channel, with his handle, his subscribers, description, and so on, and we have another channel below, because we've done a batch request for multiple channels. If you want to check the logs of your jobs, make sure to open the Logs tab here, and we can see all the jobs that have been performed for a specific dataset. There are also some management API URLs, for example to get the list of different snapshots and to monitor the progress. In some cases this is useful, but in a second I'm going to show you a better way to implement our scraping, without needing to monitor the progress.
So let's have a look at how things worked in this situation. First, we sent a request to trigger a scraping job; we waited a bit of time, and then we sent a request to get the data. In our case the scraper took 7 seconds, and by the time we made the second request it was ready, but in some cases sending the second request will tell you that, hey, the job is not ready yet. If we look at the diagram to understand how this would work: from the back end we would trigger a scraping job by sending a request to the /datasets/v3/trigger endpoint, and Bright Data will go and do the scraping, but then we need the result back. What we can do is, from the same function in our backend, at different intervals of time, send requests to the datasets snapshot endpoint; this is the second request that we sent, the delivery options one. The problem is that we don't know if it's ready, and in most cases it's not going to be ready, so we're going to have to spam Bright Data with requests over a period of time to ask for the data back, and this can take multiple requests back and forth until Bright Data says, okay, the job is finished, here is your data, and our backend can process it further. This is the flow that I showed you just now, manually. A better approach, which we are going to integrate in our application, is to use webhooks. With webhooks we are not spamming Bright Data to ask for data; instead, we are letting Bright Data call a function on our API whenever the job is ready, and this way it's super efficient, because from the back end we just trigger a job and tell Bright Data: hey, when you're ready, give me the data and call the webhook endpoint, or URL, with that data. This has a couple of advantages: we don't have to spam Bright Data with a lot of requests, and we also get the data as soon as it is available, without having to wait extra time. So let's have a look at the
implementation of this process in code, in our application. Again, in the playground we saw how we can do this by simply sending a curl request, which is an HTTP request, to the URL https://api.brightdata.com/datasets/v3/trigger. So what we have to do is replicate the same request in our code. Let's go here and see how we're going to do that. We have a function that will trigger a collection API. The function will get an input, and the input is an array of objects. We're going to have a dataset ID, because every dataset on Bright Data is invoked through the same URL; the only difference is the dataset ID parameter. If I copy this URL and look at another dataset from the marketplace, for example LinkedIn, and search for this URL in the documentation, you're going to see that the curl request goes to the same /datasets/v3/trigger; the only difference is that the dataset ID is different. That's also a very good thing, that Bright Data has standardized how we interact with these web scrapers, because by implementing it once we can reuse it for any dataset, and we're going to see that in a second. So now that we have an input and the dataset ID, what we need is the Bright Data trigger URL, which is that /datasets/v3/trigger I showed you a moment ago. Then we need a webhook URL; this URL is the endpoint that contains the function that is going to be ready to receive data from Bright Data. Back in our documentation, that is the webhook that Bright Data will call to send the data when it's ready.
Now, having this information, we're going to start building our fetch request. We're going to start with a base URL for Bright Data; we're going to specify the dataset ID, which is dynamic and changes for different requests; we're going to specify the endpoint that Bright Data is going to invoke when the job is done; and we say that the format should be JSON and uncompressed. This is to simplify things; you can also have it send compressed data, just make sure that you decompress it when you receive it. And that's it. The next step is to provide headers, with your Bright Data API key as authorization; the method is going to be POST, and for the body we stringify the input and send it as the body of this POST request. Then what we get back is the data, and the data is going to contain the snapshot ID, because the response to the first request that we sent is an object with a snapshot ID. This is important, because this snapshot ID is going to be the key to the data, and what I'm going to do is store the snapshot ID in the database, in a table called scrape jobs. Why do I do that? Well, because I'm going to have access to the snapshot ID, but I also need to know what dataset that request was for. So in the table I have the ID, which is this snapshot ID, and I also have the dataset, with a status. This helps me in multiple ways: by subscribing in real time from my client-side application, I can show the status change of a job, so as soon as it is ready I update the UI. But the more important reason for this table is for me to be able to map a snapshot ID to the dataset, because the webhook that I have will receive data, and I need to know what dataset it was for: was it YouTube data, was it LinkedIn, or was it Instagram? The only thing that I will know is the scrape job ID, the snapshot ID.
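Putting the trigger together, here is a sketch under the assumptions of this walkthrough. The query parameter names mirror what the talk describes (dataset ID, webhook endpoint, JSON, uncompressed), and the scrape jobs table is the one just mentioned; treat the exact names as illustrative.

```typescript
// Sketch of the trigger described above. The query parameter names and the
// scrape jobs table are assumptions based on this walkthrough.
export function buildTriggerUrl(datasetId: string, webhookUrl: string): string {
  const params = new URLSearchParams({
    dataset_id: datasetId, // the only thing that changes between datasets
    endpoint: webhookUrl,  // Bright Data calls this URL when the job is done
    format: "json",
    uncompressed_webhook: "true", // plain JSON, nothing to decompress on receipt
  });
  return `https://api.brightdata.com/datasets/v3/trigger?${params}`;
}

export async function triggerCollection(
  datasetId: string,
  input: Array<Record<string, unknown>>,
  webhookUrl: string,
  apiKey: string,
): Promise<string> {
  const res = await fetch(buildTriggerUrl(datasetId, webhookUrl), {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  const { snapshot_id } = await res.json();
  // store the snapshot ID so the webhook can later map incoming data to its
  // dataset, e.g.:
  // await supabase.from("scrape_jobs")
  //   .insert({ id: snapshot_id, dataset: datasetId, status: "running" });
  return snapshot_id;
}
```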
So now that we are done with the first part of our equation, triggering a scrape job, let's have a look at the code for the webhook that is going to receive the data. On the receiving end we have this webhook, and because I'm using Supabase Edge Functions, it is a Deno function, a Deno serverless function; in your case it can be a Node.js endpoint in your REST API. What we have to do is take the data from request.json(), which is basically the body of the request, and from the headers, where there is going to be a header with the snapshot ID. So take these two pieces of data; knowing the snapshot ID and the data that we received (and the data is going to be exactly this, nothing more, nothing less: an array of information), the next step is, knowing the snapshot ID, to fetch the scrape job from my database. This is going to give me all the information about that scrape job, but what's most important for me in that case is to know the dataset ID of the scrape job that we just received data for. Knowing that dataset ID, I can have a list of functions, a switch case, where I handle every single data response differently. For example, if the scrape job had the dataset equal to YouTube channels, that means this data belongs to a channel, so I'm going to call a special function called save channel; if it's about videos, I'm going to call save videos; if it's YouTube comments, save YouTube comments; if it's LinkedIn, it's going to be save LinkedIn posts, and so on. All other datasets, like Instagram, TikTok, LinkedIn and so on, are going to be different cases here. This is one way to implement this, and I did it this way in order to have only one webhook; however, this is not the only way. You can also have different webhooks for different datasets, and each of them will know and handle the data of one specific dataset; for example, you can have a YouTube channels webhook, a YouTube videos webhook, a YouTube comments webhook, and so on. So it really depends on you how you want to structure it: one endpoint with a switch case calling different functions, or completely different endpoints. It depends on you and how you architect this; both of them are going to work perfectly. The only change that you're going to have to make is, when we trigger this endpoint with the webhook URL, you need to specify the correct endpoint, the function that you want to call for the result, based on the dataset ID.
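The single-webhook dispatch just described can be sketched as a lookup table instead of a literal switch. The dataset keys, handler names, and header name are hypothetical placeholders, not Bright Data's real identifiers.

```typescript
// Sketch of the single-webhook dispatch described above. Dataset keys and
// handler names are hypothetical placeholders.
type Handler = (data: unknown[], snapshotId: string) => void;

function saveChannels(_data: unknown[], _snapshotId: string): void {}
function saveVideos(_data: unknown[], _snapshotId: string): void {}
function saveComments(_data: unknown[], _snapshotId: string): void {}

const handlers: Record<string, Handler> = {
  youtube_channels: saveChannels,
  youtube_videos: saveVideos,
  youtube_comments: saveComments,
};

export function resolveHandler(dataset: string): Handler {
  const handler = handlers[dataset];
  if (!handler) throw new Error(`No handler for dataset: ${dataset}`);
  return handler;
}

// Webhook entry point (Supabase Edge Function / Deno style, as in the talk):
// Deno.serve(async (req) => {
//   const data = await req.json();                      // the scraped records
//   const snapshotId = req.headers.get("snapshot-id")!; // header name assumed
//   const job = await getScrapeJob(snapshotId);         // look up the dataset
//   resolveHandler(job.dataset)(data, snapshotId);
//   return new Response("ok");
// });
```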
So now that we have this data here, the next step is again to cache the results for easier and faster access. For the YouTube data I have mapped exactly the data that I need to work with, that I'm going to need for the large language models and for my user interface, and I have three tables: information about the YouTube channel, information about YouTube videos that belong to a channel, and information about YouTube comments that belong to a video. If we go back to this switch case and focus on save channel: the save channel function will receive the data and the snapshot ID; the data, I will remind you, is this array with data about the channel, if we're talking about the save channel webhook. Now, if we look at that function, what's happening there: in save channel we are simply taking the data, parsing and formatting it correctly, and saving it in our YouTube channels table. I'm using an upsert operation instead of an insert, in order to insert the row if it's not already there, or update the data if I already have the same channel with the same ID. So if I run this function again, it's not going to error, and it's not going to ignore the new request; it's going to update the information about the channel, giving me up-to-date information in my system. After I save the channel, I also update the scrape job to set the status to ready, and this helps me on the client side to update the UI and get the data that was fetched. And finally, at the end, after I store the information about the channel, I also trigger another scraping job by invoking that trigger collection API function with the videos dataset ID. So basically what I'm doing here is saying: now that I know the channel, I want to know the videos of that channel.
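The save-channel step just described, with its upsert, status update, and chained trigger, can be sketched as follows. Column names, dataset IDs, and the helper names are assumptions based on this walkthrough.

```typescript
// Sketch of the save-channel step described above. Column names, dataset
// IDs, and helper names are assumptions based on this walkthrough.
type ChannelRow = { id: string; handle: string; name: string; subscribers: number; url: string };

export function formatChannel(raw: Record<string, unknown>): ChannelRow {
  return {
    id: String(raw.id),
    handle: String(raw.handle ?? ""),
    name: String(raw.name ?? ""),
    subscribers: Number(raw.subscribers ?? 0),
    url: String(raw.url ?? ""),
  };
}

// async function saveChannels(data: any[], snapshotId: string) {
//   // upsert: insert new channels, or refresh ones we already track
//   await supabase.from("youtube_channels").upsert(data.map(formatChannel));
//   // flip the job to ready so the client UI can update in real time
//   await supabase.from("scrape_jobs").update({ status: "ready" }).eq("id", snapshotId);
//   // chain the next pipeline stage: scrape this channel's videos
//   await triggerCollection(YOUTUBE_VIDEOS_DATASET_ID,
//     data.map((c) => ({ url: c.url })), WEBHOOK_URL, API_KEY);
// }
```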
all the data about a specific channel in this way I'm creating a data pipeline so from the client side application when I search for a channel I start a data
Pipeline with blue we have a BRI data web scrapers starting with scraping the channel information then Bri data is going to call a function with green here
in my system on my back end this web Hook is going to store Channel data and automatically trigger a new bright data video scraper to scrape videos for that
channel once that is done I'm going to receive them in the same web hook and I'm going to store videos in my system and I'm going to trigger another scraper to fetch the comments of all the videos
that we just received and this scraper is going to call the web hook to store the comments and you can link these stages as much as you need in order to have the information that you're looking
for and having them linked together this way is going to help us later whenever we have to rec scrape a specific Channel we're not going to have have to think
about multiple stages we're only going to say hey I need updated information about this Channel and I'm going to start the same data Pipeline and it's going to follow through the same steps
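As a rough sketch of that chaining idea (the dataset names below are placeholders, not the actual Bright Data dataset IDs from the project), the "what do I trigger next" decision inside such a webhook might look like this:

```typescript
// Hypothetical sketch of the multi-stage pipeline: after storing one stage's
// data, look up which dataset should be scraped next. The keys and values
// below are placeholder names, not real Bright Data dataset IDs.
const nextStage: Record<string, string | null> = {
  "youtube_channels": "youtube_videos",   // channel stored -> scrape its videos
  "youtube_videos": "youtube_comments",   // videos stored -> scrape their comments
  "youtube_comments": null,               // last stage, nothing left to trigger
};

// Given the dataset the webhook just received data for, return the dataset
// that should be triggered next (or null when the pipeline is complete).
function nextDatasetId(currentDatasetId: string): string | null {
  return nextStage[currentDatasetId] ?? null;
}
```

Each webhook then stores its data and, if there is a next dataset, calls the same trigger function again; that is what makes re-scraping a whole channel a single entry point instead of a multi-step chore.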
From the UI perspective, this allowed me to build the following screens: a YouTube channel analyzer where I can paste the channel URL at the top; after that, Bright Data is going to scrape the data, I'm going to store it in the database and then display the channel information together with the videos and the comments of the videos, and inside each video we have info about that specific video. This was an example of how we can integrate one data source, in this case YouTube, but I encourage you not to stop here: go ahead and integrate other datasets as well. As I said, there are more than 270 datasets in Bright Data's Web Scraper APIs, so go check the others and try to integrate them, because the code is going to be the same. The way we built it in the previous steps is very simple, because triggering a collection, or triggering a scrape, is done with the same function, zero code changes; the only thing that changes is the input: we're going to send a different dataset ID and a different input. For example, we can go into the LinkedIn scraper, go to the overview, and see that for the input I need a URL for the LinkedIn profile, and I need the ID of this dataset. You can take it from the request shown here (search for the dataset ID field and copy it from there), or you can also take it from the Bright Data URL at the top. So you're going to send this dataset ID to our trigger collection API function, together with the input of LinkedIn URLs. Triggering that doesn't require any code changes; the only code change you're going to have to do is to
add another case here to check which dataset ID the data came from: you're going to add the LinkedIn case and implement a function for how you want to handle the data that you receive from Bright Data for this specific dataset. For that, I encourage you to go into the overview and look at the example output; it will tell you the format of the data so you can map it to your database. For example, you have a name, a country, a position, a city, and so on.
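As a hedged sketch of that mapping step (the field names follow the example-output fields just mentioned, but treat the exact shape as an assumption to verify against the real example output for your dataset):

```typescript
// Hypothetical sketch: map one record of a scraper's example output to a row
// for our own table. Field names are assumptions based on the fields
// mentioned above (name, country, position, city).
interface ProfileRow {
  name: string;
  country: string;
  position: string;
  city: string;
}

function toProfileRow(raw: Record<string, unknown>): ProfileRow {
  return {
    name: String(raw["name"] ?? ""),
    country: String(raw["country"] ?? ""),
    position: String(raw["position"] ?? ""),
    city: String(raw["city"] ?? ""),
  };
}
```

The webhook case for the new dataset would run every received record through a mapper like this before upserting it into its own table, exactly as the YouTube handlers do.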
So the output and the data structure are different for every single scraper, because all of the platforms have different data, so make sure to look at what data is available and map it to your database schema. That's how you're going to get data from different sources, from different social networks, even from Reddit: for example, we can go to the scraper marketplace for social media and see Reddit there, a very important source of information as well. All right, at this point we should have lots of data in our database, but we don't just need data, we need insights: easy-to-digest information about this data. That's why in the next chapter we're going to cover how to use AI and large language models to extract insights from this data. We're going to sprinkle some AI into this project, using OpenAI models to extract insights. I'm going to show you a couple of use cases, and later you can experiment and implement this in many other ways, because the fundamentals are the same; it's only the prompt that changes for your specific case. The first use
case that I want to cover is AI video transcript analysis. From Bright Data, when I scrape the video information from YouTube, I also get the transcript, and this is super valuable because with the transcript we can know what is happening in the video and what the creator is talking about. So what I want to do is summarize the transcript and extract the key topics that are being discussed, the discussion points, and maybe even find brand mentions: for example, if you're working with different creators, you can build a scraper that finds the brand mentions, finds the timestamp when they happen, and so on. Let's have a look at the code and how we implement this. First of all, we need to install the openai package; it's available for Node.js, Python, and a lot of other languages as well. We need an API key that we can generate on OpenAI, and using that key we initialize the OpenAI client. Next, we need to start a chat completion. A chat completion, in simple terms, is the same thing that happens when you go to ChatGPT and ask a question: by pressing send on a question you're generating a chat completion; in other words, the model is going to complete the chat with the next reply. We do that by calling openai.chat.completions.create and providing a list of messages, and this list of messages, if we think back to the ChatGPT example, is the history of messages that
you have in that chat thread. The first message is a bit special when we use the API, because we can provide the developer role here, and with the developer role we can specify predefined rules that we want the large language model to follow; we're giving it context about what we expect to happen and how we expect it to respond. In this case I'm telling it: you are an AI assistant specialized in analyzing and summarizing video transcripts; your task is to extract concise and meaningful summaries from the provided transcript; provide the output in JSON format. The JSON format is important, and we're going to see why in a second. So this is the first message, the developer-role message. The next one is a message with the user role, and this is, in a way, what we are typing as a user. What I'm typing here is a message saying "here is the video transcript," and I'm putting in the video.transcript that I have in my database, so the whole transcript of the video, and I'm finishing the message with the following text: "please provide the summary and key topics in JSON format." So I remind it again that it needs to respond in JSON format, and to get the results that I expect, I provide the exact JSON structure that I want back: I tell it what fields I want, for example an AI summary of type string, with more details about it, and AI topics, an array of strings, with more information about what I mean and what I expect in each field. With
this JSON structure, OpenAI will try to answer in JSON, and if I set the response format type to json_object, OpenAI will also try to validate that the reply is valid JSON. It might not be exactly the format that you need, but it will be valid JSON, so I encourage you to add some try/catches on your end as well, because we never know when the AI will hallucinate. In most cases, though, if you are upfront that you expect JSON and you provide the data types you expect, like this, you're going to get it this way. I'm also specifying the model; in this case I'm using GPT-4o mini, but you can use other models as well depending on your case. What we get back is the chat completion, so I'm taking the reply from the first choice and converting it to an object using JSON.parse; this is a part I recommend wrapping in a try/catch because, as I said, the JSON might have issues. Now, having this JSON reply, I can access the AI summary and the AI topics on it, because that's what I kindly asked the AI to give me, and I can store them by updating my video in the database. That gives me an AI summary of a video with the key topics that were discussed in it. Again, this is just an example of what's possible; the sky is the limit, and
I encourage you to think about what exact data you want to extract from the transcript. What I did here, to make it easier to run this process without having to remember to trigger it in specific cases, is automate the AI analysis for every single insert into my YouTube videos table. On Supabase there is a concept called database webhooks: when the YouTube videos table has an insert event, I call a webhook, a URL for the edge function that runs this AI workflow. In my edge function I receive the type of operation, the table, and the new record, the video that has been inserted. With this automation in place, I make sure to generate AI summaries every time a new video is inserted into my database. If you're building this on Node.js, what you have to do is make sure that whenever you do an insert operation for a YouTube video, you also invoke the AI function that does the analysis, and that is going to work fine as well. Now, moving on to the next use case in our AI analysis, I also want to
show you how we can analyze not only data that the brand is posting, but also data from the community, from customers: comments, reviews, what people are saying. I think this use case is even more valuable, because for a company it's crucial to know what the customers are saying out there. So the next use case is AI analysis of public opinion. For this example I'm going to work with the comments from YouTube, but you can extend the same approach to social media posts that mention your brand, to posts on Reddit, reviews, blogs, forums, and so on. What you can do in terms of analyzing public opinion: you can analyze sentiment, to understand whether a post or a group of posts is positive or negative; you can extract key conversation topics; you can extract pain points or very common requests that your users are asking for; you can find the most common issues or challenges they're going through, and this way prioritize your next steps; you can get ideas for content or for improving your product or service based on public opinion; and for more specific use cases, you can detect a potential PR crisis early: if you find hints of a PR crisis, of your brand being attacked, you can catch it early and address it accordingly. The possibilities here are also endless. I'm going to show you how we can do sentiment analysis based on YouTube
comments, and based on this you can extend it to other use cases as well. The main part of this function remains the same: we are still using OpenAI, still using a chat completion with the same model; the only different part is the messages. A lot of this AI work nowadays is prompt engineering, writing the correct prompt. Here is what I came up with; maybe it's not the best, and I'm pretty sure you can write better prompts than me, but this is what works well for my use case. I'm telling it in the developer message: you are an AI assistant specializing in analyzing YouTube comments; your task is to determine the sentiment and extract common topics; and I also say that I need JSON format. In the main message from the user, I tell it "here are the comments for a YouTube video," and what I do is merge all the comments under a YouTube video that I have in my system and put them as a bullet-point list in the text. Finally I say "please analyze these comments and provide the following format": I need the sentiment, with three values it can choose from (positive, negative, neutral), the score of the sentiment from 0 to 1, an explanation giving me a brief justification of that sentiment, and a list of common topics as an array of strings. Having this
completion, I also save it back in the database, and this allows me to show this sentiment analysis under channels and videos in the application I've built. We can see the label (one is positive, one is neutral), the score (0.85, 0.5), and a little explanation, so we understand better the overall tone of the comments and the common topics. Again, you can extend this to give you things like commonly suggested ideas, the most common bugs being mentioned, or other things you might need to extract from the comments. So the question I want to ask you now is: what analysis and what data would make sense for your business? Think about that and go try it out with a prompt based on this example, because, as I said, the process remains the same. Here is a high-level overview of an AI data pipeline. Whenever we have to extract insights using large language models, first we need some data; using Bright Data we can get any publicly available data from the internet. The next step is to clean that data, following the rule of garbage in, garbage out: we need to make sure we are working with the correct data. The next step is to provide more context for that data, so the LLM can make better recommendations and understand better what's happening: for example, for a list of comments we can provide the transcript of the video, so the LLM knows what the video is about and what the comments mean.
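As a hedged sketch of the LLM step of this pipeline, here is roughly what the transcript-analysis call described earlier could look like. The prompt wording paraphrases the one above, the field names (ai_summary, ai_topics) are illustrative, and safeParseInsights is a hypothetical helper implementing the try/catch guard I recommended:

```typescript
// Hypothetical sketch of the LLM step: build the messages and parse the reply
// defensively. The JSON field names are illustrative, not the project's exact ones.
interface Insights {
  ai_summary: string;
  ai_topics: string[];
}

function buildTranscriptMessages(transcript: string) {
  return [
    {
      role: "developer",
      content:
        "You are an AI assistant specialized in analyzing and summarizing " +
        "video transcripts. Provide the output in JSON format.",
    },
    {
      role: "user",
      content:
        `Here is the video transcript:\n${transcript}\n` +
        `Please provide the summary and key topics in JSON format: ` +
        `{ "ai_summary": string, "ai_topics": string[] }`,
    },
  ];
}

// Wrap JSON.parse in a try/catch: the model may reply with valid JSON that is
// not our shape, or with no JSON at all.
function safeParseInsights(reply: string): Insights | null {
  try {
    const parsed = JSON.parse(reply);
    if (typeof parsed.ai_summary === "string" && Array.isArray(parsed.ai_topics)) {
      return parsed as Insights;
    }
    return null;
  } catch {
    return null;
  }
}

// With the openai package the call would look roughly like this (not executed here):
// const completion = await openai.chat.completions.create({
//   model: "gpt-4o-mini",
//   response_format: { type: "json_object" },
//   messages: buildTranscriptMessages(video.transcript),
// });
// const insights = safeParseInsights(completion.choices[0].message.content ?? "");
```

The same two helpers, with a different prompt, cover the sentiment use case as well, which is why only the messages change between the two examples.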
After we have the data and more context, we run it through an LLM with the right prompt; here you're going to get better results by building a better prompt. After we get the result, we store it back in our system, in our database, so we can later display it to the user. That's how you can use AI and large language models to extract key insights from large datasets. Now let's move on to the last chapter of today's video, where I want to talk about automation and scheduling, because at this moment what we have is a system that does this research ad hoc, basically when the user asks for it. The whole system so far is powered by an input, for example a channel URL, a Google search query, a LinkedIn URL, an Instagram handle, and so on. After the user presses the button, the whole pipeline starts: the data scraping pipeline, going through channel scraping, video scraping, comment scraping, or search links and details about a specific link, and so on. After the data scraping pipeline is done, it automatically invokes the AI pipelines that will, for example, extract key insights based on the data that was just collected, and finally, after all of this, which can take some minutes, the user has the information in the application. This is good for research purposes, for example when we
try to research something new, a competitor, or a new topic. However, what I want to build in my application is a way to track different channels, social media profiles, SERP searches, brand mentions, and to track them over time, on autopilot. I want this system to work for me every single hour, to make sure I have up-to-date data about my brand and that I'm notified if something changes out there. This can give a founder the peace of mind they need to disconnect a bit from social media and do the hard work, without the risk of something happening out there without us knowing. In terms of user interface, what I want is a dashboard where I can pin or track different data: different YouTube channels, different search queries, Reddit posts, and so on. By tracking these items, what needs to happen in our data-flow diagram is the possibility to trigger a channel scraper or a data scraper not only from the client side, when the user presses a button, but also periodically over time. If we zoom in on this part, we want to add a function, let's call it the auto tracker, and we want to schedule a cron job. A cron job is a process that does something at intervals of time that we specify; for example, we can say "hey cron job, can you invoke this auto tracker function every 10 minutes?"
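A minimal sketch of the auto tracker's core check, assuming each tracked item carries a tracked flag and a last-updated timestamp (both field names are placeholders, not the project's actual schema): an item is due for scraping only if it is tracked and stale.

```typescript
// Hypothetical shape of a tracked item; field names are assumptions.
interface TrackedItem {
  isTracked: boolean;
  updatedAt: string; // ISO timestamp of the last successful scrape
}

// An item needs scraping only if it is tracked and its last update is older
// than the platform's interval (24 hours by default).
function needsScrape(item: TrackedItem, now: Date, intervalHours = 24): boolean {
  if (!item.isTracked) return false;
  const ageMs = now.getTime() - new Date(item.updatedAt).getTime();
  return ageMs >= intervalHours * 60 * 60 * 1000;
}
```

The cron-invoked edge function would filter its tracked items through a check like this before triggering any scrapers, so running the cron every few minutes doesn't mean scraping every few minutes, and the interval can differ per platform.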
Now, what the auto tracker function is going to do on its end: it has a couple of responsibilities. It's going to get information from the database about the items that are being tracked, and specifically the items that need to be scraped, because some of them can be scraped at larger intervals of time, and it needs to start the different scrapers, for example the YouTube channel scraper, the SERP scraper, Instagram, Reddit, and so on. The cron job can be set up on any server you're running, on Linux, with Node.js, and so on; in my case I'm using Supabase, and Supabase has a cron integration that can invoke an edge function at predefined intervals of time, so I scheduled this cron job to call the edge function every 5 minutes. Now let's have a look at what's happening in that edge function. In the edge function we go step by step through different phases and try to invoke the necessary scrapers. The first phase is about SERP, the search engines: I look in the database for the different SERP searches, specifically the ones where tracked is equal to true (the ones we marked with the star in our application) and that have not been updated in the last 24 hours. Instead of scraping the searches every 5 minutes, I decided to scrape them at most once a day, every 24 hours. Different platforms would require different intervals depending on how fast the data changes: on a search engine it doesn't change that much, while on Reddit it might change every minute, especially if a post is trending, so you might want to scrape that every minute, for example. Now that I have all the tracked
searches, for each of them I invoke a function to start collecting SERP data, to start the scraping process, and I do this for the other data points as well. For example, the next phase is YouTube scraping, and I do the same: I look for YouTube channels that are tracked and not updated in the last 24 hours, and I invoke the trigger collection API with a list of inputs. Because the trigger collection API from Bright Data can do batch requests, I put them all into one single request: all the URLs, with the dataset ID of the YouTube channels dataset. You do the same for all the tracked items: Instagram pages, Reddit, mentions, different forums, and so on. With this, what we get is a machine that is working on autopilot for us 24/7, without us having to do anything. All
right, so this was our AI-powered social listening application. Let's quickly recap what we learned today. We learned how to collect and process data using Bright Data's tools; more specifically, we learned about the SERP API, which can help us easily scrape the search engines that contain a lot of valuable data for our businesses. We learned how to use the Web Scraper APIs to get data from a lot of social networks like YouTube, Instagram, LinkedIn, Facebook, Reddit, and so on, and we know there are more than 270 pre-built web scrapers that we can invoke by simply sending a POST request; the only thing we need to do is look at the data, parse it, and store it in our system. We also learned how to parse this data and store it structured in our database, so that we can later use it with the OpenAI LLMs to extract actionable insights. We also learned how to automate the whole workflow for continuous monitoring, running 24/7 on autopilot. In the end we have a social listening application that can help us better understand our brand across the internet. Before we end, I want to leave you with a couple of ideas of
how you can improve it and what steps you can take next. Here are some ideas for improving a social listening application. You can implement alerts: specify some metrics you want to track and some thresholds, and when the automatic data scraping runs, if a metric is above a threshold, notify yourself via email, Slack, or any other application you use for communication. You can send weekly, daily, or otherwise periodic reports over email with the most important data; here you'll have to decide what the most important data for your business is. You can track competitors, to show a comparison between you and a competitor, for example across different Google searches, or on social media to compare the sentiment of your mentions versus the sentiment of the competitor's mentions on Twitter; this is just an example. We can show progress over time, and for that you're going to have to adjust the database design to store updates as new values instead of overwriting old data; with a little change to the database design you can have historical data collected and displayed in the application. You can also implement AI recommendations: so far we have analyzed the data and extracted key insights with AI; now we can feed these key insights back into the AI and ask it to generate a list of recommendations, and this can be implemented quite easily with the system we have developed today. Moreover, you can add a RAG system, retrieval-augmented generation; I don't have a lot of time to explain what a RAG system is, but you can quickly
search it on the internet; it is a powerful tool that can take our system to the next level by giving the AI knowledge about our datasets, about our database, for example. By implementing a RAG system, which is not super hard to do, we can have features like searching for similar companies, not only based on the name and the data, but also based on the topics they discuss in their videos and on social media, topics they research, and so on. We can implement a chatbot for our data: for example, I can go to a YouTube video in my application and chat with a bot, asking "what are the key topics here?" or "at what timestamp did we mention this?" You can find creators or companies based on their discussion topics, and a lot more. Basically, having this information in your system, you can power it up and build many new things. In fact, if you're interested in RAG systems, I'm planning, in March of 2025, a tutorial on our channel where I'm going to implement a RAG system, a chatbot on top of YouTube videos based on their transcripts, so if you're interested in that, make sure to follow me on YouTube at notJust.dev so you don't miss it. That's all I had for today, and I hope
you learned something new and had fun following this tutorial. If you still have questions, feel free to reach out to me on social media; you can find me at Vadim nodev or on our YouTube channel, and I'll be more than happy to help with your questions. If you decide to implement these systems in your business, in your own cases, let me know how you adjusted them, let me know your case studies; I'd be more than happy to hear what you have built with this system. Thank you, Bright Data, for providing these tools, without which it would be impossible to do this kind of project, and I'll see you later. Bye-bye!