How to Actually Scrape Twitter/X Data with n8n
By Nate Herk | AI Automation
Summary
## Key takeaways
- **Scrape Unlimited Tweets with n8n**: This n8n workflow automates scraping unlimited tweets from X, enabling market analysis, competitor research, and staying updated on industry trends. The workflow and Google Sheet template are available for free. [00:01], [00:15]
- **Cost-Effective Twitter API Access**: The twitterapi.io service offers a pay-as-you-go model, with 1,000 tweets costing around 15 cents. Using a specific referral link provides a $6 starting credit, significantly more than the standard $1. [02:41], [02:50]
- **n8n HTTP Request Setup**: To set up an HTTP Request in n8n, import the cURL command from the API documentation. This automatically populates fields like method, URL, and authorization, simplifying the process, especially for complex POST requests. [03:44], [03:57]
- **Handling Twitter API Pagination**: Pagination in the Twitter API requires a 'cursor' parameter. The workflow dynamically fetches the 'next_cursor' from previous API responses to retrieve subsequent pages of tweets, looping until a set count is reached. [13:04], [13:34]
- **Automated Tweet Data Storage**: After scraping and extracting tweet data (ID, URL, content, likes, views, date), the workflow appends this information to a Google Sheet using the 'Append Rows' node, mapping extracted fields to sheet columns. [01:11], [11:08]
Topics Covered
- Unlock limitless market intelligence from X.
- API calls are simpler than you think.
- Transform raw tweets into actionable insights.
- Mastering dynamic pagination for endless data.
- How to build advanced looping workflows in n8n.
Full Transcript
Today I'm going to show you this n8n workflow I built that helps me scrape an unlimited number of tweets from X. There's a ton of good stuff you can find on X, whether it's market analysis, competitor analysis, or just staying up to date with the latest news and trends in a specific industry, and this system will help you do all of that. As always, I'm giving away the workflow as well as the Google Sheet template for free, so all you need to do is download them, plug them in, hit run, and you'll be scraping Twitter. To get those resources, just join my free Skool community; the link is down in the description. But let's not waste any time. Let's hop into a live demo, and then I'll show you how to set up the API step by step.
As you can see, here's the workflow we're going over today. Quick disclaimer: I don't know if the way we have these different Set and Code nodes is optimal, so if any of you are programmers, please let me know how I can make this more efficient. But anyway, it works. We're scraping X up here, then checking the count. Right now I'm having it go through only three runs; if you wanted to increase that, you'd have to change the number here as well as the count code right here. If we haven't gone through three times yet, it comes down here, where we set the variables, increase the count, and set the pagination, and then we loop back through and keep scraping Twitter until we've done that three times. Then that's the end of the process. As you can see, we're updating a Google Sheet right here, which has these specific columns: the tweet ID, the URL, the content, likes, retweets, replies, quotes, views, and the date of the tweet.
So let's hop back into the workflow. I'll hit Test Workflow, and we'll see this happen live. Right now it's hitting Twitter (X); now we're extracting the info and adding it to the sheet, and then you'll see it loop all the way back down here. We're hitting the API again for the second round of scraping and adding that to the sheet; as you can see, there are 38 total items, so 38 tweets. This will be the last run: we have 58 tweets, and then it goes off this other way because we're done. That just finished up, so we can click into our Google Sheet and see that we now have 58 tweets to look through. Each of them has the URL, so if I click into this one, we can see a tweet right here: "OpenAI, your turn. Can you give this man a luscious full head of hair and a nice haircut?" Looks great. If I scroll down, we do in fact have 58 tweets, and all of them have an ID along with the links. If we click into this one, the sheet says it was posted on March 11th with almost 31,000 views; and if we open the tweet itself and wait for it to load, it is indeed from March 11th with almost 31,000 views. So we know we're getting real information into our data sheet. Now let's break down what's going on.
Okay, I told you we'd walk through setting up that API call step by step, so we'll do that, and then I'll walk through this actual execution and take a look at what was going on. This is the API we're going to be using: it's called twitterapi.io. I'm sure you're wondering about the price, and it's really not too bad: it's a pay-as-you-go model, and you can get 1,000 tweets for 15 cents. I also have a link for it down in the description, and if you use that specific link you'll get $6 to start with; I think if you sign up normally you only get $1, so that's five free extra dollars to play with. Anyway, this is the API we'll use to access the Twitter data. I'll click on Docs, which is the API documentation for the different endpoints we can hit, basically the different functionality of this API. Taking a quick glance at the left: we have user endpoint actions, which means looking at a specific user to get their tweets, their followers, or their mentions. We have tweet endpoints, which means we can grab the ID of a tweet (over here, you can see every tweet has an ID) and fetch tweets by ID, their replies, or their retweeters. What we were doing in the demo was the advanced search, where we search for tweets based on a query.
I know API documentation and setting up these requests can be kind of intimidating, so I'm going to break it down as simply as possible. The first thing I want to point out: whenever you're looking at API documentation and you see curl commands on the right, which look like this, copy that, go into a new workflow, type in "HTTP Request", and all you have to do is hit Import cURL, paste it in, and click Import. It populates all the fields you need, which is really handy. This one's not too bad because it's a simple GET request with pretty much just authorization, but in the case of sending a POST request where you have a full JSON body and parameters to set up, being able to import the curl and have everything configured for you is really helpful.
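To make that concrete, a curl command for this kind of endpoint looks roughly like the sketch below. The exact path and parameter names here are assumptions based on what the video shows, so copy the real command from the twitterapi.io docs rather than from here:

```bash
# Hedged sketch of a twitterapi.io advanced-search request.
# The endpoint path and parameter names are assumptions; copy the
# real curl command from the docs page you're looking at.
curl --request GET \
  --url 'https://api.twitterapi.io/twitter/tweet/advanced_search?query=openai&queryType=Top' \
  --header 'X-API-Key: YOUR_API_KEY'
```

Pasting something like this into n8n's Import cURL dialog fills in the method, URL, query parameters, and header for you.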
So the first thing we notice is our method and our URL. If I hop back into the documentation, we can see right here that the method is GET, and then this is the endpoint. If we copy that endpoint, come back in here, and paste it, it gives us the full URL, exactly what we copied. Basically, there's some sort of base URL (we're accessing the twitterapi.io API), and then every single function has a different endpoint. Because we're doing the advanced search right now, that's what the path looks like; if we were doing, say, "get user info", the endpoint would be the Twitter user-info one instead. As you can see, all of these have different endpoints, which basically just says: we're reaching out to this server, and we want to do something different.
What comes next is authorization, which just means you made an account, you have an API key, and you're the one paying for this search, not someone else. Right here we can see the authorization is a key-value pair: the key is X-API-Key, and the value is your actual API key. What's important to notice is that this is a header auth; sometimes APIs use query auth and sometimes headers, and in this case it's a header. So go to your dashboard (in the top right, click on your profile, then your dashboard), and you'll have an API key right there to copy. Copy it, and we'll bring it into n8n. As you remember, the key was X-API-Key and the value is your actual API key, so this is a placeholder showing where your key goes.
Now, a really cool tip with n8n: instead of filling this out here under the header parameters, we'll do it up here under the Authentication tab, which means we can save this authentication and reuse it later. This is why you needed to remember that it's a header auth. I'll click on this button, choose Generic Credential Type, and within the generic auth types we'll choose Header Auth, because that's what we saw in the documentation. Now (mine is already configured, but I'll pretend I'm creating a new one) we fill in the key-value pair we talked about: in this case the key is X-API-KEY, in all caps, and for the value you just paste in the API key you grabbed from twitterapi.io. Then you can save it, so you have it forever. I'll call this one "Twitter demo"; we save the credential and it connects successfully. As you can see, I have all these different APIs already saved, so when I want to set up a request in the future, I don't have to go find the key and enter it as a header auth; I just have it saved already. So I'm going to flick off Send Headers, since the header is now being sent by the credential right here, and now let's go back to the documentation and see what else we need to configure.
Okay, we're back at the advanced search endpoint, and we can see there are two required fields: a query and a query type. The query is what we're actually searching Twitter for; in that first example my query was "OpenAI" (I'll show you that later), which means it searches Twitter for OpenAI. Then we have the query type, which gives you two options: "Latest" tweets or "Top" tweets. What I did in the demo was Top tweets, and as you can tell, they were all very high performing in views and likes, but they still pull recent ones: these were all tweets from within about a week (here's March 8th), so mainly they're still pretty recent.
Anyway, the query is a string, and the query type is a string but with only two options. So I'll flick on Send Query Parameters. We know the first one was called "query", and for this example let's just do "Manus", because that just dropped and everyone's talking about it. Then we'll add another one, which we know was "queryType", I think with a capital T; let's double-check: yes, queryType with a capital T, and it has to be either "Latest" or "Top". For this example, let's do Latest rather than Top. So that's what we have here. As you can tell, in the demo we had one more option, which is "cursor". We're not going to set that up right now, but it's basically how you paginate through requests. Up here it says each page returns "exactly about" 20 tweets, which is kind of weird wording, but each page returns about 20 tweets, and if you want more, like the 58 we got in the demo, you have to go through multiple pages (we went through three times). We'll leave cursor blank for now, and we should be set up.
I'll hit Test Step, we'll see it searching through Twitter, and we get one item back. If I move this over a little, we can see one item with a total of, looks like only 16 tweets this time, because this is number 15 and computers start counting from zero. Anyway, this one got us 16 tweets, so I'll pin the data so we don't have to rerun the API and we have something to play with. Let's take a look at one of the tweets: here we have the ID of the tweet and the URL. Let's search Google for it, and it should be a recent tweet about Manus. Let's translate it; hmm, I can't fetch the translation, so let's try another. It's from a username "Manus Eric", so maybe that's what happened. Okay, let's try something else: I'll type in "college basketball" and we'll try that. It asks if I want to unpin the data; yes I do, so we can do another run and then validate some tweets. Let's go over here, pin this, copy this link right here, and go to X to see what we got: college basketball betting, and this one came out at 5:41, which is the current time right now, so that's the latest tweet.
So we have a ton of data coming back, and it's all in one item. What we want to do is clean this up and extract the fields we're looking for, so I'm going to paste in this Code node. You can get it by joining my free Skool community (the link is in the description): click on YouTube Resources, click on the post associated with this video, and I'll have a text file there with the actual code that's inside the Code node. Or, of course, you can download the workflow: you download the JSON, come into n8n, hit Import from File up here, and you'll have the whole workflow with all the Code nodes and everything. And if you want to really understand what's going on with this workflow, the looping, and the Set fields, I'd definitely recommend joining my paid community; the link for that is also down in the description. It's really just a more hands-on approach to learning n8n and having deep discussions about what's going on; we have a full classroom about building agents, vector databases, APIs and HTTP requests, as well as step-by-step builds. This is definitely not a place for experts only; my whole goal with the channel is to make things as simple as possible, so if that sounds like something you're interested in, definitely hop in.
Okay, anyway, I'll plug in the Code node, and it's already configured. Basically, out of this one item, which could have 15 tweets or 20 tweets each time, we want to pull all the tweet objects out and grab what we need. In this case we actually have 23 tweets, so this run is different from the first one. As you can see, for each tweet we've now extracted a tweet ID, a URL, the actual content, the like count, and the view count. All of these were just recently posted, so they're very low on views, except this one, which actually kind of went crazy; it's from March 6th, almost two weeks ago now, so I'm not sure what happened there. Anyway, this is our Twitter data.
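For reference, a Code node that does this kind of extraction can be quite small. Here's a minimal sketch; the response field names (tweets, id, url, text, likeCount, and so on) are assumptions based on what's visible in the video, so check one raw item from the HTTP Request node to confirm them:

```javascript
// n8n Code node ("Run Once for All Items"): flatten the API response
// into one item per tweet. Field names (tweets, id, url, text,
// likeCount, ...) are assumptions; verify them against one raw item.
const out = [];

for (const item of $input.all()) {
  for (const t of item.json.tweets ?? []) {
    out.push({
      json: {
        tweet_id: t.id,
        url: t.url,
        content: t.text,
        likes: t.likeCount,
        retweets: t.retweetCount,
        replies: t.replyCount,
        quotes: t.quoteCount,
        views: t.viewCount,
        // Reformat the raw timestamp into something human readable.
        date: new Date(t.createdAt).toLocaleString('en-US'),
      },
    });
  }
}

return out;
```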
What's next from here is putting that into a Google Sheet. I'll grab a Google Sheets node from all the way down here, and the operation is going to be Append Row(s) in Sheet. We choose our actual document, which is "Twitter data", then choose the sheet, which is Sheet1, and now all we have to do is map the columns. Because we extracted exactly the columns we want, it's super easy: it's as simple as dragging in the values that need to go to each column in the Google Sheet. Real quick, I'm going to delete all 58 rows over here so we can start from scratch, and now we just have to tell n8n what value goes in each of these cells. Back in the n8n workflow, I'll grab the tweet ID on the left from the Code node and drag it into the Tweet ID column, grab the URL and drag it into the URL column, and do that all the way down. We made the fields in this same order on purpose, so the dragging is really intuitive. You can also get the template for this Google Sheet in the free Skool community, so you can plug the whole thing in right away and get going.
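For what it's worth, when you drag fields in like this, n8n just generates one expression per column; assuming the field names from the extraction sketch above, the mappings end up looking like:

```text
Tweet ID  ->  {{ $json.tweet_id }}
URL       ->  {{ $json.url }}
Content   ->  {{ $json.content }}
Likes     ->  {{ $json.likes }}
Retweets  ->  {{ $json.retweets }}
Replies   ->  {{ $json.replies }}
Quotes    ->  {{ $json.quotes }}
Views     ->  {{ $json.views }}
Date      ->  {{ $json.date }}
```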
One other thing I did in the Code node was format the date to look a little more human readable. Now we have everything mapped, and it's going to do this for all 23 items coming through. If I hit play and go over here, we'll watch all 23 tweets pop into the sheet. There they are, and all the links are clickable, so let's click into this one real quick just to verify. There we go: looks like women's college basketball, nice bucket there.
Anyway, that's pretty much the first step: we scraped, we extracted, and we put it into a Google Sheet. From there I was thinking: okay, that's cool, but we only got 23 items. What if we want to put this on a scheduled trigger where every morning we scrape, say, AI news, and get something like 100 tweets into a Google Sheet? For that, we had to look at how the pagination works. As you remember, the API documentation says to use "cursor" for pagination: the cursor parameter is "the cursor to paginate through the results", and the first page is just an empty string. These three nodes (getting tweets, extracting info, and adding to the sheet) are basically what we just built out, but now we have to set up a cursor parameter in here, and it's not simple page numbering like page zero, page one, page two; we have to grab a value. What we're grabbing is from the output of the actual tweet-scraping call: there's a value called "next_cursor". Item one basically says "this was the first page; put this cursor into another request and you get page two." Then on page two we get a different cursor that says "put this one in and you'll get page three." As you can see, the cursors basically get longer each time, so I think it's just adding a chunk on each time, saying "now get the next page."
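If it helps to see the whole cursor dance in one place, here's a minimal standalone sketch of the same pagination logic (Node.js 18+). The endpoint path, parameter names, and the next_cursor field are assumptions based on what the video shows, not verified against the docs:

```javascript
// Standalone pagination sketch (Node.js 18+, built-in fetch).
// Endpoint path, query parameters, and response fields (tweets,
// next_cursor) are assumptions from the video; verify in the docs.
const BASE = 'https://api.twitterapi.io/twitter/tweet/advanced_search';

async function scrape(query, maxPages = 3) {
  const all = [];
  let cursor = ''; // first page: empty string

  for (let page = 1; page <= maxPages; page++) {
    const url = `${BASE}?query=${encodeURIComponent(query)}` +
                `&queryType=Top&cursor=${encodeURIComponent(cursor)}`;
    const res = await fetch(url, {
      headers: { 'X-API-Key': process.env.TWITTER_API_KEY },
    });
    const data = await res.json();

    all.push(...(data.tweets ?? []));
    cursor = data.next_cursor; // feed this into the next request
    if (!cursor) break;        // no more pages
  }
  return all;
}

scrape('openai').then(tweets => console.log(`${tweets.length} tweets`));
```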
So we had to bake some other logic in here, and I'll break down what's going on in order. The first thing we did was set a count: I came in here and hardcoded count to equal 1. This is important because we need a number to work off of, and this is where I said: if you're a coder or a programmer and this is not the way to do it, let me know, but this is how I got it to work. Anyway, we're setting the number 1, and then in the next node, the counter, we're feeding in the actual count from the previous node as well as the cursor, which will come from down here. Eventually the count and cursor will both come from down here, but on the very first run we just grab the count from the previous node, which is 1, and feed that into the rest of the process.
Then we move on to the API call, where we scrape Twitter. The first thing you'll notice is that it's pretty much the same as the step-by-step example: we have our endpoint, our method, our credential, our query (which was "OpenAI"), and our query type (which was Top, so we're searching for top tweets). Then we have our cursor, which we didn't have in the step-by-step example. If we go to run number one, there was no cursor being fed in: on the left you can see the input, where the counter was 1 and the cursor was null, so it was a regular request just looking for top results for OpenAI. If we go to run two, we can see on the left that we now have a cursor and the counter equals 2; this is run number two, we feed in the cursor, and we get different tweets over here. Finally, on run number three, the counter went up to 3 and the cursor is now much longer. We can feed the cursor straight back into the request because we can always say $json.cursor, which means we're always looking at whatever node ran immediately before.
This is a concept that's kind of hard to explain: we're basically looping everything back together, because otherwise we'd be referencing some sort of absolute node, and it would be hard to say "I want the most recent cursor, not the first one we got." That's why this counter node is so important: whatever it receives is always the most recent count and the most recent cursor. So, as you know, we're getting a big payload out of the API call, and then we have to extract the tweets using the exact same code from the step-by-step. We get three different runs: run one had 18 tweets, run two had 20, and run three had 20, for a total of 58. Then we add them to the Google Sheet the exact same way we did earlier, except we do it one batch at a time: 18 first, loop all the way back, then 20, loop all the way back, then 20 more.
What's going on over here when we check the count is a simple If node: we check whether the count equals 3, and if it does, we end the process. The way we're able to do this is, once again, by referencing that counter node, the one we keep feeding back with the most recent count and cursor. Looking here, you can see this node ran three times: on run one it was false because the counter was 1, on run two it was false because the counter was 2, and on run three it finally became true because the counter was 3. When it's true, we send it off to a No Operation node, which literally does nothing; when it's false, it loops all the way back down here.
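The condition itself is just a number comparison on the counter's output. As a hedged sketch (the node name "Counter" is an example, use whatever you named yours, and 3 is the hardcoded run limit):

```text
Value 1:   {{ $('Counter').item.json.count }}
Operator:  equals
Value 2:   3
```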
The first thing I do down here is limit it to one item only, because whatever leaves this branch is going to be 18 items, 20 items, however many tweets were pulled back, and I wanted to set it to one to keep things cleaner. Then I set the increase: basically, I grab the counter from earlier, which holds the most recent count, and set it dynamically here so the Code node can bump it up by one. The exact same thing happens with the cursor: we grab the cursor from the Get Tweets node in order to feed it back in later. Here you can see that on run one the counter came in as 1 and later gets bumped up to 2; on run two it came in as 2 and gets bumped up to 3; and you'll notice the cursor also changed each time.
From there, it's a Code node, and full disclaimer: I had Claude 3.7 write all of these Code nodes for me, so like I said, maybe not optimal, but it works. The counter comes in, we output it as "count", and it goes up by one: on run one it came in as 1 and went out as 2, and on run two it came in as 2 and went out as 3. From there, all we do is set it one more time. I know there are lots of Set nodes going on, but we set it again because we need to pull it in dynamically, so the counter node can always reference it as $json.count or $json.cursor. Remember, earlier we passed the count in as a hardcoded variable, so the counter needs to be able to say "I can look either here or here, based on whichever branch most recently ran."
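The Code node that does the bump can be tiny. A minimal sketch, assuming the field names from the Set nodes shown in the video:

```javascript
// n8n Code node: increment the count by one and pass the cursor
// through unchanged. The field names `count` and `cursor` are
// assumptions based on the Set nodes in the video.
const prev = $input.first().json;

return [
  {
    json: {
      count: Number(prev.count) + 1, // 1 -> 2, 2 -> 3, ...
      cursor: prev.cursor,           // keep the most recent next_cursor
    },
  },
];
```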
All right, so that's basically it. I'm definitely aware that this concept of dynamically setting all these things is a little confusing, but what I'd recommend is downloading this template, running it, and just exploring in there to see how it actually references things. The key thing to remember: when you reference a variable with $json (let's go back in here), it looks at whatever node came most immediately before. If instead you do something like right here, where we reference the counter node or the Get Tweets node by name, that's a lot different, because it's an absolute reference. Using $json gives us a lot more flexibility, because it references whatever came previously, so we know we're getting the most up-to-date information.
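In n8n expression terms, the difference looks like this (the node name here is an example, not necessarily what's in the template):

```text
Relative: follows whichever node ran immediately before this one
  {{ $json.cursor }}

Absolute: always reads from one specific named node
  {{ $('Get Tweets').item.json.next_cursor }}
```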
I hope you enjoyed this one and learned something new. As always, if you did, please give it a like; it definitely helps me out a ton, and I always appreciate you making it to the end of the videos. Definitely let me know in the comments what other use cases you want to see. As you know, there's a ton more we can do with this Twitter API, because now that we have the IDs of both the users and the actual tweets, we can look up so much other stuff, like generating lead lists. But that's going to be it. I really appreciate you once again, and I'll see you in the next video. Thanks!