
How to Actually Scrape Twitter/X Data with n8n

By Nate Herk | AI Automation

Summary

Key takeaways

  • **Scrape Unlimited Tweets with n8n**: This n8n workflow automates scraping unlimited tweets from X, enabling market analysis, competitor research, and staying updated on industry trends. The workflow and Google Sheet template are available for free. [00:01], [00:15]
  • **Cost-Effective Twitter API Access**: The twitterapi.io service offers a pay-as-you-go model, with 1,000 tweets costing around 15 cents. Using a specific referral link provides a $6 starting credit, significantly more than the standard $1. [02:41], [02:50]
  • **n8n HTTP Request Setup**: To set up an HTTP request in n8n, import the 'curl' command from API documentation. This automatically populates fields like method, URL, and authorization, simplifying the process, especially for complex POST requests. [03:44], [03:57]
  • **Handling Twitter API Pagination**: Pagination in the Twitter API requires using a 'cursor' parameter. The workflow dynamically fetches the 'next_cursor' from previous API responses to retrieve subsequent pages of tweets, looping until a set count is reached. [13:04], [13:34]
  • **Automated Tweet Data Storage**: After scraping and extracting tweet data (ID, URL, content, likes, views, date), the workflow appends this information to a Google Sheet using the 'Append Rows' node, mapping extracted fields to sheet columns. [01:11], [11:08]

Topics Covered

  • Unlock limitless market intelligence from X.
  • API calls are simpler than you think.
  • Transform raw tweets into actionable insights.
  • Mastering dynamic pagination for endless data.
  • How to build advanced looping workflows in n8n.

Full Transcript

today I'm going to be showing you guys

this n8n workflow that I built that helps me scrape an unlimited amount of tweets

from X there's a ton of good stuff that

you can find on X whether it's market

analysis competitor analysis or just

staying up to date with the latest news

and Trends in a specific industry so the

system is going to help you do all of

that and as always I'm giving away the

workflow as well as the Google sheet

template for free so all you need to do

is download those things plug them in

hit run and you'll be scraping Twitter

all you have to do to get those

resources is join my free school

Community the link for that's down in

the description but let's not waste any

time let's hop into a live demo and then

I'll show you guys how to set up this

API step by step as you can see here's

the workflow that we're going over today

quick disclaimer I don't know if this is

optimal the way we have these different

set and code nodes, so if any of you guys are programmers please let me know how I can make this more efficient, but anyways

it works right so we're scraping X up

here we're going to be checking the

count and right now I'm just having it

only go three runs through um if you

wanted to increase that you'd have to

change the number here as well as

increase the sort of count code right

here but if we haven't gone through or

three times it's going to come down here

we're just going to be setting the

variables increasing the count we have

to set the pagination and then we're

going to loop back through and we're

just going to keep scraping Twitter

until we've done that three times and

then that's the end of the process but

as you can see we're updating a Google

sheet right here which has these

specific columns like the Tweet ID the

URL the content likes retweets replies

quotes views and the date of the Tweet

so we're going to hop back into the

workflow I'm going to hit test workflow

and we'll see this happen live so right

now it's hitting Twitter or X now we're

extracting the info adding it to the

sheet and then you're going to see it

Loop all the way back down here we're

once again hitting the API again we're

doing the second round of scraping

adding that to the sheet as you can see

there's 38 total items so there's 38

tweets and this is going to be the last

run we have 58 tweets and then it's

going to go off this way because we are

done so that just finished up we can

click into our Google sheet and you can

see now that we have 58 tweets to look

through each of them of course have the

URL so if I clicked into this one we

could see we have a tweet right here

open AI your turn can you give this man

a luscious full head of hair and a nice

haircut looks great so as you can see if

I was to scroll down we would see that

we in fact have 58 tweets all of them

have an ID along with the links so if we

clicked into this one we can see this

was on March 11th and it has almost

31,000 views so if we click into it and

we wait for it to load it's on March

11th and it has almost 31,000 views so

we know that we're actually getting real

information into our data sheet and um

yeah so let's break down what's going on

okay so I told you guys that we were

going to walk through setting up that

API call step by step so we're going to

do that and then I'll walk through this

actual execution right here and we'll

take a look what was going on all right

so this is the API that we're going to

be using, it's called twitterapi.io, and

I'm sure you guys are wondering about

the price it's really not too bad as you

can see it's sort of a pay as you go

model and you can get a thousand tweets for

15 cents so it's really not too bad also

I have a link for this down in the

description and if you use that specific

link you'll get $6 to start with I think

if you sign up normally you only get one

so you'll get five free extra dollars to

play with anyways this is the API we're

going to be using to access the Twitter

data I'm going to click on docs which is

the API documentation for the different

endpoints that we can hit and basically

the different functionality of what we

can do using this API so let's take a

quick glance at the left we have user

endpoint actions which would mean we

were looking at a specific user wanting

to get their tweets their followers

their mentions we have tweet endpoints

which means that we can um grab an ID of

a tweet so over here you see for every

tweet we have an ID we could grab you

know tweets by ID we could grab their

replies their retweeters or what we were

doing in the demo was just doing an

advanced search where we were searching

for tweets based on a query okay so I

know that API documentation and setting

up these requests can be kind of

intimidating, so I'm going to try to break it

down as simple as possible okay the

first thing I want to point out is

whenever you're looking at API

documentation if you see curl commands

on the right which would look like this

you're going to want to copy that and go

into a new workflow type in HTTP request

and all you're going to want to do is

hit import curl paste that in there and

when you click import it's going to

populate the fields that you need so

it's going to be really handy this one's

not too bad because it's a simple get

request with pretty much just

authorization here but in the case of

you know sending over a post request and

you have to do a full JSON body and

setting up those parameters it's going

to be really helpful if you're able to

just import that curl and have everything set up.
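Just to visualize it, here's roughly what n8n fills in from a curl like the one in the twitterapi.io docs; the exact endpoint path and header name come from those docs, so treat this as an illustrative sketch rather than the literal imported values:

```javascript
// Rough sketch of the fields the "Import cURL" button populates for this request.
// The endpoint path is an assumption based on the twitterapi.io docs,
// and YOUR_API_KEY is a placeholder for your own key.
const httpRequestFields = {
  method: 'GET',
  url: 'https://api.twitterapi.io/twitter/tweet/advanced_search',
  headers: {
    'X-API-Key': 'YOUR_API_KEY',
  },
};
```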

Okay, so the first thing that we notice is our method and our URL, so if I

hop back into the documentation we can

see that right here we have a method

which is GET and then this is sort of the

endpoint so if we were to copy this

endpoint and come back into here and

paste it it would give us that full

endpoint so I just pasted exactly what

we copied um and basically what happens

is we have some sort of base URL, so we're accessing the twitterapi.io API, and then every single function has a

different endpoint so because right now

we're doing advanced search that's what

it looks like if we were doing um you

know get user info, the endpoint would be the Twitter user info one, so as you can see all of these

are going to have different endpoints

which basically just says hey we're

reaching out to this server and we want

to do something different so then what

comes next is going to be authorization

and that just means you know you made an

account you have an API key you're

paying for this search not someone else

so right here we can see authorization

we have sort of a key value pair the key

is going to be X-API-Key and then the

actual value is going to be your API key

and what's important to notice here is

that this is a header auth; sometimes they're query auths, sometimes they're headers, and in this case we have a header

and so what you need to do is go to your

dashboard in the top right you'll click

on your profile your dashboard and then

you'll have an API key right there to

copy copy that and then we'll bring it

into n8n so as you remember the key was X-API-Key and the value was your actual

API key so this is basically saying this

is a placeholder this is where you'll

put in your API key now what we can do

that's a really cool tip with n8n is

instead of filling it out here in the

header parameters we're going to do this

up here under the authentication tab

which basically just means we're able to

save this authentication and use it for

later and this is why you needed to

remember that this is going to be a

header auth so I'm going to click on this

button, I'm going to click on generic credential type, and then within the generic auth type we're going to be choosing header auth, because that's what we

saw in the documentation so header and

now all we have to do is as you can see

mine is already configured but I'm going

to pretend like I'm creating a new one

we have a key value pair like we talked

about, so in this case it was X-API-Key, in all caps, and then for the value

you're just going to paste in your API

key that you just grabbed from twitterapi.io and then you can just basically

save this so then you have it forever so

I'm just going to call this one Twitter

demo we're saving the credential it's

connected successfully and now as you

can see I have all these different apis

that I already have saved so when I want

to set up a request in the future I

don't have to go find it put it in here

as a header auth I just have it saved

already so I'm going to flick off send

headers because we're sending a header

right here and now let's go back to the

documentation and see what else we need

to configure okay so we're back to the

advanced search endpoint we can see that

we have two required fields that we need

to put in which is going to be a query

and a query type so the query is like

what we're actually searching Twitter

for so in that first example my query

was OpenAI and I'll show you guys that later, but that means it's going to be searching Twitter for OpenAI and then

we have a query type which basically

means you have two options you can

either say latest tweets or top tweets

so what I did in the demo was top tweets

as you can tell they were all very high performing with the views

and the likes but they're still going to

be pulled recently, so these were all recent tweets; you know, here's March 8th, so that was about a week ago

um but mainly they're still going to be

pretty recent as you can see okay so

anyways for query we have a string and

for query type we have a string, but

we only have two options so what I'm

going to do is I'm going to flick on

send query parameters and we know the

first one was called query and for this

example let's just do Manus because you

know that dropped and everyone's talking

about it and then we're going to add

another one which we know was queryType, I think with a capital T, let's just go make sure, queryType with a capital T

and it has to be either latest or top so

for this example let's do latest rather

than doing top okay so that's what we

have here um and then as you can tell in

the demo we have one more option which

is cursor

um and we're not going to set that up

right now but this is basically how you

paginate through requests up here it

says that each page returns "exactly about", that's kind of weird wording, but each page returns about 20

tweets and if you want more like in the

demo we got 58 because we went through

three times so we're going to leave that

blank for now and we should be set up.
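Before testing, here's a minimal sketch of the request we've just configured, written as a plain JavaScript fetch call. The endpoint path is assumed from the twitterapi.io docs, the parameter names (query, queryType, cursor) are the ones we just set, and YOUR_API_KEY is a placeholder:

```javascript
// Minimal sketch of the advanced search call configured above (run inside an async context).
const params = new URLSearchParams({
  query: 'Manus',      // what we're searching Twitter/X for
  queryType: 'Latest', // either Latest or Top
  // cursor: '',       // left blank for now; used later for pagination
});

const response = await fetch(
  `https://api.twitterapi.io/twitter/tweet/advanced_search?${params}`,
  { method: 'GET', headers: { 'X-API-Key': 'YOUR_API_KEY' } }
);
const data = await response.json(); // one item with roughly 20 tweets plus a next_cursor
```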

I'm going to hit test step we're going

to see it's going to be searching now

through Twitter and we're going to get

one item and if I just move this over a

little bit, we can see we have one item, and it looks like this only got 16 total tweets, because this is number 15 and computers start counting from zero, but

anyways this one got us 16 tweets so I'm

just going to pin this data so we don't

have to rerun the API we have this to

play with and let's just take a look at

one of the tweets so here we have the ID

of the tweet and the URL let's search

Google for this and we'll see that it

should be a recent tweet about Manus, um

let's translate this I can't fetch the

translation so let's try another it's

from a username Manus Eric so maybe

that's what happened okay maybe let's

try something else I'm going to type

in college basketball and we'll try this

and it's going to ask me if I want to

unpin the data yes I do so we can do

another run and then we'll see if we

just want to validate some tweets okay

let's go over here and pin this

and we will copy this link right here

and go to x and see what we got so

college basketball betting this one came

out at 5:41 which is right now the current

time so that's the latest tweet okay so

we have a ton of data coming back right

and it's all in one item so what we want to

do is clean this up and extract the

fields that we're looking for so I'm

going to paste in this code node right

here which you can get by joining my

free school Community the link for

that's in the description you'll click

on YouTube resources click on the post

associated with this video and I'll have

a text file right here for the actual

code that's within the code node or of

course you could download the workflow

um, where you download the JSON, come into n8n, hit import from file up here

and then you'll have the whole workflow

with all the code nodes and everything

so of course this is the workflow that

you'll actually be downloading and if

you want to really understand what's

going on with this workflow and the the

looping and the setting Fields then I

would definitely recommend you join my

paid Community the link for that is also

down in the description it's really just

a more Hands-On approach to learning

n8n and having deep discussions about

what's going on we have a full classroom

about building agents Vector databases

apis and HTTP requests as well as

step-by-step builds this is definitely

not a place for experts only my whole

goal of the channel is to make things as

simple as possible so um if this sounds

like something that you're interested in

then definitely hop in here okay anyway

so I'm just going to plug in the code

node and then it's already configured

basically what we're saying is out of

this item, which could have 15 tweets,

it could have 20 tweets every time what

we want to do is just basically pull all

the objects out and get what we want so

actually in this case we have 23 tweets

um so this one's different than that

first one right and as you can see for

each one we've now extracted a tweet ID

a URL the actual content the like count

the view count and as you can see all of

these were just recently posted so

they're very low on views except for

this one actually kind of went crazy

this one was from March 6th so not sure

what happened there, that was almost two weeks ago now, but anyways this is our Twitter data.
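The exact code lives in the template, but a rough sketch of what a code node like this does, assuming the API response item carries a tweets array with fields along these lines, looks something like:

```javascript
// n8n Code node sketch (not the exact code from the template).
// Field names on each tweet are assumptions about the API response shape.
const tweets = $input.first().json.tweets || [];

return tweets.map((t) => ({
  json: {
    tweetId: t.id,
    url: t.url,
    content: t.text,
    likes: t.likeCount,
    retweets: t.retweetCount,
    replies: t.replyCount,
    quotes: t.quoteCount,
    views: t.viewCount,
    // format the date so it reads a little more human friendly
    date: new Date(t.createdAt).toLocaleDateString('en-US', {
      year: 'numeric', month: 'long', day: 'numeric',
    }),
  },
}));
```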

So what's next from

here is putting that into a Google sheet

so I'm going to grab a Google sheet node

all the way down here um we're going to

do append row in sheet, or it's going to be append rows, but we will choose our

actual sheet which is going to be

Twitter data we'll choose the sorry the

document now we're choosing the sheet

which is sheet one and now all we have to do

is map The Columns so because we were

able to extract all of these columns

that we want it's going to be super easy

it's just as simple as dragging in the

values that we need to send over to the

columns in our Google sheet so what I'm

going to do real quick is I'm going to

delete all 58 items over here so we can

just start from scratch and now we can

see we have to basically tell n8n what

values am I putting in each of these

cells right here so back in the n8n

workflow I'm going to grab tweet ID from

the left from the code node and just

drag it into the Tweet ID column I'm

going to grab URL drag it into the URL

column and just going to do all that all

the way down we made it in this order so

it's just really intuitive to drag them in, exactly like I said, in this order.
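Conceptually, the mapping you end up with is just each sheet column pointing at the matching field coming out of the code node, something like this sketch (the column names are the ones in the sheet; the field names follow the extraction sketch above, not necessarily the exact template code):

```javascript
// Sketch of the Append Row mapping: each sheet column points at the matching
// field coming out of the extraction code node.
const buildRow = (tweet) => ({
  'Tweet ID': tweet.tweetId,
  'URL':      tweet.url,
  'Content':  tweet.content,
  'Likes':    tweet.likes,
  'Retweets': tweet.retweets,
  'Replies':  tweet.replies,
  'Quotes':   tweet.quotes,
  'Views':    tweet.views,
  'Date':     tweet.date,
});
```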

so you can also get the template for

this Google sheet in the free school

Community um just so you can basically

plug this thing in right away and get

going but we have um pretty much

everything one thing also I did in the

code node was we formatted the date to

look a little more human readable but

now we have that done and it's going to

be doing that for all 23 items coming

through and if I hit play and then we'll

go over to here we'll basically just

watch all 23 tweets pop into this

workflow or sorry the sheet so as you

can see there they are we have all of

the links are clickable so let's click

into this one real quick just to verify

there we go we have some um looks like

women's college basketball nice bucket

there

anyways that is pretty much the first

step: we scraped, we extracted, and we

put it into a Google sheet so from there

I was thinking okay that's cool but we

only got 23 items what if we you know

want to put this on a scheduled trigger

where every morning we are scraping you

know AI news and we want every morning

to just get like 100 tweets put into a

Google sheet what we had to do was look

at how the pagination works so as

you remember in the API documentation it

says use cursor for pagination if we see

the cursor parameter it says the cursor

to paginate through the results first

page is just basically an empty string

this is basically what we just built out

these three nodes where we're getting

tweets extracting info and adding it to

the sheet but now what we have to do is

set up a parameter in here that is the

cursor, and it's not just simple pagination where it's like page zero, page

one page two we have to grab a value and

so what we're grabbing here is the

output of the actual tweet extraction or

tweet scraping there's a value called

next_cursor, so item one basically says

this was the first page and if you put

this cursor in another request you get

page two and then on page two we get a

different cursor which basically says

okay now you can put this cursor in and

you'll get page three and as you can see

they basically get longer each time, so I think it's just adding a chunk on each time and saying hey, let's get the next page now.
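If it helps, here's the same paging idea written out as one plain JavaScript loop; this is a conceptual sketch, not the workflow itself (the workflow spreads this across the counter, set, and code nodes described next), and the endpoint path, field names, and YOUR_API_KEY placeholder are assumptions:

```javascript
// Conceptual sketch of cursor pagination: each response hands back a next_cursor
// that gets fed into the following request to pull the next page of tweets.
async function scrapePages(query, maxRuns) {
  let cursor = '';        // first page is basically just an empty string
  const allTweets = [];

  for (let count = 1; count <= maxRuns; count++) {
    const params = new URLSearchParams({ query, queryType: 'Top', cursor });
    const res = await fetch(
      `https://api.twitterapi.io/twitter/tweet/advanced_search?${params}`,
      { headers: { 'X-API-Key': 'YOUR_API_KEY' } } // placeholder key
    );
    const page = await res.json();

    allTweets.push(...(page.tweets || []));
    cursor = page.next_cursor; // grab the cursor for the next page
    if (!cursor) break;        // no more pages to fetch
  }
  return allTweets;
}
```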

So we had to bake in some other logic here, so I'm just going to

break down what's going on in order okay

so the first thing that we did was we

set a count and basically I came in here

and I hardcoded count to equal number

one and this is important because we

need this number to work off of and this

is where I said you know if you're a

coder or a programmer and this is not

the way to do it let me know but this is

how I got it to work so anyways we're

setting number one and then in the next

node which is the counter we're feeding

in the actual count from the previous

node as well as the cursor which will

come from down here So eventually the

count and cursor will both come from

down here, but to start with, on the first run we're just grabbing count from

the previous node which would be one and

then we're going to feed that into the

rest of the process okay then we move on

to the API call where we're going to be

scraping Twitter first thing that you

notice is it's pretty much the same

right here as the step-by-step example

we have our endpoint we have our method

we have our credential we have our query

which was OpenAI and we have our query type which was top, so

searching for top tweets and then we

have our cursor which we didn't have in

the step-by-step example what I'm going

to show you guys is if we go to run

number one there would have been no

cursor being fed in so on the left you

can see what was fed in which was

counter was one at this point and cursor

was null so basically it was saying okay

regular request I'm just hitting um

looking for top results for OpenAI then

if we go to run two we can see on the

left what happened is we now have a

cursor and we now have the counter

equals two so this is run number two

we're feeding in the cursor and we're

getting different tweets over here and

finally run number three on the left we

can see the counter went up to three the

cursor is now much longer and we feed

that back into the request right away

because we're able to always say $json.cursor, which means we're always looking

here and so this is a concept that's

kind of hard to explain we're basically

looping everything back together because

otherwise we'd be referencing some sort

of absolute node reference, and it'd be hard to

say we want the most recent cursor not

the first time we got one so that's why

we have to have this counter node which

is really important that says okay

whatever I'm getting is going to be the

most recent count and also the most

recent cursor so as you know we were

getting a big output coming out of the API

call and then we have to extract them so

exact same code that we used in the step

by step we are getting three different

runs: run one had 18, run two had 20, and run three had 20, which is a total of 58

and then we're just adding them to

the Google sheet exact same way we did

that earlier except for we're doing it

one at a time so 18 first all the way

back 20 all the way back then 20 more

what's going on over here when we're

checking for the count is we basically

just have a simple if and we're trying

to check if the count is equal to three

then we're going to basically end the

process and the way we're able to do

this is once again we're referencing

that counter node which is this one that

we're feeding back for the most recent

count and cursor so we're able to look

here as you can see this one ran three

times: run one it was false

because the counter was one run two it

was false because the counter was two

and then on run three it finally became

true because the counter on this run was

three so we're just sending it off to a

no operation node which literally just

does nothing, and if it's false it's going to loop all the way back down here.
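In other words, the if node is effectively evaluating something like this (an illustrative expression; 'Counter' stands in for whatever the counter node is actually named in your workflow):

```javascript
// What the if node is effectively checking on each pass through the loop.
const count = $('Counter').item.json.count; // most recent count fed back around
const done  = count === 3;                  // true on the third run -> No Operation node
                                            // false -> loop back down to set the next count and cursor
```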

so the first thing I'm doing here is

just setting it to one item only because

whatever leaves this branch is going to

be either 18 items 20 items however many

tweets were pulled back so I just wanted

to set it to one to keep things cleaner

then what I did was I set the increase

so basically I grabbed um the counter

from earlier, which would

be the most recent count so we're

setting it back to two but we're setting

it dynamically here so that the code

node can bump it up by one and then

exact same thing with cursor we're

grabbing cursor from the get tweets node in order to feed it back in later so

here you can see run one the counter was

one um and then later gets bumped up to

two on run two what was coming in was

two and then it gets bumped up to three

and you notice each time the cursor also

increased. From there it's a code node

that obviously disclaimer all of these

code nodes I had Claude 3.7 write for me

so like I said maybe not optimal but

it's working we have the counter coming

in at two and then we're outputting it

called count and it's going up one as

you can see, so on run one it was coming in as one and came out as two; on run two it was coming in as two and came out as three.
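A minimal version of that increment code node might look like this; it's a sketch under the assumption that the incoming item carries a counter and a cursor field (the real one in the template was generated by Claude, as mentioned):

```javascript
// n8n Code node sketch: bump the incoming counter by one and pass the cursor through.
const { counter, cursor } = $input.first().json;

return [{
  json: {
    count: counter + 1, // run one comes in as 1 and goes out as 2, and so on
    cursor,             // the most recent next_cursor, carried along for the next request
  },
}];
```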

From there, all we're doing is we're setting

it one more time so I know there's lots

of sets going on but we're setting it

because we need to be able to pull it in

dynamically and always have this node be

able to reference it as $json.count or $json.cursor, because remember earlier we

passed it in as a hard-coded variable so

it needs to be able to say okay I can

either look here or here based on

whichever one has most recently been

activated all right so that's basically

it I'm definitely aware that this

concept of dynamically setting all these

things is a little bit confusing but

what I would definitely recommend is, you know, download this template, run it and

just look in there and explore and see

how it's actually being able to

reference things the key thing to

remember here is that when you're

referencing a variable, like, let's

just go back into here when you're

referencing something and you use $json, it's looking for whatever is

the most immediate node coming

beforehand and otherwise if you're doing

something like right here where in this

case we're referencing the counter node

or we're referencing the get tweets node

that's a lot different because it's like

an absolute reference, so when we use $json it's just going to give

us a lot more flexibility by being able

to reference whatever came previously

before, so we know we're getting the most up-to-date information.
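To make that concrete, the two styles of reference look roughly like this; these are illustrative expressions, and 'Counter' and 'Get Tweets' stand in for whatever your nodes are actually named:

```javascript
// Relative reference: $json reads from whichever node most recently fed into this one,
// so inside the loop it always sees the freshest count and cursor.
const count  = $json.count;
const cursor = $json.cursor;

// Absolute reference: pinned to a specific node by name. Fine for one-off lookups,
// but in a loop it's harder to be sure you're reading the most recent run's value.
const countFromNode  = $('Counter').item.json.count;
const cursorFromNode = $('Get Tweets').item.json.next_cursor;
```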

So I hope you guys enjoyed this one, I hope you guys learned

something new as always if you did

please give it a like it definitely

helps me out a ton and I always

appreciate you guys making it to the end

of the videos definitely let me know in

the comments what else you guys want to

see some other use cases as you know

there's a ton more we can do with this

Twitter API, because now that we have the IDs of both the users and the

actual tweet we can look up so much

other stuff, so generating lead lists,

stuff like that but yeah that's going to

be it really appreciate you guys once

again and I'll see you in the next video

thanks
