
Scrape Unlimited Leads WITHOUT Paying for APIs (99% FREE)

By Nick Saraev

Summary

## Key takeaways

- **Scrape Google Maps for Free with n8n**: You can scrape emails from Google Maps listings using n8n without paying for third-party APIs by constructing a specific Google Maps search URL and processing the results. [00:01], [00:37]
- **Leverage Code Snippets for Complex Tasks**: For tasks like extracting URLs or emails from scraped HTML, simple code snippets, even those generated by AI, can be integrated into n8n to efficiently process data. [03:21], [03:33]
- **Filter and Deduplicate Scraped Data**: After scraping, it's crucial to filter out irrelevant domains (like Google or gstatic) and remove duplicate entries to obtain a clean list of target websites. [06:59], [08:45]
- **Implement Scraping Hygiene with Loops and Waits**: To avoid IP blocks when scraping multiple websites, use loop nodes to process requests in batches and include wait nodes to introduce delays between requests. [09:53], [10:00]
- **Extract Emails with Regular Expressions**: After scraping individual website HTML, a code block utilizing a regular expression specifically designed to find email addresses can be used to extract the desired contact information. [13:04], [13:44]
- **Handle Rate Limits with Proxies**: If encountering rate limits from services like Google Maps during large-scale scraping, using a proxy service can help sanitize requests and make them appear legitimate. [19:14], [19:35]

Topics Covered

  • Scrape Google Maps Emails: No APIs, No Cost.
  • Unlocking Data: N8N + AI-Powered Custom Code.
  • Avoid IP Blocks: Smart Scraping Hygiene for Scale.
  • Scale Your Scraper: Advanced Techniques for Google Maps.

Full Transcript

Hey, today I'm building a simple system in n8n that lets you scrape emails from Google Maps completely free, without needing any third-party APIs. That's right, we're going to do it entirely in n8n. What I'll do first is demo the flow before showing you guys how to build it on your own from scratch. You can find the templates in the description as per usual.

So this is what the flow looks like. It works off a Google Sheet, and the Google Sheet is pretty simple: we have two sheets, one called "searches" and another called "emails", and both have one column. The searches sheet on the left-hand side has a row called "Calgary+dentist". What this is is the search term we'll drop into a Google Maps search URL. If you pump "Calgary dentist" into that URL, you get a giant list of Google Maps listings for Calgary dentists. What we're going to do is pump that into our flow as input, and at the end of it we're going to have a giant list of emails that dump right over here. Don't believe me? Well, let's get started.

When I click "Test workflow", we grab that search query, which says "Calgary dentist", and then we scrape all of the Google Maps listings for Calgary dentists. The request looks something like maps.google.com/ plus, you know, "Calgary dentist", let's say. After that we do a ton of processing: some URL extraction, then some filtering, some duplicate removal, and some limiting in my case, just for test purposes. Then we have a simple loop that lets us extract emails using code before finally dumping all the results into our Google Sheet. Looks like we got eight items. So if I go back over to the Google Sheet, you can see we've now deposited all the email addresses: info at Mloud, admin at Bington, info at Galaxy, Setin at Galaxy, and so on and so forth. And this is just a small little search. You can actually run this across tens of thousands of different Google Maps listings; all you have to do is change the limit and then pump in a bunch more search terms on the left.

Okay, so how do you actually build this from scratch? Well, let me walk you guys through what this looks like, not from the outside in, but from the inside out. I like doing this because if I didn't, I'd just be showing you a finished product. That's kind of like showing an engineer a picture of the Eiffel Tower and saying, "Hey, there it is. Why don't you go ahead and build it?" Right? It's not very realistic. So why don't I actually walk through how to build this thing from scratch? As I mentioned, you can grab the template below in the description.

We'll call this workflow "Google Maps Scraper no API". I'm just going to add a tag for the n8n course here to keep things very simple.

Okay, the first step is to add a manual trigger. The reason is that I'm not going to connect this to my Google Sheet, at least for the purposes of this demo; I'm keeping things super simple and easy, and we can talk about adding a sheet input later.

The next thing we need is an HTTP Request node. This is where we put in the URL of our Google Maps search. Google Maps is scraped using a very specific URL: it's www.google.com/maps/search/ followed by the search query. You can't have spaces in the query, which is why we needed that plus sign beforehand. There are two additional options I'm also going to set: the first is "Ignore SSL Issues", and the second is under Response, where we'll include the response headers and status too.
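To make the URL format concrete, here's a minimal sketch in plain JavaScript (not the exact node configuration from the video, where the URL is just typed into the HTTP Request node) of how a search term like "Calgary dentist" could be turned into that Google Maps search URL:

```javascript
// Hypothetical helper: build the Google Maps search URL for a query.
// This only illustrates the format: spaces become "+" because the
// query part of the URL can't contain spaces.
function buildMapsSearchUrl(query) {
  const plusSeparated = query.trim().split(/\s+/).join('+'); // "Calgary dentist" -> "Calgary+dentist"
  return `https://www.google.com/maps/search/${plusSeparated}`;
}

console.log(buildMapsSearchUrl('Calgary dentist'));
// https://www.google.com/maps/search/Calgary+dentist
```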

Now, when I click "Test step", it actually performs that HTTP request against the Google Maps back end, and we receive a giant blob of HTML. Hidden within this HTML is a bunch of links that we can take, make more HTTP requests to, and then extract email addresses from directly. The question is, how do we actually get those links?

Well, what I'm going to do here is share a little snippet of code I've used for this purpose. It's actually very straightforward. I should say that you don't need to use code for this, but it was simple enough that I just asked ChatGPT to whip me up a little snippet in ten seconds, and it did a pretty good job. So I'm going to add a Code node, which lets us run some custom JavaScript or Python, clear out the boilerplate it comes with, and walk you through what the code looks like.

Okay, so what we're going to do is grab that scraped data from the left-hand side and package it all inside an input variable. In JavaScript, we do that by writing const. Now, the purpose of what I'm about to show you is not that I expect you guys to learn JavaScript, or figure it all out just by watching me write this. It's just to show you how easy and simple it is to grab data that lives in a no-code format and use a couple of lines of code to quickly convert it, without requiring a ton of execution time. The cool thing is you can now ask AI to do large portions of this for you. I just know that this particular snippet works, which is why I'm reusing it.

Essentially, we're going to store all of that data in an input variable. I'll type a dollar sign, which lets me select the specific item I want to pull, so we'll do $input. Now, n8n has a convention where it returns all data from previous nodes as an array of items, so we have to select the first element of that array. Even if we're only getting one item, which in our case we are, it's technically an array of items and we still have to select the first one. Kind of annoying, I know. Talk about annoying: there's another convention here which I don't really talk about, which is the json property. To access the data, we first have to go through .json, and only then, at the very end, can we select the field we want. Okay, so for all intents and purposes, the scraped HTML is now sitting inside our input variable.

All right, moving on. Next we have to build the pattern we're going to use to extract all of the URLs. What I did is ask ChatGPT a moment ago to build me a regex, a regular expression, which is a pattern-matching language used specifically for this kind of job. I know to reach for this because I use regex all the time to extract and parse things, but if you just had a brief five-minute conversation with ChatGPT and asked it how you would do this, it would probably suggest a regex as well. So don't think this is some super convoluted, scary programming stuff. I'm just going to copy the pattern, write const regex, and paste it in along with a couple of additional characters: a slash at the beginning and a slash g at the end. The g just stands for global, and again, it's one of those little formatting things.

From here, we have the input data, all of the scraped HTML, and we have the pattern we're going to apply to it. Now we actually have to do the applying. The way I do this is I write const again (const is just a convention in JavaScript) and name the variable websites, or URLs; "websites" is probably a little easier for people to understand. Then we take input and call match on it, matching it against the regex. The result is a giant list of websites. What I want to do next is return those websites in the format n8n expects. So I return websites.map, and for each website I return an object. Map uses that specific syntax with an equals sign followed by a greater-than symbol, which is basically an arrow. Then we return things nested one layer down: I'll go json and return my website under it. Okay, so if I didn't screw something up, when I click "Test step" we should get a giant list of websites under this website field, which we do.
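Putting those pieces together, the URL-extraction Code node looks roughly like the sketch below. The exact regex isn't shown in the video, so the pattern here is a generic URL matcher, and the name of the field holding the HTML ("data" vs. "body") depends on your HTTP Request response settings; the structure is what matters:

```javascript
// Sketch of the URL-extraction Code node (JavaScript).
const item = $input.first().json;          // n8n returns an array of items; take the first
const html = item.data ?? item.body ?? ''; // the raw Google Maps HTML (field name is an assumption)

// Generic URL pattern (assumption - the video's exact regex isn't shown).
const regex = /https?:\/\/[^\s"'<>\\]+/g;  // "g" = global: find every match, not just the first

const websites = html.match(regex) || [];  // match() returns null when nothing matches

// Return one n8n item per URL, nested under "json" as n8n expects.
return websites.map(website => ({ json: { website } }));
```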

Now, what you'll see is that we got a giant list of websites, but a lot of these aren't really websites related to our search: we have schema, Google, Google, whatever, gg. It isn't until I scroll way farther down that we actually start getting the dental care websites I was looking for. That's a problem. Obviously we don't care about Google and gstatic and that kind of stuff, so we have to remove them, and that's what the next step is: filtering out all of these noisy domains that don't add anything, leaving us a nice, tight list of only the dental websites. Because remember, what we're doing is basically going to Google Maps, pumping in "Calgary dentist", and scraping the entire page, so we need to take that data and strip out all of the extra links it gives us. But never fear, that's very easy to do in n8n.

What I'm going to do is press P, which pins my output data. Then over here I'm going to add a Filter node. Filter lets us keep only the items that match a condition and drop the rest. So I'm just going to drag the website field in. Now, if you think about it, what I want is to remove all those bogus ones: schema, Google, a bunch of stuff. To do that in Filter, just set the condition type to string and choose "does not contain". So first of all, I don't want anything that contains "schema". Next up, I don't want anything that contains the term "google". I saw a couple of other terms in there that I'm going to pump in really quickly, and we'll go back and forth until we've caught all of them. I think one was "gg", right? I don't want it to contain that either. Let's test this and see how it works.

First of all, we fed in 302 items, and as you can see it's only returning 133, so we're getting pretty far already. And it looks like we're actually getting dental domains on the first page now, which is nice. We still have gstatic, though, so we've got to get rid of those. What else? gstatic, search.opencare; Opencare might actually be a legitimate one, I'm not entirely sure. What else have we got? Mostly gstatic, but there are also a couple of odd ones like "CAN 5 recall max" and "chat now", whatever those are. Okay, so gstatic is really the main one. So I'll add another condition: I also don't want anything that contains the term "gstatic". Okay, let's try this.

Now, as you can see, we're just progressively filtering, but this looks pretty good. What you'll notice, though, is that we're getting a ton of duplicates, aren't we? Richmond dental, Richmond dental, concept dentist, concept dentist, pathways, heritage, heritage, heritage. That's not good; we need to remove these, so that's what I'm going to do next. First of all, I'm going to press P again to pin the data, and then I'm going to add a Remove Duplicates node. How do you find it? Search "dedupe", or actually it's listed as the Remove Duplicates node, and I'm going to choose "remove items repeated within the current input". This is the easiest and simplest way to immediately remove duplicates coming out of the preceding node. If I press "Test step", you'll see that we fed in 60 and now we only have 27 left. Okay, very easy. Awesome.
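If you prefer to see the same logic as code, here's a rough sketch of what the domain filtering plus deduplication amount to in a single Code node. It's only an illustration; the video uses the Filter and Remove Duplicates nodes rather than code, and the blocklist terms are just the ones mentioned above:

```javascript
// Sketch only: the workflow in the video does this with Filter + Remove Duplicates nodes.
const blocklist = ['schema', 'google', 'gg', 'gstatic'];

const items = $input.all(); // every item from the previous node, each with json.website

const kept = items
  .map(item => item.json.website)
  .filter(website => !blocklist.some(term => website.toLowerCase().includes(term)));

// Deduplicate while preserving order (equivalent to
// "remove items repeated within the current input").
const unique = [...new Set(kept)];

return unique.map(website => ({ json: { website } }));
```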

So, now that I've removed the duplicates, think about what we've done up until now. We scraped a bunch of data, then extracted the URLs from it with code, then filtered those URLs, and now we've removed all the duplicates among them. The next thing to do, obviously, is to start scraping the individual pages to look for emails, right? So I could theoretically just add another HTTP Request node here and feed in the URL of the website I want, and that would immediately process all 27 items in the list. But I'll be honest, I've tried this before, and if you have n8n blast through all 27 websites at once, your IP address usually gets blocked. So we have to practice some basic scraping hygiene here and be a little bit smart about how we perform all of these scrapes. The way I like to do it is with what's called a Loop Over Items, or Split in Batches, node.

Just type "loop". The loop node looks pretty intimidating if you've never used it before, because there are all these arrows going in and out of different modules, and there's this "Replace Me" node, which means nothing to most people. But let me run you through it really simply. What it does is take the output of the previous node as input and then run the loop body for all 27 items. It does everything we tell it to, over and over and over again, 27 times, and on the 28th pass it sees there's nothing left, says "okay, I guess we're done", and proceeds down the done path. That's all that's really going on here. You just have to feed the output of the loop body back into the loop's input in order for this to make sense.
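Conceptually, the Loop Over Items node plus a Wait node behaves something like this plain JavaScript sketch. It's only an illustration of the batching-and-delay idea, not how n8n implements the node internally; fetchPage here is a stand-in for the HTTP Request node inside the loop:

```javascript
// Illustration only: what "loop over items + wait 1 second" amounts to.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

async function scrapeInBatches(websites, fetchPage, delayMs = 1000) {
  const results = [];
  for (const website of websites) {   // one item per pass, like the loop node
    results.push(await fetchPage(website));
    await sleep(delayMs);             // the Wait node: pause between requests
  }
  return results;                     // the "done" branch fires once the list is exhausted
}
```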

Okay, so what I'm going to do now is add the HTTP Request node right here, inside the loop. Just before I map all of this, I'm also going to add a little Wait node set to one second, because I've had issues in the past where I didn't put any waits between my HTTP requests. I can demo it fine without them, but I don't want this to merely be demoable; I actually want you guys to be able to use it. In practice, without any waits you can get IP blocked pretty easily when you're doing any sort of scraping, so I usually recommend, at least for testing purposes, putting some waits in. Then the output of this Wait node goes back in as the input to the Loop Over Items node. And now we've essentially closed the loop, and we have this "done" branch that we can fire off afterwards.

All right, just to keep things minimally ugly, I'm going to move this stuff down here. Now what I need to do is map the input into the loop. This is kind of annoying to do: I wanted to use "Test step" to look at the data, but we can't because of how I've connected things, so why don't we just delete the connection and retest. Okay, so now we have the loop branch, which contains the website, so we can feed that into the HTTP Request's URL field and then reconnect the loop branch. Now that we have access to that data, we can just drag the website field over. A couple of additional things I'm going to set under the Redirects tab: I'm choosing not to follow redirects. Some websites will redirect you multiple times, and when you hit a redirect loop it just makes the node error out, which is kind of annoying. Anyway, then I've got that Wait, and then it's going to run for all 27 items. Okay, pretty straightforward.

Now, I'll be real: I don't want to take 30 seconds to run this every time for the demo. So what I'm going to do is cut the input way down. See how it says 27 items? There's a quick little hack that lets us do this during testing in n8n: just add a Limit node and set the limit to something really small, like three. Now it takes the 27 items as input and only passes three of them through, which means that when we run this through our loop it takes about 3 seconds, not 27. This just helps me make the video a little faster and retain my sanity, while also minimizing the likelihood that we get IP blocked from running a lot of requests. Work smart, not hard.

Okay, now I'm going to execute this workflow. As you can see: first item done, second item done, third item done. And once we're finished, it actually returns three items. How cool is that? So what is it returning right now? Well, it's returning all of the HTML from the websites we just scraped, so a bunch more code. But that isn't really what we're looking for, is it? No. What we're looking for is the email addresses. So how do we find the email addresses in here? That's where another code block comes in handy. In this code block, instead of finding URLs, we're going to parse out emails. I'm just going to stick it in over here; the output of this code block loops back to be the loop's input, and once the loop is done we can get into some final data processing, and then we should be good to go.

Okay, so what are we going to do with this code block? Well, I just pasted in a bunch of the stuff from the last one, because if you think about it, we're doing the exact same thing we did for the URLs, just for emails instead. We're going to run a bit of code that takes the data we're feeding in from that Wait node, which we already have inside input, and looks specifically for emails. So I need another regex; instead of the URL pattern, I'm going to ask ChatGPT to find me all emails. Let me go back and say: "Okay, great. Now build me a simple regex that finds all emails in a website scrape instead." It writes me something very similar. I don't actually know if this one is entirely good to go; I'm going to try it and see what happens, but I usually just run it and play it by ear. Add the slash g again for global. Then, instead of const websites, I'll call it emails; instead of returning websites.map, I'll return emails.map; instead of website, I'll use email; and under json I'll return email. I don't know if this is going to entirely work, but we'll give it a go.
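For reference, here's a sketch of an email-extraction Code node. The exact code isn't shown on screen, so the regex below is a common, simplified email pattern and the field names are assumptions; I've also kept the matches as a single emails array per website (which can be null when nothing matches), since that's the shape the later Filter and Split Out steps work with:

```javascript
// Sketch only: regex and field names are assumptions, not the video's exact code.
const item = $input.first().json;
const html = item.data ?? item.body ?? '';   // raw HTML of the scraped page

// Simplified email pattern (good enough for scraping, not a strict validator).
const regex = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// match() returns null when the page contains no email addresses,
// which is why a later Filter step checks that "emails" is an array.
const emails = html.match(regex);

// One item per website, with all of its emails kept together in an array.
return [{ json: { emails } }];
```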

Okay, we couldn't find any emails in the first three, so I pumped the limit up to 10, and it looks like we're now getting a couple of email addresses, which is pretty cool. That's to be expected; we're not going to get emails off of every site. In the demo at the start, I think we pumped in around 300 and got back around 100. So that's that.

Now that we have a bunch of email addresses, we're going to proceed down the done branch. What do we have to do on this done branch? Well, if you think about it, we're outputting a bunch of emails, and everything is nested within an emails array, so we end up with a bunch of separate email arrays. To make a long story short, we have to split these out so that each email is its own object instead of living inside independent arrays, and then we take that data and add it to our Google Sheet. So what does this look like in practice? We have to push the data all the way out of the loop before I can access it, so I'm just going to add a one-second Wait and push it all the way through.

So we're scraping, scraping, scraping. Okay, and once we finish, we now have access to those 10 items. Let's take a look at what this data looks like. For some of these, we won't have an email, because the result will have come back null. By the way, if you ever find yourself getting an error on an HTTP request, you can go into its settings and set "On Error" to "Continue". In reality, n8n can't scrape every web page, so we're just throwing away the ones it can't scrape, for simplicity. But for the ones it can, we have email addresses, as you can see: Mloud, Trail Dental, Galaxy Dental, Scenic Smile, Satin, right. Now that we have all of these, what we need to do is aggregate all of the individual emails and remove all of the null entries.

So I'm going to go down here and add a Filter first; we'll use it to remove all the null entries. Let me pin this so we don't have to run it again. I'm going to feed in the emails field, and since emails is an array (let me check the schema or JSON view... yeah, emails is an array), I'll go down to the array operators and just check that emails exists. This should filter out all of the null entries. Cool. Now we have three items out of the ten we fed in.

Now that we have three items, as you can see, the scrape aggregates multiple matches into a single array per website: it scraped three or four instances of info at Galaxy, two of Setin at Galaxy, two of another info address. What we want to do is stick all of these into one giant list and then run through it and deduplicate. So next I'm going to add, how would I do this, a Split Out node, I think. Yeah, pretty sure. I'll point it at the emails field, and this should basically flatten all of those arrays into one list. Cool. Once I have that, I'm going to dedupe it with another Remove Duplicates node, and that filters the many down to four. Beautiful.
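If it helps to see the shape of that transformation, here's a rough Code-node equivalent of the Filter, Split Out, and Remove Duplicates chain. It's a sketch only; the video uses the built-in nodes rather than code:

```javascript
// Sketch only. Input items look like { json: { emails: ["a@x.com", "a@x.com", ...] } }
// or { json: { emails: null } } for pages where nothing matched.
const items = $input.all();

const allEmails = items
  .map(item => item.json.emails)
  .filter(emails => Array.isArray(emails))  // drop the null entries (the Filter step)
  .flat();                                  // one flat list (the Split Out step)

const unique = [...new Set(allEmails)];     // dedupe (the Remove Duplicates step)

return unique.map(email => ({ json: { email } }));
```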

Now that we have four, what can we do? Well, now we can, I don't know, add them to a Google Sheet or something. So let me go down to "Append row in sheet"; that's what we're looking for. I'm just going to use my own credential, this one right over here. Then we'll pick the document from the list; I think it's the "scrape without paying for APIs" sheet right here. The sheet within it was "emails", and we just dump the email directly into that column. I'm also going to use the "minimize API calls" option, because I've had issues in the past where I've done so many demos that I've dumped a bunch of stuff into a Google Sheet, run into API rate limits, and then couldn't record my video for half an hour. I'm not going to let that happen to me today. Why don't we go back to this email list, delete all of the rows, then come back over here, pin my outputs, and finally run this to see how it works. Oh, perfect, it just dumped all four. Very, very good. And, yeah, in a nutshell, that's more or less how to do it.

Okay, so now a couple of gotchas that I think are pretty common for people, as well as a couple of ways to extend the system. The first way you could extend it: right now, all we're really doing is scraping the homepage of each of these websites. Realistically, email addresses aren't only sitting on the homepage; they're spread across all of a site's pages. So over here, where we extract the URLs, what you can do is first extract the URL, then make an HTTP request to that URL, and then, instead of extracting emails at that point, extract the other URLs on the page that you can access. Then you run a third loop, and that third loop goes through each of the URLs you just extracted from the homepage and does exactly what we're doing here, i.e. extracting emails. If you think about it, that significantly increases the total number of pages we're scraping from. I just didn't do that for simplicity's sake; I wanted to give you guys a little nugget that you could build out.
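As a starting point for that extension, here's a hedged sketch of a Code node that pulls same-site links out of a scraped homepage so an extra loop could visit them too. The field names and filtering rules are my assumptions, not something shown in the video:

```javascript
// Sketch only: extract internal links from a scraped homepage so another
// loop can scrape those pages for emails as well. Field names are assumptions.
const item = $input.first().json;
const html = item.data ?? item.body ?? '';
const baseUrl = item.website;              // assumed: the homepage URL carried along with the item

// Pull href values out of anchor tags.
const hrefs = [...html.matchAll(/href=["']([^"'#]+)["']/gi)].map(m => m[1]);

const baseHost = new URL(baseUrl).hostname;
const internal = new Set();
for (const href of hrefs) {
  try {
    const url = new URL(href, baseUrl);    // resolves relative links like "/contact"
    if (url.hostname === baseHost) {
      internal.add(url.href);              // keep only pages on the same site
    }
  } catch (e) {
    // ignore hrefs that fail to parse as URLs
  }
}

return [...internal].map(url => ({ json: { url } }));
```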

This isn't the first time people have built a system like this; it's not like it's revolutionary or anything, and it's honestly one of the more bare-bones scraper setups I could have put together. But I just wanted everybody here to have a good place to start for more advanced scraping applications.

Now, after you're done with this, something else to look into: if you run this at any sort of scale, eventually this initial Google Maps HTTP Request node will run into Google Maps rate limits. They'll figure out that you're scraping them, which you are, their detection will pick up on it, and they'll start throttling you so that you can only make maybe one request an hour or something. If you want to get around this, there are a couple of options, and the most common is to use a proxy. Proxies are basically third-party services that you pass the request through before it goes to the end URL, which in our case is Google; they sanitize the request and make it appear legitimate by sprinkling in a bunch of additional data. There are a variety of proxies you can use for this purpose.

I'm not going to recommend any specific one, but the way the HTTP Request node works, if you want to add a proxy, you just go down to the options at the bottom, click Proxy, and paste it in. The exact value depends on the proxy service you're using; they all have slightly different formats, and they'll give you their hostname, username, password, and so on. But that's how you do it.
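The value you paste in is typically a proxy URL. The exact format varies by provider, but it usually looks something like this (placeholder values, purely to illustrate the shape):

```javascript
// Illustrative only: the exact scheme, host, port and credentials come from your proxy provider.
const proxyUrl = 'http://USERNAME:PASSWORD@proxy.example.com:8080';
// This string is what goes into the HTTP Request node's "Proxy" option.
```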

And if you are going to search for something like that, then obviously search for a search-engine-results proxy, a SERP proxy; that's the term you should start by googling. Which SERP proxy you pick depends on whether you want residential or datacenter IPs; there's a little bit more nuance there. If you guys want to learn more about how all of that works, check out the Apify course on proxies. It's probably the best-written one on the internet today. I am affiliated with Apify, but you pay nothing to access that free resource.

Hopefully you guys saw just how easy and straightforward it was to put together a real, actual n8n scraper that lets you extract email addresses directly from Google Maps listings without requiring any APIs or third-party services. The really cool thing is that you could run multiple variations of what I just built against basically any source, whether it's directories, some other search engine, county real estate databases, and more. If you like this sort of stuff, definitely check out the full 6-hour n8n zero-to-hero course that I published just a couple of days ago. Like, comment, subscribe, and check out Maker School, my AI automation program that focuses on daily lead generation accountability, if you want to turn your build and automation skills into maybe a profitable business. Have a lovely rest of your day. Looking forward to seeing you on the next video.
