Scrape Unlimited Leads WITHOUT Paying for APIs (99% FREE)
By Nick Saraev
Summary
## Key takeaways

- **Scrape Google Maps for Free with n8n**: You can scrape emails from Google Maps listings using n8n without paying for third-party APIs by constructing a specific Google Maps search URL and processing the results. [00:01], [00:37]
- **Leverage Code Snippets for Complex Tasks**: For tasks like extracting URLs or emails from scraped HTML, simple code snippets, even those generated by AI, can be integrated into n8n to efficiently process data. [03:21], [03:33]
- **Filter and Deduplicate Scraped Data**: After scraping, it's crucial to filter out irrelevant domains (like Google or gstatic) and remove duplicate entries to obtain a clean list of target websites. [06:59], [08:45]
- **Implement Scraping Hygiene with Loops and Waits**: To avoid IP blocks when scraping multiple websites, use loop nodes to process requests in batches and include wait nodes to introduce delays between requests. [09:53], [10:00]
- **Extract Emails with Regular Expressions**: After scraping individual website HTML, a code block utilizing a regular expression specifically designed to find email addresses can be used to extract the desired contact information. [13:04], [13:44]
- **Handle Rate Limits with Proxies**: If encountering rate limits from services like Google Maps during large-scale scraping, using a proxy service can help sanitize requests and make them appear legitimate. [19:14], [19:35]
Topics Covered
- Scrape Google Maps Emails: No APIs, No Cost.
- Unlocking Data: N8N + AI-Powered Custom Code.
- Avoid IP Blocks: Smart Scraping Hygiene for Scale.
- Scale Your Scraper: Advanced Techniques for Google Maps.
Full Transcript
Hey, today I'm building a simple system in n8n that lets you scrape emails from Google Maps completely free, without needing any third-party APIs. That's right, we're going to do it entirely in n8n. And what I'll do first is demo the
flow before then showing you guys how to
build it on your own from scratch. You
guys can find the templates in the
description as per
usual. So this is what the flow looks
like. This works using a Google sheet.
The Google sheet is pretty simple. We just have two tabs here, one called searches and another called emails, and both have one column. Now, the searches tab on the left-hand side has a row with Calgary plus dentist. That's really a Google Maps search query: if you pump Calgary dentist into the search URL, you get a giant list of Google Maps listings for Calgary dentists. What
we're going to do then is pump that into
our flow as input. And then at the end
of it, we are going to have a giant list
of emails that are going to dump right
over here. Don't believe me? Well, let's
get started. When I click test workflow
here, we're grabbing that search query.
It says Calgary dentist. And then what
we're doing is we're scraping all of the
Google Maps listings over here for
Calgary dentists. So this is going to
look something like www.google.com/maps/search/calgary+dentist, let's say. Okay. After that,
what we do is we do a ton of processing.
We do some URL extraction. Then we do
some filtering, some duplicate removal
some limiting in my case, just for test
purposes. Then we have a simple loop
that allows us to extract emails using
code before finally dumping all the
results in our Google sheet. Looks like
we got eight items. So if I go back over
here to the Google sheet, so you can see
we've now deposited all the email
addresses over here. Okay. Info Mloud
admin at Bington, info at Galaxy, Setin
at Galaxy, and so on and so on and so
forth. And this is just a small little
search. You can actually run this across
tens of thousands of different Google
Maps listings. All you have to do is
just change the limit and then pump in a
bunch more search terms on the left.
Okay, so how do you actually build this
from scratch? Well, let me walk you guys
through what this looks like, not from
the outside in, but from the inside out.
And I like doing this because if I
didn't, I'd just be showing you guys a
finished product. That's kind of like
you know, showing an engineer a picture
of the Eiffel Tower and saying, "Hey
there it is. Why don't you go ahead and
build it?" Right? It's not very
realistic. So, why don't I actually walk
through how to build this thing from
scratch? As I mentioned, you can grab
the template below in the
description. We'll call this Google Maps Scraper, no API. I'm just going to add a tag for the n8n course here to keep things very simple. Okay. So, the first step that
I'm going to do is I'm just going to add
a manual trigger. The reason why I'm
going to do this is because I'm not
going to connect this to my Google
sheet, at least for the purposes of this
demo. I'm just going to keep things
super simple and super easy, and we can
talk about adding a sheet input later.
The next thing we're going to need is an
HTTP request. Okay, now this is where
we're going to be putting in the URL of
our Google Maps search. Now, Google Maps is scraped using a very specific URL. It is www.google.com/maps/search/ and then we put in the search query. You can't have spaces in the query, and that's why we needed to add that plus beforehand.
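To make that concrete, here's a minimal sketch of how the search URL is put together; the query value is just an example.

```javascript
// Illustrative only: how the Google Maps search URL is constructed.
// Spaces in the query become plus signs, which is why the sheet stores "Calgary+dentist".
const query = 'calgary dentist';
const url = `https://www.google.com/maps/search/${query.trim().replace(/\s+/g, '+')}`;
console.log(url); // -> https://www.google.com/maps/search/calgary+dentist
```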
There are two additional options I'm
also going to add. The first is going to
be ignore SSL issues and the second is
going to be response. We're going to
include the response headers and status
too. Now, when I click this test step
what's going to happen is it's actually
going to perform that HTTP request to
the Google Maps back end. And then we're
going to receive a giant list of
essentially HTML. Now, hidden within
this HTML is a bunch of links that we
can then take, do more HTTP requests to
and then extract email addresses
directly from all of those. The question
is, how do we actually get these links?
Well, what I'm going to do here is I'm
just going to share a little snippet of
code that I've used for this purpose.
It's actually very straightforward. I
should say that you don't need to use
code for this, but I thought it was
simple enough that I just asked ChatGPT to whip me up a little snippet in 10 seconds, and it did a
pretty good job. So, I'm going to add a Code node, which allows us to run some custom JavaScript or Python code.
I'm then just going to take all of this
stuff out, and let me run you guys
through what the code would look like.
Okay, so what we're going to do is we're
actually going to grab some of that
information on the left hand side here.
And I'm just going to package it all
inside of an input variable. In
JavaScript, we do that by writing const.
Now, the purpose of what I'm about to
show you is not that I expect you guys
to learn JavaScript or kind of figure it
all out just watching me write this.
It's just to show you guys how easy and
simple it is to grab data that is in no
code format and then use a couple of
lines of code to simply and quickly
convert it without also requiring a ton
of execution. The cool thing is you can
now ask AI to do large portions of this
for you. I just know that this
particular snippet of code works. That's
why I'm going to reuse it. But
essentially what we're going to do is store all of that data in an input variable. So I'm just going to type a dollar sign, which lets me select the specific item I want to pull, and we'll do $input. Now, in n8n, there's a convention where all data from previous nodes is returned as
an array of items. And so what we have
to do is we have to select the first of
this array. Even if we're only getting
one item, which in our case we are, it's
technically an array of items. We have
to select the first one. Kind of
annoying, I know. Talk about annoying.
We have another convention here which I don't really talk about much, which is the JSON convention. In order to access the data, we first have to go through the .json property, and then finally, at the very end, we can select the actual field we want. Okay, so for all intents and purposes, the scraped data is now inside this input variable. All
right, moving on. What we have to do
next we have to build out the pattern
that we're going to use to extract all
of the URLs. So what I did is I actually asked ChatGPT a moment ago to build me out a regex, which is a regular expression, a pattern-matching language used specifically for this purpose. Now, I know this because I use regexes all the time to extract, parse, and do various things like this. If you just had a brief five-minute conversation with ChatGPT and asked it how you would do this, it would probably suggest a regex as well. So don't think that
this is some super convoluted scary
programming stuff. What I'm going to do
is I'm just going to copy this. Then I'm
just going to write const regex. And then I'm going to paste the pattern in along with a couple of additional characters: a slash at the beginning and a slash g at the end, which just stands for global.
And again, it's one of those little
formatting things. From here, what we
have is the input data, all of the scraped stuff, in code. We then have the pattern that we're going to apply to it. What's left is actually doing the applying. The way I do this is I write const again (const is just a keyword in JavaScript), and I'll call it URLs, or why don't we just use websites, which is probably a little easier for people to understand. Then we're going to take the input and match it against this regex.
Okay. Then finally, what we're going to
get as a result of this is we're going
to get a giant list of websites. So what
I want to do next is I want to return
these websites in the format that n8n is expecting. So what I'm going to do is return websites.map, and then for each website I'm going to return an item. They have this specific syntax with an equal sign and then a greater-than symbol, which is basically an arrow. Then we return things nested one layer deep: here I'm going to go json, and then I'll return my website inside it.
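Put together, the Code node ends up looking roughly like the sketch below. Treat it as a minimal sketch: the URL regex and the name of the field holding the scraped HTML (data here) are assumptions, so swap in whatever pattern ChatGPT gives you and whatever field your HTTP Request node actually returns.

```javascript
// Rough sketch of the URL-extraction Code node (run once for all items).
// The `data` field name and the regex pattern are assumptions -- adjust to your own setup.
const input = $input.first().json.data;

// Match anything that looks like an http(s) URL; the trailing g flag ("global")
// makes it find every occurrence instead of just the first.
const regex = /https?:\/\/[^\s"'<>\\]+/g;

const websites = input.match(regex) || [];

// n8n expects an array of items, each wrapping its fields in a `json` object.
return websites.map(website => ({ json: { website } }));
```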
Okay, so if I didn't screw something up
if I click test step, we should have a
giant list of websites under this
website thing, which we do. Now, what
you'll see is we got a giant list of
websites, but these aren't really
websites related to our search. We have
schema, Google, Google, whatever, gt. It
isn't until I kind of go way farther
forward, okay, that we actually start
getting the dental care websites that I
was looking for. Okay, this is a
problem. Obviously, we don't actually
care about Google and Gstatic and stuff.
So, what we have to do is we have to
remove them. And that's what this next
step is going to be. It's going to be
filtering out all of these really
annoying domains that don't really add
anything and then giving us a nice tight
list of only the dental websites that
are left. Cuz remember, what we're doing
is we're basically going to like Google
Maps. We're pumping in Calgary dentists.
We're just scraping the entire page
right? So, we actually need to take this data and filter out all of the additional links they give us. But never fear, that's actually very easy to do in n8n. What I'm going to do
is I'll just press P. That's going to
pin my output data. Then over here, I'm
going to go filter. Okay. Now, filter
allows us to remove items matching a
condition. So, I'm just going to drag in
website over here. Now, what I want to
do, if you think about it, is I just
want to remove all those bogus ones. So
schema, I want to remove Google. I want
to remove a bunch of stuff. And in
filter, in order to do that, just go to
string and then go does not contain. So
first of all, I don't want anything to
contain
schema. Next up, I do not want it to
contain anything with the term Google
right? I saw a couple of other terms
there that I'm just going to pump in
really quickly. And you know, we'll go
back and forth until we actually get all
this stuff done. I think it was GG, right? I don't want it to contain that either. Let's
test this and let's see how that works
first of all. So, we fed in 302 items.
And as you can see, it's only returning
133. So, we're actually getting pretty
far there. And it looks like we're
actually getting like dental domains now
in the first page, which is nice. We
still have gstatic. Okay. So, we got to
get rid of those. What else? Gstatic
search.opencare. Opencare might actually
be good. I'm not entirely sure. Okay.
Uh, what else we got? Gstatic mostly
but then we also have CAN 5 recall max.
Okay, I don't know what that is. Chat
now. Okay, so Gstatic is really the main
one. Why don't I go and then we'll go I
also don't want you to return anything
that contains the term Gstatic. Okay
let's try this. Now, as you can see
we're just like progressively filtering
but this looks pretty good. What you'll notice, though, is we're getting a ton of duplicates, aren't we? Richmond Dental, Richmond Dental, Concept Dentist, Concept Dentist, Pathways, Heritage, Heritage, Heritage. That's not good; we need to remove these, so that's what I'm going to do next. Okay, so first of all, I'm going to press P again to pin the data, and now I'm going to remove duplicates. So how do you do that? Type dedup, or actually it's the Remove Duplicates node here, and I'm going to go remove items repeated within the current input. This is the easiest and simplest way to just immediately remove duplicates from the preceding node. If I press test step, you'll see that we fed in 60 and now we only have 27 left. Okay, very easy, awesome. So now that I've removed the duplicates, think about it. What have we done up
until now? What we've done is we've
scraped a bunch of data over here. We
then extracted it with code, extracted
URLs with code. We're then filtering
these URLs. Now, we're removing all the
duplicates in those URLs. Well, the next
thing we have to do is obviously we have
to start scraping the individual pages
to look for emails, right? So, I could
theoretically just add another HTTP
request here. And what it would do, you
know, is I would feed in the URL of the
website that I want. Okay. What this would do is immediately process all 27 items in the list. But I'll be honest, I've tried this before, and if you try to have n8n just process all 27 websites at once, usually your IP address gets blocked. So what we have to do is basically some
scraping hygiene here. And we have to be
a little bit smart about how we're going
to be performing all of these scrapes.
And the way that I like to do it is I
like to use what's called a loop over
batches or split in batches node. So
just type loop. The loop node looks pretty intimidating if you've never used it before, because there are all these arrows going in and out of different modules, and there's this "Replace Me" node, which means nothing to most people. But let me just run you through it really simply. What it does is take the output of the previous node as input and run the loop for all 27 items, so it'll do everything we say over and over again, 27 times. On the 28th run, it'll see there's nothing left, decide we're done, and proceed down the done path. Okay, that's all that's
really going on here. So, you have to
feed the output of this into the input
in order for this to make sense. Okay
so what I'm going to do now is I'm going
to add the HTTP request right over here.
And then what I'm also going to do just
before I map all this is I'm also going
to add a little Wait node, just a wait of 1 second, because I've had a couple of issues in the past where I don't have any waits in my HTTP requests, and as a result, you know, I can demo it or whatever, but I don't just want this to be demoable. I actually want you guys to be able to use this. In practice, if you don't have any waits, you can get IP blocked pretty easily when you're doing any sort of scraping. So, I usually recommend, at least for testing purposes, just putting some waits in. Okay. Then the output of this Wait node is going to be the input to the Loop Over Items node. Okay. And
now what we've done is we've essentially
closed the loop. And now we have this
done little string which we can fire off
after. All right. Okay. Just to make
things minimally ugly, I'm just going to
move this stuff down here. And now what
I need to do is I just need to get the
input into the Loop Over Items node. Now, this is kind of annoying to do. I'm just going to hit test step and see. But we can't actually do this because I've connected this already. Why don't we just delete this and retest
this. Okay. So now we have the loop
branch which contains the website. So we
can actually feed this into the website
and then we can add this loop branch in.
Now that we have access to that, we can
just drag the website over. Couple of
additional things that I'm going to add
under the redirects tab. I'm just going
to go do not follow redirect. Some
websites will redirect you multiple
times and when you hit a redirect loop
it just makes the thing error out which
is kind of annoying. And anyway, then
I'm going to have that wait and then
it's just going to go for all 27 items.
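If it helps to see the logic spelled out, this is roughly what the Loop Over Items, Wait, and HTTP Request nodes are doing for you, written as plain JavaScript. It's a conceptual sketch with placeholder URLs, not something you paste into n8n.

```javascript
// Conceptual sketch only -- the Loop Over Items + Wait + HTTP Request nodes handle this in n8n.
const websites = ['https://example-dentist-one.com', 'https://example-dentist-two.com'];
const pages = [];

for (const website of websites) {
  try {
    // Mirror the "do not follow redirect" option so redirect loops don't error out.
    const response = await fetch(website, { redirect: 'manual' });
    pages.push({ website, html: await response.text() });
  } catch (error) {
    // Equivalent of "On Error: Continue" -- some pages just won't scrape, and that's fine.
    continue;
  }
  // The 1-second Wait node: space requests out so you're less likely to get IP blocked.
  await new Promise(resolve => setTimeout(resolve, 1000));
}
```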
Okay, pretty straightforward. Now, I'll
be real. I don't want to take 30 seconds
to run this every time for demo. So
what I'm going to do is I'm actually
just going to cut the input way down.
See how it says 27 items? There's a quick little hack that allows us to cut this down during testing in n8n. Just add a Limit node, then set the limit to something really small, like three.
Because we did this, what this will do
now is this will take 27 items as an
input. Then it's only just going to poop
out three items, which will mean that
when we run this through our loop, it's
going to do it in 3 seconds, not 27.
This is just going to help me do this
video a little bit faster and then just
retain my sanity while also minimizing
the likelihood that we get IP blocked
because we're running a lot of requests.
Okay, work smart, not hard. Okay, now
I'm going to execute this workflow. As
you can see, first item done, second
item done, third item done. And then
once we're done, what it does is
actually returns three items. How cool
is that? So, what is it returning right
now? Well, it's returning all of the
HTML from the websites that we just
scraped. So, a bunch more code. But this
isn't really what we're looking for, is
it? No, it's not. I'll tell you what
we're looking for. What we're looking
for is we're actually looking for the
email addresses. Okay, so how do we find
the email addresses here? Well, that's
where another code block is going to
come in handy. What we're going to do in
this code block is instead of finding
URLs, we're actually going to go and
we're going to parse emails. Okay, so
I'm just going to stick this in over
here. Then the output of this code block
is going to loop back and be the input.
And then once this is done, we can then
get into some final data processing and
then we should be good to go. Okay, so
what are we going to do with this code
block? Well, I mean, you know, I just
pasted in a bunch of the the stuff over
here. Well, if you think about it, we're
basically going to do the exact same
thing that we did for the URL and just
instead of doing the URL, we're going to
do this for emails. So, we're going to
run a bunch of code that basically takes
the data that we're feeding in from this Wait node, which we already have inside of input, and then we're going to look specifically for emails. So, I have another regex here. Just, instead of the URL pattern, I'm going to ask it to find me all emails.
So, let me go back here. Then I'll say
okay, great. Now, build me a simple regex that finds all emails in a website scrape instead. What it's going to do is write me something very similar. I don't actually know if this is entirely good to go; I'm going to try it and see what happens, but I usually just run it and play it by ear. I add the slash g again. Then, where we had const websites, I'll go emails. And instead of returning websites.map, I'll return emails.map. Then instead of website, I'll go email, and in the json, I'll go email as well.
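So the second Code node ends up as a near copy of the first, something like the sketch below. The regex and the data field name are assumptions again, and I'm nesting the matches under an emails array on a single item, which is the shape the filtering and splitting steps later on will work with; the exact return shape in the video may differ slightly.

```javascript
// Rough sketch of the email-extraction Code node; regex and field names are assumptions.
const input = $input.first().json.data || '';

// A common email pattern, with the global flag so every address on the page is captured.
const regex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// match() returns null when nothing is found, which is why some pages come back empty.
const emails = input.match(regex);

// Keep everything nested under an `emails` array per page so it can be filtered,
// split out, and deduplicated in the next steps.
return [{ json: { emails } }];
```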
Okay. I don't know if this is going to
entirely work. We'll give it a go. Okay.
We couldn't find any emails in the first
three. So I just pumped this up to
10. And it looks like we are now getting
a couple of email addresses, which is
pretty cool. Yeah, that's to be
expected. We're not going to get emails
of everything, for instance. Right. In
the demo, I think we pumped in like 300
or something. We got like a 100. So
that's that. Now that we have a bunch of
email addresses, what we're going to do
is we're going to go and proceed down
the done loop. What do we got to do with
this done loop? Well, if you think about
it, like we're outputting a bunch of
emails, right? Everything is nested
within this emails array. So we're going
to have a bunch of email arrays. So what
we have to do, to make a long story short, is split all of these out so that each email is its own object instead of being stuck inside separate
arrays. And then we're going to take
that data and then we're going to add it
to our Google sheet. So what does this
actually look like in practice? We
actually have to like get the data out
to this loop in order for me to access
it. So, I'm just going to add a wait
for one second and just push it all the
way through. So, we're scraping
scraping, scraping. Okay. And then once
we finish, we now have access to those
10 items. Let's just take a look at what
this data looks like. So, for some of
these, we're not going to have access to
the email because some of these will
have been null. Okay. So, if you find
yourself ever getting an error with an
HTTP request, what you can do is you can
go to settings and then just go on error
continue. And in reality, n8n can't
scrape all web pages. So, we're just
sort of throwing the ones that it can't
scrape away just for simplicity. But for
the ones that it can, we're going to
have email addresses as we could see.
So, Mloud Trail Dental, Galaxy Dental
Scenic Smile, Satin. Right. And now that
we have all of these, what we need to do
essentially is we need to aggregate all
of the individual emails, and we need to
remove all of the null entries. So, I'm
going to go down here to filter first.
We're just going to use the filter to
remove all the null entries. Let me pin
this so we don't have to do that again.
What I'm going to do is I'm just going
to feed in emails, and I'll say emails
is an array, right? So, I'm just going
to check. Let me just go schema or JSON.
Is it an array? Yeah, emails is an
array. So, I'm going to go down to array
and then I'll just say check to make
sure emails exist. Okay. So, this should
just filter out all of these null
entries. Cool. And now we have three
items out of the 10 that we fed in. And now that we have three items, as you can see, it's aggregating multiple emails into a single array per website. So it scraped three or four instances of info at Galaxy, then two of setin at Galaxy, and then two of another info address. What we want to do is stick all of this into one giant list and then run through and deduplicate it. So what I'm going to do
next is I'm going to add how would I do
this? I do split out. I think I do split
out. Yeah, pretty sure. And I'm just
going to go emails here and this should
basically concatenate all of these
together into one. Cool. Now once I have
this, I'm going to dedup
it. And then this will now filter down
all of the many into four. Beautiful.
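As a mental model, the Filter, Split Out, and Remove Duplicates steps together amount to something like this in plain JavaScript (illustrative only; items stands in for the scraped results):

```javascript
// What Filter + Split Out + Remove Duplicates amount to, sketched in plain JavaScript.
const kept = items.filter(item => Array.isArray(item.json.emails)); // drop pages with no emails
const allEmails = kept.flatMap(item => item.json.emails);           // flatten the per-site arrays
const uniqueEmails = [...new Set(allEmails)];                        // remove duplicate addresses
const rows = uniqueEmails.map(email => ({ json: { email } }));       // one item per email for the sheet
```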
Now once we have four, what can we do?
Well, now we can, I don't know, add them
to a Google sheet or something. So, let
me go down to append row in sheet.
That's what we're looking for. Just
going to use my own
credential. This one right over here.
Then we'll go from list. Uh, I think
this is scrape without paying for APIs
right here. Right. Then the sheet was
emails. And we should just uh dump the
email directly in here. Okay. And I'm
going to use the minimize API call
option because I've obviously had some
issues with this in the past where I've
just done so many demos that it's just
dumped a bunch of stuff into a Google
sheet and then I run into API rate
limits and stuff and then I can't record
my video for half an hour. So, I'm not
going to allow that to happen to me today. Why don't we go back to this email
list, just delete all of them, then go
over here. Why don't I just pin my
outputs and finally I'll just run this.
See how this works. Oh, perfect. Just
dumped all four. Very, very good, and yeah, in a nutshell, that's more or less how to do it. Okay, so a couple of
gotchas that I think are pretty common
for people, as well as a couple of ways
to extend the system. The first way you
could extend the system is right now all
we're really doing is scraping the homepage of all these websites. Realistically, the email addresses aren't just buried on the homepage; they're spread across all the pages. So, you know, over here where we extract the URLs, what you can do is first you
extract the URL. Then you do an HTTP
request to that URL and then instead of
extracting emails over here, you
actually extract other URLs on the URL
that you can access. Then you run a
third loop and that third loop goes
through each of the URLs that you just
extracted from the homepage and then it
does the exact same thing that we're
checking here, aka extracting emails. So now, if you think about it, we're significantly increasing the total number of pages that we're scraping from. I just didn't do that here for simplicity's sake; I just wanted to give you guys a little nugget that you could build out.
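To make the idea concrete, here's a hedged sketch of that extension in plain JavaScript. The function name, the same-site check, and the regexes are all illustrative assumptions, not the video's build; in n8n you'd wire this up as a second URL-extraction step plus a third loop instead.

```javascript
// Hypothetical sketch of the "scrape more than just the homepage" extension.
async function scrapeEmailsDeep(homepageUrl) {
  const emailRegex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

  const homepageHtml = await (await fetch(homepageUrl)).text();
  const emails = new Set(homepageHtml.match(emailRegex) || []);

  // Pull same-site links (contact pages, about pages, etc.) out of the homepage.
  const links = (homepageHtml.match(/https?:\/\/[^\s"'<>]+/g) || [])
    .filter(link => link.startsWith(homepageUrl));

  // The "third loop": scrape each internal page and collect emails from it too.
  for (const link of links) {
    try {
      const html = await (await fetch(link)).text();
      (html.match(emailRegex) || []).forEach(email => emails.add(email));
    } catch (error) {
      continue; // keep going even if one page fails
    }
    await new Promise(resolve => setTimeout(resolve, 1000)); // keep the scraping hygiene from earlier
  }

  return [...emails];
}
```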
This isn't the first time that people
have built a sort of system like this. It's not like this is revolutionary or anything, and this is definitely one of the more basic scraper systems I could have put together, but yeah, I just wanted everybody here to have a good place
to start for more advanced scraping
applications. Now, after you're done
with this as well, what you could look
into is if you run this at any sort of
scale, eventually this Google Maps HTTP request module, the initial one, will run into Google Maps rate limits, where they'll think that you're scraping them, which you are, and their AI will detect it and they'll just start throttling you. So, you can only do maybe one
request every hour or something. If you
want to get around this, there are a
couple of options. The most common is to
use a proxy. Now, proxies are basically
third party services where you pass the
request through before it goes to the
end URL, which in our case is Google
that sanitize the request and then make
the request appear legitimate by
sprinkling in a bunch of additional
data. There are a variety of proxies you can use for this purpose. I'm not going to recommend any specific one, but the way that the HTTP node works is, if you want to add a proxy, you just go down to the bottom, click proxy, and then paste it in. The exact value is going to depend on the proxy service that you're using.
They all have slightly different formats
and they're going to give you their
username and the password and stuff like
that. But that's how you do it. And then
if you are going to search for something
like that, then obviously search for
like a search engine results page proxy, a SERP proxy. That's sort of the thing that you should start by googling. SERP
proxies differ depending on whether you want residential or datacenter proxies. There's a
little bit more nuance there. If you
guys want to learn more about how all
that stuff works, check out the Apify course on proxies. It's probably the best-written one on the internet today. I am affiliated with Apify, but you pay
nothing to access that free resource.
Hopefully you guys saw just how easy and
straightforward it was to put together a
real, actual n8n scraper that allows you
to get and extract email addresses
directly from Google Maps listings
without requiring any APIs or third
party services. The really cool thing
about this is you could run multiple
variations of what I just built for
basically any service, whether it's
directories, whether it's some other
search engine, whether it's county real
estate databases and more. If you guys
like this sort of stuff, definitely
check out the full 6-hour N8N zero to
hero course that I published just a
couple of days ago. Like, comment
subscribe, check out Maker School, my AI
automation program that focuses on daily
lead generation accountability if you
wanted to turn your build and automation
skills into maybe a profitable business.
And have a lovely rest of the day.
Looking forward to seeing you on the
next video.