From Schema Markup to Knowledge Graphs: Powering AI with Connected Data - Martha van Berkel
By Tech SEO Connect
Summary
Topics Covered
- Knowledge Graphs Build AI Efficiency
- NLWeb Powers Agentic Endpoints
- Entity Linking Boosts Non-Branded Traffic
- Schema Markup Fixes AI Hallucinations
Full Transcript
Hi.
>> All right. We heard about schema being awesome right?
So, I'm really excited today because I'm going to ask you to step out of your SEO brain and step into your data architect brain. And this is really where schema
can play a really strategic role in this new era of AI. And I'm going to talk also about how you can use it to prepare for agents. Now, those of you that have
seen me speak, I always introduce myself with my knowledge graph. So, it's who I am and how I'm connected to other things in the world, the same thing that we do
with our websites with schema markup. So
you can see I'm a rower. I spend my summers sculling on Guelph Lake near where I live. I attended MIT and Queen's, and so I
have a technical background. So ask me your nerdy questions afterwards. I am
the co-founder of Schema App and the CEO. Um, but I grew up in enterprise, so I
spent 14 years at Cisco. In fact, I used to spend a bunch of time in RTP just down the way from here. I am Canadian
and I used to own a 1965 Austin-Healey Sprite. And so this is a picture of my
car, and my actual car was in the movie called Losing Chase that Kevin Bacon directed. And Kevin Bacon used to drive
my car. So you have now inferred, just
like search and AI, by understanding the graph and the relationships between me and these other things, that you can win
the Six Degrees of Kevin Bacon game.
And so this is what a knowledge graph is, and I'll talk a little bit more about what that looks like. So we're going to talk about schema markup. We're going to talk about the evolving value and how to build a knowledge graph. So again,
thinking beyond just why are we optimizing for SEO, but how are we actually preparing our data, just like Krishna said, so that it can be used to steer AI. And then I want to talk about
agent readiness. Now many of you may
have seen that in October Google sort of reiterated again that structured data is critical, but they added something new.
They also talked about it being important for AI efficiency. And my
hypothesis as to why is because it builds a knowledge graph. And so for those of you that don't know what a content knowledge graph is, it's basically what I just showed you. It's a
collection of relationships, you know,
where you're defining those relationships using a standard vocabulary, in this case, schema.org.
And so that new knowledge can be inferred, just like we just inferred that you can win the Kevin Bacon game.
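The inference described here can be sketched in a few lines of Python. This is only an illustration, assuming the facts from the introduction are stored as subject, predicate, object triples; the predicate names (owned, appearedIn, directedBy, attended) are made up for the example and are not schema.org properties.

```python
from collections import deque

# Each fact is a (subject, predicate, object) triple.
# Predicate names are illustrative, not real schema.org properties.
TRIPLES = [
    ("Martha", "owned", "1965 Austin-Healey Sprite"),
    ("1965 Austin-Healey Sprite", "appearedIn", "Losing Chase"),
    ("Losing Chase", "directedBy", "Kevin Bacon"),
    ("Martha", "attended", "MIT"),
]


def find_path(triples, start, goal):
    """Breadth-first search that treats every triple as an undirected edge."""
    neighbors = {}
    for subj, pred, obj in triples:
        neighbors.setdefault(subj, []).append((pred, obj))
        neighbors.setdefault(obj, []).append((pred, subj))

    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt in neighbors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [pred, nxt]))
    return None  # no chain of relationships connects the two entities


path = find_path(TRIPLES, "Martha", "Kevin Bacon")
print(" -> ".join(path))
```

No single fact says the speaker is connected to Kevin Bacon; the connection only appears when the relationships are followed across the graph, which is the kind of inference the talk attributes to search and AI systems.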
Okay, so a knowledge graph is really sort of what the intent always was behind the schema.org vocabulary and why they built it this way. And so at Schema
App, we really think of schema markup, you know, as if you optimize a page, as sort of that foundation layer. But in
order to elevate it to a content knowledge graph, you have to define more relationships. And so we talk about this
as entity linking, and I'll
define that for you today. Um, there's
also sort of the, um, once you have all your entities, how do you organize them and define relationships, which is taxonomy. And when you do this, you
unlock not just SEO value, rich results, non-branded queries, AI and AIO citations,
but also insights into topic authority and then also AI readiness, because your data source is then ready to be consumed and accessed.
And so this is not new. I just want to like reiterate, like Gartner talked about knowledge graphs being really strategic back in 2024, right? This isn't just like Martha's making up things about knowledge graphs. Graph technologies have
been around for a long time. In fact,
there's a ton of research that says LLMs are more efficient and grounded with them. And
I'll share some more research going on.
And so again, this makes total sense, right? We're trying to, you know,
develop trust, you know, develop accuracy, just like Krishna talked about
this morning. And we have the power of
schema markup to do it. And so as my co-founder talks about, like, knowledge graphs are no longer nice to have, right? They're actually the foundational
layer for AI understanding.
And I like to like look even further
back. And I love this article. If you
haven't read it, it's from Scientific American in 2001. And it's actually when Tim Berners-Lee and Ora Lassila and James Hendler were actually pontificating about what the semantic
web will look like. And by the way, they talk about ontologies and knowledge graphs and semantics, all the things that we're hearing surface today. And
the examples are truly what we're seeing as the agentic web today, you know,
where actual, you know, machines will take action on our behalf. And so we like to call the agentic web the semantic web in motion. And good news,
it's not happening in the future. It's
happening right now. Right? So Google's
talking about it in shopping, right?
They don't want you to leave the experience as you take actions to make
purchases. You know, we're also seeing
the same when it comes from OpenAI and how they're sort of doing native integrations directly in that consumer
chat experience. But guess what? All of
this is really sort of then the agentic
web. And so today I want to talk about
like how do we start thinking about preparing not just for AI search, right?
That's sort of again, you know, last year's news, but like what's happening with the agentic web? And I love some of Microsoft's, um, keynotes. Um, Kevin Scott
is their CTO. And in May, he talked about sort of the need for an open standard, something like we had with HTTPS for how the web works today, but how are we going to build that, or how are they going to build that, and create
the open web for the agentic side of
things? And he really, I think, was
referring to their NLWeb project. Now,
how many people in the crowd have heard of NLWeb? Oh, this is gonna be so fun.
Okay, so NLWeb was announced in May, right after their keynote. It was the same week that Google and Microsoft were also like, structured data is critical for the new search. And their goal is to make it
so that there's a natural language
interface. So, think of it almost like an
out-of-the-box, you know, chatbot on your website, and it uses structured data and RSS in order to inform its vector
database. The person leading NLWeb is
none other than R.V. Guha. R.V. Guha is the founder of schema.org.
And I think what's so interesting is that when I got on the phone with him recently and we were kind of understanding more about his vision for NLWeb, he really is thinking about it
as the agentic endpoint.
So, by the way, SEOs, the structured data that we're doing is actually the data layer that's going to help our organizations prepare for the agentic web.
So, so exciting. And so, your content knowledge graph, if we're thinking about like schema markup isn't just about rich results, it's about building this data layer, then it really is becoming
that bridge, that like element that we can use to inform, you know, how AI works and how your brand is going to be
understood. And so I love this article
by Krishna. If you haven't read it, I
highly recommend it. He talks about the things that you can do. But what I love about it is he really emphasizes being
semantically clear. And so your
knowledge graph is really how you can do that. Okay. So I want to give you something tactical that you can run
with. This is my checklist on how to build a content knowledge graph. And
it's a lot of the things you've done previously when you've done schema
markup. But it's a bit different now,
right? Because rich results are sort of
how we thought about schema markup
before. It was sort of that lens. Now we
have to think broader than this, right?
This is how our brand is being
understood. So the first is around
defining things. There are 840 plus
different classes within schema.org.
Do not call everything a web page. Okay?
We can be very, very specific. You
actually heard Krishna call out this morning that, you know, like they're using it as a steering wheel to know what the page is about. So get really, really specific within schema.org when you're categorizing your pages. The other piece
is like you need to define the properties. And with the properties, historically we look at like what are the required and recommended ones from Google in their documentation, but guess what, that's usually like 10% of the
properties that Google, Microsoft, Yandex and Yahoo defined back in 2011. And so we can be very articulate about the structure of the page and about what the content is on it. The other piece
is around like going deep with those properties, making sure all elements of your page are defined. And so this is an example of our web page for our Highlighter, which is our scalable tool.
And you can see that like we're not having multiple different types of
schema.org. Like it's one type. The page
is about one thing, right? And then I'm defining everything else on that page with relationship to that one thing.
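As a rough sketch of "one specific type per page, with everything else defined in relation to it", the Python below builds a JSON-LD block for a hypothetical product page. The URL, names, and prices are invented for illustration; Product, Brand, and Offer and the properties used are standard schema.org vocabulary, but this is not the markup from the slide.

```python
import json

# Hypothetical product page: one main entity with a specific type,
# and everything else on the page nested under it as a property.
page_markup = {
    "@context": "https://schema.org",
    "@type": "Product",  # a specific type, not a generic WebPage
    "@id": "https://example.com/products/widget#product",  # made-up URL
    "name": "Example Widget",
    "description": "A placeholder description of the product.",
    "brand": {"@type": "Brand", "name": "Example Co"},
    "offers": {
        "@type": "Offer",
        "price": "49.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

# Serialize for embedding in the page's HTML.
json_ld = json.dumps(page_markup, indent=2)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

The brand and the offer are not separate top-level entities here; they hang off the one thing the page is about, which is the structure the talk recommends.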
The other thing is about breadth.
Historically, if you'd asked me five years ago, I'd be like, let's make sure we're optimizing for rich results. Let's
like prioritize what you're putting schema markup on for that visibility.
That game has changed in the last 12 months. And so now we're really thinking about what do we need to know about our brand? What about our brand needs to be fully understood? And let's
make sure we're translating that, you know, in relationship to everything
else. Which brings me to my last point,
which is around connectivity. This is
about entity linking. Now, I want to be really clear here. When we're talking about connections, we're talking about literally, like in my introduction, where we're trying to find the relationships
between things, right? We're going to make sure it's all connected. And the
way I like to think about that is because, you know, the robots are smart,
right? Like LLMs are smart. They can crawl
the graph to make inferences. And so we want to make sure that we're not just saying, "Oh, that's a product over here, and this is a blog over here, and here's
a person." Let's define what all those
relationships are so they can do the inferencing, just like we did with my Kevin Bacon example.
It's like, I knew I was going to say
that. I always like to kind of go back
to how I define an entity. And the way I usually describe it is by talking about
a shoe. Now, if I tell you I have a
favorite shoe, my shoe is awesome. It's
my, you know, the best shoe ever. Kind
of like many of you will, some of you will think about different kinds of
shoes. Maybe you'll think about your
favorite shoe. But if I'm like, it's a
fluorescent pink 5-inch heel, right? With
five straps across the front, you're going to think very differently about what that favorite shoe is. And so think of entities as being something with
properties, right? It has dimensionality
to it. And those properties, guess what,
are like what you're doing in
schema.org, right? Like those are what
you're doing. Now, it translates to
triples within knowledge graphs, which I know JP will love because he speaks my
language. But the most important thing
is we want to be descriptive, right? We
want to kind of describe those entities.
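To make "properties translate to triples" concrete, here is a minimal Python sketch using the shoe example. The identifier and the heelHeight and strapCount property names are hypothetical, chosen for the example rather than taken from schema.org (color is a real schema.org property).

```python
# The favorite shoe, described as an entity with properties.
shoe = {
    "@id": "#favorite-shoe",       # hypothetical identifier
    "@type": "Product",
    "color": "fluorescent pink",
    "heelHeight": "5 in",          # hypothetical property name
    "strapCount": 5,               # hypothetical property name
}


def to_triples(entity):
    """Flatten one entity into (subject, predicate, object) triples."""
    subject = entity["@id"]
    return [(subject, key, value) for key, value in entity.items() if key != "@id"]


triples = to_triples(shoe)
for triple in triples:
    print(triple)
```

Each property on the entity becomes exactly one triple, so the more descriptive the markup, the more facts the graph holds about that shoe.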
Now, entity linking. External entity linking is where you're using authoritative, um, data sources: Wikidata, Wikipedia, Google's knowledge graph. And then
internal entity linking is where your page is describing that entity. So, think of this, like we like to talk about it as the entity home. So, if you're, you know, with a bank and you're talking about a specific credit card, like what
is the one page that's the entity home for that credit card? And this again is like, if it's going to go look for that source or you want to reference it, it's sort of like backlinking but with
context, right? Because we're going to
define that relationship. Now, good
news. Um, I have some data to share with
you. When we worked with Brightview
Senior Living and we did entity linking specifically around locations, we found that non-branded queries
related to that location went up, the clicks went up and the impressions went up associated with that query. Okay, the
whole case study is on our website, but the entity linking is driving results.
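The internal and external entity linking described above could look roughly like the following Python sketch, which uses the bank credit card example. All example.com URLs, names, and headlines are invented; CreditCard, BlogPosting, about, mentions, and sameAs are schema.org terms, but the exact modeling is an assumption, not the markup used in the case study.

```python
import json

# Hypothetical entity home: the one page that defines this credit card.
CARD_ID = "https://example.com/cards/travel-rewards#card"

entity_home = {
    "@context": "https://schema.org",
    "@type": "CreditCard",
    "@id": CARD_ID,
    "name": "Example Travel Rewards Card",
}

# A blog post elsewhere on the site that refers to the card.
blog_post = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    "@id": "https://example.com/blog/travel-tips#post",
    "headline": "Five ways to earn more travel points",
    # Internal entity link: point at the entity home by its @id.
    "about": {"@id": CARD_ID},
    # External entity link: disambiguate a concept via an authoritative source.
    "mentions": {
        "@type": "Thing",
        "name": "Credit card",
        "sameAs": "https://en.wikipedia.org/wiki/Credit_card",
    },
}

print(json.dumps(blog_post, indent=2))
```

The about reference is the "backlink with context": it names the relationship between the post and the card, not just the fact that one page points at another.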
We did a similar, um, exercise with Incinerator. This time we were more on
product pages and we were trying to disambiguate, um, both their brand but also sort of the actual product, like what was this product and how do we sort
of refer to it specifically. And again we saw an increase in clicks and impressions tied to non-branded queries associated with that entity. Good news.
So entity optimization is about like making sure you're curating. You're
really thinking about like what are the things you're trying to be known for, and is that really clear within your schema markup, in the data layer, right, that the machines are understanding? How do you audit it? Um, how do you make sure
then you're optimizing for those? And
then the last part that's really neat is we've started to actually look at Search Console data by entity so that you can actually start seeing, you know, if you're creating more content, if you're doing sort of more work on that, how is
that actually performing? And so I like to think of curating your entities as sort of like managing your brand and managing sort of that topicality within your data layer, right? Within your
schema markup, for what you want to be
known for. So it's really taking your
schema markup again to that content knowledge graph and ensuring there's
high accuracy. Why? Because if this is
what AI and agents are going to use to understand your brand, we want to make sure it's accurate. Everyone with
me?
All right, we're ready to talk about agents.
I got 12 minutes.
I am so excited about this stuff because I just can't believe how much of an amazing role SEOs are going to get to play in this next movement of the web.
And so, again, I've asked you to be data architects today.
Right. We're not going to wear our SEO
hat. So AI systems and browsers are going to
take actions. Right. Not just humans.
It's going to be at your direction, but they're the ones who are going to be taking those actions. And so the website is now the data source, right? It's
sort of, again, we have to think about like how we're architecting that data to
be ready. And there are new standards
like MCP and NLWeb that are coming out that are going to allow us to kind of
create access. And so when we're sitting
back and thinking, and we work with large enterprise, like what do we need to do to be agent ready? We need to make sure we have our knowledge graph, right? That
it's accessible, it's correct, it's also
complete, right? We need to have the
whole story.
We need to make sure that we have AI
governance, right? Krishna talked a little bit about
this on the AI side, but I'll talk about it from like governance on sort of how your data is ready. And then the other is like how do we make sure we're thinking about agentic endpoints?
Everyone ready?
Okay. So, MCP, who's familiar with MCP?
It's been around for about a year now.
Yeah. So, Model Context Protocol. I
lived through the 2014s of doing APIs. It's like APIs for AI and agents, right?
You're being very directive as to what it can access. Um,
it's an open standard. It has only been around for a year, right? But it is becoming sort of a de facto standard. It's
a great way that, again, if you're able to store your schema markup in a knowledge graph, that MCP connector is a really great way that you can kind of connect it to agents. The other is Microsoft's NLWeb
that I talked about. And this is really interesting because NLWeb actually contains an MCP server. So I'm
going to illustrate, um, in the next slide after this, like how do those things work with regards to thinking about how your data is being accessed. And so
I like to kind of think about, you know, how do we turn on your graph, right? Because we're now not thinking about page optimization, right? We're thinking about the system, like how are we kind of
using that to fuel agents. So first
thing, like you still need a website and you still need amazing content, right?
Fresh content. We heard that this
morning. Okay. Then you need to make
sure you have schema markup on it so it's kind of understood. Then we need to make sure that like you're being very
clear. It's the entities and the
disambiguation about it. So we need to kind of think about that entity linking and make sure you're clear on like where those sources are. And then you need to think about the agentic access
layer, right? So the agentic access
layer is like how are you going to make sure that they're accessing the information in the way that you want, to get to that right understanding. And then you also need to be designing
the agentic outcomes. And one of the things that I found so interesting even in the last 6 months is the opportunity for collaboration, like beyond SEO and
content, right? This is like a whole team effort, right? IT is going to be your friend, DevOps are going to be your friends, right? Like all of this
information comes together.
So this is how we're thinking about it
today. So today, like if you do nothing,
right? You're just putting schema markup
across your site. Like this is how it's being accessed, right? So the search engines, search indexes are coming.
They're crawling your content, as Jamie said earlier, like this is what we know about, right? But you're not really in control, right? Like you're not really kind of providing, you know,
specifics on how you want it accessed or
interpreted. And so that's where like an
MCP server can come in, right? So just
like an API a decade ago, now you can say this is the data I have. You can
access my schema markup or my knowledge graph in this specific way. Um, and so your schema markup or your knowledge graph becomes like that source of truth for that MCP and therefore
you're then controlling how it's understood, and MCP can then connect directly, natively, sort of with the chats and everything else, right? Because part
of this is how are you going to get your brand data in those native applications, and there's like SDKs coming out the wazoo from all of them, and good news, like most of them are adopting MCP
as a standard. Now what's really cool about NLWeb is, like I said, it has an MCP server built in. Well, also cool, like within a week of Atlas being announced,
the open-source code for NLWeb also included an SDK for Atlas. And so with NLWeb now, you're actually creating a browser experience directly into your data source because your schema markup
is basically training that vector database in order to then sort of interact with the web. It has a native integration with all these different chatbots, right? Depending on
sort of where your audience is going.
Um, and it can be an MCP server. So now
you don't actually even need to do the MCP work because NLWeb is going to solve that for you. And what's beautiful about NLWeb, like it is Microsoft's project but it is open, like they're
really trying hard for this to be the new open standard just like HTTPS. And I think, you know, given how fast this is
moving, um, it's going to be really
exciting to learn more. Um, so my CTO, my co-founder, is actually writing a lot
about this. Um, NLWeb is actually on our
website. Um, it's buggy right now. So
just like know that, but again we're trying to be early adopters of this to see and be able to measure and share the case studies as we understand sort of what the impact is and how it can work.
So thank you Krishna and the Microsoft team because I think this is going to make our work even more strategic.
So let's talk a little bit about
governance. I know it's boring but I
think it's important for us to be thinking about, because if we're data architects, right, we're being data architects today, we need to be thinking about AI governance. And so I love this
Gartner quote from earlier this year where they're like, you know, the CIOs aren't getting the value from AI yet because they don't have the data
foundations.
Good news.
As SEOs, if we're doing really robust schema markup and we're connecting all the dots, we have the data.
This is awesome, right?
And so again, like when we think about trust, it's about sort of like that semantic data source, and I'll share some in the next couple of slides about sort of like, good news, like
knowledge graphs have a history of being an amazing data source for large language models, and we want it grounded in data because this is our brand, right? Like this is the new marketing out there.
And so accuracy is one of the pieces, and so we talked about hallucinations, and I love that Krishna talked about grounding, because knowledge graphs are known for
grounding. In fact, um, this study
from Jon Snow talked about knowledge graph grounding having 91% accuracy versus 43% from GPT-4,
right? And this is just like one
example. I'll share some more data. Um,
we released this case study last week. Laura's here from Wells
Fargo. Um, she gave me the okay to share
this today. So we are also seeing schema
markup solve hallucinations, and in this case study we actually had a location that AIO was saying was
permanently closed. You can imagine this
was a problem for the bank, right? They reached out to John Mueller. They tried
different things. Now it happened that we didn't have schema markup on those pages connected to the rest of their
graph. And so being good citizens we're
like, we'll just do it for you. Let's see
if we can solve it. And when we put robust location schema markup on and connected it with the rest of the knowledge graph for Wells Fargo, within
days, AIO started citing their location page instead of this 25-year-old news article.
Schema markup can solve hallucinations because it's grounded, trusted data.
Round of applause. Come on.
[applause] Thank you Laura and the Wells Fargo team for letting me share that. And then also if you have questions feel free to ask Laura or me afterwards.
>> Thank you. So I always like to go back to like, this isn't just about SEO research though, right? Like let's go back and look at the research around knowledge graphs, right? I shared the
one earlier, like 300% more accuracy sort of with regards to LLM responses in enterprise when they look at graph data versus sort of structured data perhaps
in a relational database. Does
everyone know the difference between those two things? Like a relational database is like a SQL database, right?
That has like tables, and then if you were wanting to add something new in that table, it's a pain in the butt,
right? Because now you have to like,
well, we need data for every field, we
need to do elements. In a graph,
everything's in triples. And so
everything is just like one relationship away from being able to add dynamic
data. And so what's cool about
schema.org is that, the way it was written, every time you add a property, you're actually adding a triple. And so what's cool is that as they evolve schema.org, and they're likely going to
do that with like agent actions, you know, it's merely just like one property away from like adding that to your entire robust data set. Um, the other one I love on here is just around sort of
like the survey around hallucination
mitigation. So there's like academic
review showing again that knowledge graphs solve hallucinations. Um, these
are all linked. They'll be shared, and we're happy to kind of share data and sort of research that's been happening.
The other piece is around open standards, and so schema.org is the open standard that you are all familiar with, but when it comes to knowledge graphs there are other open standards, and there are other semantic people in the crowd. So like JP is
someone you can talk to about this. Any
other semantic technologists in
the crowd? Okay, come talk to me
afterwards. I'm happy to be a resource.
So RDF is like another piece. It's like
basically the most robust way to build a
knowledge graph. Um, you know, there's
other sort of, more property graphs, but they're not as robust and they likely won't stand like the test of time sort of in this kind of new world of agentic.
Um, PROV-O is actually more around governance, around being able to make sure that you understand changes that have happened to the graph. Again, if
you work at an enterprise level like we do, um, that's important. And then
finally, MCP and NLWeb that I've talked about. And so, the last kind of key
piece, agentic entry points are going to be really interesting because I think we're going to need to think about what is a conversion today and how is that going to become an agentic action in
future. And again, we can think about it
from a marketing standpoint, but again, this is where we're going to have to partner with IT because like we need to now define, you know, what the rules are
and the conditions for when those actions can happen, which agents are going to be able to do that. And then I think the last piece that's really new, and I would say we're even at the sort of beginning of it, is like how do you
have a registry of like all the different actions that you want to be able to happen within your business? And
we're already seeing sort of opportunities with some of our clientele around like booking an appointment,
right? And how do we sort of look at
sort of enabling that simple action that's just really like passing the agent, you know, with the data we have, to the appropriate endpoint. Um, so I think we'll need to start thinking about
what that is and then also making sure the data and access points are available.
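One way to picture the registry of actions, rules, and permitted agents described above is the Python sketch below. Everything in it is hypothetical (the action name, endpoint URL, agent names, and required fields); it illustrates the idea and is not an existing standard or the speaker's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentAction:
    """One action the business is willing to let agents perform."""
    name: str
    endpoint: str              # where an authorized request is passed on
    allowed_agents: frozenset  # which agents may trigger the action
    required_fields: frozenset # data the agent must supply


# Hypothetical registry with a single action, booking an appointment.
REGISTRY = {
    "book_appointment": AgentAction(
        name="book_appointment",
        endpoint="https://example.com/api/appointments",  # made-up URL
        allowed_agents=frozenset({"assistant-a", "assistant-b"}),
        required_fields=frozenset({"location_id", "date", "customer_name"}),
    ),
}


def authorize(action_name, agent, payload):
    """Return (True, endpoint) if the request meets the rules, else (False, reason)."""
    action = REGISTRY.get(action_name)
    if action is None:
        return False, "unknown action"
    if agent not in action.allowed_agents:
        return False, "agent not allowed"
    missing = action.required_fields - set(payload)
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    return True, action.endpoint


print(authorize("book_appointment", "assistant-a",
                {"location_id": "loc-1", "date": "2026-01-15", "customer_name": "Sam"}))
print(authorize("book_appointment", "unknown-bot", {}))
```

The conditions live next to the action definition, so marketing and IT can review in one place which conversions have become agentic actions and who may perform them.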
Okay, so agent readiness starts with
schema markup. So good news, us SEOs are
going to be able to lead the charge, but we need to be thinking about governance and we need to be thinking about preparing those agentic endpoints.
So what do you do next? Lock in that
foundation. Like to me, if ever
you needed a business case in order to do schema markup, like now is the time.
Um, and again, think of those like breadth, depth, like what do we want the world to know about our brand? Um, you
need to be thinking about entities and disambiguation sort of within your graph and within your schema markup. Again, we
have a ton of resources on our site to help you do that. And then how do we then make sure that our graph is ready to be accessed? And whether that be testing MCP, many organizations are doing MCP work but don't necessarily
know that like NLWeb's a thing, or that they should be looking at sort of, you know, how do you make sure your schema markup is being consumed by that MCP server so that you can use that to accelerate those AI initiatives within
your enterprise.
And so schema markup is more than SEO.
It's about content optimization and about enabling AI innovation and agents.
So, if you want to know more, feel free to reach out to me. I'd love to connect
on LinkedIn. Um, or feel free to
download our ebook. We have a ton of content on this stuff. It's all free.
Um, feel free to come to our website and check it out. That's me on time.
[applause]