Mercor CEO: Evals Will Replace Knowledge Work, AI x Hiring Today & the Future of Data Labeling
By Unsupervised Learning: Redpoint's AI Podcast
Summary
## Key takeaways

- **AI excels at text-based talent evaluation**: AI models are nearing superhuman performance in evaluating text-based candidate information like interview transcripts, written assessments, and resumes, yet this capability is largely untapped in the economy. [01:41], [01:56]
- **Reasoning models unlock AI hiring**: The recent advancements in large language models, particularly their improved reasoning capabilities, have transformed AI's effectiveness in hiring by enabling better context handling and focus identification. [02:36], [02:50]
- **Mercor: Global labor market infrastructure**: Mercor was founded to address the fragmentation in global labor markets, aiming to create a unified platform where any candidate can apply to any job, facilitated by AI matching. [04:24], [04:52]
- **AI can automate human hiring tasks**: Mercor automates manual hiring processes like resume review and interviews by using LLMs to score candidates and predict job performance, moving beyond subjective 'vibe checks'. [05:36], [07:05]
- **Data labeling market shift**: The data labeling market has shifted from crowdsourcing simple tasks to requiring high-quality experts who can work with researchers to create complex data that challenges advanced AI models. [14:23], [15:01]
- **Future of knowledge work: Evals**: A significant portion of future knowledge work may shift towards creating evaluations and identifying proxies for skills that AI cannot yet master, rather than performing repetitive tasks. [27:29], [27:40]
Topics Covered
- AI is already superhuman at evaluating text-based talent assessments.
- AI Will Revolutionize Hiring Assessments, Not Sales
- The Data Labeling Market Shift: From Crowd-Sourcing to High-Quality Talent
- AI's Job Impact: The Real Threat Regulators Should Focus On
- Evals: The Underhyped Bastion of Human Capability in AI
Full Transcript
Humans have this very strong bias
towards thinking that they're right in
this vibes-based assessment. Hiring is
like the original vibe everything. You
definitely do not suffer from that.
August of 2023, one of our customers
introed us to the co-founders of xAI and
then 2 days later they had us into the
Tesla office. We were still in college
right? Like this is insane. What's the
state of AI evaluating talent? What will
humans be doing in the economy in 5
years? Please tell us which is a huge
question for everyone. At least
everything I'm seeing is leading me to
believe that. Brendan Foody is the
co-founder and CEO of Mercor, a company
building the infrastructure for
AI-native labor markets. Mercor's platform
is already being used to label data,
screen talent, predict performance, and
evaluate both human and AI candidates.
It's a really interesting company at the
intersection of recruiting, eval, and
core to improving foundation models.
Brendan's team recently raised $100
million, and they're working with some
of the most sophisticated companies in
AI today. Our conversation today hit a
lot of interesting things, including
what role humans will play in labor in
the future. We talked about the types of
data labeling that really matter to
improve models going forward. Brendan
reflected on Mercor's rapid ascent and some
of the key decisions he made. And we
also hit on where AI does and doesn't
work in the hiring process today. All in
all, a really interesting conversation.
I think you'll really enjoy it. Now
here's Brendan Foody.
Well, thanks so much for uh for coming
on the podcast. Really appreciate it.
Yeah, thank you so much for having me
on. I'm a big fan. So excited to chat.
Yeah. Excited to to have you here. I
figured we'd start at like the the
highest level place, which is for our
listeners, I'd love if you just
contextualize like where are we today?
What's the state of AI evaluating
talent? Like what works, what doesn't?
Uh what's going on? I'm amazed at how
good it is. Like I think that everything
that a human is able to evaluate over
text, the models are close to superhuman
at, whether it be the transcripts of
someone's interview, the assessments
that they're filling out in a written
way, or even the signals on their
resume. And it's a fascinating dichotomy
because so little of that has actually
been distributed in the economy, right?
And so there's just this like huge green
field associated with doing that and
it's one of the things we're really
excited about working on and building
out. Yeah. Were there things
that didn't work pre-reasoning models?
Like, maybe talk about the
last 6 months: as these models have
gotten better, what's finally
started to work for you guys? Yeah, I
remember back in March of 2023
when GPT-4 came out and we were
building our first prototype of an AI
interviewer, and nothing worked, right?
It was like the model would hallucinate
every two or three questions and all of
that. And so it's just been riding this
incredible tailwind over time. And I
think, obviously, the knowledge in the
models improved a lot in sort of the
first year, and then the reasoning
models have made
them much better at particularly
handling a lot of context, figuring out
what matters, what to focus on, etc.
It's been really cool. Still there's
multimodal things that the models aren't
as good at just cuz it historically
hasn't been as much of a focus of the
labs and it's a lot harder to do RL
with. Um but we're excited about that
being added soon. Yeah. What are the
milestones where you're like, I
can't wait till the model can do X or Y?
Yeah. There's a handful of
things like there are certain things
that the humans are very good at like
this vibe check of whether I would
enjoy working with this person, whether
this person is passionate and like they
seem really genuine about what they're
saying. That's really hard for the
model, right? It's hard for the best
humans, let alone models. And so I'm
really excited about that uh and
building evals out for a good chunk of
it. But whenever I read through
the reasoning chains of the models,
trying to decipher things in an eval,
I'm always thinking, wow, the model
seems a lot more reasonable than
whatever researcher on our team was
creating the eval, right? And so it's
really incredible how fast they've
improved. Um, and I think everyone
obviously is seeing everything working
in in code, but uh, but we're just in
the early innings of of a lot of other
domains uh, that are taking off in an
incredible way. And obviously, it seems
like a big part of what you're doing is
basically, you know, coming up with
evals for humans and how good they'll be
at jobs. You know, obviously we have all
these people creating like AI employees
now. It's like, hey, agents are going to
do this or you'll have an AI agent
doing, you know, this set of tasks that
an employee would do. Do you guys play
into this at all? Absolutely. So, I
mean, we do a huge chunk of this. Maybe
giving a little bit of the backstory of
the company. The reason we started is
that we felt like there were incredibly
talented people all around the world
that weren't getting opportunities. And
the primary reason is that labor markets
are very fragmented, and that a candidate
somewhere else in the world, maybe
remote in the US or another country,
was only applying to a handful of jobs.
The company in San Francisco is
considering a fraction of a percent of
people because there's this like
matching problem that they're solving
manually. And through applying LLMs, we
could solve this matching problem so
that we could build this global unified
labor market that every candidate
applies to and every company hires from.
But then we realized that there was this
huge takeoff in hiring people associated
with these like new knowledge work roles
in evaluating LLMs. Um and so now we
hire, you know, all sorts of experts
for the top AI labs that um use our
technology to help facilitate that um
both for uh you know creating evals to
to evaluate our experts as well as to
evaluate the models and all of these
agents that you're discussing. Maybe for
our listeners too, on the Mercor side,
um you guys obviously have a bunch of
uses of AI and screening candidates
going through resumes. um can you talk
through some of the different use cases
that you have for AI and then what the
stack looks like um that you guys are
building on today? Yeah, I think a good
heuristic is just thinking about all the
things that humans would do manually
creating evals over those and seeing how
we can automate them. So similar to how
a human would review a resume, conduct
an interview and then rank people or
decide who should be hired. We automate
all of those processes with LLMs. And so
we have evals for how accurately are we
parsing the resume, how accurately are
we, you know, scoring different parts of
the resume, how accurately are we asking
questions in an interview, evaluating
that interview, and then passing that
all into model context along with the
references or every other kind of data
that we have on a candidate to make the
end prediction around how well they'll
perform. Is it mostly off-the-shelf
models and you're kind of curating the
evaluation and context around them?
Yeah, there's a lot of off-the-shelf
models for more basic things, but
particularly for the like hardest
problem of making the end evaluation of
a candidate is where the post training
comes in and learning from all the data
we get from our customers of who's doing
well, for what reasons, how can we learn
from those signals to make better
predictions around who we should be
hiring in the future. Have you learned
anything surprising about
those signals, something that the AI
found where you thought, you know, maybe
this isn't how I would have thought
about it or how humans would have
thought about it? Yeah, there's there's
all sorts of things. I think that one of
the key benefits of AI is that it's able
to just go way more in depth about like
everything about a candidate and it's
able to pick up on all the small details
that humans sometimes miss or like uh
you know the the vibe check sort of
skips over because people already have
their mind made up on a candidate. And
so there's all sorts of like little
resume signals if um people have
demonstrated extreme interest in a
particular area where they're just doing
it for fun, as you would anticipate,
all the way to different signals of
whether someone studied abroad in the
country where they're doing
the end job. They might
communicate better and be more
effective in a work environment. And
so there's lots of those
little things that come up and are
very specific to projects and customers.
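The small-signal idea above can be sketched in code. Everything here is hypothetical: the signal names, weights, and scoring rule are illustrative stand-ins, not Mercor's actual system, which learns from customer performance data rather than hand-set weights.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    name: str
    # Signal strengths in [0, 1], e.g. extracted by an LLM pass over the
    # resume and interview transcript. Names are made up for illustration.
    signals: dict = field(default_factory=dict)

# Hand-set illustrative weights; a real system would learn these from
# on-the-job performance data.
WEIGHTS = {
    "personal_projects_in_domain": 0.5,     # "doing it for fun"
    "studied_abroad_in_target_country": 0.2,
    "reference_strength": 0.3,
}

def score(candidate: Candidate) -> float:
    """Weighted average of whichever signals are present (missing = 0)."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * candidate.signals.get(k, 0.0) for k in WEIGHTS) / total

hobbyist = Candidate("A", {"personal_projects_in_domain": 1.0})
generic = Candidate("B", {"studied_abroad_in_target_country": 0.5})
assert score(hobbyist) > score(generic)
```

The point of the sketch is only the shape of the problem: many weak, project-specific signals combined into one prediction, with the weights grounded in outcome data rather than a recruiter's vibe.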
Are there certain things that you see
kind of will always be done by people?
You were talking about the multimodal
stuff, but I guess how do you see AI and
you know human interviewers working
together versus a world where it just
kind of goes all AI assessment? Like at
a simplistic level, the hiring process
involves assessing candidates and
selling candidates. And the assessment
part I think is going to soon get so
good from LLMs that it'll sort of be
foolish to think we know better,
right? People will just take
the recommendation because it'll
have proof that it is performing so much
better on the eval on the end outcome
that customers care about where humans I
think will still continue to play a
really large role in the selling process
of like this person that we're going to
be working with and spending time with.
Um and I think about it as enabling
human recruiters and hiring managers to
spend all of their time on the
candidates they want to hire rather than
all these interviews of people that they
don't end up wanting to hire. And so
really, yeah, unlocking them to, you
know, help people better understand the
role, better understand the people that
they're going to be working with and all
the things that they should be excited
about. Yeah, I love that. Will, uh, will
people start gaming the assessment?
Like, is that something that you've seen
at all? I guess the LLMs are picking up
on certain things. If you put out the
signal of they studied abroad in the
right place, they'll all have studied
abroad in, you know, the place where
they're recruiting for. Yeah. Yeah. It's
why sometimes you have to be a little
bit secretive about the signals, right?
But yeah, I mean we have so many
things where we deal with this,
as every large hiring process does. And
so I think the key is ensuring that
assessments are relatively dynamic.
Either like the problem that they're
working on is changed frequently or that
you're asking them super in-depth
questions about a particular part of
their background because there's so much
in the way of talent assessment that
becomes possible when the models are
able to do immense preparation for an
interview, right? Like when I'm like
doing a first interview of an executive
candidate, like maybe sometimes I'll
have references on them, but most of the
time I look at their LinkedIn profile
for a couple of minutes. I have like
some preliminary notes, but imagine if I
could go like listen to a podcast that
they were on, right? Go read through
like blog posts that they've written
all of the papers that they might have
done during their PhD and ask about
those things, right? You can get way
more in-depth and nuanced in a way
that's very hard to game. Obviously, you
have these models that are pretty good
at predicting how well these candidates
will do. Does it matter that that's
explainable, or do these models just
have, you know, a black box saying,
yeah, this person's going to be good and
this person's not. Yeah, I think it does
matter that it's explainable, for sort
of two reasons. First is for
customers to understand and trust
those claims, right? Like building
trust through all the reasoning
chains. And then the second is obviously
making sure that the models are
selecting people for the right reasons,
the reasons that they should be
considering. And so it's beneficial,
but I I think like the end state of the
economy is probably just that like it'll
be, you know, some sort of API or
interaction where people want work done
or they need some level of human
involvement, and just a confidence
interval on how well that person
will perform on the job, and there's far
less of the intermediation that
humans play in the process. Yeah, it's
like an interim trust milestone on the
way there. No, it makes
a ton of sense. And then
obviously, you know, today in kind of
the first, or one of the areas you
have a lot of fit, on the data labeling
side, there's kind of these clean
feedback loops of like, you know, I
imagine you could even score like how
accurately and you probably have
multiple people looking at the same
pieces of data. Talk about some of the
challenges maybe in translating this to
like maybe more vague domains of uh of
human work. Totally. I I mean like
venture capital. Yeah. Wait 15
years and then you get your
feedback loop. Yeah. One way I think
about it is like if you have a hundred
people that are all doing the same job
it's very easy to stack rank them versus
if you have a hundred people doing a
very different job, uh, right? Like
founders, right? Like they're all
working on something that's nuanced in
one way or another. It's very difficult
to like pattern match like what is the
thing that they said or the thing that
we learned that actually translated to
the outcome because there's just like so
many confounding variables in in the
equation. And so I think that it's going
to be like relatively easy for the like
larger pipelines of roles like if you're
hiring 20 account executives, right?
Stack-ranking all of them, learning from
those signals. Um and then the models
are starting to be able to learn from uh
these, you know, much more complicated
things where everyone's working on
something else. Like we're doing a
ranking of a bunch of the Thiel
fellows, and that's a fun
case, but it definitely is more
challenging and relies more on the
underlying reasoning capabilities of the
models. Maybe just talk through like
what are some of the challenges that
emerge in doing that? Yeah. Well, it
it's basically that oftentimes there's a
lot of things that aren't in model
context and so models struggle to learn
from that and people like forget to add
it to model context. So maybe it's like,
I heard my friend say this good
thing about using this company's
product, right? These things
might not be making
their way in. Making sure that all the
references are added um all the like
interpersonal stuff that humans might
pick up on. So we found that actually,
oftentimes just making sure
the requisite data is in model context
is the majority of the problem.
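That "get the requisite data into model context" step can be sketched as a simple completeness check before prediction. The source names and document format below are purely illustrative assumptions, not Mercor's pipeline.

```python
# Sketch: assemble every data source we have on a candidate into one
# context document, and report which requisite sources are still missing.
# Source names are hypothetical.

REQUISITE_SOURCES = ["resume", "interview_transcript", "references", "work_samples"]

def build_context(candidate_data: dict) -> tuple[str, list[str]]:
    """Return (context_document, missing_sources)."""
    parts, missing = [], []
    for source in REQUISITE_SOURCES:
        text = candidate_data.get(source)
        if text:
            parts.append(f"## {source}\n{text}")
        else:
            missing.append(source)
    return "\n\n".join(parts), missing

context, missing = build_context({
    "resume": "5 years of distributed-systems work...",
    "interview_transcript": "Q: ... A: ...",
})
assert missing == ["references", "work_samples"]
```

The interesting design question is what goes in `REQUISITE_SOURCES`: as the conversation suggests, references and interpersonal signals are exactly the items people forget to feed in, so making them explicit required fields is most of the battle.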
Yeah. I guess in the future maybe we're
just recording every conversation with
you know our smart glasses and easy
enough to feed into the model.
Bridgewater had it right all along.
Exactly. Exactly. Is that where we're
headed? Is it just like it'll be uh you
know Bridgewater at scale? Uh we'll see.
Um I mean I think of course a lot of
companies will be averse to that, and
I think there will be regulatory
and legal reasons people don't want to
do that. But I also think there's
just going to be better processes for
how models help get this information in
context, right? Maybe it's AI doing an
exit interview of the manager and the
people on the teams to help better
understand what was going on because all
the people have so many details in their
head uh around this that we we just need
to get into the models for them to be
able to make these superhuman
predictions. Yeah, there are certainly
more and more both founders and you know
all kinds of people that are bringing AI
to their meetings and so I think a lot
of you know those meetings and
interactions will be recorded for AI to
learn from. Totally. I think that'll be
interesting. We need you to take our
transcripts and stack rank us against
each other.
Only if I come out on top. Uh, what do
you think of the data labeling landscape
today? Like how do you see the different
players kind of differentiating from
each other? It seemed like scale was
really in a position to run away, but
then now there's been a bunch of kind of
new players in the landscape like how do
you how do you think about that world?
Yeah, I I think like the key thing that
most people don't understand in the
data annotation and evaluation landscape is
just the shift in the market and how
dramatically different it is from what
it used to be 2 years ago. Cuz when
ChatGPT came out, the models
weren't that good. It was easy to
trip them up. They were making mistakes
left and right. Like even a high
school student or college
undergrad could do a lot of completions
or evals to help improve the models in
this crowd sourcing fashion where they
run these huge pipelines to get hundreds
of thousands of pieces of SFT or RLHF.
SFT being input-output pairs, RLHF being
choosing between a bunch of different
preference options like you would
see in ChatGPT. But as the models got
really good, that crowd sourcing model
started to break because you needed
really high-quality people that would
work directly with researchers to help
them understand why is this model doing
well, why is it not doing well, how can
we, you know, create this really
complicated data that helps to trip up
the model and actually reflects like the
real world things that we want to
automate. And so our platform of finding
exceptional people that you would want
to work with was perfectly positioned
for that, in that we can hire these
really high-quality people super
quickly. And that caused us to take
off and and have all of the the traction
working with the big labs. And I think
that trend will continue and that like
the companies that are stuck in these
like super high volume crowdsourcing
pipelines are certainly
going to see a lot of churn, and it's
going to be the new players that
understand the direction the market is
headed um and lean into really
high-quality talent underpinning it that
are going to continue taking a lot of
market share. Do you think there'll
always be demand for, I guess, humans in
the data labeling process? There's, you
know, obviously more and more that can be
done with these models, or a big model gets
really good at one task and then can
train a smaller model. How do you see
that evolving over time? Yeah, the way I
think about it is that so long as there
are things in the economy that humans
can do that models can't do we will need
to create evals or environments so that
models can learn how to do those things
so I think there's certain domains where
that's just going to get solved sooner
than others right like within math or
even many parts of code like you don't
need that much data. It's super
verifiable. The models will solve those
problems. But then there's other domains
that are like much much more open-ended.
What makes a good founder when we're
assessing them, right? Or, you know,
honestly, a large chunk
of knowledge work domains, maybe a
majority of them are these like
open-ended problems that are really
difficult to verify and understand what
good looks like. And you just need to
get all of that understanding that the
models don't have into the models. Um
and that's why I expect like orders of
magnitude increase in the human data and
evaluations market over time. If I
understand correctly, you guys, you
know, clearly I think one of the initial
arbitrages you saw, and what inspired
the company, is you have these great
coders that are all around the world and
like, you know, they're not getting
access to some of these jobs and
obviously that ended up being really
important for coding data. Um, you know
obviously you've expanded into other
areas as well. Like, you know,
coding again is the perfect RL
use case. It's probably also really
perfect for evaluation. What have
you had to change or improve as you've
like gone into some of these fuzzier
domains uh and and recruiting people in
those areas? Yeah, I think that
leaning on a lot of the heuristics of
what a human would do manually is
probably a good way to do it. So, for
example, if you want to automate being a
consultant, how would you assess
consultants that can help to do that?
Giving them a case study. Uh maybe
that's specific to their background.
Maybe it's like maybe a silly question
but like you guys are all probably great
coders and so I imagine you know how to
evaluate coders. like if you're starting
to get a doctor on the platform like
like how do you even know how to what
the heristics are for humans? Well, I
think the point you're getting at is
really interesting which is that as you
start to get into domains beyond
the machine learning team's
capabilities, you need to have these
experts. We need to have doctors that
are helping us create our like doctor
assessments and our evals for what makes
a good doctor as as well as um you know
a bunch of other domains. And similarly
it's what the researchers need to do
with all of their technology, right?
Like when we were all evaluating LLMs
early on, it was super easy to look at
the high school level physics and
say which problem was right or
which one was slightly better. But when
it's like PhD level chemistry and the
researcher doesn't have a PhD in
chemistry, it's really hard to
understand what's going on to interpret
these evals to figure out how we can
improve them. Um, and so I think that
that's the other big shift to your
question earlier around evaluations is
is that both for assessing our talent as
well as the way the researchers
assess models, it's just going to be
this much more collaborative process and
working with people to help trip up the
model and improve capabilities. I I've
heard you talk before about how this
kind of short-term data labeling
contract work is kind of the
perfect initial market for what you've
done and there's a massive amount of
demand and it's kind of this wedge to
like eventually doing just kind of
end-to-end labor markets. I'd love to just
hear you riff a little bit on like
what's the sequencing of the company
look like from here uh you know toward
that vision. Yeah. Well, I wrote our
like secret master plan that goes over
this a little bit. But the way I think
about it is, the reason that
marketplaces are generally hard to build
is that they're very network effect
intensive. And so the thing that makes
them defensible also makes them hard to
build. And so it's important right now
that we're very focused on like drilling
this wedge of huge amounts of demand
that we have to expand the network
effects, grow the marketplace, and
focus on that right now. But then
we're also starting to see a lot of
demand for hiring high volumes of
contractors from our existing customers
at big tech companies where they might
need, you know, hundreds of data
scientists or or software engineers or
whatever the role is for a particular
domain outside of human data. Um, which
is really the exact same kind of
request. Um, it's just a little bit more
of a legacy market where you'd be going
up against the Accentures or
Deloittes of the world historically.
and so leaning into that as like the
second main focus. uh and then expanding
to all sorts of full-time hiring. But
one of the key things is that throughout
the lifetime of the business, we've been
doing all of these. Like even the
first year of the business had nothing
to do with human data. It was just like
hiring contractors for our friends and
for ourselves, many of which became
full-time employees. And so it's
much more continuous, and there's a lot
of things that unify them, in that we
know that all companies want more
candidates. They want to be able to hire
them more quickly and they want
confidence that they'll perform well.
And so if we just measure those things
and improve them over time, that'll
position us for every stage of the
business. Yeah. Was there a moment that
it was like obvious to you to lean into
the human data side? Like it was just
so abundantly clear this is where
to go? Yeah. I remember it was while I was
still in college. So uh I mean the
background of the business is I met my
co-founders when we were 14 in high
school. We were all the speech and
debate team together. They were like
winning all the tournaments that I I
wasn't as good as them but I was
building companies and then we started
hiring people internationally at the
IATS in India. Like we partnered with it
Krogpor's code club and we were amazed
that there were these smart people as
you're mentioning that weren't getting
jobs and we felt like we could hire them
to build projects. Our friends wanted to
pay us to hire them. We could take a
small fee. So we hustled a lot,
bootstrapped that to a million-dollar
revenue run rate. We profited 80k after
paying ourselves before dropping out,
which I was very proud of. But the
parents still weren't satisfied with
that of course until we had raised
money. Uh but to your question, in
August of 2023, one of our customers
introed us to the co-founders of xAI
while they were still working out of the
Tesla office, and he said, Mercor
has these really smart engineers in
India that are phenomenal at like math
and coding. And then the next day, one
or two of the xAI co-founders got on a
call with me and our team and were just
really excited, and then two days later
they had us into the Tesla office to
meet with the entire xAI co-founding
team except for Elon. It was right
before one of their meetings with Elon.
We were still in college, right?
Like this is insane. Uh and we were just
like wow why like why do they want what
we've built so badly? And it's because
there was this change that was
happening so fast in the market that no
one else had realized yet, right? And
now of course we've like scaled that up
and and are talking about it because we
have uh you know critical mass of of the
market share, but that was the point.
They weren't ready for human
data yet, and so
it wasn't until, call it, 6 months later
that we started working with a lot of
the frontier labs and and really scaling
up the business. You could see
the tidal wave coming. Yeah. Yeah. I
think like one thing I've realized over
time in founders looking for product
market fit is that people try to force
things too much sometimes. It's like you
need to just look for the signs of the
market where it's like wow there's like
gold to be found and just like drill
after that. Um and cuz like if it's hard
to get like an initial sale then it's
going to be hard to scale up the
process. You rather need to look at
what are the really strong
pain points where the wealthiest
companies will pay whatever it takes
right and just like sniff those out and
then lean into them. I guess if you've
expanded beyond coding like maybe to go
back to the doctor example cuz I'm
struck by, when you were describing it,
that in some senses
evaling what a good doctor is is
actually why you're
eventually going to bring these people
to the model companies, like they're
going to figure out, is
this the reasoning process that a
good doctor would use. What are you
actually doing when you're working
with someone to do eval? Yeah, I think
that one of the key things that humans
are a lot better at right now is
learning over time from the
instructions, from the training, from
all the feedback. And so we're
looking for these proxies that people
have demonstrated, like they're, you
know, asking the right questions uh
about the problem. They're going about
thinking about it in the right way. They
have signals in their background that
indicate they've been in these high
performing environments where people are
obviously uh learning significantly over
time. Um and all of those translating to
them finding ways to trip up the model
and and improve capabilities. Do you
guys use your own product today and like
how does it get used in your own hiring
process? Absolutely. We use it for every
role except our executive roles. So I
mean we still have the listing for
executive roles, but for most of our
executives, I would take the first
interview rather than sending them
straight to the AI interview for the
selling reason, not the vetting reason.
Um yeah, I mean it's it's extremely
effective. In fact, we've found that in
many cases it's like the most predictive
signal. I think one thing people
underestimate in hiring processes is that
humans have this very strong bias
towards thinking that they're right in
this vibes-based assessment. And
hiring is like the original vibe
everything, right? You definitely do not
suffer from that. Yeah. And it's
like, let's ground everything
in the performance data of who's
actually doing well on the job. I
remember actually, we have
this role we're hiring for, strategic
project leads, and we used to have a
human case study before the
strategic project lead on-site, and
the on-site is like working with us for
a day to see how they would do on
various parts of the job and figure out
who to hire. And then we switched over to
a fully AI process before the on-site,
and the conversion went up on the
on-site. And so it's through
using the AI interviewer, just being
a lot more objective about the
comparisons having it standardized
throughout uh you know everyone who's
applying to the role rather and just
like mixed across like three different
interviewers. It was allowing us to have
a lot better conversion. What about on
the eval side? Are you guys using a bunch of people that you source for your own evals? Do you do a lot of that internally?

Yeah, we work with a lot of people from our marketplace to create our own evals. It's a similar process to what we go through with our customers. Of course, we still need the researchers involved with those people: understanding the reasons the model is making mistakes, creating our error taxonomy, having our post-training data reflect that error taxonomy, and hill-climbing on the eval. But it's all the same processes and people.
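That loop, tagging failures against an error taxonomy and then targeting the weakest buckets with new post-training data, can be sketched very roughly. This is a minimal illustration with invented category names, not Mercor's actual tooling:

```python
from collections import Counter

# Hypothetical eval results: each failure has been tagged by a reviewer
# with an error-taxonomy category (categories invented for illustration).
failures = [
    {"task": "t1", "category": "misread_context"},
    {"task": "t2", "category": "arithmetic_slip"},
    {"task": "t3", "category": "misread_context"},
    {"task": "t4", "category": "ignored_constraint"},
    {"task": "t5", "category": "misread_context"},
]

def weakest_categories(failures, top_n=2):
    """Rank taxonomy categories by failure count, most common first."""
    counts = Counter(f["category"] for f in failures)
    return counts.most_common(top_n)

# Direct the next round of post-training data at the worst buckets,
# then re-run the eval to confirm the score actually climbed.
for category, count in weakest_categories(failures):
    print(f"collect more data for: {category} ({count} failures)")
```

The point of the sketch is just the shape of the loop: categorize, prioritize, collect data, re-evaluate.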
You talked a little bit about using multimodal capabilities to determine things like passion. What other futuristic things are you thinking about incorporating for the platform, like video?

Yeah. One
thing that I think about a lot is what role RL will have in the timeline to improve video capabilities. RL is really good at these search problems, and video is just a huge amount of tokens; that's why models struggle with it. So in many ways it's a search problem: how do we look for the signal that a person was really excited about a particular thing, or whether they cheated on the interview, or whatever else we could find in multimodal context? I think a lot about how we can create the right data to get the model to pay attention to those signals, as well as about what the frontier labs are doing to improve the base capabilities.

Obviously, even in the course of a few years, the data labeling market has changed so much. As you think about two years from now, where do you think this is all going? Do you think this remains a real part of your business, or in two years is it only the expert of the experts that are required?

I
think it's a huge part, and the reason is, like I mentioned in the beginning, we started the business because of this notion of labor aggregation: the way labor is allocated in the economy is wildly inefficient, and we could make it much more efficient. But a big part of that is making a bet on what humans will be doing in the economy in five years.

Please tell us, which is a huge question for everyone.

At least, everything I'm seeing is leading me to believe that it's far more structurally efficient for humans to create evals over the things that models aren't yet able to do than it is for them to redundantly do those tasks all the time. So I actually think it's highly probable that a huge chunk of knowledge work just trends towards creating evals. It might not be the rigid context we have right now of people working in annotation tooling; it might be much more dynamic, like talking to an interviewer about how to solve their problem. But I think that is going to be a huge part of the economy, and very few people are aware of it yet, because so many of them conflate it with what's happening in the SFT and RLHF market, where a lot of those data types just aren't as useful as they previously were and budgets for them are coming down.

What
do you think will be the most interesting skills for people to develop? If you were to advise someone in school on what to study or focus on, where would you steer them?

I would definitely optimize for a fast rate of learning, because things are changing so quickly, right? It's hard to know; there are so many things that people didn't think the models would be good at for a long, long time that they just got really good at really fast. I would say, work with AI as much as possible. One thing I hear from people in our marketplace is that they love that they get to play around with these models all day. They get to spend hours thinking about a problem the model's not going to be able to do, and about what the model is missing. And they say they build a lot of valuable skills that help them know, in their workflow as a McKinsey analyst, where they should be using AI and where they shouldn't. So I think just spending as much time with the models as possible, and getting very familiar with what they're good and bad at in a particular domain, is really helpful. But it's hard to say something like, "be a software engineer."
Yeah, it's interesting. To your point, so many more of us will be spending time training these models, and there's almost an infinite amount of things to train them on. Obviously there are hard skills that have right or wrong answers, but then there are so many subjective things. Maybe in the future, I don't know, we get paid just to train our own individual models.

Totally. I think that'll be a big part of it. I would say one other thing: people should focus on domains where demand is very elastic. For example, I think there's demand to build a hundred or a thousand times more software in the economy. Maybe it's not a thousand times as many web apps, but it's more feature iteration on existing products, better ranking algorithms, whatever it is. Versus other roles where demand is probably more fixed: we only need so many accountants, right? There's only so much of an accounting function. So as much as we can focus on the things there will be vastly more demand for, that's also a safer bet.

Yeah, that's a
great way to put it. I had a founder I was talking to the other day, and he was like, for all this talk about software engineering going away, I really could use a lot more software engineers.

I know, it's something I'm really excited about. If they made our software engineers ten times more productive, we'd probably hire more software engineers, right? So I think there are always interesting curves around demand and how pricing will affect it over time.

Obviously, I imagine when you started there was probably temptation: you could have built a recruiter co-pilot, or built software for staffing agencies. You've obviously decided to go end-to-end. Was that obvious from the start? How did that come about?

I think part of the start was shaped by the benefit of approaching the problem from first principles, because we hadn't seen how it was done. We knew the problem our friends wanted solved: they wanted to work with a software engineer, so we would just handle everything associated with getting a software engineer who would perform well working with them. But in hindsight, I think many more businesses will trend towards that, because it doesn't make sense to build a co-pilot for a job that probably won't exist, at least in nearly the same way it does now. It probably makes more sense to have the end-to-end process automated in a way that it's able to learn from the feedback loops and make better predictions.

Yeah. Though
obviously in your case, I think you benefited from the fact that the data labeling market is actually perfect for going end-to-end at a time of relatively nascent capabilities, right? If that hadn't existed, I imagine you might have had to go co-pilot for some of these other complex roles.

I think that's absolutely right. If you're hiring full-time employees, then definitionally people want to have them on their payroll. So one thing we were fortunate about is that our operating model, and the way we'd structured a lot of the business, was very conducive to the demand and the shift we were seeing in the market.

Initially, it sounds like you were helping find contractors for your friends. I assume at some point it was a side project, and at some point it became the main thing. When did you decide, yeah, I'm actually going to build this business for the next 20 years, versus, this is a cool thing I'm doing at the start of college?
Well, the background is that I was always building companies in high school. I had a company that was doing pretty well, so I didn't want to go to college, and I told my parents, no, I'm not going to go to college, and they did not like to hear that. So eventually I appeased them: I applied to college and went to school, but I told them, I'm going to drop out. And they didn't really believe me; they figured it was a safe bet once I'd agreed to go to school. Then I went to school, and they'd practically blocked the term "Thiel Fellow," like, please don't look that up. Every semester I'd tell them the same thing. And then eventually I dropped out without really giving them a heads up, because I was like, I've been telling them for the last two years, right?

You did give them a heads up.

I gave them a heads up, just the wrong heads up. Right.

And so I
think for me it was that I knew I just wanted to build a company. I was passionate about building things that have impact in the world rather than sitting through classes that didn't feel very productive, and I was in many ways just finding the right thing to spend my time on. For my co-founders, it started as a side project; they wanted to make sure they had the evidence to justify to their parents their decision to drop out. It's funny: part of their parents' condition for dropping out was that we would raise money. Even though we had this business doing a million dollars in revenue run rate, and we'd profit 80K after paying ourselves, making a lot of progress, that wasn't sufficient. The key was that we needed to raise our seed round.

That's what keeps us VCs in business: parents wanting validation. It's the credibility stamp. Well, that's a good
segue. You recently raised a lot of money, a $100 million round. Congrats.
Thank you.

What does that allow you to do now? How did you think about when was the right time to go raise more capital? I'm sure people want to throw money at you all the time, so how do you think about when to cut off the spigot?

Well, it's interesting. The only time we actually went out to raise money was our seed round, where we were like, okay, we need to raise money to justify dropping out.

And then your series A and your series B.

Exactly. Our series A and our series B were both preemptive. Our thought process was that we wanted to keep dilution relatively low, at 5%, and build up a war chest so we could invest in the product capabilities we were talking about: referral incentives and all sorts of creative consumer products that build up the supply side of our marketplace, as well as investing in more post-training data to improve our models' performance-prediction capabilities. In many ways, one of the largest blockers on our ML team is just creating more evals and more RL environments to improve our models, which happens to be very conducive to our business.

You have
a customer base with a lot of foundation model companies. What do you think happens to that landscape over time? Some people think it will consolidate to two or three; maybe we'll see more. How many different players do you think we end up with, and how do they ultimately differentiate?

It's a very good question. I'm definitely in the school of thought that OpenAI is, and will continue to be, a product company, not an API company. I think so many of the API capabilities will get commoditized, and it's really how you integrate with all the customer's context that lets you generate a lot of pricing power over time. But I think the market is going to be so large that I could see each of them leaning into a given segment where they're able to absorb a lot of value. Even if one of these labs were to just go all in on building a hedge fund, I bet they could make a ridiculous amount of money, right? So I think it's easy to pattern match and say these companies are overvalued, but if you really approach the problem of automating knowledge work, and what that opportunity is, from first principles, it's hard to justify that these companies, with such exceptional teams making so much progress, won't be able to build really incredible businesses.

Yeah, I
mean, obviously today it feels like there's been so much cross-domain generalization that it's trended toward more of a winner-take-all, or top-take-most, rather than one model that's really good in this area and one that's good in that. Though I guess your hedge fund example is interesting insofar as there's obviously a lot more to build around the scaffolding of the model to make that work.

Yeah, there's a lot of value in focus. I think having a general API is probably not a great business for multiple companies, so there's likely going to be one player in that, one of the top two labs right now. And then there's going to be a huge amount of customization that happens at the application layer for every vertical and every customer use case.

Yeah. And do you think a lot of those custom models will require sophisticated labeling?

Oh, certainly. I mean, there is
so much. Imagine if every trading firm could have evals over the particular parts of their trading analysis: which conclusions were accurate versus inaccurate, as measured by trades doing well or not. And imagine they had one of the top post-training teams focused on how to produce the right trading analysis for mid-frequency trading, faster than human traders can get to it. I think there's a huge amount of opportunity.

Talking to you, it feels like some trading firms' optimal strategy should just be to stop trading and spend nine months laser-focused on post-training a model.

Maybe. I've actually been surprised that a lot of the trading firms are less sophisticated in post-training than one would have anticipated. I think part of it is just geographic separation: all of them being in New York, or having a good chunk of their core teams in New York, versus the labs being in San Francisco, with a lot of the top researchers wanting to work on AGI rather than making money. But I think they're going to invest vast amounts in it, and there are going to be these nine-figure, ten-figure partnerships with frontier labs to customize their specific use cases.
What's the biggest unknown question you have in AI right now, where you feel like, God, if I knew the answer to this, it would have big implications for how I'm running the business today?

I think it's what you said earlier: what humans will be doing in five or ten years. That's such a hard question to answer, and I think about it as the mission of the company in many ways. We have all sorts of intuitions, but the world is changing very fast. I think so many jobs are going to get automated that getting a better understanding of that, and of how we can help define humans' new opportunities and the role they play in the economy, is one of the most important things.

Yeah. Is there
more we should be doing from a policy perspective around this? How do you think about the role other institutions in society should play here?

Absolutely. I think so many regulators have been focused on things that actually aren't as close to impacting American lives. They're focused on competition with China, which sure, it matters, but it's a lot less close to people's day-to-day. They're focused on safety risks, which matter, but are a lot less close to people's day-to-day. I think the thing everyone's going to start freaking out about in the next two or three years is that there are these models that are significantly better than them at their jobs, and we need to figure out how they're going to fit into the economy. And that's something we know will happen, right? It's not just a low-probability, high-impact risk. So I think regulators need to be much more proactive about how we can plan for that future, and about expectation management for the general public on what the world will look like in a few years.

Yeah. I guess it's just hard not knowing what we're retraining people for.

Exactly, but I wish there was a lot more conversation around that, and a lot more focus on what that next generation of jobs is going to look like and what guidance we should be giving to everyone as they're going through school and entering the workforce.

Yeah. Well, we always like to
end our interviews with a quickfire round where we get your quick take on some overly broad questions that we stuff in at the end. To start, what's one thing that's overhyped and one thing that's underhyped in the AI world today?

Good question. I think evals are underhyped, very significantly. Even though they're hyped, they're still very underhyped. They're one of the last bastions of human capability. The one thing that's really overhyped is SFT and RLHF data, that bucket of legacy data. There are companies literally spending billions of dollars on it that don't need to be, or that need to be spending an order of magnitude less. And that'll change.

What's one thing
you've changed your mind on in the AI world in the last year?

Interesting. I think my timelines for automating software engineering have moved up significantly. I used to be a little skeptical when hearing from researchers what their timelines were for having a really good AI software engineer that's able to write a PR with a higher hit rate than a human. And now it seems clear that's coming later this year, or sometime in the first half of next year. That's going to be really, really cool.

Do you think, I
mean, obviously with some of these AI improvements, if you had described them two years ago, you would have said, oh my god, it's going to change the world. And then they happened, and it's like, okay, things just adjusted. Do you feel like this is the "oh wow" moment where there's mass change in employment on the software engineering side, or is it one of those things that will feel like a 10% or 20% change?

Well, I think the thing that frames it is the elasticity we were talking about for the role. I'm less worried about the short time horizon for engineering jobs, because giving engineers tools to make them more productive will just mean we build more software. But it will definitely change the nature of the role, in that people who are product-minded, people who understand how to do the things models might not be as good at, will have more of a comparative advantage in the market.
What AI startup are you most excited about, besides Mercor?

I'm really excited about OpenAI's coding capabilities, even though that's not a contrarian answer. I also think there's going to be an immense amount of custom agents, and there's a company I'm friends with that's in stealth that I'm super excited about.

All right, well, you definitely can't share it on this podcast. When we stop recording, we'll harass you for what that is. Obviously
you're running a hugely impactful company, but let's say you were getting started today and just beginning to build some AI app in a totally different category. What else would you think would be fun to build right now, or what else would you go spend time on?

I think I would choose a certain knowledge-work vertical, probably something in finance that can be automated, and build custom agents in that vertical to do so.

So you could build that AI trading firm.

Yeah, though I would probably try to choose something I think is more positively impactful. Making sure we get to the right valuation by the morning instead of the afternoon probably doesn't move the needle in the world. But yeah, I'd choose something that I feel is super impactful, automating certain capabilities. It's a cool world.

Well, I always want to
leave the last word to you. It's been a fascinating conversation. Where can folks go to learn more about you and the work you're doing at Mercor? The mic is yours; anywhere you want to point our listeners.

Yeah, absolutely. Go to our website, mercor.com. We're hiring huge volumes of people, smaller volumes for ourselves, huge volumes for our customers, and have all sorts of great opportunities we would love to work with people on.

Awesome. Well, thanks so much. That was fun.

Yeah, thank you so much. That was a lot of fun.