
Mercor CEO: Evals Will Replace Knowledge Work, AI x Hiring Today & the Future of Data Labeling

By Unsupervised Learning: Redpoint's AI Podcast

Summary

## Key takeaways

- **AI excels at text-based talent evaluation**: AI models are nearing superhuman performance in evaluating text-based candidate information like interview transcripts, written assessments, and resumes, yet this capability is largely untapped in the economy. [01:41], [01:56]
- **Reasoning models unlock AI hiring**: The recent advancements in large language models, particularly their improved reasoning capabilities, have transformed AI's effectiveness in hiring by enabling better context handling and focus identification. [02:36], [02:50]
- **Mercor: Global labor market infrastructure**: Mercor was founded to address the fragmentation in global labor markets, aiming to create a unified platform where any candidate can apply to any job, facilitated by AI matching. [04:24], [04:52]
- **AI can automate human hiring tasks**: Mercor automates manual hiring processes like resume review and interviews by using LLMs to score candidates and predict job performance, moving beyond subjective 'vibe checks'. [05:36], [07:05]
- **Data labeling market shift**: The data labeling market has shifted from crowdsourcing simple tasks to requiring high-quality experts who can work with researchers to create complex data that challenges advanced AI models. [14:23], [15:01]
- **Future of knowledge work: Evals**: A significant portion of future knowledge work may shift towards creating evaluations and identifying proxies for skills that AI cannot yet master, rather than performing repetitive tasks. [27:29], [27:40]

Topics Covered

  • AI Is Already Superhuman at Evaluating Text-Based Talent Assessments
  • AI Will Revolutionize Hiring Assessments, Not Sales
  • The Data Labeling Market Shift: From Crowd-Sourcing to High-Quality Talent
  • AI's Job Impact: The Real Threat Regulators Should Focus On
  • Evals: The Underhyped Bastion of Human Capability in AI

Full Transcript

Humans have this very strong bias towards thinking that they're right in this vibes-based assessment. Hiring is like the original vibe-everything. You definitely do not suffer from that. In August of 2023, one of our customers introed us to the co-founders of xAI, and then two days later they had us into the Tesla office. We were still in college, right? Like, this is insane. What's the state of AI evaluating talent? What will humans be doing in the economy in five years? Please tell us, which is a huge question for everyone. At least everything I'm seeing is leading me to believe that...

Brendan Foody is the co-founder and CEO of Mercor, a company building the infrastructure for AI-native labor markets. Mercor's platform is already being used to label data, screen talent, predict performance, and evaluate both human and AI candidates. It's a really interesting company at the intersection of recruiting and evals, and core to improving foundation models. Brendan's team recently raised $100 million, and they're working with some of the most sophisticated companies in AI today. Our conversation hit a lot of interesting things, including what role humans will play in labor in the future. We talked about the types of data labeling that really matter to improve models going forward. Brendan reflected on Mercor's rapid ascent and some of the key decisions he made. And we also hit on where AI does and doesn't work in the hiring process today. All in all, a really interesting conversation. I think you'll really enjoy it. Now here's Brendan Foody.

Well, thanks so much for coming on the podcast. Really appreciate it.

Yeah, thank you so much for having me on. I'm a big fan, so excited to chat.

Excited to have you here. I figured we'd start at the highest-level place, which is, for our listeners, I'd love if you could contextualize where we are today. What's the state of AI evaluating talent? What works, what doesn't? What's going on?

I'm amazed at how good it is. I think that everything a human is able to evaluate over text, the models are close to superhuman at, whether it be the transcripts of someone's interview, the assessments they're filling out in a written way, or even the signals on their resume. And it's a fascinating dichotomy, because so little of that has actually been distributed in the economy, right? So there's just this huge greenfield associated with doing that, and it's one of the things we're really excited about working on and building out.

Yeah. Were there things that didn't work pre-reasoning models? Maybe talk about the last six months — as these models have gotten better, what's finally started to work for you guys?

Yeah, I remember back at the end of March 2023, when GPT-4 came out and we built our first prototype of an AI interviewer, and nothing worked, right? The model would hallucinate every two or three questions, and all of that. So it's just been riding this incredible tailwind over time. The knowledge in the models improved a lot in sort of the first year, and then the reasoning models have made them much better at, in particular, handling a lot of context, figuring out what matters, what to focus on, etc. It's been really cool. There are still multimodal things the models aren't as good at, just because it historically hasn't been as much of a focus for the labs and it's a lot harder to do RL with. But we're excited about that being added soon.

Yeah. What are the milestones where you're like, I can't wait until the model can do X or Y?

There's a handful of things. There are certain things that humans are very good at, like this vibe check of whether I would enjoy working with this person, whether this person is passionate and seems really genuine about what they're saying, that are really hard for the model. It's hard for the best humans, let alone models. So I'm really excited about that, and about building evals out for a good chunk of it. But whenever I read through the reasoning chains of the models while trying to decipher things in an eval, I'm always thinking, wow, the model seems a lot more reasonable than whatever researcher on our team was creating the eval, right? It's really incredible how fast they've improved. And I think everyone obviously is seeing everything working in code, but we're just in the early innings of a lot of other domains that are taking off in an incredible way.

Obviously, it seems like a big part of what you're doing is basically coming up with evals for humans and how good they'll be at jobs. We have all these people creating AI employees now — it's like, hey, agents are going to do this, or you'll have an AI agent doing this set of tasks that an employee would do. Do you guys play into this at all?

Absolutely. We do a huge chunk of this. Maybe to give a little bit of the backstory of the company: the reason we started is that we felt like there were incredibly talented people all around the world who weren't getting opportunities. And the primary reason is that labor markets are very fragmented. A candidate somewhere else in the world — maybe working remotely for the US or another country — was only applying to a handful of jobs, and the company in San Francisco is considering a fraction of a percent of people, because there's this matching problem that they're solving manually. Through applying LLMs, we could solve this matching problem so that we could build this global unified labor market that every candidate applies to and every company hires from. But then we realized there was this huge takeoff in hiring people for these new knowledge-work roles in evaluating LLMs. And so now we hire all sorts of experts for the top AI labs, who use our technology to help facilitate that — both for creating evals to evaluate our experts, as well as to evaluate the models and all of these agents that you're discussing.

Maybe for our listeners, too, on the Mercor side: you guys obviously have a bunch of uses of AI in screening candidates and going through resumes. Can you talk through some of the different use cases you have for AI, and then what the stack looks like that you guys are building on today?

Yeah, I think a good heuristic is just thinking about all the things that humans would do manually, creating evals over those, and seeing how we can automate them. So, similar to how a human would review a resume, conduct an interview, and then rank people or decide who should be hired, we automate all of those processes with LLMs. And so we have evals for how accurately we're parsing the resume, how accurately we're scoring different parts of the resume, how accurately we're asking questions in an interview and evaluating that interview, and then passing that all into model context, along with the references and every other kind of data that we have on a candidate, to make the end prediction around how well they'll perform.
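For readers, a minimal sketch of what an LLM-driven screening flow like the one described could look like. Every name here is an illustrative assumption — `llm()` stands in for whatever chat-completion API you use, and none of this is Mercor's actual stack:

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    resume_text: str
    interview_transcript: str
    references: list[str] = field(default_factory=list)

def llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion API."""
    raise NotImplementedError

def score_resume(c: Candidate) -> float:
    # Mirrors the manual resume-review step; in production this stage would
    # carry its own eval for parsing and scoring accuracy.
    return float(llm(f"Score this resume 0-10 for the role.\n{c.resume_text}\nScore only:"))

def score_interview(c: Candidate) -> float:
    # Mirrors the manual interview-evaluation step.
    return float(llm(f"Score this interview transcript 0-10.\n{c.interview_transcript}\nScore only:"))

def predict_performance(c: Candidate) -> float:
    # The final step described above: put every signal into one context
    # window and make a single end prediction of on-the-job performance.
    context = "\n\n".join([c.resume_text, c.interview_transcript, *c.references])
    return float(llm(f"Given all evidence below, predict on-the-job performance 0-10.\n{context}\nScore only:"))

def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Stack-rank by predicted performance, as a hiring committee would.
    return sorted(candidates, key=predict_performance, reverse=True)
```

The interesting engineering is less in these calls than in the eval wrapped around each stage, which is where the conversation goes next.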

Is it mostly off-the-shelf models, with you curating the evaluation and context around them?

Yeah, there are a lot of off-the-shelf models for more basic things. But the hardest problem, making the end evaluation of a candidate, is where the post-training comes in: learning from all the data we get from our customers about who's doing well and for what reasons, and how we can learn from those signals to make better predictions around who we should be hiring in the future.

Have you learned anything surprising about those signals — something the AI found where you thought, maybe this isn't how I, or humans generally, would have thought about it?

Yeah, there are all sorts of things. I think one of the key benefits of AI is that it's able to go way more in depth on everything about a candidate, and it's able to pick up on all the small details that humans sometimes miss, or that the vibe check skips over because people already have their minds made up on a candidate. So there are all sorts of little resume signals: from whether people have demonstrated extreme interest in a particular area where they're just doing it for fun, as you would anticipate, all the way to signals like whether someone studied abroad in the country where they're doing the end job — they might communicate better and be more conducive in a work environment. There are lots of those little things that come up, and they're very specific to projects and customers.

Are there certain things that you see always being done by people? You were talking about the multimodal stuff, but how do you see AI and human interviewers working together, versus a world where it just goes all AI assessment?

At a simplistic level, the hiring process involves assessing candidates and selling candidates. The assessment part, I think, is soon going to get so good from LLMs that it'll be sort of foolish to think we know better, right? People will just take the recommendation, because it'll have proof that it performs so much better on the end outcome that customers care about. Whereas humans, I think, will continue to play a really large role in the selling process — this is the person we're going to be working with and spending time with. I think about it as enabling human recruiters and hiring managers to spend all of their time on the candidates they want to hire, rather than on all these interviews of people they don't end up wanting to hire. So really unlocking them to help people better understand the role, better understand the people they're going to be working with, and all the things they should be excited about.

Yeah, I love that. Will people start gaming the assessment? Is that something you've seen at all? If the LLMs are picking up on certain things, and you put out the signal that they studied abroad in the right place, then suddenly they all studied abroad in the place they're recruiting for.

Yeah. It's why sometimes you have to be a little bit secretive about the signals, right? But we deal with this, as every large hiring process does. I think the key is ensuring that assessments are relatively dynamic: either the problem they're working on is changed frequently, or you're asking them super in-depth questions about a particular part of their background. There's so much in the way of talent assessment that becomes possible when the models are able to do immense preparation for an interview. When I'm doing a first interview of an executive candidate, maybe sometimes I'll have references on them, but most of the time I look at their LinkedIn profile for a couple of minutes and have some preliminary notes. Imagine if I could go listen to a podcast they were on, read blog posts they've written and all the papers they might have done during their PhD, and ask about those things, right? You can get way more in-depth and nuanced, in a way that's very hard to game.

Obviously, you have these models that are pretty good at predicting how well these candidates will do. Does it matter that that's explainable? Or do these models just say, in a black box, this person's going to be good and this person's not?

Yeah, I think it does matter that it's explainable, for two reasons. The first is for customers to understand and trust those claims, right — building trust through all the reasoning chains. And the second is making sure that the models are selecting people for the right reasons, the reasons they should be considering. So it's beneficial. But I think the end state of the economy is probably just that it'll be some sort of API or interaction where people want work done, or need some level of human involvement, plus a confidence interval on how well that person will perform on the job — and there's far less of the intermediation that humans play in the process.

Yeah, it's like an interim trust milestone on the way there. It makes a ton of sense.

And then obviously, today, in one of the areas where you have a lot of fit, the data labeling side, there are these clean feedback loops — I imagine you can even score accuracy, and you probably have multiple people looking at the same pieces of data. Talk about some of the challenges in translating this to vaguer domains of human work.

Totally. I mean, like venture capital.

Yeah. Wait 15 years, and then you get your feedback loop.

Yeah. One way I think about it is: if you have a hundred people all doing the same job, it's very easy to stack-rank them, versus a hundred people doing very different jobs, right? Like founders — they're all working on something that's nuanced in one way or another. It's very difficult to pattern-match what is the thing they said, or the thing we learned, that actually translated to the outcome, because there are just so many confounding variables in the equation. So I think it's going to be relatively easy for the larger pipelines of roles — if you're hiring 20 account executives, stack-ranking all of them and learning from those signals. And then the models are starting to be able to learn from much more complicated things where everyone's working on something else. We're doing a ranking of a bunch of the Thiel fellows, and that's a fun case, but it definitely is more challenging, and it relies more on the underlying reasoning capabilities of the models.

Maybe just talk through some of the challenges that emerge in doing that.

Yeah. Well, it's basically that oftentimes there are a lot of things that aren't in model context, so the models struggle to learn from them, and people forget to add them to model context. Maybe it's, I heard my friend said this good thing about using this company's product — things that might not be making their way in. Making sure all the references are added, all the interpersonal stuff that humans might pick up on. We've found that often, just making sure the requisite data is in model context is the majority of the problem.

Yeah. I guess in the future, maybe we're just recording every conversation with our smart glasses, and it's easy enough to feed into the model. Bridgewater had it right all along.

Exactly. Exactly.

Is that where we're headed? Is it just going to be Bridgewater at scale?

We'll see. I mean, of course a lot of companies will be averse to that, and I think there will be regulatory reasons and legal reasons people don't want to do that. But I also think there are just going to be better processes for how models help get this information into context. Maybe it's AI doing an exit interview of the manager and the people on the team to help better understand what was going on, because all the people have so many details in their heads that we just need to get into the models for them to be able to make these superhuman predictions.

Yeah, there are certainly more and more founders — and all kinds of people — bringing AI to their meetings, so I think a lot of those meetings and interactions will be recorded for AI to learn from.

Totally. I think that'll be interesting.

We need you to take our transcripts and stack-rank us against each other.

Only if I come out on top. What do you think of the data labeling landscape today? How do you see the different players differentiating from each other? It seemed like Scale was really in a position to run away, but now there's been a bunch of new players in the landscape. How do you think about that world?

Yeah, I think the key thing that most people don't understand in the data annotation and evaluation landscape is just the shift in the market, and how dramatically different it is from what it was two years ago. When ChatGPT came out, the models weren't that good. It was easy to trip them up; they were making mistakes left and right. Even a high school student or a college undergrad could do a lot of completions or evals to help improve the models, in this crowdsourcing fashion where labs run huge pipelines to get hundreds of thousands of pieces of SFT or RLHF data — SFT being input-output pairs, RLHF being choosing between a bunch of different preference options, like you would see in ChatGPT. But as the models got really good, that crowdsourcing model started to break, because you needed really high-quality people who would work directly with researchers to help them understand why the model is doing well or not, and how to create really complicated data that helps to trip up the model and actually reflects the real-world things we want to automate. And so our platform for finding exceptional people that you would want to work with was perfectly positioned for that, in that we can hire these really high-quality people super quickly. That caused us to take off and have all the traction working with the big labs. And I think that trend will continue: the companies that are stuck in these super-high-volume crowdsourcing pipelines are certainly going to see a lot of churn, and it's going to be the new players who understand the direction the market is headed, and who lean into really high-quality talent underpinning it, that are going to continue taking a lot of market share.
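For readers who haven't seen the two data types mentioned above, here is a minimal sketch of what a single record of each could look like. The field names are illustrative assumptions, though they mirror common open-source dataset conventions:

```python
# SFT (supervised fine-tuning): plain input/output pairs the model learns to imitate.
sft_example = {
    "prompt": "Explain what a confidence interval is.",
    "completion": "A confidence interval is a range of values that ...",
}

# RLHF preference data: one prompt, two candidate answers, and a human label
# for which is better -- the "choosing between preference options" workflow,
# like the side-by-side comparisons surfaced in ChatGPT.
rlhf_preference_example = {
    "prompt": "Explain what a confidence interval is.",
    "chosen": "A confidence interval quantifies the uncertainty around an estimate ...",
    "rejected": "A confidence interval is when you feel confident about a number ...",
}
```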

Do you think there will always be demand for, I guess, humans in the data labeling process? There's obviously more and more that can be done with these models, or a big model gets really good at one task and can then train a smaller model. How do you see that evolving over time?

Yeah. The way I think about it is that so long as there are things in the economy that humans can do that models can't, we will need to create evals or environments so that models can learn how to do those things. I think there are certain domains where that's just going to get solved sooner than others. Within math, or even many parts of code, you don't need that much data — it's super verifiable, and the models will solve those problems. But then there are other domains that are much, much more open-ended. What makes a good founder when we're assessing them, right? Honestly, a large chunk of knowledge-work domains — maybe a majority of them — are these open-ended problems that are really difficult to verify and to understand what good looks like. And you just need to get all of that understanding that the models don't have into the models. That's why I expect an orders-of-magnitude increase in the human data and evaluations market over time.

If I understand correctly, one of the initial arbitrages — and what inspired the company — is that you have these great coders all around the world who aren't getting access to some of these jobs, and that obviously ended up being really important for coding data. You've since expanded into other areas as well. Coding is the perfect RL use case, and probably also really well suited to evaluation. What have you had to change or improve as you've gone into some of these fuzzier domains and recruited people in those areas?

Yeah, I think leaning on a lot of the heuristics of what a human would do manually is probably a good way to do it. For example, if you want to automate being a consultant, how would you assess consultants who can help do that? Give them a case study, maybe one specific to their background.

Maybe a silly question, but you guys are all probably great coders, so I imagine you know how to evaluate coders. If you're starting to get a doctor on the platform, how do you even know what the heuristics are for humans?

Well, I think the point you're getting at is really interesting, which is that as you start to get into domains beyond the machine learning team's capabilities, you need these experts. We need doctors who are helping us create our doctor assessments and our evals for what makes a good doctor, as well as a bunch of other domains. And similarly, it's what the researchers need to do with all of their technology, right? When we were all evaluating LLMs early on, it was super easy to look at high-school-level physics and say which answer was right, or which one was slightly better. But when it's PhD-level chemistry and the researcher doesn't have a PhD in chemistry, it's really hard to understand what's going on, to interpret these evals, and to figure out how we can improve them. So I think that's the other big shift, to your earlier question around evaluations: both for assessing our talent and for the way researchers assess models, it's just going to be this much more collaborative process, working with people to help trip up the model and improve capabilities.

I've heard you talk before about how this short-term data labeling contract work is kind of the perfect initial market for what you've done — there's a massive amount of demand, and it's a wedge toward eventually doing end-to-end labor markets. I'd love to hear you riff a little bit on what the sequencing of the company looks like from here toward that vision.

Yeah. Well, I wrote our secret master plan that goes over this a little bit. But the way I think about it is: the reason marketplaces are generally hard to build is that they're very network-effect intensive. The thing that makes them defensible also makes them hard to build. So it's important right now that we're very focused on drilling this wedge of huge amounts of demand that we have, to expand the network effects and grow the marketplace. But we're also starting to see a lot of demand for hiring high volumes of contractors from our existing customers at big tech companies, where they might need hundreds of data scientists or software engineers or whatever the role is for a particular domain outside of human data — which is really the exact same kind of request. It's just a little bit more of a legacy market, where you'd historically be going up against the Accentures or Deloittes of the world. So we're leaning into that as the second main focus, and then expanding to all sorts of full-time hiring. But one of the key things is that over the lifetime of the business, we've been doing all of these. Even the first year of the business had nothing to do with human data — it was just hiring contractors for our friends and for ourselves, many of whom became full-time employees. So it's much more continuous, and there's a lot that unifies them: we know that all companies want more candidates, they want to be able to hire them more quickly, and they want confidence that they'll perform well. If we just measure those things and improve them over time, that'll position us for every stage of the business.

Yeah. Was there a moment when it was obvious to you to lean into the human data side? Was it just abundantly clear this is where to go?

Yeah. I remember it was while I was still in college. The background of the business is that I met my co-founders when we were 14, in high school. We were all on the speech and debate team together. They were winning all the tournaments — I wasn't as good as them, but I was building companies. Then we started hiring people internationally at the IITs in India. We partnered with IIT Kharagpur's coding club, and we were amazed that there were these smart people, as you're mentioning, who weren't getting jobs. We felt like we could hire them to build projects; our friends wanted to pay us to hire them; we could take a small fee. So we hustled a lot and bootstrapped that to a million-dollar revenue run rate. We profited $80K after paying ourselves before dropping out, which I was very proud of — but the parents still weren't satisfied with that, of course, until we had raised money. But to your question: in August of 2023, one of our customers introed us to the co-founders of xAI while they were still working out of the Tesla office, and he said, Mercor has these really smart engineers in India who are phenomenal at math and coding. The next day, one or two of the xAI co-founders got on a call with me and our team and were just really excited, and two days later they had us into the Tesla office to meet with the entire xAI co-founding team except for Elon — it was right before one of their meetings with Elon. We were still in college, right? Like, this is insane. And we were just like, wow, why do they want what we've built so badly? It's because there was this change happening so fast in the market that no one else had realized yet. Now, of course, we've scaled that up and are talking about it because we have critical mass of the market share. But that was the point — and even then, they weren't ready for human data yet, so it wasn't until call it six months later that we started working with a lot of the frontier labs and really scaling up the business.

You could see the tidal wave coming.

Yeah. I think one thing I've realized over time with founders looking for product-market fit is that people sometimes try to force things too much. You need to just look for the signs from the market where it's like, wow, there's gold to be found, and drill after that. Because if it's hard to get an initial sale, then it's going to be hard to scale up the process. You need instead to look at the really strong pain points, where the wealthiest companies will pay whatever it takes, and just sniff those out and lean into them.

I guess, now that you've expanded beyond coding — maybe to go back to the doctor example, because I'm struck by it: in some sense, evaluating what a good doctor is, is actually what you're eventually going to bring these people to the model companies for. They're going to figure out, is this the reasoning process a good doctor would use? What are you actually doing when you're working with someone to do evals?

Yeah. I think one of the key things that humans are a lot better at right now is learning over time — from the instructions, from the training, from all the feedback. So we're looking for the proxies people have demonstrated: they're asking the right questions about the problem, they're going about thinking about it in the right way, they have signals in their background indicating they've been in high-performing environments where people are obviously learning significantly over time. All of those translate to them finding ways to trip up the model and improve capabilities.

Do you guys use your own product today? How does it get used in your own hiring process?

Absolutely. We use it for every role except our executive roles. I mean, we still have the listing for executive roles, but for most of our executives I would take the first interview rather than sending them straight to the AI interview — for the selling reason, not the vetting reason. And it's extremely effective. In fact, we've found that in many cases it's the most predictive signal. I think one thing people underestimate about hiring processes is that humans have this very strong bias towards thinking that they're right in this vibes-based assessment. And hiring is like the original vibe-everything, right?

You definitely do not suffer from that.

Yeah. And it's like, let's ground everything in the performance data of who's actually doing well on the job. I remember we have this role we're hiring for, strategic project leads, and we used to have a human case study before the strategic-project-lead on-site — the on-site being working with us for a day to see how they would do on various parts of the job and figure out whom to hire. Then we switched over to a fully AI process before the on-site, and the conversion on the on-site went up. Using the AI interviewer — being a lot more objective about the comparisons, having it standardized across everyone applying to the role rather than mixed across three different interviewers — was allowing us to have a lot better conversion.

What about on the eval side? Are you using a bunch of people that you source for your own evals? Do you do a lot of that internally?

Yeah, we work with a lot of people from our marketplace to create our own evals. It's a similar process to what we go through with our customers. Of course, we still need the researchers involved with those people, understanding the reasons the model's making mistakes, how we can create our error taxonomy, have our post-training data reflect that error taxonomy, and hill-climb on the eval. But it's all the same processes and people.
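A minimal sketch of the loop just described — run an eval, bucket failures into an error taxonomy, and hill-climb by targeting the biggest buckets with new post-training data. The case format and taxonomy tags are illustrative assumptions, not Mercor's actual tooling:

```python
from collections import Counter
from typing import Callable

def run_eval(cases: list[dict], model: Callable[[str], str]) -> float:
    """Score the model over all cases; tally failures by error-taxonomy tag."""
    failures: Counter[str] = Counter()
    passed = 0
    for case in cases:
        answer = model(case["prompt"])
        if answer.strip() == case["expected"].strip():  # simplistic grader; real evals
            passed += 1                                 # often use rubrics or LLM judges
        else:
            # Tag assigned by an expert reviewer for how the model tends to
            # fail this case, e.g. "hallucination" or "missed constraint".
            failures[case.get("taxonomy_tag", "unclassified")] += 1
    print("Biggest error buckets:", failures.most_common(3))
    return passed / len(cases)

# Hill-climbing: generate post-training data targeting the biggest buckets,
# retrain, then re-run run_eval() and check that the pass rate moved up.
```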

Obviously, you talked a little bit about using multimodal capabilities to determine passion and other things. What else are you thinking about with incorporating video and other futuristic things for the platform?

Yeah. One thing I think about a lot is what role RL will have in the timeline to improve video capabilities. RL is really good at these search problems, and video is just a huge amount of tokens — that's why models struggle with it. So in many ways it's this search problem: how do we look for the signal where that person was really excited about a particular thing, or whether they cheated on the interview, or whatever else we could find in multimodal context. So I think a lot about how we can effectively create the right data to get the model to pay attention to those, as well as about what the frontier labs are doing to improve those base capabilities.

I mean, obviously, even in the course of a few years, the data labeling market changed so much. Thinking two years from now, where do you think this is all going? Do you think this remains a big part of your business, or in two years is it only the expert of experts that are required?

I think it's a huge part. The reason is, as I mentioned in the beginning, we started the business because of this notion of labor aggregation — it feels like the way labor is allocated in the economy is wildly inefficient, and we could make it much more efficient. But a big part of that is making a bet on what humans will be doing in the economy in five years.

Please tell us — which is a huge question for everyone.

At least everything I'm seeing is leading me to believe that it's far more structurally efficient for humans to create evals over the things that models aren't yet able to do than it is for them to redundantly do that task all the time. So I actually think it's highly probable that a huge chunk of knowledge work just trends towards creating evals. And it might not be the rigid context we have right now, of people working in an annotation tool — it might be much more dynamic, like talking to an interviewer about how to solve a problem. But I think that is going to be a huge part of the economy, and it's one thing very few people are aware of yet, because so many of them conflate it with what's happening in the SFT and RLHF market, where a lot of those data types just aren't as useful as they previously were, and budgets for them are coming down.

What do you think will be the most interesting skills for people to develop? If you were to advise someone in school on what to study or focus on, where would you steer them?

I would definitely optimize for a fast rate of learning, because things are changing so quickly, right? It's hard to know — there are so many things that people didn't think the models would be good at for a long, long time that they just got really good at, really fast. I would say: work with AI as much as possible. One thing I hear from people in our marketplace is that they love that they get to play around with these models all day. They get to spend hours thinking about a problem the model's not going to be able to do, and what the things are that the model is missing. And they say they build a lot of valuable skills that help them know, in their workflow as a McKinsey analyst, where they should be using AI and where they shouldn't, etc. So I think just spending as much time with the models as possible, and getting very familiar with the things they're good at or bad at in a particular domain, is really helpful. But it's hard to say something like, be a software engineer.

Yeah. It's interesting — to your point, so many more of us will be spending time training these models, and there's almost an infinite amount of things. Obviously there are hard skills that have right or wrong answers, but then there are so many subjective things. Maybe in the future, I don't know, we get paid to just train our own individual models.

Totally, totally. I think that'll be a big part of it. I would say one other thing: people should focus on domains where demand is very elastic. An example is, I think there's demand to build a hundred or a thousand times more software in the economy, right? Maybe it's not a thousand times as many web apps, but it's more feature iteration on existing products, better ranking algorithms, whatever it is. Versus other roles where demand is probably more fixed — we only need so many accountants, right, and only so much of an accounting function. So focusing as much as we can on the things there will be vastly more demand for is also a safe bet.

Yeah, that's a great way to put it. I had a founder I was talking to the other day, and he was like, for all this talk about software engineering going away, I really could use a lot more software engineers.

I know — it's something I'm really excited about. If they made our software engineers ten times more productive, we'd probably hire more software engineers, right? So I think there are always interesting curves around demand and how pricing will play into it over time.

I mean, obviously, I imagine when you started there was probably temptation — you could have built a recruiter co-pilot, or built software for staffing agencies. You've obviously decided to go end-to-end. Was that obvious from the start? How did that come about?

I think part of it was just shaped by the start: we had a lot of benefit from approaching the problem from first principles, because we hadn't seen how it was done. We knew the problem our friends wanted solved: they wanted to work with a software engineer. So we would just handle everything associated with getting a software engineer who would perform well to be working with them. But in hindsight, I think many more businesses will trend towards that, because it doesn't make sense to build a co-pilot for a job that probably won't exist, at least in nearly the same way that it does. It probably makes more sense to have this end-to-end process automated in a way where it's able to learn from the feedback loops and make better predictions.

Yeah. Though obviously, in your case, I think you benefited in that the data labeling market is actually perfect for doing it end-to-end at a time of relatively nascent capabilities, right? I'm sure if that didn't exist, I imagine you might have had to go co-pilot for some of these other, more complex roles.

I think that's absolutely right. Because if you're hiring full-time employees, then obviously, definitionally, people want to have them on their payroll. So one thing we were fortunate about is that our operating model and the way we'd structured a lot of the business were very conducive to the demand and the shift we were seeing in the market.

Initially, it sounds like you were helping find contractors for your friends. I assume at some point this was a side project, and then at some point it became the main thing. When was the point where it was like, I'm actually going to build this business for the next 20 years, versus, this is a cool thing I'm doing at the start of college?

Well, the background is that I was always building companies in high school. I had a company that was doing pretty well, so I didn't want to go to college, and I told my parents, no, I'm not going to go to college — and they did not like to hear that. So eventually I appeased them: I applied to college, went to school, but I told them, I'm always going to drop out. And they didn't really believe me. They figured it was a safe bet once I'd agreed to go to school. And then I went to school, and —

Everybody blocked the term "Thiel Fellow" — like, please don't look it up.

Yeah. Every semester, you know, I'd tell them the same thing. And then eventually I dropped out without really giving them a heads-up, or telling them, because I was like, I've been telling them for the last two years, right?

You gave them a heads-up.

I gave them a heads-up — the wrong heads-up, right? And so I think for me, it was that I knew I just wanted to build a company. I was passionate about building things that have impact in the world, rather than sitting through classes that didn't feel very productive, and I was in many ways just finding the right thing to spend my time on. I think with my co-founders, it started as a side project — wanting to make sure they had the evidence to justify to their parents their decision to drop out. And it's funny: part of their parents' condition for dropping out was that we would raise money. Even though we had this business doing a million-dollar revenue run rate — we'd profited $80K after paying ourselves, right, it was making a lot of progress — that wasn't sufficient. The key was that we needed to raise our seed round.

That's what keeps us VCs in business: parents wanting validation. It's the credibility stamp.

Well, that's a good segue. You recently raised a lot of money — a $100 million round. Congrats.

Thank you.

What does that allow you to do now? How did you think about when was the right time to go raise more capital? I'm sure people want to throw money at you all the time, so how do you think about when to cut off the spigot?

Well, it's also interesting: the only time we went out to raise money was really our seed round, where we were like, okay, we need to raise money to justify dropping out.

And then your Series A and your Series B?

Exactly — our Series A and our Series B were both preemptive. Our thought process was that we wanted to keep dilution relatively low, at 5%, and build up a war chest so that we could invest in the product capabilities we were talking about: how do we have referral incentives and all sorts of these creative consumer products that can build up the supply side of our marketplace, as well as investing in more post-training data to improve our models' performance-prediction capabilities. In many ways, one of the largest blockers on our ML team is just creating more evals and more RL environments to improve our models — which happens to be very conducive to our business.

You have a customer base of a lot of foundation model companies. What do you think happens to that landscape over time? Some people say it will consolidate to two or three; maybe we'll see more. How many different players do you think we end up with, and how do they ultimately differentiate?

It's a very good question. I'm definitely in the school of thought that OpenAI is, and will continue to be, a product company, not an API company. I think so many of the API capabilities will get commoditized, and it's really how you integrate with all the customer's context that, over time, lets them generate a lot of pricing power. But I think the market is going to be so large that I could see each of them leaning into a given segment where they're able to absorb a lot of value. Even if one of these labs were to just go all in on building a hedge fund, I bet they could make a ridiculous amount of money, right? So yeah, I think it's easy to pattern-match and say these companies are overvalued. But if you really approach the problem of automating knowledge work, and what that opportunity is, from first principles, it's hard to justify that these companies, with such exceptional teams making so much progress, won't be able to build really incredible businesses.

Yeah. Obviously, today it feels like there's been so much cross-domain generalization that it's trended toward more of a winner-take-all, or top-take-most, versus, hey, we'll have one that's really good in this place and one that's good in that. Though I guess your hedge fund example is interesting insofar as there's obviously a lot more to build around the scaffolding of the model to make that work.

Yeah. I mean, there's a lot of value in focus. I think having a general API is probably not a great business for multiple companies, so there's likely going to be one player in that — you know, one of the top two labs right now. And then there's going to be just a huge amount of customization that happens at the application layer, for every vertical and every customer use case.

Yeah. And you think a lot of those custom models will require some sophisticated labeling?

Oh, certainly. There is so much. Imagine if every trading firm could have evals over the particular parts of their trading analysis that were accurate conclusions versus inaccurate conclusions — which translated to a trade doing well or not — and you had one of the top post-training teams just focused on how to optimize having the right trading analysis for sort of mid-frequency trading, faster than your human traders are able to get to it. I think there's a huge amount of opportunity.

Talking to you, it feels like some trading firms' optimal strategy should just be: stop trading, spend nine months laser-focused on post-training a model.

Maybe. I've actually been sort of surprised that a lot of the trading firms are less sophisticated in post-training than one would have anticipated. I think part of it is just the geographic separation — all of them being in New York, or having a good chunk of their core teams in New York, versus the labs being in San Francisco — and a lot of the top researchers wanting to work on AGI rather than making money. But I think they're going to invest vast amounts in it, and there are just going to be these sort of nine-figure, ten-figure partnerships with frontier labs to help customize their specific use cases.

What's the biggest unknown question you have in AI right now — where you feel like, God, if I knew the answer to this, it would have big implications for how I'm running the business today?

I think it's what you said earlier: what humans will be doing in five or ten years. That's such a hard question to answer, and I think about it as the mission of the company in many ways. We have all sorts of intuitions, but the world is changing very fast. I think so many jobs are going to get automated that getting a better understanding of that — and of how we can help define humans' new opportunities and the role they play in the economy — is one of the most important things.

Yeah. Is there more stuff we should be doing from a policy perspective around this? How do you think about the role other institutions in society should play here?

Absolutely. I think so many regulators have been very focused on things that actually aren't as close to impacting American lives. They're focused on competition with China — which, sure, matters, but it's a lot less close to people's day-to-day. They're focused on safety risks, which matter, but are a lot less close to people's day-to-day. I think the thing everyone's going to start freaking out about in the next two or three years is that there are these models that are significantly better than them at their jobs, and we need to figure out how they're going to fit into the economy. And that's something we know will happen, right? It's not just this low-probability, high-impact risk. So I think regulators need to be much more proactive around how we can plan for that future, and around expectation management for the general public about what the world will look like in a few years.

Yeah. I guess it's just hard not knowing what we're retraining people for.

It is, exactly. But I wish there was a lot more conversation around that, right? A lot more focus on what that next generation of jobs is going to look like, and what guidance we should be giving to everyone as they're going through school and entering the workforce.

Yeah. Well, we always like to end our interviews with a quickfire round, where we get your quick take on some overly broad questions that we stuff in at the end. So maybe to start: what's one thing that's overhyped and one thing that's underhyped in the AI world today?

Oh, good question. I think evals are underhyped, very significantly. Even though they're hyped, I think they're still underhyped very significantly — one of the last bastions of human capability. And I think the one thing that's really overhyped is SFT and RLHF data, that bucket of legacy data. There are companies literally spending billions of dollars on it that don't need to be, or that need to be spending an order of magnitude less. And that'll change.

and that'll change. What's one thing uh

you've changed your mind on in the AI

world last year? Interesting. I think my

timelines for automating software

engineering have gone up significantly.

Like I used to I used to be a little bit

skeptical of of hearing from researchers

what their timelines are to having a

really good AI software engineer that's

able to write a PR that has a higher hit

rate than a human. And I think now it

seems clear that that's coming later

this year, sometimes in the first half

of next year. Um, and that's going to be

really really cool. Yeah. Do you think I

Yeah. Do you think — obviously, with some of these AI improvements, if you had talked about them two years before they happened, you would have said, oh my god, it's going to change the world; and then they happened, and it's like, okay, that adjusted things, but not massively. Do you feel like this is that "oh wow" moment, where there's mass change in employment on the software engineering side, or is it one of those things that will feel like a 10% or 20% change?

Well, I think the thing that frames it is the elasticity we were talking about for the role. I'm less worried about the short time horizon for engineering jobs, because I think giving engineers tools to make them more productive will just mean we build more software. But it will definitely change the nature of the role, in that people who are product-minded — people who understand how to do the things that models might not be as good at — have more of a comparative advantage in the market.

What AI startup are you most excited about, besides Mercor?

I'm really excited about OpenAI's coding capabilities, even though that's not a contrarian answer. I also think there's going to be an immense amount of custom agents, and there's a company I'm friends with that's sort of in stealth that I'm super excited about.

All right. Well, you definitely can't share it on this podcast. When we stop recording, we'll harass you for what that is.

Obviously, you're running a hugely impactful company. Let's say you were getting started today, just beginning to build some AI app in a totally different category. What else would you think would be fun to build right now? What else would you go spend time on?

I think I would choose a certain knowledge-work vertical — probably something in finance that can be automated — and build custom agents in that vertical to do so.

So you could build this AI trading firm.

Yeah. Though I would probably try to choose something that I think is more positively impactful, because making sure we get to the right valuation by the morning instead of the afternoon probably doesn't move the needle in the world. So I'd choose something that I feel is super impactful, to automate certain capabilities. But yeah, it's a cool world.

Yeah. Well, I always want to leave the last word to you. It's been a fascinating conversation. Where can folks go to learn more about you and the work you're doing at Mercor? The mic is yours — anywhere you want to point our listeners.

Yeah, absolutely. Go to our website, mercor.com. We're hiring huge volumes of people — smaller volumes for ourselves, huge volumes for our customers — and we have all sorts of great opportunities that we would love to work with people on.

Awesome. Well, thanks so much. That was fun.

Yeah, thank you so much. That was a lot of fun.

