Personalized AI Language Education — with Andrew Hsu, Speak
By Latent Space
Summary
## Key takeaways

- **AI enables a '3rd generation' of language learning**: Speak's vision is to create AI-native language tutors that leverage advancements in speech and language models, moving beyond the 'Gen 2' mobile apps like Duolingo to focus on functional fluency and adaptive instruction. [10:59], [11:15]
- **South Korea: a strategic proving ground**: Speak initially focused on the South Korean market due to its high demand for English fluency and competitive education landscape, validating their AI-native model against human-based solutions before global expansion. [16:30], [17:37]
- **LLMs and Whisper accelerated Speak's evolution**: The advent of models like Whisper and GPT in 2022 transformed Speak from a practice tool into a full-featured tutor, enabling real-time feedback, semantic understanding, and conversational memory. [23:31], [25:59]
- **AI-generated content scales curriculum development**: To support multiple languages and expand content, Speak is investing in AI agents and pipelines to generate curriculum and lesson material, aiming for 100x more content with less manual effort. [29:08], [30:32]
- **Quantifying fluency with knowledge graphs**: Speak is developing a knowledge graph to track what a learner knows and can do in a language, aiming to create a holistic 'Speak score' that measures real-world proficiency. [30:39], [31:58]
- **Real-world fluency over textbook learning**: Speak prioritizes teaching casual, conversational language that real people use, rather than traditional textbook phrases, focusing on functional fluency for practical situations. [35:45], [36:00]
Topics Covered
- Our founding vision stayed the same for years.
- Why real-time translation won't replace language learning.
- We chose the hardest market to prove our model.
- AI creates a safe space to make mistakes.
- The real world has enormous inertia against AI.
Full Transcript
Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host swyx, founder of Smol AI.
Hello. Hello. We're back in the studio with Andrew Hsu of Speak. Welcome.
Thank you for having me.
I have to start this off. I didn't prep you on this at all, but you were a Thiel Fellow in 2011.
First class. First class.
Is that the one with, um, SBF?
No, he was, I think, several years later actually.
Yeah. Yeah.
What was it like? Just talk about it.
That's a good question. Haven't been asked that one in a while. It was a really crazy idea at the time and very controversial. And I think the first few years of the fellowship were definitely, let's just find 20 people under 20 and give them $100,000 to drop out of college. And it was no holds barred. You could do anything. You could be doing some crazy research idea, a startup, anything. And I actually met my current co-founder at Speak. He was in the second year of the fellowship, and I made many very close friends from the first few years. But I mean, for me it was life-changing. I had a very unusual path where I actually did finish college, unfortunately. I was in grad school at the time because I went to school really early.
Yeah, I was like, aren't you too old? You know, feel like I'm young.
I was 19 at the time and in grad school. It was a very accelerated path, but I think I knew at the time that I was going to leave grad school and do startups anyway, and the timing lined up really well.
Yeah. Yeah.
Yeah.
Vitalik, I think.
He was also in a later year.
Ah, damn. Okay. Anyway.
But the first two years had, I mean, some crazy successes, you know, Dylan from Figma. I mean, yeah, a lot of people.
Awesome. Well, you know, feel free to bring in those stories as and when, because obviously only you know those kinds of people. You are now CTO and co-founder of Speak. I would say, from a very early stage, one of the most successful and prominent OpenAI partners that anyone would know of that is doing well, and teaching English to Koreans was, like, your rough remit at the time. How did that all come about?
It's funny that you say that, because despite our current revenue scale and, objectively, I think, how successful we are, we've always operated in a market, at least initially, on the other side of the world, and we've been much, much more popular in the Eastern world, in a bunch of Asian markets, and relatively unknown in the West. So it hasn't really felt like we've had that sort of awareness until, you know, the past few years, really. But the brief story is that my co-founder and I back in 2016 were fascinated by the promise of AI, and we spent a year's sabbatical basically learning everything we could. We talked to Karpathy back then, actually, when he was just finishing grad school, and did a lot of self-study research. And we were just so convinced, I think fundamentally, that speech models were going like this, language models were going like this, and in the 5 to 10 year span they would become superhuman. We were utterly convinced of this future, and we saw that the way people learn things, and specifically learn languages, which was a very human-based thing if you really care about fluency, would completely change, and we'd be able to build language tutors that were pure software, pure AI. So that was kind of the genesis story of Speak. It took much, much longer than we expected to build a great product and find good PMF. The first few years were very painful, and I think without this really compelling vision of the future, we would have quit. We actually never pivoted. Last year we brought the entire company to Taipei. We do this company trip every year, and we played our original YC application video on screen, and it was really funny because the things we were saying in that video were the exact same things that I still say today about the long-term vision and what we're building towards. So that was really cool to see.
Can you summarize the long-term vision again?
It was that as speech models and language models become superhuman, that would let us create an AI language tutor that would help you become fluent faster than any human could. And I think like 80 to 90% of the tech is here now.
And you have this big focus on speaking. Obviously, it's in the name of the company.
That's right.
And I think the speech models were maybe a little delayed compared to the text models. Did you ever think, okay, maybe speech is just not going to work for this use case? What were kind of the valleys of, you know, discomfort, and then what were maybe some of the pivotal releases and models where you were like, okay, it's going to work, it might take a little longer, but it's going to work?
So we've always done custom speech stuff. The first act of the company, if you will, was before LLMs, right, before 2022 when Whisper came out, when ChatGPT came out. The years before that, roughly two to three years, are when we feel like we found PMF in South Korea and then started growing, still only in that market, still only teaching English. And we developed custom speech recognition models, and users were speaking into the app all day. So we had a ton of this non-native English speaker data, and we would use that to fine-tune models, understand our users better. We still do that today, and it's important for us, for the core recording loop in many of our lessons, that it's extremely fast. So we're very latency sensitive. There are many other product surfaces within the app today that are more LLM-powered, where it's more open-ended, real tutoring, where we actually give you feedback on what you said, on the semantics, and so on. So that stuff is more Whisper-powered, more LLM-powered, but we've always had a very fast core ASR loop that's been fully custom.
I just onboarded to the app earlier today.
Yeah. Unlike other apps, there's kind of this tutor conversation that you do for onboarding. I'm guessing that is mostly LLM based, and then you're kind of judging how the person responds. So I selected Spanish, and the conversation was in Spanish via text to start, and then from there it started to create lessons for me.
Yeah.
Was that all unlocked by LLMs, where now you can kind of have these conversations and then bring people into the speech flow?
Yeah. So we call that magic onboarding, and it was a new thing we built that was more conversational. We wanted it to feel more like you were talking with a tutor and they were learning things about you, and we would use that later to personalize the experience. Before that we had a much more traditional app onboarding. There are still a lot of open, interesting questions around what the proper onboarding UX is, because a lot of people start using Speak when they're not in a situation where they can actually speak aloud. So we have, you know, fallback outlets and so on, but it's something we're super actively experimenting with.
Is there a structured output behind that? You know, anything that you found implementing magic onboarding? I think people always want to improve onboarding. What's the uplift, or was there one?
We still don't know yet. The interesting thing is that in general, because it's speaking based, which is a much higher barrier than just tapping a multiple choice button, what we see is that the install-to-signup rate is a decent amount lower, but the trial start rate is higher. It's still an active experiment that's running, and we're trying to be super agile about testing many different formats of this. I don't think I have the final answer yet, but the intent, the real vision that we're going for here, is that as soon as you download the app from the App Store, maybe you see it in an ad, the first interaction when you have a fresh open of the app should feel pretty futuristic. It should feel like, okay, this is the new AI-native, next-gen way of learning a language to fluency. And that's kind of always been our ambition: we wanted to build something that wasn't possible before without LLMs and, like, AI technology.
Yeah. I wanted to go back on the onboarding soon, but there's a general idea that when you replace a form with a voice bot, you need some kind of state machine under the hood to drive it: what else don't I know about you, let me proactively ask that. And I'm just wondering if you had any insights there, or is it literally just a state machine?
We tried both, actually. Right now, I think probably what you saw is a state machine, but I think that...
Trust the AGI.
Yeah, right. I think that things should move in a direction where it's much more of a natural conversation. There is a general sense of a goal in the prompt that you can specify, and part of the hard thing here is all the guardrails, right? When people try to be antagonistic to the system, things start really going off the rails. So for a bunch of these experiences, we're pretty careful about the fallbacks, and we have a lot of evals around that. But I think where it should end up is just feeling like you have a quick 3 to 5 minute conversation with your tutor, and then it knows a lot about you, and then you create your account, etc.
And it creates memories?
Yeah. So we store what you're saying, and we summarize it. In the experience, the way it works is the tutor will ask you some sort of question, like what are your goals around learning English or the language, and then we basically use a separate LLM prompt to summarize. So it's not the full transcript of what you said that you see; it's more of an abstracted, okay, here's what you care about. And we think that's a better product experience.
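To make that concrete, here is a minimal sketch of the pattern described: a separate LLM call condenses the raw onboarding transcript into one abstracted memory line. The prompt, model choice, and function name are illustrative assumptions, not Speak's actual implementation.

```python
# Hypothetical sketch of the "summarize, don't transcribe" memory step:
# a separate LLM call turns a raw spoken answer into one abstracted
# memory line that gets stored and shown back to the user.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_onboarding_answer(question: str, transcript: str) -> str:
    """Condense a raw, possibly rambling answer into one memory line."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the production model isn't public
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize a language learner's answer as one short, "
                    "third-person memory, e.g. 'Wants to order food "
                    "confidently on trips to Mexico.' Never quote the "
                    "transcript verbatim."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\nAnswer: {transcript}",
            },
        ],
    )
    return resp.choices[0].message.content.strip()

memory = summarize_onboarding_answer(
    "What are your goals around learning Spanish?",
    "uh mostly travel I guess, um, my partner's family is from Mexico...",
)
```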
What were some of the other key tenets? Obviously, language learning is one of those consumer markets where dozens of companies are always trying to get started, and you have these old companies like, you know, Babbel, and you've got Duolingo. So speaking, the act of speaking, was a big part of it. I think this memory stuff is great. I think if you've tried some of the other apps, they always try to re-ask you the same things that you got wrong before, but you're not really learning. Is there anything else, maybe not as obvious from the outside, in the design of the app and the product that you think is really different?
I would say, from a macro level, this is actually a pretty new product category, AI-powered language learning. And all these apps that you mentioned, Duolingo, Babbel, etc., they're more like the Gen 2 of language learning. So if you think about it, Gen 1 was Rosetta Stone, if you remember, right, CD-ROMs in airports. And then Gen 2 was basically mobile. So you have these very casual, massively popular mobile apps like Duolingo where I think the comp is probably closer to a mobile game: something that feels productive, something that's very engaging, very gamified.
And Duolingo is really leaning into it, the gamification.
And they've done an amazing job of that, to be clear.
Yeah, they might be the world's best people at it.
Yeah. And our view is that LLMs and AI now enable Gen 3 of language learning, which is something that is very AI native, very focused on functional fluency, which is why we do all these role plays and let you practice Spanish by talking to your Uber driver. We don't teach vocabulary and grammar. We teach sentence patterns, and we try to get you to just repeat and drill and drill and drill, almost like you're in a gym, until it's automatic, because that's what speaking is, right? It has to be spontaneous and automatic. In terms of the other aspects of the design, though, we went through many, many iterations over the first few years of starting the company. This is kind of what I was mentioning about it being really painful in the first four or five years. And in fact, the current version of the Speak app is not the first thing that we launched. We had something that we call internally the red app, which had a red app icon, still a similar logo, and it was more around packs of content instead of courses, where you could choose any topic that you wanted to learn. It was for many different languages. It was essentially not a very directed experience, and it didn't really work. It was free. It was a very basic thing. But in 2018 we tore everything down and realized that we had to really fully change what we were doing. And that's when we decided to focus on South Korea, specifically on teaching English. We built a bunch of new lesson types, and we created our courses so that the experience was much more on rails. We realized people don't want to choose. They're already using some of their motivation on a daily basis just to open the app. They don't want to make another choice after that, right? Just tell me what to do, right? Like, you know, give me a big button, and then I can tap it and just start a video lesson or whatever. We also, pretty critically, I think, abandoned the free version and went straight premium, and we kind of sidestepped the motivation question that way, because we knew that there were a ton of users that really wanted to learn English and were already really motivated. So we wanted to basically filter for these users. So, you know, I wouldn't say there was one silver bullet. It was the combination of many learnings over three or four years. And then that started really growing in South Korea. And from there, I guess, phase 2 was really 2022, when LLMs came out and Whisper came out. And that allowed us to go from this more supplemental speaking practice tool to more full-featured language tutoring, where we could use LLMs like GPT-3.5 Turbo back then to give you direct feedback on your wording: you know, like, that was kind of a weird thing to say, a native speaker would say it this way, or use a different word, or whatever.
I always do a poor job of doing this, but can we get some headline numbers, just to get a sense of scale? Because I think maybe some audiences don't know where you're at now in terms of your reach.
So, we're now the biggest English app in South Korea. Yeah, we do billboards, big celebrity campaigns, that sort of scale. We're, you know, very popular there. I think like 6% of the Korean population has tried us. We're, you know, well on the way in a bunch of other Asian markets like Japan and Taiwan. So the Asian markets are currently our mainstay. We also teach English in 40 more countries. We're coming to the US as well: we have Spanish and French live, and several more languages are coming this year. That's a huge focus of the company right now. In terms of revenue scale, well over $50 million ARR. It's a pretty simple business model. It's mostly consumer. The B2B stuff is super, super exciting, and that's also growing really fast, and I think it'll be a really meaningful part of the business.
When did you start B2B?
About a year ago. It was very much a side bet slash experiment at first, and then it just started working.
Of course it's going to work.
Yeah. And now it's like, okay, you know, this is part of the future, right? This is a real thing. Yeah. So that's exciting.
What's the race between learning a language and, like, real-time AI translation? At Google I/O there was one of those Google Beam things for conferencing that does real-time translation.
Yeah. Yeah. Yeah. So people always ask this, right? They're always like, what happens when the Babel fish comes, right? When the real-time translation comes.
And the Babel fish is from The Hitchhiker's Guide to the Galaxy, right?
Yes, exactly. The counterexample that I always have, that I think is quite illustrative, is that in German, the verb is at the end of the sentence, right? So if you're trying to do real-time translation from German to English, as an example, you can't actually make any progress on the English until you hear the whole German sentence and you know what the verb is at the end, right? So the minimum latency there is the full sentence. And that's an example of the technical blocker for why it'll never be truly, truly perfect. But also, I think, besides that, if you talk to all of our users in Asia, they don't want a translator. The reason that they are trying to learn English is to make themselves a better person, to connect with other people. They want to be able to look you in the eye and speak English, speak the same language as you, right? So it's actually a very different thing. I think what will end up happening is that we will build a real-time translation feature into Speak and have it integrated into the learning experience.
And also, there's always that human side, right? Like, I'm dating a Romanian woman.
Yeah, and his wife is trying to learn Italian.
Like, there's always... yeah, that's going to keep happening. I want to double-click on Korea. I think it was a very insightful, smart decision. Maybe people only know Korea through K-pop.
But actually, I think a lot of Americans learn Korean because of K-pop. That's a side thing. But you could have done Taiwan. You could have done China. I remember seeing a documentary about how China was crazy about English, Mad About English, I think that was the title of the documentary. Was it obvious? Were you sure when you went into Korea, or was it just a test?
We visited a bunch of Asian countries when we were thinking about how do we relaunch things, how do we focus in. And we almost chose Taiwan, actually. But I think it was a little bit serendipitous. Our first employee is Korean and was my co-founder's college roommate, actually. When my co-founder visited Seoul to check out the market, he asked SJ to come along as essentially a translator and to, you know, facilitate. And I think that just went really well. And it was just very obvious from being on the ground in the market that Korea is pretty obsessed with learning English. And there is every human-based solution possible, right? You know, English academies, classes, skyscrapers full of classrooms, stuff like that. And our logic was basically, if we can really make headway and win this market that is chock-full of these human competitor products and all these people that fundamentally care about fluency, then we probably have something pretty real, and strong PMF that we could win other markets with. So that was the original logic, and, you know, so far it's been working.
It's retroactively obvious, kind of obvious, but it's so counterintuitive that you would be the team to do this and not a Korean team, right? Where they would know, because they had the personal experience of: I started with Korean, I learned English, here's how you do it.
Yeah, in hindsight, super weird, right? We were definitely, you know, sitting in an office here in San Francisco, operating with users in a market all the way on the other side of the world. It would not have worked without Sunjay. I have to give him a lot of credit here, because we paid a lot of attention to the specific wording of button text in the app and, you know, localized strings. We had a lot of reports from users pretty early on that they were shocked that it was an American company; they thought it was a domestic one, right? Because you can always tell: there's always some weird wording or whatever. But there wasn't in Speak, and I think that probably had a large sort of intangible effect.
Yeah. Focus, attention to detail. Tech stack: this was '18. What were you rolling? You just did ASR, and there were no LLMs, so BERT, maybe? I don't know.
We actually had no LLM component at all. So all of the content... oh yeah, another thing we did that I forgot to mention was we decided we needed to fully own all the content. So the way that we teach is all in-house, all thought through from first principles. We built this thing called the Speak Method, which is basically a pedagogical philosophy around teaching sentence patterns that you drill and then combine into higher-order patterns. And all of that was in-house, with, you know, our content team and our teachers. And we built a lot of internal tooling to make this possible. There's just a lot of operational overhead. I would say this is something we've struggled with in scaling to many more languages, and that's a big research effort within the company right now. We're building a mobile product, right? My co-founder and I have always just loved apps and been big iPhone users. So we cared a lot about the app being native, feeling great, being high performance. The DNA of the company was always consumer. Frankly, my co-founder and I had never worked in a real company. I dropped out of grad school, had a few failed startups, and then eventually started Speak. And he had never worked in a real company either; he'd only done startups in the past. So we didn't know anything about enterprise workflows, or what sort of software real companies used. So I think, frankly, consumer was the only path. I don't think we could have done anything else. We just didn't know enough. And I think that has served us well, though, in terms of just really caring about the craft of it and wanting to build something that felt not 90 to 95% but 95 to 100% in terms of polish.
Was it hard to build an engineering team that did that at the time? Because ML engineering was very academia-driven back then, and then you have the more consumer stuff, which was maybe more nascent, and it's mobile.
I'm now realizing that our story is very weird.
So, you only realized...
In addition to the market being on the other side of the world, our first iOS engineer, who we hired through a YC referral, was in Slovenia. If you don't know where Slovenia is, look it up on Google Maps, but, you know, it's a pretty obscure little country. And then we needed to hire a backend engineer, and one of his best friends was a great backend engineer, and we hired him. And then this happened four more times, all in the same city. And then we were like, okay, we should probably just open a physical office. So for several years we had an engineering office in Slovenia.
What?
And then a few people here in San Francisco, and we still do. Now we have 90% of our core product development team here in San Francisco, in office, and we're really only hiring here. But for the first several years, you know, that was another very interesting cultural aspect of the company, I guess.
I think a lot of early-stage founders have to do that; that's the only people they can afford or whatever. What are your tips for making that remote stage work?
For us, it wasn't really a price thing. I think we legitimately thought he was the best person that we interviewed, and then it just kind of happened that way as we rolled it out.
Yeah. Yeah. It's not about price. It's more about remote work, right? Like, a distributed team early stage. A lot of people say, no, you have to move everyone to SF or your startup will die.
Yeah. I don't think that we were good at remote work. I don't think that my personality or my co-founder's personality is inherently very good at async, just to be perfectly frank. I actually think that almost in spite of it, we made it work. It was a little bit brute force. Like, I would just sync with them every single day, right? And there was pain, because the time zone overlap was exactly the most inconvenient.
Yeah.
But I think for several years we did that. We got really good at the cadence of it. I think they were excellent engineers as well. So it worked out. But if I had to do it over again, I probably wouldn't do it. It's hard to say. Yeah.
Shall we move to phase two, on the LLM side? That's when OpenAI started opening up. And when did they invest?
This was 2022. That was also when Whisper dropped, and Whisper was a really exciting moment for us. Ever since we started the company and made that prediction, that in 5 or 10 years speech models and language models would become superhuman level, Whisper was really that magic moment where we were like, oh, I think what we predicted is here. And I pretty distinctly remember this moment in the office when we got access to the model, and we were testing it on an audio clip of a very beginner English learner in Korea saying something, and if you closed your eyes, as a human, you'd have no idea what they were saying. There were four of us in the room. We all closed our eyes, and none of us had any idea, and the model got it right. So, I mean, superhuman. I think that was the moment that we had been waiting on. And at the same time, LLMs were in the ascendancy. ChatGPT came out, I think around Thanksgiving of 2022, and GPT-3.5 Turbo came out. And I think we realized very quickly that all the pieces were clicking now, right? We have what we need at our fingertips now to go beyond something that was listen-and-repeat, where the user would see something on screen, hear a reference of the teacher saying the thing, and then just repeat the thing, right? It was very simple. Still a great product, by the way; it still grew to several million in ARR in South Korea. So clearly there was a big market need for that.
Pre-Whisper.
Yes.
Wow.
This was from like 2019 through 2022.
Yeah, that's the grind. You needed to hang in there.
Yeah. And again, there were many moments when things weren't working, from 2017 through 2019. We were looking in the mirror and we were like, why are we doing this? This is crazy. But I think we were so convinced about the vision, we just couldn't believe that the vision would not come true. So we stuck with it. So, fast-forward to 2022, the pieces started coming together. We realized that we could start building something that felt more like a language tutor, that could give you feedback, that could start explaining to you why you did something wrong. And that was act two of Speak: a true English tutor.
This is something that a lot of founders struggle with today. It's like, I'm kind of building something hoping that the models get better later.
Yeah.
How did you feel once the models got better? Did you feel like, okay, I am ahead of the curve, because I built all this history of building product and doing all this work? Or did you almost feel like, okay, we spent all this money and time building these models, and now we're just going to use Whisper?
It was purely positive for us. We still kept using our custom ASR system, because it was streaming, real time, really fast, really well fine-tuned. Whisper wasn't streaming, so it was a different use case. We used it for the more spontaneous stuff. And I think in almost every way we were just really excited, because pretty directly, as the frontier of model intelligence improved, it would just unlock things on our roadmap that were locked before, if that makes sense. And we still really operate in that mode today, where we take a model and then we try to think about, okay, how do we saturate model capability by building product on top of it? And then it happens again, right? And then we build and saturate the model capability again. I think that's a really cool paradigm to think about. But all the LLM stuff basically allowed us to build a tutor for English. And we still didn't have real-time voice, for example, right? But the barriers are coming down now. Obviously it's a really hot topic. We're actively building out a real-time voice platform, and we can build a lot of more verticalized, specific lesson experiences on top of that, which I'm super, super excited about. I don't think they're going to replace our current lessons. They're going to be more immersive, just a different thing, probably for more advanced learners.
Still language learning, though, not broadening out beyond language?
Yeah. So I think that language learning is interesting because it is so universal. 99% of people, you know, have certainly tried to learn a language, and it's so hard, right? Becoming fluent just has a huge failure rate, and it's something people are willing to pay for. So I think that has been a pretty amazing beachhead for us, and I think we'll be doing language learning for a long time. There's a huge, huge, huge company to be built here. But our even longer-term ambition is really this idea that, even beyond language, we think AI will reinvent how people learn anything, right? It already has for me, right? I use ChatGPT to learn things every 10 minutes. And I think I'm just naturally a very curious person. So whenever I'm thinking about something, I want to know more about it, and then I'll naturally go to ChatGPT and I'll learn about it. It's unlocked this entirely new dimension of learning, and I'm spending way more time learning as an adult as well, which is really cool. And I want to bring that, in a more structured, systematic way, to everyone. So I think that's the vision beyond language.
I'm curious to double-click on the tech side. We talked a little bit about the content that you own and develop in house, and we talked a little bit about the onboarding memory. I assume that you have conversational memory as you go, right? Any other major pieces of the puzzle that really unlocked it for you?
So there are a few things I can talk about. I think one thing is, in order to go from teaching English to teaching a bunch more languages, we needed to really figure out more direct AI content generation. That was a pretty big shift, right? Because it's hard to scale our little studio in LA where we shoot a lot of the video lessons. All of the scripts were written manually before by our content team, but we want 100x more content, right? And 10x more languages, eventually 100x more language pairs, which is how we think about it: what's your native language, and then what language are you learning? And really the only way to do that is to make it more AI generated. And, you know, very much like an AI-native company, we want to be on the frontier here. We want to keep a small team and have as much leverage as possible through these types of tools. So that's a big active area we're building out. I think, you know, people overuse the word agent, but we have a tutor agent, we have a curriculum-writing agent, we have a giant LLM-based pipeline that creates curriculum, scaffolds it in the right way, writes the lessons themselves. That's a big active area that will basically help us scale to a lot more markets and a lot more languages. So that's one big thing.
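As a rough illustration of what a staged pipeline like that might look like, here is a minimal sketch: one LLM call drafts a syllabus, the next scaffolds each unit into sentence patterns, and a third writes the lesson script. The model, prompts, and stage boundaries are all assumptions for illustration; Speak's actual agents are not public.

```python
# A minimal sketch (not Speak's pipeline) of staged curriculum
# generation: syllabus -> per-unit sentence-pattern scaffold -> lesson
# script, with human review assumed as the final gate.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_course(native: str, target: str, level: str) -> list[str]:
    # Stage 1: a high-level syllabus for this language pair and level.
    syllabus = llm(
        f"Write a 10-unit syllabus teaching {target} to native {native} "
        f"speakers at CEFR level {level}. One functional goal per line."
    )
    lessons = []
    for unit in filter(None, syllabus.splitlines()):
        # Stage 2: scaffold the unit into drillable sentence patterns.
        patterns = llm(f"List 5 core sentence patterns for this unit: {unit}")
        # Stage 3: write the repeat-and-drill lesson script itself.
        lessons.append(llm(
            f"Write a short spoken-repetition lesson for:\n{unit}\n"
            f"Built on these patterns:\n{patterns}"
        ))
    return lessons  # queued for human review before anything ships
```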
Another big thing is we care a lot about fluency. Specifically, we want to be able to quantify how fluent you are. So if you're learning Spanish, it's like, okay, what does it mean to be fluent, right?
And there's a real-world test for that.
We care about real-world fluency: your ability to go to Mexico City, go to a street taco stand, and actually order, right? That's very functional fluency in one aspect. You might be really good at that but be completely unable to talk about your family, right? So the frontier of fluency is very jagged, but we're very pragmatic, and we care a lot about meeting user goals and helping them become fluent at what they care about. And we're thinking a lot about, okay, how do you quantify that? How do you actually store a knowledge graph of everything you know about Spanish, in terms of the vocabulary you know or don't know, the patterns that you know or don't know, the mistakes you made using Speak over the last month, clustered?
You said the magic words, knowledge graph. Is that live, or is that experimental?
There are aspects of it that are live, and it's a very multi-dimensional system, where we think of it as: there are many aspects of fluency, right? There are many sub-scores, and we have a few of them that are currently live, and we're actively developing other aspects of it, and then all of those will fold up into a more holistic fluency score. The idea is that eventually, once we have a complete enough picture, everything will fold up into a number that we call the Speak score, which is a very holistic measure of just how good you are at Spanish, right? And obviously 54 is kind of meaningless by itself, but it does give you a general sense, right? Being at 54 versus being at 5 is very different, right? And I think everyone can kind of intuitively understand that.
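Here is a toy sketch of that fold-up idea: per-concept mastery estimates grouped into sub-scores, which combine into one holistic number. The concept naming scheme, weights, and class are invented for illustration; Speak's actual schema is, as noted later, more custom and domain-specific.

```python
# Toy sketch: per-concept mastery feeds per-dimension sub-scores, which
# fold up into one holistic number. Schema and weights are invented.
from dataclasses import dataclass, field

@dataclass
class LearnerGraph:
    # concept id -> mastery estimate in [0, 1], e.g. "vocab:pedir",
    # "pattern:preterite-ar", "situation:order-street-food"
    mastery: dict[str, float] = field(default_factory=dict)

    def sub_score(self, prefix: str) -> float:
        vals = [v for k, v in self.mastery.items() if k.startswith(prefix)]
        return 100 * sum(vals) / len(vals) if vals else 0.0

    def holistic_score(self) -> float:
        # Hypothetical weighting across fluency dimensions.
        weights = {"vocab:": 0.3, "pattern:": 0.4, "situation:": 0.3}
        return round(sum(w * self.sub_score(p) for p, w in weights.items()), 1)

g = LearnerGraph({
    "vocab:pedir": 0.9,
    "pattern:preterite-ar": 0.4,
    "situation:order-street-food": 0.7,
})
print(g.holistic_score())  # one number, like the "54" mentioned above
```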
That's surprising. I would have grounded it more in the real world, like, we will get you to pass this exam that is a standard, the ESL standard or whatever.
So the way that we think about that is, we don't really teach to the test. I think it's possible in the future that we'll do a test prep product, but in general, we care about real-world proficiency in various functional situations. So the way that we think about it is: if you're at this level, then these are the things you can do, right? So it is exactly that.
We have that a lot in Italy. I grew up in Italy, so English is my second language. And there are a lot of people that pass a lot of tests and get high grades in all the classes, and then they travel to the US or the UK and it's hard for them to speak, because, you know, I feel like the hard part is being in the conversation. I think when I started, my writing and reading were much higher than my conversation, which doesn't really help you if you're traveling somewhere.
That's me for Chinese, because my parents spoke Mandarin to me growing up, so I can understand a non-trivial amount, but I'm very bad at speaking.
I heard there's a good language learning product.
I have one question on the course generation.
Yeah.
How do you eval that product? When you're asking the AI to generate courses, how do you figure out the courses are going to be good?
We rely very heavily on our content team, and we are trying to build out an eval suite. It's really hard, right? The illustrative example here is that as we try to hire and train new content writers on our content team, it's so nuanced. There are many different aspects of training them in the Speak Method and how to write the right types of lessons, and articulating why this form of lesson, which is subtly different from this other form of lesson, is better, right? So we try as hard as we can to articulate that. So, forming a sense of eval using model-graded evals like that, that's one piece of it. And I also think, in the future, a really good curriculum or lesson writer agent will probably be reinforcement fine-tuned on a lot of our internal data as well. That's something we're experimenting with, but it's still pretty early.
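A minimal sketch of what a model-graded eval for generated lessons could look like, with rubric criteria standing in for the content team's articulated standards. The criteria, judge model, and review threshold are all assumptions, not Speak's actual eval suite.

```python
# Minimal model-graded eval sketch: an LLM judge scores a generated
# lesson against rubric criteria distilled from the content team.
# Criteria, judge model, and threshold are invented for illustration.
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = [
    "Teaches exactly one sentence pattern, then drills variations of it",
    "Uses casual phrasing a native speaker would actually say",
    "Stays at the stated level; no unexplained advanced vocabulary",
]

def grade_lesson(lesson: str) -> dict[str, int]:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Score this lesson 1-5 on each criterion. Return a JSON "
                f"object mapping criterion to score.\nCriteria: {RUBRIC}\n"
                f"Lesson:\n{lesson}"
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)

scores = grade_lesson("...generated lesson script...")
needs_human_review = any(s < 4 for s in scores.values())
```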
This seems like a great example of, you know, AI removing jobs, which is like, oh, you're creating the courses with AI, you don't have a person. But actually, instead of one person creating two courses, it's one person reviewing 50 courses that AI generates. That's kind of how you're seeing the content team?
The way that we see it, really, not just for our content team members, but I also think it's perfectly applicable to engineering, is that it's leverage. It just allows you to do 100x in the same amount of time. We still need human review of the syllabus, the curriculum, the specific lines, etc. But the hope is that this will allow us to launch 100x more courses.
A lot of language is colloquial. I think the way that you put it on one of our episodes one time was: the Italian that is taught in school is not the Italian people speak.
Yeah.
How much of that do you adjust for, informal versus formal?
Entirely. That's one of our fundamental tenets, which is that we don't teach textbook English, or textbook language. We try very hard to...
Teach Gen Z slang.
We don't go quite that far, but we try to teach very casual, conversational language that is actually what real people use. And like you said, that's usually very, very different. If you pick up a typical English textbook in Korea, it's all really traditional and weird formulations, and it's not how people actually speak.
Yeah. I know you're going to release Italian soon, so I can give you a hand on that. I know in the US there aren't that many dialects. There are accents, but most of the language, the words that people use, are similar. But I know, for example, Spanish spoken in Argentina is very different from Spanish spoken in Mexico. How do you adjust for that? Or maybe you don't.
So I would say that, for example, currently we teach American English, standard American English. We don't really teach other accents or other dialects. For now, given how small we are, we just have to be pragmatic and teach in the direction that most people want and most of our users know. So we've made those decisions on the content team side for English, for Spanish, for every language that we're teaching. But I do expect that in the future we're going to get a lot more sharply differentiated. Like, if you want to learn British English, then we'll teach you British English. We'll teach you how to pronounce it, etc. I think all of that feels like something a superhuman language tutor should be able to do.
I just think it'd be very funny if all the Koreans had a very distinct Southern accent.
It'd be great. Make that happen.
Yeah, I do think about this, because, you know, obviously there's a moving of the goalposts: now that we have this, now we want the next thing. And obviously people who speak English as a second language always have an accent. Like, a lot of people think I don't have an accent, but if you know any Singaporeans, you know I'm Singaporean. How much does accent training matter, right? I think that actually does help a lot of people.
And you cannot tokenize accents yet.
Yes, that's right. So I have two main thoughts on this. I think the first one is that communication, your ability to speak spontaneously and get a concept or an idea across, is almost fully orthogonal to pronunciation. You can be really bad at pronunciation but still communicate effectively. So a lot of the current core product experience is about: just speak as much as possible, make mistakes, don't worry about screwing something up on the accent or the pronunciation side. The important thing is that you literally move your mouth and you make the sounds, right? And it turns out there's a really key psychological barrier there, where people are just not willing to do this in front of a human, even if it's a human teacher that you're paying, right? So a lot of the core message of our marketing campaigns in many of our biggest markets is along the lines of: you can make mistakes in this private space with Speak. And I think psychologically that's extremely powerful, and then you can go and get it right more confidently in the real world after you practice with Speak. Now, having said that, people do care about their pronunciation and their accent, right? So we have, for English only right now, a pronunciation coach that is basically a fine-tuned version of wav2vec 2.0, which is a Meta model, but we fine-tuned it on a bunch of our own phonetic transcripts, our own fine-tuning data. It works pretty well. It's currently for single words. We're going to expand it to full sentences, to more languages, etc. But if you look at the pure market opportunity, our sense is that we really want to push people to just speak very freely, as much as possible, you know, just get that volume up.
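Speak's coach itself isn't public, but a public analogue exists: wav2vec 2.0 checkpoints fine-tuned for phoneme recognition. Here is a sketch of comparing a learner's phonemes against a reference using a published Hugging Face checkpoint; the comparison logic, file names, and reference string are illustrative.

```python
# Not Speak's model, but a public analogue: a wav2vec 2.0 checkpoint
# fine-tuned for phoneme recognition, used to compare a learner's
# phonemes against a reference pronunciation of a single word.
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CKPT = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # public phoneme-CTC model
processor = Wav2Vec2Processor.from_pretrained(CKPT)
model = Wav2Vec2ForCTC.from_pretrained(CKPT)

def phonemes(wav_path: str) -> str:
    audio, sr = sf.read(wav_path)  # expects 16 kHz mono audio
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

learner = phonemes("learner_says_water.wav")  # illustrative file name
reference = "w ɔ ː t ɚ"                       # illustrative target phones
# A real coach would align the two sequences and flag differing phones.
```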
Yeah. Yeah. In terms of immersing language learning in the real world, one of the more interesting approaches that people keep trying is to have, let's say, a Chrome extension or something on top of a page. I think Toucan was doing this.
There's a bunch of those. Yeah.
Yeah. And then there was another one I saw recently where you watch a YouTube video and it'll transcribe it for you, but randomly mask words out.
I saw that too. Yeah.
That was a Show Hacker News.
Yeah.
Do those work? There's kind of the question of, you know, is that the right product, right?
I don't think so. Basically, the difference is your content or real-world content, right? Obviously you want real-world content. I think that for work, right, so for Speak for Business, for the B2B product, another part of the vision is really: what should a superhuman language tutor be able to do? It should probably be able to handle kids as well as a Samsung employee that wants to transfer to the US office and wants to use it for work, right? So our view there is that it's the same product; it's a different distribution mechanism, right? Consumer versus B2B. And I think that we will eventually build something like a Mac app. Maybe it'll be integrated with the browser in some way. We're not really sure yet, but obviously, in order to apply it to your day-to-day, there needs to be some way to hook into your actual work documents, whatever. That's a whole can of worms. We are actively thinking about it, but my sense is that it's not clear to me that any of these products have really taken off, and I think that there are many other approaches that are possible. I don't have the answer, but another example, a very hypothetical future world, is maybe OpenAI, you know, the new Jony Ive thing, will come out with some hardware that will be listening to you all day, and then we can give you some sort of very deep analysis that is integrated with the Speak app at the end of the day, or, you know, the end of the week, whatever. I don't know.
Okay, one more time, since you brought that up. I'm sure you don't... I haven't told you anything, but...
I don't know anything.
What's it going to be?
I don't know anything.
It's, like, the number one topic at all the parties I go to now.
Really?
Yeah.
What's the most compelling idea you've heard?
Okay. So there are people that say Jony hates wearables.
Yeah, I've heard that, too.
I'm like, if it's not a wearable, then you just made a second phone, and in that case, just make a phone.
Yeah.
I thought they said... I mean, didn't Sam say that he wanted to do a phone in the past?
That was in the far past.
He says a lot of things.
He'll say a lot of things.
Yes.
Okay. Anyway, I think a wearable makes sense. I think the race is to capture context.
I mean, I have a wearable on.
Yeah, we have a wearable here, too, that transcribes everything.
Yeah, that's cool.
Yeah, it's from a previous episode of ours. I can hook you up if you want. But yeah, I think it's something a lot of people are interested in, obviously, because it's a huge bet by them. And, uh, yeah, I'm curious. Okay, you mentioned video. I just wanted to double-click on that a little bit. I'm sure engagement is very high for video, because people love to watch video. I thought that Speak would be one of those places where you just kind of leave it in your pocket. You take a walk, learn to speak. Probably that's not true.
What we've done so far is, part of the course experience is a teacher video. We've tested other, more audio-forward types as well. We found that, of course, like you said, video is very engaging, but at the same time we have a lot of users that do want to be able to walk around with the phone locked in their pocket. So doing something that is more like voice mode with optional visuals, I think, is really good. I think there's a huge opportunity for a better way to learn things like listening comprehension. So, I took German in grad school for two years, and I thought I was getting somewhere, but any time I listen to a native German speaker, it's so fast. It's completely on a different level. And I think you can imagine a plethora of really cool experiences that feel kind of like you're listening to a podcast, but it's all AI generated, it's fully controllable, it's integrated with the app. You know, there's something there for sure.
Yeah, don't want to do AI podcasts, man. We're cooked.
It's okay. We'll document our own ending. I mean, I think when that happens, we just end the show. Why not?
To zoom out a little bit: in the pretty near future, multimodal models will cross the threshold where they will be able to generate images a lot faster than they currently can, maybe somewhat close to real time, even, right? And audio at the same time, text at the same time. And you can imagine a very powerful multimodal tutor that can kind of do it all at once, where there's an audio track, and then, if the teacher is teaching you something, it chooses the right timing: okay, at this point I'm about to introduce a new concept, so I'm going to show the word on screen so the user can see how it's spelled, right? There's a lot there. Or you can do generative UI. There's a lot of nuance there, where it's easy to do it badly, but to do it well requires a fair amount of reasoning and mental modeling of what the user knows.
Yeah.
Which feeds into what you need to show at what time. So that's probably going to have to be a pretty parallel set of systems. Have you spent any time looking at things like Veo 3, where you do video plus audio at the same time, and how you can tweak the audio part versus the video part? Because I can imagine you might work on a video part and then want to change the audio generation model. I don't actually know how the model works inside, or how much you can.
We haven't really looked at the video stuff much. We basically think that we're very bandwidth constrained, right? So we're just scaling and trying to hire as fast as possible, like everyone else is. And as a result, we're really focusing on just the most in-reach, highest-impact things. I do think that the barriers are coming down very fast for all of this sort of stuff. I'm just so excited about multimodality and where things are going here, because imagine, if you're learning Spanish, being able to look at an image that the model generates for you and then doing Q&A on it, right? Like a beach scene, and then the model will ask you, how many people are running on the beach, and then you have to respond in the target language that you're learning. A very traditional language learning exercise, but you can imagine it being fully generative, which is really cool.
Awesome. Lots of stuff like that. The engineer in me worries about inference costs, but I think you can just kind of sweep that under the rug for a while. See if it works first, and then you can worry about cost.
Yes.
You mentioned a real-time voice platform. I just want to give you the platform to talk more about that. You mentioned, for example, that you're a very heavy user of the Realtime API from OpenAI and you've built a bunch of tooling around it.
Yeah. So, last year we had early access to the Realtime API, and there's a very obvious use case for language learning. I think one common theme that has just been pretty awesome since LLMs came out is that language learning as an application is just a really good fit for LLMs, for all these model types, in almost every way, which has been really great for Speak. Specifically for real time, I think the audio piece promises to infuse almost every surface in the app. You can imagine this being the primary way that you talk to your tutor, right? And an additional complication is that it needs to be multilingual, and there needs to be code switching. So that's a pretty frontier problem right now, right? If I'm learning Spanish, I should be able to speak both English and Spanish, and vice versa from the model. That's a pretty hard TTS problem today. Actually, only a few models are able to speak two languages in the same sentence...
And then pronounce them properly. Sorry.
You could have a router model, like a tiny little router model that guesses which language comes first and then routes.
Well, the problem is that you could have a subword in a single sentence in a different language.
Yeah.
So you can't just concatenate either.
Yeah. Because it won't sound right.
Right. It won't sound natural. That's not how humans do it. So this seems to need, like, a very native, controllable audio function.
Yeah. But we are in the process of building a variety of experiences on top of the Realtime API. I want to clarify that actually nothing is in production yet, mostly for price reasons. Frankly, the pricing model of the Realtime API makes more sense for something like a customer support agent, where you're very directly replacing somebody that you would pay hourly otherwise. And that's how the pricing model for a lot of these initial agents works out. For us, we want our users to be able to do these real-time role plays and have these conversations for many hours a day, right, if they want. Getting cost under control is definitely a pretty key consideration right now. But we are pretty close. Maybe even by the time this episode is released, we'll have something live. We have what I think is a really cool application of the Realtime API, which is basically a new instructional lesson where it's the model actually teaching you something, like a new language concept. And it's intended to augment, slash play the same role as, our current video lessons, which are the instructional lesson type. And it's interactive: obviously, at certain points in the 3 to 5 minute lesson, you're interacting with the Realtime API. It's semi on guardrails. There was a lot of scaffolding we needed to build to, number one, switch between the interactive and non-interactive portions of this lesson properly, if that makes sense. There are some portions where you're just listening or looking, and then some portions where you're actively in a short conversation, and we swap back and forth, and we have a bunch of custom architecture and infra around that. And then there's also making the cost make sense, or at least semi make sense. And then there's a bunch of WebRTC infrastructure. We're at, you know, not huge but non-trivial scale, so it'll cost us millions of dollars if we do something wrong.
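A simplified sketch of that scaffolding idea: a lesson as an ordered list of segments, with the client swapping between passive playback and time-capped real-time conversation. The segment schema is invented, and the actual Realtime API session handling is stubbed out rather than shown.

```python
# Simplified sketch of the scaffolding described above: a lesson is an
# ordered list of segments, and the client swaps between passive
# playback and bounded real-time conversation. The realtime session
# itself is stubbed; OpenAI Realtime API details are omitted.
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str          # "scripted" (listen/look) or "interactive"
    content: str       # script text, or a goal for the tutor's turn
    max_seconds: int   # hard cap keeps pacing and cost bounded

LESSON = [
    Segment("scripted", "Introduce the pattern 'Me gustaría...'", 40),
    Segment("interactive", "Have the learner order two drinks.", 60),
    Segment("scripted", "Recap and show the written form.", 20),
]

def play_scripted(seg: Segment) -> None:
    print(f"[playback, {seg.max_seconds}s] {seg.content}")

def run_interactive(seg: Segment) -> None:
    # A real client would open a realtime voice session here (e.g. over
    # WebRTC), pass seg.content as the tutor's goal, and close it at the
    # time cap; closing promptly is also what keeps the cost sane.
    print(f"[realtime, {seg.max_seconds}s] goal: {seg.content}")

for seg in LESSON:
    (run_interactive if seg.kind == "interactive" else play_scripted)(seg)
```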
Yeah. Yeah.
Do you do inference in Korea, because of latency and all that?
It's something that we have been increasingly paying attention to for all the real-time paths. I would say two or three years ago, when real-time stuff was still quite nascent, users didn't really care as much, but I think now the standards have risen, right? Latency has to be low; everyone cares.
Do you have a hard latency budget for responses, or do you just kind of work it out? For example, right, you have a knowledge graph that you're accessing, you have content that you're retrieving. There's a lot of stuff there, and then maybe you're using a reasoning model, probably not, but all of that eats into the budget.
I will say that on the real-time engineering side, everyone talks about: submit the user request, get the agent's audio response, right, first audio bytes. What's that latency? And then we try to get it as low as possible. I would argue that's actually a vanity metric, because what you don't take into account is how the VAD works. How do you do turn detection, to detect when the user is finished speaking, right? Because that can easily add another second if you do it badly, and nobody talks about that for some reason, right? What you need to measure is actually: from when the user stops talking to when the model's first audio comes. And usually that number is much larger. That is a very domain-specific problem. You can use the semantic VAD on the Realtime API for regular English conversation, and that will basically classify, at every token, how likely it is that you're done speaking as a normal conversational English speaker. For a conversation like this one, that's fine, but it doesn't work at all for language learners, right? If I am trying to respond in a language that I'm learning, I'm going to be hesitating halfway through, for 10 seconds or more, right? So it needs to be fully custom, probably. This is something that we're also actively working on, but that is actually the dominating factor in perceived latency.
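To illustrate the two points (measure from end-of-user-speech, and give learners a much longer hesitation grace period), here is a toy end-of-turn detector. The grace-period value and class design are assumptions, not Speak's implementation.

```python
# Toy end-of-turn detector for the two points above: measure latency
# from end-of-user-speech, and give learners a long hesitation grace
# period. The grace value and class design are assumptions.
import time

HESITATION_GRACE_S = 2.5  # invented; learners pause mid-sentence

class TurnClock:
    def __init__(self) -> None:
        self.last_voice_ts: float | None = None
        self.turn_end_ts: float | None = None

    def on_vad_frame(self, is_speech: bool) -> bool:
        """Feed VAD frames; returns True once the user's turn has ended."""
        now = time.monotonic()
        if is_speech:
            self.last_voice_ts, self.turn_end_ts = now, None
            return False
        if self.last_voice_ts and now - self.last_voice_ts > HESITATION_GRACE_S:
            # The turn ended when the voice stopped, not when we noticed.
            self.turn_end_ts = self.turn_end_ts or self.last_voice_ts
            return True
        return False

    def perceived_latency(self, first_audio_ts: float) -> float:
        # The metric that matters: user stops talking -> first model audio.
        return first_audio_ts - self.turn_end_ts
```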
Coding: do you use Cursor, Windsurf, other autonomous agents?
It's kind of all of the above. I think, as the CTO, I view it as part of my responsibility to really set expectations, push everyone on the team, show them what's possible. We've been trying everything.
Yeah.
And I think we've tried to basically set the expectation that the frontier is moving so fast, it's deeply non-intuitive.
Mhm.
Maybe you tried coding tools 6 months ago and they weren't that great, especially if it's not TypeScript or Python, right? Those are probably the most popular languages for these tools. We try to set a culture in the engineering team where usage of these tools, as much as possible and as a default path, is the expectation. And in hiring we are now explicitly asking about this a lot, thinking about what types of people are going to be better, higher agency, at trying these types of tools. It's so important.
Before we zoom out, anything we missed about Speak that you really want to highlight, or something that people underrate about it?
One thing that I've always been really excited about is that I feel like a lot of the foundational pieces that we're building, around the knowledge graph, for example, a lot of these concepts should be applicable to not just learning language but also other things in the future. We're already starting to see the very beginnings of this on the B2B side, where a lot of it is more like management skills, hospitality skills, communication skills, more like true L&D for enterprise, less like core, pure English proficiency. So that's the obvious immediate neighborhood, but you can imagine many academic subjects, math, biology, etc., you know, from school through work. Super excited about that.
If I knew my employer was giving me a language tool, but then was evaluating me on my management skills while I learned the language, I might use it less. You know, you want to separate that out.
Yeah, very fair.
I agree overall that the knowledge graph problem is very important. We have a whole track on it for the conference, and the amount of data can be so high that you really want to generate relevant triplets. I assume you use the normal subject, predicate, object type.
It's a bit more custom than that, because it's more domain-specific around the way that we conceptualize the vocabulary, you know, and the sentence patterns and so on. So it's more specifically around language-learning concepts, if you will.
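As a rough contrast between a generic triple store and a more domain-shaped record, something like the following; this is a guess at the kind of structure involved, not Speak's actual schema.

```python
from dataclasses import dataclass, field

# The classic generic form: (subject, predicate, object).
GenericTriple = tuple[str, str, str]  # e.g. ("user", "knows", "past_tense")

@dataclass
class LearnerConcept:
    """A domain-specific node: one language-learning concept plus the
    system's belief about the learner's command of it. Field names are
    hypothetical."""
    concept_id: str                    # e.g. "pattern:would_like_to"
    kind: str                          # "vocabulary" | "sentence_pattern" | ...
    mastery: float = 0.0               # 0..1, estimated from performance
    prerequisites: list[str] = field(default_factory=list)
```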
But what I think we can extract from Speak, as it's generalized into a framework, is what I've been calling the Bloom two-sigma-problem type of thing: the level-adjusting tutor. Where are you at? Let me adjust my teaching to where you're at, and then I'll push you up to the next level. I think the knowledge graph is part of it, but I don't know if that's all of it. I've never seen a working example.
We are approaching that
problem from a few different angles. Part of it is the knowledge graph. Part of it is being very careful in how we structure the curriculum so that you're placed at the right level. The learning path itself has a foundational backbone, because beginner-to-intermediate English learners all need to know a bunch of similar concepts; it isn't really until you get intermediate and more advanced that it starts to diverge more sharply. From A1 through B1, I would say there's a pretty well-defined, sort of linear path.
Actually, a lot of the deep thinking we've done around how to structure the pedagogy is also super useful for matching people to the right level. Then you can take this backbone and basically modify it based on the knowledge graph, on your system's knowledge of what the user is bad at versus good at. For a lot of startups, especially in edtech, that is the core engine; once you have that, you can kind of teach anything.
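That backbone-plus-modification engine could be caricatured as a scheduler that walks a fixed concept sequence, skips what the graph says is mastered, and pulls weak concepts forward for review. A toy sketch under those assumptions:

```python
def next_lessons(backbone: list[str], mastery: dict[str, float],
                 n: int = 3, weak: float = 0.4, strong: float = 0.85) -> list[str]:
    """Pick the next n lessons from a fixed curriculum backbone, modified
    by per-concept mastery estimates (0..1) from the knowledge graph.
    Toy logic: review weak, already-seen concepts first; then unseen ones
    in backbone order; skip anything already mastered."""
    review = [c for c in backbone if c in mastery and mastery[c] < weak]
    fresh = [c for c in backbone if c not in mastery]
    return [c for c in review + fresh if mastery.get(c, 0.0) < strong][:n]

# e.g. a learner strong on greetings but shaky on ordering food:
plan = next_lessons(
    backbone=["greetings", "ordering_food", "past_tense", "making_plans"],
    mastery={"greetings": 0.95, "ordering_food": 0.3},
)
# plan == ["ordering_food", "past_tense", "making_plans"]
```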
Totally. Yeah.
We have a few more broader fun
questions. Yeah.
Um, so, the speak.com domain.
I looked it up. Voice.com got bought for $30 million
in 2019.
When?
2019.
Okay.
So I don't know if you want to share how much you paid for it, but it was a lot less. I figured it would be a lot less, but I'm curious. My estimate was 100K, but
It was more than that.
More than that? Wow. Okay.
I'm not going to say any more about the numbers.
So what was the story? Was it easy? Did you use a broker? We had, you know, Dharmesh from HubSpot, who sold chat.com to OpenAI, and he has a lot of very
That was like a $100 million deal or something, right?
That was very big.
Oh, wait. No, that was AI.com. What?
Uh, chat.com.
Oh, chat.com. Okay.
Yeah.
We bought it several years ago. It felt
very expensive for us at the time. It
was a little bit of a crazy move, but I
think we were very convinced that we
needed a super strong consumer brand
that was scalable globally. And that was
just always our ambition. Like we want
to be the way the next billion people
learn languages and we need speak.com.
So we don't regret it. It's such a great word. Makes for great swag.
Very nice decision. You had a couple of other fun questions.
Any fun Korean celebrity stories, since you work with so many influencers?
We have a bunch baking right now, but something more general has just been so fun on the journey. We would visit Seoul every year.
Yeah.
And seeing Speak go from nothing, to the first time we saw somebody on the street using Speak, to now, when our main teacher in the app is like a mini celebrity. People come up to her on the street as she's just walking around Seoul and recognize her from the app, which is really cool. Now we do a lot of advertising: billboards, TV commercials, we work with big influencers and so on. Just seeing the scale of that has me kind of in awe. It's really cool to see something that used to be nothing.
I wanted you to name-drop Blackpink or, I don't know.
Look, there's some stuff baking right now.
Yeah. Okay. All right. We talked about the Thiel Fellowship. On your LinkedIn, you kind of have this hole between 2012 and 2016, during which you said you did some startups. Any of them you want to share? Ideas you worked on that were maybe just early, that you should revisit?
I've always been interested in learning
and education. One of the other failed startups that I did in that time, and it feels silly to even talk about this because it amounted to nothing, was called Bloom, after, you know, the Bloom two-sigma problem. It was actually named after that, and we were trying to build a better adult learning platform with really cool interactive JavaScript widgets for various concepts you could learn. It didn't find PMF. I was young and didn't really know anything about business at the time either. But I think the
common thread through everything I've been interested in since leaving grad school has been: how do we build software, build tools, that help people learn things more effectively, better, and faster? Now I feel very lucky to be in this position, because obviously AI is the ultimate version of that, right? It's been completely transformative for me personally, because I get a lot of inherent fun and pleasure out of being able to think of a concept and then talk to this omniscient LLM that can tell me more about it, and I'm really good at asking the right follow-up questions about what I want to know. So that's been completely transformative for me.
Do you get a lot of people using Speak for therapy? It's not meant to be that, but since you have inference, they will use it.
In 2023, when we first launched our AI role plays using GPT-4, people were way more concerned about safety, right? Obviously the models now are much better at refusals, and the line is sharper between what's appropriate and what's not. But we did see a lot of our first users start to put in pretty questionable custom scenarios.
You probably guessed.
And, you know, this was something we expected, but seeing the logs in person is very different.
Got it.
Some shocking stuff in there.
Last couple of questions. One on Andrej. You talked to him in your machine learning journey.
Um, yeah.
He's also working on edtech now. I don't know if you've ever had conversations with him.
No, I haven't.
He's also interested in language learning, by the way.
You know, one thing that I think we didn't really realize early on, or at least fully internalize, was just how deep the market is.
Say more.
It was so universal that we really struggled to do some of the basic startup stuff, like defining your ideal customer profile and segmenting your users, because our users were everyone. We had parents using it with their kids. We had really old people using it. We had people using it for work. So that was kind of mind-boggling.
You still did customer segmentation, or are you saying it doesn't matter?
I'm saying it was hard to do. We tried, and we have a sweet spot in Korea: 25 to 45, more professional, more white-collar, but with a very long tail on either side. It's a huge market, and I think it's a very special moment in time right now, where it's obvious that a lot of the tech is here. I think it's really good for humanity if we make a lot of progress here. So I'm really excited for his company too.
We started by asking about the Thiel Fellowship, so maybe we can wrap with one of Thiel's favorite questions, which is: what's something you believe today that most people would not agree with you on?
I think that people, if you recall, expected the world to kind of explode when GPT-4 came out, and, you know, everything would change. But if you go to another state outside the Bay Area, probably even somewhere in California outside the Bay Area, and you ask somebody how much their life has materially changed, it's pretty close to zero. Real-world inertia is enormous. Obviously, AI is probably the most transformative technology we've ever built, but in a very real sense, the world hasn't changed that much either. And that's a really weird thing, right? So I think we need more builders. We need more people building applications. It's weird to me that Speak is one of actually not that many net-new consumer AI-native applications at scale. There should be way more. I would love for there to be way more.
Consumer is hard.
Yeah. I'm intimidated, but, you know, there was just never any alternative for us.
Yeah. Like I said,
you didn't have a choice,
but also you're very smart. But maybe you have some growth-hack things you can advise people on that they could learn from. But yeah, I agree. I think the general take, actually, is that this is what we want: slow takeoff, short timeline.
That's fair, right? This is the 2x2 that everyone always talks about in AI safety. You're seeing slow takeoff, so maybe don't complain; we have a heads up. Or, you know, Dario's right, and half of us lose our jobs in the next two years. Yeah.
It's so hard to predict.
Yeah.
Sometimes I get AI anxiety
and then I just
You get anxiety.
Yeah.
Okay.
And I just focus on our users.
That's a perfect place to wrap. Thank
you so much for taking the time.
Yeah. Thank you both so much. This was great.
[Music]