Personalized AI Language Education — with Andrew Hsu, Speak
By Latent Space
Summary
## Key takeaways

- **AI enables a '3rd generation' of language learning**: Speak's vision is to create AI-native language tutors that leverage advancements in speech and language models, moving beyond the 'Gen 2' mobile apps like Duolingo to focus on functional fluency and adaptive instruction. [10:59], [11:15]
- **South Korea: a strategic proving ground**: Speak initially focused on the South Korean market due to its high demand for English fluency and competitive education landscape, validating their AI-native model against human-based solutions before global expansion. [16:30], [17:37]
- **LLMs and Whisper accelerated Speak's evolution**: The advent of models like Whisper and GPT in 2022 transformed Speak from a practice tool into a full-featured tutor, enabling real-time feedback, semantic understanding, and conversational memory. [23:31], [25:59]
- **AI-generated content scales curriculum development**: To support multiple languages and expand content, Speak is investing in AI agents and pipelines to generate curriculum and lesson material, aiming for 100x more content with less manual effort. [29:08], [30:32]
- **Quantifying fluency with knowledge graphs**: Speak is developing a knowledge graph to track what a learner knows and can do in a language, aiming to create a holistic 'Speak score' that measures real-world proficiency. [30:39], [31:58]
- **Real-world fluency over textbook learning**: Speak prioritizes teaching casual, conversational language that real people use, rather than traditional textbook phrases, focusing on functional fluency for practical situations. [35:45], [36:00]
Topics Covered
- Our founding vision stayed the same for years.
- Why real-time translation won't replace language learning.
- We chose the hardest market to prove our model.
- AI creates a safe space to make mistakes.
- The real world has enormous inertia against AI.
Full Transcript
Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and I'm joined by my co-host swyx, founder of Smol AI.
Hello. Hello. We're back in the studio with Andrew Hsu of Speak. Welcome.
Thank you for having me.
I have to start this off. I didn't prep you on this at all, but you were a Thiel Fellow in 2011.
First class. First class.
Is that the one with, um, SBF?
No, he was, I think, several years later actually.
Yeah. Yeah.
What was it like? Just talk about it.
That's a good question. Haven't been asked that one in a while. It was a really crazy idea at the time and very controversial. And I think the first few years of the fellowship were definitely, let's just find 20 people under 20 and give them $100,000 to drop out of college. And it was no holds barred. You could do anything. You could be doing some crazy research idea, a startup, anything. And I actually met my current co-founder at Speak. He was in the second year of the fellowship, and I made many very close friends from the first few years. But I mean, for me it was life-changing. I had a very unusual path where I actually did finish college, unfortunately. I was in grad school at the time because I went to school really early.
Yeah, I was like, aren't you too old? You know, feel like I'm young.
I was 19 at the time and in grad school. It was a very accelerated path, but I think I knew at the time that I was going to leave grad school and do startups anyway, and the timing lined up really well.
Yeah. Yeah.
Yeah.
Vitalik, I think.
He was also in a later year.
Ah, damn. Okay. Anyway.
But the first two years had, I mean, some crazy successes, you know, Dylan from Figma. I mean, yeah, a lot of people.
Awesome. Well, you know, feel free to bring in those stories as and when, because obviously only you know those kinds of people. You are now CTO and co-founder of Speak. I would say, from a very early stage, one of the most successful and prominent OpenAI partners that anyone would know of that is doing well, and teaching English to Koreans was, like, your rough remit at the time. How did that all come about?
It's funny that you say that, because despite our current revenue scale and, objectively, I think, how successful we are, we've always operated in a market, at least initially, on the other side of the world, and we've been much, much more popular in the Eastern world, in a bunch of Asian markets, and relatively unknown in the West. So it hasn't really felt like we've had that sort of awareness until, you know, the past few years, really. But the brief story is that my co-founder and I back in 2016 were fascinated by the promise of AI, and we spent a year's sabbatical basically learning everything we could. We talked to Karpathy back then, actually, when he was just finishing grad school, and did a lot of self-study research. And we were just so convinced, I think fundamentally, that speech models were going like this, language models were going like this, and in the 5 to 10 year span they would become superhuman. We were utterly convinced of this future, and we saw that the way people learn things, and specifically learn languages, which was a very human-based thing if you really care about fluency, would completely change, and we'd be able to build language tutors that were pure software, pure AI. So that was kind of the genesis story of Speak. It took much, much longer than we expected to build a great product and find good PMF. The first few years were very painful, and I think without this really compelling vision of the future, we would have quit. We actually never pivoted. Last year we brought the entire company to Taipei. We do this company trip every year, and we played our original YC application video on screen, and it was really funny because the things we were saying in that video were the exact same things that I still say today about the long-term vision and what we're building towards. So that was really cool to see.
Can you summarize the long-term vision again?
It was that as speech models and language models become superhuman, that would let us create an AI language tutor that would help you become fluent faster than any human could. And I think like 80 to 90% of the tech is here now.
And you have this big focus on speaking. Obviously, it's in the name of the company.
That's right.
And I think the speech models were maybe a little delayed compared to the text models. Did you ever think, okay, maybe speech is just not going to work for this use case? What were kind of the valleys of, you know, discomfort, and then what were maybe some of the pivotal releases and models where you were like, okay, it's going to work, it might take a little longer, but it's going to work?
So we've always done custom speech stuff. The first act of the company, if you will, was before LLMs, right, before 2022 when Whisper came out, when ChatGPT came out. The years before that, roughly two to three years, are when we feel like we found PMF in South Korea and then started growing, still only in that market, still only teaching English. And we developed custom speech recognition models, and users were speaking into the app all day. So we had a ton of this non-native English speaker data, and we would use that to fine-tune models, understand our users better. We still do that today, and it's important for us, for the core recording loop in many of our lessons, that it's extremely fast. So we're very latency sensitive. There are many other product surfaces within the app today that are more LLM-powered, where it's more open-ended, real tutoring, where we actually give you feedback on what you said, on the semantics, and so on. So that stuff is more Whisper-powered, more LLM-powered, but we've always had a very fast core ASR loop that's been fully custom.
I just onboarded to the app earlier today.
Yeah. Unlike other apps, there's kind of this tutor conversation that you do for onboarding. I'm guessing that is mostly LLM based, and then you're kind of judging how the person responds. So I selected Spanish, and the conversation was in Spanish via text to start, and then from there it started to create lessons for me.
Yeah.
Was that all unlocked by LLMs, where now you can kind of have these conversations and then bring people into the speech flow?
Yeah. So we call that magic onboarding, and it was a new thing we built that was more conversational. We wanted it to feel more like you were talking with a tutor and they were learning things about you, and we would use that later to personalize the experience. Before that we had a much more traditional app onboarding. There are still a lot of open, interesting questions around what the proper onboarding UX is, because a lot of people start using Speak when they're not in a situation where they can actually speak aloud. So we have, you know, fallback outlets and so on, but it's something we're super actively experimenting with.
Is there a structured output behind that? You know, anything that you found implementing magic onboarding? I think people always want to improve onboarding. What's the uplift, or was there one?
We still don't know yet. The interesting thing is that in general, because it's speaking based, which is a much higher barrier than just tapping a multiple choice button, what we see is that the install-to-signup rate is a decent amount lower, but the trial start rate is higher. It's still an active experiment that's running, and we're trying to be super agile about testing many different formats of this. I don't think I have the final answer yet, but the intent, the real vision that we're going for here, is that as soon as you download the app from the App Store, maybe you see it in an ad, the first interaction when you have a fresh open of the app should feel pretty futuristic. It should feel like, okay, this is the new AI-native, next-gen way of learning a language to fluency. And that's kind of always been our ambition: we wanted to build something that wasn't possible before without LLMs and, like, AI technology.
Yeah. I wanted to go back on the onboarding soon, but there's a general idea that when you replace a form with a voice bot, you need some kind of state machine under the hood to drive it: what else don't I know about you, let me proactively ask that. And I'm just wondering if you had any insights there, or is it literally just a state machine?
We tried both, actually. Right now, I think probably what you saw is a state machine, but I think that...
Trust the AGI.
Yeah, right. I think that things should move in a direction where it's much more of a natural conversation. There is a general sense of a goal in the prompt that you can specify, and part of the hard thing here is all the guardrails, right? When people try to be antagonistic to the system, things start really going off the rails. So for a bunch of these experiences, we're pretty careful about the fallbacks, and we have a lot of evals around that. But I think where it should end up is just feeling like you have a quick 3 to 5 minute conversation with your tutor, and then it knows a lot about you, and then you create your account, etc.
And it creates memories?
Yeah. So we store what you're saying, and we summarize it. In the experience, the way it works is the tutor will ask you some sort of question, like what are your goals around learning English or the language, and then we basically use a separate LLM prompt to summarize. So it's not the full transcript of what you said that you see; it's more of an abstracted, okay, here's what you care about. And we think that's a better product experience.
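To make that concrete, here is a minimal sketch of the pattern described: a separate LLM call condenses the raw onboarding transcript into one abstracted memory line. The prompt, model choice, and function name are illustrative assumptions, not Speak's actual implementation.

```python
# Hypothetical sketch of the "summarize, don't transcribe" memory step:
# a separate LLM call turns a raw spoken answer into one abstracted
# memory line that gets stored and shown back to the user.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_onboarding_answer(question: str, transcript: str) -> str:
    """Condense a raw, possibly rambling answer into one memory line."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the production model isn't public
        messages=[
            {
                "role": "system",
                "content": (
                    "Summarize a language learner's answer as one short, "
                    "third-person memory, e.g. 'Wants to order food "
                    "confidently on trips to Mexico.' Never quote the "
                    "transcript verbatim."
                ),
            },
            {
                "role": "user",
                "content": f"Question: {question}\nAnswer: {transcript}",
            },
        ],
    )
    return resp.choices[0].message.content.strip()

memory = summarize_onboarding_answer(
    "What are your goals around learning Spanish?",
    "uh mostly travel I guess, um, my partner's family is from Mexico...",
)
```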
What were some of the other key tenets? Obviously, language learning is one of those consumer markets where dozens of companies are always trying to get started, and you have these old companies like, you know, Babbel, and you've got Duolingo. So speaking, the act of speaking, was a big part of it. I think this memory stuff is great. I think if you've tried some of the other apps, they always try to re-ask you the same things that you got wrong before, but you're not really learning. Is there anything else, maybe not as obvious from the outside, in the design of the app and the product that you think is really different?
I would say, from a macro level, this is actually a pretty new product category, AI-powered language learning. And all these apps that you mentioned, Duolingo, Babbel, etc., they're more like the Gen 2 of language learning. So if you think about it, Gen 1 was Rosetta Stone, if you remember, right, CD-ROMs in airports. And then Gen 2 was basically mobile. So you have these very casual, massively popular mobile apps like Duolingo where I think the comp is probably closer to a mobile game: something that feels productive, something that's very engaging, very gamified.
And Duolingo is really leaning into it, the gamification.
And they've done an amazing job of that, to be clear.
Yeah, they might be the world's best people at it.
Yeah. And our view is that LLMs and AI now enable Gen 3 of language learning, which is something that is very AI native, very focused on functional fluency, which is why we do all these role plays and let you practice Spanish by talking to your Uber driver. We don't teach vocabulary and grammar. We teach sentence patterns, and we try to get you to just repeat and drill and drill and drill, almost like you're in a gym, until it's automatic, because that's what speaking is, right? It has to be spontaneous and automatic. In terms of the other aspects of the design, though, we went through many, many iterations over the first few years of starting the company. This is kind of what I was mentioning about it being really painful in the first four or five years. And in fact, the current version of the Speak app is not the first thing that we launched. We had something that we call internally the red app, which had a red app icon, still a similar logo, and it was more around packs of content instead of courses, where you could choose any topic that you wanted to learn. It was for many different languages. It was essentially not a very directed experience, and it didn't really work. It was free. It was a very basic thing. But in 2018 we tore everything down and realized that we had to really fully change what we were doing. And that's when we decided to focus on South Korea, specifically on teaching English. We built a bunch of new lesson types, and we created our courses so that the experience was much more on rails. We realized people don't want to choose. They're already using some of their motivation on a daily basis just to open the app. They don't want to make another choice after that, right? Just tell me what to do, right? Like, you know, give me a big button, and then I can tap it and just start a video lesson or whatever. We also, pretty critically, I think, abandoned the free version and went straight premium, and we kind of sidestepped the motivation question that way, because we knew that there were a ton of users that really wanted to learn English and were already really motivated. So we wanted to basically filter for these users. So, you know, I wouldn't say there was one silver bullet. It was the combination of many learnings over three or four years. And then that started really growing in South Korea. And from there, I guess, phase 2 was really 2022, when LLMs came out and Whisper came out. And that allowed us to go from this more supplemental speaking practice tool to more full-featured language tutoring, where we could use LLMs like GPT-3.5 Turbo back then to give you direct feedback on your wording: you know, like, that was kind of a weird thing to say, a native speaker would say it this way, or use a different word, or whatever.
I always do a poor job of doing this, but can we get some headline numbers, just to get a sense of scale? Because I think maybe some audiences don't know where you're at now in terms of your reach.
So, we're now the biggest English app in South Korea. Yeah, we do billboards, big celebrity campaigns, that sort of scale. We're, you know, very popular there. I think like 6% of the Korean population has tried us. We're, you know, well on the way in a bunch of other Asian markets like Japan and Taiwan. So the Asian markets are currently our mainstay. We also teach English in 40 more countries. We're coming to the US as well: we have Spanish and French live, and several more languages are coming this year. That's a huge focus of the company right now. In terms of revenue scale, well over $50 million ARR. It's a pretty simple business model. It's mostly consumer. The B2B stuff is super, super exciting, and that's also growing really fast, and I think it'll be a really meaningful part of the business.
When did you start B2B?
About a year ago. It was very much a side bet slash experiment at first, and then it just started working.
Of course it's going to work.
Yeah. And now it's like, okay, you know, this is part of the future, right? This is a real thing. Yeah. So that's exciting.
What's the race between learning a language and, like, real-time AI translation? At Google I/O there was one of those Google Beam things for conferencing that does real-time translation.
Yeah. Yeah. Yeah. So people always ask this, right? They're always like, what happens when the Babel fish comes, right? When the real-time translation comes.
And the Babel fish is from The Hitchhiker's Guide to the Galaxy, right?
Yes, exactly. The counterexample that I always have, that I think is quite illustrative, is that in German, the verb is at the end of the sentence, right? So if you're trying to do real-time translation from German to English, as an example, you can't actually make any progress on the English until you hear the whole German sentence and you know what the verb is at the end, right? So the minimum latency there is the full sentence. And that's an example of the technical blocker for why it'll never be truly, truly perfect. But also, I think, besides that, if you talk to all of our users in Asia, they don't want a translator. The reason that they are trying to learn English is to make themselves a better person, to connect with other people. They want to be able to look you in the eye and speak English, speak the same language as you, right? So it's actually a very different thing. I think what will end up happening is that we will build a real-time translation feature into Speak and have it integrated into the learning experience.
And also, there's always that human side, right? Like, I'm dating a Romanian woman.
Yeah, and his wife is trying to learn Italian.
Like, there's always... yeah, that's going to keep happening. I want to double-click on Korea. I think it was a very insightful, smart decision. Maybe people only know Korea through K-pop.
But actually, I think a lot of Americans learn Korean because of K-pop. That's a side thing. But you could have done Taiwan. You could have done China. I remember seeing a documentary about how China was crazy about English, Mad About English, I think that was the title of the documentary. Was it obvious? Were you sure when you went into Korea, or was it just a test?
We visited a bunch of Asian countries when we were thinking about how do we relaunch things, how do we focus in. And we almost chose Taiwan, actually. But I think it was a little bit serendipitous. Our first employee is Korean and was my co-founder's college roommate, actually. When my co-founder visited Seoul to check out the market, he asked SJ to come along as essentially a translator and to, you know, facilitate. And I think that just went really well. And it was just very obvious from being on the ground in the market that Korea is pretty obsessed with learning English. And there is every human-based solution possible, right? You know, English academies, classes, skyscrapers full of classrooms, stuff like that. And our logic was basically, if we can really make headway and win this market that is chock-full of these human competitor products and all these people that fundamentally care about fluency, then we probably have something pretty real, and strong PMF that we could win other markets with. So that was the original logic, and, you know, so far it's been working.
It's retroactively obvious, kind of obvious, but it's so counterintuitive that you would be the team to do this and not a Korean team, right? Where they would know, because they had the personal experience of: I started with Korean, I learned English, here's how you do it.
Yeah, in hindsight, super weird, right? We were definitely, you know, sitting in an office here in San Francisco, operating with users in a market all the way on the other side of the world. It would not have worked without Sunjay. I have to give him a lot of credit here, because we paid a lot of attention to the specific wording of button text in the app and, you know, localized strings. We had a lot of reports from users pretty early on that they were shocked that it was an American company; they thought it was a domestic one, right? Because you can always tell: there's always some weird wording or whatever. But there wasn't in Speak, and I think that probably had a large sort of intangible effect.
Yeah. Focus, attention to detail. Tech stack: this was '18. What were you rolling? You just did ASR, and there were no LLMs, so BERT, maybe? I don't know.
We actually had no LLM component at all. So all of the content... oh yeah, another thing we did that I forgot to mention was we decided we needed to fully own all the content. So the way that we teach is all in-house, all thought through from first principles. We built this thing called the Speak Method, which is basically a pedagogical philosophy around teaching sentence patterns that you drill and then combine into higher-order patterns. And all of that was in-house, with, you know, our content team and our teachers. And we built a lot of internal tooling to make this possible. There's just a lot of operational overhead. I would say this is something we've struggled with in scaling to many more languages, and that's a big research effort within the company right now. We're building a mobile product, right? My co-founder and I have always just loved apps and been big iPhone users. So we cared a lot about the app being native, feeling great, being high performance. The DNA of the company was always consumer. Frankly, my co-founder and I had never worked in a real company. I dropped out of grad school, had a few failed startups, and then eventually started Speak. And he had never worked in a real company either; he'd only done startups in the past. So we didn't know anything about enterprise workflows, or what sort of software real companies used. So I think, frankly, consumer was the only path. I don't think we could have done anything else. We just didn't know enough. And I think that has served us well, though, in terms of just really caring about the craft of it and wanting to build something that felt not 90 to 95% but 95 to 100% in terms of polish.
Was it hard to build an engineering team that did that at the time? Because ML engineering was very academia-driven back then, and then you have the more consumer stuff, which was maybe more nascent, and it's mobile.
I'm now realizing that our story is very weird.
So, you only realized...
In addition to the market being on the other side of the world, our first iOS engineer, who we hired through a YC referral, was in Slovenia. If you don't know where Slovenia is, look it up on Google Maps, but, you know, it's a pretty obscure little country. And then we needed to hire a backend engineer, and one of his best friends was a great backend engineer, and we hired him. And then this happened four more times, all in the same city. And then we were like, okay, we should probably just open a physical office. So for several years we had an engineering office in Slovenia.
What?
And then a few people here in San Francisco, and we still do. Now we have 90% of our core product development team here in San Francisco, in office, and we're really only hiring here. But for the first several years, you know, that was another very interesting cultural aspect of the company, I guess.
I think a lot of early-stage founders have to do that; that's the only people they can afford or whatever. What are your tips for making that remote stage work?
For us, it wasn't really a price thing. I think we legitimately thought he was the best person that we interviewed, and then it just kind of happened that way as we rolled it out.
Yeah. Yeah. It's not about price. It's more about remote work, right? Like, a distributed team early stage. A lot of people say, no, you have to move everyone to SF or your startup will die.
Yeah. I don't think that we were good at remote work. I don't think that my personality or my co-founder's personality is inherently very good at async, just to be perfectly frank. I actually think that almost in spite of it, we made it work. It was a little bit brute force. Like, I would just sync with them every single day, right? And there was pain, because the time zone overlap was exactly the most inconvenient.
Yeah.
But I think for several years we did that. We got really good at the cadence of it. I think they were excellent engineers as well. So it worked out. But if I had to do it over again, I probably wouldn't do it. It's hard to say. Yeah.
Shall we move to phase two, on the LLM side? That's when OpenAI started opening up. And when did they invest?
This was 2022. That was also when Whisper dropped, and Whisper was a really exciting moment for us. Ever since we started the company and made that prediction, that in 5 or 10 years speech models and language models would become superhuman level, Whisper was really that magic moment where we were like, oh, I think what we predicted is here. And I pretty distinctly remember this moment in the office when we got access to the model, and we were testing it on an audio clip of a very beginner English learner in Korea saying something, and if you closed your eyes, as a human, you'd have no idea what they were saying. There were four of us in the room. We all closed our eyes, and none of us had any idea, and the model got it right. So, I mean, superhuman. I think that was the moment that we had been waiting on. And at the same time, LLMs were in the ascendancy. ChatGPT came out, I think around Thanksgiving of 2022, and GPT-3.5 Turbo came out. And I think we realized very quickly that all the pieces were clicking now, right? We have what we need at our fingertips now to go beyond something that was listen-and-repeat, where the user would see something on screen, hear a reference of the teacher saying the thing, and then just repeat the thing, right? It was very simple. Still a great product, by the way; it still grew to several million in ARR in South Korea. So clearly there was a big market need for that.
Pre-Whisper.
Yes.
Wow.
This was from like 2019 through 2022.
Yeah, that's the grind. You needed to hang in there.
Yeah. And again, there were many moments when things weren't working, from 2017 through 2019. We were looking in the mirror and we were like, why are we doing this? This is crazy. But I think we were so convinced about the vision, we just couldn't believe that the vision would not come true. So we stuck with it. So, fast-forward to 2022, the pieces started coming together. We realized that we could start building something that felt more like a language tutor, that could give you feedback, that could start explaining to you why you did something wrong. And that was act two of Speak: a true English tutor.
This is something that a lot of founders struggle with today. It's like, I'm kind of building something hoping that the models get better later.
Yeah.
How did you feel once the models got better? Did you feel like, okay, I am ahead of the curve, because I built all this history of building product and doing all this work? Or did you almost feel like, okay, we spent all this money and time building these models, and now we're just going to use Whisper?
It was purely positive for us. We still kept using our custom ASR system, because it was streaming, real time, really fast, really well fine-tuned. Whisper wasn't streaming, so it was a different use case. We used it for the more spontaneous stuff. And I think in almost every way we were just really excited, because pretty directly, as the frontier of model intelligence improved, it would just unlock things on our roadmap that were locked before, if that makes sense. And we still really operate in that mode today, where we take a model and then we try to think about, okay, how do we saturate model capability by building product on top of it? And then it happens again, right? And then we build and saturate the model capability again. I think that's a really cool paradigm to think about. But all the LLM stuff basically allowed us to build a tutor for English. And we still didn't have real-time voice, for example, right? But the barriers are coming down now. Obviously it's a really hot topic. We're actively building out a real-time voice platform, and we can build a lot of more verticalized, specific lesson experiences on top of that, which I'm super, super excited about. I don't think they're going to replace our current lessons. They're going to be more immersive, just a different thing, probably for more advanced learners.
Still language learning, though, not broadening out beyond language?
Yeah. So I think that language learning is interesting because it is so universal. 99% of people, you know, have certainly tried to learn a language, and it's so hard, right? Becoming fluent just has a huge failure rate, and it's something people are willing to pay for. So I think that has been a pretty amazing beachhead for us, and I think we'll be doing language learning for a long time. There's a huge, huge, huge company to be built here. But our even longer-term ambition is really this idea that, even beyond language, we think AI will reinvent how people learn anything, right? It already has for me, right? I use ChatGPT to learn things every 10 minutes. And I think I'm just naturally a very curious person. So whenever I'm thinking about something, I want to know more about it, and then I'll naturally go to ChatGPT and I'll learn about it. It's unlocked this entirely new dimension of learning, and I'm spending way more time learning as an adult as well, which is really cool. And I want to bring that, in a more structured, systematic way, to everyone. So I think that's the vision beyond language.
I'm curious to double-click on the tech side. We talked a little bit about the content that you own and develop in house, and we talked a little bit about the onboarding memory. I assume that you have conversational memory as you go, right? Any other major pieces of the puzzle that really unlocked it for you?
So there are a few things I can talk about. I think one thing is, in order to go from teaching English to teaching a bunch more languages, we needed to really figure out more direct AI content generation. That was a pretty big shift, right? Because it's hard to scale our little studio in LA where we shoot a lot of the video lessons. All of the scripts were written manually before by our content team, but we want 100x more content, right? And 10x more languages, eventually 100x more language pairs, which is how we think about it: what's your native language, and then what language are you learning? And really the only way to do that is to make it more AI generated. And, you know, very much like an AI-native company, we want to be on the frontier here. We want to keep a small team and have as much leverage as possible through these types of tools. So that's a big active area we're building out. I think, you know, people overuse the word agent, but we have a tutor agent, we have a curriculum-writing agent, we have a giant LLM-based pipeline that creates curriculum, scaffolds it in the right way, writes the lessons themselves. That's a big active area that will basically help us scale to a lot more markets and a lot more languages. So that's one big thing.
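As a rough illustration of what a staged pipeline like that might look like, here is a minimal sketch: one LLM call drafts a syllabus, the next scaffolds each unit into sentence patterns, and a third writes the lesson script. The model, prompts, and stage boundaries are all assumptions for illustration; Speak's actual agents are not public.

```python
# A minimal sketch (not Speak's pipeline) of staged curriculum
# generation: syllabus -> per-unit sentence-pattern scaffold -> lesson
# script, with human review assumed as the final gate.
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def generate_course(native: str, target: str, level: str) -> list[str]:
    # Stage 1: a high-level syllabus for this language pair and level.
    syllabus = llm(
        f"Write a 10-unit syllabus teaching {target} to native {native} "
        f"speakers at CEFR level {level}. One functional goal per line."
    )
    lessons = []
    for unit in filter(None, syllabus.splitlines()):
        # Stage 2: scaffold the unit into drillable sentence patterns.
        patterns = llm(f"List 5 core sentence patterns for this unit: {unit}")
        # Stage 3: write the repeat-and-drill lesson script itself.
        lessons.append(llm(
            f"Write a short spoken-repetition lesson for:\n{unit}\n"
            f"Built on these patterns:\n{patterns}"
        ))
    return lessons  # queued for human review before anything ships
```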
Another big thing is we care a lot about fluency. Specifically, we want to be able to quantify how fluent you are. So if you're learning Spanish, it's like, okay, what does it mean to be fluent, right?
And there's a real-world test for that.
We care about real-world fluency: your ability to go to Mexico City, go to a street taco stand, and actually order, right? That's very functional fluency in one aspect. You might be really good at that but be completely unable to talk about your family, right? So the frontier of fluency is very jagged, but we're very pragmatic, and we care a lot about meeting user goals and helping them become fluent at what they care about. And we're thinking a lot about, okay, how do you quantify that? How do you actually store a knowledge graph of everything you know about Spanish, in terms of the vocabulary you know or don't know, the patterns that you know or don't know, the mistakes you made using Speak over the last month, clustered?
You said the magic words, knowledge graph. Is that live, or is that experimental?
There are aspects of it that are live, and it's a very multi-dimensional system, where we think of it as: there are many aspects of fluency, right? There are many sub-scores, and we have a few of them that are currently live, and we're actively developing other aspects of it, and then all of those will fold up into a more holistic fluency score. The idea is that eventually, once we have a complete enough picture, everything will fold up into a number that we call the Speak score, which is a very holistic measure of just how good you are at Spanish, right? And obviously 54 is kind of meaningless by itself, but it does give you a general sense, right? Being at 54 versus being at 5 is very different, right? And I think everyone can kind of intuitively understand that.
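Here is a toy sketch of that fold-up idea: per-concept mastery estimates grouped into sub-scores, which combine into one holistic number. The concept naming scheme, weights, and class are invented for illustration; Speak's actual schema is, as noted later, more custom and domain-specific.

```python
# Toy sketch: per-concept mastery feeds per-dimension sub-scores, which
# fold up into one holistic number. Schema and weights are invented.
from dataclasses import dataclass, field

@dataclass
class LearnerGraph:
    # concept id -> mastery estimate in [0, 1], e.g. "vocab:pedir",
    # "pattern:preterite-ar", "situation:order-street-food"
    mastery: dict[str, float] = field(default_factory=dict)

    def sub_score(self, prefix: str) -> float:
        vals = [v for k, v in self.mastery.items() if k.startswith(prefix)]
        return 100 * sum(vals) / len(vals) if vals else 0.0

    def holistic_score(self) -> float:
        # Hypothetical weighting across fluency dimensions.
        weights = {"vocab:": 0.3, "pattern:": 0.4, "situation:": 0.3}
        return round(sum(w * self.sub_score(p) for p, w in weights.items()), 1)

g = LearnerGraph({
    "vocab:pedir": 0.9,
    "pattern:preterite-ar": 0.4,
    "situation:order-street-food": 0.7,
})
print(g.holistic_score())  # one number, like the "54" mentioned above
```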
That's surprising. I would have grounded it more in the real world, like, we will get you to pass this exam that is a standard, the ESL standard or whatever.
So the way that we think about that is, we don't really teach to the test. I think it's possible in the future that we'll do a test prep product, but in general, we care about real-world proficiency in various functional situations. So the way that we think about it is: if you're at this level, then these are the things you can do, right? So it is exactly that.
We have that a lot in Italy. I grew up in Italy, so English is my second language. And there are a lot of people that pass a lot of tests and get high grades in all the classes, and then they travel to the US or the UK and it's hard for them to speak, because, you know, I feel like the hard part is being in the conversation. I think when I started, my writing and reading were much higher than my conversation, which doesn't really help you if you're traveling somewhere.
That's me for Chinese, because my parents spoke Mandarin to me growing up, so I can understand a non-trivial amount, but I'm very bad at speaking.
I heard there's a good language learning product.
I have one question on the course generation.
Yeah.
How do you eval that product? When you're asking the AI to generate courses, how do you figure out the courses are going to be good?
We rely very heavily on our content team, and we are trying to build out an eval suite. It's really hard, right? The illustrative example here is that as we try to hire and train new content writers on our content team, it's so nuanced. There are many different aspects of training them in the Speak Method and how to write the right types of lessons, and articulating why this form of lesson, which is subtly different from this other form of lesson, is better, right? So we try as hard as we can to articulate that. So, forming a sense of eval using model-graded evals like that, that's one piece of it. And I also think, in the future, a really good curriculum or lesson writer agent will probably be reinforcement fine-tuned on a lot of our internal data as well. That's something we're experimenting with, but it's still pretty early.
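A minimal sketch of what a model-graded eval for generated lessons could look like, with rubric criteria standing in for the content team's articulated standards. The criteria, judge model, and review threshold are all assumptions, not Speak's actual eval suite.

```python
# Minimal model-graded eval sketch: an LLM judge scores a generated
# lesson against rubric criteria distilled from the content team.
# Criteria, judge model, and threshold are invented for illustration.
import json
from openai import OpenAI

client = OpenAI()

RUBRIC = [
    "Teaches exactly one sentence pattern, then drills variations of it",
    "Uses casual phrasing a native speaker would actually say",
    "Stays at the stated level; no unexplained advanced vocabulary",
]

def grade_lesson(lesson: str) -> dict[str, int]:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": (
                "Score this lesson 1-5 on each criterion. Return a JSON "
                f"object mapping criterion to score.\nCriteria: {RUBRIC}\n"
                f"Lesson:\n{lesson}"
            ),
        }],
    )
    return json.loads(resp.choices[0].message.content)

scores = grade_lesson("...generated lesson script...")
needs_human_review = any(s < 4 for s in scores.values())
```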
This seems like a great example of, you know, AI removing jobs, which is like, oh, you're creating the courses with AI, you don't have a person. But actually, instead of one person creating two courses, it's one person reviewing 50 courses that AI generates. That's kind of how you're seeing the content team?
The way that we see it, really, not just for our content team members, but I also think it's perfectly applicable to engineering, is that it's leverage. It just allows you to do 100x in the same amount of time. We still need human review of the syllabus, the curriculum, the specific lines, etc. But the hope is that this will allow us to launch 100x more courses.
A lot of language is colloquial. I think the way that you put it on one of our episodes one time was: the Italian that is taught in school is not the Italian people speak.
Yeah.
How much of that do you adjust for, informal versus formal?
Entirely. That's one of our fundamental tenets, which is that we don't teach textbook English, or textbook language. We try very hard to...
Teach Gen Z slang.
We don't go quite that far, but we try to teach very casual, conversational language that is actually what real people use. And like you said, that's usually very, very different. If you pick up a typical English textbook in Korea, it's all really traditional and weird formulations, and it's not how people actually speak.
Yeah. I know you're going to release Italian soon, so I can give you a hand on that. I know in the US there aren't that many dialects. There are accents, but most of the language, the words that people use, are similar. But I know, for example, Spanish spoken in Argentina is very different from Spanish spoken in Mexico. How do you adjust for that? Or maybe you don't.
So I would say that, for example, currently we teach American English, standard American English. We don't really teach other accents or other dialects. For now, given how small we are, we just have to be pragmatic and teach in the direction that most people want and most of our users know. So we've made those decisions on the content team side for English, for Spanish, for every language that we're teaching. But I do expect that in the future we're going to get a lot more sharply differentiated. Like, if you want to learn British English, then we'll teach you British English. We'll teach you how to pronounce it, etc. I think all of that feels like something a superhuman language tutor should be able to do.
I just think it'd be very funny if all the Koreans had a very distinct Southern accent.
It'd be great. Make that happen.
Yeah, I do think about this, because, you know, obviously there's a moving of the goalposts: now that we have this, now we want the next thing. And obviously people who speak English as a second language always have an accent. Like, a lot of people think I don't have an accent, but if you know any Singaporeans, you know I'm Singaporean. How much does accent training matter, right? I think that actually does help a lot of people.
And you cannot tokenize accents yet.
Yes, that's right. So I have two main thoughts on this. I think the first one is that communication, your ability to speak spontaneously and get a concept or an idea across, is almost fully orthogonal to pronunciation. You can be really bad at pronunciation but still communicate effectively. So a lot of the current core product experience is about: just speak as much as possible, make mistakes, don't worry about screwing something up on the accent or the pronunciation side. The important thing is that you literally move your mouth and you make the sounds, right? And it turns out there's a really key psychological barrier there, where people are just not willing to do this in front of a human, even if it's a human teacher that you're paying, right? So a lot of the core message of our marketing campaigns in many of our biggest markets is along the lines of: you can make mistakes in this private space with Speak. And I think psychologically that's extremely powerful, and then you can go and get it right more confidently in the real world after you practice with Speak. Now, having said that, people do care about their pronunciation and their accent, right? So we have, for English only right now, a pronunciation coach that is basically a fine-tuned version of wav2vec 2.0, which is a Meta model, but we fine-tuned it on a bunch of our own phonetic transcripts, our own fine-tuning data. It works pretty well. It's currently for single words. We're going to expand it to full sentences, to more languages, etc. But if you look at the pure market opportunity, our sense is that we really want to push people to just speak very freely, as much as possible, you know, just get that volume up.
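Speak's coach itself isn't public, but a public analogue exists: wav2vec 2.0 checkpoints fine-tuned for phoneme recognition. Here is a sketch of comparing a learner's phonemes against a reference using a published Hugging Face checkpoint; the comparison logic, file names, and reference string are illustrative.

```python
# Not Speak's model, but a public analogue: a wav2vec 2.0 checkpoint
# fine-tuned for phoneme recognition, used to compare a learner's
# phonemes against a reference pronunciation of a single word.
import soundfile as sf
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

CKPT = "facebook/wav2vec2-lv-60-espeak-cv-ft"  # public phoneme-CTC model
processor = Wav2Vec2Processor.from_pretrained(CKPT)
model = Wav2Vec2ForCTC.from_pretrained(CKPT)

def phonemes(wav_path: str) -> str:
    audio, sr = sf.read(wav_path)  # expects 16 kHz mono audio
    inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

learner = phonemes("learner_says_water.wav")  # illustrative file name
reference = "w ɔ ː t ɚ"                       # illustrative target phones
# A real coach would align the two sequences and flag differing phones.
```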
Yeah. Yeah. In terms of immersing language learning in the real world, one of the more interesting approaches that people keep trying is to have, let's say, a Chrome extension or something on top of a page. I think Toucan was doing this.
There's a bunch of those. Yeah.
Yeah. And then there was another one I saw recently where you watch a YouTube video and it'll transcribe it for you, but randomly mask words out.
I saw that too. Yeah.
That was a Show Hacker News.
Yeah.
Do those work? There's kind of the question of, you know, is that the right product, right?
I don't think so. Basically, the difference is your content or real-world content, right? Obviously you want real-world content. I think that for work, right, so for Speak for Business, for the B2B product, another part of the vision is really: what should a superhuman language tutor be able to do? It should probably be able to handle kids as well as a Samsung employee that wants to transfer to the US office and wants to use it for work, right? So our view there is that it's the same product; it's a different distribution mechanism, right? Consumer versus B2B. And I think that we will eventually build something like a Mac app. Maybe it'll be integrated with the browser in some way. We're not really sure yet, but obviously, in order to apply it to your day-to-day, there needs to be some way to hook into your actual work documents, whatever. That's a whole can of worms. We are actively thinking about it, but my sense is that it's not clear to me that any of these products have really taken off, and I think that there are many other approaches that are possible. I don't have the answer, but another example, a very hypothetical future world, is maybe OpenAI, you know, the new Jony Ive thing, will come out with some hardware that will be listening to you all day, and then we can give you some sort of very deep analysis that is integrated with the Speak app at the end of the day, or, you know, the end of the week, whatever. I don't know.
Okay, one more time, since you brought that up. I'm sure you don't... I haven't told you anything, but...
I don't know anything.
What's it going to be?
I don't know anything.
It's, like, the number one topic at all the parties I go to now.
Really?
Yeah.
What's the most compelling idea you've heard?
Okay. So there are people that say Jony hates wearables.
Yeah, I've heard that, too.
I'm like, if it's not a wearable, then you just made a second phone, and in that case, just make a phone.
Yeah.
I thought they said... I mean, didn't Sam say that he wanted to do a phone in the past?
That was in the far past.
He says a lot of things.
He'll say a lot of things.
Yes.
Okay. Anyway, I think a wearable makes sense. I think the race is to capture context.
I mean, I have a wearable on.
Yeah, we have a wearable here, too, that transcribes everything.
Yeah, that's cool.
Yeah, it's from a previous episode of ours. I can hook you up if you want. But yeah, I think it's something a lot of people are interested in, obviously, because it's a huge bet by them. And, uh, yeah, I'm curious. Okay, you mentioned video. I just wanted to double-click on that a little bit. I'm sure engagement is very high for video, because people love to watch video. I thought that Speak would be one of those places where you just kind of leave it in your pocket. You take a walk, learn to speak. Probably that's not true.
What we've done so far is, part of the course experience is a teacher video. We've tested other, more audio-forward types as well. We found that, of course, like you said, video is very engaging, but at the same time we have a lot of users that do want to be able to walk around with the phone locked in their pocket. So doing something that is more like voice mode with optional visuals, I think, is really good. I think there's a huge opportunity for a better way to learn things like listening comprehension. So, I took German in grad school for two years, and I thought I was getting somewhere, but any time I listen to a native German speaker, it's so fast. It's completely on a different level. And I think you can imagine a plethora of really cool experiences that feel kind of like you're listening to a podcast, but it's all AI generated, it's fully controllable, it's integrated with the app. You know, there's something there for sure.
Yeah, don't want to do AI podcasts, man. We're cooked.
It's okay. We'll document our own ending. I mean, I think when that happens, we just end the show. Why not?
To zoom out a little bit: in the pretty near future, multimodal models will cross the threshold where they will be able to generate images a lot faster than they currently can, maybe somewhat close to real time, even, right? And audio at the same time, text at the same time. And you can imagine a very powerful multimodal tutor that can kind of do it all at once, where there's an audio track, and then, if the teacher is teaching you something, it chooses the right timing: okay, at this point I'm about to introduce a new concept, so I'm going to show the word on screen so the user can see how it's spelled, right? There's a lot there. Or you can do generative UI. There's a lot of nuance there, where it's easy to do it badly, but to do it well requires a fair amount of reasoning and mental modeling of what the user knows.
Yeah.
Which feeds into what you need to show at what time. So that's probably going to have to be a pretty parallel set of systems. Have you spent any time looking at things like Veo 3, where you do video plus audio at the same time, and how you can tweak the audio part versus the video part? Because I can imagine you might work on a video part and then want to change the audio generation model. I don't actually know how the model works inside, or how much you can.
We haven't really looked at the video stuff much. We basically think that we're very bandwidth constrained, right? So we're just scaling and trying to hire as fast as possible, like everyone else is. And as a result, we're really focusing on just the most in-reach, highest-impact things. I do think that the barriers are coming down very fast for all of this sort of stuff. I'm just so excited about multimodality and where things are going here, because imagine, if you're learning Spanish, being able to look at an image that the model generates for you and then doing Q&A on it, right? Like a beach scene, and then the model will ask you, how many people are running on the beach, and then you have to respond in the target language that you're learning. A very traditional language learning exercise, but you can imagine it being fully generative, which is really cool.
Awesome. Lots of stuff like that. The engineer in me worries about inference costs, but I think you can just kind of sweep that under the rug for a while. See if it works first, and then you can worry about cost.
Yes.
You mentioned a real-time voice platform. I just want to give you the platform to talk more about that. You mentioned, for example, that you're a very heavy user of the Realtime API from OpenAI and you've built a bunch of tooling around it.
Yeah. So, last year we had early access to the Realtime API, and there's a very obvious use case for language learning. I think one common theme that has just been pretty awesome since LLMs came out is that language learning as an application is just a really good fit for LLMs, for all these model types, in almost every way, which has been really great for Speak. Specifically for real time, I think the audio piece promises to infuse almost every surface in the app. You can imagine this being the primary way that you talk to your tutor, right? And an additional complication is that it needs to be multilingual, and there needs to be code switching. So that's a pretty frontier problem right now, right? If I'm learning Spanish, I should be able to speak both English and Spanish, and vice versa from the model. That's a pretty hard TTS problem today. Actually, only a few models are able to speak two languages in the same sentence...
And then pronounce them properly. Sorry.
You could have a router model, like a tiny little router model that guesses which language comes first and then routes.
Well, the problem is that you could have a subword in a single sentence in a different language.
Yeah.
So you can't just concatenate either.
Yeah. Because it won't sound right.
Right. It won't sound natural. That's not how humans do it. So this seems to need, like, a very native, controllable audio function.
Yeah. But we are in the process of building a variety of experiences on top of the Realtime API. I want to clarify that actually nothing is in production yet, mostly for price reasons. Frankly, the pricing model of the Realtime API makes more sense for something like a customer support agent, where you're very directly replacing somebody that you would pay hourly otherwise. And that's how the pricing model for a lot of these initial agents works out. For us, we want our users to be able to do these real-time role plays and have these conversations for many hours a day, right, if they want. Getting cost under control is definitely a pretty key consideration right now. But we are pretty close. Maybe even by the time this episode is released, we'll have something live. We have what I think is a really cool application of the Realtime API, which is basically a new instructional lesson where it's the model actually teaching you something, like a new language concept. And it's intended to augment, slash play the same role as, our current video lessons, which are the instructional lesson type. And it's interactive: obviously, at certain points in the 3 to 5 minute lesson, you're interacting with the Realtime API. It's semi on guardrails. There was a lot of scaffolding we needed to build to, number one, switch between the interactive and non-interactive portions of this lesson properly, if that makes sense. There are some portions where you're just listening or looking, and then some portions where you're actively in a short conversation, and we swap back and forth, and we have a bunch of custom architecture and infra around that. And then there's also making the cost make sense, or at least semi make sense. And then there's a bunch of WebRTC infrastructure. We're at, you know, not huge but non-trivial scale, so it'll cost us millions of dollars if we do something wrong.
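A simplified sketch of that scaffolding idea: a lesson as an ordered list of segments, with the client swapping between passive playback and time-capped real-time conversation. The segment schema is invented, and the actual Realtime API session handling is stubbed out rather than shown.

```python
# Simplified sketch of the scaffolding described above: a lesson is an
# ordered list of segments, and the client swaps between passive
# playback and bounded real-time conversation. The realtime session
# itself is stubbed; OpenAI Realtime API details are omitted.
from dataclasses import dataclass

@dataclass
class Segment:
    kind: str          # "scripted" (listen/look) or "interactive"
    content: str       # script text, or a goal for the tutor's turn
    max_seconds: int   # hard cap keeps pacing and cost bounded

LESSON = [
    Segment("scripted", "Introduce the pattern 'Me gustaría...'", 40),
    Segment("interactive", "Have the learner order two drinks.", 60),
    Segment("scripted", "Recap and show the written form.", 20),
]

def play_scripted(seg: Segment) -> None:
    print(f"[playback, {seg.max_seconds}s] {seg.content}")

def run_interactive(seg: Segment) -> None:
    # A real client would open a realtime voice session here (e.g. over
    # WebRTC), pass seg.content as the tutor's goal, and close it at the
    # time cap; closing promptly is also what keeps the cost sane.
    print(f"[realtime, {seg.max_seconds}s] goal: {seg.content}")

for seg in LESSON:
    (run_interactive if seg.kind == "interactive" else play_scripted)(seg)
```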
Yeah. Yeah.
Do you do inference in Korea, because of latency and all that?
It's something that we have been increasingly paying attention to for all the real-time paths. I would say two or three years ago, when real-time stuff was still quite nascent, users didn't really care as much, but I think now the standards have risen, right? Latency has to be low; everyone cares.
Do you have a hard latency budget for responses, or do you just kind of work it out? For example, right, you have a knowledge graph that you're accessing, you have content that you're retrieving. There's a lot of stuff there, and then maybe you're using a reasoning model, probably not, but all of that eats into the budget.
I will say that on the real-time engineering side, everyone talks about: submit the user request, get the agent's audio response, right, first audio bytes. What's that latency? And then we try to get it as low as possible. I would argue that's actually a vanity metric, because what you don't take into account is how the VAD works. How do you do turn detection, to detect when the user is finished speaking, right? Because that can easily add another second if you do it badly, and nobody talks about that for some reason, right? What you need to measure is actually: from when the user stops talking to when the model's first audio comes. And usually that number is much larger. That is a very domain-specific problem. You can use the semantic VAD on the Realtime API for regular English conversation, and that will basically classify, at every token, how likely it is that you're done speaking as a normal conversational English speaker. For a conversation like this one, that's fine, but it doesn't work at all for language learners, right? If I am trying to respond in a language that I'm learning, I'm going to be hesitating halfway through, for 10 seconds or more, right? So it needs to be fully custom, probably. This is something that we're also actively working on, but that is actually the dominating factor in perceived latency.
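To illustrate the two points (measure from end-of-user-speech, and give learners a much longer hesitation grace period), here is a toy end-of-turn detector. The grace-period value and class design are assumptions, not Speak's implementation.

```python
# Toy end-of-turn detector for the two points above: measure latency
# from end-of-user-speech, and give learners a long hesitation grace
# period. The grace value and class design are assumptions.
import time

HESITATION_GRACE_S = 2.5  # invented; learners pause mid-sentence

class TurnClock:
    def __init__(self) -> None:
        self.last_voice_ts: float | None = None
        self.turn_end_ts: float | None = None

    def on_vad_frame(self, is_speech: bool) -> bool:
        """Feed VAD frames; returns True once the user's turn has ended."""
        now = time.monotonic()
        if is_speech:
            self.last_voice_ts, self.turn_end_ts = now, None
            return False
        if self.last_voice_ts and now - self.last_voice_ts > HESITATION_GRACE_S:
            # The turn ended when the voice stopped, not when we noticed.
            self.turn_end_ts = self.turn_end_ts or self.last_voice_ts
            return True
        return False

    def perceived_latency(self, first_audio_ts: float) -> float:
        # The metric that matters: user stops talking -> first model audio.
        return first_audio_ts - self.turn_end_ts
```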
Coding: do you use Cursor, Windsurf, other autonomous agents?
It's kind of all of the above. I think, as the CTO, I view it as part of my responsibility to really set expectations, push everyone on the team, show them what's possible. We've been trying everything.
Yeah.
And I think we've tried to basically set the expectation that the frontier is moving so fast, it's deeply non-intuitive.
Mhm.
Maybe you tried coding tools 6 months ago and they weren't that great, especially if it's not TypeScript or Python, right? Those are probably the most popular languages for these tools. We try to set a culture in the engineering team where usage of these tools, as much as possible and as a default path, is the expectation. And in hiring we are now explicitly asking about this a lot, thinking about what types of people are going to be better, higher agency, at trying these types of tools. It's so important.
Before we zoom out, anything we missed about Speak that you really want to highlight, or something that people underrate about it?
One thing that I've always been really excited about is that I feel like a lot of the foundational pieces that we're building, around the knowledge graph, for example, a lot of these concepts should be applicable to not just learning language but also other things in the future. We're already starting to see the very beginnings of this on the B2B side, where a lot of it is more like management skills, hospitality skills, communication skills, more like true L&D for enterprise, less like core, pure English proficiency. So that's the obvious immediate neighborhood, but you can imagine many academic subjects, math, biology, etc., you know, from school through work. Super excited about that.
If I knew my employer was giving me a language tool, but then was evaluating me on my management skills while I learned the language, I might use it less. You know, you want to separate that out.
Yeah, very fair.
I agree overall that the knowledge graph problem is very important. We have a whole track on it for the conference, and the amount of data can be so high that you really want to generate relevant triplets. I assume you use the normal subject, predicate, object type.
It's a bit more custom than that, because it's more domain-specific around the way that we conceptualize the vocabulary, you know, and the sentence patterns and so on. So it's more specifically around language-learning concepts, if you will.
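As a rough contrast between a generic triple store and a more domain-shaped record, something like the following; this is a guess at the kind of structure involved, not Speak's actual schema.

```python
from dataclasses import dataclass, field

# The classic generic form: (subject, predicate, object).
GenericTriple = tuple[str, str, str]  # e.g. ("user", "knows", "past_tense")

@dataclass
class LearnerConcept:
    """A domain-specific node: one language-learning concept plus the
    system's belief about the learner's command of it. Field names are
    hypothetical."""
    concept_id: str                    # e.g. "pattern:would_like_to"
    kind: str                          # "vocabulary" | "sentence_pattern" | ...
    mastery: float = 0.0               # 0..1, estimated from performance
    prerequisites: list[str] = field(default_factory=list)
```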
But what I think we can extract from Speak, as it's generalized into a framework, is what I've been calling the Bloom two-sigma-problem type of thing: the level-adjusting tutor. Where are you at? Let me adjust my teaching to where you're at, and then I'll push you up to the next level. I think the knowledge graph is part of it, but I don't know if that's all of it. I've never seen a working example.
We are approaching that
problem from a few different angles. Part of it is the knowledge graph. Part of it is being very careful in how we structure the curriculum so that you're placed at the right level. The learning path itself has a foundational backbone, because beginner-to-intermediate English learners all need to know a bunch of similar concepts; it isn't really until you get intermediate and more advanced that it starts to diverge more sharply. From A1 through B1, I would say there's a pretty well-defined, sort of linear path.
Actually, a lot of the deep thinking we've done around how to structure the pedagogy is also super useful for matching people to the right level. Then you can take this backbone and basically modify it based on the knowledge graph, on your system's knowledge of what the user is bad at versus good at. For a lot of startups, especially in edtech, that is the core engine; once you have that, you can kind of teach anything.
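That backbone-plus-modification engine could be caricatured as a scheduler that walks a fixed concept sequence, skips what the graph says is mastered, and pulls weak concepts forward for review. A toy sketch under those assumptions:

```python
def next_lessons(backbone: list[str], mastery: dict[str, float],
                 n: int = 3, weak: float = 0.4, strong: float = 0.85) -> list[str]:
    """Pick the next n lessons from a fixed curriculum backbone, modified
    by per-concept mastery estimates (0..1) from the knowledge graph.
    Toy logic: review weak, already-seen concepts first; then unseen ones
    in backbone order; skip anything already mastered."""
    review = [c for c in backbone if c in mastery and mastery[c] < weak]
    fresh = [c for c in backbone if c not in mastery]
    return [c for c in review + fresh if mastery.get(c, 0.0) < strong][:n]

# e.g. a learner strong on greetings but shaky on ordering food:
plan = next_lessons(
    backbone=["greetings", "ordering_food", "past_tense", "making_plans"],
    mastery={"greetings": 0.95, "ordering_food": 0.3},
)
# plan == ["ordering_food", "past_tense", "making_plans"]
```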
Totally. Yeah.
We have a few more broader fun
questions. Yeah.
Um, so, the speak.com domain.
I looked it up. Voice.com got bought for $30 million
in 2019.
When?
2019.
Okay.
So I don't know if you want to share how much you paid for it, but it was a lot less. I figured it would be a lot less, but I'm curious. My estimate was 100K, but
It was more than that.
More than that? Wow. Okay.
I'm not going to say any more about the numbers.
So what was the story? Was it easy? Did you use a broker? We had, you know, Dharmesh from HubSpot, who sold chat.com to OpenAI, and he has a lot of very
That was like a $100 million deal or something, right?
That was very big.
Oh, wait. No, that was AI.com. What?
Uh, chat.com.
Oh, chat.com. Okay.
Yeah.
We bought it several years ago. It felt
very expensive for us at the time. It
was a little bit of a crazy move, but I
think we were very convinced that we
needed a super strong consumer brand
that was scalable globally. And that was
just always our ambition. Like we want
to be the way the next billion people
learn languages and we need speak.com.
So we don't regret it. It's such a great word. Makes for great swag.
Very nice decision. You had a couple of other fun questions.
Any fun Korean celebrity stories, since you work with so many influencers?
We have a bunch baking right now, but something more general has just been so fun on the journey. We would visit Seoul every year.
Yeah.
And seeing Speak go from nothing, to the first time we saw somebody on the street using Speak, to now, when our main teacher in the app is like a mini celebrity. People come up to her on the street as she's just walking around Seoul and recognize her from the app, which is really cool. Now we do a lot of advertising: billboards, TV commercials, we work with big influencers and so on. Just seeing the scale of that has me kind of in awe. It's really cool to see something that used to be nothing.
I wanted you to name-drop Blackpink or, I don't know.
Look, there's some stuff baking right now.
Yeah. Okay. All right. We talked about the Thiel Fellowship. On your LinkedIn, you kind of have this hole between 2012 and 2016, during which you said you did some startups. Any of them you want to share? Ideas you worked on that were maybe just early, that you should revisit?
I've always been interested in learning
and education. One of the other failed startups that I did in that time, and it feels silly to even talk about this because it amounted to nothing, was called Bloom, after, you know, the Bloom two-sigma problem. It was actually named after that, and we were trying to build a better adult learning platform with really cool interactive JavaScript widgets for various concepts you could learn. It didn't find PMF. I was young and didn't really know anything about business at the time either. But I think the
common thread through everything I've been interested in since leaving grad school has been: how do we build software, build tools, that help people learn things more effectively, better, and faster? Now I feel very lucky to be in this position, because obviously AI is the ultimate version of that, right? It's been completely transformative for me personally, because I get a lot of inherent fun and pleasure out of being able to think of a concept and then talk to this omniscient LLM that can tell me more about it, and I'm really good at asking the right follow-up questions about what I want to know. So that's been completely transformative for me.
Do you get a lot of people using Speak for therapy? It's not meant to be that, but since you have inference, they will use it.
In 2023, when we first launched our AI role plays using GPT-4, people were way more concerned about safety, right? Obviously the models now are much better at refusals, and the line is sharper between what's appropriate and what's not. But we did see a lot of our first users start to put in pretty questionable custom scenarios.
You probably guessed.
And, you know, this was something we expected, but seeing the logs in person is very different.
Got it.
Some shocking stuff in there.
Last couple of questions. One on Andrej. You talked to him in your machine learning journey.
Um, yeah.
He's also working on edtech now. I don't know if you've ever had conversations with him.
No, I haven't.
He's also interested in language learning, by the way.
You know, one thing that I think we didn't really realize early on, or at least fully internalize, was just how deep the market is.
Say more.
It was so universal that we really struggled to do some of the basic startup stuff, like defining your ideal customer profile and segmenting your users, because our users were everyone. We had parents using it with their kids. We had really old people using it. We had people using it for work. So that was kind of mind-boggling.
You still did customer segmentation, or are you saying it doesn't matter?
I'm saying it was hard to do. We tried, and we have a sweet spot in Korea: 25 to 45, more professional, more white-collar, but with a very long tail on either side. It's a huge market, and I think it's a very special moment in time right now, where it's obvious that a lot of the tech is here. I think it's really good for humanity if we make a lot of progress here. So I'm really excited for his company too.
We started by asking about the Thiel Fellowship, so maybe we can wrap with one of Thiel's favorite questions, which is: what's something you believe today that most people would not agree with you on?
I think that people, if you recall, expected the world to kind of explode when GPT-4 came out, and, you know, everything would change. But if you go to another state outside the Bay Area, probably even somewhere in California outside the Bay Area, and you ask somebody how much their life has materially changed, it's pretty close to zero. Real-world inertia is enormous. Obviously, AI is probably the most transformative technology we've ever built, but in a very real sense, the world hasn't changed that much either. And that's a really weird thing, right? So I think we need more builders. We need more people building applications. It's weird to me that Speak is one of actually not that many net-new consumer AI-native applications at scale. There should be way more. I would love for there to be way more.
Consumer is hard.
Yeah. I'm intimidated, but, you know, there was just never any alternative for us.
Yeah. Like I said,
you didn't have a choice,
but also you're very smart. But maybe you have some growth-hack things you can advise people on that they could learn from. But yeah, I agree. I think the general take, actually, is that this is what we want: slow takeoff, short timeline.
That's fair, right? This is the 2x2 that everyone always talks about in AI safety. You're seeing slow takeoff, so maybe don't complain; we have a heads up. Or, you know, Dario's right, and half of us lose our jobs in the next two years. Yeah.
It's so hard to predict.
Yeah.
Sometimes I get AI anxiety
and then I just
You get anxiety.
Yeah.
Okay.
And I just focus on our users.
That's a perfect place to wrap. Thank
you so much for taking the time.
Yeah. Thank you both so much. This was great.
[Music]