Snipd: The AI Podcast App for Learning — with CEO Kevin Ben-Smith
By Latent Space
Summary
## Key takeaways

- **Snipd: AI for podcast learning**: Snipd is an AI-powered podcast app designed for users who listen to podcasts specifically to learn new information, aiming to provide a more effective spoken audio platform for knowledge acquisition. (05:03, 30:47)
- **From social clips to knowledge capture**: Initially envisioned as a social platform for sharing short podcast clips ("snips"), user behavior revealed a strong preference for listening to full episodes and actively capturing knowledge, leading Snipd to pivot towards enhancing learning and knowledge retention. (06:32, 07:57)
- **AI transforms podcast listening**: Snipd leverages AI for features like transcription, speaker diarization, chapter generation, and a chat interface for interacting with episodes, moving beyond the traditional "repurposed music player" model of podcast apps. (20:55, 56:38)
- **Personalized AI prompts are key**: The future of consumer AI apps lies in moving beyond chat interfaces to personalized, invisible AI that integrates seamlessly into user habits, allowing for tailored experiences like custom summarization prompts. (42:26, 43:47)
- **LLMs as judges for quality control**: To handle the uncertainty of LLMs, Snipd uses a "judge" LLM to select the best output from multiple candidates generated by a cheaper LLM, a technique applied to features like book recommendations and quote extraction. (48:48, 50:10)
- **Voice interface for deeper learning**: Voice AI is seen as a critical interface for Snipd to hook into existing podcast listening habits, enabling natural, in-flow conversations that enhance knowledge retention and application beyond simple consumption. (01:00:28, 01:04:26)
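The "LLMs as judges" takeaway (48:48) maps to a simple best-of-n pattern. The sketch below is a minimal illustration, not Snipd's actual implementation: `generate_candidate` and `judge_pick_best` are hypothetical stubs standing in for calls to a cheap generator model and a stronger judge model.

```python
import random

def generate_candidate(prompt: str) -> str:
    """Hypothetical stub for a call to a cheap generator LLM."""
    return f"candidate {random.randint(0, 9999)} for: {prompt}"

def judge_pick_best(prompt: str, candidates: list[str]) -> int:
    """Hypothetical stub for a stronger 'judge' LLM that sees all
    candidates and returns the index of the best one. A trivial
    longest-answer heuristic stands in for the judge's verdict here."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))

def best_of_n(prompt: str, n: int = 5) -> str:
    """Sample n cheap candidates, keep only the judge's pick."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    return candidates[judge_pick_best(prompt, candidates)]

print(best_of_n("Extract the single best quote from this episode."))
```

The appeal of this pattern is cost: the judge adds one extra call, but the n generations can run on a much cheaper model.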
Topics Covered
- Your users will tell you what your product is.
- AI will become invisible, just like electricity.
- Production AI relies on regexes, not just LLMs.
- You will soon talk to algorithms to shape your feed.
- Voice AI will turn passive listening into active learning.
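The "regexes, not just LLMs" topic can be illustrated with a cheap lexical pre-filter: only transcript chunks that plausibly mention a book get forwarded to an expensive LLM extraction call. The keyword list and function names here are illustrative assumptions, not Snipd's production rules.

```python
import re

# Cheap regex pre-filter: a chunk must contain at least one book-ish
# keyword before we spend an LLM call on it.
BOOK_HINT = re.compile(r"\b(book|author|read|wrote|novel|memoir)\b",
                       re.IGNORECASE)

def chunks_worth_llm_call(transcript_chunks: list[str]) -> list[str]:
    """Return only the chunks that pass the cheap lexical filter."""
    return [c for c in transcript_chunks if BOOK_HINT.search(c)]

chunks = [
    "so we raised our seed round in 2021",
    "the guest recommended a book called Deep Work",
    "I read Atomic Habits twice last year",
]
print(chunks_worth_llm_call(chunks))  # only the last two chunks survive
```

The regex over-triggers by design; false positives just cost one LLM call, while false negatives would silently drop a feature.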
Full Transcript
hey I'm here in New York with Kevin
Ben-Smith of Snipd welcome hi hi amazing to
be here yeah this is our first ever I
think Outdoors uh podcast recording well
it's quite a location for the first time
I say I was actually unsure because you
know it's cold it's like I check the
temperature is like kind of 1 1° Celsius
but it's not that bad with the sun no
it's quite nice yeah especially with our
beautiful tea with the tea yeah perfect
we're gonna talk about Snipd uh I'm a
Snipd user I had to basically you know
apart from Twitter it's like the the
number one used app on my phone nice um
when I when I wake up in the morning I I
open Snips and I you know see what's
what's new and I I think in terms of
time spent or usage on my phone like
it's it's it's I think it's number one
or number two nice nice so so I I really
had to talk about it also because I
think like people interested in AI want
to think about like how can they and
we're an AI podcast we have to talk
about it um but before we get
there we just finished the AI Engineer
Summit and you came for the two days
how was it uh it was quite incredible I
mean for me the most valuable was just
being in the same room with like-minded
people who are building the future and
who are seeing the future you know
especially when it comes to AI agents
it's so often I have conversations with
friends who are not in the AI world and
it's like so quickly it happens that you
it sounds like you're you're talking in
science fiction
and uh it's just crazy talk it was you
know it's so refreshing to to talk with
so many other people who already see
these things and um yeah be inspired
by them and not always feel like
like okay I think I'm just crazy and
like this will never happen it really is
happening and uh for me it was very
valuable so day two was more relevant
for you than day one yeah day
two so day two was the engineering track
uh that was definitely the most valuable
for me like also as a practitioner myself
especially there were one or two talks
that had to do with voice Ai and AI
agents with voice okay so that was uh
quite fascinating also spoke with the
speakers afterwards yeah and yeah they
were also very open and and you know
this this sharing attitude that's uh I
think in general quite prevalent in the
AI Community I also learned a lot like
really practical things that I can now
take away with me yeah I mean on my side
I I think I watched only like half of
the talks as I was running around and I
think people saw me like towards the end
I was kind of collapsing I was on the
floor like uh towards the end because I
I needed to get a rest but yeah I'm
excited to watch the voice AI talks
myself yeah yeah do that and I mean from
my side thanks a lot for organizing this
conference for bringing everyone
together do you have anything like this
in
Switzerland the short answer is no um I
mean I have to say the AI community in
especially Zurich where where we're based
yeah it is quite good and it's uh
growing uh especially driven by ETH the
the technical university there and all
of the big companies they have AI teams
there Google like Google has the biggest
tech hub outside of the US in Zurich
yeah Facebook is doing a lot in Reality
Labs Apple has a secret AI team OpenAI
and Anthropic just announced that
they're coming to Zurich yeah um so
there's a lot happening yeah so yeah uh
I think the most recent notable move I
think the entire vision team from Google
uh Lucas Beyer um and all the other
authors of SigLIP left Google to join
OpenAI which I thought was like a
big move for a whole team to move all
at once at the same time so I've been to
Zurich and it just feels expensive like
it's a great City yeah great University
but I don't see it as like a business
Hub is it a business Hub I guess it is
right like it's kind of well
historically it's uh it's a finance Hub
Finance Hub yeah I mean there are some
some large Banks there right especially
UBS uh the the largest wealth manager in
the world but it's really becoming more
of a tech Hub now with all of the big uh
tech companies there I guess yeah and
but research-wise it's all ETH and
there's some other things yeah yeah yeah
it's all driven by ETH and then uh its
sister university EPFL which is in Lausanne
okay um which they're also doing a lot
but it's it's it's really eth and
otherwise no I mean it's a beautiful
really beautiful city I can recommend to
anyone to come visit Zurich uh let me
know happy to show you around and of
course you know you have the nature so
close you have the mountains so close
you have so so beautiful lakes um I
think that's what makes it such a
livable City yeah um and the cost is not
is not cheap but I mean we're in New
York City right now and uh I don't know
I paid $8 for a coffee this morning so
uh the coffee is cheaper in Zurich than
in New York City so okay let's talk about
Snipd what is Snipd and you know we'll
talk about your origin story but I just
let's let's get a crisp what is Snipd
yeah I always see two definitions of
Snipd so I'll give you one really
simple straightforward one and then a
second more nuanced um which I think
will be valuable for the rest of our
conversation so the most simple one is
just to say look we're an AI powered
podcast app so if you listen to podcasts
we're now providing this AI enhanced
experience but if you look at the more
nuanced uh perspective it's actually
we have
a very big focus on people who like your
audience who listen to podcasts to learn
something new like your audience you
want they want to learn about AI what's
happening what's what what's the latest
research what's going on and we want to
provide a a spoken audio platform where
you can do that most effectively and AI
is basically the way that we can achieve
that yeah a means to an end yeah exactly
when you started was it always meant to
be AI or was it was it more about the
social sharing
so the first version that we ever
released was like 3 and a half years ago
okay yeah so this was before ChatGPT
and Whisper yeah before Whisper yeah so I
think a lot of the features that we now
have in the app they weren't really
possible yet back then but we already
from the beginning we always had the
focus on knowledge that's the reason why
you know we and our team why we listen
to podcasts but we did have a bit of a
different approach like the idea in the
very beginning was so the name is
Snipd and you can create these what we
call
Snips uh which is basically a small
snippet like a clip from a from a
podcast um and we did Envision sort of
like a like a social Tik Tok platform
where some people would listen to full
episodes and they would snip certain
like the best parts of it and they would
post that in a feed and other users
would consume this feed of snips um and
use that as a discovery tool or just as
a means to an end and so you would have
both people who create Snips and people
who listen to Snips so our big
hypothesis in the beginning was you know
it will be easy to get people to listen
to these Snips but super difficult to
actually get them to create them so we
focused a lot of uh lot of our effort on
making it as seamless and easiest
possible to create a snip yeah it's
similar to TikTok you you need CapCut
for for there to be videos on Tik Tok
exactly exactly and so for for snip
basically whenever you hear an amazing
Insight a great moment you can just
triple tap your headphones and our AI
actually then saves the moment that you
just uh listen to and summarizes it to
to create a note and this is then
basically a snip so yeah we built we
built all of this uh launched it and
what we found out was basically the
exact
opposite so we saw that people don't
really use the snips to discover
podcasts they really you know really
love listening to long form uh podcasts
but they were creating snips like crazy
and this was this was definitely one of
these aha moments when we realized like
hey we should be really doubling down on
the knowledge of learning of yeah
helping you learn most effectively and
helping you capture the knowledge that
you listen to and actually do something
with it because this is in general you
know we we live in this world where
there's so much content and we consume
and consume and consume and it's so easy
to just at the end of the podcast you
just start listening to the next podcast
and 5 minutes later you've forgotten 90%
99% of what you've actually just learned
yeah yeah you don't know this but and
and most people don't know this but this
is my fourth podcast my third podcast
was a personal mixtape podcast where I
snipped manually sections of podcast
that I liked and added my own commentary
on top of them and published them in
small episodes nice so those would be
maybe 5 to 10 minute clips of something
that I thought was a good story or like
a good insight and then I added my own
commentary and published it as a
separate podcast it's cool is that still
live it's still live but it's not active
but you can go back and find it if
you're if if you're curious enough
you'll see it nice nice yeah you have to
show me later but it was so manual uh
because basically what my process would
be I hear something interesting I note
down the timestamp and I note down the
URL of the podcast I used to use
Overcast so it'll just link to the
Overcast page and then put it in my
notetaking app go home uh whenever I
feel like publishing I will take one of
those things and then download the MP3
clip out the MP3 and record my intro
outro and then publish it as a as a
podcast but now Snipd I mean I can just
kind of double click or triple tap I
mean those are very similar stories to
what we hear from our users you know
it's it's normal that you're doing
you're doing something else while you're
listening to a podcast so a lot of our
users they're driving they're working
out walking their dog so in those
moments when you hear something amazing
it's difficult to just write them down
or you know you have to take out your
phone and some people take a screenshot
write down a time stamp and then later
on you have to go back and try to find
it again of course you can't find it
anymore there's no search there's no
command F and um these these were all of
the issues that that that we encountered
also ourselves as users and given that
our background was in AI we realized
like wait hey this is this should not be
the case like podcast apps today they're
still they're basically repurposed music
players but we actually look at podcast
as one of the largest sources of
knowledge in the world and once you have
that different angle of looking at it
together with everything that AI is now
enabling you realize like hey this is
not the way that we that podcast apps
should be yeah yeah agre uh you
mentioned something there you said your
background is in AI first of all who's
the team and what do you mean your
backgrounds in AI those are two very
different
questions um maybe starting with with my
backstory yeah my backstory actually
actually goes back like let's say 12
years ago or something like that I moved
to Zurich to study at eth and actually I
studied something completely different I
studied mathematics and economics
basically with this this specialization
for Quant Finance same okay wow all
right so yeah then as you know all of
these mathematical models for um asset
pricing derivative pricing quantitative
trading and for me the thing that that
fascinated me the most was the
mathematical modeling behind it uh
mathematics uh statistics but I was
never really that passionate about the
finance side of things really oh okay
yeah I mean okay we're we're different
there I mean one just let's say symptom
that I noticed now like like looking
back during that time I think I never
read an academic paper about the subject
in my free time and then it was towards
the end of my studies I was already
working for a big bank one of my best
friends he comes to me and says hey I
just took this course you have to you
have to do this you have to take this
lecture okay and I'm like what what what
is it about it's called machine learning
and I'm like what what what kind of
stupid name is
that uh so he sent me the slides and
like over a weekend I went through all
of the slides and I just I I just knew
like freaking hell like this is it I'm
I'm in love wow yeah okay and that was
then over the course of the next I think
like 12 months I just really got into it
started reading all about it like
reading blog post starting building my
own models was this course by a famous
person famous university was it like
a Coursera thing or uh no so this was
an ETH course ah so a professor at ETH
was it taught in English by the way or
yeah yeah okay so these slides are
somewhere available yeah yeah definitely
I mean now they're quite outdated yeah
yeah sure sure sure well I think you
know reflecting on the finance thing for
a bit so I I was used to be a Trader uh
sside and buy side I was options Trader
first and then I I was more of like a
quantitative uh hedge fund uh analyst we
never really use machine learning it was
more like a little bit of statistical
model but really like you you fit you
know your
regression no I mean that's that's what
it is and or you you solve partial
differential equations and have then
numerical methods to to to solve these
that's that's for your degree that's
that's not really what you do at work
right unless what I don't know what you
do at work in my job no no we weren't
solving the PDEs yeah you learn all this in
school and then you don't use it I mean
we we well let's put it like that um in
some things yeah I mean I did code
algorithms that would do it but but it
was basically like it was the the most
basic algorithms and then you just like
slightly improved them a little bit like
you just tweaked them here and there it
wasn't like starting from scratch like
oh here's this new partial differential
equation like how do we no um yeah yeah
I mean that's that's real life right
most most of it's kind of boring or
you're you're using established things
because they're established because uh
they tackle the most important topics um
yeah portfolio management was more
interesting for me um and uh we we were
sort of the first to combine like social
data with with quantitative trading and
I think I I think now it's very common
but um yeah then you you went went deep
on machine learning and then what you
quit your job yeah yeah I quit my job
because it um I mean I started using it
at at the bank as well like try like you
know I like desperately tried to find
any kind of excuse to like use it here
or or there but it just was clear to me
like no if I want to do this um like I
just have to like make a real cut so
I quit my job and joined an early stage
uh tech startup in Zurich where I then
built up the AI team over 5 years wow
yeah we built various machine learning
uh things for for Banks from like models
for for sales teams to identify which
clients like which product to sell to
them and and with what reasons all the
way to we did a lot lot with bank
transactions one of the actually most
fun projects for me was we had an NLP
model that would take the booking text
of a transaction like a credit card
transaction and prettified it yeah
because they had all of these you know
like numbers in there and abbreviations
and whatnot and sometimes you look at
it like what what is this and it was
just you know it would just change it to
I don't know CVS yeah yeah but I mean
would you have hallucinations no no no
the way that everything was set up it
wasn't like it wasn't yet fully yeah end
to end generative uh neural network as
what you would use today okay okay yeah
awesome and and then when did you go
like fulltime on Snipd yeah so basically
that was that was afterwards I mean how
that started was the friend of mine who
got me into machine learning uh him and
I uh like he also got me interested into
startups he's had a big impact in my
life and the two of us would just uh jam
on on like ideas for startups every now
and then and his background is also in
AI data science and we had a couple of
ideas but given that we were working
full time we were thinking about it uh so
we participated in HackZurich that's
Europe's biggest hackathon um or at
least was at the time and we said hey
this is just a weekend let's just try
out an idea like hack something together
and see how it works and the idea was
that we'd be able to search through
podcast episodes like within a podcast
yeah so we did that long story short uh
we managed to build
something where we realized hey this
actually works you can you can find
things again in podcasts with like a
natural language search and we pitched
it on stage and we actually won the
hackathon which was cool I mean we we
also I think we had a good um like a
good good pitch or good example so we we
used the famous Joe Rogan episode with
Elon Musk where Elon Musk smokes a joint
okay um it's like a two and a half hour
episode so we were on stage and then we
just searched for like smoking weed and
it would find that exact moment and we
will play it and it just like come on
with Elon Musk just like smoking also it
was video as well no it was actually
completely based on audio but we did
have the video for the presentation
which had a had of course an amazing
effect yeah like this gave us a lot of
activation energy but it wasn't actually
about winning the hackathon yeah but the
interesting thing that happened was
after we pitched on stage several of the
other participants like a lot of them
came up to us and started saying like
hey can I use this like I I have this
issue and like some also came up and
told us about other problems that they
have like very adjacent to this with a
podcast where it's like like could could
I use this for that as well and that was
basically the the moment where I
realized hey it's actually not just us
who are think who are having these
issues with with podcast and getting to
the making the most out of this
knowledge yeah um they other people yeah
that was now I guess like four years ago
or something like that and then yeah we
decided to quit our jobs and start start
this whole Snipd thing yeah how big is
the team now uh we're just four people
we just four people yeah like four we
all technical yeah basically two on the
the backend side so one of my
co-founders is this person who got me
into machine learning and startups and
we won the hackathon together so we have
two people for the backend side with the
AI and and all of the other backend
things and two for the front end side
building the app which is mostly Android
and uh iOS yeah it's IOS and Android we
also have a watch app for for Apple but
yeah it's mostly those yeah the watch
thing uh it was very funny cuz in the in
the Latent Space Discord you know most of us
have been slowly adopting Snipd you came
to me like a year ago and uh you
introduced Snipd to me I was like I don't
know I'm you know I'm very sticky to
Overcast then slowly we switched um why
watch so it goes back to a lot of our
users they do something else while while
listening to a podcast right and for
us giving them the ability to then
capture this knowledge even though
they're doing something else at the same
time is one of the killer features um
maybe I can actually maybe at some point
I should maybe give a bit more of an
overview of what the all of the features
that we have sure so this is one of the
the the killer features and for one big
use case that people um use this for is
for running yeah so if you're a big
Runner a big jogger or cycling like
really really cycling um competitively
and a lot of the people they don't want
to take their phone with them when they
go running so you load everything onto
the watch so you can download episodes I
mean if you if you have an Apple watch
that has internet access like with a SIM
card you can also directly stream um
that's also possible yeah of course it's
a it's basically very limited to just
listening and snipping and then you can
see all of your Snips later on your
phone let me tell you this error I just
got error playing episode Substack the
host of this podcast does not allow
this podcast to be played on Apple Watch
yeah that's a very beautiful thing so we
found out that all of the podcasts
hosted on Substack you cannot play
them on on Apple Watch what is this
restriction what like don't ask me we
tried to reach out to Substack we tried to
reach out to some of the bigger
podcasters who are hosting their their
podcast on Substack to also let them
know um Substack doesn't seem to care
this is not specific to our app you can
also check out the Apple podcast app
it's the same problem it's just that we
actually have identified it and we we
tell the user what's going on I would
say I've been you know we host it we
host our podcast on on Substack but
they're not very serious about their
podcasting tools I've told them before
I've been very upfront with them so I
don't feel like I'm you know [ __ ] on
them in any way and uh it's kind of it's
kind of sad because otherwise it's a
perfect Creator platform but the way
that they treat podcasting as a as an
afterthought I think it's really
disappointing maybe given that you
mentioned all these features maybe I can
give a bit of a better overview of the
features that that we have because for
us it's clear in our minds maybe for for
some of the I mean okay I'll tell you
I'll tell you my version you can correct
me right so first of all I think the
main job is for it to be a podcast
listening app um it should be basically
a complete super set of what you
normally get on overcast or apple
podcast anything like that you pull your
show list from from Listen Notes like
how do you how do you find shows like
got type in anything you you find them
right uh yeah we have we have a search
engine that is powered by Listen Notes
but I mean in the meantime we have a
huge database of like 99% of all
podcasts out there ourselves yeah what I
noticed that the default experience is
you do not Auto download shows and
that's that's one very big difference
for you guys versus other apps uh where
like you know if I'm subscribed to a
thing it auto downloads and I already
have the MP3 downloaded over overnight
for me I have to put actively put it
onto my queue then it auto downloads and
actually I initially didn't like that I
think I maybe told you that I was like
oh it's like a feature that I don't like
because it means that I have to choose
to listen to it in order to download and
not to this is like optin the difference
between optin and opt out so I opt into
every episode I listen to and then like
you know you open it and depends on
whether or not you have the ai ai stuff
enabled but the default experience is no
no AI stuff enabled you can listen to it
you can see the Snips the number of
snips and where people snip during the
episode which roughly correlates the
interest level and obviously you can
snip there I think that's the default
experience U I think snipping is really
cool like I use it to share a lot in our
Discord I think we have tons and tons of
just people sharing Snips of stuff and
tweeting stuff is also like a nice
pleasant experience but like the real
features come when you actually turn on
the AI stuff and and so the reason I got
Snipd was because I got fed up with
Overcast not implementing any AI features
at all instead they spent 2 years
rewriting their app to be a little bit
faster and I'm like like it's 2025 I
should have a podcast that has
transcripts that I can search very very
basic thing overcast will basically
never have it yeah I think that was a
was a good like basic overview maybe I
can uh please add a bit to it with the
with the the AI features that we have so
one thing that we do every time a new
podcast comes out we uh transcribe the
episode we do speaker diarization we
identify the speaker names each guest we
extract a mini bio of the guest uh try
to find a picture of the guest online
add it we break the the podcast down
into chapters uh as an AI generated
chapters with that one's very handy with
a quick description per uh title and
quick description for each uh chapter we
identify all books that get mentioned
on a on a podcast uh you can tell I
don't use that
one it depends on a podcast there there
are some podcast where the guests often
recommend like an amazing book so later
on you can you can find that again so
literally you search for the word book
or no I just read blah blah blah uh no I
mean it's it's all llm based so
basically we have we have an llm that
goes through the entire transcript and
identifies if a user uh mentions a book
then we use perplexity API together with
various other llm orchestration to go
out there in the internet find
everything that there is to know about
the book find the cover find who or what
the who the author is uh get a quick
description of it for the author we then
check on which other episodes the author
appeared on yeah that is killer because
that like for me if if if there's an
interesting book the first thing that I
do is I actually listen to a podcast
episode with the with a writer because
he usually gives a really great overview
already on a on a podcast sometimes the
podcast is with the person as a guest
sometimes it's podcast is about the
person without him there do you pick up
both so yes we pick up both in like our
latest models but actually what we show
you in the app the goal is to currently
only show you the guest to separate that
in the future we want to show the other
things more but for what it's worth I
don't mind yeah I don't think like if I
like if I like somebody I'll just learn
about them regardless of whether they or
not yeah I mean yes or no we we we have
seen there are some personalities where
this can break down so for example the
first version that we released with this
feature it picked up much more often a
person even if it was not a guest yeah
for example the the best examples for me
is Sam Altman and Elon Musk like they're
just mentioned on every second podcast
and it has like they're not on there and
if you're interested in actually like
learning from them yeah I see um yeah we
updated uh our our algorithms improved
that a lot and now it's gotten much
better to only pick it up if they're a guest
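The book feature described above (an LLM pass over the transcript, then web enrichment via Perplexity's API and further LLM orchestration, then a lookup of episodes where the author appears as a guest) has roughly this orchestration shape. Every function body below is a labeled stub with made-up sample data; only the overall flow comes from the conversation.

```python
from dataclasses import dataclass, field

@dataclass
class BookMention:
    title: str
    author: str = ""
    description: str = ""
    guest_episodes: list[str] = field(default_factory=list)

def extract_book_titles(transcript: str) -> list[str]:
    """Stub for the LLM pass over the full transcript that flags
    explicitly mentioned book titles."""
    return ["Deep Work"]  # hard-coded sample result

def enrich(title: str) -> BookMention:
    """Stub for the web-enrichment step (Perplexity API plus LLM
    orchestration in the episode's description): author, cover, blurb."""
    return BookMention(title=title, author="Cal Newport",
                       description="A book about focused work.")

def episodes_with_author_as_guest(author: str) -> list[str]:
    """Stub: search the podcast index for episodes where the author
    actually appears as a guest, not merely being mentioned."""
    return ["Sample episode: interview with Cal Newport"]

def book_pipeline(transcript: str) -> list[BookMention]:
    books = []
    for title in extract_book_titles(transcript):
        book = enrich(title)
        book.guest_episodes = episodes_with_author_as_guest(book.author)
        books.append(book)
    return books
```

The guest-only filter in the last step mirrors the fix discussed above: matching on mere mentions over-triggers for names like Sam Altman and Elon Musk.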
um yeah so this this is um maybe to come
back to the features two more important
features like we have the ability to
chat with an episode yes of course you
can do the old style of searching
through a transcript with a keyword
search but I think for me this is this
is how you used to do search and
extracting knowledge in the in the past
old school um the AI way is is basically
an llm uh so you can ask the llm hey
when do they talk about topic X if
you're interested in only a certain part
of the episode you can ask them for for
to give a um quick overview of the
episode key takeaways um afterwards also
to create a note for you so this is
really like very open open-ended and
yeah and then finally the Snipping
feature that we mentioned just to
reiterate yeah I mean here the the
feature is that whenever you hear an
amazing idea you can triple tap your
headphones or click a button in the app
and the AI summarizes the Insight you
just heard uh and saves that together
with the original transcript and audio
in your knowledge Library I also noticed
that you you skip dynamic content um so
Dynamic content we do not skip it
automatically oh sorry you detect but we
detect it yeah I mean that's one of the
thing that most people don't don't
actually know that like the way that ads
get inserted into podcasts or into most
podcasts is actually that every time you
listen to a podcast you actually get
access to a different audio file and on
the server uh a different ad is inserted
into the MP3 file automatically yeah
based on IP exactly and um um that what
that means is if we transcribe an
episode and have a transcript with time
stamps like word-specific time
stamps if you suddenly get a different
audio file like the whole timestamps are
messed up and that's like a huge issue
and for that we actually had to build
another algorithm that would dynamically
on the Fly resync the audio that you're
listening to the transcript that we have
yeah which is a fascinating problem in
and of itself you you think by matching
up the sound waves or like you think by
matching up words like basically do
partial transcription we're not matching
up words it's it's happening on the
basically like a bites level matching
yeah okay so it relies it relies
on there being exact matches at some point
uh so it's actually not uh we're
actually not doing exact matches but
we're doing fuzzy matches wow to to
identify the the moment it's basically
um we basically built Shazam for podcasts
uh just as a little side project to to
solve this issue yeah yeah actually fun
fun fact apparently the Shazam algorithm
is open they published the paper
I talked about it yeah I I haven't
really dived into the paper I thought it
was kind kind of interesting that
basically no one else has built
Shazam yeah I mean well the one thing is
the algorithm like if you now talk about
Shazam right the other thing is also
having the uh the database behind it
and having the user mindset that if they
have this problem they come to you right
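The re-sync problem described above (server-side ad insertion shifts every timestamp, so the app fuzzily matches the audio the listener actually has against the reference audio that was transcribed) can be sketched as a toy fingerprint-and-slide search. Real systems hash spectral peaks Shazam-style, and Snipd describes their matching as fuzzy and bytes-level; the windowed averages below are only a stand-in for illustration.

```python
def fingerprints(samples: list[int], window: int = 3) -> list[int]:
    """Toy fingerprint: mean of each non-overlapping window of samples.
    (A real system would hash spectral peaks instead.)"""
    return [sum(samples[i:i + window]) // window
            for i in range(0, len(samples) - window + 1, window)]

def best_offset(reference_fp: list[int], snippet_fp: list[int],
                tolerance: int = 2) -> int:
    """Fuzzy alignment: slide the snippet over the reference and return
    the offset (in windows) with the most near-matching fingerprints."""
    best, best_score = 0, -1
    for off in range(len(reference_fp) - len(snippet_fp) + 1):
        score = sum(abs(a - b) <= tolerance
                    for a, b in zip(reference_fp[off:], snippet_fp))
        if score > best_score:
            best, best_score = off, score
    return best

# Reference audio (what was transcribed) vs. the listener's file,
# where a server-side ad was spliced in before the content starts.
reference = [10, 11, 12, 50, 51, 52, 90, 91, 92, 30, 31, 32]
live = [0, 0, 0] + reference       # the ad occupies the first window
snippet = fingerprints(live)[1:4]  # what the user is hearing right now
print(best_offset(fingerprints(reference), snippet))  # prints 0
```

The returned offset maps the live playback position back onto the reference timeline, so the word-level transcript timestamps line up again despite the inserted ad.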
yeah yeah yeah I'm very interested in
the tech stack there's a big data
pipeline could you share like you know
what is the tech stack what are the you
know the most interesting or challenging
pieces of it so the general tech stack is
our entire back end is or 90% of our
back end is written in Python okay
hosting everything on uh Google Cloud uh
platform and our front end is uh written
with well we're using the Flutter um
framework ah so it's written in Dart and
then but compiled natively so we have
one code base for that handles both
Android and iOS you think that was a
good decision that's something that a
lot of people are exploring um so up
until now yes okay look it it has its
pros and cons some of the you know for
example earlier I mentioned we have a
Apple Watch app yeah I mean that there's
no flatter for that right so that you
build native and then of course you have
to sort of like sync these things
together I mean I'm not the front end
engineer so I'm not just relaying this
information but our front front end
engineers very happy with it it's
enabled us to be quite fast and be on
both platforms from from the very
beginning and when I when I talk with
people and they hear that that we are
using flatter usually they you know they
they think like ah it's not performant
it's super junk juny and and everything
and then they use our app and they're
always super surprised or if they've
already used their app tell them they're
like what um so there is actually a lot
that you can do the danger the the the
concern there's a few concerns right one
it's Google so when would they when they
going to abandon it two it you know
they're they're optimized for Android
first so iOS is like a second second
thought or like you can feel that it is
not a native IOS app uh but you guys put
a lot of care into it and then maybe
three from my point of view JavaScript
as a JavaScript guy react native was
supposed to be that dream and I think
that it hasn't really fulfilled that
dream U maybe Expo is trying to do that
but um again it is not does not feel as
productive as flutter and I I spent a
week on flutter and d and I'm an
investor flutter flow which is the local
uh flutter flutter startup that's doing
very very well I think a lot of people
are still flutter Skeptics yeah wait so
are you moving away from flutter uh no
we don't have plans to do that you're
just saying about the the watch okay
let's go back to the stack U you know
that was just to give you a bit of an
overview I think the more interesting
things are of course on the AI side yeah
So, as I mentioned earlier, when we started out it was before ChatGPT, before the ChatGPT moment, before there was the GPT-3.5 Turbo API. So in the beginning we were actually running everything ourselves, open-source models, trying to fine-tune them. They worked, but let's be honest, the results weren't...

What was the SOTA before Whisper for transcription?

We were using wav2vec.

That was a Google one, right?

No, it was a Facebook one. That was actually one of the papers that, when it came out, was one of the reasons why I said we should try to start a startup in the audio space. Before that I had been following the NLP space quite closely, and as I mentioned earlier, we did some stuff at the startup I was working at before. And wav2vec was the first paper that I had at least seen where the whole Transformer architecture moved over to audio.

Yeah.

A bit more general way of saying it: it was the first time that I saw the Transformer architecture being applied to continuous data instead of discrete tokens. And it worked amazingly. The Transformer architecture plus self-supervised learning, these two things moved over, and then for me it was like, hey, this is now going to take off similarly to how the text space has taken off. And with these two things in place, even if some features we want to build are not possible yet, they will be possible in the near term, with this trajectory. So that's a little side note. In the meantime, yeah, we're using Whisper. We're still hosting some of the models ourselves, so for example the whole transcription and speaker diarization pipeline.

You need it to be as cheap as possible.

Yeah, exactly. I mean, we're doing this at scale, where we have a lot of audio that we're processing.

What numbers can you disclose? Just to give people an idea, because it's a lot.

So we have more than a million podcasts that we've already processed.

When you say a million: so processing is basically, you have some kind of list of podcasts that you auto-process, and others where a paying member can choose to press a button and transcribe it, right? Is that the rough idea?
Yeah, exactly. And when you press that button, or we auto-transcribe it: first we do the transcription, then we do the speaker diarization, so you basically identify speech blocks that belong to the same speaker. This is then all orchestrated with an LLM to identify which speech block belongs to which speaker, together with, as I mentioned, the guest name and the bio that we identify. All of that comes together with an LLM to actually then assign speaker names to each block. And then most of the rest of the pipeline we've now migrated to LLM APIs. So we use mainly OpenAI and Google models, so the Gemini models and the OpenAI models, and we use some Perplexity, basically for those things where we need web search. That's something where I'm still hoping especially OpenAI will also provide us an API.

Oh, why?

Well, basically, for us as a consumer: the more providers there are, the more competition, you know, redundancy against downtime, and it will lead to better results and lower costs over time. I don't see Perplexity as expensive; if you use the web search, the price is like $5 per 1,000 queries, which is affordable. But if you compare that to just a normal LLM call, it's much more expensive.

Have you tried Exa?

We've looked into it, but we haven't really tried it. I mean, we started with Perplexity, and it works well, and if I remember correctly, Exa is also a bit more expensive. I don't know.

They seem focused on the search thing, as a search API, whereas Perplexity is maybe more of a consumer business with higher margin. I'll put it like this: Perplexity is trying to be a product, Exa is just trying to be infrastructure. That would be my distinction there. And then the other thing I will mention is that Google has a search grounding feature.

Yeah, we've also tried that out. Not as good. We didn't go into too much detail in really comparing it quality-wise, because we actually already had the Perplexity one and it's working. I think also there the price is actually higher than Perplexity.

Really? Google should cut their prices.

Maybe it was the same price. I don't want to say something incorrect, but it wasn't cheaper, it wasn't compelling, and then there was no reason to switch. Maybe in general: for us, given that we work with a lot of content, price is actually something that we do look at. For us it's not just about taking the best model for every task, but really identifying what kind of intelligence level you need and then getting the best price for that, to be able to really scale this and let our users use these features with as many podcasts as
possible.

Yeah. I wanted to double-click on diarization. It's something that I don't think people do very well. So, you know, I'm a Bee user (I don't have it right now, and they were supposed to speak but they dropped out last minute), but we've had them on the podcast before, and it's not great yet. Do you just use pyannote, the default stuff, or do you find any tricks for diarization?

So we do use the open-source packages, but we have tweaked it a bit here and there. For example, since you mentioned the Bee AI guys: I actually listened to that podcast episode, which was super nice, and when you started talking about speaker diarization, I just had to think about their use case. With all of the different environments, it could basically be anything. It's completely out of domain; there's no data for this. I mean, I was feeling for them, because our advantage is that we're working with very high-quality audio. It's very controlled, usually recorded in a studio.

This is quite an exception. I guess it is kind of a studio; it's pretty quiet, there's consistent background noise, which you can edit out. This is New York. It's nice, it's character.

No, so that of course helps us. Another thing that helps us is that we know certain structural aspects of the podcast. For example, how often does someone speak? Let's say there's a one-hour episode and someone speaks for 30 seconds: that person is most probably not the guest and not the host. It's probably some speaker from an ad. So we have certain of these...

Heuristics?

Heuristics, yeah, exactly, that we can use and leverage to improve things. And in the past we've also changed the clustering algorithm. Basically, how a lot of speaker diarization works is: you create an embedding for the speech that's happening, and then you try to somehow cluster these embeddings and find, ah, this is all one speaker, this is all another speaker. And there we've also tweaked a couple of things, where we again used heuristics that we could apply from knowing how podcasts function. And that's also actually why I was feeling so much for the Bee guys: for them it's probably almost impossible to use any heuristics, because it can just be any situation, anything. So that's one thing that we do.
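The talk-time heuristic described above (a "speaker" with barely any airtime in a long episode is likely an ad read, not the host or guest) is simple to express over diarized segments. A sketch, where the 2% threshold and the segment tuple format are illustrative assumptions, not Snipd's actual values:

```python
from collections import defaultdict

def flag_probable_ad_speakers(segments, episode_seconds, min_share=0.02):
    """Flag diarized speakers with a tiny share of total talk time.

    segments: iterable of (speaker_id, start_sec, end_sec) tuples.
    Returns the set of speaker ids whose cumulative talk time is under
    min_share of the episode; in a one-hour show, someone with only a
    few seconds of speech is most probably an ad read, per the
    heuristic in the conversation. The threshold is an assumption."""
    talk = defaultdict(float)
    for speaker, start, end in segments:
        talk[speaker] += end - start
    return {spk for spk, t in talk.items() if t / episode_seconds < min_share}
```

A cluster flagged this way can then be excluded when the LLM assigns host and guest names to speech blocks.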
Yeah. Another thing is that we actually combine it with LLMs: the transcript, the LLMs, and the speaker diarization, bringing all of these together to recalibrate some of the switching points, like when does this speaker stop and when does the next one start.

The LLM can add errors as well, you know. I wouldn't feel safe using them to be so precise.

I mean, at the end of the day, just to not give a wrong impression: the speaker diarization we're doing is also not perfect.

I basically don't really notice it. I use it for search.

Yeah, it's not perfect yet, but it's gotten quite good. Especially if you take a latest episode and compare it to an episode that came out a year ago, we've improved it quite a bit.

Wow, it's beautifully presented. Oh, I love that I can click on the transcript and it goes to a timestamp. So simple, but you know, it should exist.

Yeah, I agree. I agree.

So I'm loading a two-hour episode of the Techmeme Ride Home, where there are a lot of different guests calling in, and you've identified the guest names.

Yeah, these are all LLM-based.

It's really nice.

Yeah, the speaker names.

I would say, and obviously I'm a power user of all these tools, you have done a better job than Descript.

Okay, wow.

Descript has so much funding. They had OpenAI invest in them, and they still suck. So, I don't know, keep going, you're doing great.

Thanks. I mean, I would say, especially for anyone listening who's interested in building a consumer app with AI, and especially if your background is in AI and you love working with AI: I think the most important thing is just to keep reminding yourself of what's actually the job to be done here. What does the consumer actually want? For example, you were just delighted by the ability to click on this word and have it jump there. This is not rocket science. You don't have to be, I don't know, Andrej Karpathy to come up with that and build it. And I think that's something that's super important to keep in mind.

Yeah. Amazing. I mean, there are so many features; it's so packed. There are quotes that you pick up,
the summarization. Oh, by the way, I'm going to use this as my official feature request: I want to customize how it summarizes. I want to have a custom prompt, because your summarization is good, but I have different preferences, right?

So, one thing that you can already do today... I completely get your feature request, and I'm sure people have asked for it. Maybe just in general, as to how I see the future: in the future, I think everything will be personalized. This is not specific to us. And today we're still in a phase where the cost of LLMs, at least if you're working with context windows as long as ours (there are a lot of tokens in an entire podcast), is something you still have to take into consideration. So if we regenerate everything for every single user, it gets expensive. But in the future that cost will continue to go down, and then it will just be personalized. That being said, you can already today go to the player screen, open up the chat, and just ask for a summary in your style.

Okay. I mean, I listen to consume, you know. I've never really used this feature. I don't know, I think that's me being a slow adopter. Where does the conversation start?

You can just type anything. I think what you're describing, and maybe this is also an interesting topic to talk about: basically, I told you, look, we have this chat, you can just ask for it. And this is how ChatGPT works today. But if you're building a consumer app, you have to move beyond the chat box. People do not want to always type out what they want. So your feature request, even though theoretically it's already possible: what you are actually asking for is, hey, I just want to open up the app, and it should just be there, in a beautiful form, such that I can read it or consume it without any issues. And I think that's in general where a lot of the opportunities currently lie in the market, if you want to build a consumer app: taking the capability and the intelligence, but finding out what the actual user interface is, the best way a user can engage with this intelligence in a natural way.

This is something I've been thinking about as kind of AI that's not in your face. Because right now, you know, we like to say, oh, Notion has Notion AI, we have the little thing there; or any other platform has the sparkle-magic emoji: that's our AI feature, use this. And it's really in your face. A lot of people don't like it. It should just kind of become invisible. Kind of like an invisible AI.

100%. The way I see it, AI is the electricity of the future. We don't say this microphone uses electricity, or this phone; you don't think about it that way. It's just in there, right? It's not an electricity-enabled product; it's just a product. It will be the same with AI. Now it's still something that you use to market your product (we do the same) because it's still something people notice: ah, they're doing something new. But at some point, no, it'll just be a podcast app, and it will be
normal that there's AI in there.

I noticed you do something interesting in your chat, where you source the timestamps. Is that part of the prompt? Is there a separate pipeline that adds sources?

This is actually part of the prompt. This is all prompt engineering. You should be able to click on it.

Yeah, I clicked on it.

This is all prompt engineering: how to provide the context (because we provide the whole transcript), then getting the model to respond in the correct way, with a certain format, and then rendering that on the front end. This is one of the examples where I would say it's so easy to create a quick demo. You can just go to ChatGPT, paste this thing in, say, do this; 15 minutes and you're done. But getting this to the production level where it actually works 99% of the time, that is where the difference lies. For this specific feature, we actually also have countless regexes that are just there to correct certain things the LLM is doing, because it doesn't always adhere to the format correctly, and then it looks super ugly on the front end. So we have certain regexes that correct that. And maybe you'd ask, why don't you use an LLM for that? Because that's, again, the AI-native way; who uses regexes anymore? But with the chat, for the user experience it's very important that you have streaming, because otherwise you need to wait so long until your message arrives. So we're streaming live, just like ChatGPT: you get the answer and it's streaming the text. And if you're streaming the text and something is incorrect, it's currently not easy to just pipe this, stream this, into another...

Yeah, stream this into another stream, and get the stream back, which corrects it. That would be amazing. I don't know, maybe you can answer that: do you know of any?

There's no API that does this.

Yeah, you cannot stream in. If you own the models, you can: whatever token sequence has been emitted, start loading that into the next one, if you fully own the models. But it's probably not worth it. And I think most engineers who are new to AI research and benchmarking actually don't know how much regexing goes on in normal benchmarks. It's just this ugly list of like 100 different matches for some criteria that you're looking for.
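The streaming constraint discussed here (you can't pipe a live token stream through a second LLM, so regexes patch format slips on the fly) can be sketched as a small buffered cleaner. The citation formats and fix-up patterns below are invented for illustration; they are not Snipd's actual rules:

```python
import re

# Illustrative assumption: the model was asked to cite sources as
# [mm:ss], but sometimes emits variants like (12:34) or [12.34].
# These fix-ups run on the stream before it reaches the UI.
FIXUPS = [
    (re.compile(r"\((\d{1,2}):(\d{2})\)"), r"[\1:\2]"),   # (12:34) -> [12:34]
    (re.compile(r"\[(\d{1,2})\.(\d{2})\]"), r"[\1:\2]"),  # [12.34] -> [12:34]
]

def clean_stream(chunks):
    """Apply regex fix-ups to streamed text chunks.

    Holds back a short tail of the buffer so a pattern split across
    chunk boundaries can still be repaired, then flushes at the end."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        for pattern, repl in FIXUPS:
            buffer = pattern.sub(repl, buffer)
        if len(buffer) > 8:
            # emit everything except a tail that might hold a partial match
            yield buffer[:-8]
            buffer = buffer[-8:]
    for pattern, repl in FIXUPS:
        buffer = pattern.sub(repl, buffer)
    yield buffer
```

The held-back tail is the trade-off: a few characters of extra latency in exchange for never rendering a half-corrected citation.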
Yeah. No, it's very cool. I think it's an example of real-world engineering. Do you have tooling that you're proud of, that you've developed for yourselves? Or is it just a test script?

I think it's a bit more... I guess the term that has come up is vibe coding. Well, no, sorry, vibe coding is actually something else in this case. But "vibe evals" was a term that came up in one of the talks, I think on the first day of the conference; someone brought that up. A lot of the talks were about evals, which is so important, and I think for us it's a bit more vibe evals. That's also part of being a startup: we can take risks. We can take the cost of it maybe sometimes failing a little bit, or being a little bit off, and our users know that, and in return they appreciate that we're moving fast and iterating and building amazing things. But at, you know, Spotify or something like that, half of our features would probably be in a six-month review through legal, or I don't know what, before they could ship.

Let's just say Spotify is not very good at podcasting. I have a documented dislike for their podcast features, just overall. Yours are really well integrated. Any other sort of LLM-focused engineering challenges or problems
that you want to highlight?

I think it's not unique to us, but it goes again in the direction of handling the uncertainty of LLMs. For example, at the end of last year we did sort of a Snipd Wrapped, and we thought it would be fun to do something with an LLM and the snips that the user has. Let's say three unique LLM features. One was that we assigned a personality to you, based on the snips that you have. I guess it was all a bit of a fun, playful thing.

I'm going to look at mine. I forgot mine already.

I don't know whether it's actually still in the...

No, no, we all took screenshots of it. We posted it in the Discord.

The second one was a learning scorecard, where we identified the topics that you snipped on the most, and you got a little score for that. And the third one was a quote that stood out. And the quote is actually a very good example. We would run that for a user, and most of the time it was an interesting quote, but every now and then it was a super boring quote, where you'd think, why did you select that? Come on. The solution there was actually just to say, hey, give me five candidates. So it extracted five quotes as candidates, and then we piped them into a different model as a judge, LLM-as-a-judge, and there we used a much better model. Because with the initial model, as I also mentioned earlier, we do have to look at the costs, since there's so much text going into it, so there we use a somewhat cheaper model. But then the judge can be a really good model, to then just choose one out of the five. So this is a practical example.

I can't find it. Bad search in Discord. So you do recommend having a much smarter model as a judge?

Yeah.

And that works for you?

Yeah, yeah. Interesting. I think this year I'm very interested in LLM-as-a-judge being more developed as a concept. I think for things like Snipd Wrapped it's fine; it's entertaining, there's no right answer.

I mean, we also use the same concept for our books feature, where we identify the mentioned books. Because there it's the same thing: 90% of the time it works perfectly out of the box, one-shot, and every now and then it just starts identifying books that were not really mentioned, or that are not books, or starts making up books. And there we basically have the same thing of another LLM challenging it. And actually with the speakers we do the same, now that I think about it. So I think it's a great technique.
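The judge pattern described above (a cheap model proposes several candidates, a stronger model picks one) reduces to a small amount of glue code. A sketch where `judge` is any prompt-to-text callable; the prompt wording and the regex fallback are illustrative assumptions, not Snipd's actual prompts:

```python
import re

def pick_best(candidates, judge):
    """LLM-as-a-judge selection over pre-generated candidates.

    candidates: quotes already extracted by a cheaper model.
    judge: callable (prompt -> reply text), in production a call to a
    stronger model's API. Falls back to the first candidate if the
    judge's reply contains no usable number."""
    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    prompt = ("Below are candidate quotes from a podcast episode. Reply with "
              "only the number of the most interesting, self-contained one.\n\n"
              + numbered)
    reply = judge(prompt)
    match = re.search(r"\d+", reply)
    idx = int(match.group()) - 1 if match else 0
    return candidates[idx] if 0 <= idx < len(candidates) else candidates[0]
```

Note the defensive parsing of the judge's reply, in the same spirit as the regex fix-ups discussed earlier: even the strong model doesn't always adhere to the requested format.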
Interesting. You run a lot of calls. Okay, you know, you mentioned cost. You moved from self-hosting a lot of models to the big labs' models, OpenAI and Google. No Anthropic?

No, we love Claude. In my opinion, Claude is the best one when it comes to the way it formulates things, the personality.

Yeah, the personality.

I actually really love it, but yeah, the cost is still high.

So you tried Haiku, but you basically have to have Sonnet?

With Haiku we haven't experimented too much. We obviously work a lot with 3.5 Sonnet, also for coding, in Cursor, and just in general for brainstorming; we use it a lot, I think it's a great brainstorming partner. But for a lot of the things we've done, we opted for different models.

What I'm trying to drive at is: how much cheaper can you get if you go from closed models to open models? Maybe it's 0% cheaper, maybe it's 5% cheaper, or maybe it's 50% cheaper. Do you have a sense?

It's very difficult to judge. I don't really have a sense, but I can give you a couple of thoughts that have gone through our minds over time. Obviously we do realize that, given we have a couple of tasks where so many tokens go in, at some point it will make sense to offload some of that to an open-source model. But going back to it: we're a startup, right? We're not an AI lab or whatever. For us, actually, the most important thing is to iterate fast, because we need to learn from our users and improve, and for that velocity of iteration, the closed models hosted by OpenAI, Google, and Anthropic are just unbeatable, because it's just an API call. You don't need to worry about so much complexity behind it. So this is, I would say, the biggest reason why we're not doing more in this space. But there are other thoughts, also for the future. We basically have two different usage patterns of LLMs. One is the pre-processing of a podcast episode, the initial processing: the transcription, the speaker diarization, the chapterization. We do that once, and this usage pattern is quite predictable, because we know how many podcasts get released when, so we can have a certain capacity and we're running that 24/7. It's one big queue, running 24/7.

What's the queue job runner? Is it Django, just the Python one?

No, that's just our own: it's in our database, with the back end talking to the database, picking up jobs and feeding results back.
I was just curious about the orchestration.

And I mean, we of course have a lot of other orchestration, where we use Google Pub/Sub and so on. But okay: so we have this usage pattern of very predictable usage, where we can max out the capacity. And then there's this other pattern where it's, for example, the snipping, where a user action triggers an LLM call and it has to be real-time. There can be moments of peak usage and moments of very little usage. For that, these LLM API calls are just perfect, because you don't need to worry about scaling up and scaling down, handling all of those issues.

Serverless versus servers.

Yeah, exactly. I see OpenAI and all of these other providers a bit as the Amazon, sorry, the AWS, of AI. It's a bit similar to how, before AWS, you would have to have your own servers, and buy new servers or get rid of servers, and then with AWS it just became so much easier to ramp things up and down. And this is taking that even to the next level, for AI.

I'm a big believer in this. Basically, it's intelligence on demand. We're probably not using it enough in our daily lives. We should be able to spin up 100 things at once, go through things, and then stop. I feel like we're still trying to figure out how to use LLMs in our lives effectively.

Yeah, 100%. I think that goes back to where, for me, the big opportunity is if you want to do a startup. It's not about more intelligence; you can let the big labs handle that challenge. It's the existing intelligence: how do you integrate it, how do you actually incorporate it into your life?

It's AI
engineering. Okay, cool. The one other thing I wanted to touch on was multimodality in frontier models. Dwarkesh had an interesting application of Gemini recently, where he just fed raw audio in and got diarized transcription out, with timestamps. And I think that will come. So basically what we're saying here is another wave of Transformers eating things, because right now models are pretty much single-modality: you have Whisper, you have a pipeline, everything. You'd just feed in the raw files. Do you think that will be realistic for you?

I 100% agree. Basically, everything that we talked about earlier, with the speaker diarization and heuristics and everything: I completely agree that in the future it will just be, put everything into a big multimodal LLM and it will output everything that you want. I've also experimented with Gemini 2.0 Flash, just for fun. The big difference right now is still the cost: doing speaker diarization this way, or transcription this way, is at a huge cost difference to the pipeline that we've built up.

Huh, okay. I need to figure out what that cost is, because in my mind 2.0 Flash is so cheap. But maybe not cheap enough for you?

No, I mean, if you compare it to Whisper and speaker diarization, and especially self-hosting it, yeah.

Okay.

But we will get there, right? It's just a question of time, and as soon as that happens, we'll be the first ones to switch.
Awesome. Anything else that you're eyeing on the horizon? Like, we're thinking about this feature, we're thinking about incorporating this new functionality of AI into our app?

Yeah. I mean, there are so many areas that we're thinking about; our challenge is a bit more about choosing.

Choosing, yeah.

So, for me, looking into the next couple of years, the big areas that interest us a lot are basically four. One is content. Right now it's podcasts. I mean, you did mention you can also upload audiobooks and YouTube videos.

I actually use the YouTube one a fair amount.

But in the future we want to also have audiobooks natively in the app, and we want to enable AI-generated content. Just think of taking deep research and NotebookLM podcast generation and putting them together: that should be in our app. The second area is discovery.

I think in general, yeah. I noticed that you have download counts and most-snipped, something like that?

Yeah. On the discovery side we want to do much, much more. I think in general, discovery as a paradigm in all apps will undergo a change thanks to AI. There has been a lot of talk, before Elon bought Twitter, about bring-your-own-algorithm to Twitter. That was Jack Dorsey's big thing; he talked a lot about it. And I actually think this is coming, but with a bit of a twist. What I think AI will actually enable is not that you bring your own algorithm, but that you will be able to talk to the algorithm, to communicate with it. You can just tell the algorithm: hey, you keep showing me cat videos, and I know I freaking love them, and that's why you keep showing them to me, but please, for the next two hours, I really want to get more into AI stuff; do not show me cat videos. And then it will just adapt. And of course, the question is, you know, big platforms, let's say TikTok, do not have the incentive to offer that.

Exactly, that's what I was going to say.

But we are driven by helping you learn, get the most out of it, achieve your goals. So for us it actually very much is our incentive: hey, no, you should be able to guide it. So that was a long way of saying that I think a lot will happen in recommendations.

Order by popular, yeah. I think collaborative filtering would be the first step for recsys, and then some LLM fancy stuff.

Yeah.
maybe maybe I to go back to the question
that you have before so the other like
these were the first two areas like the
other two um
Voice. Voice as an interface, as in voice AI: how is this going to exist?

Maybe I can first tell you why I find it so interesting for us, because voice as an interface has historically had so much talk around it, and it always fell flat. The reason I'm excited about it this time around: with any consumer app, I like to ask myself, what is the moment in my life, the trigger, that gets me to open this app and start using it? Take Airbnb: the trigger is that you want to travel, and then you open up the app. For apps that do not have this already existing natural trigger in your life, it's very difficult as a consumer app to get a user to open the app again. There's basically only one super successful app that has been able to do that without a natural trigger, and that is Duolingo. Everyone wants to learn a language, but you don't have this natural moment during your day where it's like, ah, now I need to open up this app.

You have the notifications.

Exactly, the owl memes. They gamified the [ __ ] out of it. Super successful, super beautiful; they are the GOATs in this arena. But it's much easier when there already is a trigger, and then you don't have to do all of the streaks and leaderboards and everything.
That's a bit of context. Now look at what we're doing and our goal of getting people to really maximize what they get out of their listening. There are a couple of features where we know we can 10x the value people get out of a podcast, but we need them to do something for it. There is friction involved, because it's all about learning, about thinking for yourself. Those are the moments when you actually start really 10x-ing the value you got out of the podcast instead of just consuming it.

Applying the knowledge.

Yeah, basically being forced to think about what the main takeaway from this episode actually was for you. This is something I like doing myself for every episode I listen to: I try to boil it down to one single takeaway. Even though there might have been ten amazing things, you pick the one most important one. This is an active process, a forcing function in your brain to challenge all of the insights and really come up with the one thing that is applicable to you and your life and what you might want to do with it. So it also helps you turn it into action. This is basically a feature we're interested in, but you have to get the user to use it, right? So when do you get the user to use it? If this is all text-based, then we're basically playing the same game as Duolingo, where at some point you're going to get a notification from Snipd: hey swyx, come on, you know you should do this. Maybe there's a blue
owl. But if you have voice, you can basically hook into the habits the user already has. You already have the habit of listening to podcasts, you're already doing that, and once an episode ends, instead of just jumping into the next episode, your AI companion can come on and you can have a quick conversation and go through these things. How that looks in detail we still need to figure out, but this paradigm of staying in the flow (this also relates to what you were saying about AI that is invisible) means you stay in the flow of what you're already doing, and we can insert a completely new experience in there that helps you
get the most out of your listening.

Yeah, I think your framing of this is very powerful, because this is where you are a product person more than an engineer. An engineer would just say: oh, it's just chat with your podcast; it's chat-with-PDF, chat-with-podcast, okay, cool. But you're framing it in a different light that actually makes sense to me now, whereas previously I'd think: I don't chat with my podcast, I just listen to the podcast. For you it's more about retention and learning and all that, and because you're very serious about it, that's why you started a company; that is your focus. Whereas I'll admit I'm still stuck in that consume, consume, consume mentality, and I know it's not good, but it's my default. Which is why I was a little bit lost when you were saying all those things about Duolingo and the trigger, because my trigger for listening to a podcast is that I'm by myself. But you're saying the trigger is not about listening to the podcast; the trigger is remembering and retaining and processing the podcast I just listened
to.

No, what I meant is: you already have the trigger that gets you to start listening to a podcast, and so do millions of people; there are more than half a billion monthly active podcast listeners. So you already have the trigger that gets you to start listening, but, as you just said yourself, you do not have the trigger that gets you to regularly process this information. Voice, for me, is the ability to hook into your existing trigger. The trigger I was talking about is basically: your podcast ends and you're still listening, so we just continue. This can be two minutes; I'm not saying it's a 60-minute process. Two or three minutes that come on completely naturally. If we manage to do that, and you start noticing as a user: freaking hell, I'm only spending three minutes with this AI companion, but I'm taking this much away. And retention is one thing, but you also start to take what you've learned and apply it to what's important to you; you're actually thinking. If we get you to notice that feeling,
then we've won.

Yeah. I would say a lot of people rely on Anki and notes, flashcards and all that, to do this, but making the notes is also a chore, so I think this could be very interesting. I'm just noticing that it's kind of a different usage mode. You already talked about this: the name Snipd is very snip-centric, and I originally resisted adopting it because of that. But now you've observed that people listen to long-form episodes, and you're talking about the end of an episode. The ideal implementation of this for me is: I browse through a bunch of snips of the things I'm subscribed to, I listen to the snips, I talk with it, and then maybe it double-clicks into the podcast and finds other timestamps relevant to the thing I want to talk about. I was just thinking about that; I don't know if that's interesting.

I think these are all areas we should explore. We're still quite open about how this will look
in detail.

What are your thoughts on voice cloning? I have had my voice cloned, and people have talked to the AI version of me. Is that too creepy?

I don't think it will be too creepy in the future. As with a lot of these things, our society is going through a change, and things that seem quite weird now will seem normal in the future. Voice cloning has already become much more normalized. I remember I was at the 2017 NIPS conference. Back when... was it San Diego? No, LA.

It was the Flo Rida one.

Yeah, Flo Rida. Everyone says that was peak NIPS. I remember there was this talk or workshop by Lyrebird (they actually got acquired by Descript later). They were doing voice cloning and showing off their tech, and there was a huge discussion afterwards about all of the moral and ethical implications. It really felt like this would never be accepted by society. And look now: you have ElevenLabs, anyone can just clone their voice, and no one really talks about it as if, oh my God, the world is going to end. So I think society will get used to it in any case. There are some interesting applications where we'd be super interested in working together with creators, podcast creators, to play around with this concept. I think it would be super cool if someone could come on Snipd, go to the Latent Space podcast, and start chatting with AI swyx.

Yeah, we'd be down. Obviously, I think as an AI podcast we should be first consumers of these
things. One observation I've made about podcasting and the general state of the market (and you can ask me your questions about podcasters): we are focusing a lot more on YouTube this year. YouTube is the best podcasting platform. It is not MP3s, it is not Apple Podcasts, it is not Spotify; it's YouTube, with its social layer of recommendations and the existing habit people have of logging on to YouTube. That's my observation; you can riff on that. The only thing I would add: when you were listing your priorities, you said audiobooks first, over YouTube, and I would switch that if I were you.

Yeah, as in YouTube video podcasts?

Video podcasts, I mean, it's obvious.

Video podcasts are not just here to stay; they're getting bigger. What I want to
do with Snipd is obviously also to add video on the platform. The way I see video: I like this concept of backgroundable video. I didn't come up with this concept; it was actually Gustav Söderström, the CPO of Spotify. When I speak with people, it remains true that they listen to podcasts while doing something else at the same time; that's like 90% of their consumption, even if they listen on YouTube. But every now and then it's nice to have the video: it's nice if you're just watching a clip, or if they mention something visual, show some slides, something where you need the visual, and it helps you connect much more with the host as a listener. But the biggest benefit I see with video is discovery. I think that is also why YouTube has become the biggest podcast player out there: they have the discovery, and discovery in video is just so much easier, so much better, and so much more engaging. So this is the area I'm most interested in when it comes to video and Snipd: providing a much better, more engaging, and more fun discovery experience
for consumers.

Okay, I think you almost have three different audiences. The vast majority for you is the people listening to podcasts. Then there's a second layer: the people who create snips, who add extra data annotation value to your platform.

By the way, we use the snip count as a proxy for popularity, because we have download counts, but platforms like Spotify re-host our MP3 file, so we don't get any download count from Spotify. Snip count is active: I opt in to listen to you and I shared this. Those are really, really good metrics.

But the third audience you haven't really touched is the podcast creators, like myself. For me, discovery (from my point of view, not yours) means I want to be discovered. YouTube is still there, Twitter obviously, Substack, Hacker News; I really try very hard to rank on Hacker News. I think when TikTok took this very seriously, they prioritized the creators of the content. For you, the creators of the content were the snips, but there may be a world for you in which you prioritize the creators of the podcasts.

Interesting
observation. What are some of your ideas or thoughts? Do you have something specific?

Riverside is the closest that has come to it; Descript is number two. Descript bought a Riverside competitor, and as far as I can tell it has not been very successful. Descript has a very good niche, a very good editing angle, and then just hasn't done anything interesting since. Although Underlord is good, it's not great: your chapterization is better than Descript's, and they should be able to beat you there, but they're not. And Riverside is very, very good. We actually recently started a second series of podcasts within Latent Space that is YouTube-only; you only find it on YouTube, and it's also shorter. This show is a one-and-a-half to two-hour thing; that one is remote-only, 30 minutes, chop chop, send it. Riverside is pretty good for that, but not great. It doesn't do good thumbnails, and the editing is still a little bit rough. It has this auto-editor that focuses on whoever is actively speaking and then sometimes goes back to the multi-speaker view, that kind of stuff; people like that. But the shorts are still not great. I still need to manually download the video and republish it to YouTube, and the shorts I still need to pick; they mostly suck. There are still a lot of rough edges there. Ideally, me as a creator, you know what I want: I sit down, record, press the button, done. Yeah, we're still not there. I think you
guys could do it.

Okay, so if I can translate that for you, it's really about simplifying the creation process of the podcast.

Yeah, and I'll tell you what: this will increase quality, because the reason most podcasts or YouTube videos are [ __ ] is that they're made by people who don't have life experience, who are not that important in the world, who aren't doing important jobs. What you actually want to enable is CEOs each making their own podcast, people who are busy and are not going to sit there and figure out Riverside. A lot of the reason people like Latent Space is that it takes an idiot like me, who could be doing a lot more with my life, making a lot more money, having a real job somewhere else; I just choose to do this because I like it. Otherwise you would never get access to me and to the people I have access to. So that's my pitch.
Cool. Anything else you normally want to talk to podcasters about?

I think we've covered everything. I guess as a last message: go try out Snipd. There's a premium version, so you can use and try out everything for free, and I'm also happy to provide a link for the show notes so people can try the premium version free for a month if they want. Give it a shot, I would say.

Thanks for coming on. I would say that after you demoed me, I did not convert for another four to six months, because I found it very challenging to switch over. The main thing: you do have OPML import, but there's no way to import all the existing half-listened-to episodes or my rankings or whatever. For listeners who are switching, I have a blog post where I talked about my switch.

Just treat it as a chance to clean house.

That's a good point: refocus, a fresh start for 2025. Great, well, thank you for
working on Snipd, and thank you for coming on. We usually spend a lot of time talking to big companies, venture startups, B2B SaaS, that kind of stuff, but your journey as a small team building a B2C consumer app is the kind of story we also like to feature, because a lot of people want to build what you're doing and they don't see role models who are successful, who are confident, who are having success in this very challenging market. So thanks for sharing some of your thoughts.

Thanks for having me, and thank you for creating an amazing podcast and an amazing conference as well. Thank you.