
Snipd: The AI Podcast App for Learning — with CEO Kevin Ben-Smith

By Latent Space

Summary

Key takeaways

  • Snipd: AI for podcast learning: Snipd is an AI-powered podcast app designed for users who listen to podcasts specifically to learn new information, aiming to provide a more effective spoken-audio platform for knowledge acquisition. (05:03, 30:47)
  • From social clips to knowledge capture: Initially envisioned as a social platform for sharing short podcast clips ("snips"), user behavior revealed a strong preference for listening to full episodes while actively capturing knowledge, leading Snipd to pivot towards enhancing learning and knowledge retention. (06:32, 07:57)
  • AI transforms podcast listening: Snipd leverages AI for features like transcription, speaker diarization, chapter generation, and a chat interface for interacting with episodes, moving beyond the traditional "repurposed music player" model of podcast apps. (20:55, 56:38)
  • Personalized AI prompts are key: The future of consumer AI apps lies in moving beyond chat interfaces to personalized, invisible AI that integrates seamlessly into user habits, allowing for tailored experiences like custom summarization prompts. (42:26, 43:47)
  • LLMs as judges for quality control: To handle the uncertainty of LLMs, Snipd uses a "judge" LLM to select the best output from multiple candidates generated by a cheaper LLM, a technique applied to features like book recommendations and quote extraction. (48:48, 50:10)
  • Voice interface for deeper learning: Voice AI is seen as a critical interface for Snipd to hook into existing podcast-listening habits, enabling natural, in-flow conversations that enhance knowledge retention and application beyond simple consumption. (01:00:28, 01:04:26)
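The "LLM as judge" selection described in the takeaways can be sketched as follows. This is a toy illustration with stubbed model calls; the function names and the length-based judge are invented for the example, not Snipd's actual prompts or models:

```python
# Sketch of the judge pattern: a cheap model produces several candidates,
# a judge picks the best one. Both "models" are stand-in callables here.
from typing import Callable, List

def generate_candidates(generate: Callable[[str], str], prompt: str, n: int = 3) -> List[str]:
    """Sample the cheap generator n times (a real system would vary temperature)."""
    return [generate(prompt) for _ in range(n)]

def pick_best(judge: Callable[[str, List[str]], int], prompt: str, candidates: List[str]) -> str:
    """Ask the judge which candidate answers the prompt best; return it."""
    return candidates[judge(prompt, candidates)]

# --- demo with stand-in "models" ---
_fake = iter(["ok quote", "meh", "a long, detailed quote"])
cheap_llm = lambda prompt: next(_fake)
# Toy judge that prefers the longest candidate; in the real pattern this
# would be another LLM call returning the index of the best answer.
judge_llm = lambda prompt, cands: max(range(len(cands)), key=lambda i: len(cands[i]))

cands = generate_candidates(cheap_llm, "extract the best quote", n=3)
best = pick_best(judge_llm, "extract the best quote", cands)
print(best)  # -> a long, detailed quote
```

In practice both callables would be API calls: the generator a cheap model sampled several times, the judge a stronger model asked to return the index of the best candidate.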

Topics Covered

  • Your users will tell you what your product is.
  • AI will become invisible, just like electricity.
  • Production AI relies on regexes, not just LLMs.
  • You will soon talk to algorithms to shape your feed.
  • Voice AI will turn passive listening into active learning.

Full Transcript


Host: Hey, I'm here in New York with Kevin Ben-Smith of Snipd. Welcome.

Kevin: Hi! Amazing to be here.

Host: This is our first-ever outdoors podcast recording, I think.

Kevin: It's quite a location for the first time, I have to say. I was actually unsure, because, you know, it's cold. I checked, and the temperature is around 1°C, but it's not that bad with the sun.

Host: No, it's quite nice, especially with our beautiful tea.

Kevin: With the tea, yeah. Perfect.

Host: We're going to talk about Snipd. I'm a Snipd user. Apart from Twitter, it's the number one most-used app on my phone. When I wake up in the morning, I open Snipd and see what's new, and in terms of time spent or usage on my phone, I think it's number one or number two. So I really had to talk about it, also because I think people interested in AI want to think about how they can apply it, and we're an AI podcast, so we have to talk about AI podcast apps. But before we get there: we just finished the AI Engineer Summit, and you came for the two days.

Host: How was it?

Kevin: It was quite incredible. For me, the most valuable part was just being in the same room with like-minded people who are building the future and who are seeing the future, especially when it comes to AI agents. So often I have conversations with friends who are not in the AI world, and it happens so quickly that it sounds like you're talking science fiction, just crazy talk. It's so refreshing to talk with so many other people who already see these things, and to be inspired by them, and to not always feel like, "okay, I think I'm just crazy and this will never happen." It really is happening, and for me it was very valuable.

Host: So day two was more relevant for you than day one?

Kevin: Yeah, day two was the engineering track. That was definitely the most valuable for me, also as a technical person myself. There were one or two talks that had to do with voice AI and AI agents with voice, so that was quite fascinating. I also spoke with the speakers afterwards, and they were very open; this sharing attitude is, I think, quite prevalent in the AI community in general. I also learned a lot of really practical things that I can now take away with me.

Host: On my side, I think I watched only about half of the talks, as I was running around, and I think people saw me towards the end kind of collapsing: I was on the floor towards the end because I needed to get a rest. But I'm excited to watch the voice AI talks myself.

Kevin: Yeah, do that. And from my side, thanks a lot for organizing this conference and bringing everyone together.

Host: Do you have anything like this in Switzerland?

Kevin: The short answer is no.

I have to say, though, the AI community, especially in Zurich where we're based, is quite good, and it's growing, especially driven by ETH, the technical university there. And all of the big companies have AI teams there: Google has its biggest tech hub outside of the US in Zurich, Facebook is doing a lot there in Reality Labs, Apple has a secret AI team, and OpenAI and Anthropic just announced that they're coming to Zurich. So there's a lot happening.

Host: I think the most recent notable move is that the entire vision team from Google, Lucas Beyer and the other authors of SigLIP, left Google to join OpenAI, which I thought was a big move: a whole team moving all at once, at the same time. I've been to Zurich, and it just feels expensive. It's a great city, great university, but I don't see it as a business hub. Is it a business hub? I guess it is, right?

Kevin: Well, historically it's a finance hub.

Host: A finance hub, yeah.

Kevin: I mean, there are some large banks there, especially UBS, the largest wealth manager in the world, but it's really becoming more of a tech hub now, with all of the big tech companies there.

Host: And research-wise, it's all ETH? Or are there other things?

Kevin: Yeah, it's all driven by ETH, and then its sister university EPFL, which is in Lausanne. They're also doing a lot, but it's really ETH, and otherwise, no. I mean, it's a really beautiful city. I can recommend anyone come visit Zurich; let me know, I'm happy to show you around. And of course you have the nature so close, the mountains so close, such beautiful lakes. I think that's what makes it such a livable city. And the cost is not cheap, but we're in New York City right now, and I paid $8 for a coffee this morning, so the coffee is cheaper in Zurich than in New York City.

Host: Okay, let's talk about Snipd. What is Snipd? We'll talk about your origin story, but first let's get it crisp: what is Snipd?

Kevin: I always give two definitions of Snipd, so I'll give you one really simple, straightforward one, and then a second, more nuanced one, which I think will be valuable for the rest of our conversation. The most simple one is just to say: look, we're an AI-powered podcast app. If you listen to podcasts, we're now providing this AI-enhanced experience. But if you look at the more nuanced perspective, we actually have a very big focus on people who, like your audience, listen to podcasts to learn something new. Your audience wants to learn about AI: what's happening, what's the latest research, what's going on. And we want to provide a spoken-audio platform where you can do that most effectively, and AI is basically the way that we can achieve that.

Host: So AI is a means to an end.

Kevin: Yeah, exactly.

Host: When you started, was it always meant to be AI, or was it more about the social sharing?

Kevin: So, the first version that we ever released was about three and a half years ago. This was before ChatGPT, before Whisper, so a lot of the features that we now have in the app weren't really possible yet back then. But from the beginning we always had the focus on knowledge; that's the reason why we and our team listen to podcasts. We did have a bit of a different approach, though. The idea in the very beginning was: the name is Snipd, and you can create what we call "snips", which is basically a small snippet, a clip from a podcast. And we did envision sort of a social, TikTok-like platform where some people would listen to full episodes and snip the best parts, post them in a feed, and other users would consume this feed of snips and use it as a discovery tool, or just as a means to an end. So you would have both people who create snips and people who listen to snips. Our big hypothesis in the beginning was that it would be easy to get people to listen to these snips but super difficult to actually get them to create them, so we focused a lot of our effort on making it as seamless and easy as possible to create a snip.

Host: It's similar to TikTok: you need CapCut for there to be videos on TikTok.

Kevin: Exactly, exactly. And so for Snipd, basically, whenever you hear an amazing insight, a great moment, you can just triple-tap your headphones, and our AI saves the moment that you just listened to and summarizes it to create a note, and this is then basically a snip. So, we built all of this, launched it, and what we found was basically the exact opposite: we saw that people used the snips to discover podcasts, but they really love listening to long-form podcasts, and yet they were creating snips like crazy. This was definitely one of those aha moments, when we realized: hey, we should really be doubling down on the knowledge and learning side, helping you learn most effectively, helping you capture the knowledge that you listen to and actually do something with it. Because, in general, we live in this world where there's so much content, and we consume and consume and consume, and it's so easy at the end of one podcast to just start listening to the next one, and five minutes later you've forgotten 90%, 99% of what you've actually just learned.
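The triple-tap capture Kevin describes could be sketched roughly like this. It's a hypothetical shape, assuming word-level transcript timestamps and a fixed look-back window; the `Word` and `capture_snip` names are invented for illustration, and the real app would follow this with an LLM summarization step on the extracted text:

```python
# Sketch: given word-level timestamps and the playback position when the
# user taps, grab the words from the last `window` seconds as the snip text.
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    text: str
    start: float  # seconds from episode start
    end: float

def capture_snip(words: List[Word], tap_time: float, window: float = 60.0) -> str:
    """Return the transcript covering the `window` seconds before the tap."""
    lo = tap_time - window
    return " ".join(w.text for w in words if w.end >= lo and w.start <= tap_time)

words = [Word("great", 100.0, 100.4), Word("insight", 100.5, 101.0),
         Word("here", 101.1, 101.4), Word("later", 200.0, 200.5)]
snip_text = capture_snip(words, tap_time=102.0, window=30.0)
print(snip_text)  # -> great insight here
```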

Host: You don't know this, and most people don't know this, but this is my fourth podcast. My third podcast was a personal mixtape podcast, where I manually snipped sections of podcasts that I liked, added my own commentary on top of them, and published them as small episodes. Those would be maybe 5-to-10-minute clips of something that I thought was a good story or a good insight, and then I added my own commentary and published it as a separate podcast.

Kevin: That's cool. Is that still live?

Host: It's still live, but it's not active. You can go back and find it if you're curious enough; you'll see it. But it was so manual. Basically my process was: I hear something interesting, I note down the timestamp, and I note down the URL of the podcast. I used to use Overcast, so it would just link to the Overcast page, and then I'd put it in my note-taking app. Then, whenever I felt like publishing, I would take one of those things, download the MP3, clip out the segment, record my intro and outro, and publish it as a podcast. But now with Snipd I can just double-click, or triple-tap.

Kevin: Those are very similar stories to what we hear from our users. You know, it's normal that you're doing something else while you're listening to a podcast: a lot of our users are driving, working out, walking their dog. In those moments, when you hear something amazing, it's difficult to just write it down. You have to take out your phone, and some people take a screenshot or write down a timestamp, and then later on you have to go back and try to find it again, and of course you can't find it anymore: there's no search, there's no Command-F. These were all issues that we encountered ourselves as users, and given that our background was in AI, we realized: wait, this should not be the case. Podcast apps today are basically repurposed music players, but we actually look at podcasts as one of the largest sources of knowledge in the world. And once you have that different angle of looking at it, together with everything that AI is now enabling, you realize: this is not the way that podcast apps should be.

Host: Agreed. You mentioned something there: you said your background is in AI. First of all, who's the team, and what do you mean your backgrounds are in AI? Those are two very different questions.

Kevin: Maybe starting with my backstory.

It actually goes back, let's say, 12 years or so. I moved to Zurich to study at ETH, and I actually studied something completely different: mathematics and economics, basically with a specialization in quant finance.

Host: Same! Okay, wow, all right.

Kevin: So, as you know, all of these mathematical models for asset pricing, derivative pricing, quantitative trading. And for me, the thing that fascinated me the most was the mathematical modeling behind it, the mathematics, the statistics, but I was never really that passionate about the finance side of things.

Host: Really? Oh, okay. We're different there, then.

Kevin: I mean, one symptom that I notice now, looking back: during that time, I think I never read an academic paper about the subject in my free time. And then, towards the end of my studies, when I was already working for a big bank, one of my best friends came to me and said, "hey, I just took this course, you have to take this lecture." And I'm like, "what is it about?" "It's called machine learning." And I'm like, "what kind of stupid name is that?" So he sent me the slides, and over a weekend I went through all of them, and I just knew: freaking hell, this is it. I'm in love.

Host: Wow. Okay.

Kevin: And then over the course of the next 12 months or so, I just really got into it: started reading all about it, reading blog posts, starting to build my own models.

Host: Was this course by a famous person, a famous university? Was it a Coursera thing?

Kevin: No, it was an ETH course, by a professor at ETH.

Host: Taught in English, by the way?

Kevin: Yeah.

Host: Okay, so those slides are available somewhere?

Kevin: Yeah, definitely. I mean, by now they're quite outdated.

Host: Sure, sure. Well, reflecting on the finance thing for a bit: I used to be a trader, sell side and buy side. I was an options trader first, and then I was more of a quantitative hedge fund analyst. We never really used machine learning; it was more like a little bit of statistical modeling, but really, you fit, you know, your regression.

Kevin: No, I mean, that's what it is. Or you solve partial differential equations and then have numerical methods to solve these.

Host: That's for your degree; that's not really what you do at work, right? Unless, I don't know, what did you do at work? In my job, no, we weren't solving the PDEs. You learn all this in school and then you don't use it.

Kevin: Well, let's put it like that: in some things, yeah. I mean, I did code algorithms that would do it, but it was basically the most basic algorithms, and then you just slightly improved them a little bit, tweaked them here and there. It wasn't like starting from scratch, like, "oh, here's this new partial differential equation, how do we solve it?" No.

Host: Yeah, I mean, that's real life, right? Most of it is kind of boring, or you're using established things, because they're established because they tackle the most important topics. Portfolio management was more interesting for me, and we were sort of the first to combine social data with quantitative trading. I think now it's very common. But yeah, so then you went deep on machine learning, and then what, you quit your job?

Kevin: Yeah, I quit my job.

I mean, I started using it at the bank as well; I desperately tried to find any kind of excuse to use it here or there. But it was just clear to me: no, if I want to do this, I just have to make a real cut. So I quit my job and joined an early-stage tech startup in Zurich, where I then built up the AI team over five years. We built various machine learning things for banks: from models for sales teams, to identify which clients to sell which product to and with what reasons, all the way to a lot of work with bank transactions. One of the actually most fun projects for me: we had an NLP model that would take the booking text of a transaction, like a credit card transaction, and prettify it. Because they have all of these, you know, numbers in there, and abbreviations, and whatnot, and sometimes you look at one and think, "what is this?", and it would just change it to, I don't know, "CVS".

Host: But would you have hallucinations?

Kevin: No, no. The way that everything was set up, it wasn't yet a fully end-to-end generative neural network, as you would use today.

Host: Okay, okay. Awesome. And then when did you go full-time on Snipd?

Kevin: That was basically afterwards.

I mean, how it started was: the friend of mine who got me into machine learning, he also got me interested in startups. He's had a big impact on my life. And the two of us would just jam on ideas for startups every now and then, and his background is also in AI and data science. We had a couple of ideas, but given that we were working full-time, we participated in HackZurich. That's Europe's biggest hackathon, or at least it was at the time, and we said, "hey, this is just a weekend, let's just try out an idea, hack something together, and see how it works." The idea was that we'd be able to search through podcast episodes, like, within a podcast. So we did that, and, long story short, we managed to build something and realized: hey, this actually works. You can find things again in podcasts, with a natural-language search. We pitched it on stage, and we actually won the hackathon, which was cool. I mean, we also had a good pitch, a good example: we used the famous Joe Rogan episode with Elon Musk, where Elon Musk smokes a joint. It's like a two-and-a-half-hour episode. So we were on stage, and we just searched for "smoking weed", and it would find that exact moment, and we would play it, and it just comes on, with Elon Musk just smoking.

Host: Was it video as well?

Kevin: No, it was actually completely based on audio, but we did have the video for the presentation, which of course had an amazing effect. So that gave us a lot of activation energy, but it wasn't actually about winning the hackathon. The interesting thing that happened was that, after we pitched on stage, a lot of the other participants came up to us and started saying, "hey, can I use this? I have this issue." And some also came up and told us about other problems that they have, very adjacent to this, with podcasts: "could I use this for that as well?" And that was basically the moment where I realized: hey, it's actually not just us who are having these issues with podcasts and with making the most out of this knowledge. Other people have them too. That was, I guess, four years ago or something like that, and then, yeah, we decided to quit our jobs and start this whole Snipd thing.

Host: How big is the team now?

Kevin: We're just four people.

Host: Just four people! Are you all technical?

Kevin: Yeah, basically two on the backend side: one of my co-founders is the person who got me into machine learning and startups, and we won the hackathon together. So we have two people for the backend side, with the AI and all of the other backend things, and two for the frontend side, building the app, which is mostly Android and iOS.

Host: It's iOS and Android, and you also have a watch app, for Apple.

Kevin: Yeah, but it's mostly those.

Host: The watch thing was very funny, because in the Latent Space Discord, you know, most of us have been slowly adopting Snipd. You came to me like a year ago and introduced Snipd to me, and I was like, "I don't know, I'm very sticky to Overcast," but slowly we switched. Why a watch app?

Kevin: So, it goes back to the fact that a lot of our users do something else while listening to a podcast, right? And giving them the ability to capture this knowledge even though they're doing something else at the same time is one of the killer features. Maybe at some point I should give a bit more of an overview of all of the features that we have.

Host: Sure.

Kevin: So this is one of the killer features, and one big use case that people use this for is running. If you're a big runner, a big jogger, or you're cycling really competitively, a lot of people don't want to take their phone with them when they go running, so you load everything onto the watch: you can download episodes, or, if you have an Apple Watch that has internet access, with a SIM card, you can also stream directly. That's also possible. Of course, it's basically very limited to just listening and snipping, and then you can see all of your snips later on your phone.

Host: Let me tell you about this error I just got: "Error playing episode. Substack, the hoster of this podcast, does not allow this podcast to be played on Apple Watch."

Kevin: Yeah, that's a very beautiful thing. So, we found out that all of the podcasts hosted on Substack cannot be played on Apple Watch.

Host: What is this restriction?

Kevin: Don't ask me. We tried to reach out to Substack; we tried to reach out to some of the bigger podcasters who are hosting their podcasts on Substack, to also let them know. Substack doesn't seem to care. And this is not specific to our app: you can also check out the Apple Podcasts app, and it's the same problem. It's just that we have actually identified it, and we tell the user what's going on.

Host: I would say, you know, we host our podcast on Substack, but they're not very serious about their podcasting tools. I've told them before; I've been very upfront with them, so I don't feel like I'm [ __ ] on them in any way. And it's kind of sad, because otherwise it's a perfect creator platform, but the way that they treat podcasting as an afterthought, I think, is really disappointing.

Kevin: Given that you mentioned all these features, maybe I can give a bit of a better overview of the features that we have, because for us it's clear in our minds, but maybe not for everyone.

Host: Okay, I'll tell you my version, and you can correct me. So, first of all, I think the main job is for it to be a podcast listening app. It should basically be a complete superset of what you normally get on Overcast or Apple Podcasts or anything like that. You pull your show list from ListenNotes? Like, how do you find shows? You type in anything and you find them, right?

Kevin: Yeah, we have a search engine that is powered by ListenNotes, but in the meantime we have a huge database of like 99% of all podcasts out there ourselves.

Host: What I noticed is that the default experience is that you do not auto-download shows, and that's one very big difference for you guys versus other apps, where, like, if I'm subscribed to a thing, it auto-downloads and I already have the MP3 downloaded overnight. With you, I have to actively put it onto my queue, and then it auto-downloads. And actually, I initially didn't like that; I think I maybe told you that. I was like, "oh, it's a feature that I don't like, because it means that I have to choose to listen to it in order to download it." It's the difference between opt-in and opt-out, so I opt into every episode I listen to. And then, you know, you open it, and it depends on whether or not you have the AI stuff enabled, but the default experience is no AI stuff enabled. You can listen to it, you can see the snips, the number of snips and where people snip during the episode, which roughly correlates with the interest level, and obviously you can snip there. I think that's the default experience. And I think snipping is really cool: I use it to share a lot in our Discord, and I think we have tons and tons of people just sharing snips of stuff, and tweeting stuff is also a nice, pleasant experience. But the real features come when you actually turn on the AI stuff. And so the reason I got Snipd is because I got fed up with Overcast not implementing any AI features at all; instead, they spent two years rewriting their app to be a little bit faster. And I'm like, it's 2025, I should have a podcast app that has transcripts that I can search. Very, very basic thing. Overcast will basically never have it.

Kevin: Yeah, I think that was a good basic overview. Maybe I can add a bit to it with the AI features that we have.

So, one thing that we do every time a new podcast episode comes out: we transcribe the episode, we do speaker diarization, we identify the speaker names and each guest, we extract a mini bio of the guest, try to find a picture of the guest online, and add it. We break the podcast down into chapters, AI-generated chapters (that one's very handy), with a title and a quick description for each chapter. And we identify all books that get mentioned on a podcast.

Host: You can tell I don't use that one.

Kevin: It depends on the podcast.
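The per-episode processing Kevin lists can be pictured as a small pipeline of stages. This is a minimal sketch with injectable, stubbed stages, since the actual models and interfaces aren't public; the stage names and the returned dictionary shape are assumptions:

```python
# Sketch of an episode-ingestion pipeline: transcribe, diarize,
# chapterize, detect books. Each stage is a plain callable so the
# real models can be swapped in; the demo uses trivial stand-ins.
from typing import Any, Callable, Dict, List

def process_episode(audio: bytes,
                    transcribe: Callable[[bytes], List[dict]],
                    diarize: Callable[[List[dict]], List[dict]],
                    chapterize: Callable[[List[dict]], List[dict]],
                    find_books: Callable[[List[dict]], List[str]]) -> Dict[str, Any]:
    """Run the ingestion stages in order on a new episode."""
    segments = transcribe(audio)      # segments with timestamps
    segments = diarize(segments)      # attach speaker labels
    chapters = chapterize(segments)   # AI-generated chapter titles
    books = find_books(segments)      # LLM pass for book mentions
    return {"segments": segments, "chapters": chapters, "books": books}

# --- demo with trivial stand-in stages ---
result = process_episode(
    b"\x00",
    transcribe=lambda a: [{"text": "read Thinking, Fast and Slow", "start": 0.0}],
    diarize=lambda segs: [dict(s, speaker="guest") for s in segs],
    chapterize=lambda segs: [{"title": "Books", "start": 0.0}],
    find_books=lambda segs: ["Thinking, Fast and Slow"],
)
print(result["books"])
```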

Kevin: There are some podcasts where the guests often recommend an amazing book, so later on you can find that again.

Host: So, literally, you search for the word "book", or "I just read blah blah blah"?

Kevin: No, it's all LLM-based. Basically, we have an LLM that goes through the entire transcript and identifies if a speaker mentions a book. Then we use the Perplexity API, together with various other LLM orchestration, to go out there on the internet and find everything that there is to know about the book: find the cover, find who the author is, and get a quick description of it. For the author, we then check which other episodes the author appeared on.

Host: That is killer, because, for me, if there's an interesting book, the first thing that I do is actually listen to a podcast episode with the writer, because they usually give a really great overview already on a podcast. Sometimes the podcast has the person as a guest; sometimes the podcast is about the person without them there. Do you pick up both?

Kevin: So, we pick up both in our latest models, but what we show you in the app, the goal is currently to only show you the guests. In the future we want to show the other cases more.

Host: For what it's worth, I don't mind. If I like somebody, I'll just learn about them regardless of whether they're on the episode or not.

Kevin: Yes and no. We have seen that there are some personalities where this can break down. For example, the first version that we released with this feature picked up a person much more often, even if they were not a guest. The best examples for me are Sam Altman and Elon Musk: they're just mentioned on every second podcast, and they're not on there, and you're interested in actually learning from them. We updated our algorithms, improved that a lot, and now it's gotten much better at only picking someone up if they're a guest.
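The two-step book flow Kevin describes (an LLM pass flags titles, then a web-search step, Perplexity in their case, enriches each one) might look roughly like this. The function names and the `enrich` result shape are assumptions for illustration, with both external calls stubbed:

```python
# Sketch: step 1, an LLM over the full transcript returns book titles;
# step 2, a web-search/enrichment call fills in author, cover, etc.
from typing import Callable, Dict, List

def extract_books(transcript: str,
                  llm_find_titles: Callable[[str], List[str]],
                  enrich: Callable[[str], Dict[str, str]]) -> List[Dict[str, str]]:
    titles = llm_find_titles(transcript)               # LLM pass
    return [dict(enrich(t), title=t) for t in titles]  # web lookup per title

transcript = "My favourite book is Snow Crash, it shaped how I see the net."
books = extract_books(
    transcript,
    # stand-in for the LLM title detector
    llm_find_titles=lambda t: ["Snow Crash"] if "Snow Crash" in t else [],
    # stand-in for the web-search enrichment step
    enrich=lambda title: {"author": "Neal Stephenson",
                          "cover": "https://example.com/cover.jpg"},
)
print(books[0]["author"])  # -> Neal Stephenson
```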

Kevin: Maybe to come back to the features: two more important ones. We have the ability to chat with an episode. Yes, of course, you can do the old style of searching through a transcript with a keyword search, but for me, that's how you used to do search and knowledge extraction in the past. Old school. The AI way is basically an LLM: you can ask the LLM, "hey, when do they talk about topic X?" if you're only interested in a certain part of the episode; you can ask it for a quick overview of the episode, key takeaways; or, afterwards, you can have it create a note for you. So it's really very open-ended.
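"Chatting with an episode" can be sketched as simple retrieval-augmented prompting: pick the transcript chunks most relevant to the question and hand them to an LLM together with the question. The naive word-overlap scoring and the stubbed LLM below are illustrative stand-ins, not Snipd's implementation; a production system would likely use embeddings:

```python
# Sketch: rank transcript chunks by overlap with the question, then
# build a prompt from the top chunks for an (stubbed) LLM call.
from typing import Callable, List

def top_chunks(chunks: List[str], question: str, k: int = 2) -> List[str]:
    """Naive relevance ranking by shared lowercase words."""
    q = set(question.lower().split())
    score = lambda c: len(q & set(c.lower().split()))
    return sorted(chunks, key=score, reverse=True)[:k]

def chat_with_episode(chunks: List[str], question: str,
                      llm: Callable[[str], str]) -> str:
    context = "\n".join(top_chunks(chunks, question))
    return llm(f"Transcript excerpts:\n{context}\n\nQuestion: {question}")

chunks = ["we discuss funding rounds",
          "here we talk about voice agents",
          "closing remarks and thanks"]
# Stand-in LLM that just echoes the most relevant excerpt line.
answer = chat_with_episode(chunks, "when do they talk about voice agents?",
                           llm=lambda prompt: prompt.splitlines()[1])
print(answer)  # -> here we talk about voice agents
```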

Kevin: And then, finally, the snipping feature that we mentioned. Just to reiterate: the feature is that, whenever you hear an amazing idea, you can triple-tap your headphones or tap a button in the app, and the AI summarizes the insight you just heard and saves it, together with the original transcript and audio, in your knowledge library.

Host: I also noticed that you skip dynamic content.

Dynamic content we do not skip it

automatically oh sorry you detect but we

detect it yeah I mean that's one of the

thing that most people don't don't

actually know that like the way that ads

get inserted into podcasts or into most

podcasts is actually that every time you

listen to a podcast you actually get

access to a different audio file and on

the server uh a different ad is inserted

into the MP3 file automatically yeah

based on IP exactly and um um that what

that means is if we transcribe an

episode and have a transcript with time

stamps like Words word specific time

stamps if you suddenly get a different

audio file like the whole time sets are

messed up and that's like a huge issue

and for that we actually had to build

another algorithm that would dynamically

on the Fly resync the audio that you're

listening to the transcript that we have

yeah which is a fascinating problem in

and of itself you you think by matching

up the sound waves or like you think by

matching up words like basically do

partial transcription we're not matching

up words it's it's happening on the

basically like a bites level matching

yeah okay so it relies on this it relies

on the there be exact matches some point

uh so it's actually not uh we're

actually not doing exact matches but

we're doing fuzzy matches wow to to

identify the the moment it's basically

um we basically build Shazam for podcast

uh just as a little side project to to

solve this issue yeah yeah actually fun

fun fact apparently the Shazam algorithm

is open it's is they published the paper

I talked about it yeah I I haven't

really dived into the paper I thought it

was kind kind of interesting that

basically no one else has built

Shaz yeah I mean well the one thing is

the algorithm like if you now talk about

Shazam right the other thing is also

having the uh the data base behind it

and having the user mindset that if they

have this problem they come to you right
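As a rough illustration of the fuzzy-matching idea described above: fingerprint both renditions of the audio with a coarse, quantized feature, then let matching windows vote on an alignment offset. The published Shazam paper hashes constellations of spectrogram peaks; the window size, quantization, and energy feature below are simplified stand-ins for illustration, not Snipd's actual algorithm.

```python
from collections import Counter, defaultdict

def fingerprints(samples, win=32):
    """Coarse per-window fingerprints: quantized mean absolute energy.
    Quantizing makes the match fuzzy -- two renditions of the same
    audio with small encoding differences still collide."""
    fps = []
    for i in range(0, len(samples) - win, win):
        window = samples[i:i + win]
        energy = sum(abs(s) for s in window) / win
        fps.append(int(energy // 4))  # quantize for fuzzy matching
    return fps

def best_offset(reference, live, win=32):
    """Vote on the most likely alignment between two renditions.
    Index reference windows by fingerprint, then let every matching
    live window vote for the offset (ref_index - live_index).
    Returns the winning offset in samples: add it to a live playback
    position to get the matching position in the reference audio."""
    index = defaultdict(list)
    for ref_i, fp in enumerate(fingerprints(reference, win)):
        index[fp].append(ref_i)
    votes = Counter()
    for live_i, fp in enumerate(fingerprints(live, win)):
        for ref_i in index[fp]:
            votes[ref_i - live_i] += 1
    if not votes:
        return 0  # no overlap found
    offset_windows, _ = votes.most_common(1)[0]
    return offset_windows * win
```

Once the winning offset is known, mapping a live playback position back to the reference audio is a single addition, so the word-level transcript timestamps line up again even when a different ad was spliced into the file.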

I'm very interested in the tech stack; there's a big data pipeline. Could you share what the tech stack is, and what the most interesting or challenging pieces of it are?

So, the general tech stack: our entire backend, or 90% of it, is written in Python, hosting everything on Google Cloud Platform, and our frontend is written with the Flutter framework, so it's written in Dart and then compiled natively. We have one code base that handles both Android and iOS.

Do you think that was a good decision? That's something a lot of people are exploring.

Up until now, yes. Look, it has its pros and cons. For example, earlier I mentioned we have an Apple Watch app. There's no Flutter for that, so you build that natively, and then of course you have to sync these things together. I'm not the frontend engineer, so I'm just relaying this information, but our frontend engineers are very happy with it. It's enabled us to be quite fast and to be on both platforms from the very beginning. And when I talk with people and they hear that we're using Flutter, usually they think, ah, it's not performant, it's super janky, and everything; and then they use our app and they're always super surprised, or if they've already used the app and I tell them, they're like, what? So there is actually a lot that you can do.

The concerns there are a few, right? One, it's Google, so when are they going to abandon it? Two, they're optimized for Android first, so iOS is like a second thought, or you can feel that it's not a native iOS app; but you guys put a lot of care into it. And then maybe three, from my point of view as a JavaScript guy: React Native was supposed to be that dream, and I think it hasn't really fulfilled that dream. Maybe Expo is trying to do that, but again, it does not feel as productive as Flutter. I spent a week on Flutter and Dart, and I'm an investor in FlutterFlow, which is the local Flutter startup that's doing very, very well. I think a lot of people are still Flutter skeptics. Wait, so are you moving away from Flutter?

No, we don't have plans to do that.

You were just saying that about the watch. Okay.

Let's go back to the stack. That was just to give you a bit of an overview; I think the more interesting things are of course on the AI side. As I mentioned earlier, when we started out it was before the ChatGPT moment, before there was the GPT-3.5 Turbo API. So in the beginning we were actually running everything ourselves: open source models, trying to fine-tune them. They worked, but let's be honest, the results weren't...

What was the SOTA before Whisper for transcription?

We were using wav2vec.

That was a Google one, right?

No, it was a Facebook one. That was actually one of the papers: when it came out, for me that was one of the reasons why I said we should try to start a startup in the audio space. Before that I had been following the NLP space quite closely, and as I mentioned earlier we did some of this at the startup I was working at before. wav2vec was the first paper I had seen where the whole Transformer architecture moved over to audio. A bit more general way of saying it: it was the first time I saw the Transformer architecture being applied to continuous data instead of discrete tokens, and it worked amazingly. The Transformer architecture plus self-supervised learning: these two things moved over, and for me it was like, hey, this is now going to take off similarly to how the text space has taken off. And with these two things in place, even if some features we want to build are not possible yet, they will be possible in the near term on this trajectory. So that's a little side note.

In the meantime, we're using Whisper, and we're still hosting some of the models ourselves, for example the whole transcription and speaker diarization pipeline.

You need it to be as cheap as possible.

Exactly. We're doing this at scale, where we have a lot of audio that we're...

What numbers can you disclose, just to give people an idea? Because it's a lot.

We have more than a million podcasts that we've already processed.

When you say a million: processing is basically, you have some kind of list of podcasts that you auto-process, and others where a paying member can choose to press a button and transcribe it. Is that the rough idea?

Yeah, exactly.

And when you press that button, or we auto-transcribe it: first we do the transcription, then the speaker diarization, so basically identifying speech blocks that belong to the same speaker. This is then all orchestrated with an LLM, to identify which speech block belongs to which speaker, together with the guest name and bio that, as I mentioned, we identify. All of that comes together with an LLM to actually assign speaker names to each block.
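That name-assignment step could be wired up roughly as follows; the prompt wording, the JSON reply format, and the function names are invented for illustration, and the actual LLM call is left out as a stub.

```python
import json

def build_speaker_prompt(blocks, known_people):
    """Assemble a prompt asking an LLM to map diarized speaker IDs
    to real names, given excerpts and the known participants.
    (Illustrative format, not Snipd's actual prompt.)"""
    lines = [
        "Known participants: " + ", ".join(known_people),
        "Here are excerpts from diarized speech blocks.",
        'Reply with JSON mapping speaker id to name, e.g. {"S0": "..."}.',
    ]
    for b in blocks:
        # Truncate each excerpt so the prompt stays small.
        lines.append(f'{b["speaker"]}: "{b["text"][:120]}"')
    return "\n".join(lines)

def assign_names(blocks, llm_reply):
    """Apply the LLM's speaker-id -> name mapping to every block,
    leaving the anonymous ID in place if the model skipped it."""
    mapping = json.loads(llm_reply)
    return [{**b, "speaker": mapping.get(b["speaker"], b["speaker"])}
            for b in blocks]
```

In practice the reply would come from a model call between these two helpers; stubbing it out keeps the shape of the orchestration visible.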

Most of the rest of the pipeline we've now migrated to LLM APIs. We mainly use OpenAI and Google models, so the Gemini models and the OpenAI models, and we use some Perplexity, basically for those things where we need web search. That's something I'm still hoping OpenAI especially will also provide as an API.

Oh, why?

Basically, for us as a consumer: the more providers there are, the more competition, and that will lead to better results and lower costs over time. I don't see Perplexity as expensive: if you use the web search, the price is like $5 per 1,000 queries, which is affordable. But if you compare that to just a normal LLM call, it's much more expensive.

Have you tried Exa?

We've looked into it, but we haven't really tried it. We started with Perplexity and it works well, and if I remember correctly, Exa is also a bit more expensive.

I don't know. They seem focused on search as a search API, whereas Perplexity is maybe more of a consumer business, which is higher margin. I'll put it this way: Perplexity is trying to be a product, Exa is just trying to be infrastructure. That would be my distinction there. The other thing I'll mention is that Google has a search grounding feature.

Yeah, we've also tried that out. Not as good. We didn't go into too much detail in really comparing it quality-wise, because we actually already had the Perplexity one and it's working. I think the price there is actually higher than Perplexity's.

Really? Google should cut their prices.

Maybe it was the same price, I don't want to say something incorrect, but it wasn't cheaper. It wasn't compelling, and then there was no reason to switch. Maybe in general: given that we work with a lot of content, price is actually something we do look at. For us it's not just about taking the best model for every task; it's really about identifying what kind of intelligence level you need and then getting the best price for that, to be able to really scale this and let our users use these features with as many podcasts as possible.

I wanted to double-click on diarization. It's something that I don't think people do very well. You know, I'm a Bee user. I don't have it right now, and they were supposed to speak but dropped out last minute, but we've had them on the podcast before, and it's not great yet. Do you just use pyannote, the default stuff, or did you find any tricks for diarization?

We do use the open source packages, but we have tweaked it a bit here and there. Since you mentioned the Bee AI guys: I actually listened to that podcast episode, which was super nice, and when you started talking about speaker diarization, I had to think about their use case. With all of the different environments, it could basically be anything; it's completely out of domain, there's no data for this. I mean, I was feeling for them, because our advantage is that we're working with very high quality audio. It's very controlled, usually recorded in a studio.

This is quite an exception, I guess. It is kind of a studio. It's pretty quiet, with consistent background noise which you can edit out. This is New York; it's nice, it's a character.

No, so that of course helps us. Another thing that helps us is that we know certain structural aspects of the podcast, for example how often someone speaks. Let's say there's a one-hour episode and someone speaks for 30 seconds: that person is most probably not the guest and not the host. It's probably some speaker from an ad. So we have certain of these...

Heuristics.

Exactly, that we can use and leverage to improve things. And in the past we've also changed the clustering algorithm. Basically, how a lot of speaker diarization works is: you create an embedding for the speech that's happening, and then you try to somehow cluster these embeddings and find, ah, this is all one speaker, this is all another speaker. There we've also tweaked a couple of things, where we again used heuristics that we could apply from knowing how podcasts function. And that's also why I was feeling for the Bee guys so much, because for them it's probably almost impossible to use any heuristics; it can just be any situation, anything. So that's one thing we do.

Another thing is that we actually combine it with LLMs: the transcript, the LLMs, and the speaker diarization, bringing all of these together to recalibrate some of the switching points, like when does this speaker stop and when does the next one start.

The LLM can add errors as well, you know. I wouldn't feel safe using them to be so precise.

I mean, at the end of the day, just to not give a wrong impression: the speaker diarization we're doing is also not perfect.

I basically don't really notice it; I use it for search. It's not perfect yet, but it's gotten quite good.

Especially if you take a latest episode and compare it to an episode that came out a year ago, we've improved it quite a bit.
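A toy version of that embed-and-cluster loop, plus the speaking-time heuristic, might look like the sketch below. The greedy threshold clustering and the 2% cutoff are illustrative choices, not the production algorithm; real diarization stacks such as pyannote use stronger clustering.

```python
def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def cluster_segments(segments, threshold=0.9):
    """Greedy single-pass clustering of speech-segment embeddings:
    attach each segment to the first cluster whose running centroid
    is similar enough, otherwise start a new cluster."""
    clusters = []  # each: {"centroid": [...], "members": [...]}
    for seg in segments:
        for c in clusters:
            if cosine(seg["embedding"], c["centroid"]) >= threshold:
                c["members"].append(seg)
                n = len(c["members"])
                c["centroid"] = [
                    (cv * (n - 1) + ev) / n
                    for cv, ev in zip(c["centroid"], seg["embedding"])]
                break
        else:
            clusters.append({"centroid": list(seg["embedding"]),
                             "members": [seg]})
    return clusters

def label_clusters(clusters, episode_seconds, min_share=0.02):
    """Podcast-structure heuristic: a 'speaker' holding under ~2%
    of the episode is probably an inserted ad voice, not the host
    or the guest."""
    labels = []
    for i, c in enumerate(clusters):
        total = sum(s["end"] - s["start"] for s in c["members"])
        kind = "ad?" if total / episode_seconds < min_share else "speaker"
        labels.append((f"S{i}", kind, total))
    return labels
```

The point of the heuristic pass is that it needs no training data at all, only knowledge of how podcast episodes are structured, which is exactly the advantage an out-of-domain product would lack.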

It's beautifully presented. Oh, I love that I can click on the transcript and it goes to a timestamp. So simple, but you know, it should exist.

Yeah, I agree.

So I'm loading a two-hour episode of the Techmeme Ride Home, where there are a lot of different guests calling in, and you've identified the guest names. These are all LLM-based?

Yeah.

It's really nice. The speaker names... I would say, and obviously I'm a power user of all these tools, you have done a better job than Descript. Descript has so much funding, OpenAI invested in them, and they still suck. So I don't know, keep going, you're doing great.

Thanks. I would say, especially for anyone listening who's interested in building a consumer app with AI: especially if your background is in AI and you love working with AI and doing all of that, I think the most important thing is just to keep reminding yourself what's actually the job to be done here. What does the consumer actually want? For example, you were just delighted by the ability to click on this word and jump there. This is not rocket science; you don't have to be, I don't know, Andrej Karpathy to come up with that and build it. And I think that's something that's super important to keep in mind.

Yeah, amazing. I mean, there are so many features, it's so packed. There are the quotes that you pick up,

the summarization... oh, by the way, I'm going to use this as my official feature request: I want to customize how it summarizes. I want to have a custom prompt. Your summarization is good, but you know, I have different preferences.

So, one thing that you can already do today... I completely get your feature request, and I'm sure people have asked for it. Maybe just in general, as to how I see the future: in the future, I think everything will be personalized; this is not specific to us. Today we're still in a phase where you have to take the cost of LLMs into consideration, at least if you're working with context windows as long as ours; there are a lot of tokens in an entire podcast. So if we regenerate it entirely for every single user, it gets expensive. But in the future that cost will continue to go down, and then it will just be personalized.

That being said, you can already do it today: if you go to the player screen and open up the chat, you can just ask for a summary in your style.

Okay. I mean, I listen to consume, you know. I've never really used this feature. I don't know, maybe that's me being a slow adopter. Where does the conversation start?

You can just type anything. But I think what you're describing is maybe an interesting topic to talk about. Basically, I told you: look, we have this chat, you can just ask for it. And this is how ChatGPT works today. But if you're building a consumer app, you have to move beyond the chat box. People do not want to always type out what they want. So even though your feature request is theoretically already possible, what you're actually asking for is: hey, I just want to open up the app and it should just be there, in a beautiful form, such that I can read it or consume it without any issues. And I think that's in general where a lot of the opportunities currently lie in the market, if you want to build a consumer app: taking the capability and the intelligence, but finding out what the actual user interface is, the best way a user can engage with this intelligence in a natural way.

Is this something... I've been thinking about it as AI that's not in your face. Because right now, you know, Notion has Notion AI, we have the little thing there, or any other platform has the sparkle magic wand emoji: that's our AI feature, use this. And it's really in your face, and a lot of people don't like it. It should just kind of become invisible, kind of like an invisible AI.

100%. The way I see it, AI is the electricity of the future. We don't talk about, I don't know, this microphone using electricity, or this phone; you don't think about it that way, it's just in there. It's not an electricity-enabled product, it's just a product. It will be the same with AI. Now it's still something that you use to market your product, and we do the same, because it's still something people notice: ah, they're doing something new. But at some point, no, it'll just be a podcast app, and it will be normal that there's AI in there.

I noticed you do something interesting in your chat, where you source the timestamps. Is that part of the prompt, or is there a separate pipeline that adds the sources?

This is actually part of the prompt; this is all prompt engineering. You should be able to click on it.

Yeah, I clicked on it.

It's all prompt engineering: how to provide the context, because we provide all of the transcript, and then getting the model to respond in the correct way with a certain format, and then rendering that on the frontend. This is one of the examples where I would say it's so easy to create a quick demo of this. You can just go to ChatGPT, paste the thing in, say do this, and in 15 minutes you're done. But getting it to the production level where it actually works 99% of the time, that's where the difference lies. For this specific feature, we actually also have countless regexes that are just there to correct certain things the LLM is doing, because it doesn't always adhere to the format correctly, and then it looks super ugly on the frontend. So we have certain regexes that correct that. And maybe you'd ask: why don't you use an LLM for that? Because that's, again, the AI-native way; who uses regexes anymore? But with the chat, for the user experience it's very important that you have streaming, because otherwise you need to wait so long until your message has arrived. So we're streaming live, just like ChatGPT: you get the answer and it's streaming the text. And if you're streaming the text and something is incorrect, it's currently not easy to just pipe it, to stream it into another stream and get a stream back which corrects it. That would be amazing. I don't know, maybe you can answer that: do you know of any?

There's no API that does this. You cannot stream in. If you own the models you can: whatever token sequence has been emitted, start loading it into the next one. But only if you fully own the models, and it's probably not worth it; what you're doing is better. And I think most engineers who are new to AI research and benchmarking actually don't know how much regexing goes on in normal benchmarks. It's just this ugly list of like 100 different matches for some criteria that you're looking for.
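That combination, a regex cleanup pass that has to run over a live stream rather than over the finished reply, can be sketched like this. The citation format and the patterns are invented, since Snipd's actual regexes aren't public; the stream wrapper holds back a small tail of each chunk so a marker split across chunk boundaries isn't missed.

```python
import re

# The model is asked to cite timestamps like "[12:34]", but streamed
# output sometimes drifts: "(12:34)", "[at 1:02:03]", and so on.
# (Illustrative patterns only.)
CITATION = re.compile(
    r"[\[\(]\s*(?:at\s+)?(\d{1,2}:\d{2}(?::\d{2})?)\s*[\]\)]")

def normalize(text):
    """Rewrite any recognized citation variant to the canonical form."""
    return CITATION.sub(r"[\1]", text)

def stream_normalize(chunks, tail=16):
    """Fix citations on the fly without waiting for the full reply:
    keep a small tail of the buffer in case a marker straddles a
    chunk boundary, and emit the already-normalized rest immediately."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        fixed = normalize(buf)  # idempotent on already-fixed text
        emit, buf = fixed[:-tail], fixed[-tail:]
        if emit:
            yield emit
    if buf:
        yield normalize(buf)
```

The tail length just needs to exceed the longest possible unfinished marker; everything before it is final text, which is what makes the output safe to render token by token.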

No, it's very cool. I think it's an example of real-world engineering. Do you have tooling that you're proud of that you developed for yourself, or is it just a test script?

I think it's a bit more... I guess the term that has come up is vibe coding. Well, vibe coding is actually something else; in this case it's vibe evals. That was a term that came up in one of the talks, I think it might have been on the first day of the conference; someone brought it up, because a lot of the talks were about evals, which is so important. I think for us it's a bit more vibe evals. That's also part of being a startup: we can take risks. We can take the cost of it maybe sometimes failing a little bit, or being a little bit off, and our users know that and appreciate it, because in return we're moving fast, iterating, and building amazing things.

Whereas at, you know, Spotify or something like that, half of your features would probably be in a six-month review through legal, or I don't know what, before they could roll out. Let's just say Spotify is not very good at podcasting; I have a documented dislike for their podcast features. Yours, overall, is really, really well integrated. Any other LLM-focused engineering challenges or problems

that you want to highlight?

I think it's not unique to us, but it goes again in the direction of handling the uncertainty of LLMs. For example, at the end of last year we did sort of a Snipd Wrapped, and we thought it would be fun to do something with an LLM and the snips that the user has. There were three, let's say, unique LLM features. One was that we assigned a personality to you based on the snips that you have; it was all, I guess, a bit of a fun, playful thing.

I'm going to look at mine; I forgot mine already. I don't know whether it's actually still in the...

No, but we all took screenshots of it and posted them in the Discord.

The second one was a learning scorecard, where we identified the topics that you snipped on the most, and you got a little score for that. And the third one was a quote that stood out. The quote is actually a very good example: we would run it for a user, and most of the time it was an interesting quote, but every now and then it was a super boring quote, where you'd think, why did you select that? Come on. The solution there was actually just to say: hey, give me five candidates. So it extracted five quotes as candidates, and then we piped them into a different model as a judge, LLM-as-a-judge, and there we used a much better model. Because with the initial model, as I also mentioned earlier, we do have to look at the costs, since so much text goes into it, so there we use a somewhat cheaper model. But then the judge can be a really good model, to then just choose one out of five. So that's a practical example.

I can't find mine; bad search in Discord. So you do recommend having a much smarter model as the judge?

Yeah.

And that works for you? Interesting. I think this year I'm very interested in LLM-as-judge being more developed as a concept. For things like Snipd Wrapped it's fine; it's entertaining, there's no right answer.

We also use the same concept for our books feature, where we identify the books mentioned in an episode, because it's the same thing there: 90% of the time it works perfectly out of the box, one shot, and every now and then it just starts identifying books that were not really mentioned, or that are not books, or starts making up books. There we basically have the same thing: another LLM challenging it. And actually, with the speakers we do the same, now that I think about it. So I think it's a great technique.
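The candidates-plus-judge pattern is straightforward to sketch. Here the two models are stand-in callables rather than real API calls: the cheap model sees the whole, token-heavy transcript, while the stronger, more expensive judge only has to rank a short shortlist.

```python
def pick_best_quote(transcript, cheap_llm, judge_llm, n=5):
    """LLM-as-judge: a cheap model proposes n candidate quotes,
    and a stronger model picks one. `cheap_llm` and `judge_llm`
    are placeholders for real model calls."""
    candidates = cheap_llm(
        f"Extract {n} standout quotes from this transcript:\n{transcript}")
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(candidates))
    reply = judge_llm(
        "Pick the single most interesting quote. "
        f"Answer with its number only.\n{numbered}")
    try:
        choice = int(reply.strip())
    except ValueError:
        choice = 0  # judge misbehaved: fall back to the first candidate
    if not 0 <= choice < len(candidates):
        choice = 0
    return candidates[choice]
```

Even the judge's reply gets the defensive treatment discussed earlier: if it doesn't come back as a bare number, the code falls back rather than crashing, which matches the general theme of handling LLM uncertainty.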

Interesting; you run a lot of calls. Okay, you mentioned cost. You moved from self-hosting a lot of models to the big live models, OpenAI and Google. No Anthropic?

No, we love Claude. In my opinion, Claude is the best one when it comes to the way it formulates things, the personality.

Yeah, the personality.

I actually really love it, but yeah, the cost is still high.

So you can't use Haiku; you have to have Sonnet?

With Haiku we haven't experimented too much. We obviously work a lot with 3.5 Sonnet: for coding, in Cursor, and just in general for brainstorming too; we use it a lot, and I think it's a great brainstorming partner. But for a lot of the things we've done, we opted for different models.

maybe it's 5% cheaper or maybe it's like

50% cheaper do you have a sense it's

very difficult to to judge that I don't

really have a sense but I can I can give

you a couple of thoughts that have gone

through our minds over the time because

obviously we we do realize like given

that we we have a couple of tasks where

just so many tokens going in um at some

point it will make sense to to offload

some of that uh to an open source model

but going back to like we we're a

startup right like we're not an AI lab

or whatever like for us actually the

most important thing is to iterate fast

because we need to learn from our users

improve that and yeah just this velocity

of this these iterations and for that

the closed models hosted by open AI

Google is and topic they're just

unbeatable because you just it's just an

API call yeah um so you don't need to

worry about so much complexity behind

that so this is I would say the biggest

reason why we're not doing more in this

space but there are other thoughts uh

We basically have two different usage patterns for LLMs. One is the pre-processing of a podcast episode, the initial processing: the transcription, speaker diarization, chapterization. We do that once, and this usage pattern is quite predictable, because we know how many podcasts get released and when. So we can provision a certain capacity, and we're running it 24/7: it's one big queue, running 24/7.

What's the queue job runner? Is it Django, the Python one?

No, that's just our own, in our database: the backend talks to the database, picking up jobs and firing them back.

I'm just curious about the orchestration.

We of course also have a lot of other orchestration, where we use Google Pub/Sub. So we have this usage pattern of very predictable usage, where we can max out utilization; and then there's this other pattern, for example the snip feature, where a user action triggers an LLM call and it has to be real time. There can be moments of peak usage and moments of very little usage. For that, these LLM API calls are just perfect, because you don't need to worry about scaling up and scaling down, or handling these issues.

Serverless versus server farm.

Exactly. I see OpenAI and all of these other providers a bit as the AWS of AI. It's similar to how, before AWS, you would have to have your own servers, and buy new servers or get rid of servers; with AWS it just became so much easier to ramp things up and down. This is taking that to the next level, for AI.

I'm a big believer in this. It's basically intelligence on demand. We're probably not using it enough in our daily lives to do things. We should be able to spin up 100 things at once, go through things, and then stop. I feel like we're still trying to figure out how to use LLMs in our lives effectively.

Yeah, 100%. I think that goes back to where, for me, the big opportunity is if you want to do a startup. It's not about more intelligence; you can let the big labs handle that challenge. It's about the existing intelligence: how do you integrate it, how do you actually incorporate it into your life?

It's AI engineering. Okay, cool. The one other thing I wanted to touch on was multimodality in frontier models. Dwarkesh had an interesting application of Gemini recently where he just fed raw audio in, only the raw files, and got diarized transcription out, with timestamps, and I think that will come. Basically, what we're saying here is another wave of Transformers eating things, because right now models are pretty much single-modality things: you have Whisper, you have a pipeline for everything. Do you think that will be realistic for you?

I 100% agree. Basically everything we talked about earlier, with the speaker diarization and the heuristics and everything: I completely agree that in the future it would just be, put everything into a big multimodal LLM and it will output everything you want. I've also experimented with Gemini 2.0 Flash, just for fun, because the big difference right now is still the cost. The cost of doing speaker diarization this way, or doing transcription this way, is hugely different from the pipeline that we've built up.

Huh, okay. I need to figure out what that cost is, because in my mind 2.0 Flash is so cheap. But maybe not cheap enough for you?

No, I mean, if you compare it to Whisper and speaker diarization, and especially self-hosting it, yeah. But we will get there. It's just a question of time, and as soon as that happens, we'll be the first ones to switch.

Awesome.

Anything else that you're eyeing on the horizon, like, we're thinking about this feature, we're thinking about incorporating this new AI functionality into our app?

I mean, there are so many areas we're thinking about; our challenge is a bit more about choosing. Looking into the next couple of years, the big areas that interest us a lot are basically four. One is content. Right now it's podcasts. You did mention, I think, that you can also upload audiobooks and YouTube videos.

I actually use the YouTube one a fair amount.

But in the future we want to have audiobooks natively in the app as well, and we want to enable AI-generated content. Just think of taking Deep Research and NotebookLM podcast generation and putting them together: that should be in our app. The second area is discovery.

Discovery I think in general yeah I

noticed that you don't have so you have

download counts and most Snips right

something like that yeah yeah on the

Discovery side we want to do much much

more I think in general Discovery as a

paradigm in all apps is Will undergo a

change thanks to AI you know there has

been a lot of talk before Elon bought

Twitter there was a lot of talk about

bring your own algorithm to Twitter like

that was Jack dorsey's big thing or like

he he talked a lot about that yeah and I

actually think this is coming but with a

bit of a Twist so I I think what

actually AI will enable is not that you

bring your own algorithm but you will be

able to talk you will be able to

communicate with the algorithm so you

can just tell the algorithm like hey you

keep showing me cat videos and I know I

freaking love them and that's why you

keep showing them to me but please for

the next two hours I really want to like

get more into AI stuff do not show me

cat videos and then it will just uh

adapt and um of course the question is

you know like big platforms like I don't

know let's say say Tik Tok they do not

have the incentive to offer that exactly

that's what I was going to say but we

actually like our we are driven by

helping you learn get the most like

achieve your goals and so for us is it

actually very much our incentive like

hey no you you you should be able to

guide it um yeah so that was a long way

of of of saying that I think um there

will happen a lot in recommendation

order
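The "talk to the algorithm" idea can be sketched as a re-ranking layer on top of an existing recommender. This is a purely illustrative toy, not Snipd's (or any platform's) actual system; in practice an LLM would translate the spoken request into the tag-weight adjustments that are hard-coded below.

```python
# Illustrative sketch of steering a feed with a user instruction.
# All items, tags, and scores are made up; the `steering` dict stands in
# for what an LLM would extract from "for the next two hours, fewer cat
# videos, more AI content".

feed = [
    {"title": "Funny cat compilation",   "tags": {"cats"},       "base_score": 0.9},
    {"title": "Transformers explained",  "tags": {"ai"},         "base_score": 0.6},
    {"title": "Cats reacting to AI art", "tags": {"cats", "ai"}, "base_score": 0.7},
    {"title": "Scaling laws deep dive",  "tags": {"ai"},         "base_score": 0.5},
]

# Temporary preference adjustments parsed from the user's request.
steering = {"cats": -1.0, "ai": +0.5}

def steered_score(item, steering):
    """Recommender's own score plus the user's temporary adjustments."""
    return item["base_score"] + sum(steering.get(t, 0.0) for t in item["tags"])

ranked = sorted(feed, key=lambda it: steered_score(it, steering), reverse=True)
print([it["title"] for it in ranked])
```

With the steering applied, the AI items outrank the higher-base-score cat video; passing an empty steering dict restores the recommender's original order, which is the "it will just adapt" behavior described above.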

By popularity? Yeah, yeah. I think collaborative filtering would be the first step for recsys, and then some fancy LLM stuff on top.
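That collaborative-filtering first step might look like the toy item-item recommender below. All users, episodes, and engagement numbers are invented for illustration; a production recsys would use matrix factorization or learned embeddings over far more data.

```python
# Hypothetical sketch of item-item collaborative filtering for episode
# recommendations; not Snipd's actual recommender.
import math

# Toy listen matrix: user -> {episode_id: engagement score (e.g. snip count)}
listens = {
    "alice": {"ep1": 3, "ep2": 1, "ep4": 2},
    "bob":   {"ep1": 2, "ep3": 4},
    "carol": {"ep2": 5, "ep3": 1, "ep4": 1},
}

def item_vector(ep):
    """Column of the user-item matrix for one episode."""
    return {u: eps[ep] for u, eps in listens.items() if ep in eps}

def cosine(a, b):
    """Cosine similarity between two sparse user->score vectors."""
    shared = set(a) & set(b)
    num = sum(a[u] * b[u] for u in shared)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def recommend(user, k=2):
    """Score unseen episodes by similarity to episodes the user engaged with."""
    seen = listens[user]
    all_eps = {e for eps in listens.values() for e in eps}
    scores = {}
    for cand in all_eps - set(seen):
        scores[cand] = sum(
            seen[e] * cosine(item_vector(cand), item_vector(e)) for e in seen
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("bob"))
```

Item-item CF scores unseen episodes by their similarity (over co-listeners) to episodes the user already engaged with; in this toy data, `recommend("bob")` surfaces ep4 ahead of ep2.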

Maybe to go back to your question from before. Those were the first two areas; the other two...

Voice? Voice as an interface, as in voice AI? How is that going to exist?

Yeah. Maybe I can first tell you a bit about why I find it so interesting for us, because historically there has been so much talk about voice as an interface, and it has always fallen flat. The reason I'm excited about it this time around: with any consumer app, I like to ask myself, what is the moment in my life, the trigger, that gets me to open this app and start using it? Take Airbnb: the trigger is that you want to travel, and then you open up the app. For apps that do not have an already-existing natural trigger in your life, it's very difficult to get a user to open the app again. There's basically only one super-successful app that has managed to do that without a natural trigger, and that is Duolingo. Ah. With Duolingo, everyone wants to learn a language, but you don't have a natural moment during your day where it's like, ah, now I need to open up this app. You have the notifications. Exactly, the owl memes. Exactly. I mean, they gamified the [ __ ] out of it. Super successful, super beautiful; they are the GOATs in that arena. But it's much easier when the trigger already exists, because then you don't have to do all the streaks and leaderboards and everything.

Okay, that's a bit of context. Now look at what we're doing and our goal of getting people to really maximize what they get out of their listening. There are a couple of features where we know we can 10x the value people get out of a podcast, but we need them to do something; there is friction involved, because it's all about learning, about thinking for yourself. Those are the moments when you actually start really 10x-ing the value you got out of the podcast instead of just consuming it. Applying the knowledge, yeah. Yeah, basically being forced to think about what the main takeaway from this episode actually was for you. This is something I like doing myself for every episode I listen to: I try to boil it down, to decide on one single takeaway, even though there might have been ten amazing things. You pick the one most important one. Yeah, and this is an active process, a forcing function in your brain, to challenge all the insights and really come up with the one thing that is applicable to you and your life and what you might want to do with it. So it also helps you turn it into action. This is basically a feature we're interested in, but you have to get the user to actually use it. So when do you get them to use it? If this is all text-based, then we're basically playing the same game as Duolingo, where at some point you're going to get a notification from Snipd like, "Hey swyx, come on, you know you should do this." Maybe there's a blue owl. But if you have voice, you can

basically hook into the existing habits that the user already has. You already have the habit of listening to a podcast; you're already doing that. Once an episode ends, instead of just jumping into the next episode, your AI companion can come on and you can have a quick conversation and go through these things. How that looks in detail is still something we need to figure out, but the paradigm is that you stay in the flow. This also relates to what you were saying about AI that is invisible: you stay in the flow of what you're already doing, but now we can insert a completely new experience that helps you get the most out of your listening. Yeah.

I think your framing of this is very powerful, because this is where you are a product person more than an engineer. An engineer would just say, oh, it's just chat with your podcast; it's like chat with PDF, chat with podcast, okay, cool. But you're framing it in a different light that actually makes sense to me now, as opposed to previously: I don't chat with my podcast, why would I? I just listen to the podcast. But for you it's more about retention and learning and all that, and because you're very serious about it, that's why you started a company; that's your focus. Whereas, I'll admit, I'm still stuck in that consume-consume-consume mentality. I know it's not good, but it's my default, which is why I was a little bit lost when you were saying all those things about Duolingo and the trigger. My trigger for listening to a podcast is that I'm by myself; that's my trigger. But you're saying the trigger is not about listening to the podcast; the trigger is remembering and retaining and processing the podcast I just listened to.

No, what I meant is: you already have the trigger that gets you to start listening to a podcast. This you already have, and so do, I don't know, millions of people. Yeah, there are more than half a billion monthly active podcast listeners. Okay. So you already have the trigger that gets you to start listening, but, as you just said yourself, you do not have the trigger that gets you to regularly process this information. And voice, for me, is the ability to hook into your existing trigger. The trigger I was talking about is basically: your podcast ends and you're still listening, so we just continue. And this can be two minutes; I'm not saying it's a 60-minute process. Two or three minutes that come on completely naturally. If we manage to do that, and you as a user start noticing, freaking hell, I'm only spending three minutes with this AI companion, but I'm taking this much more away... and retention is one thing, but you also start to take what you've learned and apply it to what's important to you; you're actually thinking. If we get you to notice that feeling,

then, yeah, then we've won.

Yeah. I would say a lot of people rely on Anki and notes, flashcards and all that, to do this, but making the notes is also a chore. I think this could be very, very interesting. I'm just noticing that it's kind of a different usage mode. You already talked about this: the name Snipd is very snip-centric, and I originally resisted adopting it because of that. But now you observe that people are listening to long-form episodes, and you're talking about the end of the episode. Maybe the ideal implementation of this is: I browse through a bunch of Snips of the things I'm subscribed to, I listen to the Snips, I talk with it, and then maybe it double-clicks on the podcast and goes and finds other timestamps that are relevant to the thing I want to talk about. I was just thinking about that; I don't know if that's interesting.

I think these are all areas we should explore. We're still quite open about how this will look in detail.

What are your thoughts on

voice cloning? I have had my voice cloned, and people have talked to me, to the AI version of me. Is that too creepy?

I don't think it's too creepy. With a lot of these things, our society is going through a change, and things that seem quite weird now will seem normal in the future. I think voice cloning has already become much more normalized. I remember, I think it was the 2017 NeurIPS conference, back when it was in... San Diego? No, LA: Long Beach. It was the Flo Rida one. Yeah, yeah, everyone says that was peak NIPS. I remember there was this talk, or workshop, by Lyrebird (they actually got acquired by Descript later). They were doing voice cloning, and they were showing off their tech, and there was this huge discussion afterwards about all of the moral and ethical implications. It really felt like this would never be accepted by society. And look now: you have ElevenLabs, anyone can just clone their voice, and no one really talks about it as "oh my god, the world is ending." So I think society will get used to it. In our case, I think there are some interesting applications, and we'd be super interested in working together with creators, podcast creators, to play around a bit with this concept. I think it would be super cool if someone could come onto Snipd, go to the Latent Space podcast, and start chatting with an AI swyx.

Yeah, I think we'd be down. Obviously, as an AI podcast, we should be the first consumers of these things. Yeah. I would say that one

observation I've made about podcasting, about the general state of the market (and then you can ask me the questions you want to ask about podcasters): we are focusing a lot more on YouTube this year. YouTube is the best podcasting platform. It is not MP3s, it is not Apple Podcasts, it is not Spotify; it's YouTube. And it's the social layer of recommendations and the existing habit people have of logging on to YouTube. That's my observation; you can riff on that. The only thing I would say is, when you were listing your priorities, you said audiobooks before YouTube, and I would switch that if I were you.

Yeah. As in YouTube, video podcasts? I mean, it's obvious. Yeah, video podcasts are not just here to stay; they're getting bigger. What I want to do with Snipd is obviously also add video to the platform. The way I see video: I like this concept of backgroundable video. I didn't come up with the concept; it was actually Gustav Söderström, the CPO of Spotify. Exactly, exactly. When I speak with people, it remains true that they listen to podcasts while doing something else at the same time; that's like 90% of their consumption, even if they listen on YouTube. But every now and then it's nice to have the video. It's nice if you're just watching a clip, or if they mention something visual, show some slides, something where you need the picture with it. It also helps you connect much more with the host as a listener. But the biggest benefit I see with video is discovery. I think that is also why YouTube has become the biggest podcast player out there: they have the discovery, and discovery in video is just so much easier, so much better, and so much more engaging. So this is the area I'm most interested in when it comes to video and Snipd: that we can provide a much better, much more engaging, and much more fun discovery experience for consumers.

For consumers, yeah. Okay. I think you almost have three different audiences. The vast majority for you is the people listening to podcasts. Of course. Then there's a second layer: the people who create Snips, who add extra data-annotation value to your platform. By the way, we use the snip count as a proxy for popularity, because we have download counts, but platforms like Spotify rehost our MP3 file, so we don't get any download counts from Spotify. A snip count is active: someone opted in to listen to you and shared this. Those are really, really good metrics. But the third audience that you haven't really touched is the podcast creators, like myself. And for me, discovery from that point of view (not from your point of view), discovery for me is: I want to be discovered. I think YouTube is still there for that, Twitter obviously, and for me Substack and Hacker News; I really try very hard to rank on Hacker News. I think when TikTok took this very seriously, they prioritized the creators of the content. For you, the creator of the content was the Snips, but there may be a world for you in which you prioritize the creators of the podcasts.

Yeah, interesting observation. What are some of your ideas or thoughts? Do you have something

specific?

Riverside is the closest that has come to it; Descript is number two. Descript bought a Riverside competitor, and as far as I can tell it has not been very successful. Descript has a very, very good niche, a very good editing angle, and then just hasn't done anything interesting since then. Underlord is good, but it's not great; your chapterization is better than Descript's, and they should be able to beat you there, but they're not. And Riverside is good, also very, very good. We actually recently started a second series of podcasts within Latent Space that is YouTube-only (you only find it on YouTube), and it's also shorter. This show is like a one-and-a-half to two-hour thing; that one is remote-only, 30 minutes, chop chop, send it out on Riverside. Riverside is pretty good for that, but not great. It doesn't do good thumbnails, and the editing is still a little bit rough. It has this auto-editor that focuses on whoever is actively speaking and then sometimes goes back to the multi-speaker view, that kind of stuff; people like that. But the Shorts are still not great. I still need to manually download the video and republish it to YouTube, and the Shorts I still need to pick; they mostly suck. There are still a lot of rough edges there. Ideally, me as a creator, you know what I want, you definitely know what I want: I sit down, record, press the button, done. Yeah.

We're still not there yet. Yeah, I think you guys could do it. Okay, so if I can translate that for you, it's really about simplifying the creation process of the podcast. Yeah. And I'll tell you what: this will increase quality, because the reason most podcasts or YouTube videos are [ __ ] is that they're made by people who don't have life experience, who are not that important in the world and are not doing important jobs. What you actually want to enable is busy people, CEOs, each making their own podcasts; they're not going to sit there and figure out Riverside. A lot of the reason people like Latent Space is that it takes an idiot like me, who could be doing a lot more with his life, making a lot more money, having a real job somewhere else, who just chooses to do this because he likes it. Otherwise they would never get access to me and to the people I have access to. So that's my pitch.

Cool, cool. Anything else that you normally want to talk to podcasters about?

I think we've covered everything. I guess my last message is: go try out Snipd. There's a premium version, and you can use and try out everything for free. I'm also happy to provide a link for the show notes to try out the premium version for free for a month, if people want to do that. Yeah, give it a shot, I would say.

Yeah, thanks for coming on. I would say that after you demoed me, I did not convert for another four to six months, because I found it very challenging to switch over. The main thing is that you have OPML import, right? But there's no way to import all the existing half-listened-to episodes, or my rankings, or whatever. For listeners who are considering it: I have a blog post where I talked about my switch. Just treat it as a chance to clean house. That's a good point.
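On the OPML point: OPML is just an XML file listing feed URLs, which is why it can move a subscription list between apps but not per-episode state like playback position or rankings (that data lives in each app's own database). A minimal sketch using Python's standard library; the feed titles and URLs below are invented:

```python
# Minimal sketch of parsing podcast subscriptions out of an OPML export,
# the format podcast apps commonly exchange. Titles and URLs are made up.
import xml.etree.ElementTree as ET

sample_opml = """<?xml version="1.0" encoding="UTF-8"?>
<opml version="2.0">
  <head><title>My podcast subscriptions</title></head>
  <body>
    <outline type="rss" text="Latent Space"
             xmlUrl="https://example.com/latentspace.rss"/>
    <outline type="rss" text="Some Other Show"
             xmlUrl="https://example.com/other.rss"/>
  </body>
</opml>"""

def parse_opml(opml_text):
    """Return (title, feed_url) pairs for every RSS outline in an OPML doc."""
    root = ET.fromstring(opml_text)
    subs = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # skip grouping outlines that carry no feed URL
            subs.append((outline.get("text"), url))
    return subs

print(parse_opml(sample_opml))
```

An importer on the receiving side would subscribe to each returned URL and start episode state from scratch, which is exactly the migration gap described above.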

You know, just refocus: fresh start 2025. Yeah.

Great. Well, thank you for working on Snipd, and thank you for coming on. We usually spend a lot of time talking to big companies, venture startups, B2B SaaS, that kind of stuff, but your journey, a small team building a B2C consumer app, is the kind of story we also like to feature, because a lot of people want to build what you're doing, and they don't see role models who are successful and confident, who are having success in this very challenging market. So yeah, thanks for sharing some of your thoughts.

Thanks, yeah. Thanks for having me, and thank you for creating an amazing podcast and an amazing conference as well.

Thank you.
