This Startup Built the Infrastructure Powering Voice AI
By YC Root Access
Summary
Topics Covered
- Obsession Fuels Founder Endurance
- Voice AI Needs Full Tech Ecosystem
- Threshold Crossings Ignite PMF
- Promptable Models Unlock Intelligence
- Subject Matter Expertise Beats Capital
Full Transcript
It's super cool to be here today with Dylan from Assembly AI. Dylan, thank you so much for joining us.
>> Yeah, I'm excited to be here.
>> So, Assembly AI is one of the first AI companies YC ever funded, in what we called at the time our first AI cohort, back in the summer of 2017, and today it's actually one of the most successful companies YC has ever funded.
For folks who aren't familiar, how about you just tell everybody what Assembly AI is and where you're at now.
>> So, we help other companies build voice AI features and applications: everything from AI notetakers, to AI capabilities in contact centers, to real-time voice agents. Healthcare companies are building ambient medical scribes on top of our voice AI infrastructure platform.
So, we have about a million developers that have signed up to the platform. We have about 10,000 customers. Last year around 250 million voice hours were sent through our platform, and we're now doing
>> 250 million voice hours?
>> Yeah, we're now doing almost 2 million hours per day. So the run rate on that would be something like 700 million voice hours for this year, and it's continuing to grow week over week.
So we really are focused on helping other companies have the voice AI infrastructure and primitives they need to innovate around voice and to build innovative products and features, either standalone or within their products.
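For context, a minimal transcription call against the platform looks roughly like this with Assembly AI's Python SDK; the API key and audio URL below are placeholders, and this is a sketch rather than a complete integration:

```python
# Minimal sketch: transcribing a pre-recorded file with the AssemblyAI Python SDK.
# pip install assemblyai
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder key

transcriber = aai.Transcriber()
# Accepts a local file path or a publicly reachable URL (placeholder below).
transcript = transcriber.transcribe("https://example.com/call_recording.mp3")

if transcript.status == aai.TranscriptStatus.error:
    print("Transcription failed:", transcript.error)
else:
    print(transcript.text)
```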
>> Do you have an example of maybe one or two customers that people would know, and how they're built on it?
>> Yeah, for sure. So notetaking is a really popular use case that companies are shipping with our platform. If you've used Granola or Fireflies notetakers, those use Assembly AI's platform.
>> Interesting. So all my Granola notes were actually going through Assembly.
>> Yeah, that's right. Also in the hiring segment. So if you've used Metaview or Ashby's notetaker.
>> Okay, Ashby. Yeah.
>> So notetakers are a really big segment. We're also pretty widely deployed in contact centers. There's a big contact center company called Calabrio. They provide contact center software to big brands like Delta Airlines.
>> Okay. So if I called 1-800-Delta or whatever to change my Delta flight, there's a good chance that under the hood that audio was actually being processed by Assembly, through this chain of customer relationships.
>> Most likely, yeah. So right now product teams are shipping products and capabilities around voice with our platform. But then we're also seeing enterprises. We have a lot of Fortune 500 enterprises that are building AI capabilities themselves, within their contact center operations or within their trust and safety operations around voice, to automate workflows for their teams. So it's a mix of product teams but then also enterprise companies that are looking to build with voice AI. Zoom, for example, has used our infrastructure for a number of different capabilities.
>> You guys were super early to AI, 2017.
This was way before ChatGPT. I remember your batch had what, a handful of companies working on AI?
>> Yeah, there were like 10 other companies in the AI cohort.
>> Tell us the origin story. How did you end up starting this company? How did
you end up getting into AI so early?
>> Yeah, so the really early story is I started a company in college, and it was not an AI company, but through that experience I taught myself how to code. You wouldn't need to do this now with AI, but back then I bought a bunch of programming books on PHP and Python and Django, and I just read them all and taught myself how to code, and started building SaaS software products and launching them as little startups.
>> Like, what did you build?
>> One was a fundraising tool for college organizations. One was a tool where small businesses could put up a QR code and customers could get a number they could text feedback to, anonymous feedback capture through text message. A bunch of random stuff. And
I learned from that that I really loved programming, and I really loved the creative journey of building something: you had this fuzzy picture of a product in your mind, and then through programming you could build it and see it come to life and get feedback. It was this very addictive kind of feedback loop that I really enjoyed.
So after college, I just kept programming and kept learning about everything from security to machine learning, and I really got interested in machine learning. This is like support vector machine time. This is not deep neural net time.
>> Eventually I joined a machine learning team out in San Francisco. I moved to San Francisco and joined a team at Cisco, the company Cisco that's based here. It's actually a few blocks from where we're filming right now, in Dogpatch, where my job was. And that was where I started learning about neural networks and getting into neural networks. To give you a sense of time, this is like 2015. So very early into, you know, the neural network era. This was right when... wait, when did we crack ImageNet?
>> I think it was like 2013 or 14. It was
around that time. Okay.
>> But I do remember going to the very first TensorFlow meetup down at Google's headquarters in like 2014 or '15. That was the state of things at that time. So a couple of things happened around that time.
One is I saw that AI and neural networks and deep learning were going to take off over the next decade, that it was such early days, and data, compute, and algorithms were just going to continue to get stronger and stronger and were going to bring really disruptive new capabilities to everything from self-driving cars to natural language processing, everything that we've seen over the last 10 years. And so I just really felt like, wow, that's an area I was drawn to and wanted to keep going deeper in. And then I bought an Amazon Echo when that came out in 2015.
>> It was crazy because my experience with voice recognition prior to the Amazon Echo was that it sucked, right? Siri didn't work. Everything sucked. You didn't use it.
>> But the Amazon Echo, you could use it across the room. So far-field it worked great, at a distance from the microphone. Your TV could be on, you could be shouting to it over a bunch of background noise, and it would still work. And what was so incredible was that it worked so well that I found I was building new habits around the product, because it was reliable.
>> So setting timers, asking for the weather, playing music.
>> I was blown away by how well those things worked over voice and that it was reliable. And so I started looking into voice recognition technology to just build stuff on my own and to innovate and experiment on my own.
And there was nothing in the market at the time. It was either bad or... so the company Nuance at the time, which was like a big...
>> They were the big incumbent.
>> Yeah, exactly. I got in touch with them and was trying to find their developer SDKs, and you had to pay, I think it was like thousands of dollars up front.
>> I remember I I purchased it. Yeah.
>> And then they would mail you a CD-ROM with a developer SDK on it. And I didn't even have a CD-ROM drive in my laptop.
So for me, I was used to the Twilio or Stripe style developer experience, and this was the opposite of that.
So all these ideas kind of came together for me, and I got really excited about the idea of: what if you could use these new deep learning algorithms, all this innovation that was happening around AI and deep learning at the time, to build way better voice AI capabilities and technology, and then make it super easy for any developer to build with those things? Because I remember getting access to Twilio back in the day, like 2012 or 2013, and I was just overwhelmed with creativity, because I was like, wow, now I have access to this cool technology, me as a college student developer, and I can experiment with all these really cool ideas. And that has been the whole spirit of Assembly: we're not an application company, we're really focused on being a developer infrastructure tool that just makes it really easy for a developer, whether you're a college student or a high school student or a development team at a Fortune 500, to have access to this amazingly powerful technology effortlessly and with a really good experience around it. So that was the mix of ideas that came together, and it's still what we're focused on delivering many years later.
>> I don't know if you've read Paul Graham's essays, but...
>> Some of them.
>> A lot of his early essays talk about how most of the best companies of the past, like Microsoft and Apple, didn't start as companies. They started as just an engineer building technology that they found intellectually interesting, without any clear idea of how it would turn into a business, and scratching their own itch, right? Building the thing that they saw missing in the world that they personally wanted to use. And at the time that you started this, AI was not the hot thing. It was not the thing raising huge venture rounds.
>> You couldn't use AI.
>> If you said AI in your pitch... deep learning was the thing to say. If you used AI, it was like a scam.
>> Yes. Exactly.
>> That was the state of things.
>> Totally. Yeah, totally. Because there had been decades of failed promises from AI startups, so VCs were allergic to anything with AI in it.
>> Yeah, exactly. My experience as a founder has been that I think you have to be obsessed with the problem that you're solving, because what keeps me going is that I really want this product to exist that we have not finished creating yet. I'm obsessed with that problem and wanting it to be realized. Whereas I think if you're just focused on a business opportunity and you're not obsessed with the product or the subject matter, you can find yourself losing interest. My lesson, and what I share with other founders, is you should pick a problem that you're just obsessed about, because when I think about what I would be doing if I wasn't building Assembly, I don't know. I'm having a lot of fun building Assembly. And I use our product all the time. I'm constantly dogfooding it and playing with it and showing people demos, and it's fun. So I think picking something, working on something that you want to exist because you want to be a user of it, has been really fun.
>> To me, probably the craziest part of the Assembly story is that you started working on this in 2017, when very few people were interested in AI.
>> Yeah.
>> And it basically didn't start taking off until 2021, 2022. You raised your Series A in 2022, five years after starting the company.
>> Yeah. And the first years were just this like long slog where it wasn't clear that it was ever going to work.
I'd love to hear about those early years. What was it like? What kept you going? Especially as a solo founder. There weren't that many solo founders back in 2017.
>> Yeah. So, I got into YC summer 2017, as a solo founder. I had pretty much just started working on the company, and I got in late. I applied late, got in a week before the batch started, and I was just so stressed out because you had all these other companies that had real products and were further along and could iterate so much faster. I picked a very hard product to work on back then, and I just remember being so stressed out and so overwhelmed back in 2017.
>> Yeah, everybody else was just building database-backed websites, right? They could push things every day.
>> You get feedback, you can iterate on it that night, right? Whereas our early users would give us feedback and it's like, all right, we'll get back to you in a month when we have a better model. And YC is three months. So we had three iteration cycles. It's not a lot of iteration. And no one was building with voice, because
what I have now realized is that this whole ecosystem of technology needed to be created so that you can build voice AI applications. What do I mean by that? I mean, yes, you need really good voice AI models, but you also need LLMs, you also need vector databases, you also need WebRTC, you need mobile 5G. You need this whole ecosystem to build what we're seeing developers build. I remember in like 2021
we started to get transcription models to work decently well, but the interesting applications were not just transcribing something. It was sentiment analysis of a phone call, it was summarizing a meeting. But if you wanted to do those types of tasks back then, you had to train sentiment analysis models or train summarization models, right? Whereas now you can use Assembly and an LLM, and what you can do is insane, and it's super easy to build and innovate around voice data.
So, it took a while for that ecosystem of technology to come together. And I
think it goes back to what I was saying earlier, where if in 2017 I had put together a business plan, I would probably have been like, I shouldn't work on this because the market is so small.
>> The TAM is like $10 million.
>> Yeah, exactly. The market's so small and all the technology sucks, so why am I working on this? But I thought it was a fun problem to work on.
And I was really fortunate that back then, Daniel Gross was our YC partner. He had worked at Apple. He saw the state of voice recognition technology and he was a really early believer. And so he invested personally in the company and really supported the company early on post-YC. We just had these early believers in the company that knew it was very early.
For me, I always had conviction that it was kind of like self-driving cars, where it wasn't a question of product market fit, like will people want voice AI technology. It was a question of when will it have product market fit, when will it get good enough, like self-driving cars. And I think that in hindsight we probably could have gone a bit faster, but yeah, it took a while. To give you a sense, the first couple years, not much progress. We're like three, four people. And then I think we landed our first real customer in 2021.
>> Who is that? The first real customer?
>> Like a real, legit customer, still a customer of ours. A contact center company.
>> Okay.
>> And raised our Series A in January of 2022. And then to date we've raised about...
>> Which was still a little bit before LLMs. So what happened in 2021, 2022 that caused the beginning of the inflection point?
>> Yeah. So COVID was a big part of it. During COVID, all of a sudden you had way more voice data created and captured over the internet than ever before, with remote work. Podcasts started to get popular around that time. Our models started to get better. We started to use modern transformer architectures, trained on more data. So transcription was getting better and getting cheaper. Then you had other NLP models like BERT that made it easier to do summarization or sentiment analysis on top of transcription. So you had the ecosystem of technology starting to come together, where it was easier to build a voice AI application.
>> And so there was acceleration in the TAM around that time. Accel led our Series A, and they saw that too. And then between then and now, obviously, it's just continued to accelerate pretty dramatically. We raised about $160 million in the course of, I don't know, three years or something. So it took a while to get to that point, but then once it did, it really started to accelerate.
>> All the use cases in those early years, were they all non-real-time use cases? Like, analyze all the calls that went through this call center for the past week and find the ones with bad NPS scores or things like that?
>> Pretty much all use cases on top of pre-recorded audio. And now there's still a ton of growth in those use cases. Usage of our non-real-time APIs is still growing 200% year-over-year, but real-time use cases are exploding because the real-time capabilities are now way better and way lower cost. You have this ecosystem of real-time technology to build real-time notetakers, real-time voice agents. That market is really taking off. But back then it was all non-real-time.
>> And that was your intuition with the Amazon Echo. I mean, that was a real-time product. That was the magical experience.
>> Yeah. The very first API I built was a real time API.
>> But the real-time models back then, I mean in general all the models were bad. Real time is a lot harder, and it's only been, I would say, really within the last 18 months that real-time models have started to cross a new threshold. That's why you're seeing real-time voice AI use cases take off: that threshold has been crossed in the last 18 months across accuracy and latency. And it's still not perfect, right? There's real-time speaker identification that doesn't work that well today. So there's still a ton of room to go. But what I have found is that all of a sudden these technologies cross some threshold, and it's so hard to pinpoint what that threshold is, but all of a sudden they cross it and there's product market fit and it just takes off. The first time we experienced that was around 2021. And now we're experiencing that again, and I would say that's across both real time and non-real time. Some of the newer non-real-time models that we have are more like general-purpose audio understanding models. They can identify speaker genders, they can capture background sound effects, they can just understand and capture audio with new capabilities, and that's unlocking new use cases. So what innovative teams can do just keeps increasing.
>> So that's sort of the business side. I'm kind of curious to hear more about the personal side and the company side. You were building this thing very quietly, it was really just a few people for like five years, and then all of a sudden, I guess it wasn't actually overnight, but in a relatively short period, over the course of like three years or something, it just started really working and it caught fire. What was that like as a founder?
>> Yeah, definitely a lot of growing pains. I think I learned a lot through that period. In the early days it was kind of just like, no one's paying attention. Stressful in a different way. And then now it's like, okay, there's a huge opportunity, we really want to be a leader in the space and really execute against that opportunity. I would say when you raise a lot of capital quickly, that puts a lot of pressure on a company and on a team. And I think the biggest lesson I've learned is you just have to trust your instincts as a founder, because you really understand the market you're building in and the right company-market fit you need. It was hard to know what the right instincts were to follow when we started to see a lot of acceleration in the company, and that just took a bit of time, but now I think our instincts have become more refined as a company, and me too as a founder.
>> You have sort of incredible insight into what's happening in the voice AI market. What's happening now, especially over the last few months? What are the big things you guys are working on? Where do you think it's going to go in 2026?
>> Yeah, so there are a couple of very exciting things we're seeing. Real-time voice agents work really well now. It's kind of wild. So I think those are being rapidly deployed, because the success rate of a real-time voice agent is high enough now that they can deliver a good customer experience for customer support and frontline reception tasks. And as a consumer, I don't know, have you interacted with one yet?
>> Of course.
>> Yeah. I'm starting to now, when I call a plumber or a service tech, realize like, oh, I'm talking to a voice AI, because I know.
>> Yeah, the vast majority of people can't tell.
>> Right, exactly.
>> And we're still in this phase where you can't tell, but then some people will midway through be able to tell and it just gets weird. It's like, wait, I just realized I'm talking to an AI. So I think real-time voice agents are going to continue to be widely deployed, and the ROI there is crazy, and the capabilities around real time are getting really amazing. We're also
seeing a lot of demand around robotics and consumer hardware. A lot of pretty popular robotics companies, I don't know if I can name them or not, but a lot of really popular robotics companies are putting our models on the robots. So, you know, humanoid robots that are walking around.
>> That's cool.
>> Yeah. I think robotics, but also consumer hardware, you're going to start interacting with that more and more through your voice. Even the coffee machine downstairs, right? You have to kind of flip through a touchscreen. But if you could walk up to that coffee machine and just ask for the exact type of coffee that you want, there's no reason why that technology shouldn't also exist.
>> Totally.
>> It took a while for consumer hardware to all have a touchscreen on it. But now I think voice will also be another modality into these hardware devices that we buy. So you're seeing a lot of demand there. But then also these ambient devices. So healthcare: we have a ton of healthcare companies building ambient scribes for doctors.
>> And this is a physical device.
>> This is software that they're just running on a laptop or on a phone, and the doctor-patient visit is captured. If you listen to some of this audio, it's actually really hard: it's someone's laptop recording a conversation that's 10, 15 feet away, tons of background noise, low speech, but the models can do a pretty good job now at capturing all that, with accuracy rates in the 90s. And so now you can actually automate chart notes and you can automate insurance submissions post doctor visit. So we're seeing a ton of healthcare companies build ambient scribes, but also sales companies build ambient scribes.
There's a company we work with called Siro, if you've heard of them. They have an ambient scribe for in-person sales teams. So you're going door-to-door, you're doing field sales, and they build an app you can run. It will give you advice on the conversation that you're having, in real time, in person. And those use cases are now working well enough that, I think the quote I heard from Jake, their founder, is the salespeople using that tool are taking like 10 to 20k more per quarter home in take-home pay, because they're getting this advice that they weren't getting before. So real-time voice agents, consumer hardware, and then ambient capabilities that are kind of listening, in healthcare and sales, those are where we're seeing a ton of new applications being built and a lot of greenfield.
>> What's the team working on now? What's on the roadmap for 2026?
>> We're really focused on building smarter voice AI models. What I mean by that is models that can do more than just transcribe your speech, that you can give instructions to, that you can control. They're just a lot more intelligent. I think current voice AI models have gotten really good, but they're still kind of dumb. They don't have a lot of intelligence in them to understand context: to, for example, understand that this person's talking to me, but they're in a noisy room and there's a baby crying and they sound stressed. To understand speakers that speak different languages really well. To know who's the primary speaker and who's the background speaker. If you're building a robot, for example, a common issue is which voice do I listen to and take commands from, and which voice is just chatting with the human.
>> Interesting. Kind of like the noisy restaurant problem.
>> Yeah.
>> Yeah. And knowing who is saying what, what's the role of the person. All of this we think about as more intelligent voice AI capabilities that we're focused on building, and that will operate in real time. We think that's a big capability that we're seeing a lot of demand for. So we just released a model last week called Universal 3 Pro. It's our latest model and it's our most intelligent model, because you can give it instructions on what you want it to do around the audio that it's listening to, and it can follow those instructions pretty well. And this is just the first release. Now I can show you some of that.
>> Instructions like an LLM prompt?
>> Correct. Yeah. We think about it as somewhere in the middle between a multimodal LLM and a traditional speech-to-text model. What we really did was focus on creating a reliable transcription model that can stay on the rails on transcription and transcription-like tasks, but that you can still give instructions to. Whereas if you were to use a multimodal LLM today and give it instructions, it goes off the rails too much. It's not reliable enough in real-world use cases. So we focused a lot on the post-training of this model so that it's a reliable transcription model, but you can give it instructions and really guide it on what you want it to do.
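A rough sketch of what that promptable workflow could look like from the developer's side, using Assembly AI's Python SDK; the `speech_model` value and the `prompt` parameter below are illustrative assumptions for the sake of the example, not the confirmed API surface for Universal 3 Pro:

```python
# Hedged sketch: passing instructions alongside a transcription request.
# "speech_model" and "prompt" are assumed/illustrative parameter names here,
# not necessarily the exact fields exposed for this model.
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder key

config = aai.TranscriptionConfig(
    speech_model="universal-3-pro",  # assumed model identifier
    prompt=(                         # assumed instruction field
        "Transcribe verbatim. When multiple speakers talk simultaneously, "
        "mark the crosstalk segments instead of transcribing them."
    ),
)

transcriber = aai.Transcriber(config=config)
transcript = transcriber.transcribe("meeting_recording.mp3")  # placeholder file
print(transcript.text)
```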
>> Okay. So let me see if I understand this right. Historically, transcription models like the ones that you guys built only do transcription. They lack the general intelligence of an LLM. And so the magical combination came when you paired a transcription model with an LLM. This new model is kind of like the fusion of the two of them. It is a transcription model, but you've actually injected the generalized intelligence of an LLM into it.
>> Yeah, more or less. Exactly. Because if you think about a multimodal general-purpose LLM, it can do transcription tasks, but that's maybe 10% of its total training data, right? It also can do math tasks. And our customers need a reliable voice AI model that can understand speech and speakers, so we really focus the model on those types of tasks, and it doesn't get confused and think it's an assistant. It will operate in this narrow space.
>> And I assume it's also much smaller and faster and cheaper to run than a full LLM, where you're taking this giant brain and having it only do this one little thing, right?
>> Yeah. And you can run it in real time. You can actually deploy it on your own server, so we support self-hosting these models to get latency down. And where we're really excited is for customers to use this in their real-time voice agents, because you can just capture a lot more information from the speaker.
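For the real-time side, streaming transcription has been exposed in Assembly AI's Python SDK along roughly these lines; the class names below follow the SDK's earlier real-time interface and may differ for the newest streaming models, so treat this as a sketch:

```python
# Hedged sketch of real-time streaming with the AssemblyAI Python SDK.
# Class names reflect the older real-time interface and may have changed;
# microphone capture also requires pyaudio to be installed.
import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"  # placeholder key

def on_data(transcript: aai.RealtimeTranscript):
    # Print only finalized segments; partial results arrive continuously.
    if isinstance(transcript, aai.RealtimeFinalTranscript) and transcript.text:
        print(transcript.text)

transcriber = aai.RealtimeTranscriber(
    sample_rate=16_000,
    on_data=on_data,
    on_error=lambda err: print("Error:", err),
)
transcriber.connect()

# Stream microphone audio until interrupted, then close the session.
try:
    transcriber.stream(aai.extras.MicrophoneStream(sample_rate=16_000))
finally:
    transcriber.close()
```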
>> Can we see a demo?
>> Yeah, definitely. So, here's a demo of Universal 3 Pro. Mhm.
Uh-huh.
And you can see it gets pretty much all the stutters and the noises that I made, with really low latency. And you can even see here, as I was stuttering and talking, it's super verbatim in what it captures. Another thing that a lot of real-time models struggle with is complex terms like emails and long strings of digits. So I'll give you an example of how well this model can do there.
>> Which makes sense. Like if I'm calling a call center, one of the most common things that they're going to do is ask me for my email address, my address, my name.
>> Yeah.
>> And so I'll give you an example of how well this can capture that.
>> Yeah. I'll read you my email.
It's 726_dillan@hotmail.com.
Sorry, that was the wrong one. It's
xyz14_x28@gmail.com.
>> Oh wow, it completely nailed it. Actually much better than a human.
>> Yeah, it gets complex alphanumeric strings really well. Even if I do things like, I'll do another one.
>> A aaa bbbc 1111 2x56.
>> Okay, so if I'm calling in and I'm reading an airplane ticket number or something like that, it's just going to be able to totally nail an alphanumeric string like that.
>> Yeah. And it's even super robust to really challenging audio conditions. So I'll show you something that's cool. If I whisper to it, it will still get it pretty well.
>> Yeah. My email address is Dylan dy assembly.com.
>> Almost. I was whispering pretty low. It got assembly.com.
>> Wow.
>> Now, what is cool is, a lot of times you see errors like this and it's like you have no control. But you can actually prompt this model to be more accurate in these types of conditions. So I'll switch over here and give you an example. I can tell the model, let's say I want it to convert all the speech it's hearing into Spanish. I can prompt the model. I can say: transcribe this audio and translate everything you hear into Spanish.
And I'll record some audio for this example.
Hi, my name is Dylan and I live in New York City.
So you can see it transcribed me, but it transcribed me in Spanish, because I told it here, I want you to transcribe everything in Spanish.
>> So this model effectively speaks many languages. How many languages?
>> Yeah. So this is a multilingual model. Right now it supports seven or eight languages, and then we're going to add a couple dozen over the next couple of weeks.
>> How have people been using this promptable configuration since you launched it? Or is it too early to say, since it's only like a week out?
>> You know, I think what we found over the last couple years is that every application has different requirements. And what this model gives developers the ability to do is explain those requirements to the model, and then it will follow them. So let me show you an example of what I mean by that. This is an audio clip. I'll play it here.
>> I hope you got our card.
>> Okay, nobody talk. We'll just wait for her to talk.
>> Well, we just wanted >> Damn it.
>> So let me throw this audio in and just show you, out of the box, what the behavior will be.
>> Okay.
So you can see it's missing the background speaker, right? Some developers don't want the background speaker to be captured. For example, if you're a robot and you just want to focus on the primary speaker,
you can have the model do that. But in this example, let's say you do want to pick up the background speaker. You can do cool things like what I'll show you here. I need to actually just copy this prompt out.
So let's say you want to pick up the background speech, but you don't actually want to transcribe it. You just want to know that there's some background speech. Here's a prompt we'll give the model: when multiple speakers talk simultaneously, mark the crosstalk segments. And then we'll run that file through again.
>> And you can see now the model's identifying the crosstalk from the background speaker, and it's not transcribing it, because we told it to just mark those crosstalk segments. But if you wanted to have it actually transcribe that background speech, you could tell it to transcribe that background speech. So you have this level of controllability over how the model works. And again, because it's not this giant multimodal LLM, it's something in the middle. It's going to behave like a speech-to-text or speech-to-text-like model. So it can do really good multilingual code switching. It can do speaker identification tasks. And
this is just the first version. We're already training new ones that are going to come out and be a little bit better at instruction following, with new capabilities, in just a couple of weeks even.
>> This is super cool.
>> Yeah.
>> How did you train it, and how did you get it to be so accurate at phone numbers and email addresses?
>> Yeah, the phone numbers and the email addresses, the accuracy there is really cool. We have spent years working with customers building these types of applications and building up this deep subject matter expertise into what the failure modes are and what the things are that really matter. And really, most of our time has been spent on coming up with really good post-training techniques to make the model perform really well in these traditionally challenging environments. For example, when you're saying mhm or no or yes, or phone numbers, a lot of times that's how you're talking to a voice agent today when you're trying to complete some action. And we spend a lot of time on the post-training, on how we're training the model, to make it really good on those types of tasks.
>> So there's another interesting part of the Assembly AI story to me, which is that back in 2017 when you started this, in addition to voice AI being a bad market, where it was really small and nobody really had anything working, it was also a hyper-competitive market where you were competing with Google and Nuance, which people probably don't remember but was a giant in the space, and I forget who else was around in 2017, but I'm curious what that was like.
>> And were you scared of those competitors?
>> Yeah.
>> Were investors skeptical of investing because you had all these giant competitors?
>> Yeah, they were definitely skeptical of investing. Still, I would say, skeptical. But yeah, definitely. I think
it goes back to the fact that I was a user of those products and I saw they weren't good enough. And I think that, to me, back then and still, gives me conviction over our ability to win in the space, because we have this subject matter expertise over the problem space and what customers want, and that gives us the ability to build the best product. And I think in general that's been something I've learned: yes, you need capital and you need team and resources, for sure, but you also need to have subject matter expertise, which comes from being very close to the problem. And there's this example
that I think about, which is the Wright brothers. The Wright brothers self-funded their plane development, with something like $40,000 in today's dollars out of their own pocket, and they were competing with Samuel Langley, who was funded by the Smithsonian with, I think, 20x the amount of funding or something. And the Wright brothers won. A big reason, my takeaway, is they had such better subject matter expertise over the problem, because in one year they had a thousand test flights, where they personally would do the test flights and they would understand what's working, what's not, what they need to go iterate on. Whereas Samuel Langley, I think he had like seven.
>> And he didn't even do them himself. He had his minions do it.
>> He had his team do it. And not a dumb person, right? Like, a smart guy, but he missed the subject matter expertise that the Wright brothers had. And so for me,
I think for any company to win, you need to have deep subject matter expertise over what you are trying to go build and do, and what type of product you are trying to make. And so back then and now, I think what gives us the ability to build the best product and to win is that we're a really dense team of subject matter experts. We're so close with customers. We have thousands of customers in Slack with us.
>> Our Slack is a fire hose of customer feedback, and we don't filter it to just the good stuff. We have a big product team channel with like 80 people in it. We got some pretty bad feedback a week or two ago, and it just got pasted in there, and everyone's like, "Yeah, that's good feedback. Let's go fix that thing." And so that's part of our culture: we're just trying to build the best thing. And I think that is what gave us an advantage back then, and what still gives us the ability to build a better product.
>> Did that also play out on the technology side, where Google had unlimited money to throw compute at this problem, but did they end up basically squandering it like Langley did? And have you guys been able to be smarter about how you do it because you're so much closer to the problem and the customers?
>> I don't know what Google or others are doing internally, right? I know it's super smart teams. I think that we
just care about this problem space much more. And I would say the people on our team are probably the most experienced people in the world at developing and building real-world voice AI systems, because we've been so deep in this. We have some of the most scale coming through our API platform in the market, and we get to learn from all of that: from customer feedback, good and bad, and from knowing where people are trying to take the technology. And so we have a brilliant team, but we have deep subject matter expertise within that team, not just functional expertise. And I think that's usually the missing ingredient in a big company: the people leading the projects are so far removed from the customer feedback, and a lot of times haven't even tried the product themselves. It's remarkable. As soon as you get hands-on with your product, you realize, oh, it's bad there, it's bad there, right? And so a big part of our culture at the company is everyone's hands-on with the product. We're all testing it, we're all trying it, we're all building stuff with it. And so you really quickly see where it's good and where it's not. So like the email demo I gave you: we really understand that as a problem and are making sure that works really well. So we're operating really close to the metal and close to customers, and I think that's what helps us out-execute bigger, more well-funded teams.
>> So the team was really small through 2021, just a handful of people.
>> A handful of people. Yeah.
>> And then in 2021, you kind of entered this hypergrowth mode.
>> Yeah.
>> Raised a bunch of capital, went into this hypergrowth mode. What's the team like today? And what was that journey like, going from basically not growing to hypergrowth all of a sudden? Were there some lessons learned along the way?
>> Yeah, totally. I think we raised a bunch of capital and then we started hyperscaling and growing the team and building the team really aggressively. And probably one of the bigger mistakes we made at that time was just not having clear enough conviction over where we want to invest versus where we are just exploring. So for example, now I try to really avoid hiring someone to explore an area, versus: let's go explore it with our existing team, and if it's working well, then we can go invest in that area and hire more people into it. But I think a lot of companies will follow the traditional playbook of, okay, this company, which is similar to mine, did X and Y and Z, therefore I should do X and Y and Z, and I should go build out a 50-person SDR team, and I need this and that, as if it's a franchise business. But I think every company's different. And so knowing what to invest in, what teams to build, especially in an emerging market like the one we're in, is difficult. Now we're more conservative about, hey, where do we really want to invest versus where is this just an area we want to explore? And if it's exploration, let's do that with the existing team. And then also just really strongly filtering for non-negotiables and culture fit.
I think every role has different non-negotiables. But then you start talking to people and you're like, "Oh, I like this person," and we're vibing, and so, yeah, let's hire them. But then you realize, oh, this thing was really important in the role and I should have been firmer on that non-negotiable. So now we're very clear: all right, for this role, these are the non-negotiables; for this role, these are the non-negotiables. For me as a founder, one thing that's really important when hiring is: why do you want to work at a voice AI company? Like, why here? Not just an AI company, not just a VC-backed company, not just a company that's growing fast, but what about our product, our space, our market, our customers are you passionate about?
Because I think that's a really important ingredient, right? There are challenging problems here, and at any company there are a lot of challenging problems, but what I care a lot about is finding people that are really excited about our product and what our customers are building, and who will be a lot more obsessed and passionate as a result. So that's something I'm also a lot more strict on when we're hiring people now. So there are all these things that I've learned, and I think that was probably one of the harder lessons to learn. But now we're about 80 people at the company. Super dense team.
>> Only 80 people. That's actually a lot smaller than I would have expected, given the 700 million hours of audio that you're processing every year.
>> Yeah. Super small, dense team. I mean, we're using AI everywhere across the company to move as fast as possible. And one thing we talk about at the company is we're going to optimize for speed and innovation. That's an explicit decision. So we don't do OKR cascades and we don't do planning exercises. We have clear metrics that we want to hit, and those are very transparent across the whole company, and then we're all just running towards those, and we can move quickly and innovate and be dynamic and responsive because we're this small, transparent team. And I think my goal is having as little operational overhead as possible at all times. Like teetering just on the line of complete chaos. I want to be right there.
Because we've oscillated, right? It feels good to have, here's every team's OKRs and here's every team's six-month roadmap and here's every team's strategy doc.
>> Like Microsoft style.
>> Yeah. Everything is planned and organized, and that feels good. But that's not a replacement for: you actually have to go build the thing and do what's in that document. And then what if two months in you get new information that you should adapt to? If you're committed to some plan, you're like, oh, we'll do that next quarter. It's like, well, we should be doing that right now if that's the most important thing. And so we really now operate the company as small as we can, as lean as we can, because I think that gives us the
ability to innovate and move quickly.
And that's really what it is in service of. So now we are...
>> You hiring?
>> Yeah, we have about 10 open roles at the company right now, across forward deployed engineers, software engineers, research engineers; on our marketing team we have open roles. So across pretty much all departments we have open roles, but we really look for people that want to work at a company that operates how we operate and are excited about our product and our space. And that's not for everyone, and that's totally fine, but for the people that want to operate in an environment like the one we're building, it's amazing, and we want to go find those people.
>> Okay, this is random but I'm curious.
Are you guys living in the future that you yourselves are creating? Like does
everybody talk to their computers all day long instead of typing? Do you
record all your meetings and transcribe all of them? Do your coffee makers talk?
>> So yeah, we are. I mean, pretty much every meeting we have an AI notetaker in there. And it's not just to take notes, right? We're building this knowledge base that's public at the company, around customer feedback, sales conversations. I could go in right now and ask this knowledge base, what should our product roadmap be based on customer feedback?
>> And this has access to all the transcripts of all the meetings.
>> And it's going to tell you something that's like, hey, this is the number one thing, this is the number two thing. And everyone at the company has access to that information, which is amazing, because now if you're an engineer on this product, there aren't layers between the customer and you. You actually have this fire hose of customer feedback: everything from what people are saying on Reddit and X, to customer conversations that we're having in meetings, to support tickets we're getting. All of that is brought into the company.
>> There's all this truth out there, right, about your product and feedback and the market. And I think it's easy to put a subjective point of view on that, right? And usually it's the people in charge that get to decide what the subjective point of view is. But if you have all that truth just publicly available, everyone is looking at this information objectively and there is no subjective layer. And I think that's been so powerful for us as a company. I think that's how most companies should operate, and the companies that are successful, I think, will operate in the same way.
>> Yeah. I mean, it's just essential. The companies that operate without AI augmentation of human intelligence will lose to the ones that have augmented theirs.
>> Totally. Yeah.