Matthew Karas: Babelbit, Low-Latency Speech Translation, Bittensor Subnet 59, LLMs, GANs | Ep. 68
By Ventura Labs
Summary
Topics Covered
- LLMs Lack Translation Subsystems
- Predictive Latency Conquers Translation
- Bittensor Democratizes AI Innovation
- Single-Stage Multimodal Convergence
- Mistakes Train AI Like Humans
Full Transcript
And one of the things I learned was that an LLM doesn't have a translation subsystem in the way, say, it does have subsystems for things like coding. So if you say to an LLM, here's some Python, turn it into TypeScript, it does that with subsystems that analyze the logic and break the task down in some quite systematic ways. But translating, say, from French to English, it's just because it knows the same things in French and English and knows how to say them in French and English. Its ability to translate human languages is just a facet of the fact that it's a multilingual language model. And that's true with human beings as well. You know, if somebody just said, here is the final-year maths book, learn how to do all these things, you wouldn't learn that well. But if you spend years learning all those things and being allowed to make mistakes, you develop real skills and real intuitions. And, you know, I don't know what I think about the idea of whether artificial intelligence is like human intelligence, but it has a lot in common with it. You know, when I read about GANs I thought, this doesn't make any sense, and it took me, Josh and I worked for a year just making audio GANs, and I got to understand it, but it's very hard in intuitive terms. That's probably the best I could do: being allowed to make mistakes trains better, and that's true with human beings as well.
Welcome to the Ventura Labs podcast.
Today we're joined by Matthew, the founder of Babelbit, Subnet 59, with expertise in linguistic philosophy, computational linguistics, and speech AI, from Cambridge studies to BBC News Online and audio startups. He shares his path to low-latency translation. Matthew discusses LLMs and audio, predictive latency solutions, Bittensor's miner-driven innovation, community support, and linking product value to tokenomics. Now, back to Matthew with the 68th Ventura Labs podcast.
>> All right, Matthew of Babelbit, Subnet 59. How you doing today, man?
>> Yeah, great. And it's great to be on your Ventura Labs podcast. Yeah, fantastic.
>> Absolutely. Glad to have you on. And before we recorded, you were going into how the name came about. If you could just continue that story of where Babelbit originated from.
>> So it has two references, which obviously are connected. There's a story in the Bible about the Tower of Babel. Now, interestingly, that's not really a pro-translation story, because it's all about God intervening as people are building this tower to get to heaven. He makes them all speak different languages, so they can't understand each other, so they can't do it. So we're maybe going against God's will by having automatic translation. But in The Hitchhiker's Guide to the Galaxy, there's this notion of the Babel fish, which is this little fish that you put in your ear, and once it's in your ear, you hear everyone speaking your language. And that is the ultimate latency-free automated speech-to-speech translation, and that's what we're going for. So we want to overtake all the competition by being like the Babel fish.
>> Interesting. Okay. Babel fish. I'll remember that.
>> What is your personal background? Why are you in the speech industry?
>> So, I did a degree in philosophy in the early '90s, and I was quite interested in linguistic philosophy and logic, a lot of stuff that, interestingly, is really coming into its own now with LLMs. Things like Wittgenstein's theory of language being very different from his predecessors'. I then went on to study computational linguistics at Cambridge, and they had an interesting course, because these days it would be a very obvious thing to do speech and language in one course. In those days, though, speech geeks were very much into signal processing, very much in the audio processing camp, and the language people were doing what I guess we'd call more grammatical work, you know, deciding which word is a verb and which is an adjective, whereas everything has since moved towards statistical models. But that course was fantastic for me, because I came out of there, having gone in thinking I'd be interested in the language side, far more interested in the speech side, because I felt even the language stuff could be done statistically, and most people didn't think so in those days, but the speech stuff absolutely was a sort of mathematical thing. And the surprise I had when I started was that the big problems had been solved. It was already possible to do accurate real-time speech-to-text, you know, 95% accurately, with a big computer.
So, yeah, I've had a weird career since then. That's what got me interested in it. But I graduated from there in '95, and there were just loads and loads of jobs in the emerging web industry and not many jobs in speech and language labs. So I went straight into a career building massively scalable systems. Within a couple of years I joined BBC News and led the launch of BBC News Online, which was much more heavily automated than anyone else's systems at the time. And I had a real knack for, I guess, building systems where the users just do what they do. So if you're a journalist, you just write. If you're a picture editor, you just edit pictures. And that enabled them to scale massively. So when we launched, we were publishing 300 stories a day with a staff of less than a hundred, while CNN, probably the next biggest competitor, was publishing 50 to 70 stories a day with a staff of several hundred.
So this was really exciting stuff, and I got very interested in it. But then I started applying my academic work to those kinds of problems. So when I left there, and that's when the real story of Babelbit starts, even though it's 27 years ago or something, 25 years ago at least, the first thing I did was build a system which took in archived audio and video, indexed it, and made it searchable using the speech track. So basically you could search the words, every word would have a time code, and it would jump to it. And we built systems where you could do things like copy and paste the text and it would edit the video, that sort of stuff.
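To make that concrete, here is a minimal sketch, assuming a toy data model of my own (the `Word` class, `build_index`, and `search` below are illustrative, not the system that was actually built), of the word-level time-code index being described: every recognized word keeps its time code, so a text search can jump straight to the right point in the recording.

```python
# Minimal sketch (not the original system): index every recognized word with
# its time code so a text search can jump straight to that point in the media.
from dataclasses import dataclass

@dataclass
class Word:
    text: str     # recognized word from the speech track
    start: float  # time code in seconds

def build_index(transcript: list[Word]) -> dict[str, list[float]]:
    """Map each word to the time codes at which it was spoken."""
    index: dict[str, list[float]] = {}
    for w in transcript:
        index.setdefault(w.text.lower(), []).append(w.start)
    return index

def search(index: dict[str, list[float]], query: str) -> list[float]:
    """Return the time codes to jump to for a query word."""
    return index.get(query.lower(), [])

# Tiny example: a recognized transcript with word-level time codes.
transcript = [Word("breaking", 0.0), Word("news", 0.4), Word("from", 0.7), Word("cambridge", 0.9)]
print(search(build_index(transcript), "news"))  # -> [0.4]
```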
And I got close to Mike Lynch, the founder of Autonomy, and he then staked me in a startup called Dremedia, which was not to do with Dr. Dre, though we did use some of his music at our stands, but to do with Autonomy's core engine, which was called the DRE, the Dynamic Reasoning Engine. And that was, I guess, my first taste of real practical AI. At Cambridge we'd implemented neural nets to do speech, and hidden Markov models and some of these things, but this was someone who was using, I guess, a less sophisticated technique to great practical use. It was a Bayesian kind of thing; it worked a bit more like, say, Elasticsearch, something like that, where you can compare different blocks of text, and I started introducing speech and video to that. So Mike and I had an interesting relationship. I didn't stay there that long, but we stayed in touch, and famously he was killed when his boat capsized last year. And just at that time, the origins of Babelbit were starting. So, five years ago, and this is a long anecdote now, five years ago I came up with this idea for doing a different kind of trick with speech. It wasn't to do with latency.
It was to do with eliminating noise in a novel way. Lots of people had started using GANs to transform images and transform speech. And I thought, well, if you're trying to morph noisy speech into clean speech with a network, you're missing a trick, because there'll be a limit to how much you can actually remove. And this is what people were doing: they were training a network to say, here's a bunch of artificially noised speech, here's the clean speech it came from, let's map one to the other and see if the network can fill in the gaps and remove the noise. I thought, since my background was in speech recognition, what if the network is recognizing the speech and then resynthesizing it to be completely clean? And Mike didn't really want to invest in designing a new GAN architecture, so I left, and Josh, who I was working with at the time, stayed, and three years later they tried it my way. There was a limit to how much they could do with the original morphing kind of approach, and by then there was already another GAN that did something similar, so they didn't have to design it from scratch and hire a whole load of people. And then they started getting this pristine clean speech, and they were working on accuracy and latency problems.
And this is where the story really starts, because last August I was on the phone to Josh and he explained to me that they'd got the latency down to a level that I thought was never possible. Because I thought, you've got this process where you're recognizing all the speech, you're also extracting all the features of how you speak, your accent and your tone of voice, and then you've got to resynthesize that without the noise. And I figured that the latency at best would be between a quarter of a second and half a second. So 250 milliseconds, I thought, was the absolute optimum you could do if you optimized the hell out of the network. He got it down to a theoretical minimum of a tenth of that, like 25 milliseconds. In practice, in their first week of testing this new technique, they got it down to 50 milliseconds. And that is incredible. It's just unbelievable. I'm not a mathematician, and Josh is far better at maths than me. He explained it to me about four times before the penny dropped. And when the penny dropped, I thought, "Oh, you're guessing what comes next." And that's what led to Babelbit.
I thought, well, if you can do that with a signal and just guess the next part of the waveform, what if you can guess the next words? And then I started thinking back to my linguistics lectures, and two, I guess, opposing stories from my psycholinguistics lectures struck me. One is that you can't translate some languages as they go along, because you have to wait till the end of a clause, or the end of a sentence in some cases, to understand the sentence at all. So you could have a sentence in German where you don't really know what the sentence is about until you hear the verb, and that's at the end. And there's this famous story that all linguistics students hear, and I have no idea whether it's true. There's an American journalist, in the '20s, or maybe before, in the First World War. She's listening to Bismarck give a speech, she's got her interpreter there, and he stops speaking, and she says, "What's he saying? What's he saying?" And the interpreter says, "Shh, I'm waiting for the verb." So there's this story. This predates machine translation. This is just a problem for interpreters: you can't understand all German sentences as they go along. You can't create the English equivalent until you've heard the end, because we put the verb quite early. So that's one story. The other story was from psycholinguistics: the experiments tell us that a native German speaker, listening to something like that, knows what the verb is ninety-nine times out of a hundred; they already know.
So I figured, well, what is an LLM but a predictor for the future words that are coming out of a stream of words? So I figured that if a human being can do it, an LLM can do it. And actually, I remembered there was a variety act on this evening TV show. There's a BBC DJ called Terry Wogan, who's dead now, but he had this evening TV show for years with all kinds of celebrities and sort of crazy variety acts. And one of the acts could speak along with whoever was speaking to him and say what they were saying. Basically he was guessing what they were saying and then catching up with them, and it sounded like he was saying whatever you say, as you say it. And he just couldn't be outsmarted: whatever Wogan did to try to say something unpredictable, this guy got it, probably by watching his mouth, by just guessing the sorts of things people do. And I figured, well, if he can do it, an LLM can do it. And then I started thinking, well, that is the way to conquer the latency problem: if we are guessing when we can guess, then we can get huge latency gains. Now, some things you can't guess. If I said, here's a list of all the people I've invited to my party next week, that's just a random list of people. But besides that, we can guess a long way. So if I start to say, "Our Father, who art in heaven," you know the next minute. And the interesting thing with translation is that it's even easier than the variety act guy, because you only have to guess the meaning. So if I started this podcast by saying, "Hi Grant, thanks for inviting me on your podcast," or I said, "Hi to Ventura Labs, and welcome to all of your audience," and you translated the wrong one, it wouldn't really matter. It's an opening phrase; we want to move on. And so my idea is that when you can guess confidently, you guess. When it's something that the context tells you is not all that important to get exactly right, you can guess. And sometimes you have to hold back and listen. And the art of this, and what we're starting to develop, is that inner process of confidence that tells us when it's right to guess, and get the latency gain and move a few words forward, and when it's not right to guess, where you have to be exactly accurate and translate only the words that are spoken. And that is the difficult thing.
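As a rough illustration of that guess-when-confident idea, here is a minimal sketch using an off-the-shelf causal language model (GPT-2 via the Hugging Face transformers library, chosen purely because it is small and public); the threshold value and the model choice are my assumptions for the example, not Babelbit's actual setup.

```python
# Sketch: extend a partial utterance only while the model is confident about
# the next token; otherwise hold back and wait for more input.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def confident_continuation(prefix: str, threshold: float = 0.6, max_new: int = 10) -> str:
    """Greedily guess ahead, stopping as soon as confidence drops below threshold."""
    ids = tok(prefix, return_tensors="pt").input_ids
    guessed = []
    for _ in range(max_new):
        with torch.no_grad():
            probs = torch.softmax(model(ids).logits[0, -1], dim=-1)
        p, next_id = probs.max(dim=-1)
        if p.item() < threshold:              # not confident enough: stop guessing
            break
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
        guessed.append(int(next_id))
    return tok.decode(guessed)

print(confident_continuation("Our Father, who art in"))  # very likely " heaven"
```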
And that's why, rather than just designing a system with all the state-of-the-art networks that exist, we need months of iterative training that only a process like a Bittensor mining process can really give us. Either you're Google or OpenAI and you put 200 people on it, or you find a way of incentivizing people that you don't have to pay. So this was a really weird story, because I came up with that idea a year, well, I guess ten months, before I even heard of Bittensor. Then I was doing some stuff with LLMs in my job, which is why I was so interested in applying them to language. And my boss was fired; he agreed with me about innovation, and his replacement didn't really, so I left there in March and started absolutely throwing myself into LLMs, really for coding.
You know, I was thinking, well, I've got a lot of ideas, I can't afford a team, I'll just start building stuff with LLMs. And I created a couple of apps, nothing to do with this, but releasable, high-quality software, and I didn't write a single line of code. So I was a real LLM junkie at that stage, and I started posting about it on LinkedIn. And then an old friend, going right back to when I worked with Mike Lynch, a guy called Nigel Grant, who was our main reseller of that speech software and the vision software as well, got in touch and said, hey, you've got to come and join my company, Score. So this was the last week of May this year, and I'd never heard of it. I joined the company the first week of June. I was at the Proof of Talk conference in Paris, manning the desk of their stand, trying to pretend to people that I understood what I was talking about. And I met Barry, I met Will from Macrocosmos, I met Moog. And so this was just right in the center of this new thing, trying to get my head around it,
trying to get my head around score because they hired me as uh I guess head of innovation. So they had a long road
of innovation. So they had a long road map of of great things. I've known uh Nigel for 20 something years. I also
used to know Tim his CTO back then when Nigel worked with a company that he worked for. So Tim is great. He's he's
worked for. So Tim is great. He's he's
down in Australia and we didn't get enough time together because of the time zone stuff. So instead of building my
zone stuff. So instead of building my innovations on his needs and his road map, uh I was just thinking of great ideas for new subnets, great ideas for new mining challenges, and they were
mostly to do with my own pet projects and my speech stuff. And so Max and Nigel said, well, look, uh instead of you, he said that actually Tim doesn't need that much help with with this stuff
anyway, and he's a long way away, so why don't we help you start your own subnet?
And this they've just been amazing. So
they are now contracted as advisers to uh Tim, Max and Nigel, advisers to Babelbit. And um uh it was like this
And it was like this weird thing of two coincidences: there was this thing that had happened with Josh and Mike a year before, and I suddenly had the bandwidth and the ability to get machine learning experts working on it without having to get venture capital and hire a team and all of that stuff. And it all happened in a matter of weeks. And then two other things happened. One, I went and did my desk research, and I thought, well, someone else must have come up with this idea, and that's usually a good thing. I don't really think, you know, that you could patent that idea that easily. And two guys that taught me at Cambridge were working on similar things.
So, one guy I've been friends with for a long while, because he also had a company that Mike invested in and we built products together in the early 2000s, is Tony Robinson. Absolutely extraordinary guy. He was the first person I know of to use neural nets for speech, for large-vocabulary connected speech. So in 1987, when he was still a student, he built the first neural-net connected speech recognition system, and then by the mid-'90s, still very early, he had a pure neural net system that competed very well with the traditional hidden Markov model systems, and it ran in much less memory, much less compute. So Tony was doing something on the side of his company, Speechmatics. Very interesting, just a project, thinking exactly the same thing: can LLMs be used for speech? He wasn't specifically thinking about looking ahead in the same way I was, but he had moved, for the first time in his career, really, from speech to language, to thinking about the words rather than about translating sound into text.
Another colleague of his, who also taught me, and who I didn't remain friends with, in fact I don't remember liking him very much, and I don't think he liked me very much, is a guy called Phil Woodland, a very eminent Cambridge professor, and I guess now probably a very senior professor. He published a paper in April this year, so this is how cutting-edge it is. He was building a system called SimulS2S, so simultaneous speech-to-speech; simultaneous translation just really means real-time translation, that you're translating as you go. It was called SimulS2S-LLM, and he was basically using LLMs. And then Tony told me about another project, a French one, this company called Kyutai, who have a system called Hibiki, and that's probably the closest to our idea: it's a multimodal LLM that can kind of do everything. You can talk to it, it can generate audio, and it can do the translation. And the question is, do you even have to have a multi-stage process? You can actually have an LLM which is a model that understands audio natively, can generate audio natively, and can translate natively.
And that's one of the weirdest things about LLMs. As I was learning more and more about them, I did a little course where you build your own LLM just from the works of Shakespeare and stuff like that, just to understand better how they work. And one of the things I learned was that an LLM doesn't have a translation subsystem in the way, say, it does have subsystems for things like coding. So if you say to an LLM, here's some Python, turn it into TypeScript, it does that with subsystems that analyze the logic and break the task down in some quite systematic ways. But translating, say, from French to English, it's just because it knows the same things in French and English and knows how to say them in French and English. Its ability to translate human languages is just a facet of the fact that it's a multilingual language model; there's no restriction on what language gets indexed, so it's all indexed. So it's got, whatever that original statistic from OpenAI is, something like 40 terabytes of text from the internet.
So that fascinated me, and anyway, I started thinking about how to structure this subnet, what our miners would be doing. And then I got a call, another call from Josh. The last call about his work was the one telling me about this amazing latency gain, and then we were going to actually go for a pint with Mike, but he died in the weekend in between. So Mike never got to find out that his staff had created this amazing thing. And just like me with my boss being fired, without Mike there, there was no one in his investment company that was that interested in this innovation and in pushing Josh's ideas. So he said, "Do you know anything interesting that's going on?" I said, "Well, you could come and help me with this. You know, it's something I could do by myself, because the miners will do most of the work, but I need to set up the miners with some starting points to work from." So I created the first one, which we're working on now. So the miners at the moment are just doing prediction. At the moment we're saying, well, what is the unique thing about our project for speech? It's that we are guessing, or predicting, what's coming next. So I want to prove that that is possible. So at the moment our script, the one that the miners are improving very rapidly, is one where the words come in one by one for each utterance of a dialogue, and at every word that comes in, the engine tries to guess the whole utterance. And that's quite an interesting task, because there comes a point in that cycle, say there's a sentence of ten words, maybe four or five words in, where on average it's doing quite a good job of guessing the whole thing. So that gives us a measure of how much latency we can save. Now, there's an interesting problem here: does the script know when it's doing well? At the moment it doesn't. At the moment, we're just trying to make it good at predicting.
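To pin down what that measurement looks like, here is a toy sketch of the scoring idea: feed the words of an utterance in one at a time, ask the engine for its guess of the whole utterance at each step, and record how early the guess becomes correct. The function names and the exact-match criterion are illustrative assumptions, not the subnet's actual validator code.

```python
# Toy scoring sketch: how many words does the engine need before it can
# correctly guess the whole utterance, and what fraction of waiting does that save?
from typing import Callable

def words_needed(utterance: str, predict: Callable[[str], str]) -> int:
    """Return how many words had to arrive before `predict` produced the
    full utterance (or the full length if it never did)."""
    words = utterance.split()
    for n in range(1, len(words) + 1):
        if predict(" ".join(words[:n])).strip() == utterance:
            return n
    return len(words)

def latency_saving(utterance: str, predict: Callable[[str], str]) -> float:
    """Fraction of the utterance the engine did not have to wait for."""
    return 1.0 - words_needed(utterance, predict) / len(utterance.split())

# Example with a trivially 'omniscient' predictor for one known sentence.
known = "our father who art in heaven"
predictor = lambda prefix: known if known.startswith(prefix) else prefix
print(latency_saving(known, predictor))  # ~0.83: guessed correctly after one word
```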
The next challenge, which is what Josh is working on now, is going to be a translation challenge. And it's also going to have to work out when to predict and when not. Probably the easiest way to imagine this is that we're going to keep it in text; we're not going to do anything with audio for the moment. Imagine subtitles being generated in real time, like YouTube does pretty well, and there are some systems that do it, though mostly they're not brilliant. Now imagine that coming out in any language, but maybe a second or two later in the other languages. That's going to be our next challenge. So it'll be a genuinely useful piece of tech. It'll be taking streaming words in and putting streaming words out in another language. And the interesting thing is that we'll have to have a first attempt, because the miners will do the rest, at having that confidence quotient, that thing of saying, "Yeah, I'm really confident in this prediction," or "I'm not confident in this prediction, give me another word." And that balance between when to predict and when not, on a continuous stream where the words just keep coming, is going to be our next challenge.
Once Josh has built the prototype for that, we'll put it out to the miners and say, "Okay, optimize the hell out of this." And then we'll have our first translation engine, which we will commercialize. And then we'll go back to Josh's work from his previous life, when he was dealing exclusively with audio, and start working on the audio in and audio out and how we can do that. So that's the story, and it's got something for everyone, including the drama of the tragedy of Mike and his daughter. You know, last year I had to go to their funeral. It was unbelievably horrible. The week before, the same week that Josh called me with this latency gain, Mike and I were celebrating. On the Thursday before he died, we'd been congratulating each other on being great parents, because his daughter Hannah had just got a place to study English at Oxford ("Oxford", spit, as Mike would say, because he's a Cambridge guy), and my son had got a place at Cambridge to study English as well. So we were both kind of bemused. Neither of us was any good at English at school. We're not literary types.
Neither of our children was particularly techy. But there you go. And we were just thinking, "Isn't this wonderful?" And then Mike said, "Yeah, let's go for a pint next Tuesday," and I said, "Oh, let's get Josh along. He's got something interesting to tell you." And then, yeah, his boat sank in the sea. Just an incredibly weird time. But anyway, we've all had to move on. I'm still in touch with his widow. I wondered whether she'd be interested in carrying on sponsoring Josh's work, and maybe I could have got involved in
that, but I think that she wants to get away from all of that. So, yeah.
>> Well, sorry to hear about the loss of your friend.
>> Yeah. No, it was unbelievable. I worked with him on and off for 28 years, you know. It was just an incredibly weird thing. But he was the first person I knew personally, obviously there are plenty in Silicon Valley and elsewhere, who took something he'd done in his PhD and created a commercial product out of it. And Autonomy was at one point the highest market cap of any software company in Europe. It obviously had that big scandal when they sold to HP, but, you know, he was a harsh businessman, and who knows what to believe in those sorts of situations. The point was, it was his real ingenuity, his own intellectual property, that was created, and it fitted right in with mine. So we always got on really well at the tech level. And it's something the world needs more of: entrepreneurs who really understand tech. I guess in America you have quite a few of them; we don't have very many over here. And we also don't have the same venture capital structure here, so even if there are some, at some point they tend to sell their company to an American or Japanese company. So things like ARM: ARM is another great Cambridge success story, there are more ARM chips than almost anything else in the world, but it's now a Japanese company. And so I think that's another thing that makes Bittensor just make sense to me.
I don't need venture capitalists now. You know, I can create a world-class product. And my model at the moment, well, there are so many great subnets, and we were talking earlier about Macrocosmos. Subnets are going to be part of our economy here, because we're going to be buying our compute from Chutes, which we already are. We're going to be buying our data from Subnet 13, with their new product (is it called Gemini, I think?), which is the data product. It's not the subnet name, but they've got a product that sits on Subnet 13 which will basically allow us to say we want to scrape the dialogue from a bunch of German podcasts and we need, you know, 5 GB of data a week, and it's just self-serve. And this is something that Tony and I and all the other speech researchers would have killed for 20 years ago. I mean, back in the day everyone used the same data set. It was a 75-hour corpus of the same person reading the Wall Street Journal, I think, and that's what everyone used. And they liked the fact that everyone used the same one, because it meant that if I was comparing my algorithm with your algorithm, it was a fair comparison. But it wasn't really a good way of training neural nets, not unless everyone speaks the same as that one guy. So, yeah, this is just an amazing time.
It's not just that our miners come through the Bittensor network; our suppliers come through it as well. It's just extraordinary. And then, you know, meeting people like yourself, where you get to learn that it is a really creative, supportive community where every component in it is trying to support every other component. It really is unbelievable. I mean, I described it to Tony, just catching up, telling him what I was doing, asking him about his experience of LLMs and how much that had changed his tech. And when he got what I was saying about Bittensor, and I must say he's very, very sharp, it took him a lot less time than it took me (I think I'd been working for Score for a month before I believed it was true), he was just grinning for the rest of the conversation. We were in a call like this one, and he was just grinning, thinking how many things he could train without having to hire 20 more people. I've never seen him quite so ecstatic about a tech idea. I wish I could say I'd invented it; he was seriously impressed. And it wouldn't surprise me if he forms a subnet for his next project. I think once the cap is taken off, once there are thousands rather than hundreds of subnets, it'll just be the obvious thing for someone like him, whose brain is like a conveyor belt of great ideas. If each one of those can go off and be trained, and he doesn't have to build a business around each one and then move on, he can just do one, then another, then another.
You know, you can have a team of miners that switch from one challenge to another. One time they're training a specific translation engine for the Mongolian security service, and the next time they're doing something for conversations in the oil industry, or whatever. And he does a lot of work with bespoke speech recognition. That's one of the big areas for us as well: bespoke language models. Can we predict better if we've got a corpus of data which is relevant to the conversations? So say we're selling to people who are using it in the legal industry, then there's all kinds of things there. The way translations work in law, they have to be extremely accurate. But the way conversations go, they have to keep up. In an argument, in a dialogue, the need for good-quality, fast translation is huge. But we'd never get that right by training a model on podcasts. If the legal industry en masse, or a large company, could provide us with material to train on, and we could convince them that we had a secure environment where the confidential information wouldn't leak, then you can see a way of tying this technology into different communities.
And I've experienced that in the pure speech tech world as well. Actually, another company that Josh and I were involved in, we were doing speaker analysis. So if you've got recordings of speakers, you can tell where the boundaries are, where one speaker stops and the next speaker starts, and characterize the speakers by whether they've got a different accent, or whatever. And we found that our security service in the UK, MI5 as it's called, had a lot of wiretaps,
and they had trained their own model. Actually, they'd paid some consultants to train a version of Kaldi, which I wouldn't have chosen, because it's a hard one to update: you have to retrain the whole model. Whereas if they'd used even older technology that Tony developed, you could incrementally add new vocabulary to it, and stuff like that. But at the same time, Kaldi was free. And they were training it for a language that they call London Asian slang, which is a particular variant of English that mostly young Asian guys speak.
I guess women do as well, but I think it was mostly guys. And they had a problem that these guys, when they're speaking, suddenly change language halfway through. They'll suddenly start speaking a bit of Urdu or a bit of Arabic, and they couldn't detect those boundaries fast enough to switch to the new model. So I did ask why they didn't have all the models running at once, but anyway, they probably thought they didn't have the money to do that. But we came up with a very, very quick way of doing that for them. So that's the thing: Siri or Google Translate is not going to be good enough when you've got a particular domain of discourse where the vocabulary isn't even known to those systems. So it's an exciting world we're going into. And, yeah, I think I've been talking non-stop. What else do you want to know?
>> No, yeah, it's an interview, so that's perfect. You've worked directly with MI6 producing speech?
>> Well, it wasn't directly. It was through one of the big integrators. They work with all the big integrators, like CGI, which used to be Logica, and Accenture, all those sorts of people, and they usually have a team that is security-cleared. So at Autonomy we had resellers who were security-cleared. And even before that, I sold some technology to do something very similar to the NSA, actually. That was specifically for accents and speaker boundaries. So they have, I don't know, half a million hours of wiretaps, and they can't listen to all of that, and this is going back 17, 18 years maybe. So they wanted to be able to do a sort of triage search on this huge archive. They might say, "Okay, I'm looking for a conversation where someone is speaking English with a Birmingham accent to someone speaking English with a Jordanian accent."
And even better, that they're talking about Semtex or AK-47s, or whatever it is they're talking about. And we sold them that. But the interesting thing is, I suspect they never used it. They paid full price for it, and they didn't argue. It wasn't the usual government tender.
But they invited me to a meeting, and it was the weirdest meeting I've ever been to. So, one, they weren't the black-suit-wearing type. It was a man and a woman, and they looked like 1970s postgrad students: long hair, jeans, t-shirts with band names on, that kind of stuff. And they gave me business cards that didn't have the organization name and didn't have their surnames. It was just an email address, just Simon and Rebecca, and that was it, no other details. And they must have got me, over a period of two to three hours, to whiteboard up every aspect of the algorithm that did this. And my guess is it wasn't just that they were interested; it was because they were going to go and build their own, because they weren't going to run our code, since it might have back doors in it or might not meet their needs exactly. So they were happy to pay us, but they wanted so many details that I thought, well, unless you're building something, why would you need to know all this? But they really wanted to know. So, yeah, that was fascinating, and I had to fly to one of those places outside DC, you know, between DC and Langley, where there's a whole load of little companies that supply those agencies.
But it's interesting, because I got a call once from a UK-based Iranian broadcaster, and they wanted me to create a sort of searchable archive system for the Farsi language. It's a very fragmented language, lots of regional dialects, even a few villages might speak very differently from the next few, but we had a pretty good solution. And they said nobody makes a Farsi speech recognition system, and what they really meant was that Nuance didn't make one and Microsoft didn't make one, because obviously the intelligence services are looking at Farsi and Arabic and all the Middle Eastern languages. So I just thought this was hilarious. They didn't realize that half the researchers I know are working on Middle Eastern languages, and I said, you know, don't worry, we can do this for you. And interestingly, we created a great search engine for this news archive, but in the end their lawyers wouldn't let them do it, because they weren't using it for their own content. They basically had a hub in the UK which received all the news broadcasts from Iran, from all the different state channels and all the other commercial channels, and they basically wanted to be able to have something like a Reuters wire feed, but for video news. And so, because they were searching other people's content, their lawyers eventually advised them they weren't allowed to store that stuff, and the spotlight was on them for various reasons; they were, you know, exiled people, and the Iranian government wasn't happy. So, yeah, it didn't go ahead in the end, but it was pretty cool.
>> When doing these translations, how difficult are the accents, the slang, the different dialects?
>> Well, that's something that Josh and I both made it our business to become experts on: accents. In, I guess, the mid-2010s, so 2014 to 2019, we were working on a tool for training people to speak in a more intelligible way. Now, that could be getting rid of their accent or softening it, making it more understandable. And it is very difficult, but because accents are systematic, really the difficulty is a function of how much data you have. So whether it's an accent or a speech impediment, like a lisp or something like that, they're very easy to do, because they're systematic. It just comes out in the data how someone with a lisp says "speaker", or whatever. And I used to do this demo of our software, because unlike the other educational software at the time, it worked on continuous speech, so you just read: you could read your favorite book, you could read your own conference speech. So say you're giving a speech in London and you're worried about your English accent, it will tell you what you're doing wrong. So I used to imitate certain very specific problems. One really common problem across lots of different languages is that speakers can't say a short "i", because it doesn't exist in their language, so something like "hit" comes out as "heat". And then replacing a voiced "th" with a "v" or a "z". So I would speak like that, and I'd show, for the demo, that it would highlight the sounds I was getting wrong, and then I'd speak normally and the score would change back to the right score for the accent. And actually, that was pretty easy once you understand the guts of how speech recognition works, because it's already breaking things down into the sounds. If the accent is effectively morphing specific sounds, it's fine. There are other problems where sounds are clustered together, because the mouth is in a different position, the tongue's in a different position, across a whole language. But we still kept it to what's called, more or less, a substitution error: instead of making sound A, you make sound B.
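To make the substitution-error idea concrete, here is a simplified sketch: compare the phonemes a speaker was expected to produce with what the recognizer heard, and tally the systematic swaps. Real systems align at the acoustic level and use proper phone sets; this toy assumes the two sequences are already aligned one-to-one, and the phoneme labels are only illustrative.

```python
# Toy sketch: tally which target sound is systematically replaced by which
# produced sound, e.g. short 'ih' coming out as 'iy' ("hit" -> "heat"),
# or voiced 'th' ('dh') coming out as 'z'.
from collections import Counter

def substitution_profile(expected: list[str], heard: list[str]) -> Counter:
    """Count (expected, heard) pairs wherever the two sounds differ."""
    return Counter((e, h) for e, h in zip(expected, heard) if e != h)

# 'hit' pronounced as 'heat', 'this' pronounced as 'zis':
expected = ["h", "ih", "t", "dh", "ih", "s"]
heard    = ["h", "iy", "t", "z",  "ih", "s"]
print(substitution_profile(expected, heard))
# Counter({('ih', 'iy'): 1, ('dh', 'z'): 1})  -> sounds to flag for feedback
```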
Now we don't have to do that, because the models are big enough to take care of it all. When Siri first came out, they had seven versions of English, seven different models just for English. And oddly enough, I think you couldn't change it on your iPhone; it wasn't one of the settings. So if you bought an iPhone in India, it would expect you to speak Indian English, English with a strong Indian accent. And they did a great job, but because there were only seven models and there are dozens or hundreds of English accents, a lot of people made fun of them. So you'd get these YouTube clips of someone with a heavy Scottish accent talking to Siri and it making mistakes. And they solved that really quickly, because at that time I was thinking, "Oh, this is great for us, we could sell them the engine, and they could dynamically send the speaker to the right model for their accent." But they created hybrid models that could do multiple accents really soon after. And really, everything changed in the mid-2010s, with the emergence of deep learning for audio and for speech and language; that really changed everything. And then, for language, it changed again with transformers and LLMs. So, just in my career, I've seen these massive paradigm shifts.
And I actually created a little graphic. I think I can probably share my screen and show it to you, because it was slightly humorous. It wasn't entirely supposed to be serious, but it shows how things have converged. Where have I got it? Somewhere... here we go. Let's have a look. Oh, this is the wrong one. Typical. Hold on. We've got a web page. Sorry. You can edit out the gaps, I guess.
>> We'll leave it there. It's all good.
>> Yeah.
>> Is it reasonable to expect Babelbit to have one model that covers all languages, all accents?
>> That's a very interesting question. My guess is no, but not for the reason you think. The language models certainly have all languages in, but there'll be some languages which are a real minority and have less data, in the same way that ChatGPT or Grok is much better at coding in Python than in Elixir, because there's more Python code for it to learn from. So that's one problem. The other is that the confidence part will be intrinsically less confident in languages with a rigid word order that pushes the meaning towards the end of the sentence. So that's an interesting one. And whether it has to know what language it's dealing with to make the right adjustments, I don't yet know. I think that will have to come out of the miners' job, because we'll set them challenges to do German to English and English to German, and maybe they're not the same. Maybe, if they can get way better performance by having a special German-to-English version, that might be the way to go. But my guess is that it would normalize over time anyway, because we will ultimately be building bespoke LLMs. We won't be building on top of DeepSeek or Llama or Mistral. We'll be building our own, because actually there's a lot of stuff in those LLMs we don't need. We don't need to be an encyclopedia; we need to be a dictionary, you know.
So, anyway, here's my little graphic. As I say, it's not supposed to be 100% academically correct, but it's there because I've been in this game since the '90s. I came in round about here, the mid-'90s, and in speech recognition the good systems were HMMs, plus the very earliest neural net systems, which Tony was developing. In synthesis, the STRAIGHT vocoder was a big deal at the time. And translation was done by a bunch of different technologies; I didn't really have much to do with translation then. I did have some lectures on it, but it wasn't the thing that grabbed my imagination. Then, as you move into the 2000s, you get deep learning, and you get these hybrid systems which still use a lot of the HMM-based stuff; you don't see much of that anymore. And there's a bit more convergence, in that you're using RNNs for language and RNNs for recognition. And then the last ten years, it's been crazy, but still very fragmented. I don't know if you ever saw the demo that Google did of WaveNet in 2017, when it came out.
>> I did not catch that demo, no.
>> It was just incredible. It changed everything for everyone in the speech world, and in the wider audio world. So they had this system called Tacotron, which is their text-to-speech system, and they added WaveNet to it, which is a neural net vocoder that made it sound like a human being. It was the first speech synthesis that really sounded human, but just the synthesis part was two stages. Now we're talking about doing recognition, translation, and synthesis in one stage, and that's what Hibiki does, and what other projects are doing. So that's why this is like the punch line on the right here: we can do speech recognition with an LLM. I can talk to ChatGPT and say, please display what I've said, and it will do it. I can translate. So I had a foreign-language letter, in a language I didn't even recognize. I tried using Google Translate on it and the handwriting wasn't legible, whatever. ChatGPT did a great job of it. That's translating handwriting, a different kind of input, and it produced a perfect text output. And then synthesis: it does a great job of that, too. So we are moving to true multimodal technology, and I'm not saying that the synthesis in the ChatGPT app is integral to the LLM; it's almost certainly a separate subsystem. But that's where we're all headed. We're all headed to a world where the model doesn't just understand everything about everything; it also understands it in any mode. It understands language as speech, not language once it's been translated or transcribed from speech. And that's what we're so excited about. And when I showed this to Josh and said, "Is this accurate, do you think?", and said, you know, I'm kind of joking about LLMs, he said, well, not really, we really are moving in that direction, that's where it's going. And I think quite rightly so; this is how the brain works, you know.
And it's very exciting, you know, as a technologist, to still be in the game, having been through all of these things. If you think about building a system like this even five years ago, you'd have to get those seven or eight different technologies all to work harmoniously together. And actually, that's one of Josh's main bits of expertise: collapsing multi-stage processes into a single stage. And that's exactly what we've tasked him with. So, as we move to prediction and then translation in a single shot, we'll be doing audio input into translation as a single shot, and then audio in and audio out. He describes it as a kind of conveyor-belt system: one bit of audio comes in this end and one bit of audio goes out the other end. Or one sample: if you imagine a waveform, it's made up of all the integers that say how big the wave is at any point, and each of those integers comes out at, you know, 16,000 of them a second. And this is why WaveNet was so amazing. WaveNet is literally predicting the next sample, sample by sample. And you can use WaveNet for music as well as speech. So, if you've got the right engine behind it, you could say, improvise piano in the style of Chopin, and it will do it. And it's not doing what you think. It's not saying, do I need a C next because this is in G, or do I need a D, or a C sharp. It's not thinking like that at all. It's literally deciding which integer comes next. And that's the weird thing about generative AI: it works like I speak. I don't know what's coming next; it just keeps coming out, you know. I'm not a German; I don't plan the whole sentence ahead of time.
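For the shape of that sample-by-sample loop, here is a bare-bones sketch: the `next_sample` function below is just a stand-in two-tap linear predictor that approximately continues a sine wave, not WaveNet, but it shows the autoregressive structure of emitting one integer sample at a time from the history so far.

```python
# Sketch of autoregressive audio generation: each new integer sample is
# predicted from the samples generated so far, 16,000 times per second.
import math

RATE = 16000
W = 2 * math.pi * 440 / RATE  # angular step of a 440 Hz tone at 16 kHz

def next_sample(history: list[int]) -> int:
    """Stand-in 'model': a two-tap predictor that continues a sine wave.
    A WaveNet-style network would condition a deep net on a long history."""
    if len(history) < 2:
        return int(3000 * math.sin(W * len(history)))
    return round(2 * math.cos(W) * history[-1] - history[-2])

def generate(seconds: float) -> list[int]:
    samples: list[int] = []
    for _ in range(int(seconds * RATE)):
        samples.append(next_sample(samples))
    return samples

audio = generate(0.01)          # 160 integer samples of 'audio'
print(len(audio), audio[:5])
```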
And the big, big problem, the thing that means I can't have a startup like my accent startup a few years ago with four guys, is that these models are gigantic and they need training. I can't afford to compete with OpenAI with just four guys in a room. And that's what startups used to be able to do, because it was all about having the clever idea. But neural nets are so unwieldy and so unpredictable that you have to have that training effort. So when I first started learning about GANs, I thought they were really weird and spooky because of the language people used about them. If you're a software engineer, you have an idea, you write the code, and it works, or it's buggy and two days later it works. But you'd get people who had just developed a neural net to do something really useful, maybe for medical imaging or something like that, and they'd say, "Oh yeah, I was training this GAN yesterday, but it collapsed." Okay, well, why didn't you know it was going to collapse? Because the data is part of the logic with a neural network. And when you see a neural network do something interesting, like the ones you've probably seen where it morphs an image of a modern street, and it changes all the tarmac to cobbles, and the windows and the roofs and everything change to look like an old street, that is kind of a weird sort of process. And when you see it happen, you think, oh, I'd love to be able to see inside the neural net, because you imagine that what the neural net has done, by iteratively training on the data, is invent an algorithm to do that thing. But it's not really like that; you can't really look inside. I did see that there are some projects, there's an Italian team who were building a kind of visualizer to see what goes on in neural nets, but that kind of instinct doesn't pay off, because even if you know, it's not something you can replicate.
You can't say oh let's take a shortcut and instead of training it for months let's just work out what it should be doing and just prescribe it you know the training is part of it and that's why GANs are so weird you know so GANs have
two networks which train each other. The archetypal one they tell you about when you learn about this is generating photorealistic human images: you start with an image creator which is no better than drawing stickmen, and then you have a discriminator that can hardly tell a stickman from a photograph, and they learn together. They get better and better together. And I thought, well, why can't you start with an excellent discriminator, and that will just train the generator? You're generating images, the discriminator says good or bad, and you train better. But what happens if you do that is that they end up doing things like just creating the same image over and over again; they don't end up with the breadth of experience and knowledge. And that's kind of what's happening in this training. And I thought, well, how can that be true? Why can't an expert discriminator help training? It's something to do with being allowed to make mistakes, because sometimes the generator will produce a synthetic image which the discriminator thinks is a real image, it'll give false feedback to the network, and then it'll go off in the wrong direction and try lots of other things. So it'll end up trying lots more different stuff out and getting far more skilled at generating images by sometimes being allowed to get it wrong, which an excellent pre-trained discriminator wouldn't do.
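As a rough illustration of the two-networks-training-each-other setup he describes, here is a minimal sketch of a GAN training loop. The "real" data is just a 1-D Gaussian rather than photographs, and none of this is Babelbit's audio GAN code; it only shows the alternating generator and discriminator updates.

```python
import torch
import torch.nn as nn

# Toy GAN: a weak generator and a weak discriminator that improve together.
torch.manual_seed(0)
LATENT, BATCH = 8, 64

generator = nn.Sequential(nn.Linear(LATENT, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(BATCH, 1) * 0.5 + 3.0         # "real" samples
    fake = generator(torch.randn(BATCH, LATENT))      # generator's attempt

    # Discriminator learns to tell real from fake...
    d_loss = bce(discriminator(real), torch.ones(BATCH, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(BATCH, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # ...while the generator learns to fool the discriminator.
    g_loss = bce(discriminator(fake), torch.ones(BATCH, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# The generated distribution should drift toward the real mean (~3.0).
print(generator(torch.randn(1000, LATENT)).mean().item())
```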
But, you know, when I read about GANs I thought, this doesn't make any sense. It took a while: Josh and I worked for a year just making audio GANs, and I got to understand it, but it's very hard to put in intuitive terms. That's probably the best I could do: being allowed to make mistakes trains better. And that's true with human beings as well. If somebody just said, here is the final-year maths book, learn how to do all these things, you wouldn't learn that well. If you spend years learning all those things, being allowed to make mistakes, you develop real skills and real intuitions. And I don't know what I think about the idea of whether artificial intelligence is like human intelligence, but it has a lot in common with it. I don't get into those philosophical arguments. I was a philosophy student, and I find most people who do don't know what the hell they're talking about anyway. So I tend to stay in the world of building stuff, making practical steps and trying to make new things. That's the fun part.
>> Of course. In Babelbit, how do you balance allowing the miners to have that ability to make mistakes?
>> That's an interesting one, because at that stage, is it the miners that make the mistakes, or the network? I suppose miners do that anyway. Miners will find any way they can to win: either they have some great intuition because they're experienced, or they just try loads of different things until the scores come out higher. So in our repo we bundle the test scripts, so you can try it out as much as you want. You can get to understand how our validators will be scoring you, and then when you make your submission, hopefully that's based on quite a bit of experience, but there's nothing to stop you submitting every day. We're going to run challenges more frequently; at the moment we're doing about one challenge a day.
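What "scoring yourself locally before you submit" could look like in practice, purely as a hypothetical sketch: the function name, the scoring formula, and the challenge format below are invented for illustration, not Babelbit's actual bundled test scripts or validator logic.

```python
from difflib import SequenceMatcher

def score_translation(candidate: str, reference: str, latency_s: float) -> float:
    """Toy validator-style score: reward similarity to a reference
    translation and penalize latency (both made up for illustration)."""
    similarity = SequenceMatcher(None, candidate.lower(), reference.lower()).ratio()
    latency_penalty = min(latency_s / 2.0, 1.0)   # anything over 2 s scores zero
    return max(similarity - latency_penalty, 0.0)

challenge = {"reference": "the cat sat on the mat"}
my_output = ("the cat sat on a mat", 0.4)         # (translation, seconds taken)

print(round(score_translation(my_output[0], challenge["reference"], my_output[1]), 3))
```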
We've got, I think, about 20 miners on this morning. When it gets into the hundreds, I guess we'll probably do one an hour. But yeah, it's interesting. That's one I must talk to the other people running the same kind of networks about, because obviously there are all sorts of networks for things like storage and compute and whatever.
I see that Score have now moved to an iterative network to improve their process. Their old network used to actually do work for customers, so the incentive mechanism was divided between everyone that did some work: you did more work, you got more money. They're now moving to a winner-takes-all approach as well, and we forked their repo, we didn't write it from scratch. That's how we got live in two weeks from a standing start. And then the other one I've been talking to a lot is Shaq from Ridges, and he's also got this iterative
process. So I think we should definitely start comparing notes on what the mixture is, in those iterations from the miners, of ingenuity versus pure optimization. Some people will just take a bunch of code and say, I know how to make this run better, run faster, whatever. Other people will say, actually, I know how to make it run differently and the results will be better. And actually, the first time I met Shaq, I was talking about my experience of using LLMs to generate code, and he said, oh, well, if you've solved some of those problems you should mine for us and you can earn some money. So that might be something I do: spend my weekends being a Ridges miner. Yeah.
>> What do you view as the end goal of Babelbit? Is it to develop your own consumer product? Is it just to feed an API to others?
>> I think an API is easier. I mean, I've worked on consumer products a lot, major things like, obviously, the BBC News website, but I also developed ITV's catch-up TV service, stuff like that. And those were all building on big brands. I also did one spin-off of the Open University, which was a new brand called FutureLearn. And it's tough.
Whereas with an API, if you can prove that your results are better, or your results are the same and it's cheaper, then the sales and marketing and the branding are less important; it's less of a gamble. So I think even if we did develop consumer products, we would definitely develop APIs first, because one lesson I really got from Autonomy was that if you create fundamental technologies, you can sell through expert resellers and expert OEM partners who bundle your tech into their products in any sector you like, because you don't have to be an expert in children's products or consumer products or business products, or in legal products versus oil industry products. That's the main reason: you can get to the whole world if you're just selling the pure tech. So yeah, the goal will be to replace those API-based products that Google and AWS and Microsoft have with lower latency, more accurate translation.
There's another idea I had about the productization which I think would work very well for this kind of context. Obviously this is going to be a recorded podcast, but say this was live and you had an audience of multiple languages. You would set the parameters so that it's a little bit more generous on errors while we're talking, so people can keep up with what we're saying. But then you have a parallel process, optimized for accuracy, which gives you a written transcript of the whole thing in your language that is super accurate. While you're listening you don't want to be waiting all the time, so you can relax on the accuracy there. And I think that's one of the most interesting things as we develop this: what I referred to before as the confidence quotient, this internal measure of how well the network is doing in terms of its predictions. That will be the thing that changes. You can say, okay, in this context, start translating when you're only 30% confident; in this other context, translate only when you're 90% confident. That's a really interesting balance, and it will definitely be an art of how the API works, because if you put that in the hands of the users or the implementers, they can play with it to their heart's content. That's absolutely the way I see it going.
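A minimal sketch of what that configurable confidence threshold might look like from an implementer's point of view. The Chunk and StreamingTranslator names, and the idea of a per-chunk confidence score, are assumptions made for illustration rather than Babelbit's actual API.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    confidence: float          # model's own estimate, 0.0 .. 1.0

class StreamingTranslator:
    """Buffers translated chunks and only releases text once the model's
    confidence clears a caller-chosen bar."""
    def __init__(self, min_confidence: float):
        self.min_confidence = min_confidence
        self.pending: list[Chunk] = []

    def feed(self, chunk: Chunk) -> str | None:
        self.pending.append(chunk)
        if chunk.confidence >= self.min_confidence:
            text = " ".join(c.text for c in self.pending)
            self.pending.clear()
            return text
        return None

# Live captions: emit early and tolerate more errors.
live = StreamingTranslator(min_confidence=0.3)
# Parallel accurate transcript: wait until the model is nearly sure.
transcript = StreamingTranslator(min_confidence=0.9)

for chunk in [Chunk("so as we", 0.35), Chunk("move to prediction", 0.95)]:
    print("live:", live.feed(chunk), "| transcript:", transcript.feed(chunk))
```

The same stream feeds both: the low-threshold instance emits each chunk immediately, while the high-threshold one holds text back until confidence is high and then releases it all at once.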
But at the same time, you know, Apple made a big deal of their translating AirPods, and I kind of feel like, yeah, if in a year's time they're using Babelbit, they can rebrand their AirPods as AirFish or as BabelPods, I don't know.
>> Yeah, put an "i" in front of it: iBabel. Yeah.
>> iBabel. I see it.
>> Matthew, I've got one last question for you here today. Since you are a specialist in linguistics and speech, what advice would you have for other specialists trying to take advantage of Bittensor? How can they utilize Bittensor to further their specialty?
>> Well, I mean, I got incredibly lucky working with Max and Nigel and Tim, because they'd done it from scratch. They'd come in and got a lot of help from other people as well, but nowhere near as much as they helped me, and I had help from Yuma and others too. So the advice I would give is, one, get into the community. It's an unbelievably friendly community. When I was working with Tim (he's in Australia) it was very hard to find time to meet, so I would ask questions of Steph
at Macrocosmos, and he gave us so much of his time. Obviously, hopefully he'll make a big sale and we'll be buying data from him forever more, but at the same time he really let us explain our problems in detail, gave us all kinds of advice, and helped us with testnet, getting up and running. It's an incredible community, it really is. The people I met in Paris were all happy to follow up with me even though I was a complete newbie.
I mean, I'm notorious for not getting embarrassed. I don't mind looking like an idiot, because if you're constantly innovating, you're always going to be a newbie in something. So that's the other thing: just don't be embarrassed. We're all making it up as we go along, because it's a new field.
And actually, at the beginning of this year, dTAO didn't exist. Now everyone is worried about the tokenomics of their specific alpha, and a year ago that didn't even exist; in another year, who knows what it will be like. And I worry about all kinds of things, as everyone does. For instance, if there were no cap, anyone could form a subnet and it would be a pure free market where anyone can join in. Would that be good or would it be bad? One way it would be good: I've got five other ideas, so I could put a subnet out for all of them and see which one does best. That sounds like a good idea. Another one would be, what happens when the public eye casts its
spotlight on Babelbit, or on Bittensor rather. Say there was a huge article about the economics of Bittensor in the Financial Times or the Wall Street Journal. Would that be good or would it be bad? Would it mean that Elon Musk would say, "Hey, I could lower my costs. I'm going to create a subnet and I'm going to pay everyone 100 times as much and kill all the others"? Or would he just buy all the others? Or would he set up Grok Tensor and just say Grok Tensor is better than Bittensor? So I kind of feel like the only thing to do with almost all aspects of AI is just get involved. I got into this position because I started talking about all the fun I was having with LLMs and Nigel reached out to me. But,
to be honest, you don't have to have mates who will reach out to you. People are really approachable; it was apparent in Paris, and it's apparent online. I've never encountered anything like it. There's this weird thing that mostly the subnets aren't competing with each other. That's odd, isn't it? Because in the startup world, if you were all speech people, you might think, well, I don't want to tell you too much about my problems with my algorithm, because I might give something away that will help you with yours. But Steph and Will are doing these incredible things. One of the most amazing things they're doing is this distributed training of a single network; it's mind-blowingly complex, and they are ahead of anyone on that, not just within Bittensor, and no one in Bittensor is competing with it. So they're happy to give advice. I think that's the advice I would give. The other thing is something I didn't do.
Think about how the value your product creates can be linked to your subnet's value. We're all trying to make our subnets more valuable, to have a bigger share of the staked money. My original thought was this: miners are going to replace having a gigantic staff, and it'll be much cheaper to run my company because of that, but what they'll be creating will be software which I can sell in the usual way. There are basically two main ways to do it. One is on premises or on device: I sell you the software and you run it on your network or whatever. Or I have it as a SaaS product in the cloud. And how does that benefit my alpha? How does that benefit an investor on the Bittensor side? So now I've been thinking, well, actually, all my ideas
about how prediction can be made more accurate by having customer-specific or sector-specific data sets mean I could do some kind of link there: if you're willing to pay us to, I guess, host your data to make the models better for you, then that can be something we charge for as a kind of stake, so that while we're doing that you're staking with us, and if you want to carry on, that's fine. I was trying to come up with our equivalent of what Hippius are doing with storage; they're doing it with a separate token, even, but that idea really was ingenious. I wrote a paper with about five different tokenomics ideas for how I can link the software sales to the alpha, and I've been circulating that with people who know more than I do. I had a great meeting with Mog last week, and he was so passionate about that aspect of it. I think that's another bit of advice I would give: if you can go into it with an idea where the value created by the product can intrinsically be linked to supporting the value of your alpha, then that will make it a more exciting investment opportunity for the people who are staking.
So we're absolutely intent on doing that. And at the moment, some people are doing it where they literally have miners working for a customer, so they split their incentive between different customers: the miners do work specifically for customers, and they have some kind of link in that way. That's another thing we could do. But I didn't really think about that in advance. I don't think people were even thinking about that much six months ago. So it's kind of interesting the way the culture has changed a bit towards this linkage of product revenue and product distribution against staking TAO. Yeah.
>> Yeah, it is true. Bittensor moves very fast, and I would like to echo your point that everyone is very approachable.
>> Yeah. You'd be surprised.
>> Well, Matthew, it's been a pleasure having you on today.
>> Thank you.
>> Likewise. Thank you very much. And uh
yeah, I'll see you around.