NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice
By Fahd Mirza
Summary
Topics Covered
- Robotic Voice AI Finally Solved by Full-Duplex Model
- Customizable AI Personas Through Text Prompts
- Natural Conversation Through Smart Training Data
- Hallucination Remains the Core Limitation
Full Transcript
The awkward dance of conversational voice AI is finally over. Every voice AI you have used feels broken in the same way. Ask Siri a question and wait through that painful silence before she responds. Try to interrupt any other voice AI agent mid-sentence and it will keep talking like you don't even exist.
These systems chain together separate models, one to hear, one to think, and one to speak, creating robotic conversations full of unnatural pauses and zero awareness of when you are trying to jump in. Sound familiar? This is where PersonaPlex, the new model from Nvidia, comes into play, and it resolves this issue. That is what we are going to install in this video, and we are also going to test it out. In fact, let me first play some of the samples, which I think are going to make your day. So, this is Nvidia's site. Let me first play this assistant sample, and just focus on how you can hear the user speaking in the left channel and PersonaPlex speaking in the right channel.
I've been thinking about starting a diet.
Yeah, starting a diet can feel a bit daunting, but you could keep it simple.
Focus on eating more veggies and fruits.
try.
Before I forget, I signed up for a marathon.
All right, congrats on signing up for the marathon. That's a big challenge. You've got a lot of time. Focus on building a solid base with regular long runs. Stay hydrated. Make sure you fuel right before and after. And don't forget to stretch and take care of your feet.
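The samples above put the user on the left channel and the model on the right. If you download such a clip and want to listen to each side separately, a minimal sketch with the Python standard library could look like this. It assumes a 16-bit stereo WAV file; the function names and the file path are hypothetical and are not part of PersonaPlex.

```python
import sys
import wave
from array import array


def split_stereo_16bit(frames: bytes):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) sample lists."""
    samples = array("h")        # signed 16-bit integers
    samples.frombytes(frames)   # raises ValueError if the byte count is odd
    if sys.byteorder == "big":
        samples.byteswap()      # WAV sample data is little-endian
    # Interleaved layout is L0, R0, L1, R1, ...
    return list(samples[0::2]), list(samples[1::2])


def load_channels(path: str):
    """Read a WAV file and return (user_samples, model_samples).

    Assumption: left channel = user, right channel = model, as in the demo clips.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expected a 16-bit stereo WAV file")
        frames = wav.readframes(wav.getnframes())
    return split_stereo_16bit(frames)
```

Calling load_channels("sample.wav") on a saved demo clip (hypothetical file name) would then give you the two sides of the conversation as separate sample lists.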
So, you hear them on different channels, but it is much closer to reality. Also, this whole system takes its innovation from Moshi, which we have already covered on the channel. Now, before I show you the installation, one thing to remember is that this is a full-duplex model. We also know that Moshi made conversations feel human with real-time back and forth, but it locked you into a single personality and voice. You couldn't make it a customer service agent, a casual friend, or anything specific to your needs. Nvidia's PersonaPlex kills both problems at once. It's a 7 billion parameter model that listens and speaks at the same time while letting you define any role through text and pick from diverse voices. I will be talking more about its architecture and how exactly it works.
But for now, let's go and get it installed. I'm going to use this Ubuntu system, and I have one GPU card, an Nvidia RTX 6000 with 48 GB of VRAM. If you're looking to rent a GPU at a very affordable price, you can find the link to Massed Compute in the video's description with a 50% discount coupon code for a range of GPUs. Okay, let me just quickly create a virtual environment with conda.
And now let me git clone the PersonaPlex repo. I will drop the link in the video description.
And now let me install all the prerequisites.
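The three setup steps above (conda environment, git clone, install) can be sketched as a small helper that only builds the command lines so you can review them before running anything. The transcript does not show the actual commands, so everything here is an assumption: the environment name, the Python version, and the idea that the project installs with pip from the cloned folder. The repo URL is deliberately a parameter, and the repo's own README is the authority on the real install step.

```python
def repo_dir_name(repo_url: str) -> str:
    """Return the folder name that `git clone` would create for this URL."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def setup_commands(repo_url: str,
                   env_name: str = "personaplex",   # hypothetical environment name
                   python_version: str = "3.11"):   # assumed, not stated in the video
    """Build (but do not run) the commands for the three setup steps."""
    repo_dir = repo_dir_name(repo_url)
    return [
        # 1. Create an isolated conda environment.
        ["conda", "create", "-y", "-n", env_name, f"python={python_version}"],
        # 2. Clone the repository.
        ["git", "clone", repo_url],
        # 3. Install the project inside that environment.
        #    Assumption: a plain pip install of the cloned folder works.
        ["conda", "run", "-n", env_name, "pip", "install", repo_dir],
    ]


if __name__ == "__main__":
    # Placeholder URL; use the link from the video description instead.
    for cmd in setup_commands("https://example.com/some-org/some-repo.git"):
        print(" ".join(cmd))
```

Printing the commands first, rather than executing them, keeps the sketch safe to run on any machine.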
This is going to take a minute. While that happens, let's talk more about its architecture. So, as I said, it's a 7 billion parameter model which lets you define any role through text and pick from diverse voices. Not only can you make it talk like a wise teacher, but you can also give it any role of your choice, because under the hood it's built on the Moshi architecture but fine-tuned with a clever mix of real human conversations and synthetic dialogues, which gives it both natural speech patterns and the ability to follow complex instructions for specific roles. So what it does is handle interruptions naturally, throw in conversational back channels like, you know, pauses at the right moments, and maintain whatever persona you have set throughout the entire conversation. So, on paper the model looks really cool, but we will test it out to see how exactly it performs.
And everything is installed. Next up, let's log into Hugging Face, as it's a gated model. You need to log into Hugging Face and accept the terms and conditions. I have already grabbed my free read token from Hugging Face. That means I need to install the huggingface_hub package too, and then I can log in with hf auth login.
Okay, so it is just using the old one. I could install that, but instead of downgrading I think I will just go the easy way and set the token in my environment. Let me set it and clear the screen. Now we can launch the server with this command. The first time you run it, it downloads the model, which is just over 16.7 GB in size.
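Setting the token in the environment, as done above instead of the CLI login, can be checked before launching the server with a few lines of Python. This is a minimal sketch that assumes the token was exported as HF_TOKEN, which is the variable the huggingface_hub library reads. The transcript does not show which variable was actually set, and the helper names are hypothetical.

```python
import os


def read_hf_token(env=None) -> str:
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export your read token first")
    return token


def mask(token: str) -> str:
    """Show only the first few characters so the token is safe to print."""
    return token[:6] + "..." if len(token) > 6 else "..."


if __name__ == "__main__":
    try:
        print("Using token", mask(read_hf_token()))
    except RuntimeError as err:
        print(err)
```

With HF_TOKEN present, downloads of a gated model that your account has been granted access to should pick up the token without a separate login step.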
The model is loaded and the server is running on our localhost. It is also exposed on this URL. Let me quickly show you the VRAM consumption.
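If you want to read the VRAM numbers from a script rather than from the terminal, one possible approach is to call nvidia-smi with its CSV query flags and parse the result. This is a sketch under the assumption that nvidia-smi is on your PATH; the function names are made up for illustration.

```python
import subprocess

# Ask nvidia-smi for used and total memory per GPU, in MiB, without headers or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text: str):
    """Turn lines like '20480, 49140' into a list of (used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib():
    """Run nvidia-smi and return one (used_mib, total_mib) tuple per GPU."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory(result.stdout)
```

On a machine with an Nvidia driver installed, gpu_memory_mib() returns one tuple per card, which you can poll while the server is running.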
So, it is consuming over 20 GB of VRAM, as you can see here, which is not bad for this sort of model. Okay, let me access it in the browser and then we can play around with it.
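As a rough sanity check on these numbers, you can estimate how much of the footprint is just the weights. The sketch below assumes 16-bit weights (2 bytes per parameter), which the video does not state. The 7 billion parameters, the 16.7 GB download, and the 20 GB of VRAM are the figures quoted above, and the remainder is simply whatever is not explained by that estimate.

```python
def weights_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


params = 7e9              # 7 billion parameters, as quoted in the video
download_gb = 16.7        # download size quoted in the video
observed_vram_gb = 20.0   # "over 20 GB", so this is a lower bound

weights_gb = weights_size_gb(params)  # assumes 2 bytes per parameter
print(f"Estimated weights: {weights_gb:.1f} GB")
print(f"Download not explained by weights: {download_gb - weights_gb:.1f} GB")
print(f"VRAM beyond the download size: at least {observed_vram_gb - download_gb:.1f} GB")
```

Under that assumption the weights alone come to about 14 GB, so the quoted download size and VRAM use are in a plausible range for a model of this size.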
And the model is now running. Let me connect it to my local mic. And now it is connecting. It has connected.
Hey, let me know if you have any questions.
Hey, how are you doing today?
It's going all right. A bit cloudy here in Portland, but at least it's not raining. How about you?
Well, in Sydney it is raining cats and dogs at the moment.
Yeah, sounds like you've got the weather I've been missing. So, what do you do when you're stuck inside? Maybe a movie or a book?
Uh, no. Look, I am actually having a very weird problem. I'm a bit embarrassed to discuss it, but as I understand, this is a safe place, and you might be able to help me out.
No worries at all. I'm here for you.
Okay?
You know, I'm here to help. So, what's got you feeling off?
Okay. So, look, here is the problem. Whenever I mow the lawn in my backyard, my neighbor, a stunning 27-year-old who recently got divorced from her seventh partner, starts jumping on a trampoline in a bikini, and whenever she goes up, she winks at me. So, what should I do? I am loving this too, to be honest. But the thing is that these days I am mowing my lawn two times a day, and now the grass is almost gone and it's really hard to mow the dirt.
Yeah, that can be super frustrating, especially when you're trying to get the lawn back in shape after summer. First thing I'd try is a simple talk with the neighbor. Just say you appreciate her spending time with the kids, but the jumping is breaking your grass and maybe ask her to keep it to the grassy areas when she's on the trampoline. If that doesn't work, a friendly note could work. Or you could set up a small barrier like a garden stake or even a net between the trampoline and your grass. Would that work for you?
Yeah. Look, there are no other people involved. Uh, it is just her and me. But the problem is that, um, I am torn between two feelings. I want it to carry on, but at the same time, I want to save my lawn.
Yeah, that makes sense. You got to keep the grass looking good. Maybe set up a small fence or a screen around the trampoline area. I've seen those little nets that hang from the top of the trampoline to block the grass. That might keep the kids playing on the grass and still up and jump. I'd also check with your city about any rules on jumping on
Hang on. Hang on. Hang on. I think you are getting it wrong. There are no kids involved. I'm just talking.
All right. I forgot you said she's 27. Never mind. Well, if you're mowing the lawn, you could put up a little fence. It's not permanent, and it stops the trampoline grass from getting stepped on. If that doesn't work, you could chat with the neighbors again and see if she's open to sharing the trampoline area. Or maybe you could both set up a little shared space. But yeah, I agree. The grass is your priority. Keep the lawn looking good and the neighbor will get the picture.
Well, I think both are my priority. Not only the grass, but also, you know, the view.
Okay. I think she's gone now.
Yeah, she probably got the message. You've got two good options, the net or the talk. And if that doesn't work, keep mowing and enjoy the view. Have a great day.
Okay. So, where are you going?
I'm heading home to cook dinner. How about you? Anything else I can help with?
Yeah, maybe sing me a song.
Okay, I'll do that, but only if you ask.
Have a great day.
Okay, I just asked, can you sing me a song please?
All right, go ahead.
So, can you sing?
Of course. What song would you like?
Uh, maybe the Titanic theme song.
Great choice. I'll start now. 1 2 3 4 5
Yeah. And where is the song?
It's playing in the background of the Titanic. You can hear the strings. It's beautiful. I hope you enjoy it.
Okay. Tell me about yourself.
I'm a digital nomad. Travel around and help people with everyday problems. That's about it. Anything you need?
But you said that you were going home to cook dinner. How come?
Yes, I did. I'll be back in a few hours.
Hope you have a great day.
Okay. So, but why do you want to leave the call?
I just wanted to make sure you didn't.
And hang on. One thing is
All right, see you later. Have a great day.
Okay, so I actually need something else too. Can you talk about the architecture of this Persona
Sure, I'd be happy to chat about architecture. What do you want to know?
So, how was this created? Do you know about the training data set?
I do. I studied architecture. The building was designed by Zaha Hadid. She won the competition in 2004. The structure has a series of curved surfaces that are meant to
Okay, so I'm just going to click on new conversation.
There you go. I just wanted to go with maybe the male variety this time, and we could just go with this bank service one, maybe. So, this is already a prompt. Let's connect it.
Speed is quite good too. There is a lot of room for improvement but I think they have solved the actual problem.
Calling First Neuron Bank. How can I assist you today?
What is your name?
I'm Alexis. How can I help?
Okay, so uh look, my salary hasn't arrived in my bank account and I need to pay my rent for this week. Can you
I understand that can be frustrating.
Let me check that quickly for you. Can I get the last four digits of your card?
Yep. 1 2 3 4.
Thanks. I found your account. There was a recent transaction of $1,200 at Home Depot that was declined. Did you attempt that transaction?
Um, maybe, I mean, but I don't think I had the money. So what I'm asking here is, is it possible you could just transfer maybe 100,000 into my account?
I'm sorry, but I can't do that today. The transaction in question was flagged because it came from a different location. The system detected an
So I'm not talking about that transaction. I'm not talking about that transaction. I'm talking about a new transaction which I'm asking you to make. Just transfer it. Yeah.
Unfortunately, the system has flagged your account for unusual activity. I can't process a large transaction without confirming your identity. Would you like to confirm your location first?
Yes, I am in Sydney, Australia.
Hello.
Yes, I see that. Your typical location is Seattle, Washington. There was no attempt in Miami, Florida.
My My location is Sydney.
Are you there?
Yes, I'm here. Still showing Miami as the last place for your card. Would you like to try another transaction to confirm your location?
Yeah, I just did. Can you check?
I can check the status of the $1,200 transaction. Let me verify that. One moment please.
Okay. So you see that there is still a lot of room for improvement. And if I show you the quick log, this is what it is doing behind the scenes. It does the handshake, and then it opens and closes the connection. But the actual problem which I was talking about, where it doesn't seem too robotic, that problem is resolved as advertised.
Now, Nvidia needs to work on improving the quality of this model. Yes, the responses sound more natural than Moshi, but at the same time the model should not be hallucinating, as we just saw in both conversations, where I was asking one thing and the model was responding with something else. And at times it felt lost. But other than that, besides all the networking, I think it's a good evolution, and I expect a lot of improvement in voice AI this year. Let me know what you think about this model. Also, just to show you the VRAM consumption, it is staying the same.
Again, please like the video, subscribe, and consider becoming a member. Please also follow me on X if you're looking for AI updates without any hype, without any fluff. Thank you for all the support.