Tim Panton - Trimming Glass to Glass Latency of a Video Stream One Layer at a Time.
By Software Mansion
Summary
## Key takeaways
- **Trimming Latency: A Layer-by-Layer Approach**: Achieving ultra-low glass-to-glass latency in video streams is a process of continuous optimization, involving meticulous 'trimming' at each layer of the communication stack. [00:22]
- **Race Car Latency: Sound vs. Sight**: For applications like race car telemetry, latency targets are derived from sensory perception; a goal of under 200ms is set, informed by the time it takes for engine sound to reach the pit crew, ensuring real-time awareness. [04:33]
- **Beyond GStreamer: Direct Hardware Access**: Significant latency savings (around 40ms) were achieved by bypassing GStreamer and interacting directly with hardware encoders and cameras, leveraging Java's Foreign Function & Memory API for efficient buffer management. [09:08]
- **The Lip Sync Trade-off**: Disabling lip sync in video streams where audio-visual synchronization isn't critical, such as in race car or autonomous vehicle feeds, can yield a surprising 20ms latency reduction by avoiding complex frame alignment. [12:42]
- **Browser Latency Differences**: Safari demonstrated a 20ms advantage over Chrome in H.264 rendering due to a more efficient packet handling strategy, although this advantage may be temporary as browser optimizations evolve. [15:06]
- **Network Choice Matters: Local SIMs & 4G vs. 5G**: Utilizing local SIM cards instead of roaming and opting for stable 4G over jittery 5G can shave off critical milliseconds by reducing VPN overhead and jitter buffer requirements, respectively. [16:36], [17:38]
Topics Covered
- Why 200 milliseconds is the critical latency target.
- GStreamer pipelines introduce significant latency.
- Safari renders video 20ms faster than Chrome.
- Aggressive jitter buffer management reduces latency.
- Small, cumulative optimizations halve video latency.
Full Transcript
Thanks to Dan for the intro and thanks
for all of you for coming and inviting
me and whatever. So yeah, this is actually much more pragmatic than you
might expect from me. This is very kind
of detailed stuff of like how do you get
the glass to glass latency of a video
stream as low as you can? And it turns
out that it's just a series of trimming,
trimming, trimming at each layer. Um, so
I'm Tim Panton. I'm the CTO at Pipe. Um,
I wrote 10 years ago, it turns out, um,
a web RTC stack for small devices and
it's sort of still doing things. It's
still in baby monitors, but it's also
going around racetracks at 250 kph,
which is kind of fun. Um, I write open
source. Um, there's a bunch of protocol
implementations and also, um, something
else. I'm not quite sure what you call
it, but an interface for V4L2, all in pure Java because, well, memory safety
basically. Um, so
this is a race car camera. So this
camera sits over the driver's shoulder
and it sends uh live video and audio to
the pit crew in high quality at low
latency over a long range. um I say long
range but like 5 kilometers typically of
these tracks um in diameter and at high
speeds not only like reasonable bit
rates but also the cars are moving quite
fast um and we use uh public 5G networks
for this and it's not broadcast right
the the the end consumer of this is the
pit crew and the team sponsors it's not
the wider audience um so it's not
broadcast.
And this is what it looks like and
sounds like. Um, V8s are astonishingly
noisy if you take the silencers off,
but um, yeah, it's kind of gets you
going a bit. Um, the other thing we're
doing is, um, with the same technology,
we're putting these cameras, we're
investigating putting these cameras into
autonomous vehicles. Um, with some
funding from the EU, we're doing this
investigation.
Initially we've done some technical
tests to see whether it makes sense,
whether it works and now we're looking
at um the regulatory aspects of like
whether there's a legislative and a
practical uh requirement for this. Um
and the basically the idea is it allows
a remote human to
see what the autonomous vehicle is
doing. um and one time in a thousand
maybe intervene and tell it that yes, it
is allowed to go around that tree that's
fallen in the road or you know whatever.
Um and probably as a hint rather than
actually driving it, the remote operator
is probably not going to have a steering wheel in their hand. They're probably just going to get, like, a menu of options
that the autonomous vehicle provides
them with. Um and our camera works
pretty well for this. This is a test we
did at the at the former Tegel airport
um where you can run autonomous vehicles
without much difficulty. Um, there are safety issues, but they're much constrained. So what
protocol could we use for doing that? Um
I mean it's slightly cheating asking
this question but I'll ask it anyway. So
I asked ChatGPT actually and ChatGPT
produced this list which you're all
probably pretty familiar with. um
they're all protocols. It was
interesting to see that RTSP is still on
that list. Um which I think is probably
the oldest one there. But anyway, uh so
it gave me this list, and basically what that means is you have to
make a decision about what latency is
acceptable. So, what does latency mean
in the context of a race car? And the
answer is that 10 seconds, which is a
kind of median of those protocols, um,
is a heck of a long way on a racetrack,
right? By the time you start talking about the corner the car is approaching, it's actually already there; it doesn't make any sense anymore. So, um, it matters. The 10-second thing isn't going to work in
this context. So, what is the goal? Like, how do you know what number is enough? Um so there are a
couple of metrics and one of them is
that actually you can hear the gear
changes on a V8 from at least 100 meters
away. Um so we reckon that in order for
that to be kind of roughly right, you
want the time for the sound to actually arrive through the air and the time for it to arrive over your laptop to be roughly in sync. And so that gives you 290 milliseconds, based on the speed of sound in warm air. Um, so that gives
you one metric. The other one for the
autonomous vehicles is that it turns out
that the
American drink drive limit for blood
alcohol is roughly equivalent to 200
milliseconds of cognitive impairment. So
that gives you a measure for like how
long you can wait before you make your
mind up. Um so we reckon that aiming for
under 200 milliseconds was a safe bet.
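A quick sanity check on that 290 ms figure, assuming the roughly 100 m distance mentioned above and a warm-air speed of sound of about 345 m/s:

$$ t = \frac{d}{v} \approx \frac{100\ \text{m}}{345\ \text{m/s}} \approx 0.29\ \text{s} \approx 290\ \text{ms} $$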
Um that is the low end of what WebRTC
can do. And as far as ChatGPT is concerned, sorry Ali, I know this is disagreed with, but ChatGPT tells me that no other protocol will do that, including MoQ. Um so yeah uh I'm sure I'll be
told I'm wrong about that but anyway. So
or ChatGPT is wrong. It's not me. Um,
and I do love having an AI to blame for
my mistakes and and to for justifying my
decisions. It's really nice. Um, so this
is a typical Pipe device. Um, it's a Linux box. I don't do these
little devices yet. Dan is braver than
me in that context. Um, so uh it sits there. It runs as an agent and typically we have an H.264 feed
and maybe a microphone and some data and
whatever and we run it over a
peer-to-peer connection with data
channel and over the data channel we can
carry things like LAR and PTZ commands
and setting the hue saturation and
brightness and stuff like that. So
there's kind of more data that goes over the data channel. Um it's useful to have that data channel
around which some of the other protocols
don't support like RTSP doesn't support
a side channel for data. Um so how do I
measure latency right well the easy way
which is basically very easy to do uh
you go to a website and this is
clock.zone and you put this clock
on the screen and then you point the
video camera at the screen and then you
put the video output displayed next to
it and you do a screenshot and you
inspect them and you do the subtraction
in your head and you come out with in
this case 350 milliseconds.
The problem is that it's not truly glass
to glass. It's not from that lens to
this screen because it's never left this
screen. Like the the screen cap is
before it's left the glass. And what's
worse, that screen cap could be in some
way synchronized to the rendering engine
because it's on the same hardware and
it's probably done in the same, you
know, GPU or whatever. So, you're you
you've got some correlation risks there.
Um, and it's a pain in the neck to do
each time. you have to kind of do it uh
manually. So, we ended up building this
thing um which is essentially it's doing
the same thing. You you you uh point
your camera at a light source in this
case and and what we're really
simulating is like the brake lights of
the car in front, right? The brake light
comes on and how long does it take from
that light coming on to the remote
supervisor seeing that on their screen?
That's the measure you're looking at.
Um, and so basically what happens is that we flash the light in front of the camera, and then we render that camera onto a screen over the appropriate video link, and then we put a light sensor on it, and the light sensor and the flash are connected to the same, um, in this case BeagleBone, small processor, and it can measure the time difference between the two. And there's an open source thing on GitHub with the software for that and a description of the hardware for it, but it can give you a reasonably accurate measurement, down to certainly 5 to 10 millisecond accuracy, for what the latency of a call is.
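A rough sketch of what that flash-and-sense measurement amounts to (this is not the actual open source rig code; the sysfs GPIO paths and pin numbers below are placeholder assumptions for a BeagleBone-style board):

```java
import java.nio.file.*;

// Hedged sketch of the flash-and-sense latency probe: drive an LED in front of
// the lens, watch a photodiode taped to the viewing screen, time the difference.
// The GPIO numbers are placeholders; a real board needs the pins exported first.
public final class GlassToGlassProbe {
    static final Path FLASH  = Path.of("/sys/class/gpio/gpio60/value"); // LED at the lens
    static final Path SENSOR = Path.of("/sys/class/gpio/gpio48/value"); // photodiode on the screen

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 20; i++) {
            Files.writeString(FLASH, "1");                 // light goes on in front of the camera
            long t0 = System.nanoTime();
            while (Files.readString(SENSOR).strip().equals("0")) {
                Thread.onSpinWait();                       // wait for it to show up on the remote screen
            }
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("glass-to-glass ≈ " + ms + " ms");
            Files.writeString(FLASH, "0");
            Thread.sleep(2_000);                           // let the pipeline settle between samples
        }
    }
}
```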
So yeah, so having got a way of
measuring it, I've now got to get down from
350 to sub 200.
And that's basically a matter of working
through each of the layers and taking
stuff out. So the big one, the big
saving was 40 milliseconds in getting
rid of GStreamer. Now I love GStreamer. It's a great thing to use, like it's the first starting point in all of the projects to put a GStreamer, uh, pipeline together and do something. Problem is, it's a pipeline. So intrinsically it has several frames in the pipeline. And the thing with pipelines is nothing comes out until you've filled it. And so you end up with a couple of frames' worth of latency in a typical GStreamer pipeline. And
however hard you do it, you still end up
with some latency. So what we ended up doing is talking directly to the encoder uh and talking directly to the camera. Um and in this particular hardware that's not too
difficult. I put it off for a very long
time because I didn't want to do it in
JNI, but I still wanted to use Java. And
Java 24 has this really sweet well 23
technically has this really sweet thing
which uh is foreign functions and memory
which is a replacement for JNI. They
deny it. They say it's nothing to do
with JNI but it's effectively a
replacement for JNI which allows you to
do
memory safe access to foreign DLLs or foreign .so libraries, which is actually really
sweet. Um, so basically we can mmap video buffers into Java's memory, tell Java that that's what it is, and Java will treat it safely, and we can get callbacks from V4L2 which say when they've been updated, and we can treat them. Which I like not as much as I liked reading from /dev/video, like I'm a Plan 9 guy really at heart, and reading from /dev/video was a much cleaner way of doing it, but nobody supports that anymore. So that was 40 milliseconds, which is nice.
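As a minimal sketch of what that looks like with the FFM API, assuming you already have the V4L2 file descriptor and the buffer offset and length from the usual VIDIOC_QUERYBUF ioctl (omitted here); this is the shape of the idea, not the actual Pipe code:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

// Sketch: map a V4L2 capture buffer into Java via the FFM API rather than JNI.
public final class V4l2Mmap {
    private static final int PROT_READ = 0x1, PROT_WRITE = 0x2, MAP_SHARED = 0x01;

    public static MemorySegment mapBuffer(int fd, long offset, long length) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off)
        MethodHandle mmap = linker.downcallHandle(
                linker.defaultLookup().find("mmap").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS,
                        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT,
                        ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_LONG));

        MemorySegment addr = (MemorySegment) mmap.invokeExact(
                MemorySegment.NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        // The raw pointer comes back with zero length; reinterpret it so Java knows
        // its real size and bounds-checks every access to the frame buffer.
        return addr.reinterpret(length);
    }
}
```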
Um, weirdly, moving to
IPv6 from IPv4 will give you around 10
millisecond saving if you can, because it gets rid of the NAT in the camera. Because we've got a 5G modem in the camera, right? And that's doing its own NAT if it gets a v4 address, and then the carrier gateway is doing another set of NAT, and then the local router and local Wi-Fi are doing another set of NAT, and each of those is running queues. So it looks like getting rid of those queues and getting rid of that NAT processing saves you around 10 milliseconds, which is
worth having. And then the same sort of
saving is available if you get rid of
TURN candidates. If you don't do TURN, if you can get a direct, um, genuinely peer-to-peer, address-to-address session, uh, then it basically takes a leg out of the trip. Now that doesn't always work. There are cases where routing via a TURN server from, um, someone like
Cloudflare for example will actually be
quicker, but it's typically only quicker
if you change jurisdictions. Like our
users are typically less than 5
kilometers away from the source and so
they're always in the same country,
right? And the TURN servers almost never are. So like if we can stay in country, that's a win. So yeah, host candidates are a good win, often but not always.
This was one of the surprising ones. Some of these things are relatively obvious. Um
but this one I just hadn't seen coming
at all which is if you disable lip sync.
Now you can't see the driver, right?
There's no lips to sync with in any of our videos, right? So we don't care about whether the audio is synchronized accurately with the video. And it turns out that doing lip sync costs you 20 milliseconds. And it's essentially, as far as I can work out, because there's no sensible common factor between the 20 millisecond audio frames and the 33 millisecond video frames. To get them to line up, you have to like go to multiples of 60, and even that's not accurate. So it turns
out that the whole thing delays by
pretty much a whole audio frame in order
to sync the video. So there's a 20
millisecond saving there if you lose lip
sync, which I was quite pleased with.
Cheap. Well, it's not actually cheap,
but easy win is to go out and buy a
faster monitor, right? You just go out
and buy a gamer's monitor and you've
saved yourself 10 milliseconds. Um, more
expensive. Go out and buy a faster
internet connection. Get fiber to the
premises and not fiber to the curb. And
get rid of your DSL line, right? Those
will win you
certainly 10 milliseconds, probably
more. I mean, somebody was telling me
they had 2 millisecond ping to Google
from their fiber at home. And and that
would that would save me another that
would be like 18 then. Um,
weirdly, this was one that again
surprised me. I thought it wouldn't make
any difference. I thought moving from
Wi-Fi to Ethernet on a non-busy Wi-Fi
really wouldn't make any difference. It
turns out that's not true. There's more
jitter on Wi-Fi. Even relatively empty
Wi-Fi has more jitter on it than um than
an Ethernet. So, if you get rid of the Wi-Fi in the step between, you know, the fiber and your viewing station and you start doing that over Ethernet, you can
trim the jitter buffer by another 5
milliseconds. It's not a huge saving,
but it's worth having.
Um, and I point out that at this juncture, we're getting close to 200
milliseconds, which is nice. Now, this
is another total shock, right? I didn't
I actually didn't believe this when I
did it. And and I suspect that it won't
last long, particularly not after this
presentation. Somebody in the Chrome
team will fix it. But the last time I measured it, Safari, on H.264, is 20 milliseconds faster at rendering than Chrome. And it looks like the reason is really funny, which is that Chrome doesn't believe that you've finished getting the packets for a frame until it sees the first RTP packet for the next frame. Then it says, "Oh, okay, I must have had the last frame,"
and starts rendering it. Whereas Safari
is depacketizing each of the packets in
that frame and interpreting them, at least at the NAL level. So it knows that the
last packet is the last packet and it
kicks it off to the renderer there. Now,
if you're completely filling the
pipeline, that doesn't make any
difference. But if you're, which we
often are with 5G, if you've got excess
uplink, then you may have like half a frame time or more between when your last packet turns up and the first packet of the next frame. So, we can get up to a 10 to 20 millisecond saving by switching to Safari from Chrome. Now, like I say,
I'm absolutely sure that won't last, but
it will mean that Chrome is faster when
they do it. And it also may not be true
for VP8.
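For reference, the end-of-frame signal here is the RTP marker bit, which lives in the second byte of the RTP header (RFC 3550), and for H.264 the packetization spec (RFC 6184) has it set on the last packet of an access unit. A small illustrative parser, not taken from any browser's code:

```java
// Minimal RTP header peek: is this packet flagged as the last packet of a frame?
public final class RtpPeek {
    // RFC 3550: byte 1 of the header is the marker bit (1 bit) plus payload type (7 bits);
    // bytes 2-3 are the sequence number.
    public static boolean endOfFrame(byte[] packet) {
        return (packet[1] & 0x80) != 0;   // marker bit set by the sender on a frame's last packet
    }

    public static int sequenceNumber(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```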
Um, yeah, local eSIM. This is sort of one of those more obvious things, which is that not all SIMs, not all 5G connectivity in the camera, is the same. Like, you're constrained somewhat by the economics and the availability, but if you can get a SIM from the local provider, like if I'd gone out and got an eSIM from Plus or whoever here and loaded it in
there as opposed to what I'm doing at
the moment, which is roaming from a
European provider, that would save me 15
milliseconds because all of the roaming
providers have to wrap up the local
traffic into a VPN, send it back to
their home country and then send it back
here. And that costs a varying amount,
but of the order of 15 milliseconds.
Now, again, it's not disastrous. It's
like none of these things on their own
are disastrously bad, but if you can
trim each of them out, you can get
closer and closer to the the 200
millisecond target.
So um yeah surprisingly
4G, good 4G is better than bad 5G in the
sense that it has less jitter. It's not
necessarily to do with the actual bit
rate. It's to do with the fact that if you get a delayed packet or two, or a lost packet or two, Chrome, or libwebrtc strictly, will grow the jitter buffer and so you'll end up being delayed a bit.
Whereas if you can get a solid 4G
connection that's consistent, then you
won't have that problem. So there's
another 10 milliseconds there. This is
the big win. Actually, this was the
first big win that we had, which is
essentially it's also about managing the
jitter buffer at both ends, um, of using bandwidth estimation to be super aggressive about not letting queues arise anywhere in the path. Um, and when you do lose a packet, if it's an old packet, you don't RTX it, because if you RTX it, then what happens is that everything gets delayed until it turns up and then you end up with a bigger jitter buffer. So basically we keep a much shorter cache than I'd expected for RTXs, and we never ever resend, never ever RTX, anything that's older than the last full frame, because there's no point.
It'll never get physically rendered onto
the screen and it will grow the jitter
buffer for you whether you want it or
not. So the big kind of takeaway message
of that is we back off super fast the
moment we see anything that uh looks
like a queue forming and then we ramp up
slowly afterwards.
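A hedged sketch of that retransmission policy, with made-up names rather than the actual stack's internals: keep a deliberately short RTX cache and refuse to resend anything from before the most recent complete frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a sender-side RTX policy: short cache, and never retransmit packets
// older than the last complete frame, since they can only grow the receiver's
// jitter buffer without ever being rendered.
public final class RtxPolicy {
    private static final int CACHE_SIZE = 128;               // much shorter than a "safe" default
    private final Deque<CachedPacket> cache = new ArrayDeque<>();
    private int lastFullFrameSeq;                             // updated when a frame has been fully sent

    record CachedPacket(int seq, byte[] payload) {}

    void onPacketSent(int seq, byte[] payload, boolean endOfFrame) {
        cache.addLast(new CachedPacket(seq, payload));
        if (cache.size() > CACHE_SIZE) cache.removeFirst();
        if (endOfFrame) lastFullFrameSeq = seq;               // crude; real code would track the frame's first seq
    }

    /** Returns the packet to retransmit for a NACKed sequence number, or null to ignore the NACK. */
    byte[] onNack(int seq) {
        if (olderThan(seq, lastFullFrameSeq)) return null;    // too old: resending only grows the jitter buffer
        return cache.stream()
                .filter(p -> p.seq() == seq)
                .map(CachedPacket::payload)
                .findFirst().orElse(null);
    }

    private static boolean olderThan(int a, int b) {
        return (short) (a - b) < 0;                           // 16-bit serial-number comparison with wrap-around
    }
}
```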
Um this isn't available under many
circumstances, but in a few tracks and in a few races it is, where there's a
private 5G network. So you're not on
public 5G, you're on private 5G. Now
it's the same protocol. So in theory,
you'd think it wouldn't make much
difference, but there are two factors that mean that you can save 10 or more milliseconds. Um, and one of them is that with private 5G you can be running the core on a little rack next to you, and you can basically plug your Ethernet into where the core is breaking out your call, and so you have no transit time between the core and the screen. So that's a noticeable win. And then the other win
is, and this isn't exclusively for private 5G, there are APIs that in theory will in the future allow one to do this on public networks. But you can somewhat tune the network as to how many uplink slots it gives you and when. It doesn't save a huge amount, but it can save a few milliseconds, um, of getting an uplink slot sooner for each outbound uplink packet. So, if it's a private 5G,
you can mess with it. I mean, the thing
about this camera, right, is that it's
essentially it's an uplink device. Like,
in contrast to almost all normal 5G
usage, which is essentially down link,
this is uplink. Um, so if you can tweak
the parameters such that it's more
uplink friendly, there's a win there.
So after all that, tada, we got 170
milliseconds, which is nice. I mean, you
can't ever get all of it. Um, you can
get close, but not all of it.
So, I have some caveats. Like I said,
you can't get all of these all of the
time. But you can get some of them some
of the time to sort of paraphrase.
Um, you can't always get what you want.
But
all of them depend on you being in control of more of the stack. Like the more of the stack you control, the more you control which eSIMs are in there, the more you control how much buffering there is and all of that, the more you can
trim, the more you can manage this
stuff. And basically the huge takeaway message is that small wins, all the yak shaving that I've ended up doing, they all add up to actually about
half. Like we've gone from 350 to 170.
So like it's a significant win in a
series of quite small steps.
So yeah, um that's pretty much all I'm going to say, except I'm going to see
whether I can actually show you the
camera in action
maybe.
Let's see
whether
we connect.
And the answer is no.
Well, we are connected actually. That's
weird. There we go. Yeah. So, um, yes,
this is me and I'm waving at it and it's
pretty much in sync. I mean, I'm
guessing that because this is roaming,
I'm guessing we're at 250 milliseconds,
something like that. Um, and then you
can do stuff like like just to show you,
you can, like, change the... we could make me black and white, or gray and gray actually, or turn the contrast up and all of that stuff. And that's all over the data channel. Yeah.
Anyway, so um let's see if I can get us
out of full screen mode and get us back
to the presentation. Yeah. So yeah, um
contact me, find me afterwards,
obviously on the boat or elsewhere. Um
I'm tim at pipe, um, pi.pe. Uh, Steely Glint on chaos.social on the fediverse. I do consulting on open source,
WebRTC,
obviously Pipe, but other random things
as Dan has kind of proven. I'm
up for stupid challenges sometimes if
they seem amusing. I'm also on the
stupid challenge front a member of the
WebRTC working group. Not all of the
decisions are my fault. So anyway, so
yeah, questions. I'm a little bit deaf,
so shout.
>> Actually, I think that we have a
question from chat. So, let me
Yeah, we have a live stream at
stream.rtcon.live, and there is a chat over there. So, a question from Mate.
If minimal latency is the goal, what
would even motivate the use of WebRTC
over traditional 5G mobile radio which
offers 10 to 100 milliseconds?
>> So
I don't know... like, I don't have the license to make phones. Like, we need a radio signal, okay? So, we're using 5G, but if we want to use 5G, we have to use, um, something that's licensed. And then on
top of that, we've got to do something
with the data. We got to render it. It's
got to be in some sort of format. And
so, we've chosen to use I suppose the
choice we ended up making was the fact
that we wanted the renderer in the
browser. So, we wanted something that
was browser compatible because what we
don't want is the teams to have to lug
around another device. Like, it's fine
to sell them this to bolt into the car.
They're okay with that. But what I don't
want to do is like having to have a
screen that they take round with them
that has our custom software in it. Um,
what they actually do is fire up Chrome
or Safari and they view the stream on
that. Um, so that's, I think, the answer. That's a very long answer, which should be summarized as: we wanted to be web compatible. Okay.
>> Yeah. So uh quick question about the uh
codec you used for this test. Was it
H264 and did you use hardware encoding
on the device to get the encoding as
fast as possible?
>> Uh yes and yes. I've looked at other
codecs. I've looked seriously at H.265
and the hardware appears to support it,
but I can't get any browser to render
the H.265 it generates. I've not put
enough time in to find out why. So, it's
on my to-do list. Um, but for the
moment, um, we're on H.264 with a
hardware encoder. And I mean, the trick is the hardware encoder is fast enough that it will encode a frame before the next one's ready from the camera. So we can do that all in a loop.
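A hedged sketch of that loop, with hypothetical camera, encoder and sender interfaces standing in for the real V4L2 and hardware-encoder bindings:

```java
// Hypothetical single-threaded capture -> encode -> send loop. The point is that the
// hardware encoder finishes each frame before the camera delivers the next one, so
// nothing ever queues up inside the pipeline.
interface Camera  { byte[] awaitFrame(); }                  // blocks until the next raw frame
interface Encoder { byte[] encode(byte[] rawFrame); }       // hardware H.264, returns an access unit
interface Sender  { void sendAccessUnit(byte[] h264); }     // packetizes to RTP and ships it

final class CaptureLoop implements Runnable {
    private final Camera camera;
    private final Encoder encoder;
    private final Sender sender;

    CaptureLoop(Camera c, Encoder e, Sender s) { camera = c; encoder = e; sender = s; }

    @Override public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            byte[] raw = camera.awaitFrame();                // ~33 ms apart at 30 fps
            byte[] encoded = encoder.encode(raw);            // must finish well inside one frame interval
            sender.sendAccessUnit(encoded);                  // out the door immediately, no buffering
        }
    }
}
```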
>> Right. And then related to this if you
um played with the frame rate we found
that using much faster frame rates actually, um, reduces the latency, which is
>> Yeah. Um, it also ups the bit rate.
So, so the short answer is no. I mean,
I've thought about it. I played with it
very briefly, but the problem is that
like we're on a bit rate budget as well.
I mean, these things get through
a couple of gigabytes an hour, which on
roaming rates can get expensive. I mean,
not race car expensive, but expensive.
So, um, we're kind of disinclined to
push it too far up, but it's it's
something we will test. And maybe for
the autonomous vehicles, maybe there's
maybe the finances are different. So,
yeah, it's something we will try.
>> Very cool. Thank you.
>> At least two more questions. Uh, one
over there and oh, three more questions.
>> Can I ask?
>> If you have a whoever has a mic can ask
questions.
>> All right. So, um, you came up with like
13 or so ways to shave off a few
milliseconds.
How did you come up with this list
and can you come up with more even? How
are you going to do that?
>> Is there more to come out? Um, well, I
mean, we've just heard one, and that is on my to-do list, and I think there are a few places. There's
a flag in Chrome which ought to do
something and doesn't. There's a, um, playout delay hint, uh, something, and in theory it should set the jitter buffer target depth, and I changed it and the
graph changed but the timing didn't and
I'm like I have no idea what happened
there. So uh that was only last week so
at some point I'm going to go and bug
somebody on the Chrome team and ask them
to explain what I've done wrong. Um, so
I think there's more there, because it was saying the jitter buffer was like 60 milliseconds, and in theory I could turn it down to 30, or 40 maybe, and not care. So I don't know. Um,
yeah, there are a few places still to
try but in the end 200 is if we can hit
200 that's probably enough.
>> Um, actually it's, uh, a little bit related, because most of the savings I saw are related to the jitter buffer, um, like the bandwidth estimation and so forth. So can you, uh, suggest to the Chromium team, because it's a pretty big market, for example the autonomous vehicles, that they give, um, full control over how to set up the playout buffer? Because it turns out that it saves a lot, tremendously a lot.
>> Uh, yes, that's something I was hoping.
Well, so there's a trade-off because as
I said in answer to one of the earlier
questions, we're really keen that people
can just use a stock browser. So I don't
want like in theory I could go out and
build a client app, but I don't want to
do that. So I will need to use the
controls that Chrome and Safari offer me.
>> Yeah, because I saw that you save at least, I think, 40 to 60 milliseconds because you just, sorry, tuned the playout buffer. If you have full control of the code, you can write your own playout buffer, but you want it to be compatible with the browser. So it seems that we need to have more control over the playout buffer in order to reduce the delay.
>> Yes. And there are gaming companies who say exactly the same thing. Yeah.
>> And so, which is one of the reasons why being a member of the W3C WebRTC
working group is useful. Um but it
doesn't I don't make a browser so I
can't actually make things happen. I can
just hint that it would be nice if they
did and then maybe they do and maybe
they don't. But yes, your your point's
right.
>> Thanks.
uh do you think the uh choice of
language like Java affects the uh
performance or compared to something
more performant like Rust or Go, Pion in particular?
>> So it turns out, look, for
WebRTC there's almost no performance
difference. Like there's a startup cost
in Java, which is Dan's joke about my 3-minute wait while it loaded. But that was from a 2015 SD card. So like part of it was just memory read. But
yeah, um so it's not a performance issue
because in any sensible uh device you're
using a hardware encoder and you're
using hardware encryption. Like if
you're not, then you've lost on a small
device. The energy budget's too
high. So it actually doesn't make a huge
performance difference. And if I was
doing it again, starting again now, I'd
probably be just doing it in Golang and using Pion. But I started this before Sean started Pion. Um and
it's nice. The other thing is it's like
this is a bit egotistical of me, but
it's really nice to be able to be in
complete control of the whole damn
stack. Like any change I want to make,
it's my problem. I mean, okay, I can't
break it for other customers, but if I
want to add a feature that does
something for this particular use case,
not a problem. I don't have to negotiate
with anyone. I can just go do it.
Yeah. So you mentioned that you were
able to bring down the latency to 170
millisecond, but what is the baseline?
So do you know what the actual, like, theoretical minimum is? Let's say a ping round trip time, if you were able to do some direct routing?
>> I don't think that's necessarily
dominant, right? I think we've got to
the point and if you're on the 5G
private network on the same core, then I
don't think the ping time's dominant. I
think you're now at the point of how
many frames do you have to hold in
this buffer? How many frames does Chrome
hold in the playout? And and to some
extent,
is there a network buffer in the kernel
that I haven't managed to tune out? So,
and I think once I've got down past the
obvious network steps, I think it's
actually it's at the edges and not in
the middle. Now, on this one, like, this is roaming and there's probably 100 milliseconds of latency between here and there in the
network. But for the rest of it, like if it's a private network, that's not going to be the case. So, yeah. Um,
I don't know what the theoretical
minimum is. I mean, I guess technically
it's twice the frame interval. Like, it's
probably 60 milliseconds, 66.
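For reference, assuming 30 fps capture as in the 33 ms figure earlier:

$$ 2 \times \frac{1}{30\ \text{fps}} \approx 66.7\ \text{ms}, \qquad \frac{1}{60\ \text{fps}} \approx 16.7\ \text{ms} $$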
>> So, thanks Tim. Uh, a couple of, uh,
comments here. Um, the delay chart that you had, uh, from ChatGPT, uh, about the HESP stuff, High Efficiency Streaming Protocol: it doesn't really use QUIC. I think that was another mistake on ChatGPT's part.
>> Yay.
>> Okay, so I mean it just uses HTTP, so there's a separate, uh, track or representation for the low delay extension, so joining, you know, another representation from the current representation becomes much faster. So the low delay, low latency extensions for DASH support that. So I just wanted to say that. And secondly, um, you know, JP, the FFmpeg guy, has shown that with QUIC from one computer to the next over a local area network he can transmit frames in about 16 milliseconds, which is one frame duration at 60 frames per second.
>> Okay.
>> So
>> So the theoretical minimum is about one frame interval.
>> Yeah. And then at 60 frames per second
is 16 milliseconds. Uh, but there's no packet loss, nothing. It's just, you know, a direct connection between two computers. And
>> so I've got a factor of 10 to go.
>> Yeah. Exactly. Uh so and uh tomorrow we
are going to show that over the internet
we can get to about three frame latency
around 100 milliseconds with our current MoQ implementation. So yeah, I mean
there is no I guess bare minimum but the
frame interval would be the ideal number
in that case. Yeah,
>> thank you.
>> All right, thank you for the wonderful
teaser, Ali, for tomorrow's talk. Uh any
other questions?
If not, oh there is
one more the final one for this talk.
>> Okay, uh, I have a question, or two questions. Have you tried different network adapters, the Ethernet adapters, like, uh, some gamer ones or maybe server ones? It may be faster. And a related question: have you tried running your application on a different operating system, as the kernel may have different optimizations in its network stack?
>> I have tried... not. The short answer is not really. Um, we're not really in control over the receiving end. So, typically the race engineers will turn up with, like, a ruggedized Lenovo laptop and I can't really do much with that. Um, and it typically runs Windows, so I'm not in control of that end of it. This end,
I could in theory change the hardware
but, like, so for example, um, and this is one of the things that Dan will tell you as well, that changing hardware is expensive. So this has a set of fins on the back which go onto the 5G modem to cool it, okay, so the whole of the back plate is a heat sink. Now if I change the 5G modem, we have to redesign the case. So, like, I'm disinclined to change the hardware adapter, to change the hardware at all.
Like it's a big cost. Um so I'd have to
be really convinced that it was worth
it. What I have done and what's really
bizarre about this modem is there are I
think four different ways of getting IP
connectivity out of it. Like the the
worst possible case which is funny but
useless is you can actually run PPP over
it. It will emulate a serial port and you can run AT whatever it was. Um, it wasn't E, it was C or
something, but it was an AT command
which just fired up PPP. Um and you can
do that. Oh, it's dial. That's right.
You could dial it like it was dial plus
something. Yeah, you can do that and it
will fire up a PPP connection and it'll
run it over a serial port, but you're
limited to like that's disastrously
slow. And then the next one up is it will emulate a USB Ethernet device, but it turns out that's not as quick as you'd hope. And then the one that we're actually using is emulating a
PCIe Ethernet and that's actually pretty
quick because you're on the PCIe bus. So
yes, we have played with different
interfaces, but actually with the same
hardware,
but none of those saved more than a
couple of milliseconds.
>> All right, let's wrap it up with the
round of applause for Tim.