Tim Panton - Trimming Glass to Glass Latency of a Video Stream One Layer at a Time.
By Software Mansion
Summary
## Key takeaways
- **Trimming Latency: A Layer-by-Layer Approach**: Achieving ultra-low glass-to-glass latency in video streams is a process of continuous optimization, involving meticulous 'trimming' at each layer of the communication stack. [00:22]
- **Race Car Latency: Sound vs. Sight**: For applications like race car telemetry, latency targets are derived from sensory perception; a goal of under 200ms is set, informed by the time it takes for engine sound to reach the pit crew, ensuring real-time awareness. [04:33]
- **Beyond GStreamer: Direct Hardware Access**: Significant latency savings (around 40ms) were achieved by bypassing GStreamer and interacting directly with hardware encoders and cameras, leveraging Java's Foreign Function & Memory API for efficient buffer management. [09:08]
- **The Lip Sync Trade-off**: Disabling lip sync in video streams where audio-visual synchronization isn't critical, such as in race car or autonomous vehicle feeds, can yield a surprising 20ms latency reduction by avoiding complex frame alignment. [12:42]
- **Browser Latency Differences**: Safari demonstrated a 20ms advantage over Chrome in H.264 rendering due to a more efficient packet handling strategy, although this advantage may be temporary as browser optimizations evolve. [15:06]
- **Network Choice Matters: Local SIMs & 4G vs. 5G**: Utilizing local SIM cards instead of roaming and opting for stable 4G over jittery 5G can shave off critical milliseconds by reducing VPN overhead and jitter buffer requirements, respectively. [16:36], [17:38]
Topics Covered
- Why 200 milliseconds is the critical latency target.
- GStreamer pipelines introduce significant latency.
- Safari renders video 20ms faster than Chrome.
- Aggressive jitter buffer management reduces latency.
- Small, cumulative optimizations halve video latency.
Full Transcript
Thanks to Dan for the intro and thanks
for all of you for coming and inviting
me and whatever. So yeah, this is actually much more pragmatic than you
might expect from me. This is very kind
of detailed stuff of like how do you get
the glass to glass latency of a video
stream as low as you can? And it turns
out that it's just a series of trimming,
trimming, trimming at each layer. Um, so
I'm Tim Panton. I'm the CTO at Pipe. Um,
I wrote 10 years ago, it turns out, um,
a web RTC stack for small devices and
it's sort of still doing things. It's
still in baby monitors, but it's also
going around racetracks at 250 kph,
which is kind of fun. Um, I write open
source. Um, there's a bunch of protocol
implementations and also, um, something
else. I'm not quite sure what you call
it, but an interface for V4L2, all in pure Java because, well, memory safety
basically. Um, so
this is a race car camera. So this
camera sits over the driver's shoulder
and it sends uh live video and audio to
the pit crew in high quality at low
latency over a long range. um I say long
range but like 5 kilometers typically of
these tracks um in diameter and at high
speeds not only like reasonable bit
rates but also the cars are moving quite
fast um and we use uh public 5G networks
for this and it's not broadcast right
the the the end consumer of this is the
pit crew and the team sponsors it's not
the wider audience um so it's not
broadcast.
And this is what it looks like and
sounds like. Um, V8s are astonishingly
noisy if you take the silencers off,
but um, yeah, it's kind of gets you
going a bit. Um, the other thing we're
doing is, um, with the same technology,
we're putting these cameras, we're
investigating putting these cameras into
autonomous vehicles. Um, with some
funding from the EU, we're doing this
investigation.
Initially we've done some technical
tests to see whether it makes sense,
whether it works and now we're looking
at um the regulatory aspects of like
whether there's a legislative and a
practical uh requirement for this. Um
and the basically the idea is it allows
a remote human to
see what the autonomous vehicle is
doing. um and one time in a thousand
maybe intervene and tell it that yes, it
is allowed to go around that tree that's
fallen in the road or you know whatever.
Um and probably as a hint rather than
actually driving it, the remote operator
is probably not going to have a steering wheel in their hand. They're probably just going to get, like, a menu of options
that the autonomous vehicle provides
them with. Um and our camera works
pretty well for this. This is a test we
did at the at the former Tegel airport
um where you can run autonomous vehicles
without much difficulty. Um, there are safety issues, but they're much constrained. So what
protocol could we use for doing that? Um
I mean it's slightly cheating asking
this question but I'll ask it anyway. So
I asked ChatGPT actually and ChatGPT
produced this list which you're all
probably pretty familiar with. um
they're all protocols. It was
interesting to see that RTSP is still on
that list. Um which I think is probably
the oldest one there. But anyway, uh so
it gave me this list, and basically what that means is you have to
make a decision about what latency is
acceptable. So, what does latency mean
in the context of a race car? And the
answer is that 10 seconds, which is a
kind of median of those protocols, um,
is a heck of a long way on a racetrack,
right? By the time you start talking about the corner the car is approaching, it's actually already there; it doesn't make any sense anymore. So, um, it matters. The 10-second thing isn't going to work in
this context. So, what is the goal? Like, how do you know what number is enough? Um so there are a
couple of metrics and one of them is
that actually you can hear the gear
changes on a V8 from at least 100 meters
away. Um so we reckon that in order for
that to be kind of roughly right, you
want the time for the sound to actually arrive through the air and the time for it to arrive over your laptop to be roughly in sync. And so that gives you 290 milliseconds, based on the speed of sound in warm air. Um, so that gives
you one metric. The other one for the
autonomous vehicles is that it turns out
that the
American drink drive limit for blood
alcohol is roughly equivalent to 200
milliseconds of cognitive impairment. So
that gives you a measure for like how
long you can wait before you make your
mind up. Um so we reckon that aiming for
under 200 milliseconds was a safe bet.
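A quick sanity check on that 290 ms figure, assuming the roughly 100 m distance mentioned above and a warm-air speed of sound of about 345 m/s:

$$ t = \frac{d}{v} \approx \frac{100\ \text{m}}{345\ \text{m/s}} \approx 0.29\ \text{s} \approx 290\ \text{ms} $$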
Um that is the low end of what WebRTC
can do. And as far as ChatGPT is concerned, sorry Ali, I know this is disagreed with, but ChatGPT tells me that no other protocol will do that, including MoQ. Um so yeah uh I'm sure I'll be
told I'm wrong about that but anyway. So
or ChatGPT is wrong. It's not me. Um,
and I do love having an AI to blame for
my mistakes and and to for justifying my
decisions. It's really nice. Um, so this
is a typical Pipe device. Um, it's a Linux box. I don't do these
little devices yet. Dan is braver than
me in that context. Um, so uh it sits there. It runs as an agent and typically we have an H.264 feed
and maybe a microphone and some data and
whatever and we run it over a
peer-to-peer connection with data
channel and over the data channel we can
carry things like LAR and PTZ commands
and setting the hue saturation and
brightness and stuff like that. So
there's kind of more data that goes over the data channel. Um it's useful to have that data channel
around which some of the other protocols
don't support like RTSP doesn't support
a side channel for data. Um so how do I
measure latency right well the easy way
which is basically very easy to do uh
you go to a website and this is
clock.zone and you put this clock
on the screen and then you point the
video camera at the screen and then you
put the video output displayed next to
it and you do a screenshot and you
inspect them and you do the subtraction
in your head and you come out with in
this case 350 milliseconds.
The problem is that it's not truly glass
to glass. It's not from that lens to
this screen because it's never left this
screen. Like the the screen cap is
before it's left the glass. And what's
worse, that screen cap could be in some
way synchronized to the rendering engine
because it's on the same hardware and
it's probably done in the same, you
know, GPU or whatever. So, you're you
you've got some correlation risks there.
Um, and it's a pain in the neck to do
each time. you have to kind of do it uh
manually. So, we ended up building this
thing um which is essentially it's doing
the same thing. You you you uh point
your camera at a light source in this
case and and what we're really
simulating is like the brake lights of
the car in front, right? The brake light
comes on and how long does it take from
that light coming on to the remote
supervisor seeing that on their screen?
That's the measure you're looking at.
Um, and so basically what happens is that we flash the light in front of the camera, and then we render that camera onto a screen over the appropriate video link, and then we put a light sensor on it, and the light sensor and the flash are connected to the same, um, in this case BeagleBone, small processor, and it can measure the time difference between the two. And there's an open source thing on GitHub with the software for that and a description of the hardware for it, but it can give you a reasonably accurate measurement, down to certainly 5 to 10 millisecond accuracy, for what the latency of a call is.
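A rough sketch of what that flash-and-sense measurement amounts to (this is not the actual open source rig code; the sysfs GPIO paths and pin numbers below are placeholder assumptions for a BeagleBone-style board):

```java
import java.nio.file.*;

// Hedged sketch of the flash-and-sense latency probe: drive an LED in front of
// the lens, watch a photodiode taped to the viewing screen, time the difference.
// The GPIO numbers are placeholders; a real board needs the pins exported first.
public final class GlassToGlassProbe {
    static final Path FLASH  = Path.of("/sys/class/gpio/gpio60/value"); // LED at the lens
    static final Path SENSOR = Path.of("/sys/class/gpio/gpio48/value"); // photodiode on the screen

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 20; i++) {
            Files.writeString(FLASH, "1");                 // light goes on in front of the camera
            long t0 = System.nanoTime();
            while (Files.readString(SENSOR).strip().equals("0")) {
                Thread.onSpinWait();                       // wait for it to show up on the remote screen
            }
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("glass-to-glass ≈ " + ms + " ms");
            Files.writeString(FLASH, "0");
            Thread.sleep(2_000);                           // let the pipeline settle between samples
        }
    }
}
```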
So yeah, so having got a way of
measuring it, I've now got to get down from
350 to sub 200.
And that's basically a matter of working
through each of the layers and taking
stuff out. So the big one, the big
saving was 40 milliseconds in getting
rid of GStreamer. Now I love GStreamer. It's a great thing to use, like it's the first starting point in all of the projects to put a GStreamer, uh, pipeline together and do something. Problem is, it's a pipeline. So intrinsically it has several frames in the pipeline. And the thing with pipelines is nothing comes out until you've filled it. And so you end up with a couple of frames' worth of latency in a typical GStreamer pipeline. And
however hard you do it, you still end up
with some latency. So what we ended up doing is talking directly to the encoder uh and talking directly to the camera. Um and in this particular hardware that's not too
difficult. I put it off for a very long
time because I didn't want to do it in
JNI, but I still wanted to use Java. And
Java 24 has this really sweet well 23
technically has this really sweet thing
which uh is foreign functions and memory
which is a replacement for JNI. They
deny it. They say it's nothing to do
with JNI but it's effectively a
replacement for JNI which allows you to
do
memory safe access to foreign DLLs or foreign .so libraries, which is actually really
sweet. Um, so basically we can mmap video buffers into Java's memory, tell Java that that's what it is, and Java will treat it safely, and we can get callbacks from V4L2 which say when they've been updated, and we can treat them. Which I like not as much as I liked reading from /dev/video, like I'm a Plan 9 guy really at heart, and reading from /dev/video was a much cleaner way of doing it, but nobody supports that anymore. So that was 40 milliseconds, which is nice.
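As a minimal sketch of what that looks like with the FFM API, assuming you already have the V4L2 file descriptor and the buffer offset and length from the usual VIDIOC_QUERYBUF ioctl (omitted here); this is the shape of the idea, not the actual Pipe code:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

// Sketch: map a V4L2 capture buffer into Java via the FFM API rather than JNI.
public final class V4l2Mmap {
    private static final int PROT_READ = 0x1, PROT_WRITE = 0x2, MAP_SHARED = 0x01;

    public static MemorySegment mapBuffer(int fd, long offset, long length) throws Throwable {
        Linker linker = Linker.nativeLinker();
        // void *mmap(void *addr, size_t len, int prot, int flags, int fd, off_t off)
        MethodHandle mmap = linker.downcallHandle(
                linker.defaultLookup().find("mmap").orElseThrow(),
                FunctionDescriptor.of(ValueLayout.ADDRESS,
                        ValueLayout.ADDRESS, ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT,
                        ValueLayout.JAVA_INT, ValueLayout.JAVA_INT, ValueLayout.JAVA_LONG));

        MemorySegment addr = (MemorySegment) mmap.invokeExact(
                MemorySegment.NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        // The raw pointer comes back with zero length; reinterpret it so Java knows
        // its real size and bounds-checks every access to the frame buffer.
        return addr.reinterpret(length);
    }
}
```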
Um, weirdly, moving to
IPv6 from IPv4 will give you around 10
millisecond saving if you can, because it gets rid of the NAT in the camera. Because we've got a 5G modem in the camera, right? And that's doing its own NAT if it gets a v4 address, and then the carrier gateway is doing another set of NAT, and then the local router and local Wi-Fi are doing another set of NAT, and each of those is running queues. So it looks like getting rid of those queues and getting rid of that NAT processing saves you around 10 milliseconds, which is
worth having. And then the same sort of
saving is available if you get rid of
TURN candidates. If you don't do TURN, if you can get a direct, um, genuinely peer-to-peer, address-to-address session, uh, then it basically takes a leg out of the trip. Now that doesn't always work. There are cases where routing via a TURN server from, um, someone like
Cloudflare for example will actually be
quicker, but it's typically only quicker
if you change jurisdictions. Like our
users are typically less than 5
kilometers away from the source and so
they're always in the same country,
right? And the TURN servers almost never are. So like if we can stay in country, that's a win. So yeah, host candidates are a good win, often but not always.
This was one of the surprising ones. Some of these things are relatively obvious. Um
but this one I just hadn't seen coming
at all which is if you disable lip sync.
Now you can't see the driver, right?
There's no lips to sync with in any of our videos, right? So we don't care about whether the audio is synchronized accurately with the video. And it turns out that doing lip sync costs you 20 milliseconds. And it's essentially, as far as I can work out, because there's no sensible common factor between the 20 millisecond audio frames and the 33 millisecond video frames. To get them to line up, you have to like go to multiples of 60, and even that's not accurate. So it turns
out that the whole thing delays by
pretty much a whole audio frame in order
to sync the video. So there's a 20
millisecond saving there if you lose lip
sync, which I was quite pleased with.
Cheap. Well, it's not actually cheap,
but easy win is to go out and buy a
faster monitor, right? You just go out
and buy a gamer's monitor and you've
saved yourself 10 milliseconds. Um, more
expensive. Go out and buy a faster
internet connection. Get fiber to the
premises and not fiber to the curb. And
get rid of your DSL line, right? Those
will win you
certainly 10 milliseconds, probably
more. I mean, somebody was telling me
they had 2 millisecond ping to Google
from their fiber at home. And and that
would that would save me another that
would be like 18 then. Um,
weirdly, this was one that again
surprised me. I thought it wouldn't make
any difference. I thought moving from
Wi-Fi to Ethernet on a non-busy Wi-Fi
really wouldn't make any difference. It
turns out that's not true. There's more
jitter on Wi-Fi. Even relatively empty
Wi-Fi has more jitter on it than um than
an Ethernet. So, if you get rid of the Wi-Fi in the step between, you know, the fiber and your viewing station and you start doing that over Ethernet, you can
trim the jitter buffer by another 5
milliseconds. It's not a huge saving,
but it's worth having.
Um, and I point out that at this juncture, we're getting close to 200
milliseconds, which is nice. Now, this
is another total shock, right? I didn't
I actually didn't believe this when I
did it. And and I suspect that it won't
last long, particularly not after this
presentation. Somebody in the Chrome
team will fix it. But the last time I measured it, Safari, on H.264, is 20 milliseconds faster at rendering than Chrome. And it looks like the reason is really funny, which is that Chrome doesn't believe that you've finished getting the packets for a frame until it sees the first RTP packet for the next frame. Then it says, "Oh, okay, I must have had the last frame,"
and starts rendering it. Whereas Safari
is depacketizing each of the packets in
that frame and interpreting them, at least at the NAL level. So it knows that the
last packet is the last packet and it
kicks it off to the renderer there. Now,
if you're completely filling the
pipeline, that doesn't make any
difference. But if you're, which we
often are with 5G, if you've got excess
uplink, then you may have like half a frame time or more between when your last packet turns up and the first packet of the next frame. So, we can get up to a 10 to 20 millisecond saving by switching to Safari from Chrome. Now, like I say,
I'm absolutely sure that won't last, but
it will mean that Chrome is faster when
they do it. And it also may not be true
for VP8.
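For reference, the end-of-frame signal here is the RTP marker bit, which lives in the second byte of the RTP header (RFC 3550), and for H.264 the packetization spec (RFC 6184) has it set on the last packet of an access unit. A small illustrative parser, not taken from any browser's code:

```java
// Minimal RTP header peek: is this packet flagged as the last packet of a frame?
public final class RtpPeek {
    // RFC 3550: byte 1 of the header is the marker bit (1 bit) plus payload type (7 bits);
    // bytes 2-3 are the sequence number.
    public static boolean endOfFrame(byte[] packet) {
        return (packet[1] & 0x80) != 0;   // marker bit set by the sender on a frame's last packet
    }

    public static int sequenceNumber(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```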
Um, yeah, local eSIM. This is sort of one of those more obvious things, which is that not all SIMs, not all 5G connectivity in the camera, is the same. Like, you're constrained somewhat by the economics and the availability, but if you can get a SIM from the local provider, like if I'd gone out and got an eSIM from Plus or whoever here and loaded it in
there as opposed to what I'm doing at
the moment, which is roaming from a
European provider, that would save me 15
milliseconds because all of the roaming
providers have to wrap up the local
traffic into a VPN, send it back to
their home country and then send it back
here. And that costs a varying amount,
but of the order of 15 milliseconds.
Now, again, it's not disastrous. It's
like none of these things on their own
are disastrously bad, but if you can
trim each of them out, you can get
closer and closer to the the 200
millisecond target.
So um yeah surprisingly
4G, good 4G is better than bad 5G in the
sense that it has less jitter. It's not
necessarily to do with the actual bit
rate. It's to do with the fact that if you get a delayed packet or two, or a lost packet or two, Chrome, or libwebrtc strictly, will grow the jitter buffer and so you'll end up being delayed a bit.
Whereas if you can get a solid 4G
connection that's consistent, then you
won't have that problem. So there's
another 10 milliseconds there. This is
the big win. Actually, this was the
first big win that we had, which is
essentially it's also about managing the
jitter buffer at both ends, um, of using bandwidth estimation to be super aggressive about not letting queues arise anywhere in the path. Um, and when you do lose a packet, if it's an old packet, you don't RTX it, because if you RTX it, then what happens is that everything gets delayed until it turns up and then you end up with a bigger jitter buffer. So basically we keep a much shorter cache than I'd expected for RTXs, and we never ever resend, never ever RTX, anything that's older than the last full frame, because there's no point.
It'll never get physically rendered onto
the screen and it will grow the jitter
buffer for you whether you want it or
not. So the big kind of takeaway message
of that is we back off super fast the
moment we see anything that uh looks
like a queue forming and then we ramp up
slowly afterwards.
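A hedged sketch of that retransmission policy, with made-up names rather than the actual stack's internals: keep a deliberately short RTX cache and refuse to resend anything from before the most recent complete frame.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a sender-side RTX policy: short cache, and never retransmit packets
// older than the last complete frame, since they can only grow the receiver's
// jitter buffer without ever being rendered.
public final class RtxPolicy {
    private static final int CACHE_SIZE = 128;               // much shorter than a "safe" default
    private final Deque<CachedPacket> cache = new ArrayDeque<>();
    private int lastFullFrameSeq;                             // updated when a frame has been fully sent

    record CachedPacket(int seq, byte[] payload) {}

    void onPacketSent(int seq, byte[] payload, boolean endOfFrame) {
        cache.addLast(new CachedPacket(seq, payload));
        if (cache.size() > CACHE_SIZE) cache.removeFirst();
        if (endOfFrame) lastFullFrameSeq = seq;               // crude; real code would track the frame's first seq
    }

    /** Returns the packet to retransmit for a NACKed sequence number, or null to ignore the NACK. */
    byte[] onNack(int seq) {
        if (olderThan(seq, lastFullFrameSeq)) return null;    // too old: resending only grows the jitter buffer
        return cache.stream()
                .filter(p -> p.seq() == seq)
                .map(CachedPacket::payload)
                .findFirst().orElse(null);
    }

    private static boolean olderThan(int a, int b) {
        return (short) (a - b) < 0;                           // 16-bit serial-number comparison with wrap-around
    }
}
```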
Um this isn't available under many
circumstances, but in a few tracks and in a few races it is, where there's a
private 5G network. So you're not on
public 5G, you're on private 5G. Now
it's the same protocol. So in theory,
you'd think it wouldn't make much
difference, but there are two factors that mean that you can save 10 or more milliseconds. Um, and one of them is that with private 5G you can be running the core on a little rack next to you, and you can basically plug your Ethernet into where the core is breaking out your call, and so you have no transit time between the core and the screen. So that's a noticeable win. And then the other win
is, and this isn't exclusively for private 5G, there are APIs that in theory will in the future allow one to do this on public networks. But you can somewhat tune the network as to how many uplink slots it gives you and when. It doesn't save a huge amount, but it can save a few milliseconds, um, of getting an uplink slot sooner for each outbound uplink packet. So, if it's a private 5G,
you can mess with it. I mean, the thing
about this camera, right, is that it's
essentially it's an uplink device. Like,
in contrast to almost all normal 5G
usage, which is essentially down link,
this is uplink. Um, so if you can tweak
the parameters such that it's more
uplink friendly, there's a win there.
So after all that, tada, we got 170
milliseconds, which is nice. I mean, you
can't ever get all of it. Um, you can
get close, but not all of it.
So, I have some caveats. Like I said,
you can't get all of these all of the
time. But you can get some of them some
of the time to sort of paraphrase.
Um, you can't always get what you want.
But
all of them depend on you being in control of more of the stack. Like the more of the stack you control, the more you control which eSIMs are in there, the more you control how much buffering there is and all of that, the more you can
trim, the more you can manage this
stuff. And basically the huge takeaway message is that small wins, all the yak shaving that I've ended up doing, they all add up to actually about
half. Like we've gone from 350 to 170.
So like it's a significant win in a
series of quite small steps.
So yeah, um that's pretty much all I'm going to say, except I'm going to see
whether I can actually show you the
camera in action
maybe.
Let's see
whether
we connect.
And the answer is no.
Well, we are connected actually. That's
weird. There we go. Yeah. So, um, yes,
this is me and I'm waving at it and it's
pretty much in sync. I mean, I'm
guessing that because this is roaming,
I'm guessing we're at 250 milliseconds,
something like that. Um, and then you
can do stuff like like just to show you,
you can, like, change the... we could make me black and white, or gray and gray actually, or turn the contrast up and all of that stuff. And that's all over the data channel. Yeah.
Anyway, so um let's see if I can get us
out of full screen mode and get us back
to the presentation. Yeah. So yeah, um
contact me, find me afterwards,
obviously on the boat or elsewhere. Um
I'm tim at pipe, um, pi.pe. Uh, Steely Glint on chaos.social on the fediverse. I do consulting on open source,
WebRTC,
obviously Pipe, but other random things
as Dan has kind of proven. I'm
up for stupid challenges sometimes if
they seem amusing. I'm also on the
stupid challenge front a member of the
WebRTC working group. Not all of the
decisions are my fault. So anyway, so
yeah, questions. I'm a little bit deaf,
so shout.
>> Actually, I think that we have a
question from chat. So, let me
Yeah, we have a live stream at
stream.rtcon.live, and there is a chat over there. So, a question from Mate.
If minimal latency is the goal, what
would even motivate the use of WebRTC
over traditional 5G mobile radio which
offers 10 to 100 milliseconds?
>> So
I don't know... like, I don't have the license to make phones. Like, we need a radio signal, okay? So, we're using 5G, but if we want to use 5G, we have to use, um, something that's licensed. And then on
top of that, we've got to do something
with the data. We got to render it. It's
got to be in some sort of format. And
so, we've chosen to use I suppose the
choice we ended up making was the fact
that we wanted the renderer in the
browser. So, we wanted something that
was browser compatible because what we
don't want is the teams to have to lug
around another device. Like, it's fine
to sell them this to bolt into the car.
They're okay with that. But what I don't
want to do is like having to have a
screen that they take round with them
that has our custom software in it. Um,
what they actually do is fire up Chrome
or Safari and they view the stream on
that. Um, so that's, I think, the answer. That's a very long answer, which should be summarized as: we wanted to be web compatible. Okay.
>> Yeah. So uh quick question about the uh
codec you used for this test. Was it
H264 and did you use hardware encoding
on the device to get the encoding as
fast as possible?
>> Uh yes and yes. I've looked at other
codecs. I've looked seriously at H.265
and the hardware appears to support it,
but I can't get any browser to render
the H.265 it generates. I've not put
enough time in to find out why. So, it's
on my to-do list. Um, but for the
moment, um, we're on H.264 with a
hardware encoder. And I mean, the trick is the hardware encoder is fast enough that it will encode a frame before the next one's ready from the camera. So we can do that all in a loop.
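A hedged sketch of that loop, with hypothetical camera, encoder and sender interfaces standing in for the real V4L2 and hardware-encoder bindings:

```java
// Hypothetical single-threaded capture -> encode -> send loop. The point is that the
// hardware encoder finishes each frame before the camera delivers the next one, so
// nothing ever queues up inside the pipeline.
interface Camera  { byte[] awaitFrame(); }                  // blocks until the next raw frame
interface Encoder { byte[] encode(byte[] rawFrame); }       // hardware H.264, returns an access unit
interface Sender  { void sendAccessUnit(byte[] h264); }     // packetizes to RTP and ships it

final class CaptureLoop implements Runnable {
    private final Camera camera;
    private final Encoder encoder;
    private final Sender sender;

    CaptureLoop(Camera c, Encoder e, Sender s) { camera = c; encoder = e; sender = s; }

    @Override public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            byte[] raw = camera.awaitFrame();                // ~33 ms apart at 30 fps
            byte[] encoded = encoder.encode(raw);            // must finish well inside one frame interval
            sender.sendAccessUnit(encoded);                  // out the door immediately, no buffering
        }
    }
}
```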
>> Right. And then related to this if you
um played with the frame rate we found
that using much faster frame rates actually, um, reduces the latency, which is
>> Yeah. Um, it also ups the bit rate.
So, so the short answer is no. I mean,
I've thought about it. I played with it
very briefly, but the problem is that
like we're on a bit rate budget as well.
I mean, these things get through
a couple of gigabytes an hour, which on
roaming rates can get expensive. I mean,
not race car expensive, but expensive.
So, um, we're kind of disinclined to
push it too far up, but it's it's
something we will test. And maybe for
the autonomous vehicles, maybe there's
maybe the finances are different. So,
yeah, it's something we will try.
>> Very cool. Thank you.
>> At least two more questions. Uh, one
over there and oh, three more questions.
>> Can I ask?
>> If you have a whoever has a mic can ask
questions.
>> All right. So, um, you came up with like
13 or so ways to shave off a few
milliseconds.
How did you come up with this list
and can you come up with more even? How
are you going to do that?
>> Is there more to come out? Um, well, I
mean, we've just heard one, and that is on my to-do list, and I think there are a few places. There's
a flag in Chrome which ought to do
something and doesn't. There's a, um, playout delay hint, uh, something, and in theory it should set the jitter buffer target depth, and I changed it and the
graph changed but the timing didn't and
I'm like I have no idea what happened
there. So uh that was only last week so
at some point I'm going to go and bug
somebody on the Chrome team and ask them
to explain what I've done wrong. Um, so
I think there's more there, because it was saying the jitter buffer was like 60 milliseconds, and in theory I could turn it down to 30, or 40 maybe, and not care. So I don't know. Um,
yeah, there are a few places still to
try but in the end 200 is if we can hit
200 that's probably enough.
>> Um, actually it's, uh, a little bit related, because most of the savings I saw are related to the jitter buffer, um, like the bandwidth estimation and so forth. So can you, uh, suggest to the Chromium team, because it's a pretty big market, for example the autonomous vehicles, that they give, um, full control over how to set up the playout buffer? Because it turns out that it saves a lot, tremendously a lot.
>> Uh, yes, that's something I was hoping.
Well, so there's a trade-off because as
I said in answer to one of the earlier
questions, we're really keen that people
can just use a stock browser. So I don't
want like in theory I could go out and
build a client app, but I don't want to
do that. So I will need to use the
controls that Chrome and Safari offer me.
>> Yeah, because I saw that you save at least, I think, 40 to 60 milliseconds because you just, sorry, tuned the playout buffer. If you have full control of the code, you can write your own playout buffer, but you want it to be compatible with the browser. So it seems that we need to have more control over the playout buffer in order to reduce the delay.
>> Yes. And there are gaming companies who say exactly the same thing. Yeah.
>> And so, which is one of the reasons why being a member of the W3C WebRTC
working group is useful. Um but it
doesn't I don't make a browser so I
can't actually make things happen. I can
just hint that it would be nice if they
did and then maybe they do and maybe
they don't. But yes, your your point's
right.
>> Thanks.
uh do you think the uh choice of
language like Java affects the uh
performance or compared to something
more performant like Rust or Go, Pion in particular?
>> So it turns out, look, for
WebRTC there's almost no performance
difference. Like there's a startup cost
in Java, which is Dan's joke about my 3-minute wait while it loaded. But that was from a 2015 SD card. So like part of it was just memory read. But
yeah, um so it's not a performance issue
because in any sensible uh device you're
using a hardware encoder and you're
using hardware encryption. Like if
you're not, then you've lost on a small
device. The energy budget's too
high. So it actually doesn't make a huge
performance difference. And if I was
doing it again, starting again now, I'd
probably be just doing it in Golang and using Pion. But I started this before Sean started Pion. Um and
it's nice. The other thing is it's like
this is a bit egotistical of me, but
it's really nice to be able to be in
complete control of the whole damn
stack. Like any change I want to make,
it's my problem. I mean, okay, I can't
break it for other customers, but if I
want to add a feature that does
something for this particular use case,
not a problem. I don't have to negotiate
with anyone. I can just go do it.
Yeah. So you mentioned that you were
able to bring down the latency to 170
millisecond, but what is the baseline?
So do you know what the actual, like, theoretical minimum is? Let's say a ping round trip time, if you were able to do some direct routing?
>> I don't think that's necessarily
dominant, right? I think we've got to
the point and if you're on the 5G
private network on the same core, then I
don't think the ping time's dominant. I
think you're now at the point of how
many frames do you have to hold in
this buffer? How many frames does Chrome
hold in the playout? And and to some
extent,
is there a network buffer in the kernel
that I haven't managed to tune out? So,
and I think once I've got down past the
obvious network steps, I think it's
actually it's at the edges and not in
the middle. Now, on this one, like, this is roaming and there's probably 100 milliseconds of latency between here and there in the
network. But for the rest of it, like if it's a private network, that's not going to be the case. So, yeah. Um,
I don't know what the theoretical
minimum is. I mean, I guess technically
it's twice the frame interval. Like, it's
probably 60 milliseconds, 66.
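For reference, assuming 30 fps capture as in the 33 ms figure earlier:

$$ 2 \times \frac{1}{30\ \text{fps}} \approx 66.7\ \text{ms}, \qquad \frac{1}{60\ \text{fps}} \approx 16.7\ \text{ms} $$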
>> So, thanks Tim. Uh, a couple of, uh,
comments here. Um, the delay chart that you had, uh, from ChatGPT, uh, about the HESP stuff, High Efficiency Streaming Protocol: it doesn't really use QUIC. I think that was another mistake on ChatGPT's part.
>> Yay.
>> Okay, so I mean it just uses HTTP, so there's a separate, uh, track or representation for the low delay extension, so joining, you know, another representation from the current representation becomes much faster. So the low delay, low latency extensions for DASH support that. So I just wanted to say that. And secondly, um, you know, JP, the FFmpeg guy, has shown that with QUIC from one computer to the next over a local area network he can transmit frames in about 16 milliseconds, which is one frame duration at 60 frames per second.
>> Okay.
>> So
>> So the theoretical minimum is about one frame interval.
>> Yeah. And then at 60 frames per second
is 16 milliseconds. Uh, but there's no packet loss, nothing. It's just, you know, a direct connection between two computers. And
>> so I've got a factor of 10 to go.
>> Yeah. Exactly. Uh so and uh tomorrow we
are going to show that over the internet
we can get to about three frame latency
around 100 milliseconds with our current MoQ implementation. So yeah, I mean
there is no I guess bare minimum but the
frame interval would be the ideal number
in that case. Yeah,
>> thank you.
>> All right, thank you for the wonderful
teaser, Ali, for tomorrow's talk. Uh any
other questions?
If not, oh there is
one more the final one for this talk.
>> Okay, uh, I have a question, or two questions. Have you tried different network adapters, the Ethernet adapters, like, uh, some gamer ones or maybe server ones? It may be faster. And a related question: have you tried running your application on a different operating system, as the kernel may have different optimizations in its network stack?
>> I have tried... not. The short answer is not really. Um, we're not really in control over the receiving end. So, typically the race engineers will turn up with, like, a ruggedized Lenovo laptop and I can't really do much with that. Um, and it typically runs Windows, so I'm not in control of that end of it. This end,
I could in theory change the hardware
but, like, so for example, um, and this is one of the things that Dan will tell you as well, that changing hardware is expensive. So this has a set of fins on the back which go onto the 5G modem to cool it, okay, so the whole of the back plate is a heat sink. Now if I change the 5G modem, we have to redesign the case. So, like, I'm disinclined to change the hardware adapter, to change the hardware at all.
Like it's a big cost. Um so I'd have to
be really convinced that it was worth
it. What I have done and what's really
bizarre about this modem is there are I
think four different ways of getting IP
connectivity out of it. Like the the
worst possible case which is funny but
useless is you can actually run PPP over
it. It will emulate a serial port and you can run AT whatever it was. Um, it wasn't E, it was C or
something, but it was an AT command
which just fired up PPP. Um and you can
do that. Oh, it's dial. That's right.
You could dial it like it was dial plus
something. Yeah, you can do that and it
will fire up a PPP connection and it'll
run it over a serial port, but you're
limited to like that's disastrously
slow. And then the next one up is it will emulate a USB Ethernet device, but it turns out that's not as quick as you'd hope. And then the one that we're actually using is emulating a
PCIe Ethernet and that's actually pretty
quick because you're on the PCIe bus. So
yes, we have played with different
interfaces, but actually with the same
hardware,
but none of those saved more than a
couple of milliseconds.
>> All right, let's wrap it up with the
round of applause for Tim.