
Reality Labs Research: A look back, a look forward–Michael Abrash (Facebook Reality Labs)

By Paul G. Allen School

Summary

## Key takeaways

- **VR as Second Computing Wave**: Personal computing was the first great wave of human-oriented computing, transforming the world through 2D screens. We are now at the beginning of the second great wave with AR and VR, enabling full sensory immersion and mixing real and virtual in 3D to drive our senses at full bandwidth. [08:28], [14:11]
- **Reality is Brain Inference**: Our experience of reality is an illusion reconstructed by the brain from sparse sensory signals using internal models and priors, not direct observation. Virtual reality works by providing the right sensory inputs to drive the same inferences as the real world. [15:13], [22:05]
- **McGurk Effect Proves Inference**: In the McGurk effect, the same audio track of 'bar bar' is perceived as 'far far' when synced with visuals of lips saying 'far', demonstrating how multisensory integration constructs our reality through inference. Moving your eyes between split-screen faces saying 'bar' and 'far' changes what you hear. [19:43], [21:06]
- **Research Pillars Strategy**: Oculus Research focused on 10 pillars such as displays, audio, and tracking, with mission-driven teams led by visionary leaders given free rein to invent full-system prototypes beyond normal product iteration. That critical mass advanced VR across displays like Boba (180° FOV), Half Dome varifocal, pancake lenses, and hand tracking. [24:43], [35:03]
- **Orion AR Glasses Breakthrough**: Orion is the most advanced AR glasses prototype: under 100 g, holographic displays with a field of view wide enough for cinema screens, see-through to the real world with eye contact, despite insane technical challenges. AR glasses enable contextual AI by providing egocentric data. [36:18], [37:00]
- **Contextual AI Vision**: Contextual AI in AR glasses understands your goals, history, and 3D environment to proactively assist with reminders, conversation enhancement, object queries, and memory without manual input. It completes Licklider's vision by augmenting internal decision processes alongside the external senses. [42:39], [47:08]

Topics Covered

  • Second Wave Drives Senses Fully
  • Reality is Brain's Inferred Illusion
  • Inference Machines Construct Reality
  • Contextual AI Augments Internal Experience

Full Transcript

Welcome everyone to the distinguished lecture series of the Allen School. And you guys are all in for a big treat today, because our speaker today is Michael Abrash.

And if you look up Michael on Google, it has a big picture of him and a description of him with two words: American programmer. Which, to his credit, I told him this morning, and he nodded and said that's a really good description of me. But I'm not sure it quite does him justice. So I was thinking maybe we should add another word, and I'll talk to the people I know at Google about this, but I think we should add wizard at the end. American programmer wizard. I think that's a better description of Michael Abrash.

So, how many of you have heard of the game Quake? Raise your hand. It's pretty good for a 30-year-old game. But for the rest of you, this was basically, for all intents and purposes, the first real modern 3D computer graphics game on a PC that supported all the bells and whistles: real-time texture mapping, 30 frames a second, free navigation with six degrees of freedom, and everything. And doing that in the '90s was particularly challenging because there were no GPUs.

There was no graphics hardware. So, you had to do everything in software. And so, it took quite a bit of wizardry to pull it off. And it was basically Michael plus another famous hacker, John Carmack, who pulled this off back then. And that really set the stage for modern computer games and real-time 3D graphics in a lot of ways. So, a real milestone. And this also built on kind of a decade of prior work that Michael had done on being sort of the world's expert on graphics programming and helping ship DirectX with Microsoft and so forth.

So anyway, I think if you fast forward to today, I mean, all this is in some sense ancient history, but I think you can connect the dots to Michael's current job, where he does immersive 3D graphics on head-mounted displays. His current job is, he's chief scientist at Meta Reality Labs, where he leads R&D for virtual and augmented reality. So, I think he's gonna tell us a lot more about that today. Please welcome Michael Abrash. Thank you.

>> So, I wouldn't say wizard. John Carmack, I would say wizard. For me, what I'd say is I've been around a long time and done a bunch of things, and frankly that's not bad, right? I mean, one of the cool things is you're in an industry that's always changing, but it's always interesting and it's not going to go away. So, not bad.

So, this is my third time speaking here, and it is nice to be back. The first

time was right after I started Oculus Research more than 11 years ago. And

that's a long time. Long enough to go through several different names before we finally settled on Reality Labs Research, and longer than I've ever worked on one thing before,

and it's been quite a trek. Um it turns out that there are many easier things than starting a research lab from

scratch, let alone in a brand new area.

So, you might well ask me, occasionally I might even ask myself why I've kept at this as my hair has turned gray. Oddly,

the best answer I've come up with dates back 86 years. And please forgive the archaic wording.

Every man should be capable of all ideas, and I believe in the future he shall be.

Making that real is why I'm still doing this. And as some of the best technologists in the world, it will also serve you well to keep it in mind.

So what exactly does that mean? Good

question. To answer it, we'll have to look back first 11 and then 68 years, wander through a bunch of technology, and get into some metaphysics and

philosophy.

Let's start with what I expected and why back in 2014 and see how that's panned out.

Here's a quote from a SIGGRAPH talk I gave that summer.

10 years from now, I think VR will be well down the path to transforming how we interact with technology.

Well, it's been 11 years now. And as

Hofstadter's law says, it always takes longer than you expect, even when you take into account Hofstadter's law.

I think it would be fair to say that while I still believe in that vision, it has taken longer than I expected to realize VR's full potential.

Fortunately, I also hedge my bets by saying I have little doubt that before 40 years have passed, the daily lives of most people on this planet will include

both AR and VR. We're barely one decade into this, so there's still plenty of time for me to be proven right.

Regardless, AR and VR still feel like a big part of the future to me, and a lot of good things have happened. Quest,

Orion, and Ray-Ban Meta glasses are all part of a new class of transformational technology that was nothing more than a concept 11 years ago. And in an

unexpected plot twist, it turns out that sensory immersion with VR and AR is only part of the story. And I got my first hint of that twist when I came to UW

in 2014, as we will see later.

In a little while, I'll look at the origins of Reality Labs research and some of our results, as well as the journey to contextual AI. But first, I'd like to provide a quick refresher about

exactly why I had and still have all these lofty expectations for VR.

The roots of those expectations lie in the history of human-oriented computing.

And perhaps I've been more able or willing to see that connection because I lived through that history.

The day I was born, the world had television, radio, movies, newspapers, magazines, record players, landlines,

and believe it or not, telegrams were still a thing. That was the sum total of communication channels other than talking in person. When you stepped out

of the house, there was literally, in the strict meaning of the word literally, no way for anyone to be able to reach you.

Take a moment and think about how alien that world would feel if you were teleported back to it.

It would feel like part of your brain had been removed. No smartphone, no laptop, no video conferencing, no Uber, no Amazon, no email, no text, no Facebook. The list of what would be missing is basically a description of how we live today.

That is real change. And it's worth understanding why it happened because it's about to happen again.

A good place to start is by putting the revolution of the last 50 years in context. Tech is all about catching a wave early, riding it, then catching the next one. Mobile, social networks, and the like.

But there are waves, and then there are tsunamis.

Most of you have spent your entire lifetime inside just one tsunami riding the smaller waves within it.

That tsunami was the entirely predictable consequence of the advent of personal computing. And it's why the world today bears little resemblance to the one I was born into.

I call that tsunami the first great wave of human-oriented computing.

But the first wave was just the opening act. We are at the beginning of the second great wave, which is going to change the world far more than the first did.

Today I'll be talking about that second wave. But before we can properly discuss that, we need to understand the first wave. And to do that, we need to go back to its beginning in 1957.

That's when J.C.R. Licklider, who always insisted people call him Lick, encountered the experimental TX-2 computer in the basement of Lincoln Labs. The TX-2 was one of the first computers that it was possible to interact with in real time. And that experience led Lick to an epiphany that would eventually transform our world, summed up by this remarkable insight.

The hope is that in not too many years, human brains and computing machines will be coupled together very tightly and that the resulting partnership will think as no human brain has ever thought

and process data in a way not approached by the information handling machines we know today.

And just to round out the picture of perhaps the most important person no one has ever heard of: at ARPA in the '60s, Lick started a project he called the Intergalactic Network, which evolved into the ARPANET, which evolved into the internet. He also funded a variety of other research at ARPA that fed into his vision.

One of the researchers Lick funded was Doug Engelbart, for whom human-computer interaction was not just an interesting problem. He felt it was critical for humanity. As the book Bootstrapping put it, Engelbart felt that the complexity of many of the world's problems was becoming overwhelming and the time for solving them increasingly short.

What was needed, he determined, was a system to augment human intelligence, co-transforming or co-evolving both humans and the machines they use.

That line of thinking ultimately led Engelbart to ask this question.

If in your office you as an intellectual worker were supplied with a computer display backed up by a computer that was alive for you all day and was instantly responsive to every action that you have, how much value could you derive from that?

In the process of answering that question, Engelbart's team invented the mouse, the windowing interface, and hypertext, among other things. In 1968, he showed the way to the future of human-oriented computing with the Mother of All Demos, demonstrating what is recognizably the ancestor of the GUIs that surround us today.

A good chunk of Engelbart's team eventually went to the Xerox Palo Alto Research Center, Xerox PARC, in the 1970s, as did many of the researchers Lick had been funding. PARC created the Alto and refined and extended Engelbart's work to create the Alto's interface, which then directly influenced the Mac team during two visits in 1979.

And that led to the Mac and Windows and the interactive world we live in today.

It's a measure of Lick and Engelbart's impact that every one of you has a direct descendant of Lick's vision running a direct descendant of Engelbart's interface in your pocket or bag. And almost every one of you makes your living working with another direct descendant.

And yet, as sweeping as that revolution has been, it's unfinished. Here's why.

At the most basic, unromantic level, each of us is a CPU with memory, input, and output. Our entire experience of the world is a function of the information coming in through our senses and the results that we perceive our actions to cause.

If the information coming in and the results of our actions become more valuable to us, our lives are improved.

Crucially, it does not make any difference where the information comes from or where the actions are expressed, which means that virtual information and

actions can be as valid as real.

That may not feel right to you. We tend

to have an understandable bias toward the real world being more important than the virtual, but realistically, it's already the world we live in. You get

your news from a browser, not from a printed newspaper or a town crier. You

talk to your friends and extended family largely through messaging and video calls and email. You do the vast majority of your work on a computer.

Virtual information and actions are already deeply embedded in our world.

But that embedding has been stuck in an intermediate state for the last 50 years. The revolution Lick set in motion has created a vast virtual world that we interact with constantly, but only in a very limited way.

That virtual world is brought into the real world on 2D display surfaces that are kind of a flaw in spacetime. A

virtual slice inserted into the real world.

Now, that's proven to be tremendously useful, but it engages only a tiny fraction of the available bandwidth of our hands and senses. So, it can only deliver a small subset of the full range

of what we're capable of experiencing and doing.

Which brings us to the second wave. In

order to fully achieve Lick's vision, we need to be able to drive our senses at full bandwidth with virtual content and mix real and virtual freely in 3D, dedicating as much or as little of our

bandwidth as we want to the virtual world.

But what does it mean to drive our senses?

Fortunately, it doesn't mean producing an exact replica of reality because that would be impossible. It means providing inputs that cause our senses to send the

desired signals to the brain. And that

is a very different proposition.

For example, photons from reality update constantly, but photons from screens update in discrete frames, generally fewer than 100 times a second. But

because of the way the retina accumulates photons in order to signal the brain and the way the brain assesses motion, virtual images can be

indistinguishable from real.
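As a rough back-of-the-envelope illustration of why discrete frames can pass for continuous light, here's a small sketch. The retinal integration window used here is an illustrative assumption, not a figure from the talk, and it varies with viewing conditions:

```python
# Rough illustration: frame period vs. an assumed retinal integration window.
# The ~20 ms window is an illustrative assumption; the real value varies with
# luminance and eccentricity and is not a figure from the talk.

RETINAL_INTEGRATION_MS = 20.0  # assumed effective temporal integration window

for refresh_hz in (60, 72, 90, 120):
    frame_period_ms = 1000.0 / refresh_hz
    frames_per_window = RETINAL_INTEGRATION_MS / frame_period_ms
    print(f"{refresh_hz:>3} Hz: frame every {frame_period_ms:5.2f} ms, "
          f"~{frames_per_window:.1f} frames inside a {RETINAL_INTEGRATION_MS:.0f} ms window")

# At 90-120 Hz, a couple of frames land inside the window over which the retina
# pools photons, so the brain's motion inference can treat the discrete
# presentation much like continuous input.
```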

Now, that's important, but it's not the key. The key is that the reality we experience is a reconstruction in our minds.

In fact, it's an illusion, although it's a highly functional one.

Morpheus nailed it in the movie The Matrix.

What is real? How do you define real? If

you are talking about what you can feel, what you can smell, what you can taste and see, then real is simply electrical signals interpreted by your brain.

Morpheus made two critical points with that sentence. First, we, our conscious minds, never actually interact with the real world.

Instead, we interact with signals from sensors in our eyes and our ears, in our skin, on our tongues, in our noses, in our balance organs, and throughout our bodies. We know only what those sensors detect, interpret, and signal to the brain. And that's actually a very small subset of the real world.

Consider vision. We only see about a two-degree circle in high resolution. We have a blind spot in each eye. We have no blue photoreceptors in the center of our vision, and everything but the plane we're focusing on is blurry. Rich as it seems to be, our visual data is actually

astonishingly sparse, far too limited to let us directly experience the world.
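To get a feel for just how sparse that is, here's a back-of-the-envelope estimate of the fraction of the visual field covered by the high-resolution foveal region. The field-of-view figures are rough textbook values I'm assuming, not numbers from the talk:

```python
import math

# Assumed rough values (not from the talk): the high-acuity foveal region spans
# about 2 degrees, while the overall binocular visual field is very roughly
# 200 degrees wide by 135 degrees tall.
FOVEA_DIAMETER_DEG = 2.0
FIELD_WIDTH_DEG = 200.0
FIELD_HEIGHT_DEG = 135.0

# Flat-angle approximation: compare areas in "square degrees".
fovea_area = math.pi * (FOVEA_DIAMETER_DEG / 2.0) ** 2
field_area = FIELD_WIDTH_DEG * FIELD_HEIGHT_DEG

fraction = fovea_area / field_area
print(f"High-resolution region ≈ {fraction:.5%} of the visual field")
# Roughly a hundredth of a percent: the "thumb-sized" patch of sharp vision
# really is a tiny sample of the scene, and the brain fills in the rest.
```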

That may not feel right to you. Hasn't

your whole life been a direct experience of the world? But if you're experiencing the real world directly, why can't you see your blind spot? Why aren't you aware that you can't see blue straight

ahead? Why can't you tell that you're only seeing an area the size of your thumb in high resolution?

It's because your brain infers a model that represents the most likely state of the world at every moment based on the input from your senses, your history, and a vast array of priors. And it's

that model that you actually see, hear, feel, smell, and taste.
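That "most likely state given senses plus priors" idea is, at bottom, Bayesian inference. Here's a minimal toy sketch of the shape of that computation; the world states, prior, and likelihoods are invented purely for illustration and are not a model of the brain:

```python
# Toy Bayesian inference: infer the most likely state of the world from a
# noisy sensory observation plus a prior. All numbers are invented for
# illustration; this shows the shape of the computation, not a brain model.

prior = {"object_approaching": 0.1, "object_static": 0.9}

# Likelihood of observing "looming motion on the retina" under each state.
likelihood = {"object_approaching": 0.8, "object_static": 0.05}

# Posterior ∝ likelihood × prior (Bayes' rule), then normalize.
unnormalized = {s: likelihood[s] * prior[s] for s in prior}
total = sum(unnormalized.values())
posterior = {s: p / total for s, p in unnormalized.items()}

for state, p in posterior.items():
    print(f"{state}: {p:.2f}")
# A weak prior (10%) is overturned by strong evidence: the "approaching"
# interpretation ends up most probable, and that is what you would perceive.
```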

At the core of the reality that each and every one of us experiences lies the fact that we're inference machines, not objective observers. By which I mean that there is presumably a real world out there. And your brain is taking the very limited signals coming from your senses and its internal model and trying to infer what the state of that real world is.

Of course, reality is something we experience, so words can only take us so far. It'll help clarify things if we look at a few examples where the inference mechanism breaks down. And we can have a bit of fun doing it. Here's an example that shows that your visual system is reverse engineering the world rather than just recording it.

Take a moment and figure out which of the two tabletops is wider as measured in 2D and which is longer, assuming you rotated them to line up.

Ready?

They're exactly the same size. You think

the shape morphed while it moved? So did

I. The first time I saw it, I had to use a ruler to measure it on the screen to convince myself. We live in a 3D world, and the 3D objects implied by the 2D shapes of the tables are quite different from each other. Your brain does this calculation automatically for you,

making the assumption that you're looking at a 3D world.

That's wrong in the specific case of trying to compare 2D table sizes here, but in general, it allows you to function in a 3D world, which is where we happen to live. Next, let's look at

motion.

Of course, the straw isn't really going through the window. What's happening

here is that you're making a very reasonable assumption about the world that happens to be wrong.

There are a number of cues on the window that imply a perspective that doesn't exist. So, your visual system automatically infers that the window is spinning backward for half of a full rotation. However, there are no such cues on the straw. As a result, the straw seems to rotate right through the window, even though your conscious mind knows that that doesn't match any

reasonable model of reality.

Our model of the world integrates data across all our senses. So, multisensory illusions are even more revealing about the inferential nature of reality. One of these, the McGurk effect, is perhaps the best demonstration I've seen of how we create reality.

Bar bar bar bar.

>> Obviously, she's saying bar bar.

Now, let's watch a slightly different video.

>> Bar bar bar far bar far.

>> Here we can clearly hear her saying far far, except that she isn't. The video shows her saying far, but the audio track is the same one of her saying bar that we

heard in the first video. And yet, we clearly heard her saying far. The visual

input is convincing our internal inference mechanism to interpret the audio differently.

To make it crystal clear what's going on, let's look at this one more time.

We'll have the same soundtrack saying bar, but this time we'll have a split screen with a face saying far on one side and a face saying bar on the other.

As this plays, move your eyes from one side to the other and observe how what you hear changes. Again, move your eyes back and forth and see what you hear.

Bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar

bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar bar So now ask yourself what is real depends on what you're looking at not

just the sounds hitting your eardrums you aren't a microphone you're an inference machine that integrates all available evidence to construct the most likely model of the world

in short reality is what our brain reconstructs it to be based on its model of the world and the sparse data coming from our senses. I think it's fair to say that our experience of the world is

an illusion that evolution has honed to be highly functional in terms of survival and reproduction.

Put another way, our experience of the world is nothing more or less than an interface to whatever reality is.
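One standard way to model the kind of audio-visual integration behind the McGurk demo is maximum-likelihood cue combination, where each cue is weighted by its reliability. This is a textbook model offered for illustration, not a claim about the specific mechanism described in the talk, and the numbers are invented:

```python
# Precision-weighted (maximum-likelihood) fusion of two noisy cues.
# Treat "what phoneme was spoken" as a 1-D estimate: 0.0 = "bar", 1.0 = "far".
# All values are illustrative; this is a standard textbook cue-combination model.

def fuse(estimate_a, var_a, estimate_b, var_b):
    """Combine two Gaussian cues; more reliable (lower-variance) cues dominate."""
    w_a = 1.0 / var_a
    w_b = 1.0 / var_b
    fused = (w_a * estimate_a + w_b * estimate_b) / (w_a + w_b)
    fused_var = 1.0 / (w_a + w_b)
    return fused, fused_var

audio_says, audio_var = 0.0, 0.30    # the soundtrack is "bar", but it's noisy
visual_says, visual_var = 1.0, 0.05  # the lips clearly look like "far"

percept, _ = fuse(audio_says, audio_var, visual_says, visual_var)
print(f"Fused percept: {percept:.2f}  (closer to 1.0 means you 'hear' far)")
# The sharper visual cue pulls the percept toward "far", which is roughly what
# happens when you watch the 'far' face while the 'bar' audio plays.
```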

Which brings us to VR. The fact that reality is whatever your mind automatically infers from the nerve impulses sent by your senses, based on its model of the world, is why Jeremy Bailenson says all reality is virtual.

And it means that virtual reality doesn't have to replicate the real world in order to work. It just has to provide the right inputs in order to satisfy the relevant senses and drive the correct

inference. To paraphrase Morpheus, some rules can be bent, others can be broken.

And so at long last we come back to this quote, which I came across in the book The Overstory. It was in the middle of a chapter about Neelay, a boy of Indian descent who explores the wonders of early microcomputers and the virtual worlds he can build within them. After becoming paralyzed, he becomes even more absorbed in creating those worlds. So much so that his father asks, "How is it that you have turned into a creature of such concentration?"

The book continues: The boy doesn't answer. They both know. Vishnu has put all of living possibility into their little 8-bit microprocessor. And Neelay will sit in front of the screen until he sets creation free.

I know exactly how Neelay felt because I fell in love with my own little 8-bit processor 45 years ago. The first time I saw a creature I had created move across the screen, I knew that I would spend

the rest of my days exploring the endless worlds that were latent in silicon.

And that is the heart of the matter. All

possibilities lie dormant in the virtual world, waiting only for someone to set them free. And that vision of infinite possibility is exactly what VR promises.

All possible human experiences lie somewhere in that space, if we can just bring rich enough information to our senses.

My mission in founding Oculus Research 11 years ago was to start the exploration of those possibilities by driving our senses in a far richer way.

I didn't yet know who Lick was, but it turned out that our visions were fundamentally the same to set creation free.

So, how did I go about that and how has it worked out? Spoiler alert, we have not yet set all of creation free, but we

have made real progress.

At the start, there were only a few things I knew for sure.

First, I knew the goal was to create next generation VR. That meant we had to figure out what would be needed to take VR to the next level over the next 3 to 10 years that wouldn't happen by normal

product iteration. Then work backward to figure out how to get there, inventing whatever was needed that didn't exist.

Second, I saw a set of research pillars that were needed in order to get to the next level and were plausibly tractable within 10 years and made those the focus

of what was then Oculus Research.

Third, I concluded that the only way to properly develop those technologies was as full systems with proof of concept demos suitable to show the path to product rather than as partial advances

suitable for papers. In order to do that, those pillars required structured, focused teams and roadmaps with a full complement of engineering, design, and other disciplines.

Furthermore, while the researchers would set direction, these were teams built around a mission, not researchers with support organizations.

Fourth, each pillar required a leader who not only had an ambitious vision and strong domain skills, but was also capable of building and leading a team around that vision. And then, having

found those leaders, I would need to give them free rein. My job was to bring the right set of people together and support them wherever they wanted to take their research rather than telling

them what to do. My assumption was that they had to know how to advance their domains better than I did. Otherwise,

the whole enterprise was pretty much doomed.

And finally, I realized that the work required to bring the second wave into existence was fundamentally different from anything that had been needed in the last few decades. Xerox PARC and the Mac established the basic model for human-computer interaction, a 2D screen with a keyboard and pointer. And

everything since then had been an iteration on that model with progress driven by Moore's law and software.

In contrast, the basic model for the second wave was still unknown and breakthroughs in multiple areas, displays, computer vision, and especially interfaces and interaction

would be needed before that model could be figured out. To be honest, I underestimated both the technical challenges and how radically different the interaction model needed to be. But

I knew that we faced something new and very different and that we needed leaders who were open to new possibilities that were opening up rather than experts anchored to the old paradigm.

So when I accepted Brendan Iribe's offer to start Oculus Research in April of 2014, I knew what needed to happen.

Here's how I described my thinking at Carnegie Mellon that summer: Our approach is to build a diverse team of leading-edge researchers across the spectrum, from graphics to haptics to human factors to psychology to optics, combined with first-rate hardware engineers,

programmers, and facilities. Build a

critical mass of interacting skills around solving VR and start prototyping technologies that look promising over the next 5 to 10 years.

And that is in fact what ended up happening. But the truth is that at the time I had no idea where to start. I had never even worked in a research lab, much less run one.

I got a lucky break by making my first key hire almost immediately. VR and Doug Lanman's research interests were made for each other, so he was interested as soon as he heard about the acquisition.

I already knew Doug from a couple of discussions with him about the cool light field display prototype he did at NVIDIA. And as a brilliant, creative display researcher who is well-connected and highly respected, he could not have been a better hire. So Doug became our very first researcher.

That worked out very well indeed. Over

the years, Doug's display systems research team has developed a wide range of immersion technologies and prototypes. For example, the Boba headset, which allows fields of view as high as 180 by 120 degrees, roughly 90% of the human FOV.

They also developed the Half Dome series of varifocal prototypes, showing that proper depth of focus significantly improves both comfort and acuity.

As an example of the kind of system-level work Doug's team does, here is Half Dome 3. Six liquid crystal lenses are driven to sweep through 64 focal planes.

You can see the focal depth smoothly changing at the right as we cycle through different sets of lens states.
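To see how six switchable lenses can give 64 focal states, here's a small sketch. It assumes each liquid crystal lens is binary and contributes a fixed, binary-weighted optical power when switched on, which is an illustrative simplification rather than a description of the actual Half Dome 3 optics:

```python
from itertools import product

# Illustrative simplification: six binary liquid crystal lenses, each adding a
# fixed optical power (in diopters) when on. Binary-weighted powers give
# 2**6 = 64 distinct focal states; the real Half Dome 3 design may differ.
BASE_POWER_D = 0.05
lens_powers = [BASE_POWER_D * (2 ** i) for i in range(6)]

focal_states = set()
for switches in product((0, 1), repeat=6):
    total_power = sum(p for p, on in zip(lens_powers, switches) if on)
    focal_states.add(round(total_power, 4))

print(f"Distinct focal states: {len(focal_states)}")   # -> 64
print(f"Range: 0.00 D to {max(focal_states):.2f} D in {BASE_POWER_D} D steps")
```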

And as with every team I'll talk about today, there's more on the way from display systems research that I hope to be sharing before too long.

But as fortunate as it was to get Doug right off the bat, he was just one of 10 pillar leads I needed to hire. Since I

had few contacts in the research world, I decided to take a road trip to make connections and see if I could hire some of the big names in areas like computer

vision, displays, and audio.

So that summer I delivered talks at Carnegie Mellon, the University of North Carolina at Chapel Hill, SIGGRAPH, and UW.

And that road trip laid the foundation for Oculus Research. Let's follow along as I crisscrossed the country.

Spoiler alert number two. I did not hire any big names during that road trip.

At the first stop, the CMU faculty showed zero interest in VR.

But the one professor I had most hoped to meet wasn't there: Yaser Sheikh, who was at the birth of his first child.

Yaser, who actually was a pretty big name, apologized for his absence via email and promised that we would talk later. A few months after that, we video conferenced, which led to a series of discussions about Yaser's vision for true human connection at a distance, and ultimately to the Pittsburgh office,

putting us on the path to enable social teleportation like this.

>> Where am I? Where are you, Mark? Where are we?

>> You're in Austin, right? No, I mean this place where we're shrouded by darkness with ultra-realistic faces, and it just feels like we're in the same room. This is really the most incredible thing I've ever seen. And

sorry to be in your personal space.

>> I mean, we have done jiu-jitsu before.

>> Yeah. No, I was commenting to the team before that I feel like we've choked each other from further distances than it feels like we are right now.

>> I mean, this is just really incredible. I don't know how to describe it with words. It really feels like we're in the same room.

>> Yeah.

>> Feels like the future.

>> After CMU, I went to Chapel Hill, where Dinesh Manocha insisted I meet one of his graduating PhD students. So, I got a demo from Ravish Mehra, and Doug Lanman told me I should be sure to meet Andrew Maimone, who had been an intern under him at NVIDIA. So, I got Andrew to show me his latest project and told him that I'd be interested in having him join us when he was ready.

I stayed in touch with Ravish and he joined us that fall. He quickly

developed a roadmap around his vision for augmenting the human audio experience. And his team has now produced improved HRTFs and rich real-time spatial audio for VR, as well as the conversation focus feature for Ray-Ban Meta that was announced at Meta Connect, which makes it far more comfortable to carry on conversations in noisy environments.
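For a rough sense of what spatial audio rendering involves at its core: a mono source is filtered through a left/right head-related impulse response (HRIR) pair for the desired direction, and the ear-specific delays and level differences make the sound appear to come from that direction. The sketch below uses NumPy with crude placeholder filters, not Meta's actual HRTFs:

```python
import numpy as np

# Minimal sketch of HRTF-style spatialization: convolve a mono signal with a
# left/right head-related impulse response (HRIR) pair. The HRIRs here are
# crude placeholders (a delay plus attenuation), not measured HRTFs.

sample_rate = 48_000
t = np.arange(0, 0.5, 1.0 / sample_rate)
mono = 0.5 * np.sin(2 * np.pi * 440.0 * t)          # a 440 Hz test tone

def placeholder_hrir(delay_samples, gain, length=128):
    """Toy HRIR: a single impulse with a given delay and gain."""
    h = np.zeros(length)
    h[delay_samples] = gain
    return h

# Source off to the right: the left ear gets a later, quieter copy.
hrir_left = placeholder_hrir(delay_samples=30, gain=0.6)
hrir_right = placeholder_hrir(delay_samples=0, gain=1.0)

left = np.convolve(mono, hrir_left)[: len(mono)]
right = np.convolve(mono, hrir_right)[: len(mono)]
stereo = np.stack([left, right], axis=1)
print("Rendered stereo buffer:", stereo.shape)
```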

Andrew, on the other hand, was interested, but ended up going to Microsoft, mostly because we had very little headcount early on and didn't make him an offer. Fortunately, Andrew

didn't hold a grudge, or at least he didn't show it if he did.

Actually, when I ran this past him for review, he said, "I may have held a little grudge." But he added a smiley face, and he did end up with us, so I don't feel too bad.

Anyway, Doug hired Andrew a couple of years later, and while I can't talk about it yet, Andrew has opened up extraordinary new possibilities for visual immersion.

I was contacted out of the blue that summer by a remarkable freelance recruiter, Grant Stanton, simply because he thought VR was cool. He brought in a dozen strong engineers and researchers

over the next couple of years, including Scott Mackey from the HoloLens team at Microsoft, who led the effort in optics and display components.

That team went on to develop the pancake lenses used in Quest. They also built the Tiramisu prototype that was shown at SIGGRAPH in August, with better-than-retinal

resolution and high dynamic range, showing the way to the future of VR.

My next stop was SIGGRAPH, where Doug set up a meeting with Rob Wang that led to Rob's 3Gear team joining up four months later. And today, Rob's hand tracking is a core technology for both immersion and interaction.

Doug and I also visited ILM, where we met Ronald Malle, who joined later that year and formed the physics AI team, which has produced key animation, rendering, and

rigging technology.

And that summer, Rob Cavin, who I had worked with on Intel's Larrabee project, joined and took on leading eye-tracking research. Grant Stanton brought in Warren Hunt to lead graphics. And finally, Sean Keller, who was at Microsoft, reached out and joined to lead haptics and dexterous manipulation.

As an amusing aside, the first thing Sean said to me when we met was, "What have you done, and why should I work for you?" That neatly summed up the challenge I faced in getting Oculus Research off the ground. But in the end, Sean did join.

My last stop that summer was here at UW. That visit resulted in world-class immersion technology for VR, including SLAM, VIO, and mixed reality, as well as making tracking work in moving frames of

reference such as cars and planes. But

I'm going to save the details of that visit for later because it also led to something unexpected.

So by the end of 2014, almost all the technology pillars were at least on their way to being started, although it took until mid 2015 to actually finish

signing everyone. And at that point, RL research had indeed built, as I promised in my talk at CMU, a critical mass of

interacting skills around solving VR.

And over the next decade, that critical mass enabled us to take VR technology to the next level across displays, computer vision, audio, tracking, and avatars. As

we just saw, the VR industry hasn't advanced quite as much as I would have hoped for in 2014, but it's come a long way, and RL research has more in the

pipeline.

Now, I joined Facebook because of the potential I saw in VR, but Mark was equally interested in immersive AR. So

that became part of the roadmap that first summer on top of everything else.

To be honest, the first time Mark mentioned AR, I said, "But what would you use it for?" Which earned me a look that basically said, "Are you not as smart as I thought?" But I quickly

realized that VR and AR were part of a continuum of mixing real and virtual to bring more valuable information to our senses. And eventually I came to see that AR glasses were the key to more than just cool visuals, as we'll see shortly.

AR was initially incubated in Oculus Research, and we've continued to do AR research in areas like displays, sensors, and computer vision. Ten years in, the AR effort resulted in this.

[Music] This is Orion, our first fully functioning prototype.

And if I do say so, the most advanced glasses the world has ever seen.

Now, about a decade ago, I started putting together a team of the best people in the world to build these glasses. And the requirements are actually pretty simple, but the technical challenges to make them are insane. They need to be glasses. They're not a headset: no wires, less than 100 grams. They need wide field of view, holographic displays, sharp enough to pick up details, bright enough to see in different lighting conditions, large enough to display a cinema screen or multiple monitors for working wherever you go, whether you're in a coffee shop or on a plane or wherever you are.

And you need to be able to see through them, and people need to be able to see you through them too and make eye contact with you. Right? This isn't passthrough.

This is the physical world with holograms overlaid on it.

Hofstadter's law holds true yet again. In

2014, I was confident we'd have shipped immersive AR glasses by now, but Orion was a huge leap, and there's better yet to come.

Looking back, that first summer feels very tenuous. Things could so easily have worked out very differently.

But in the end, what RL research has accomplished is more than I could have hoped for. And if we haven't yet freed all of creation, we've at least started down that road.

And yet there's more. Let's circle back to that 2014 UW visit. I came to UW to give a talk, but my real objective was to try to entice Dieter Fox to join Oculus Research. So after the presentation, I had a meeting scheduled with Dieter. At least I thought it was with Dieter. But when we got together, he said, "I brought someone I thought you should meet."

should meet." So I sat down to chat with deer's postoc. And 10 minutes later, my head

postoc. And 10 minutes later, my head was exploding from an incredibly powerful vision of the future so dense and wide-ranging that in some ways I'm still wrapping my mind around it 11

years later.

That postdoc was Richard Newcombe, who had created KinectFusion and DynamicFusion. But it turned out that that was just a tiny fraction of the ambitious scope of his real vision. Dieter wasn't interested in Oculus Research, but my talk had made an impression on Richard, and he was interested, maybe. Over the next seven months, Richard and I talked endlessly over pizza, coffee, breakfast while he decided where his Surreal startup could best pursue his vision.

How mind-bending were those conversations? That's nicely summed up by the most concise version of his vision that Richard shared during those early meetings.

To collapse the probability distribution for the universe.

Eventually, Richard decided Oculus Research was the right place and joined us.

As mentioned earlier, his Surreal team contributed important technology to the VR and AR efforts, including VIO, SLAM, and mixed reality. But it turned out that all that was just to create the necessary foundation for Richard's real goal. And to help explain what that is, I need to lay a little groundwork.

Everything I've described so far is what Oculus Research initially set out to do.

But it turns out that bringing richer information to us while enabling us to act more effectively is only part of finishing Lick's revolution. It's the part everyone knows from science fiction, and it's been the obvious goal since the Sword of Damocles in 1968.

But ask yourself this. Once we make it possible to immerse ourselves in rich information and to be able to generate actions intuitively with very little friction, what is it that determines

which information we experience and the effects of our actions?

To put it another way, how will we interact with the second great wave?

What will be the equivalent of the GUI for this new world?

Let's think back to Lick's original vision.

Transmitting information through the senses and motor actions is only part of coupling humans and computers.

Everything I've discussed so far has to do with our relationship with the external world. And that is of course very important. But it's not the entirety of the human experience. In particular, it's not what makes us us.

The core of being human is what happens in our minds: the hypotheses, the plans, and most especially the decisions we make. Schopenhauer noted that with one exception, we can never be truly sure about our interaction with the world.

There's no way to be certain about the source of the information arriving through our senses. As The Matrix made clear, the one thing we can be sure about is the will to act. We can be certain when we've formed goals, made decisions, and acted on them. And this process is internal.

Technology has vastly expanded the range and power of our interactions with the external world, but it has barely touched our internal experience.

Even though we spend every day together, my phone has no understanding of my goals or my decision processes. So, it

can't help me with my internal experience.

But if it could, it would revolutionize my life in many ways, ranging from reminders to hearing assistance to context dependent messaging and call handling to vastly improved memory and

much, much more.

The combination of AR glasses and AI that's able to understand your context and act to help you achieve your goals, that is personalized, contextual AI, can

make all that happen.

Consider the conversation focused technology that I mentioned earlier.

Imagine I'm wearing Ray-Ban Meta glasses and I sit down in a noisy restaurant with a friend. With current

interface technology, I'd turn on conversation focus and look at the friend while they're talking. When a

second friend sat down, I'd turn to look at them while they talked. So, I'd be able to hear the whole conversation clearly despite the noise, although I might sometimes miss the start of what someone was saying because I wasn't

looking at them at the time. Pretty

good, right? Who wouldn't want to hear better in noisy environments?

Now, imagine instead that when I sit down with someone, the AI on my glasses notices it's noisy and detects I'm in a conversation and automatically enhances

whatever that person says. When a second person sits down, the AI detects that as well and picks up whatever each friend says, regardless of whether I'm even

looking at them at the time. I am not even aware that this is happening. As

far as I'm concerned, my hearing is just working perfectly.

The first scenario is a valuable feature, but thanks to AI that understands my needs and context, the second scenario is the full and proper

augmentation of my perception.
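As a way to see the difference between the two scenarios, here's a toy sketch of the control logic. The function and field names are hypothetical, invented for illustration; this is not the Ray-Ban Meta API or Meta's actual implementation:

```python
from dataclasses import dataclass

# Hypothetical sketch contrasting the two interaction models described above.
# Names and fields are invented for illustration; this is not a real device API.

@dataclass
class Speaker:
    name: str
    is_talking: bool
    user_is_looking_at: bool

def manual_focus(feature_enabled: bool, speakers: list[Speaker]) -> list[str]:
    """Scenario 1: the user turns the feature on and aims it with their gaze."""
    if not feature_enabled:
        return []
    return [s.name for s in speakers if s.is_talking and s.user_is_looking_at]

def contextual_focus(noise_level_db: float, speakers: list[Speaker]) -> list[str]:
    """Scenario 2: the AI notices the noise and the conversation on its own."""
    if noise_level_db < 70:          # illustrative threshold for "noisy"
        return []
    return [s.name for s in speakers if s.is_talking]   # gaze no longer required

friends = [Speaker("Ana", True, False), Speaker("Ben", True, True)]
print("Manual:    ", manual_focus(True, friends))       # only Ben (being looked at)
print("Contextual:", contextual_focus(75.0, friends))   # Ana and Ben
```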

Now, extend that to full contextual awareness. What you're about to see is still very much research work, but imagine that your AI can reconstruct your three-dimensional environment like this

and that it can understand your motion within your environment.

that it can identify the objects around you.

that it can understand your interaction with those objects.

And that it could potentially track the motion of unique objects over time.

Again, this is still research. But once we have glasses that enable that kind of contextual understanding, it doesn't take a lot of imagination to come up with ways that AI can help us. Questions like, "How many calories have I had today?", "Which of these microwaves is top rated?", and "How do I fix this bike wheel?" become easy to answer.

Relevant notes can pop up proactively in a meeting based on the discussion. You

can recall past experiences. Your calls

and texts can be handled appropriately based on how cognitively and socially engaged you are, and whether it's your mother or a robocall.

The Mother of All Demos showed the way to the future of human-oriented computing in the form of the GUI, which allows a carefully curated set of user actions.

Contextual AI is a very different sort of interface, one that has never existed before, both egocentrically contextual and completely open-ended.

Contextual AI changes the playing field so dramatically that we need to ask ourselves an updated version of Engelbart's core question. If in your life you as a member of an information-rich society were supplied with a computer that was alive for you all day and understood your history and context and was constantly aware of your goals and able to help you achieve them, how much value could you derive from that?

We'll be refining the answer to that question for decades to come.

11 years ago, contextual AI was Richard's real vision. At the time, I would have said, stacked miracles would be needed to make it happen. And even

one miracle is hard to come by. But as

Vladimir Lenin famously said, there are decades where nothing happens and weeks where decades happen. It wasn't quite that fast, but the pieces of contextual

AI came together very quickly in the last couple of years thanks to three key developments in particular.

First, LLMs exploded onto the scene, providing the AI part of contextual AI.

Second, Ray-Ban Meta glasses took off and spurred a rapidly expanding AI wearables landscape, opening the door for egocentric context.

And third, progress was accelerated by Surreal's Project Aria research glasses, which made it possible to gather the broad range of egocentric data needed for contextual AI, and accelerated again by the more powerful human-in-the-loop Aria 2 glasses.

We've also developed many of the supporting technologies for all-day, always-on contextual AI, including ultra-low-power sensors and the electromyography-based wristband for

Orion.

The miracles are stacking up and the technology is coming together. And I'm

confident you'll be hearing much more about contextual AI over the upcoming years.

Together, AR, VR, and contextual AI form the true realization of Lick's vision, spanning the spectrum of human experience, both internal and external.

The ultimate result will be an interface that unlocks the full range of human potential.

RL research is taking the next steps down that road to the future, but that journey will require the efforts of many others in the research community as

well, hopefully, including you.

Somewhere far down that road lies all of living possibility, and every step along it sets more of creation free.

That's the vision we've been pursuing since that first summer so long ago, even if I didn't entirely know it at the time. And now, astonishingly, the scope of our research has truly grown to match this quote: Imagination should be the only limit on the range of human experience and exploration.

Or, as Richard puts it with characteristic understatement, we're creating the future interface for humanity. I wouldn't bet against him.

Thank you.

[Applause]
