Bell's Theorem, a Glitch in Reality

By Richard Behiel

Summary

## Key takeaways

- **Bell's Theorem Proves Quantum Non-Locality**: Bell's theorem demonstrates that quantum mechanics is weirdly non-local, meaning something in quantum physics isn't bothered by space and time limitations. Local hidden variable theories cannot reproduce quantum mechanical statistical predictions. [00:30], [00:42]
- **Stern-Gerlach Splits Spin into Discrete Outcomes**: In the Stern-Gerlach experiment, a beam of silver atoms passes through a non-uniform magnetic field and splits into two discrete beams due to the unpaired electron's spin 1/2, with spin up (+1/2) and spin down (-1/2). This reveals a quantum effect with only two possible outcomes. [08:18], [11:55]
- **Tilted Detector Probability is cos²(θ/2)**: For a spin-up beam through a second Stern-Gerlach magnet tilted by angle θ, experiments show cos²(θ/2) probability of spin up and sin²(θ/2) for spin down, matching quantum mechanics precisely. Local hidden variable models predict a linear dependence instead. [15:44], [16:07]
- **Singlet State Shows Perfect Anti-Correlation**: In the singlet spin state of two spin-1/2 particles, measurements along the same axis always yield opposite spins, with no preferred direction beforehand. Quantum correlation is -cos θ, while local hidden variables fail to match. [01:04:35], [01:05:31]
- **Local HV Correlation Always Linear in θ**: Hidden variable models with uniform λ over hemispheres yield linear correlation in angle θ between axes, matching quantum mechanics only at 0°, 90°, 180° but failing elsewhere. The 'sketchy move' warps it artificially but breaks locality for entanglement. [38:35], [39:50]
- **Bell Inequality: ε Cannot Be Arbitrarily Small**: Bell's core inequality shows no local hidden variable model can approximate quantum correlations arbitrarily closely; for specific angles like A⊥C, A·B=B·C=1/√2, mismatch ε ≥ (√2 − 1)/4 ≈ 0.10, proving incompatibility. [02:57:36], [02:59:23]

Topics Covered

Quantum Mechanics Proves Non-Local
Local Hidden Variables Contradict QM
Spin Measurements Defy Local Models
Singlet State Enables Perfect Prediction
Bell's Inequality Rules Out Locality

Full Transcript

Hey everyone. Today I have for you a genuine glitch in reality that's going to blow your mind and change the whole way you think about everything. So it's

called Bell's theorem and this is one of the most mysterious, unsettling, magnificent results in all of theoretical physics. So let's talk about

it.

Bell's theorem demonstrates that quantum mechanics is weirdly non-local.

That is, there's something going on with quantum physics that doesn't seem to be bothered by the limitations of space and time. Now, of course, much has been said

about this, including in various popular science articles and videos and all that sort of thing. You often hear about

quantum entanglement, spooky action at a distance, and all that kind of stuff.

And there's often some crossover with uh sci-fi, about communication systems that work faster than light and all that. And

there's also kind of this woo woo connotation about consciousness and all that sort of thing. And those are all really fanciful notions, but in many cases, what you hear about Bell's

theorem and quantum entanglement and all that is not well grounded in the actual physics and the math of quantum mechanics.

And so I wanted to make a video where we actually really get into the technical details of what exactly did Bell teach us about the nature of reality. And so I

wanted to go through his famous legendary 1964 paper, you know, word for word, equation for equation. I want to really dive into it and explore with you

exactly what is his argument and what does it imply about the nature of reality.

I should point out in case you don't know, I recently made a video on the Einstein Podolsky Rosen paradox which is definitely a prequel to this video.

In fact, Bell's legendary 1964 paper is called "On the Einstein Podolsky Rosen Paradox." Okay, so this is a follow-up to

the argument that Einstein, Podolsky, and Rosen put forward back in 1935 in which they looked at quantum mechanics and said, "Hey, wait a minute.

Something's wrong here. Something's

paradoxical. Either quantum mechanics is super weird or maybe it's just incomplete."

And so almost 30 years after that, John Stewart Bell thought about it real hard and was like, "You know what? Sorry

Einstein and friends, actually quantum mechanics is not incomplete, but rather it's just really weird and genuinely non-local, at least in some subtle ways." So that's the context in which

Bell wrote this paper. It's a follow-up to the argument put forward by Einstein, Podolsky, and Rosen. So before watching this video, I do recommend watching my

video on the EPR paradox. Or if you haven't seen that video, but you're just familiar with the EPR paradox, then that's cool, too. You don't have to get your info from me. I'm just one of many sources on this beautiful internet.

All right, then let's get into the paper. Well, first of all, this paper is

broken up into six parts. Part one is the introduction.

Part two is the formulation where we sort of define our terms and think about what it is we're going to be thinking about. Part three is an illustration of

some examples.

And part four has the main argument of the paper in which we find that if you try to explain quantum physics using a local hidden variable theory, you run into a contradiction. In part five, the

ideas are generalized. And in part six, we have our conclusion. So those are the six parts of this paper. We're going to go through them one at a time. And in

between these, I'm also going to have some animations and some information and equations that provide context because one thing you got to know about this

paper is it is so cryptic and it is so dense with equations and very few words that if you just try to read it, it's really hard actually. You really got to

take your time with this one. And so

we're going to take our time and I'm going to have related animations and equations to help us along and to fill in the gaps in the paper where it's assumed that the reader is going to be imagining a certain thing in mind when

they read it. Oh, and speaking of, I've put a link to the PDF in the description below the video. And I definitely recommend printing out this paper so that you have it for reference as we go

through it. If you don't have a printer,

that's fine, but then you should open it up on another screen or another tab or something.

All right. So now it's time to get into the introduction of the paper. The paper

begins. The paradox of Einstein, Podolsky, and Rosen was advanced as an argument that quantum mechanics could not be a complete theory, but should be

supplemented by additional variables.

Remember at the end of the EPR paper they talked about how quantum physics is incomplete and it's missing something and you have to put variables into quantum physics in order to have it provide a complete description of

reality.

These additional variables were to restore to the theory causality and locality and that's often called local causality.

It's just the idea that cause and effect should propagate such that an object is only affected by its immediate surroundings, as opposed to some kind of

weird teleportation or spooky action at a distance. So Einstein and friends

argued that you have to put some kind of additional variables into quantum mechanics in order to resolve the EPR paradox and give quantum mechanics local

causality.

In this note that is Bell's paper, that idea will be formulated mathematically and shown to be incompatible with the statistical predictions of quantum mechanics. So that's what we're going to

do today. We're going to mathematically explore the concept of hidden additional variables in quantum mechanics and show that it doesn't work and that therefore

quantum mechanics genuinely does exhibit non-local phenomena which is crazy. Like

that goes against everything we think we know about the nature of reality.

Anyway, it is the requirement of locality or more precisely that the result of a measurement on one system be unaffected by operations on a distant

system with which it has interacted in the past. That creates the essential

difficulty. So the hidden variable story

doesn't work if you require the theory to be local. There have been attempts to show that even without such a

separability or locality requirement, no hidden variable interpretation of quantum mechanics is possible.

These attempts have been examined elsewhere and found wanting. That is to say, actually, you can make a hidden variable interpretation of quantum mechanics work if you relax the

constraint of locality. But then it's like what's the point, right? Moreover,

a hidden variable interpretation of elementary quantum theory has been explicitly constructed. Here he's

referring to Bohmian mechanics. That

particular interpretation, Bohmian mechanics, has indeed a grossly non-local structure. Famously, Bohmian mechanics is a

non-local theory. This, the non-locality,

is characteristic, according to the results to be proved here, of any such theory which reproduces exactly the

quantum mechanical predictions. That is

to say, what we're going to show in this paper is that if you want a theory that matches the quantum mechanical statistics and you want it to involve hidden variables as advocated for by

Einstein, Podolsky, and Rosen, then necessarily you're going to end up with a non-local theory. And of course, that non-locality is the same kind of dilemma that you end up having to confront if

you just take quantum mechanics at face value in which it does appear to be a non-local theory. So, no matter how you

look at it, there's some weird non-local stuff going on in quantum mechanics.

All right. Now, before going further, I want to say a few words about spin 1/2 particles because spin 1/2 particles are the main characters of this paper. And

so, it'll be helpful to review some of the main points regarding the experiment and theory of spin 1/2 particles.

So on the experimental side, for sure the most important and famous spin 1/2 experiment is the Stern-Gerlach experiment. The way this experiment

works is imagine that you have an oven and inside the oven you put some silver and the oven is so hot that the silver

atoms start to evaporate and fly around with crazy high speeds and some of them are going to fly out of a hole in the oven. And then suppose you have some

kind of apparatus called a collimator so that we end up with a line of silver atoms flying in a particular direction.

And also suppose this whole experiment happens in a vacuum so that the silver atoms aren't bumping into air as they fly along. Now then this beam of atoms

is directed to fly through a strong non-uniform magnetic field. And

amazingly, what happens is that magnetic field somehow splits the beam of atoms into two beams. And it's like, what what's going on with that two beams? Why

do we have two beams? How can it be that you have one beam of atoms coming in and you have two beams going out? Well, the

key to understanding this is that a silver atom is electrically neutral.

Its 47 protons perfectly cancel out

its 47 electrons because it's just a neutral atom. It's not ionized. But if

you look at the electrons in a silver atom, you find that all of the electrons are paired up in their various orbitals, but there remains a single unpaired

electron in the 5s orbital.

And so for all of the paired electrons in the silver atom, their spins cancel each other out. But the unpaired 5s electron has a spin of 1/2 because an

electron is a spin 1/2 particle. And as

a result, it's sort of like the whole silver atom behaves like an electrically neutral spin 1/2 particle. So that

unpaired electron spin gives the whole atom a tiny magnetic moment. That is it makes the silver atom sort of like a tiny little magnet.

I should also say the nucleus of the silver atom also has a net spin of 1/2.

But because the nucleus is so tightly packed compared to the electrons, the magnetic effect of the nuclear spin is thousands of times smaller than the

magnetic effect of the electron spin. So

for all intents and purposes, it doesn't matter in this experiment.

So then what happens to the silver atoms as they're flying through this apparatus is that the initial beam is totally thermally random. I mean, you're talking

about evaporated silver atoms. There's no preferred directionality to the spin.

It's all a random distribution over the spin directions. But then as they fly

through the Stern-Gerlach magnet, for some reason the spins get projected either onto purely spin up or purely spin down.

And that's really weird because it's not this distribution of some continuous quantity. No, it's a quantum like either

up or down. There's only two options that it can be, which is super weird, right? This is a very quantum effect.

And so then if we want to say okay well these two states are going to be separated by one quantum unit then you realize that given the symmetry of the situation since both beams are deflected

by equal amounts we can say that spin up is associated with a quantity of plus 1/2 and spin down is associated with a quantity of minus 1/2. So that the

difference between plus 1/2 and minus 1/2 is one quantum unit. And so that's why we call this a spin 1/2 particle.

Okay. So we have two discrete beams. And clearly there's something weirdly quantum going on here. But what's really going on here? You know, cuz the story I just told about spin 1/2 and the

electron, it's like a little magnet and it separates out. What does that really mean? Like physically, how should we

imagine that? Well, in a moment I'll

tell you a little bit of the quantum theory and then we'll also imagine some kind of speculative hidden variable theory and we'll see that those don't really work. So, we'll get into the

theory in a moment, but for now I actually want to stick on the experimental side of things so that we can learn a little bit more about how

spin 1/2 particles actually behave.

So, imagine we do a Stern-Gerlach experiment where we have a beam of silver atoms flying through. It goes

through the Stern-Gerlach magnet and it splits into two beams, spin up and spin down. Now suppose we put a wall so that

all the spin down atoms hit the wall and they stop going. But then the spin up atoms, they can fly right through and they can keep going. And now we have a

beam of spin up atoms. So then we line it up and pass it through another Stern-Gerlach magnet that's oriented along the same axis, the same direction in space. Well, then an amazing thing

happens, which is that in the second Stern-Gerlach magnet, we only see a spin up beam. There's no spin down. And I

guess that's not too surprising. It kind

of makes sense because we start off with a random beam of silver atoms. We split that into a spin up and a spin down. And

then we remeasure and we find, okay, there's only spin up. Yeah. Okay, that's

not too mind-blowing. That kind of makes a lot of sense, right? And remember, all of this is happening in a vacuum chamber. So there's no air molecules

that the silver atoms are bumping into cuz if there were, then we could imagine the beam kind of rerandomizing. You

know, eventually the silver atoms are slamming into air molecules and getting all reoriented and all that sort of thing. So this is all happening inside a

vacuum chamber. What this two-stage

Stern-Gerlach experiment shows is that spin is a state that the atom is in, right? It's a property that persists

with the atom and has some continuity across time. So that it makes sense to

say this is a spin up atom at least for now. You know, I mean, it can bump into

something and change its spin. But

supposing it doesn't, then it can continue on in that spin- up state for some amount of time. So that's cool.

That gives us some sense of the physicality of spin. But we're still left with the mysterious question of why do we have two discrete options for a spin measurement anyway as opposed to

some continuous range of outcomes? And

how should we visualize a spin state?

Well, again, we'll talk about the theory of that in just a moment, but there's one more experimental thing I want to show you before we get there. What we're

going to do now is imagine slightly rotating the second magnet by some small angle theta. And then a magical thing

happens. The second beam now mostly

comes out as spin up. But now there's also a spin down beam as well. And it's

very subtle because all the spin up atoms that are flying through the second detector, most of them are going to come out spin up. But every now and then there is a chance that it'll come out

spin down. And so if you think about

many atoms flying through and so it's sort of like a continuous beam situation, then imagine a very bright

spin up beam and a dull but nonzero spin down beam. And so then the question

becomes what is the probability of it being spin up versus spin down in this kind of an experiment? And there's

actually a very good agreement between quantum mechanics and experimental results which show that for the atoms passing through the second magnet they have a cos²(θ/2)

probability of being spin up and likewise a sin²(θ/2) probability of being spin down. Remember that

cos² + sin² is 1. So those

probabilities add up to 100%. And

we're going to take that as sort of a ground truth for this video.

This cos²(θ/2), sin²(θ/2). We're

going to take that as an absolute fact about reality because it has been measured in many experiments and it is a pretty direct result of quantum theory.
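As a quick numerical check of the rule just stated, here is a minimal Python sketch. The function names are illustrative and not from the video; the only physics assumed is the cos²(θ/2) and sin²(θ/2) rule quoted above.

```python
import math

def spin_up_probability(theta_deg: float) -> float:
    """Chance that a spin-up atom is measured spin up by a detector tilted by theta."""
    half_angle = math.radians(theta_deg) / 2.0
    return math.cos(half_angle) ** 2

def spin_down_probability(theta_deg: float) -> float:
    """Chance that the same atom is measured spin down instead."""
    half_angle = math.radians(theta_deg) / 2.0
    return math.sin(half_angle) ** 2

# Tabulate a few tilt angles and confirm the two probabilities always sum to 1.
for theta in (0, 30, 60, 90, 180):
    p_up = spin_up_probability(theta)
    p_down = spin_down_probability(theta)
    print(f"theta={theta:3d} deg  P(up)={p_up:.3f}  P(down)={p_down:.3f}  sum={p_up + p_down:.3f}")
```

The table makes it easy to see that the spin down beam is dim for small tilts and only becomes as bright as the spin up beam at 90°.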

Oh, and one thing I should say in this diagram, you see that second beam is still horizontal even though I tilted the picture of the detector. In reality,

if you're doing an experiment like this, you would want to realign the second beam so that it comes in parallel to the detector. But there are ways of doing

that without modifying the spin state of the particle. So I just didn't show that

in this diagram because I wanted to keep things simple. Actually, let me show you

this. This is a cool, much better

diagram. So this comes from Wikipedia.

Shout outs to Clara Kate Jones for making this beautiful diagram. What this

diagram shows is a two-stage Stern-Gerlach experiment. The particle beam

comes in. You get a 50-50 split between

spin up and spin down denoted as Z plus and Z minus. You know, because we're measuring along the Z-axis.

Then we send that second beam through the second detector. The second detector appears to be tilted, but is actually just in alignment with the way the Z plus beam comes out of the first

detector.

But now I want to look at something really cool, which is what if the second detector measures along a whole different axis. So, for example, if the

second detector measures along the x-axis, the spin up particle beam goes through the second detector and then

splits into a 50/50 probability mix of being spin left or spin right. By the

way, instead of spin left and spin right, let's use the language spin up along x and spin down along x. So you

see when we say spin up and spin down, it's always with reference to a measurement axis and spin up is going to be the beam which goes up relative to that axis. Okay? So we can always use

the words spin up and spin down. But in

this experiment, you can also think about it as spin left and spin right when we're measuring along the x-axis.

I suppose this experiment is not too surprising either because we see that the particles come in spin up. We

wouldn't really expect any kind of probabilistic biases as far as spin left, spin right because all we know is that the particles are all spin up and up is perpendicular to left and right.

So it'd be kind of weird if the second particle beam had some kind of bias towards left and right, right? Like

where would that come from? We should

still expect some kind of randomness along the x direction. Okay, so that doesn't really blow your mind, but this next part will.

See, imagine we have a three-stage experiment where the particle beam comes in, the first detector splits into spin up and spin down. We send only the spin up through. Then the second detector

measures along X. So we get our spin left and our spin right or in other words along X. We can talk about it in terms of spin up and spin down along X.

And then suppose we only allow the spin up along X beam to go through. Then we

measure again along the Z-axis. And the

craziest thing happens. Look what we get. We get a 50/50 particle beam of

spin up or spin down along Z. Well, how

can that be? Because the first magnet already filtered out all of the spin down along Z. So, shouldn't we expect for the outgoing beam, we should have only spin- ups, right? Isn't that what

we should expect is only spin up along Z because the first magnet already filtered out the spin down. But no, in reality in experiments, you get a 50/50

split of spin up and spin down along Z. So what is going on there? That's very strange. And the

reason this is so strange is that we know that spin is a property of the atom. We know that it's a physical thing

that the atom carries with it as it moves along. Right? I mean, we

thought about this earlier and we realized, yeah, the Stern-Gerlach experiment shows us that spin is a state that the atom can be in and it's a property of the atom at some moment in time. And

so, how can it be that if we've filtered out the spin down along Z atoms, somehow after the third detector, we get spin down along Z? Like, what's happening there? How can spin be a conserved

quantity if it comes back like that?

Like, what's going on? Now, what I'm showing here, this is just an experimental fact. This is the reality.

And then as people, it's on us to figure out how do we tell a story that makes sense of this reality. And so in just a moment, I'm going to tell you the quantum story, which is going to explain

what's happening here. And the long story short of that is when you measure the spin along some axis, the particle forgets its spin information along the other axis because you're resetting the

spin state of the particle. You're

projecting it into a spin eigenstate of whatever axis you most recently measured it on. And so once you measure it spin

up or spin down along X, now all of a sudden if it's in a spin-up-along-X eigenstate, that has equal 50/50 odds of being measured spin up or spin down

along Z. But then of course when you

learn quantum physics you're always thinking about this is so weird and so strange and I don't like it and surely there's some kind of more classical

explanation with some kind of hidden variable. Surely there's some kind of

secret behavior happening inside the atom or to do with these detectors.

Maybe the detectors are modifying the atom in such a way as to flip them up and flip them down and kind of reset their state. All right. So when you

learn quantum physics, you yearn for a more sane explanation.
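For comparison with any hidden variable story, the quantum account of the three-stage Z, X, Z experiment described above can be sketched in a few lines of Python. This is a minimal illustration, not something shown in the video: the state names and the `probability` helper are assumptions, and the only physics used is the standard two-component representation of spin 1/2 states in the Z basis together with the Born rule.

```python
import math

# Spin 1/2 states written as (amplitude for up-along-Z, amplitude for down-along-Z).
UP_Z = (1.0, 0.0)
DOWN_Z = (0.0, 1.0)
UP_X = (1 / math.sqrt(2), 1 / math.sqrt(2))
DOWN_X = (1 / math.sqrt(2), -1 / math.sqrt(2))

def probability(outcome, state):
    """Born rule: squared magnitude of the overlap between outcome and state."""
    overlap = sum(complex(o).conjugate() * s for o, s in zip(outcome, state))
    return abs(overlap) ** 2

# Stage 1 keeps only spin up along Z, so the beam is in state UP_Z.
# Stage 2 measures along X: the beam splits 50/50.
p_up_x = probability(UP_X, UP_Z)
p_down_x = probability(DOWN_X, UP_Z)

# Keeping only the up-along-X beam leaves every atom in state UP_X.
# Stage 3 measures along Z again: spin down along Z reappears.
p_up_z_again = probability(UP_Z, UP_X)
p_down_z_again = probability(DOWN_Z, UP_X)

print(f"stage 2: P(up X)={p_up_x:.2f}  P(down X)={p_down_x:.2f}")
print(f"stage 3: P(up Z)={p_up_z_again:.2f}  P(down Z)={p_down_z_again:.2f}")
```

The point of the sketch is that once the state has been replaced by `UP_X`, nothing in it remembers that the atom was previously spin up along Z.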

And especially, you know what would be really nice is if we didn't have all these weird quantum probabilities, right? So wouldn't it be cool if we can

come up with some kind of explanation for what's going on in the Stern-Gerlach experiment, but rather than this confusing quantum story with wave functions and states, what if we can

come up with some kind of more classical deterministic model of what's going on here? Even though such models don't

work, it's still very helpful to give it a try, see what we can come up with, and then when we figure out the way in which the model doesn't work, that'll help us

appreciate why we need quantum mechanics, even though it's super weird.

And seeing the failure of these local hidden variable models is going to segue very nicely into the core argument of Bell's paper. All right. So, I want to

return to this picture of the two-stage Stern-Gerlach experiment where we use the first magnet just to filter out the spin down atoms and give us a beam of

nice pure spin up atoms. Then, we're going to send those through a second detector tilted relative to the first by an angle of theta. And as we talked about earlier, the probability of the

atom being spin up in the second detector is going to be cos²(θ/2). In this plot, we put the theta angle along the x-axis and we put

the percentage probability that it'll be spin up on the y-axis.

So on the far left of this plot, you can see that we have a 100% chance of measuring spin up when the second detector is tilted 0° relative to the first. That is when they're in

alignment. A spin up coming in is always

a spin up going out. On the opposite extreme, if you imagine we put the second detector all the way upside down,

180 degrees tilted, then relative to that orientation, the detector is going to say, "Hey, every particle is spin down."

And now that's not too surprising because all that is is we're flipping the second detector around. So what was defined as spin up is now relative to the second detector spin down. And so

really, we don't have to think about an angle of all the way up to 180° because the interesting stuff happens with a tilt angle between 0 and 90°. And beyond

that point, there's a kind of symmetry where it's the same thing, but it's just everything's flipped relative to before.

And speaking of 90°, if we tilted the second detector 90°, then we'd have a 50/50 chance of an incoming spin up atom going out as either spin up or spin

down.

Here's an animation, and this will give us a more dynamic picture of what's going on here. So, we have our incoming beam of silver atoms coming in from the left. They go through the first

detector. We split out spin up, spin

down. The spin-ups keep going. And on

the right, what I'm showing here, and this is just a rectangle, so it's kind of abstract, but all I mean to indicate there is we're doing a spin measurement along the axis symbolized by the

orientation of that rectangle.

And as the rectangle goes back and forth, you can kind of get a feel for how the relative probability of measuring spin up and spin down along that second measurement axis changes as

a function of the angle.

On one extreme, when the detectors are aligned, spin up is always spin up. On

the other hand, when the detector is 90°, we get a 50/50 split. And in

between, we get a probability which goes with this cos²(θ/2) curve.
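For reference, the curve follows from a short projection calculation. The sketch below uses standard Dirac notation, which the video has not introduced at this point, and assumes the second detector's axis is tilted by θ away from Z within the X-Z plane.

```latex
% Spin-up and spin-down states along an axis tilted by \theta from z
% (within the x-z plane), expressed in the z basis:
\[
|{\uparrow_\theta}\rangle
  = \cos\tfrac{\theta}{2}\,|{\uparrow_z}\rangle
  + \sin\tfrac{\theta}{2}\,|{\downarrow_z}\rangle,
\qquad
|{\downarrow_\theta}\rangle
  = -\sin\tfrac{\theta}{2}\,|{\uparrow_z}\rangle
  + \cos\tfrac{\theta}{2}\,|{\downarrow_z}\rangle .
\]
% Born rule for an incoming atom that is spin up along z:
\[
P(\uparrow_\theta)
  = \bigl|\langle \uparrow_\theta | \uparrow_z \rangle\bigr|^2
  = \cos^2\tfrac{\theta}{2},
\qquad
P(\downarrow_\theta)
  = \bigl|\langle \downarrow_\theta | \uparrow_z \rangle\bigr|^2
  = \sin^2\tfrac{\theta}{2}.
\]
```

Setting θ to 0°, 90°, and 180° gives spin up probabilities of 1, 1/2, and 0, matching the three cases described for the plot.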

Now this equation, the cos²(θ/2), comes from the spinor math of what happens when you project a spin state relative to one axis onto another

axis. But all of that spinor math and

projection and all that, that's the weird quantum stuff we don't want to have to deal with if we don't have to. So when

we're trying to come up with a hidden variable explanation, we want to think in terms of some kind of quantity that we can attach to each particle. Maybe

some kind of arrow that indicates some sort of direction. And you know, one of the first things that comes to mind when you think about the Stern-Gerlach

experiment is maybe each incoming atom has some kind of vector-like directional quantity associated with it and then maybe the detector sort of flips that

vector up or down as the particle passes through.

Now, I'm not saying that's the case. I'm

just saying that's kind of something that we might instinctively or intuitively think might be the case. And

so let's go ahead and test our intuition against logic and reason and see if it actually holds up. So what I'm showing here is an animation where we have these

atoms coming in and there's a yellow vector associated with each one of them which encodes some sort of orientational direction like thing that goes with the

atom. And so for the sake of argument,

we can say our incoming beam should have a random distribution over those vector angles because these are evaporated silver atoms and it's all thermally

random. Then suppose we claim that what

a Stern-Gerlach magnet does is it's going to flip that arrow either up or down.

And then if it flips it up, it sends it upwards. If it flips it down, it sends

it downwards.

Well, at first glance, an explanation like this seems like it could possibly be kind of what's going on here. This is

a model where the Stern-Gerlach magnet plays a really active role in aligning the particle a certain way. And whether or not it flips up or flips down, we can

say the rule there is just if the vector is pointing even a little bit up, it goes up. If it's pointing even a little

bit down, it goes down. If it's pointing perfectly horizontal, well, in reality, nothing's perfectly horizontal. There's

probability zero of that happening. And

even if it did happen, it happens so rarely you'd never even notice.

You know, the cool thing about physics is that you can put an idea forward and you can really propose it like, hey, maybe this is how it is. But one of the rules of physics is you have to stick to

whatever principles you propose. But

then if you can show that your own principle leads to a contradiction, well then sorry, but you have to redesign your model. Okay. So what I want to show now is that this assumption that the Stern-Gerlach magnet flips the atom up or down is actually not consistent with the experimental data. And the

reason is actually very simple and you can totally see it, which is that if you have a two-stage Stern-Gerlach experiment where the second detector is tilted, we know from the experimental data that some of the particles should sometimes come out spin down even if

they went in as spin up.

But if we tilt the detector anywhere between 0° and all the way up to 89.9°, then by this rule that the Stern-Gerlach magnet is going to flip the particle in

whichever way it was already kind of pointing, we see that an incoming beam of spin up is always going to come out spin up.

And so right there you see that this model doesn't actually work. By our own principle that we put forward about these arrows getting flipped up or flipped down and all that, it just doesn't match the two-stage Stern-Gerlach experiment.

And so whatever is going on with spin, it's not that. It's something else.
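Here's a minimal sketch of that failed "active flip" model, just to make the contradiction concrete. It assumes a two-dimensional picture where each arrow is an angle measured from vertical, and the names `flip_magnet` and `fraction_up_after_two_stages` are made up for illustration.

```python
import math
import random

def flip_magnet(arrow, axis):
    """Hypothetical 'active flip' magnet: snap the arrow onto +axis or -axis.

    Both arguments are angles in radians measured from vertical.
    Returns (outcome, new_arrow), where outcome is +1 (up) or -1 (down).
    """
    along = math.cos(arrow - axis)      # component of the arrow along the axis
    if along > 0:
        return +1, axis                 # flipped up: now points along the axis
    return -1, axis + math.pi           # flipped down: now points against it

def fraction_up_after_two_stages(tilt, n=10_000, seed=0):
    """Fraction of the first magnet's spin-up beam that exits the tilted magnet spin up."""
    rng = random.Random(seed)
    kept = ups = 0
    for _ in range(n):
        arrow = rng.uniform(0.0, 2.0 * math.pi)    # thermally random arrow
        outcome, arrow = flip_magnet(arrow, 0.0)   # first magnet, vertical
        if outcome != +1:
            continue                               # block the spin-down beam
        kept += 1
        outcome, _ = flip_magnet(arrow, tilt)      # second magnet, tilted
        if outcome == +1:
            ups += 1
    return ups / kept

if __name__ == "__main__":
    tilt = math.radians(60)
    print("flip model:", fraction_up_after_two_stages(tilt))
    print("experiment:", math.cos(tilt / 2) ** 2)
```

For any tilt below 90° this toy model never produces a spin down at the second magnet, whereas the measured probability of spin up is cos²(θ/2).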

So what do we do? Well, just because our model didn't work doesn't mean we can't massage it into something that might work.

So let's go ahead and see if we can massage our model into something which matches the experimental data at least better than our first attempt which kind of matched the data in the case of one

Stern-Gerlach magnet but failed miserably when we had two and the second one was tilted. Well, okay. So what if we did this? Let's say that a Stern-Gerlach magnet doesn't actually flip the particle up or down, right? Because if

it does that, then as we've seen, the second detector is just going to give us a bunch of spin ups and no spin downs.

So let's say instead of flipping the arrow up or down, the Stern-Gerlach magnet just kind of passively sorts these particles based on whether their vector points a little bit up or a

little bit down.

And so any vector that points even a little bit up, that gets sent towards the up beam. And any vector that points a little bit down, that atom goes in the down beam. But the Stern-Gerlach magnet doesn't change the direction of that vector.

So maybe this vector represents a kind of classical spin axis. Then in this model, the angular momentum of the particle would be conserved as it passes

through the detector. But somehow and for some reason, the detector is just sorting the incoming particles into two beams depending on whether they're a little bit up or a little bit down.

Well, you know, there's a problem with this model, which is that philosophically, it's starting to feel a bit contrived because it's hard to reconcile the fact that we see two

discrete beams with such a passive thing going on at the detector.

Because at least before when we thought that maybe the magnet just flips the thing up or flips the thing down, there you have kind of a naturally physically dichotomous situation where yeah, it's a

sort, but then it's also an action where the particles are really separated out in a binary way.

So if you have a more passive situation where it's just a sort, you kind of have to wonder, well then how is it that we get two sharp beams? But never mind all that because even though it seems

implausible, that's different than it being illogical or impossible or incoherent. You know, nature is weird.

So maybe this is how it is. But now if we take this model and pass it through a second Stern-Gerlach magnet, the question comes up: does this model match the data? In particular, do we find a cos²(θ/2) probability of an incoming spin up remaining spin up versus a sin²(θ/2) probability of it going spin down? Well, if you just look at the animation shown here, you can see that at first glance it kind of does seem to work because when the second detector is

not tilted at all, anything coming in spin up is going to go out spin up. So

that's good. At theta equals 0, this model matches experiment.

And then if you imagine at 90°, well, there it's a 50/50 because coming in the spin up beam, that's just going to be a vector that's pointing up a little bit, but the distribution is totally random

as far as left and right. And so when the detector is tilted at 90°, that could go either way at that point, you know. And so there again, we find another angle at which our model matches the data. And another wonderful thing about this model is that for intermediate angles, it kind of seems like it would fit the data. You know, if you tilt the detector like 45°, you can

see there's kind of a chance that it would be spin down versus spin up. And

so at first, this feels very exciting and very promising.

But when you think through it carefully, you realize that this model actually doesn't quite match the cosine squared statistics that we get from the

experiment and from quantum physics because instead of a cosine squared function, it's actually just a linear function in theta. And that's actually a very important point. So I want to

linger on that for a moment and I want to see exactly why this model gives us a probability which is linear in theta. So

you think about the fact that we have evaporated silver atoms coming in and presumably they're all going to be randomly oriented. And so if we want to come up with a picture that involves this hidden variable of an orientational vector-like degree of freedom, call it lambda, then the situation we're

describing here begins with lambda vectors chosen totally at random as far as their direction is concerned. And if

you like, you can imagine lambda as being selected uniformly from the unit circle, or if you want to be fully three-dimensional, the unit sphere.

Although, as we're about to see, it actually really doesn't matter whether we think about it in terms of a two-dimensional situation or a three-dimensional situation. In either case, we find the same linear trend.

All right, then. So, the particle passes through the first Stern-Gerlach magnet and all of these vectors lambda that were pointing a little bit downwards get filtered out. They go in the spin down beam and we block that. But then if the vector is pointing even a little bit up, then it keeps passing through and then it moves on to the next Stern-Gerlach detector.

So let's go ahead and use the vector P to symbolize the polarization vector, that is, the axis of measurement for the first Stern-Gerlach magnet. You see here based on the diagram that all of the

particles that have made it through our filter are all going to be measured spin up if they're measured again perfectly along the direction P with no tilt angle.

And so that's what it means experimentally to prepare some spin 1/2 particles, some fermions, with the spin polarization along the vector P. It

means that for sure we know if we measure the spin along P we're going to get spin up.

Now then what can we say about that hidden variable vector lambda? Well, we

can say that the particles that are allowed through necessarily have lambda which is somewhere in the northern hemisphere, that is, the hemisphere that points in the same kind of direction as the polarization vector P. Or in other words, these are the lambda such that λ·P is greater than zero. And the lambda are still going to be uniformly distributed around that hemisphere because they came in uniformly distributed around the sphere and we've just cut it in half. So now we want to

ask the question of what is the probability of a particle with some lambda vector being measured spin up in the second detector which would happen

in our local hidden variable model if λ·A is greater than zero. That is, if the lambda vector happens to be pointing in the same hemisphere as the measurement axis A. And when you think about it, you realize that the probability of lambda measuring spin up

depends on the overlap of the lambda hemisphere and the a hemisphere.

See, cuz if we draw A and then we think about the hemisphere of vectors that point in kind of the same direction as A, that is, for which the dot product with A is positive, you realize that the set of

all lambdas which are going to be measured spin up is precisely the overlap between the lambda hemisphere and the a hemisphere. And given that lambda is going to have a uniform

probability distribution, we can see then that the probability of measuring spin up is just going to be the fraction of the lambda hemisphere that overlaps with A. And the probability of it measuring spin down is going to be the fraction of lambda's hemisphere that does not overlap with A. And if you see that, then you see one of the core

concepts of Bell's paper. We're going to describe this slightly differently in a moment when we get into the paper and it's going to be a little bit more complicated, but this right here is a very fundamental insight. Imagining

rotating hemispheres and seeing how the overlap varies linearly. That is a mental image that you want to keep in mind as we get into parts three and four of the paper. All right, then. So, just

to be really formal about this, let's go ahead and say that theta is the tilt angle between our polarization vector P and our measurement axis vector A. And

then I want you to go ahead and imagine rotating theta from 0 to pi or 180 if you want to talk in terms of degrees.

Well, when you start off with theta equals 0, P and A are aligned the same way. And there's a complete overlap between the lambda hemisphere and the A hemisphere. And so you have a 100% chance, a guaranteed chance, that when theta is zero, you're going to measure the particle spin up. But now imagine theta growing and growing until theta

equals 90° or π/2 radians. Well,

at that point you're going to have a 50/50 overlap between the lambda hemisphere and the a hemisphere. And so

then you're going to have a 50/50 chance of measuring spin up versus spin down.

And then if you go ahead and flip it all the way around to 180°, so that A and P are perfectly antiparallel, then it'll be guaranteed that you'll measure spin down

for a theta of 180°. Bearing in mind that spin down is relative to that upside down vector a. Now these three points for which theta is 0, theta is 90

and theta is 180° all of those actually do match the experimental data and quantum mechanics. So that's all good.

But what's not all good is that linear dependence of the probability of measuring spin up on the angle theta. And you can see that linear dependence just based on the way the area fraction changes as you slide theta around and you change the overlap between these two hemispheres.

You know, one way to think about the probability logic here is just imagine you're playing one of those board games that has the spinner thing and you spin the thing and then the probability that it lands on some wedge is just going to

be the wedge area. Well, yeah. So when

you think about that kind of logic and then you think about the wedge area of the overlap between the hemispheres and the way it changes you can see that the probability is indeed linear in theta.
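To see that linearity numerically, here's a small Monte Carlo sketch of the passive-sort model on the unit sphere. It assumes lambda is uniform on the sphere, the first magnet keeps λ·P > 0 with P along z, and the second magnet reports spin up when λ·A > 0. The function names are made up for illustration.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform random direction: normalize three independent Gaussians."""
    while True:
        x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        r = math.sqrt(x * x + y * y + z * z)
        if r > 1e-12:
            return (x / r, y / r, z / r)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def passive_sort_prob_up(theta, n=100_000, seed=1):
    """Estimate P(spin up at second magnet) for the passive-sort model."""
    rng = random.Random(seed)
    p = (0.0, 0.0, 1.0)                             # first magnet's axis
    a = (math.sin(theta), 0.0, math.cos(theta))     # second axis, tilted by theta
    kept = ups = 0
    for _ in range(n):
        lam = random_unit_vector(rng)
        if dot(lam, p) <= 0:
            continue                                # blocked spin-down beam
        kept += 1
        if dot(lam, a) > 0:
            ups += 1                                # lambda lies in both hemispheres
    return ups / kept

if __name__ == "__main__":
    for deg in (0, 45, 90, 135, 180):
        theta = math.radians(deg)
        print(deg,
              round(passive_sort_prob_up(theta), 3),    # simulated model
              round(1 - theta / math.pi, 3),            # linear overlap fraction
              round(math.cos(theta / 2) ** 2, 3))       # quantum mechanics
```

The simulated fraction should track the linear overlap 1 − θ/π, agreeing with cos²(θ/2) only at 0°, 90° and 180°.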

But now that linearity is actually a real problem because from experiments and from quantum mechanics we can very confidently say that the probability of

measuring the particle spin up is not linear in the tilt angle theta, but rather it's cos²(θ/2). And that fact, that cosine squared curvy fact, makes our linear model very hard to believe because the math is wrong. The statistical predictions of our model are not the true statistics of the situation.
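As a quick worked comparison, here are the two predictions side by side at a tilt of 45°. The subscripted labels are just names chosen here for the passive-sort model and for quantum mechanics.

```latex
\begin{align}
P_{\text{model}}(\uparrow) &= 1 - \frac{\theta}{\pi}, &
P_{\text{QM}}(\uparrow) &= \cos^2\!\left(\frac{\theta}{2}\right), \\
\theta = \frac{\pi}{4}: \quad
P_{\text{model}}(\uparrow) &= \frac{3}{4} = 0.75, &
P_{\text{QM}}(\uparrow) &= \frac{1}{2}\left(1 + \frac{\sqrt{2}}{2}\right) \approx 0.854 .
\end{align}
```

So at 45° the linear model undercounts spin up by roughly ten percentage points.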

So what do we do? Do we just give up? Well,

we actually should give up because, as we'll see, the whole paper is about how local hidden variable models don't work. But let's not give up yet. Let's be very stubborn, okay?

Because technically there is a way that we can fix this particular model for this particular situation.

And the way in which we do that is going to involve a concept which we'll see later on in the paper. So we're going to try to save this model somehow. And the

way that we're going to try to do that is going to be illustrative and teach us something about the situation. Even

though ultimately this fix is going to break down when we later on start looking at quantum entanglement.

All right. Then so the way to fix the model is to define an effective measurement axis. Call that A′, and define it as the measurement axis A tilted towards the polarization vector P such that the equation 1 − 2θ′/π = cos θ is satisfied. Now

here by θ′ I mean the tilt angle between the polarization vector P and the effective measurement axis A′, which has been magically tilted in towards the polarization vector P. And when you look at this equation here, the left-hand side with the 1 − 2θ′/π is linear in θ′. And then you look on the right-hand side and that's a cosine.

Now this equation here, it's not immediately obvious what this has to do with cos²(θ/2). In just a minute though, we're going to talk about expectation values and cosine of theta.

And then when we come back to this equation later on in the paper, it'll make more sense why exactly it has the form that it does. But I don't want to get into that just now because it's a bit of a tangent. For now, all I want to

say is that this equation involving theta prime and theta is going to warp the probability dependence of our model, which is linear in theta, into the cos²(θ/2) curve that we expect from quantum mechanics. And in fact, that is the definition of where this theta prime and theta equation comes from. So this trick is actually a lot simpler than it seems

because when you think about what we have here, as we've seen, our model works when theta is 0, when theta is 90°, when theta is 180°, but it breaks down in between because we have a line

instead of a cosine squared. And so all this trick is, is just saying that we can go ahead and warp that line into that cosine squared curve simply by saying that the effective measurement axis that

the particle is actually being measured along is not the A that we thought it was, but is actually this A tilted slightly towards the polarization vector P. And by doing that we can go ahead and bend the statistical predictions of our model in such a way as to make it match the experimental data and also quantum mechanics.
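Here's a minimal sketch of that warping, assuming the linear overlap probability 1 − θ/π from the hemisphere picture and solving 1 − 2θ′/π = cos θ for θ′. The function names are made up for illustration.

```python
import math

def effective_tilt(theta):
    """Solve 1 - 2*theta_prime/pi = cos(theta) for theta_prime (radians)."""
    return (math.pi / 2.0) * (1.0 - math.cos(theta))

def linear_model_prob_up(angle):
    """Hemisphere-overlap probability of spin up at a given tilt angle."""
    return 1.0 - angle / math.pi

def quantum_prob_up(theta):
    """Quantum mechanical probability of spin up at tilt angle theta."""
    return math.cos(theta / 2.0) ** 2

if __name__ == "__main__":
    for deg in (0, 30, 45, 60, 90, 180):
        theta = math.radians(deg)
        theta_prime = effective_tilt(theta)
        print(deg,
              round(math.degrees(theta_prime), 1),
              round(linear_model_prob_up(theta_prime), 4),
              round(quantum_prob_up(theta), 4))
```

Feeding the effective tilt into the linear model gives 1 − θ′/π = (1 + cos θ)/2, which is the same thing as cos²(θ/2). For a 45° tilt the effective angle comes out near 26°, which is the sense in which the axis gets pulled in towards P.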

Now, the first time you hear this, I mean, you should be thinking, "Rich, come on now. What? This is absurd. We

should not tolerate this. We should not go along with this." Your eyebrow should raise skeptically to the point where your forehead starts to get sore. Like,

there's just no credible way to justify this move, this little trick that we're doing. And so, for that reason, I want to go ahead and call this the sketchy move. I know it's kind of a playful terminology, but there's a couple of good reasons why we want to call it this. First of all, it's a concept that we're going to see a couple more times throughout the paper. And then secondly, I want to emphasize that this move is not illegal. It's not logically impossible. Technically, it doesn't violate locality. There's nothing physically impossible going on when we put forward this model. But it's

extremely sketchy and hard to believe because it raises so many questions. Why should the effective measurement axis be A′? And also, how is it then that we have the polarization vector and also our hidden variable lambda vector that we both have to take into account?

Because the polarization vector bends the effective measurement axis. Then we

also have this lambda vector and what's going on there? And our whole model starts to become complicated and contrived and very very hard to believe.

But we're not going to dismiss it just yet, because later when we think about quantum entanglement, we're going to prove that even the sketchy move is no longer enough to save our model or any local hidden variable model. And that's

really at the heart of Bell's theorem.

So in summary, by going along with the sketchy move for now, we're being maximally open-minded, we're giving the local hidden variable perspective every benefit of the doubt. So that later on

when we absolutely destroy local hidden variables, when we crush this idea, we'll say, "Look, we even allowed the sketchy move and that still wasn't enough to make it work."

Now, I want to take just a moment to talk about the kind of mathematical vocabulary we use in quantum physics when we're describing measuring the spin

of a spin 1/2 particle along some direction, call it A. And to do that you often see this expression σ·A. Let

me tell you what that is. So we have the famous Pauli matrices, which, written row by row, are σx = [[0, 1], [1, 0]], σy = [[0, −i], [i, 0]], and σz = [[1, 0], [0, −1]].

And you can find the definition of these Pauli matrices in Griffiths' Introduction to Elementary Particles, equation 4.26.

Although honestly if you just Google Pauli matrices you'll find them all over the place. They're super famous. And these Pauli matrices are generators of su(2), the Lie algebra of SU(2), which is the group that has to do with transformations of two-component spinors. It's the special unitary group of degree 2. Anyway, today we don't need to get into the group theory of SU(2), but I just bring up the Pauli matrices in a sort of vocabulary-like context. Like

we're not actually going to have to explore their mathematical properties, but I just want to show you why it is that these matrices are associated with measuring the spin of a spin 1/2 particle.

You often see sigma with an arrow over it. And you can think of that as a vector whose components are the three Pauli matrices. So you have σx, σy, σz all packaged into this vector-like quantity. And with that sigma vector, we can go ahead and define the spin operator along the unit vector

A as Ŝ = (ħ/2) σ·A.

And what we mean by σ·A is we're going to multiply each of the components of our measurement direction A with the corresponding Pauli matrix. So

we have A_x σx + A_y σy + A_z σz. So when you pick out a particular direction in three-dimensional space and you want to

measure the spin of a particle along that direction, the components of that direction unit vector are like weights of how much of each of the Pauli matrices

we're going to bake into our spin operator along that direction.
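Here's a minimal sketch of building σ·A as a plain 2×2 matrix, using nested lists of Python complex numbers and no external libraries. The helper names `sigma_dot`, `matmul` and `unit_vector` are made up for illustration.

```python
import math

# The three Pauli matrices, written row by row.
SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]
SIGMA_Z = [[1, 0], [0, -1]]

def sigma_dot(a):
    """Return sigma . a = a_x*sigma_x + a_y*sigma_y + a_z*sigma_z as a 2x2 list."""
    ax, ay, az = a
    return [[ax * SIGMA_X[i][j] + ay * SIGMA_Y[i][j] + az * SIGMA_Z[i][j]
             for j in range(2)]
            for i in range(2)]

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def unit_vector(theta, phi):
    """Unit vector at polar angle theta and azimuth phi (radians)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

if __name__ == "__main__":
    print(sigma_dot((0, 0, 1)))            # purely sigma_z
    m = sigma_dot(unit_vector(0.7, 1.9))
    print(matmul(m, m))                    # close to the identity matrix
```

For any unit vector A the matrix σ·A squares to the identity, which is one way to see that its only possible eigenvalues are +1 and −1.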

Now why do we care about a spin operator?

Well, as we talked about in the EPR paper, when you have an observable quantity like spin, the value of the quantity is going to be the eigenvalue

corresponding to the eigenstate of the operator. So if we have a spin 1/2 particle and its state is represented by the two-component spinor s, then the

spin operator acts on s as Ŝ s = (ħ/2) (σ·A) s.

And bear in mind σ·A is going to be a 2×2 matrix. In fact, if you want to think about it in terms of the Lie algebra su(2), that matrix is going

to live at the coordinates (A_x, A_y, A_z) within the Lie algebra, which is spanned by the Pauli matrices σx, σy, σz. If that makes sense, great. If it doesn't, don't worry about it. That's a level of group theory that we don't have to get into today.

Instead, I want to give you a specific example of what it means for a particle to be in an eigenstate of the spin operator.

So if a particle has definite spin, that is, we've measured the spin and it's either spin up or spin down along some axis, then it is going to be in an eigenstate of the spin operator along that axis. That's what the measurement does.

You measure the spin of a particle and you're projecting its wave function onto an eigenstate of the spin operator along that axis. And so therefore s is going to be a solution to the equation Ŝ s = λ s for some real value λ, which is going to be

the spin of the particle.

As a concrete example let's suppose we're measuring the spin of a particle along the z-axis.

Well in that case our direction vector becomes (0, 0, 1) cuz the vector doesn't point in x, it doesn't point in y, it points entirely in z. And so therefore

if we evaluate this quantity σ·A we find that we have no σx, no σy, and all σz. And so then our spin

operator along the z direction becomes (ħ/2) [[1, 0], [0, −1]].

And so now if we want to solve for the eigenstates of spin up and spin down along z, all we have to do is solve this equation (ħ/2) σz s = λ s for some real eigenvalue λ. And this eigenvector-eigenvalue equation has the

solutions (1, 0) or (0, 1) for s, and then you find eigenvalues of +ħ/2 and −ħ/2 respectively. And you

can verify that for yourself if you plug into that eigenvector-eigenvalue equation these different options for s and λ.

Oh, and one other thing I'll say is that for these eigenvectors, you can go ahead and slap a complex phase factor onto both components and they remain eigenstates.

And in a moment, I'll show you a picture which makes that point obvious. But for

now, I'll just leave that as a mathematical algebraic statement. All right.

Now, instead of the spin operator Ŝ, we may as well just talk in terms of σ·A, which conceptually is exactly the same thing as Ŝ. The only

difference is it's not scaled by that factor of ħ/2. And so therefore, this sigma operator has nice dimensionless values of plus or minus

one for spin up versus spin down. And so

therefore the sentence "the particle was measured spin up along the A axis" can be said as "measuring σ·A yielded a value of +1." Or if you

want to say the particle was measured spin down along the A axis, we can say σ·A yielded a value of −1.

Or if you want to say the particle was measured spin up along the B axis, you say σ·B yielded a value of +1.

Right? So what we have here is a very concise and mathematical way of saying that a spin 1/2 particle was measured along some axis and the result of that

measurement is simply the eigenvalue +1 or −1.

So in Bell's paper, he's going to use this a lot. And so that's why I wanted to show you where σ·A comes from and what it means. And we don't really have

to get too deep today into the theory of SU(2) and spinors and Pauli matrices and all that. So if you're not super familiar with all of these algebraic details, that's actually totally fine.

For the purpose of understanding Bell's paper, you really just have to know from a vocabulary point of view that σ·A

means measuring the particle's spin along the A axis and that the results are going to be +1 or −1 depending on whether it turns out to be spin up or spin down respectively.

Before we move on, I do want to give you just a couple more examples of this concept just to make the idea a little bit more intuitive, a little bit more familiar. So suppose we had measured instead of along z along the x direction. Well then we find that the spin operator along x is going to be (ħ/2) σx. And when you think about what are the solutions to (ħ/2) σx s = λ s, you find the eigenstates (1/√2)(1, ±1) corresponding to eigenvalues of

±ħ/2. That is to say, we find the same exact kind of situation as before when we measured along z as far as the eigenvalues. You have two options,

spin up or spin down. The magnitude of the observable is ħ/2. But

now you have this spinor that's in a different state. It's pointing in a different direction. And by the way, the 1/√2, that's just a normalization constant. And likewise, we can repeat exactly the same procedure.

We can measure along y. We find that the spin operator along the y direction is (ħ/2) σy. You solve that eigenvector-eigenvalue equation and you find the eigenstates

(1/√2)(1, ±i) with the same old eigenvalues of ±ħ/2.
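If you'd like to check those eigenstates without doing the algebra by hand, here's a small sketch that applies each Pauli matrix to its claimed eigenvectors. It works in units of ħ/2, so the eigenvalues are ±1, and the helper names are made up for illustration.

```python
import cmath
import math

SIGMA = {
    "x": [[0, 1], [1, 0]],
    "y": [[0, -1j], [1j, 0]],
    "z": [[1, 0], [0, -1]],
}

R = 1 / math.sqrt(2)   # normalization constant

# (axis, eigenvalue in units of hbar/2) -> claimed eigenstate
EIGENSTATES = {
    ("z", +1): [1, 0],
    ("z", -1): [0, 1],
    ("x", +1): [R, R],
    ("x", -1): [R, -R],
    ("y", +1): [R, 1j * R],
    ("y", -1): [R, -1j * R],
}

def apply(m, s):
    """Multiply a 2x2 matrix by a two-component spinor."""
    return [m[0][0] * s[0] + m[0][1] * s[1],
            m[1][0] * s[0] + m[1][1] * s[1]]

def is_eigenstate(m, s, eigenvalue, tol=1e-12):
    """True if m s equals eigenvalue * s, component by component."""
    out = apply(m, s)
    return all(abs(out[i] - eigenvalue * s[i]) < tol for i in range(2))

def with_phase(s, phi):
    """Multiply both components by the same phase factor exp(i*phi)."""
    phase = cmath.exp(1j * phi)
    return [phase * s[0], phase * s[1]]

if __name__ == "__main__":
    for (axis, value), state in EIGENSTATES.items():
        print(axis, value,
              is_eigenstate(SIGMA[axis], state, value),
              is_eigenstate(SIGMA[axis], with_phase(state, 0.8), value))
```

The second check in each row illustrates the earlier remark that an overall complex phase on both components leaves you with an eigenstate of the same eigenvalue.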

And I know all of this feels very abstract, but there is a visual story that goes with this algebra. And I've

touched on it in my previous videos about the mystery of spinors and electromagnetism as a gauge theory and also deriving the Dirac equation, where there's a way of drawing a two-component

spinor as a flag in three dimensions.

So for example, let's take the eigenstate for a particle that's in a spin up state relative to the z-axis, that is,

the spinor (1, 0). Well, if we plot that using this flag picture diagram and we'll go ahead and slap on a time evolution phase factor corresponding to the energy of the particle, we see that

we have a flag that points straight up along z. And then the time evolution phase factor, that is, the rotation in the complex plane, is going to twirl that flag around.

If you're curious as to the algebraic machinery that's happening behind the scenes, definitely check out the paper "An Introduction to Spinors" by Andrew Steane. That paper explains in depth how exactly the two-component spinors map onto these flag diagrams. But now then if we plot the spin down along z spinor,

that is (0, 1), you see hey it's a flag that's pointing down along z. So that

makes sense. And now notice the time evolution phase factor, which rotates the flag in the complex plane, has the effect of twirling the flag but in the opposite way as before. Although really it's the same way. It's just that the flag is pointing in the opposite direction. The

way to see this is point your right thumb along the direction that the flag pole is pointing and then you find that the phase factor is going to twirl the

flag in the same way that your fingers go around on your right hand.

So we find in these spinors a picture of a thing, of some kind of quantity that has an orientation and that kind of spins around under a complex phase time evolution. And so that gives you a feel for some of the algebraic machinery that's happening behind the scenes when we talk about spinors and Pauli matrices and all of that.

And so now I want you to imagine in your mind what would the eigenstate of spin up along the x-axis look like?

Well, there it is. Makes sense, right?

So, this is (1/√2)(1, 1) with the time evolution phase factor. We can go ahead and also add on the spin down along x eigenstate. And that's exactly as you would expect. Now, let's also add in the spin up along y eigenstate. And there it is, pointing along y, spinning around. And if

you add in the spin down along y eigenstate, well then there it is.

So without going into too too much detail about the algebra of spinors and all that, I just wanted to show you that there is a picture corresponding to all of this algebra. And that's something

that I would definitely encourage you to read more about and to explore. But for

the purposes of Bell's paper, we actually don't need to get too into the details there. But I hope this has been useful context.

All right. So before returning to the paper, I want to say a couple of words about the concept of the expectation value of these spin measurements cuz we're going to see that concept later on

in the paper. So remember earlier we were looking at the slide shown here and we thought about how if we rotate the second magnet by an angle theta for a particle beam, which we know is going to

be spin up if we measure it vertically, then the beam is going to split into two beams. And for a small angle theta, it's going to be mostly spin up. But there's

some probability of that also being spin down. And then as we talked about before, the probability of spin up is going to be cos²(θ/2), where θ is that tilt angle. And likewise, the probability of it being spin down is

going to be 1 minus that. So we're going to have sin²(θ/2). And

that's all fine and good and that's totally true and that's one way to talk about it. But there's another way we can talk about it, in terms of expectation value, which is in some ways more convenient.

So to be really technical about this, suppose we go ahead and call the second magnet's axis the vector A, and then as we talked about we can use the notation

σ·A as a shorthand for the result of measuring the spin along the axis A.

Because as you know when you dot the sigma vector comprised of the Pauli matrices with some unit vector A, you end up with something that's directly proportional to the spin operator but

which has eigenvalues of +1 if the particle is measured spin up and −1 if the particle is measured spin down. So then now we ask the question of what is the expectation value of σ·A, and all we mean by expectation value is the average over

many measurements holding the A vector constant. Let me give you an analogy.

Let's say you're a gambler and somehow you have the opportunity to play a game where you have a 60% chance of winning a dollar and a 40% chance of losing a dollar. Well, in that case, the expectation value is going to be 20 cents, because you have 0.6 × 1, which is 0.6, and then you add on to that 0.4 × (-1), which is -0.4, and so you have a net expectation value of 0.2, a profit, and so you should play that game.

Now, the reason I bring up this analogy is because, of course, if you play the game once, you're not going to get 20 cents. You're either going to make a dollar or you're going to lose a dollar. So we should not expect one game to yield 20 cents.

However, if you play that game 100 times, you're going to have about 20 bucks. That's what you should expect to have. And that's exactly the sense in which we use the term expectation value when thinking about these spin measurements. In every case, when you measure the spin, it's going to be a +1 or a -1. But depending on the tilt angle, and on the probabilities that depend on the tilt angle, there's going to be some average number that we'll find for that tilt angle over many subsequent measurements along that axis.

And if you work out the math, as we'll do in just a moment, you end up with the plot shown here, where on the x-axis we have the tilt angle theta and the curve shows the expectation value. By the way, we use the bracket notation here to indicate expectation value. As a sanity check, let's go ahead and look at a few points and see if this curve makes sense.

So first of all, when theta is zero and a is aligned with the polarization of those incoming spin-up atoms, we find an expectation value of one. That makes sense, because when the second detector is not tilted, every single time a spin up coming in is going to be a spin up going out, and so σ·a is going to yield an eigenvalue of +1 all the time. Do it 100 times and you're going to get 100 plus ones.

Conversely, if we flip a all the way upside down, then you have a spin up coming in relative to the upside-down second detector. That's always going to come out as a spin down. So in that extreme case you always have a -1 for σ·a, and therefore the expectation value is precisely -1.

Now, if you check out the point in the middle of the plot, when theta is 90° and the measurement axis a is perfectly perpendicular to the incoming spin-up polarization, well, in that case σ·a is going to be a +1 or a -1, each with a 50% probability. And if you have a set of 100 numbers which are either +1 or -1 with equal probability, you add those all up and on average you're going to get zero.

All right, then. So based on the three points we've looked at, the curve seems to make sense. But how do we calculate the exact form of this curve? Well, all you have to do is think like a gambler and say the expectation value is going to be the probability of measuring spin up along the axis a times the +1 corresponding to spin up, plus the probability of measuring spin down along the axis a times the -1 that corresponds to spin down. This is just like in that game where you have a 60% chance of winning a dollar and a 40% chance of losing a dollar, so the expectation value is $0.20. It's the same reasoning as a gambling calculation.

And as we saw earlier, we already know the probability of measuring spin up versus spin down. In the first case, we have a cos²(θ/2) probability of measuring spin up, and then we have a sin²(θ/2) probability of measuring spin down.

Now, if you are a trig identity enthusiast, you'll recognize this form as having a delightful simplification, which is that cos²(θ/2) - sin²(θ/2) equals cos θ. Isn't that wonderful how that simplifies? So that's a super nice result. And we're going to see the same result in Bell's paper in equation 3, in a slightly different context, but it's the same exact reasoning.

So anyway, that's all I wanted to say about the expectation value. Just think about this as a pretty common and useful way of putting a statistical handle on this kind of probabilistic situation.
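
The gambler's calculation and the spin calculation above can be sketched in a few lines of Python. This is only an illustration of the reasoning in the narration; the function names `expectation` and `spin_expectation` are made up for this sketch, and the angles in the loop are arbitrary sample points.

```python
import math

def expectation(outcomes_with_probs):
    """Probability-weighted average of (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes_with_probs)

def spin_expectation(theta):
    """Expectation of sigma.a for a spin-up beam entering a detector tilted by theta (radians)."""
    p_up = math.cos(theta / 2) ** 2    # probability of measuring spin up
    p_down = math.sin(theta / 2) ** 2  # probability of measuring spin down
    return expectation([(+1, p_up), (-1, p_down)])

# The gambling game: 60% chance of +1 dollar, 40% chance of -1 dollar.
gamble = expectation([(+1.0, 0.6), (-1.0, 0.4)])
print(f"gambling game: {gamble:.2f} dollars per play")

# Compare the spin expectation value with cos(theta) at a few tilt angles.
for degrees in (0, 60, 90, 180):
    theta = math.radians(degrees)
    print(degrees, round(spin_expectation(theta), 6), round(math.cos(theta), 6))
```

The two printed columns for each angle should agree, which is just the identity cos²(θ/2) - sin²(θ/2) = cos θ checked numerically.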

All right, then. So now I think we've discussed all of the prerequisites that we need for the remainder of the paper.

So now let's go ahead and get into Part II, Formulation.

So remember how in the EPR paper they gave a specific example of a two-particle wave function with anti-correlated momenta and correlated positions. With that wave function, we saw how if we measure the momentum of one of the particles, we end up putting the other particle in a momentum state. And conversely, if we choose to measure the position of the particle, then we put the other one into a position state. So that specific wave function in the EPR paper was a very mathematically convenient example to illustrate the point.

However, of course, the EPR paradox is more general than just a single specific two-particle wave function. If you see equations 7 and 8 of the EPR paper, you can see that more generically, whenever you have two particles in an entangled state and you think about representing that wave function as a sum over states of the first particle, then when you measure the first particle and put it into one of those eigenstates, that's going to have an impact on the state of the second particle. And so really the EPR paradox is just the observation that because we have the freedom to choose which observable we measure of the first particle, we have the ability to affect the quantum state of the second particle in a way that somehow appears to violate the constraint of local causality.

So anyway, the reason I bring that up is because in Bell's paper we're going to use a different two-particle state to get at the same fundamental paradoxical nature of quantum physics. Instead of the particles having anti-correlated momenta and correlated positions, we're going to imagine a pair of spin-1/2 particles whose spins are in an entangled state. And this configuration for thinking about the EPR paradox is actually not original to Bell. It was first put forward by Bohm and Aharonov in 1957.

So Part II of Bell's paper begins: with the example advocated by Bohm and Aharonov, the EPR argument is the following. Consider a pair of spin-1/2 particles formed somehow in the singlet spin state. Now I want to pause here and ask: what exactly is the singlet spin state? Well, that means that the spins of the two particles have no preferred direction a priori. If you think about either of the particles and you're going to measure its spin, there's total rotational symmetry, in that neither of the particles has a preferred spin axis. It's totally uniformly distributed over all possibilities.

However, the spins of the particles exhibit perfectly anti-correlated outcomes when measured along the same axis. And this is a very bizarre state of affairs. Intuitively, you would think that such a state is not possible. And yet the singlet state has been measured in all kinds of experiments. So this really is possible. This is something that is real. And as we'll talk about later in the paper, even though it's very hard to imagine and it seems kind of surreal, the experimental data very strongly indicates that the singlet state is actually a legit thing that can exist.

You sometimes hear the singlet state described as the particles having equal and opposite spin. But that's not exactly true, or rather, that's too narrow of a description.

It is true that if you measure the two particles along the same axis, you'll always find that their spins are equal and opposite. But, and this is really a super important fact about the singlet spin state, so I want to reemphasize this: before the measurement, neither of the particles has a preferred spin direction. This is very hard to imagine, but that is a super important aspect of what it is for the particles to be in the singlet state.

All right. So that's the singlet state.
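
For reference, the singlet state is usually written as shown below. This is the standard textbook form rather than something quoted from this part of the narration; the arrows stand for spin up and spin down along whichever axis you choose, and the subscripts 1 and 2 label the two particles.

```latex
|\psi\rangle_{\text{singlet}}
  = \frac{1}{\sqrt{2}}
    \Bigl( |\uparrow\rangle_1 |\downarrow\rangle_2
         - |\downarrow\rangle_1 |\uparrow\rangle_2 \Bigr)
```

Each term pairs an up with a down, which is the perfect anti-correlation along a shared axis, and the expression keeps the same form no matter which axis the arrows refer to, which is the sense in which neither particle has a preferred direction beforehand.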

Now imagine that we have some process which produces pairs of spin-1/2 particles in the singlet state, and then each particle goes its separate way, and they're both moving freely in opposite directions.

Now suppose we send each particle into a detector, say a Stern-Gerlach magnet, and we measure the spin of both particles. To get a sense of the kind of thing that happens here, at first we're going to say that the detectors are measuring along the same axis.

Let's go ahead and denote the measurement axes with the unit vectors a and b respectively. For starters, those unit vectors are going to be precisely aligned, so that we're measuring both particles along the same spin axis. Now, because the particles are in the singlet spin state, suppose we measure the spin of particle 1 along the direction a and we get the value +1, so particle 1 measures spin up along a. Then according to quantum mechanics, and what it means for the particles to be in the singlet state, it's 100% guaranteed that measuring the spin of particle 2 along the same axis is going to yield a value of -1, that is, spin down. And vice versa: had we measured particle 1 in the spin-down state, then we would know for sure that particle 2 would be spin up along the same axis.

By the way, just a comment on the notation here. As we talked about earlier, the expression σ·a is shorthand for measuring the spin of the particle along the axis a. This operator returns a +1 if it's spin up along a and a -1 if it's spin down along a. Now, the subscripts 1 and 2 here just indicate that in the first case we're measuring particle 1 and in the second case we're measuring particle 2. So it's not like we have two different sigma vectors. No, it's the same Pauli matrices. It's the same operator. It's just that in the first case, σ1, we apply it to the first particle, and in the second case, σ2, we apply it to the second particle.

So now we make the hypothesis of local causality, and it seems one at least worth considering, that if the two measurements are made at places remote from one another, the orientation of one magnet does not influence the result obtained with the other. And just to really emphasize that point, imagine that detector A and detector B are separated so far, and that the measurement of particle 1 and the measurement of particle 2 happen so closely together in time, that whatever tiny time difference there is between these two measurements, not even light could travel between detectors A and B during that time. So we imagine that the measurements going on at detector A and detector B are completely causally disconnected, if local causality is to be believed.

But here's where we run into the EPR paradox. Since we can predict in advance the result of measuring any chosen component of the spin of particle 2 by previously measuring the same component of the spin of particle 1, it follows that the result of any such measurement must actually be predetermined.

That is to say, the particles start off in the singlet state with no preferred spin direction. Then imagine particle 1 is measured in detector A ever so slightly before particle 2 is measured in detector B, you know, by 0.0001 ns or whatever. Well, as soon as we've measured particle 1 along the axis a, we can predict with certainty the component of the spin of particle 2 along the same axis. And yet that certainty does not exist in quantum physics.

Now, we can tell a story about non-local wave function collapse, where you measure particle 1 along axis a and the wave function instantly collapses, and then particle 2 is no longer in the singlet state but is now for sure going to be polarized in accordance with that measurement direction a. But assuming that we don't allow for non-local wave function collapse, because we want to preserve our sanity and hold on to this concept of local causality, we find here an apparent contradiction, because the spin of particle 2 along the axis a should definitely not be predictable with certainty given the wave function of the singlet state. Quantum physics just doesn't allow for that level of predictability unless we allow for the possibility of instantaneous wave function collapse.

So then, since the initial quantum mechanical wave function, that is, the singlet state, does not determine the result of an individual measurement, this predetermination implies the possibility of a more complete specification of the state. And so that, apparently, is the EPR paradox, this time thought of in terms of spins rather than momentum and position states. In other words, all of this thought process leads us to think that surely there must be some kind of hidden variables that go along with particles 1 and 2 in a way that quantum mechanics doesn't account for.

And if only we had some kind of more complete model where we could figure out what those hidden variables are, what their dynamics are, and how they influence the spin measurements, then surely we could find a more complete and more sane and more understandable explanation of what's going on here than what quantum mechanics currently has to offer.

Well, all right then. So we want a more complete theory involving some kind of hidden variables. So let this more complete specification be effected by means of parameters lambda. These are going to be our hidden variables. In this video, whenever you see this yellow lambda, that's going to stand for whatever hidden variables we want to put into our model to give us a more complete description of what's happening. So, you know, earlier we were looking at the Stern-Gerlach experiment and we were trying to explain it in terms of particles carrying with them this yellow vector. That was an example of lambda. But now we're going to broaden that up a little bit. Or actually, we're going to broaden it up all the way and say lambda can be whatever you want it to be. Whatever you can imagine: a vector, a scalar, a tensor, a function, a set, whatever you want it to be.

It is a matter of indifference in the following whether lambda denotes a single variable or a set, or even a set of functions, and whether the variables are discrete or continuous. The beautiful thing about Bell's paper is that it accounts for all possible hidden variable models in one fell swoop, because it's such a generic argument, as we'll see. However, we write as if lambda were a single continuous parameter. So in the notation that we'll be using, for example, we'll integrate over all possible lambda, and it'll look like we're assuming that lambda is a continuous parameter. However, what Bell is saying here is that if you want to modify the argument so that lambda is not a continuous parameter but is rather a discrete parameter, or a set, or whatever contrived thing you want to come up with, you can trivially modify the argument to account for that.

Replace an integral with a sum or whatever you have to do. Those kinds of modifications won't have any effect on the logical structure of the argument put forward in this paper.

So now let's think about what's happening in these detectors. At this moment we can go ahead and say that the axis of measurement in detector B does not have to be the same as the axis of measurement in detector A. So we're going to make this more generic. Oh, and one thing that I'll point out is that in everything we're about to talk about, what matters as far as the orientations of the unit vectors a and b is only the angle between those two vectors, the extent to which they're aligned or misaligned.

And when you think about two vectors in three-dimensional space, the two vectors are going to span a plane, and then there's going to be some angle between them in that plane. That angle between them, that theta angle, is the relevant quantity when we're thinking about how the orientations of these two measurement axes are going to matter.

So if you want, you can imagine a fully generic three-dimensional situation where a and b can point whichever ways you want to imagine them pointing. But because it's only the theta angle between them that matters, in whatever plane they happen to span, we may as well imagine the a vector pointing straight up, and then we can imagine the b vector having some arbitrary orientation in the plane. So in the diagram shown here on your two-dimensional screen, with a pointing up and b pointing wherever, imagine rotating the b axis a full 360°. Well, for all intents and purposes, that 360° sweep is going to span all of the possibilities as far as the ways in which we can misorient our detectors relative to each other. And actually, you only need 180°, because once you tilt it past 180°, theta starts to come back in. See what I mean? And then technically, by symmetry, all the interesting stuff happens between 0° and 90°.

Okay. So then what is actually going on in these detectors? Well, if we assume this hidden variable model, then the result A of measuring the spin of particle 1 along the a axis is determined by the a axis and the hidden variable lambda.

So particle 1 is coming in, and it's carrying with it some kind of hidden variable, maybe some vector, some scalar, some tensor, whatever hidden variable we want to imagine. As particle 1 goes into detector A, and detector A is oriented along the a axis, the only things that are going to affect the spin measurement of particle 1 are the orientation of that a vector and the hidden variable lambda that goes with particle 1. Because particles 1 and 2 are in the singlet state, they don't have any a priori preferred directions. So the result of the spin measurement is going to be deterministically fixed by however the hidden variable lambda interacts with the detector oriented along a.

And likewise, the result B of measuring the spin of particle 2 along the b axis in the same instance is determined by the b axis and lambda, for exactly the same reason. So we can write that the measurement outcome A, as a function of the measurement direction a and the hidden variables lambda, can take on a value of +1 or -1 depending on whether particle 1 is measured spin up or spin down respectively. And likewise, the measurement result B at detector B, which is a function of the b axis and the hidden variables lambda, is also going to take on a value of +1 or -1 for spin up and spin down respectively.
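
Written out, the two statements above are equation 1 of Bell's paper. In the sketch below, the arrows mark the detector axes as unit vectors, and capital A and B are the recorded outcomes:

```latex
A(\vec{a}, \lambda) = \pm 1, \qquad B(\vec{b}, \lambda) = \pm 1 \tag{1}
```

Note what is absent from the argument lists: A has no dependence on the b axis, and B has no dependence on the a axis.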

And we're going to leave this fully generic as far as in what way, or by what function, the hidden variables interact with the measurement axis. Whatever it is you can imagine, whatever principle you want to postulate, it's still for sure the case that whatever these functions actually are, by definition they're going to have values of plus or minus one depending on the outcome of the spin measurement.

Now, the vital assumption of local causality is that the result B for particle 2 does not depend on the setting a of the magnet for particle 1, nor does A depend on b. So in equation 1 you see that A is a function of the a vector and the hidden variables lambda, and B is a function of the b vector and the hidden variables lambda. But notice that A is not a function of the b vector, nor is B a function of the a vector. The reason being, detectors A and B are separated so far, and these measurements happen so quickly, that there's no way the information about which way one detector is oriented can propagate over to the other detector and affect the measurement result in any way. No, these two things happen in different light cones. And so by local causality, you can't have the measurement result A depending on the b vector, or vice versa.

And one of the things that we're going to show in this paper is that any hidden variable model that reproduces quantum mechanics is going to have to violate that assumption. The only way to get it to work is to relax that constraint and say, okay, the measurement outcome at A depends on the orientation at B, and vice versa. And then it's like, oh, that's weird. That's non-local. That is absurd. You know, that's super weird. And at that point there's no advantage to using a hidden variable model, because whether you take ordinary quantum mechanics or some speculative hidden variable model, in both cases you're going to have a non-local model. And so no matter how you look at it, it's a glitch in reality.

All right. Then suppose we define ρ(λ), rho of lambda, as the probability distribution of the hidden variables lambda.

So in other words, imagine all possible configurations of our hidden variables lambda, whether they're vectors or scalars or tensors or functions or sets, whatever you want to imagine for lambda. There's going to be some space of configurations, some space of possibilities that lambda can take on.

And you can assign a probability to each and every configuration. So ρ(λ) is precisely the distribution which defines how likely our hidden variables are to exist in whatever state we can imagine them existing in. This is quite a generic thing, and as we go through the paper we'll imagine some specific cases with some simple functions for ρ(λ). But notice the power in keeping this generic. See, so far we haven't narrowed down what lambda can be. Our hidden variables can be whatever you can imagine. And then ρ(λ), as a probability distribution on those hidden variables, can also be whatever you want to imagine: whatever distribution you want to take over whatever space of variables you want to define.

And even though our setup is so generic, one of the things we can still say for sure is that the expectation value of the product of the two components, measuring particle 1 along the a axis and measuring particle 2 along the b axis, is going to be P(a, b), where P is the expectation value of the product of A and B, that is, of the plus or minus one that's recorded at each detector. We can say that P(a, b) is going to be the integral over all possible configurations of hidden variables, each one weighted by ρ(λ), that is, how likely that configuration is to be. And then as we're integrating over that space of possible hidden variables, for each possibility we simply multiply the outcome of the measurement at detector A, that is A(a, λ), times the measurement outcome at detector B, that is B(b, λ).
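
In symbols, this is equation 2 of Bell's paper. The version below is a sketch written with the differential element last; the second line is the normalization that any probability distribution ρ(λ) has to satisfy:

```latex
P(\vec{a}, \vec{b})
  = \int \rho(\lambda)\, A(\vec{a}, \lambda)\, B(\vec{b}, \lambda)\, d\lambda \tag{2}

\rho(\lambda) \ge 0, \qquad \int \rho(\lambda)\, d\lambda = 1
```

If lambda is discrete rather than continuous, the integral becomes a sum over configurations with ρ as the probability of each one.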

By the way, in Bell's paper he writes this integral as integral, rho of lambda, d lambda, A times B. I like to write it in the sandwich notation, where you have the integral sign on the left and the differential element on the right, and then whatever you're integrating over in between. It doesn't matter either way. It's just a stylistic choice.

So, well, anyway, I want to reflect on exactly what this equation means, equation 2, because it is of central importance to everything that follows. This parameter P, we're going to go ahead and call that the correlation between our measurements.

And this correlation has a really intuitive meaning. The first thing to notice is that P, the correlation, has to be somewhere between -1 and 1. When it's -1, the measurement outcomes at detector A are perfectly anti-correlated with the measurement outcomes at detector B. For example, this would be when detector A and detector B are aligned along precisely the same axis. Because if we have a pair of particles in the singlet state and we measure them both along the same axis, then if one is spin up the other is spin down, and vice versa. So if A is +1 then B is -1, and vice versa. And so when we're measuring the singlet state along the same axis, the product of A and B is always going to be -1, because 1 × (-1) is -1 and (-1) × 1 is -1. In that configuration, if the product of A and B is always -1, then equation 2 is simply the negative of the integral of ρ(λ) dλ.

Now, this is a normalized probability distribution. So when you integrate over all possibilities, each one weighted by the probability distribution, the result of that integral is always going to equal one, because there's a 100% chance that the hidden variables are in some kind of configuration.

And so we find that P(a, b), when a and b are the same vector, is equal to -1.

Conversely, if we flip b around so that now b is equal to -a, and our measurement axes are pointing in equal and opposite directions, then we find a correlation of one. That is, the product of A and B is always going to equal one. Because if we measure the particle spin up in detector A, but detector B is flipped upside down relative to detector A, then the other particle is also going to be measured spin up in detector B, but along the upside-down axis. So the singlet correlation is still there. It's just that when you flip the vector b upside down, that's kind of a redefinition of what spin up and spin down mean in detector B. In that case the product of A and B is always equal to 1, because 1 × 1 is 1 and also (-1) × (-1) is 1, and equation 2 simply reduces to the integral of ρ(λ) dλ, which, because ρ is a normalized probability distribution, equals 1.

Now there's one more special case that we can imagine, which is when a and b are perpendicular.

So suppose a is pointing straight up and b is pointing straight to the right.

Well, in that case, we should expect a correlation of zero. The reason being, in the singlet state, say you measure spin up along a. Well, if b is perpendicular to a, then it could go either way. You could get a spin up or a spin down. So the product of A and B is going to be a +1 or a -1 about 50/50, and that'll average out to zero. So if we have a value of P equal to zero, there is no correlation between the two detectors.
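
The three special cases just described can be collected in one place. This is only a restatement of the reasoning above in symbols; the first two lines use nothing beyond equation 2 and the normalization of ρ(λ), while the third is the 50/50 averaging argument rather than a consequence of equation 2 alone:

```latex
\begin{aligned}
\vec{b} = \vec{a}:      &\quad A B = -1 \text{ always}
  \;\Rightarrow\; P(\vec{a}, \vec{a}) = -\int \rho(\lambda)\, d\lambda = -1 \\
\vec{b} = -\vec{a}:     &\quad A B = +1 \text{ always}
  \;\Rightarrow\; P(\vec{a}, -\vec{a}) = +\int \rho(\lambda)\, d\lambda = +1 \\
\vec{a} \perp \vec{b}:  &\quad P(\vec{a}, \vec{b}) = \tfrac{1}{2}(+1) + \tfrac{1}{2}(-1) = 0
\end{aligned}
```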

Okay, so that's equation 2. The correlation between our measurement outcomes is found simply by integrating, over the space of all possible hidden variables, weighted by the probability of each configuration, the product of the plus-or-minus-one outcome at A times the plus-or-minus-one outcome at B.

Now, that correlation given by equation 2, based on a hidden variable model, should equal the quantum mechanical expectation value, which for the singlet state is going to be -a·b, or, as we saw earlier, -cos θ, where theta is the angle between the two measurement axis vectors a and b. And the way to prove that equation 3 is true, that this is the quantum mechanical expectation value and that it does match the experimental data, is just to imagine that particle 1 gets to detector A ever so slightly before particle 2 gets to detector B. So particle 1 is measured along the a axis and the wave function instantly collapses, and now particle 2 is going to be polarized opposite to the a axis. Then when you measure the spin of particle 2 along the direction b, you can think about it sort of like the two-stage Stern-Gerlach experiment, where we create a beam of purely polarized spin-up particles and we send that through a second detector which is tilted by some angle theta.

And then, as we know, we have a cos²(θ/2) probability of measuring spin up and a sin²(θ/2) probability of measuring spin down. If you think like a gambler and calculate the expectation value, you end up with an expectation value of cos θ for the measurement outcome at the second detector, if spin up is +1 and spin down is -1. We saw that earlier. And then the minus sign here simply comes from the fact that the two particles in the singlet state are anti-correlated.

So if particle 1 is spin up along the axis a, then particle 2 is actually going to be polarized spin down along a. And so that's where the minus sign comes from. It's basically just a 180° flip of the two-stage Stern-Gerlach experiment that we were looking at earlier.
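
The collapse argument above can be cross-checked by computing the singlet expectation value directly with the Pauli matrices. The NumPy sketch below is an assumption-laden illustration, not something from the video: the names `sigma_dot`, `SINGLET`, and `quantum_correlation` are invented here, the singlet is written in the usual textbook form (|up,down> - |down,up>)/sqrt(2), and the detector axes are placed in the x-z plane for convenience.

```python
import numpy as np

# Pauli matrices.
SIGMA_X = np.array([[0, 1], [1, 0]], dtype=complex)
SIGMA_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
SIGMA_Z = np.array([[1, 0], [0, -1]], dtype=complex)

def sigma_dot(n):
    """sigma . n for a 3-component unit vector n (a 2x2 matrix with eigenvalues -1 and +1)."""
    return n[0] * SIGMA_X + n[1] * SIGMA_Y + n[2] * SIGMA_Z

# Singlet state in the two-particle basis |up,up>, |up,down>, |down,up>, |down,down>.
SINGLET = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)

def quantum_correlation(a, b):
    """<singlet| (sigma.a on particle 1)(sigma.b on particle 2) |singlet>."""
    operator = np.kron(sigma_dot(a), sigma_dot(b))
    return float(np.real(SINGLET.conj() @ operator @ SINGLET))

a = np.array([0.0, 0.0, 1.0])  # detector A axis
for degrees in (0, 45, 90, 135, 180):
    theta = np.radians(degrees)
    b = np.array([np.sin(theta), 0.0, np.cos(theta)])  # detector B axis, tilted by theta
    print(degrees, round(quantum_correlation(a, b), 6), round(-np.cos(theta), 6))
```

The two printed columns should match at every angle, which is equation 3: the correlation is the negative of the dot product of the two axes.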

Well, anyway, all that's to say, quantum mechanics tells us that the correlation of the measurement outcomes, for unit vector a at detector A and unit vector b at detector B, for two particles in the singlet state, should be -cos θ, where theta is the angle between the two vectors. And so the main question of this paper is: is it possible to have some hidden variable model, based on some set of possible lambdas and some probability distribution which describes the likelihood of each lambda, such that equation 2 matches the quantum mechanical and experimental value of -cos θ between the vectors a and b? If so, then such a hidden variable model might be plausible, you know, because it would match the data. It would match quantum theory, and yet it would be an alternate way of looking at things. So that's cool.

But what we're going to show in this paper, in particular in Part IV, Contradiction, is that no local hidden variable model can actually have an equation 2 correlation which matches the quantum mechanical correlation and the experimental data.

And so therefore, we cannot have a local hidden variable explanation of what's going on here. And so we have to confront the fact that quantum mechanics genuinely is super weird and non-local and a glitch in reality.
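
To make the question concrete, here is a sketch of one toy local hidden variable model evaluated through equation 2. Everything specific in it is an assumption chosen for illustration: lambda is a unit vector in the plane with a uniformly distributed direction, particle 1 reports the sign of a·λ, and particle 2 reports the opposite sign of b·λ. The function name `toy_local_correlation` is made up. One failed model proves nothing on its own; ruling out every local model is what Bell's general argument is for.

```python
import numpy as np

def toy_local_correlation(theta, n=100_000):
    """Equation-2-style average for one toy local hidden variable model.

    Assumptions (illustrative only): lambda is a unit vector in the plane with a
    uniformly distributed direction, A = sign(a . lambda), B = -sign(b . lambda).
    """
    # Evenly spaced lambda directions stand in for the integral over a uniform rho(lambda).
    phi = (np.arange(n) + 0.5) * 2 * np.pi / n
    lam = np.stack([np.sin(phi), np.cos(phi)], axis=1)
    a = np.array([0.0, 1.0])                      # detector A axis: straight up
    b = np.array([np.sin(theta), np.cos(theta)])  # detector B axis: tilted by theta
    A = np.sign(lam @ a)    # depends only on a and lambda
    B = -np.sign(lam @ b)   # depends only on b and lambda
    return float(np.mean(A * B))

for degrees in (0, 45, 90, 135, 180):
    theta = np.radians(degrees)
    print(degrees, round(toy_local_correlation(theta), 4), round(-np.cos(theta), 4))
```

This particular model works out to a correlation of 2θ/π - 1, a straight line in theta. It agrees with -cos θ at 0°, 90°, and 180° and disagrees in between, for example -0.5 against roughly -0.707 at 45°.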

Oh, and then one little caveat on the way we've formulated things here. Some

might prefer a formulation in which the hidden variables fall into two sets with the measurement outcome at A dependent on one set of hidden variables and the measurement at B depending on another

set of hidden variables.

However, this possibility is contained in the above since lambda stands for any number of variables and the dependencies thereon of A and B are unrestricted. So

in other words, if you want to have a hidden variable model where particle one carries with it some kind of set of hidden variables and particle 2 carries with it a whole another set of hidden variables, go right ahead. That's fine.

We're not ruling out that possibility.

When we use this character lambda to stand for any imaginable hidden variables, you can go ahead and imagine that in whatever way you want, including the situation where you have two sets of hidden variables, one for each particle.

You know, go for it. That's totally

fine. We're not restricting that possibility at all. And likewise, in a complete physical theory of the type envisaged by Einstein, the hidden variables would have dynamical

significance and laws of motion.

Our lambda can then be thought of as initial values of these variables at some suitable instant. So in other words, if you want to think about hidden variables as some kind of fields with

dynamical significance, that's cool, too.

Everything we're about to argue doesn't rule out that possibility at all.

And if you want, you can imagine lambda representing a snapshot in time of those fields. And then you can imagine those fields evolving in accordance with some dynamical equations.

But none of that time evolution is going to take the thought experiment outside of the framework that we're setting up, because our argument is fully generic.

Anything you can imagine for lambda, lambda can be. You know, I just noticed this yellow lambda. It kind

of looks like a banana peel. You

wouldn't want that as a hidden variable.

Hey, that would affect the measurement of your spin state.

All right, moving on.

Part three of the paper begins. The

proof of the main result is quite simple. Well, according to Bell, at least. I don't know if I would say it's quite simple, but anyway: before giving it in part four, however, a number of illustrations may serve to put it in perspective.

So part three is all about establishing some context for part four looking at some specific examples which we're then going to generalize in part four when we

give the formal argumentation that local hidden variable models don't work.

Now I'm going to go ahead and break up part three into three parts, 3A, 3B, and 3C, because this part of the paper is kind of naturally broken up into those three parts anyway and I want to take

the time to zoom in on each part of this individually.

So the first part of part three is that for a single particle we can make up a hidden variable story of what's going on with the spin and it's okay it seems to work.

Firstly there is no difficulty in giving a hidden variable account of spin measurements on a single particle.

Suppose we have a spin half particle in a pure spin state with polarization denoted by a unit vector P. And all that means is imagine we send a beam of spin

1/2 particles through a Stern-Gerlach magnet and then filter it out like what we saw before where we allow only the spin up particles through. Well, then if

the axis of that Stern-Gerlach magnet is the vector P, then the outgoing beam of particles is polarized with reference to that vector P. That is to say, if you

were to do a subsequent spin measurement on that particle along the direction P, then for sure the result of that measurement is going to be spin up. So

that's what it means for the particle to be polarized along the direction P.

All right. Now then suppose we let our hidden variable be for example a unit vector lambda with uniform probability distribution over the hemisphere

lambda dot P is greater than zero. That is

to say a lambda is going to be some additional directional or orientational degree of freedom that travels along with the particle. And we don't know exactly what lambda is going to be. All

we know about it is that it's going to have a uniform probability distribution over the hemisphere which points in the same direction as P. And so this

constraint that the dot product of lambda and P is greater than zero, all that means is that lambda kind of points towards P and it doesn't kind of point away from P. Now, if you think back to

what we saw earlier in this video, where we sent our particle through a two-stage Stern-Gerlach experiment, and we supposed that all the magnet does is filter out the particles that point a little up

versus a little down without actively flipping the arrow up and down. You'll

see that that thought experiment actually gives us a beam of this kind of particle where we start off with the assumption that the incoming particles, those evaporated silver atoms, have a

totally randomly oriented lambda vector, but then we send them through the first Stern-Gerlach magnet to get a beam that's purely polarized along the axis of that magnet.

And then at that point, what we know about the lambda vector is it's still going to be totally random, but only on the half of the sphere that kind

of points along the direction P because the particles for which lambda pointed away from P were sent into the spin down beam and those didn't go forward.

And so the question then comes up, what happens if we measure the spin of this kind of particle along some axis A?

Well, we already know what the expectation value is going to be. The

expectation value of the spin of this kind of particle from quantum mechanics and from experiment is going to be the cosine of the tilt angle of the second detector relative to the first. That is

in this language we would say it's going to be the cosine of the angle theta between the polarization vector P and the measurement vector A.
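To see why the expectation value is the cosine, here is a small sketch that starts from the tilted-detector probabilities quoted earlier in the video (cosine squared of half the angle for spin up, sine squared of half the angle for spin down) and weights the outcomes +1 and -1 accordingly. The function names are made up for illustration.

```python
import math

def prob_spin_up(theta):
    """Probability of spin up at a detector tilted by theta from the polarization axis."""
    return math.cos(theta / 2) ** 2

def expected_spin(theta):
    """Average outcome when spin up counts as +1 and spin down counts as -1."""
    p_up = prob_spin_up(theta)
    p_down = 1 - p_up  # equals sin^2(theta / 2)
    return (+1) * p_up + (-1) * p_down

if __name__ == "__main__":
    for degrees in (0, 60, 90, 120, 180):
        theta = math.radians(degrees)
        # Second and third columns should agree: expectation vs. cos(theta).
        print(degrees, round(expected_spin(theta), 6), round(math.cos(theta), 6))
```

This is just the double-angle identity, cosine squared minus sine squared of half the angle, showing up as an average.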

So then suppose that as we're building our hidden variable model, we speculate that the result of measuring along some axis A is going to be the sign of the

hidden variable lambda vector dotted with the effective measurement axis A prime. See, we're going to have to do a sketchy move here of the kind we talked about earlier.

And so A prime is going to be a unit vector which depends on A and P in a way to be specified. We're going to talk about exactly what that has to be in a moment, but this is exactly the same

kind of sketchy move we looked at earlier when we were thinking about how can we modify our hidden variable model into something that matches the data.

And in fact, the example we looked at earlier in the video is mathematically equivalent to what we're talking about now.

Oh, and then the sign function here simply takes on the values of +1 or -1 according to the sign of its argument.

So the sign of the dot product of the lambda vector and the effective measurement axis a prime is going to be positive if lambda kind of points along a prime and it's going to be negative if

lambda kind of points away from a prime.

And so all this is to say the measurement result is going to be spin up if lambda is in the hemisphere whose pole is a prime and otherwise it'll be

spin down if lambda is outside of that hemisphere.

And then you can say what if lambda is right on the equator relative to the north pole of a prime. Well, the

probability of lambda being perfectly on the equator is zero. And so we don't have to worry about it. As Bell says in his paper, actually this leaves the

result undetermined when lambda dot a prime equals zero. But as the probability of this is zero, we will not make special prescriptions for it. So we don't have to worry about that.

Now then, if you average over all possible hidden variable vectors lambda in accordance with the setup we've described here, the expectation value of the spin measurement is going to be 1 - 2 theta prime over pi. Call that equation 5,

where theta prime is the angle between the effective measurement axis a prime and the polarization vector p. That's

the same theta prime from our sketchy move we talked about earlier.

And so let's go ahead and see where equation 5 comes from. Why does this model give us an expectation value of 1 - 2 theta prime over pi?

Well, the reason being is that the expectation value of the spin measurement along the measurement axis A in accordance with the equation for the rule that we've stipulated here is going

to be the probability that the lambda vector is in the hemisphere defined with A prime at the pole times a + one for the spin up result plus the probability

of lambda not being in a prime's hemisphere times the negative 1 value which goes along with the spin down measurement.

So we're thinking like a gambler here and we're calculating that expectation value.

And then when we think about this, what we realize is that the expectation value of the spin measurement is going to be one, its maximum value, when the theta prime

angle is zero. That is, when our polarization vector is exactly aligned with the effective measurement axis a prime, then we're always going to get spin up, like for sure, 100% guaranteed,

because when you think about the hemisphere of possible lambda vectors, well, those are going to be in the same hemisphere as the polarization vector.

So if the polarization vector and the a prime vector point in exactly the same direction, then lambda is guaranteed to be in a prime's hemisphere. So you're

always going to get a plus one in that case.

And conversely, if the polarization vector P is completely antiparallel to the effective measurement axis A prime, that is if

theta prime is pi or 180°, then we're always going to get a negative one, a spin down measurement. If

the polarization vector is pointing completely away from a prime then the space of possible lambda vectors is precisely the opposite of a prime's hemisphere. And so you're always going to get a spin down measurement in that case.

And then if you think about rotating the polarization vector P relative to A prime and think about the overlap in the hemispheres of P and A

prime, you see that the overlap varies linearly with the angle theta prime.

This goes back to what we were talking about earlier. When you imagine the board game with the spinny thing and you spin the needle, the probability of it landing somewhere simply has to do with the area of the wedge that it's

going to land on. Well, as you rotate theta prime, you see that our expectation value is going to vary linearly with the angle theta prime for precisely the same reason. And you can

think about that as a two-dimensional circle and a board game spinner thing.

Or you can think about it in the full three dimensions as if it's like an orange and you have the volume of the orange slice going along with the wedge angle.

But in any case, this model is going to give us an expectation value of the spin measurement which is linearly dependent on theta prime. And so if you consider the two boundary conditions

we've looked at for theta prime = 0 and theta prime = pi and then apply the fact that this is a linear function and then just think in terms of y = mx + b. You

see that our equation for the expectation value of the spin measurement is necessarily 1 - 2 theta prime over pi. And as we know this

linear function is not what quantum mechanics predicts and is not a match to the experimental data, because in both cases that's going to be the cosine of the angle, not a linear function of the angle.

But here's where the sketchy move comes in. Right? Here's why we have a prime instead of just a. Suppose then

that a prime is obtained from a by rotation towards the polarization vector p until 1 - 2 theta prime over pi equals cosine of theta. Call that equation 6,

where theta is the angle between the measurement axis a and the polarization vector p. So that's that sketchy move that we use in order to warp the linear function into a cosine function. Well

then if we do that if we apply equation six then we have the desired result that the expectation value of the spin measurement is cosine of theta which is in alignment with quantum physics and

it's in alignment with the experimental data.

And so technically we haven't done anything illegal here. We haven't broken any rules, and this model therefore cannot be completely dismissed, though it

is contrived and it is implausible and it's like we don't want to have to believe this, because if we have a detector which is oriented along the vector A and we have to stipulate that

no, actually what's happening there is the effective measurement axis is bent a little bit in towards the polarization vector, it's like, well, you can say that, but why would that be the case?

This is not a very convincing model but we will not dismiss it on the basis that it's not convincing. Instead we're going to go ahead and say look it's possible.

We're not going to rule it out just yet.
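The single-particle model, with and without the sketchy move, can be tried out numerically. The following is a rough Monte Carlo sketch under stated assumptions: the polarization vector P is taken along z, the measurement axis lies in the x-z plane, lambda is drawn uniformly from P's hemisphere, and the bent angle is solved from equation 6 as theta prime = (pi/2)(1 - cos theta). The helper names (`random_unit_vector`, `model_expectation`, `bend_axis`) are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly from the sphere by normalizing a Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def model_expectation(theta, bend_axis, samples=50_000, seed=0):
    """Average of sign(lambda . a') with lambda uniform on the hemisphere lambda . p > 0.

    theta is the angle between the real measurement axis a and p.
    bend_axis=True applies equation 6: a' sits at theta' = (pi/2) * (1 - cos(theta)).
    bend_axis=False uses the real axis, so a' = a and theta' = theta.
    """
    rng = random.Random(seed)
    p = [0.0, 0.0, 1.0]
    theta_eff = (math.pi / 2) * (1 - math.cos(theta)) if bend_axis else theta
    a_eff = [math.sin(theta_eff), 0.0, math.cos(theta_eff)]
    total = 0
    for _ in range(samples):
        lam = random_unit_vector(rng)
        if dot(lam, p) < 0:
            lam = [-c for c in lam]  # reflect into p's hemisphere; still uniform there
        total += 1 if dot(lam, a_eff) > 0 else -1
    return total / samples

if __name__ == "__main__":
    for degrees in (0, 45, 90, 135):
        theta = math.radians(degrees)
        plain = model_expectation(theta, bend_axis=False)  # near 1 - 2*theta/pi
        bent = model_expectation(theta, bend_axis=True)    # near cos(theta)
        print(degrees, round(plain, 3), round(bent, 3), round(math.cos(theta), 3))
```

Without the bend, the average should track the straight line 1 - 2 theta over pi; with the bend, it should track cosine theta up to sampling noise, which is the whole point of the sketchy move.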

And so by lowering the epistemic standards for the hidden variable model, then that's going to hold us to a higher standard when later on we rule out all possible local hidden variable models.

Because then we'll be able to say, look, we went along with the sketchy move. We

allowed it. But even allowing that, our proof later on is going to be so strong that we're going to actually show that despite our generosity here, despite being maximally charitable to the local

hidden variable model perspective, later on we're going to show that it just doesn't work.

All right. So in this simple case there is no difficulty in the view that the result of every measurement is determined by the value of an extra variable lambda and that the

statistical features of quantum mechanics arise because the value of this variable is unknown in individual instances.

That is in this particular case we can come up with a story involving local hidden variables and it kind of appears to work even though it is a little bit sketchy.

Okay, so part three of the paper then goes on to show that hidden variables also seem to work for special cases in which the two detectors have special

orientations for their measurement axis.

Secondly, there is no difficulty in reproducing in the form of equation two that is the correlation function based on local hidden variables the only

features of the quantum mechanical and experimental correlation function three commonly used in verbal discussions of this problem. That is when our two

measurement directions are the same, in which case we have P of A and A, because A and B are the same when they're aligned the same way. And that'll give us the

negative of the correlation that we would find when B is equal to negative A, and that's equal to -1.

So when the unit vectors A and B are aligned the same way, we get a perfect anti-correlation of -1. And when A and B are oppositely aligned, then we get a

perfect correlation of 1. And the other special case is when the dot product of A and B equals zero. That is when A and B are perfectly perpendicular to each other, in which case we have no correlation.

So aligned the same way, P is negative 1. Aligned opposite ways, P is 1.

Perpendicular P is zero. And these three special cases can be explained by a local hidden variable model. For

example, let lambda now be the unit vector lambda with uniform probability distribution over all directions and take the rules that the measurement

outcome A as a function of the unit vector a and this hidden variable lambda vector is going to be the sign of a dot lambda. And conversely, the measurement outcome B as a function of the unit vector b and the hidden variable vector lambda is going to be the negative sign

of b dot lambda. By the way, in Bell's paper, there's a typo here. In the paper, it's written as B as a function of A and B, but that should be B as a function of B and lambda.

All right. So, what are we doing here? Well, what we're saying is that we have the two particles in the singlet state, and we're going to stick a unit vector onto this pair of particles.

So you can imagine particles one and two both carrying along this orientational piece of information.

This unit vector lambda is chosen totally randomly out of all possible directions.

And then when particle 1 gets to detector A, if lambda is pointing kind of along the direction of A, that is if the dot product of A and

lambda is positive, then you measure a spin up of particle 1 in detector A. And

likewise, as particle 2 is measured in detector B, if the lambda vector is pointing in the same kind of direction as B, then you measure a spin down at B.

So what this model is is kind of uh what we might instinctively expect is happening with a pair of particles who have an entangled spin because you might expect that there is some kind of

orientational quantity that each particle intrinsically has, but that quantum mechanics doesn't account for.

and that this hidden variable which carries with it a kind of orientation is what predetermines how particles 1 and two are going to be measured at A and B respectively.

And so the claim is that this rule given by equation 9 works in the special cases that the vectors A and B are perfectly parallel, perfectly antiparallel or

perfectly perpendicular. And you can show that that's the case.

So in the first case, imagine A and B being perfectly parallel. Well then in equation 9 you see that the rules for the measurement outcomes at a and b are going to be equal and opposite in that case, because for a we have the sign of a

dot lambda, but if a and b are the same vector then for b the rule is that it's the negative sign of b dot lambda, and b dot lambda is equal to a dot lambda. So you have the negative of the outcome at detector A.

Therefore we find perfect anti-correlation in the case that the unit vector a equals the unit vector b.

Likewise, then if you reverse that logic and you look at rule 9 in the case that A and B are antiparallel, so B equals negative A, then the measurement outcome

at detector A is sign of A dot lambda. And

the measurement outcome at detector B is negative sign of B dot lambda. B dot lambda in this case would equal negative A dot lambda. And you can carry that negative sign outside of the sign function, so

that then the two negatives cancel out and we find for the measurement outcome at B sign of A dot lambda, which is precisely the same as the measurement outcome at A. So in the case that the

measurement directions A and B are perfectly antiparallel we find a perfect correlation of one for the measurement outcomes with this local hidden variable model. And so in that case this model works just fine.

And then finally, for the case that A and B are perpendicular, whatever the measurement outcome is at A, you're going to have a 50/50 chance

of it being the same or the opposite at B. And so in that case too, this model works just fine.

But again, this model has a flaw, which is that just like what we saw before in part 3A, the dependence of the measurement correlation on the angle theta between the vectors A and B is

linear in theta. It's not the negative cosine of theta that we expect from quantum physics and that is shown in experiments.

And to see that let's draw a picture where we imagine all possibilities for lambda selected uniformly across all possible directions and then we draw the

measurement direction a and you consider the hemisphere of all possible vectors that sort of point in the same direction as a that is all vectors for which a dot that vector is positive. Well, then the

measurement result at detector A is going to be spin up if lambda is in the same hemisphere as A or it'll be spin down if lambda is in the opposite hemisphere.

So, we have a 50/50 chance of measuring spin up or spin down, which is in agreement with experiment. But

then things get a little tricky when you also draw the measurement direction B in detector B and then you apply the same reasoning about what the measurement

result is going to be in detector B. In

this case, the result is going to be spin down if lambda is in the same hemisphere as B. Spin down because we're in the singlet state where the spins are anti-correlated and that's encoded in

the minus sign in the second part of equation 9.

And then conversely, detector B will measure spin up if lambda is not in the same hemisphere as the measurement direction B.

And then if we want to go ahead and imagine this as an animation where we're sweeping the theta angle and considering simultaneously all possibilities for the hidden variable lambda that are

uniformly distributed over the sphere which you may as well imagine as a circle or a sphere because in either case the area or the volume respectively changes the same way as a function of

the theta angle. Well, then just think about what is the probability of having the same outcome at both detectors versus the probability of having opposite outcomes.

And what you realize is that you're going to have the same outcome at both detectors when lambda is in the hemisphere of one of the measurement directions, but not in the

hemisphere of the other measurement direction.

So in this animation, if you look at the two sectors with the blue arc, for both of those sectors, you're going to have the same measurement outcome for both A and B. And so the

product of the outcomes at A and B is going to equal one if lambda lies in one of the two blue sectors shown here. And

then on the other hand, if lambda is in the hemispheres of both measurement directions or neither measurement directions, then in that case you're

going to have opposite outcomes at the two detectors.

And so then the product of the outcomes A and B is going to be -1.

And so to find the correlation, all we have to do is compare the area of the blue sectors to the area of the red sectors.

And so all the formula is, is 1 times the fraction of the circle taken up by the blue sectors minus 1 times the fraction of the circle taken up by the red

sectors.

And then as we sweep theta around, we can see the linear dependence of the correlation on the theta angle. And this

linear dependence of the correlation on theta, which now we've seen a few times in a few different contexts, is really at the heart of Bell's argument, as we're going to see in part four.
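The sector-counting argument can also be checked by brute force. Below is a minimal Monte Carlo sketch of the equation 9 rules, assuming detector A points along z and detector B is tilted by theta in the x-z plane; the function names are illustrative only. The comparison line -1 + 2 theta over pi is the straight line through the three special cases.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly from the sphere by normalizing a Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lhv_correlation(theta, samples=50_000, seed=0):
    """Average product of outcomes for A = sign(a . lambda), B = -sign(b . lambda)."""
    rng = random.Random(seed)
    a = [0.0, 0.0, 1.0]
    b = [math.sin(theta), 0.0, math.cos(theta)]
    total = 0
    for _ in range(samples):
        lam = random_unit_vector(rng)  # shared hidden variable, uniform over all directions
        outcome_a = 1 if dot(a, lam) > 0 else -1
        outcome_b = -1 if dot(b, lam) > 0 else 1  # minus sign encodes the anti-correlation
        total += outcome_a * outcome_b
    return total / samples

if __name__ == "__main__":
    for degrees in (0, 45, 90, 135, 180):
        theta = math.radians(degrees)
        line = -1 + 2 * theta / math.pi
        # Columns: angle, simulated correlation, straight line, quantum -cos(theta).
        print(degrees, round(lhv_correlation(theta), 3), round(line, 3),
              round(-math.cos(theta), 3))
```

The simulated column should follow the straight line, agreeing with the quantum column only near 0°, 90°, and 180°.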

And so in part 3B of this paper, Bell shows us that the local hidden variable model does work for the three special cases where A and B are either parallel,

antiparallel, or perpendicular.

And when you look at the plot of the correlation that we get from our local hidden variable model that is this blue line and you compare it to the quantum mechanical correlation that we would

expect, namely negative cosine of theta, you see that even though these two curves are different they do in fact intersect at precisely these three special cases.

And so part 3B of Bell's paper is all about saying like, yeah, the local hidden variable model does seem to work for those three special cases.

But nonetheless, the local hidden variable model breaks down for anything other than those three special cases because a line is not a cosine. And

there's actually a couple of ways in which a line is not a cosine. The most

obvious one is that there's just a mismatch in these two curves for most values.

So pick a theta value at random and negative cosine of theta is just not the same value as what our linear correlation gives us. So it doesn't match.

But the other noticeable thing that differs between this linear correlation that we get from our local hidden variable model and the quantum mechanical correlation is that the

linear correlation has a nonzero slope at a theta angle of 0. Whereas the

quantum mechanical correlation has a flat slope of zero at theta= 0.

And this is kind of a subtle difference between these two correlation functions, but nonetheless, it is a difference and it's a difference that's totally generic to all local hidden variable models. So,

one of the things that we're going to prove in this paper in part 4A is that any local hidden variable model is going to have a nonzero slope at a theta angle of zero.

So this animation gives us a great intuition for how the local hidden variable model gives us a correlation which depends linearly on the angle

theta between the vectors a and b. And

therefore Bell goes on to say this gives a correlation as a function of a and b of -1 + 2 theta over pi. Call that equation 10,

where theta is the angle between the vectors a and b, and 10 has the properties of equation 8, that is it works for the three special cases. And

of course, the precise form of equation 10, this 2 over pi, that's just y = mx + b.

That's just what it has to be to be a line that goes through the boundary conditions given by equation 8. But

noticeably, the blue curve and the purple curve are not the same in general. Not only do their values not match in general, but also at theta equals zero the blue line has a non-zero slope whereas the purple quantum curve

has a slope of zero.

Now here Bell abruptly brings up a very important point, although it is kind of jarring the way in which he brings it up so abruptly.

But in any case, following the paper: for comparison, consider the result of a modified theory in which the pure singlet state is replaced in the course

of time by an isotropic mixture of product states. This gives the correlation function negative one third a dot b. Call that equation 11.

Now, what does that mean? I mean, that sentence just comes out of nowhere, right? And there is a lot that Bell is communicating in this one sentence. So, I want to take a moment to unpack exactly what he means because this is actually a really profound point.

So when we have our purple curve of negative cosine theta for the correlation between the measurement outcome at detector A and detector B.

This is based on the two particles being in the singlet spin state where before the measurement neither particle has a preferred spin direction. But the spin measurement outcomes for the two

particles are guaranteed to be anti-correlated along the same measurement axis, whatever that measurement axis may be.

On the other hand, if instead of the singlet state, we imagine that the two particles already have some preferred spin

direction before they're measured, but still their spins are equal and opposite relative to that particular spin direction, then we would expect anti-correlated spin measurements if the

particles are measured along that particular spin direction. But if the particles are measured perpendicularly to that spin direction, then in that case we would expect no correlation between the spin outcomes of those two

particles.

And so what Bell means by isotropic mixture of product states is this: imagine that when we're producing these particles, instead of being in the singlet state with pure rotational

symmetry and no preferred spin axis a priori, the particle pairs do have an intrinsic preferred spin direction relative to which they're equal and opposite. And then by isotropic,

all that means is that that direction, call it n hat, is selected uniformly from the sphere. So the particles' preferred direction is going to be totally random.

And so now if you imagine measuring many such pairs of particles, and for the sake of argument suppose we measure them along the same measurement axis A. Well

sometimes that spin axis n is going to be aligned but usually it's not going to be very aligned in which case we won't really see much of a correlation. And

when you work out the math of on average, what correlation strength would we expect, you find a correlation strength which is the same as for the singlet state, but divided by a factor

of three, which represents the fact that when you average over all three dimensions of space, more often than not, our measurement directions are not going to be aligned with the spin

direction n.
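That factor of three can be estimated numerically. The sketch below assumes each pair is in a product state with particle one spin up along a random axis n and particle two spin down along n, and that for such a pair the two outcomes are independent with averages a dot n and negative b dot n (the single-particle cosine rule from earlier). Averaging over n should then land near negative one third of a dot b. The function name `mixture_correlation` is hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Draw a direction uniformly from the sphere by normalizing a Gaussian triple."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def mixture_correlation(a, b, samples=50_000, seed=0):
    """Correlation for an isotropic mixture of product states (assumed model, see lead-in)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        n = random_unit_vector(rng)  # the pair's preferred axis, uniformly random
        # Particle 1 is up along n: average outcome along a is a . n.
        # Particle 2 is down along n: average outcome along b is -(b . n).
        total += dot(a, n) * (-dot(b, n))
    return total / samples

if __name__ == "__main__":
    a = [0.0, 0.0, 1.0]
    for degrees in (0, 45, 90, 180):
        theta = math.radians(degrees)
        b = [math.sin(theta), 0.0, math.cos(theta)]
        # Columns: angle, mixture estimate, -cos(theta)/3, singlet value -cos(theta).
        print(degrees, round(mixture_correlation(a, b), 3),
              round(-math.cos(theta) / 3, 3), round(-math.cos(theta), 3))
```

Comparing the last two columns shows how much weaker the mixture's correlation is than the singlet's at every angle.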

And so we actually see a very strong theoretical and experimental difference between the singlet state and a situation where the particles have equal and opposite spin along some random

axis.

The correlation we get from the singlet state is weirdly strong in a surreal kind of way. And this reflects the fact that in the singlet state neither particle has a preferred

direction before it's measured. And so

if you think in terms of one of the particles being measured ever so slightly before the other, then you're guaranteed to collapse the wave function along that measurement direction. And so

in the singlet state, your measurement axes are always going to be more aligned.

Whereas for an isotropic mixture of product states, in general, you're not going to have this kind of alignment.

All right. So Bell then goes on to say it is probably less easy experimentally to distinguish equation 10 from equation 3 than equation 11 from equation 3. So

equation 10 is the linear correlation that we get from our local hidden variable model. And equation 11 is the

negative a dot b over 3, that is negative cosine theta over 3, correlation that we get from a quantum mechanical model in which the two particles are not in the singlet state

but rather are in a product state with some preferred direction. And what Bell is saying here is that there's really a big contrast in the experimental data

between a singlet state and an isotropic mixture of product states. whereas the

linear correlation from a local hidden variable model is going to be a better approximation to the actual quantum mechanical singlet correlation. So

that's just a point about experimental practicality.

Now before moving on from part 3B, Bell makes one final comment which is that unlike equation 3, the quantum mechanical correlation negative cosine of theta, the function of equation 10,

this linear correlation we get from the local hidden variable model, is not stationary. That is, the slope is nonzero

at the minimum value -1, where theta equals 0. So we talked about that

earlier when thinking about the differences between the blue line and the magenta curve, that is, between the local hidden variable model and the

quantum mechanical correlation.
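
As a rough numerical illustration of "stationary" versus "not stationary", the sketch below compares the slope at theta = 0 of the quantum correlation -cos(theta) with that of the linear correlation, taken here as -1 + 2*theta/pi (the form used for equation 10 in this walkthrough). The function names are chosen for the example only.

```python
import math

def quantum_correlation(theta):
    return -math.cos(theta)

def linear_correlation(theta):
    return -1.0 + 2.0 * theta / math.pi

def slope_at_zero(correlation, step=1e-6):
    # One-sided finite difference, since theta (the angle between axes) is >= 0.
    return (correlation(step) - correlation(0.0)) / step

print(slope_at_zero(quantum_correlation))  # very close to zero: stationary minimum
print(slope_at_zero(linear_correlation))   # close to 2/pi: a kink at the minimum
```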

One of the differences is that the values in general are not the same value. But another difference is that

the quantum mechanical correlation has a slope of zero at its minimum value whereas the local hidden variable line

does not. It'll be seen in part 4A that this is characteristic of functions of type 2, that is, of the form of equation 2, where the correlation is given by a local hidden variable

model. So in part 4A we're going to prove that any local hidden variable model is going to have a nonzero slope in its correlation function at the minimum value, which is incompatible with

quantum mechanics and with the experimental data. And then in part 4B,

we're going to prove that, in general, the correlation curves for a local hidden variable model and for quantum mechanics cannot take on the

same values everywhere.

So in part four, we're going to prove in two different ways that local hidden variable models are not compatible with quantum mechanics and not compatible with the experimental data.

Okay, so then Bell wraps up part three by talking about how a hidden variable model could work if we allow for non-locality.

Thirdly and finally, there is no difficulty in reproducing the quantum mechanical correlation of equation three if the results of the spin measurements

at A and B in equation two, the correlation function of the local hidden variable model are allowed to depend on the measurement directions B and A

respectively as well as on A and B. And

Bell shows this by saying if we do a non-local sketchy move, we can warp the blue line into the magenta curve. So the

reasoning here is exactly the same as what we've seen before when we thought about doing a sketchy move to warp the line into the curve. But the key difference now is that when you have two entangled particles that are separated

in space, you can't do this sketchy move unless you know the angle between the measurement directions A and B, which are in different light cones. And so

this is a non-local sketchy move because somehow what's happening at detector A depends on the measurement axis at detector B and vice versa. So as a

concrete example of this, we can replace the vector A in equation 9 by an effective measurement axis A prime obtained from A by rotation towards the

measurement vector B until 1 - 2 theta prime over pi equals cosine of theta where theta prime is the angle between the effective measurement axis A prime

and B. So if you make that sketchy move then the blue line is going to warp into the magenta quantum curve and then in that case we would have a match between our hidden variable model and quantum

mechanics and the experimental data. And

so this is exactly the same reasoning as the sketchy moves that we looked at before. In fact, it's exactly the same

mathematical maneuver. However, for given values of the hidden variables, the results of measurements with one magnet now depend on the setting of the distant magnet, which is just what we

would wish to avoid, that is non-locality.
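
Here is a small sketch of that warping, assuming the linear correlation has the form -1 + 2*theta/pi. Solving 1 - 2*theta'/pi = cos(theta) for the effective angle gives theta' = (pi/2)(1 - cos(theta)). Note that `effective_angle` (a name invented for this example) takes theta, the angle between A and B, as its input, which is exactly the non-local ingredient: detector A's effective axis cannot be computed without knowing the setting at B.

```python
import math

def linear_correlation(angle):
    return -1.0 + 2.0 * angle / math.pi

def effective_angle(theta):
    # Solve 1 - 2*theta_prime/pi = cos(theta) for theta_prime.
    return (math.pi / 2.0) * (1.0 - math.cos(theta))

for degrees in (0, 30, 60, 90, 120, 180):
    theta = math.radians(degrees)
    theta_prime = effective_angle(theta)
    # Columns: theta, theta' (degrees), warped linear correlation, -cos(theta)
    print(degrees, round(math.degrees(theta_prime), 2),
          round(linear_correlation(theta_prime), 4), round(-math.cos(theta), 4))
```

The last two columns coincide, so the warped line reproduces the quantum curve, but only by feeding information about the distant detector into the local one.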

And there's really no way around that.

If you look at the example shown here where we replaced a with a prime and you think maybe there's some way to do the sketchy move differently in a way that

doesn't violate locality, well, try to do that and you find it doesn't work. So

for example, what if instead of rotating A into A prime, we leave A alone and rotate B into B prime in a way that gives us the same result. Well, that

would require for B prime to be a vector that's slightly rotated towards A. And

again, it's the same thing. And in fact, by symmetry, that reasoning is the same as before, where now we're just saying that what's happening at detector B is somehow bent towards the measurement direction A. And so really, it's the

same kind of nonsense.

And then also philosophically we might expect there to be some symmetry here.

So if we wanted an idea like this to work maybe we should actually bend A to A prime and B to B prime where A prime is bent towards B and B prime is bent towards A in an equal and opposite kind

of way. But in that case then both detectors know something about how the other detector is configured. And so

fundamentally it's exactly the same problem no matter how you look at it.

So reflecting on part three, we've seen some specific examples of how hidden variable models don't really work. They

just don't match the experimental data, whereas quantum mechanics does. And so

what follows in part four is going to be very abstract, very mathematical, very algebraic, and we're going to take our time with it because it's a whole lot of equations and symbols and all that. But

if you followed along part three, then you already have the fundamental insight required to make sense of part four. All

we're doing in part 4 is generalizing on this specific example to show first that every local hidden variable model is going to have a correlation function

with nonzero slope at its minimum value, which is in contradiction with quantum mechanics and the experimental data.

And then second, in part 4B, we're going to show that in general, the correlation function given by a local hidden variable model cannot take on the same value as the correlation given by

quantum mechanics and experiment at every theta point. That is for every possible configuration of the measurement axes A and B. And so it's the same kind of reasoning that we've

seen in part three, but just in a much more abstract and generic kind of way.

And the abstraction is worth it. Even

though it is somewhat impenetrable and it takes a lot of time to digest, it's going to be a very powerful result. And

so, as usual, ask not for easier equations, but for stronger coffee. You

got to prepare yourself for this because it's going to be a bit of work, but it is well worth the effort.

All right, my friends. We're now ready to approach the core argument of Bell's paper, part four, contradiction.

Okay, so in the first part of part four, we're going to show that the correlation function that we get from a local hidden variable model cannot be stationary at

its minimum value when theta equals 0 unlike the quantum correlation which is stationary that is does have zero slope at its minimum value for theta equals 0.

And so this is going to be a generic difference between the kinds of correlations that local hidden variable models can give us and the correlation that we expect from quantum mechanics which is also the correlation measured

in experiments.

All right. "The main result will now be proved." Because rho is a normalized probability distribution, the integral

over rho d lambda equals 1. And we saw that before. That just means if you

consider every possible configuration of hidden variables and add them all up, each one weighted by its probability, then the result is going to be one. In

other words, the hidden variables have to be in some kind of configuration.

And next, because of the properties of equation one, where we saw that the measurement outcomes at detectors A and B can only take on the values of + one or minus one depending on whether that

detector measured spin up or spin down respectively. Then if we consider the

definition of our local hidden variable correlation function in equation two where we found that P is going to be the integral over all possible

configurations of the hidden variables of the measurement result at A times the measurement result at B and this correlation is going to be a function of the measurement axes A and B. Then as

you can see this correlation P cannot be less than -1.

That is the lowest value our correlation can be is a perfectly anti-correlated value of negative 1.
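
Written out, the ingredients used so far look like this. This is a sketch in the notation of Bell's paper, with the vectors a and b as the detector axes, lambda the hidden variables, and rho assumed non-negative as any probability distribution is:

```latex
\begin{align*}
  \int d\lambda\, \rho(\lambda) &= 1, \\
  A(\vec a, \lambda) = \pm 1, \quad B(\vec b, \lambda) &= \pm 1, \tag{1} \\
  P(\vec a, \vec b) &= \int d\lambda\, \rho(\lambda)\, A(\vec a, \lambda)\, B(\vec b, \lambda). \tag{2}
\end{align*}
% The product A B is never below -1 and rho is normalized, so:
\[
  P(\vec a, \vec b) \;\ge\; -\int d\lambda\, \rho(\lambda) \;=\; -1 .
\]
```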

And when can it take on that value?

Well, as we've seen, the correlation function can only reach -1 at a equals b. That is when the two measurements are

aligned along the same axis. Then for

the singlet state, you're going to have perfectly anti-correlated results.

Measure spin up at detector A along the axis A. And for sure you know you're

going to measure spin down at detector B for an axis B which is equal to A. So

we've seen that before. That's nothing

new. And now Bell makes a technically nuanced comment, which is that this is only the case if A as a function of A and lambda is equal to the negative of B as a function

of A and lambda, except at a set of points lambda of zero probability.

Now this is a technical caveat that is designed to keep this argument fully generic. We know from experiments that

for the singlet state, it is going to be true that the measurement result at A for measurement axis A is indeed going to be equal to the negative of the

measurement result at B for measurement along the same axis A. But because we're trying to rule out the possibility of all imaginable hidden variable models, you could in theory imagine a model

where A as a function of A and lambda is not necessarily equal to the negative of B as a function of A and lambda. But you

could have some superfluous configurations of hidden variables. And

that's technically fine as long as those configurations of hidden variables have zero probability. So this is a really

minor point and honestly it probably kind of goes without saying because we know from the experimental data that for sure the result at detector A is going to be the negative of the result at

detector B when you're measuring along the same axis. So you can think of that as an experimental boundary condition.

And if any local hidden variable model disagrees with that, that is if you have a local hidden variable model that goes against equation 13, well, that can only match the experiment if the lambda which

violate equation 13 have zero probability of occurring. Anyway, I

think the paper probably could have gone without that little comment about a set of points lambda of zero probability, but it's in there just for the reader who's going to be very pedantic about that. So, all right then. If we assume

equation 13, which is really less of an assumption and more of an experimental boundary condition, then equation 2, the

correlation for a local hidden variable model, can be written as P as a function of A and B is equal to the negative of the integral over all possible

configurations of hidden variables of the result at detector A as a function of A and lambda times the hypothetical result at detector A as a function

of B and lambda. Now let's linger on that for

a second. What is this term, A as a function of B and lambda? Well, what that is is imagine a generic case where we have our detectors A and B and A is aligned

with some axis A and the alignment of detector B is some axis B. Well, we know that our correlation is going to depend on the product of the measurement

results at detectors A and B. And all

equation 14 is is that the result at detector B can be thought of as the negative of the result that detector A would measure if A were aligned along

the B axis. And so you see the only difference between equation 14 and equation two is that the measurement result at B aligned along the axis B as a function of our hidden variables

lambda has been replaced with what would have been the results of the measurement at A if A were aligned along the same axis B and we had the same hidden

variables lambda. So this is just a way of writing our correlation in terms of measurement results at detector A.
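
In symbols, the perfect anti-correlation condition and the rewritten correlation read as follows (a sketch in the notation of Bell's paper, using his equation numbers):

```latex
\begin{align*}
  A(\vec a, \lambda) &= -B(\vec a, \lambda), \tag{13} \\
  P(\vec a, \vec b) &= -\int d\lambda\, \rho(\lambda)\, A(\vec a, \lambda)\, A(\vec b, \lambda). \tag{14}
\end{align*}
```

Equation 14 follows from equation 2 by substituting B(b, lambda) = -A(b, lambda), which is equation 13 applied to the axis b.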

All right. And next what we're going to do is we're going to let C be another unit vector which is an alternative option for B. So imagine C as the

alignment in detector B. In fact, at first imagine that C is the same thing as B and then give it just a little nudge so that C is just a little different than B. And then the question

we can ask is if you imagine two hypothetical scenarios, one where you had the measurement axes A and B and another where you had the measurement axes A and C where C is just a little

nudge away from B. Then how do we calculate the difference in the correlations P of A and B and P of A and C? In other words, what kind of

difference in the correlation do we get when we apply a small little nudge on the axis of detector B?

Well, all we have to do is replace P with the integral formula given by equation 14. And we can go ahead and

smush these together into one integral.

And we see that we have the negative integral over all possibilities for the hidden variables of A as a function of A and lambda times A as a function of B and

lambda, minus A as a function of A and lambda times A as a function of C and lambda, which is how the correlation function would change if we slightly

changed the measurement axis at detector B from the vector B to the very similar vector C. So now Bell goes on to

algebraically massage this integral expression into a different form shown here.

And to see what he's done here, let's go ahead and color code this like so. So

first of all, you see that both parts of the integrand have in common this factor of A as a function of A and lambda. So

we can go ahead and factor that out and pull that to the left. And the next thing you want to look at is in the top expression there, we have that factor of

a of b and lambda. And we also have a minus sign. So now what we're going to

do to bring that into the bottom expression is we're going to factor out that term A of B and lambda. So we're

going to bring that to the left. And

then what remains is just the number one. But then we're going to go ahead

and pull in that minus sign from the outside of the integral to the inside.

And so that term is just going to be a -1 inside of the brackets from which we factored out a of b and lambda.

And then the final thing that we have to prove is that in that top expression, the term on the right involving the a of c and lambda can be brought down below

and turned into this expression A of B and lambda times A of C and lambda. And

to show that this is in fact a legitimate move, first of all, in the top equation, notice how we have two minus signs. And so those are going to

cancel each other out. And then the only question that remains is, is the product of these two purple expressions times A

of C and lambda equal to A of C and lambda? Well, yeah, it is. The reason

being that purple expression is the square of A of B and lambda. But

remember this capital A, this is the measurement result at detector A. And

the only values it can take on are either plus or minus one. But in either case, the square of plus or -1 equals 1.

And so yeah, the purple expression then collapses onto the number one. And we

see that this was in fact a legitimate move the way we've factored things out here. So what we end up with is the same

integral we had before, but just massaged into a different form.
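
The factoring steps just described can be written compactly. This is a sketch in the notation of Bell's paper; the only fact used in going from the first line to the second is that A(b, lambda) squared equals 1:

```latex
\begin{align*}
  P(\vec a, \vec b) - P(\vec a, \vec c)
    &= -\int d\lambda\, \rho(\lambda)\,
       \bigl[ A(\vec a,\lambda) A(\vec b,\lambda) - A(\vec a,\lambda) A(\vec c,\lambda) \bigr] \\
    &= \int d\lambda\, \rho(\lambda)\, A(\vec a,\lambda) A(\vec b,\lambda)\,
       \bigl[ A(\vec b,\lambda) A(\vec c,\lambda) - 1 \bigr].
\end{align*}
```

Expanding the second line gives A(a)A(b)A(b)A(c) - A(a)A(b) = A(a)A(c) - A(a)A(b) inside the integral, which is the first line with the outer minus sign distributed.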

All right. So now Bell is going to claim that this integral expression is less than or equal to another integral. So using equation one, which is where we specified that the

measurement outcomes at detectors A and B can only be either +1 or -1, then we can show that our integral expression is going to be less than or equal to the

integral over rho d lambda of 1 minus A as a function of B and lambda times A of C and lambda.

Now, when I got to this part of the paper, I was looking at it and I was like, "Uh, hm. Okay, why?

How do we know that's the case?" And I was staring at it for a while and I just couldn't figure it out. I don't know if there's supposed to be an easier way to do this because if you look at the two sides of this inequality, you see

that on the left side we have something of the form n times (m - 1), and on the right side we have something of the form 1 -

m. And notice that in both cases that blue expression m is the same number on both the left side and the right side.

And both n and m are necessarily integers. And because both of them are

just A times A, we know that n and m are both going to be plus or minus 1. And so

then the question just becomes whether it is in fact the case that n times (m - 1) is always less than or equal to 1 - m for

the four possibilities of each n and m being plus or minus one. So anyway, I ended up just checking all four possibilities and verifying that for all the four possible options, this is

actually true. I don't know if there's a more elegant way of demonstrating that this is true. But in any case, this way works fine. It's just a little bit

tedious.

If you check all four possibilities here, you find that this is in fact a legit move and the integral expression on the left is indeed always less than or equal to the integral expression on

the right.
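
The four-case check is easy to reproduce. In this sketch n stands for A(a, lambda) times A(b, lambda) and m for A(b, lambda) times A(c, lambda), matching the description above. There is also a one-line shortcut: since the absolute value of n is 1 and m is never larger than 1, |n(m - 1)| = |m - 1| = 1 - m, so n(m - 1) can never exceed 1 - m.

```python
from itertools import product

# Enumerate all combinations of n, m in {+1, -1} and compare both sides.
for n, m in product((+1, -1), repeat=2):
    left = n * (m - 1)
    right = 1 - m
    print(n, m, left, right, left <= right)

all_cases_hold = all(n * (m - 1) <= 1 - m for n, m in product((+1, -1), repeat=2))
print(all_cases_hold)
```

Because the inequality holds for the integrand at every lambda and rho is non-negative, it carries over to the integrals.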

Okay, then. So that works. But why do we care? Like what are we doing here? Well,

notice this. If we look at that integral expression, the second term on the right is our correlation function evaluated for the vectors B and C.

You see, because by equation 14, we know that we can calculate our correlation function in terms of results at detector A as a function of measurement axes and the hidden variables. And in that case

we just integrate over all rho d lambda A of A and lambda times A of B and lambda, with a minus sign on the outside. And so

by pattern recognition we can see that the second term on the right of this integral, by equation 14, is actually equal to P evaluated with the vectors B

and C. And that's going to be very important in just a moment. Okay.

So having recognized P as a function of B and C, it follows that 1 + P as a function of B and C is greater than or

equal to the absolute value of P of A and B minus P of A and C. And you can kind of read that directly from the characters that are colorful here. You

see because if you think about the expression that we've been evaluating, remember we started off with thinking about what is the difference in the correlation function. If we have P as a

function of A and B compared to, that is, minus, the correlation as a function of A and C, where C is a vector very much like

B but with a little nudge. And we showed after evaluating all of these integral expressions that this difference in correlations has to be less than or equal to this integral expression which

contains in it P as a function of B and C plus one. There's also a one in the integral. But because rho is normalized,

that one just pops outside of the integral. But then Bell goes ahead and

switches this expression around so that you have the difference in the correlation on the right side and we pull the expression involving P of B and C on over to the left side. And so

that's why the less than or equal to sign flips around into a greater than or equal to.

And so that reasoning justifies equation 15 without the absolute value. But now

we need to justify where that absolute value comes from. And as it turns out, the absolute value sign arises from symmetry. So imagine swapping the

vectors B and C in all of these equations. Well, on the left hand side,

when you consider the function P as a function of B and C, if instead we had P as a function of C and B, that's

actually the same thing. That's equal to P as a function of B and C. Because at

the end of the day, you're still measuring along the same two measurement axes. And it doesn't matter which

detector we say is detector A versus detector B. So the order of the input B

and C doesn't matter in the correlation function.

However, on the right hand side where we have this difference P of A and B minus P of A and C, if you switch around B and C on that side, you end up with P of A

and C minus P of A and B, which is the same right hand side as before, but with a sign flip. And then you think about the fact that we should be able to swap B and C around in this argument. By

symmetry, there's no meaningful difference between the vectors B and C.

And so then you can imagine that the same line of reasoning shows us that our left hand side is going to be greater than or equal to plus or minus the right hand side. And so without loss of

generality, we can go ahead and clean that up and just say that the left hand side is greater than or equal to the absolute value of the right hand side.

So we're not losing anything by shaving off that negative option.
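
To see equation 15, 1 + P(b, c) >= |P(a, b) - P(a, c)|, in action, here is a small sketch that restricts the three axes to a single plane and describes each by its polar angle (a simplifying assumption for the example, not part of Bell's argument). It evaluates both sides for the linear local hidden variable correlation, taken as -1 + 2*theta/pi, and for the quantum correlation -cos(theta). All function names are illustrative.

```python
import math

def angle_between(x, y):
    """Angle in [0, pi] between two coplanar unit vectors given by their polar angles."""
    d = abs(x - y) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def linear_correlation(x, y):
    return -1.0 + 2.0 * angle_between(x, y) / math.pi

def quantum_correlation(x, y):
    return -math.cos(angle_between(x, y))

def satisfies_eq15(P, a, b, c, tol=1e-12):
    # Equation 15: 1 + P(b, c) >= |P(a, b) - P(a, c)|, with a small rounding tolerance.
    return 1.0 + P(b, c) >= abs(P(a, b) - P(a, c)) - tol

a, b, c = 0.0, math.pi / 4, math.pi / 2
print(satisfies_eq15(linear_correlation, a, b, c))   # the linear model respects the bound
print(satisfies_eq15(quantum_correlation, a, b, c))  # the quantum correlation does not
```

For these particular axes the linear model sits exactly on the bound (0.5 on both sides), while the quantum correlation gives about 0.29 on the left and about 0.71 on the right.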

All right. So then Bell goes on to say that unless P is constant, the right hand side is in general of order absolute value B minus C for small

absolute value of B minus C. And in just a moment I'm going to unpack why that is. But real quick, I just want to read

the next thing that Bell wrote, which is that thus P of B and C cannot be stationary at the minimum value, which

is -1, where B equals C. Right? When the

axes are aligned, our correlation takes on its minimum value of a perfect anti-correlation.

And therefore, the correlation function cannot equal the quantum mechanical value given by equation 3, which is negative a dot b, also known as negative cosine of theta.

Okay, now I am a huge fan of Bell and his work and he is a great genius, but my goodness does he say so much with so few words and here it's kind of hard to see exactly what he's talking about. So

I want to take a moment to just unpack this and really get into what exactly he's saying here. Okay, so the first thing to realize is that if we write our

correlation function P of B and C and we think about it just as a mathematical function that takes two vectors as input and we know that this function is going

to take on a minimum value of -1 when the vector B equals the vector C. Then

we can say that if the function were stationary then the curve would be flat there at that value. Just like the negative cosine of theta curve of quantum mechanics is flat at theta

equals 0. So if we claim that we have a hidden variable model that matches the quantum mechanical predictions, we should expect it to have a slope of zero

when its two vector inputs are the same vector. But now if it's flat wherever

its two vector inputs are the same, then we can say something about this situation. We can say that if now we

imagine that the vectors B and C are very similar, they're almost the same, B

is approximately C, with the absolute value of B minus C, that is, the size of the tiny difference between these two vectors, call that epsilon. And let's

say epsilon is much less than one. It's

a very small number. Then our

correlation function evaluated for the inputs B and C is going to be -1 plus some positive number that is of order

epsilon squared. Or in principle it could be higher order in epsilon, but the biggest of a number it could be for small epsilon is going to be of order of

epsilon squared. But we can't have a first order term of order epsilon, because then that would be a slope in the function. You see what I mean? If P evaluated at B

and C, for B approximately equal to C, had the form of -1 plus something on the order of epsilon, then the function would be sloped there, and that wouldn't be stationary. That wouldn't be a zero

slope situation. And so what we're saying here about this second order or higher in epsilon, I mean, this is really just the definition of what it means for the function to be stationary at its minimum

value. You know, the slope is zero. Well, okay then. But the absolute value of the difference in the correlations, P as a function of A and B minus P as a

function of A and C. That is the difference in correlations that we would get if first we have our detectors set up with the axis A on one side and the axis B on the other side minus what the

correlation would be if instead we had the axis A on one side and the axis C on the other side where again C is equal to B plus a tiny little nudge. Well, that

difference in the correlations, its absolute value is going to be first order in epsilon.

The reason being our correlation function changes when A and/or B change. I mean, by definition, you think

about how the correlation is defined as the integral of the product of A and B integrated over all possible lambda, each weighted by the probability of lambda.

Well, the only thing that can change as we rotate C a little bit away from B in the correlation function is going to be some of the results at A and B changing

sign from +1 to minus1 or minus1 to + one. And this is a very binary thing.

And so the amount of change that's happening here is going to be directly proportional to the difference between the vectors B and C, epsilon, for small epsilon. And if you think about what the

measurement results are going to be at A and B as we're moving C slightly away from B, you can think about like a belt of area where A and B are flipping sign

and that's contributing to the change in the correlation. Sort of like thinking

about an orange slice having a volume proportional to the wedge angle. And

then you see that that area of A and B flipping around is going to be directly proportional to this change in the correlations.

Well, okay. So considering all of that, we run into a contradiction, because then equation 15 would imply that a positive number which is second order in a small

epsilon is greater than or equal to a positive number which is first order in a small epsilon. But that's not true. For a small positive epsilon, the first order

term dominates, because if epsilon is small then epsilon squared is a small times a small, which is a tiny. And so the thing should be flipped around the other way.

You know, something of the order of epsilon squared is going to be smaller than order of epsilon, not what equation 15 would imply. And so that mathematical contradiction proves that our

correlation function cannot be stationary at its minimum value unlike the quantum mechanical correlation function which is stationary at its minimum value. And so this is one way in

which we see that a local hidden variable model cannot give us the same correlation function as quantum mechanics which is also the correlation

function that we see in experiments. And

so this right here is the first of the two parts of part 4 where we've proven that a local hidden variable model just is not capable of reproducing the statistical predictions of quantum

mechanics.
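
The order-of-epsilon clash can be made concrete with the quantum correlation itself. This sketch assumes coplanar axes with A perpendicular to B and C rotated away from B by a small angle eps (used here as a stand-in for the size of B minus C, which it approximates for small angles). The left side of equation 15 then shrinks like eps squared over 2, while the right side shrinks only like eps, so the inequality fails. The helper `eq15_sides` is a name made up for the example.

```python
import math

def quantum_correlation(angle):
    return -math.cos(angle)

def eq15_sides(eps, theta_ab=math.pi / 2):
    """Return (left, right) of equation 15 for the quantum correlation."""
    left = 1.0 + quantum_correlation(eps)  # 1 + P(b, c), roughly eps**2 / 2
    right = abs(quantum_correlation(theta_ab) - quantum_correlation(theta_ab + eps))  # roughly eps
    return left, right

for eps in (0.1, 0.01, 0.001):
    left, right = eq15_sides(eps)
    # Columns: eps, left side, right side, whether equation 15 holds
    print(eps, left, right, left >= right)
```

Every row reports that equation 15 fails, which is the contradiction described above: a stationary minimum is not something a correlation of the form of equation 2 can have.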

Now it's time to get into the second part of part four. And this is the main argument of Bell's paper. This is the really powerful proof that the correlation function we get from a local

hidden variable model cannot be equal to the quantum mechanical correlation function. In other words, in just the

same way that we've seen that a line cannot be a cosine, it's true more generically that any kind of correlation

function we can get from a classical hidden variable model cannot be equal to the quantum mechanical correlation of negative cosine of theta, also known as negative a dot

b. All right. So having already proven the thing about the slope being nonzero, Bell goes on to say, nor can the quantum mechanical correlation of

equation 3, that is, negative a dot b, also known as negative cosine of theta, be arbitrarily closely approximated by the form of equation 2, that is, a correlation

function given by a local hidden variable model. No matter what kind of

local hidden variable model you want to come up with, it's just not the case that the correlations that model gives you are going to be the same as the

quantum mechanical correlations. And

this holds for all possible local hidden variable models.

The formal proof of this may be set out as follows. Well, first of all, we would

not worry about the failure of the approximation at isolated points. So let

us consider instead of equations 2 and 3 the functions P bar of a and b and negative a dot b bar. And these functions are

essentially exactly the same thing as equations 2 and three but they're averaged over vectors near the vectors a and b. So the bar denotes independent

averaging of the correlations as a function of a prime and b prime within specified small angles of a and b.

Okay, so let's pause here and think about what Bell is saying and why it matters. So this averaging thing, it's

kind of a mathematically pedantic point.

But what Bell is saying here is look, let's be generous and say that if someone came up with a local hidden variable model which had a correlation function that matched the quantum

mechanical correlation for the most part, but there were isolated points at specific values of A and B where there was a mismatch between the local hidden

variable correlation and the quantum mechanical correlation.

So for example, let's say P of A and B is equal to negative a dot b everywhere except at one special point where A is equal to B or

whatever it may be. And at that one infinitesimally small point suppose there's some disagreement between P and the quantum mechanical correlation.

What Bell is saying is don't worry about that. If the local hidden variable

model matches the quantum correlations except at these special isolated points where for whatever reason it doesn't work out, you know what? We're going to

be generous and we're going to say that would work. The reason being

experimentally we might not notice if there was a mismatch at very specific isolated points between local hidden variables and quantum mechanics. And so

when you're thinking about the mismatch between the local hidden variable correlation and the quantum mechanical correlation, you want to kind of smear things out or smooth things out just a

bit to where a mismatch at an isolated point would be totally washed away. And

so all we're doing when we're taking this average over very close nearby points is we're just saying don't worry if the correlation fails at specific isolated points. That's all that is. So

isolated points. That's all that is. So

to imagine the vectors A prime and B prime, just imagine the vectors A and B, but then smear them out just a little bit over a tiny little space of nearby

vectors. That's all that means.

vectors. That's all that means.

Suppose that for all a and b, the difference between the local hidden variable correlation and the quantum mechanical correlation is bounded by some number epsilon. That is, the absolute value of P bar of a and b plus a·b bar is always less than or equal to epsilon. This is equation 16.

Now the thing you have to see about equation 16 is that this is just the local hidden variable correlation minus the quantum mechanical correlation. The quantum mechanical correlation is negative a·b, so plus a·b is minus the quantum mechanical correlation. Then you take the absolute value, and that is just the magnitude of the error, or mismatch, between our local hidden variable model's correlation and the quantum mechanical correlation.

And so what epsilon represents is the maximum amount of error in our local hidden variable model relative to quantum mechanics. As a reminder, the bars are just there to say don't worry about single isolated points where there's a mismatch. We'll allow that. We're going to smooth out, or filter out, any infinitesimally small points of mismatch. So epsilon is the maximum mismatch once you factor out any infinitesimally small regions where the two correlation functions disagree. If we could show that epsilon is zero for some local hidden variable model, then that model would effectively reproduce the quantum mechanical correlation. So that would work.

However, it will be shown that epsilon cannot be made arbitrarily small. What we're about to prove is that epsilon has to be at least some nonzero number. Therefore you're always going to have some mismatch between the local hidden variable correlation and the quantum mechanical correlation, no matter the details of your local hidden variable model. So that's going to be the main proof of Bell's paper.

All right. So next we're going to massage equation 16 into a slightly different form by supposing that for all a and b, the absolute value of a·b bar minus a·b is less than or equal to some small number delta. This is equation 17.

This expression is the mismatch between the average dot product, taken over the a prime and b prime that are close to a and b, and the exact dot product a·b. You can think of it as the error introduced into the quantum mechanical correlation by our averaging technique. As we smooth things out just a little bit and average away those infinitesimal potential points of mismatch, suppose the smearing is such that the average dot product minus the exact dot product of a and b is at most some small number delta in magnitude.

Then from equation 16 we find that the absolute value of P bar of a and b plus a·b, that is, the averaged local hidden variable correlation function minus the exact quantum mechanical correlation function evaluated at exactly a and b (notice we no longer have the bar over a·b), is less than or equal to the small number epsilon plus the small number delta. This is equation 18. That is, the mismatch between P bar and the exact quantum correlation at a and b has to be less than or equal to the maximum mismatch between P bar and a·b bar, plus the maximum error that results from smearing out the quantum mechanical correlation a·b into a·b bar.

And that kind of makes sense just by looking at it. But to show exactly how equation 18 follows from equations 16 and 17, we can write the left side of equation 18 as the absolute value of P bar of a and b plus a·b bar plus a·b minus a·b bar. All we've done is, within the absolute value, added an a·b bar and subtracted an a·b bar.

Now, why does that matter? Well, because now we know that it has to be less than or equal to the absolute value of P bar of a and b plus a·b bar, plus the absolute value of a·b minus a·b bar. That comes from the triangle inequality, which says that the absolute value of x + y can be at most the absolute value of x plus the absolute value of y.

Now if you examine the two quantities on the right side, the first one, the absolute value of P bar of a and b plus a·b bar, is what we have in equation 16, so we know it has to be less than or equal to epsilon. And the second one (the yellow expression), the absolute value of a·b minus a·b bar, is the same thing we have in equation 17, so it has to be less than or equal to delta. Therefore the whole thing has to be less than or equal to epsilon plus delta, and we've just proven equation 18.
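Written out compactly, the step from equations 16 and 17 to equation 18 looks roughly like this. It's a sketch in LaTeX using my own shorthand, which is an assumption on my part: \bar{P} for P bar and \overline{a\cdot b} for the smeared dot product.

```latex
\begin{align*}
\left|\bar{P}(a,b) + a\cdot b\right|
  &= \left|\bigl(\bar{P}(a,b) + \overline{a\cdot b}\bigr)
         + \bigl(a\cdot b - \overline{a\cdot b}\bigr)\right| \\
  &\le \left|\bar{P}(a,b) + \overline{a\cdot b}\right|
     + \left|a\cdot b - \overline{a\cdot b}\right|
     && \text{(triangle inequality)} \\
  &\le \epsilon + \delta
     && \text{(equations 16 and 17)}
\end{align*}
```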

Okay. So next we want to think about what exactly P bar of a and b is. Well, by equation 2, this is just P of a and b, the local hidden variable correlation function, but averaged over a space of vectors a prime and b prime which are very close to the vectors a and b, smeared out just a little so we don't worry about weird singular points. Therefore we can write P bar in exactly the same way as we write P in equation 2, except that here we put a bar over A and B. This is equation 19. When we smear out the vectors a and b a little bit and ask what P bar is, that just depends on how the smearing affects the average of the results at detector A and detector B, because the correlation function is the product of the results at A and B integrated over all possible hidden variables.

And remember that the bar is just averaging out, or smearing out, the vectors a and b a little bit. That's not going to affect the distribution of hidden variables, and that's why we don't have a bar over the rho: smearing out a and b has no effect on the probability distribution of our hidden variables.

But now think about what values A bar and B bar can take on. Remember that the results at A and B can only ever be plus or minus one. So when we smear out a and b and average the results over these smeared-out vectors, we find that the absolute values of A bar and B bar are at most one. This is equation 20. But now it is possible for the magnitude of A bar or B bar to be less than one, if, when we smear out the vectors a and b, we dip into a region of detector settings where the sign of the result flips relative to what it would have been along exactly the measurement direction a or b.

That is to say, suppose the result at detector A, as a function of a, is equal to 1, but if you give a a little nudge you could nudge the result into being negative 1. If the measurement direction a is right on the edge of what determines the sign of the result at detector A, then the absolute value of the averaged result at A might be something like 0.9 or 0.8 or whatever. But no matter what, it's going to be some number less than or equal to 1.
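To make that concrete, here's a minimal Python sketch of the smearing. The outcome rule is a made-up toy, not anything from Bell's paper: I'm assuming everything lives in a plane, the hidden variable is an angle theta_lambda, and detector A returns the sign of the cosine of the angle between its setting and the hidden variable. The names outcome and smeared_outcome are hypothetical.

```python
import math

def outcome(theta_a, theta_lambda):
    """Hypothetical deterministic result (+1 or -1) at detector A (toy rule)."""
    return 1 if math.cos(theta_a - theta_lambda) >= 0 else -1

def smeared_outcome(theta_a, theta_lambda, half_width, samples=201):
    """Average the +/-1 result over settings within half_width of theta_a."""
    total = 0
    for i in range(samples):
        offset = -half_width + 2 * half_width * i / (samples - 1)
        total += outcome(theta_a + offset, theta_lambda)
    return total / samples

theta_lambda = 0.0
half_width = math.radians(5)  # smear over +/- 5 degrees (arbitrary choice)

# Setting well away from the sign flip at 90 degrees.
far_from_edge = smeared_outcome(math.radians(30), theta_lambda, half_width)
# Setting close enough that the smearing window straddles the sign flip.
near_edge = smeared_outcome(math.radians(88), theta_lambda, half_width)

print(far_from_edge)  # every nearby setting gives +1, so the average is 1.0
print(near_edge)      # some nearby settings flip sign, so the magnitude is below 1
```

The only point is the one from the narration: averaging numbers that are each plus or minus one can never produce a magnitude bigger than one, and it produces something smaller whenever the window straddles a sign flip.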

All right. And next, Bell goes ahead and constructs equation 21 from equations 18 and 19, with the measurement direction a set equal to the measurement direction b. So that yields equation 21.

And in just a moment, we're going to use equation 21 and see why it matters and why Bell writes it out. But for now, I just want to reflect on how equation 21 follows from equations 18 and 19. The first thing to recognize is that the right-hand side of equation 21 is precisely the same as the right-hand side of equation 18. Then if you look at the left-hand side of equation 18, you see the P bar of a and b. You can recognize that on the left-hand side of equation 21 as the integral over d lambda of rho, times A bar as a function of b and lambda, times B bar as a function of b and lambda. Remember, in the context of equation 21 we're setting the two measurement axes to the same vector b for both detectors. So this integral expression is P bar evaluated for the vectors b and b.

Now, you'll notice there's also a plus one inside the integrand. That is simply a·b: when a and b are the same unit vector, you have b·b, which is the magnitude of b squared, which is 1 because b is a unit vector. And we can bring the one inside or outside of the integral because the integral of rho of lambda d lambda is equal to 1, since rho is a normalized probability distribution.
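For reference, here is a sketch of that construction in LaTeX. The notation (\bar{A}, \bar{B}, \rho) is my transcription of the spoken symbols, so treat it as an assumption, and I stop at the version that still carries the absolute value sign.

```latex
\begin{align*}
\left|\bar{P}(b,b) + b\cdot b\right| &\le \epsilon + \delta
   && \text{(equation 18 with } a = b\text{)} \\
\left|\int d\lambda\,\rho(\lambda)\,\bar{A}(b,\lambda)\,\bar{B}(b,\lambda) + 1\right|
   &\le \epsilon + \delta
   && \text{(equation 19, and } b\cdot b = 1\text{)} \\
\left|\int d\lambda\,\rho(\lambda)\,
   \bigl[\bar{A}(b,\lambda)\,\bar{B}(b,\lambda) + 1\bigr]\right|
   &\le \epsilon + \delta
   && \text{(since } \textstyle\int d\lambda\,\rho(\lambda) = 1\text{)}
\end{align*}
```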

And then there's another little detail here: notice how we've dropped the absolute value sign from the left side of equation 18. The reason that's an okay move is that, by inspection, the integral on the left-hand side of equation 21 cannot be negative. If you consider the product of A bar and B bar, the minimum value it can take is -1, say when A bar is 1 and B bar is negative 1. Therefore A bar times B bar plus 1 is at least zero. It can't go negative. So when we integrate A bar times B bar plus 1 against rho, we're always integrating a non-negative quantity. That's why we can drop the absolute value sign: if we know it's not negative, there's no point in having one.

Okay, so I'm sure you're wondering, what's the point of equation 21? Where are we going with this? Why does this matter? Well, I want to take a moment to recognize where we currently are in the paper, as a kind of natural checkpoint in part 4B. Everything we've done up until now is sort of the warm-up of part 4B. We've essentially been setting the stage: thinking about what it is that we want to prove, thinking about averaging, smoothing things out, not worrying about isolated points, and then introducing these quantities epsilon and delta and making some algebraic observations.

In the next part of this derivation, we're going to use these equations to make an algebraic argument leading to Bell's famous result: the error between the local hidden variable correlation and the quantum mechanical correlation, that is, epsilon, cannot be made arbitrarily small. Which is to say that no local hidden variable model can reproduce the statistics of quantum mechanics to an arbitrarily good approximation.

And then the next thing Bell does is write an expression for P bar as a function of a and b minus P bar as a function of a and c. In just a moment I'll tell you exactly what that is, but for now, let's see why the equation is true. If you look at equation 19, we have the definition of P bar as a function of a and b, which is simply the integral definition of the correlation P as a function of a and b, that is, equation 2, but averaged over smeared-out vectors near a and b. That's why we have A bar and B bar in the integrand. So if we want to write the expression P bar of a and b minus P bar of a and c, we can just copy and paste equation 19 twice, in the first case evaluated for the vectors a and b and in the second case for the vectors a and c, and then you may as well smoosh them together into the same integral. That's all we've written here. It's basically just equation 19.

So now let's reflect on what this quantity P bar of a and b minus P bar of a and c is. You want to think of c as an alternative for b, that is, for the measurement axis of detector B. This is just like how we imagined the vector c before, in part 4A. However, before, we imagined that b and c were very similar vectors, so that c was just a little nudge away from b, and that let us probe the behavior of the correlation P of b and c near its minimum value, where b equals c. We're now going to imagine the vector c as being totally unrelated to b: not just a nudge away, but a whole different vector that we are totally free to choose for the measurement axis of detector B. In that context, P bar of a and b minus P bar of a and c is the difference between the correlation strengths that we would measure for the detector settings a and b compared to the detector settings a and c. Of course we have the bar on the P, so we're neglecting aberrant isolated points; we're smoothing out any infinitesimal pathological point. That's why we have the bar on the P.

All right. So now Bell goes ahead and writes this equation in a form that looks way more complicated but is going to be useful in a moment. He writes out the integral expression like so. I'm not going to try to pronounce this equation because it's a mouthful, but I will show you why this is a legit move and why this complicated expression is in fact algebraically equivalent to the previous integral. To recognize this, consider the fact that if you have an expression of the form x * y - x * z, you could, if you want, write that as x * y * the quantity 1 + w * z, minus x * z * the quantity 1 + w * y, assuming all these variables commute, which they do because they're scalars. The reason that's true is that on the right-hand side the terms involving w cancel each other out. In one case you have xywz, and in the other you have minus xzwy, so you end up with wxyz minus wxyz, which equals zero. What remains, the terms multiplied by 1, is just xy - xz, which is exactly the left-hand side of the equation.

So if you look at the integral expression shown here on the bottom line, you see that it has this complicated form, xy * (1 + wz) - xz * (1 + wy). That's how to see that these two integrals are equivalent. This kind of feels like backwards math. If you started with the second line, you would feel a sense of accomplishment upon seeing that the terms simplify into the first line. But here we're going backwards. We're expanding out the equation and making it more messy, because this is a form that's going to be useful for us in just a moment.
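If you'd rather not take the cancellation on faith, here's a quick Python sanity check of the identity xy - xz = xy(1 + wz) - xz(1 + wy). It's just a sketch: the function names short_form and long_form are mine, and I'm using exact fractions with a fixed random seed so there's no floating-point fuzz.

```python
from fractions import Fraction
import random

def short_form(x, y, z):
    """The simple expression: x*y - x*z."""
    return x * y - x * z

def long_form(x, y, z, w):
    """The expanded expression: x*y*(1 + w*z) - x*z*(1 + w*y)."""
    return x * y * (1 + w * z) - x * z * (1 + w * y)

rng = random.Random(0)  # fixed seed so the run is repeatable

def random_fraction():
    """An exact rational number between -1 and 1."""
    return Fraction(rng.randint(-100, 100), 100)

all_equal = True
for _ in range(1000):
    x, y, z, w = (random_fraction() for _ in range(4))
    if short_form(x, y, z) != long_form(x, y, z, w):
        all_equal = False

print(all_equal)  # True
```

Notice that w never matters: whatever value it takes, the two w terms cancel, which is exactly why Bell is free to pick w to be whatever is convenient.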

All right. So, where do we go from here? Well, think about what this equation is. This is a generic statement that for any local hidden variable model, the difference between the correlations we would expect with the measurement axes a and b compared to a and c is equal to this big mess: an integral over expressions involving the various outcomes at A and B for the measurement axes a, b, and c. So the difference in correlations equals a big mess. And the next move is to convert this equation into an inequality, and in the process convert the big mess into a medium-sized mess.

From equation 20, we find that the absolute value of this difference in correlations is less than or equal to this medium-sized mess.

Now, to get to this inequality from the previous equation, it only takes a couple of steps. The first thing you want to do is take the absolute value of both sides. On the left-hand side, we've simply taken the absolute value of the difference in correlations. When you take the absolute value of the right-hand side, you're taking the absolute value of an integral minus an integral, or plus a negative integral if you want to think about it like that. By the triangle inequality, the absolute value of the sum of two integrals can be at most the absolute value of the first integral plus the absolute value of the second. So because we're converting the equation to an inequality, we can imagine the absolute value on the right-hand side applying to each integral individually.

Then, by equation 20, the prefactors A bar times B bar in each integral have magnitude at most 1, so each integral is bounded by the integral of what remains. And because A bar times B bar can never be less than negative 1, the quantity 1 + A bar B bar is always non-negative, so no absolute value is needed on it. So that's all good.
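Since the equations themselves aren't read aloud here, this is my reconstruction of the two steps in LaTeX. It assumes the identification x = \bar{A}(a,\lambda), y = \bar{B}(b,\lambda), z = \bar{B}(c,\lambda), w = \bar{A}(b,\lambda) in the algebraic identity. That choice is consistent with how the two integrals get used afterward, but treat the exact symbol placement as an assumption and check it against Bell's paper.

```latex
\begin{align*}
\bar{P}(a,b) - \bar{P}(a,c)
  &= \int d\lambda\,\rho(\lambda)\,\bar{A}(a,\lambda)\,\bar{B}(b,\lambda)\,
       \bigl[1 + \bar{A}(b,\lambda)\,\bar{B}(c,\lambda)\bigr] \\
  &\quad - \int d\lambda\,\rho(\lambda)\,\bar{A}(a,\lambda)\,\bar{B}(c,\lambda)\,
       \bigl[1 + \bar{A}(b,\lambda)\,\bar{B}(b,\lambda)\bigr] \\[6pt]
\left|\bar{P}(a,b) - \bar{P}(a,c)\right|
  &\le \int d\lambda\,\rho(\lambda)\,
       \bigl[1 + \bar{A}(b,\lambda)\,\bar{B}(c,\lambda)\bigr]
     + \int d\lambda\,\rho(\lambda)\,
       \bigl[1 + \bar{A}(b,\lambda)\,\bar{B}(b,\lambda)\bigr]
\end{align*}
```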

All right. Now at this stage in the derivation it should not be obvious why we care about this inequality. But if you look at it, you can see a bit of foreshadowing. We have a very generic statement that applies to any local hidden variable model, which says that the magnitude of the difference in the correlations for the settings a and b versus the settings a and c is bounded by an upper limit given by the right-hand side of this inequality. So you can imagine that we're just a few algebraic moves away from a very interesting result that constrains all possible local hidden variable models in a way that is relevant to the question of whether they can reproduce the statistical correlations of quantum mechanics.

So in service of that goal, we can now rewrite this inequality with a much simpler right-hand side. From equations 19 and 21 we can see that the expression on the right-hand side is less than or equal to 1 + P bar of b and c plus epsilon plus delta. The reason being, if you look at the first of the two integrals on the right-hand side, there's a 1 which can be pulled outside of the integral because rho of lambda is a normalized probability distribution. What remains in that integral is, by definition, P bar evaluated with the vectors b and c, by equation 19. So the first integral is exactly equal to 1 + P bar of b and c.

And then if you look at the second integral, you find that it is exactly the left-hand side of equation 21, because we're integrating rho d lambda times 1 plus A bar of b and lambda times B bar of b and lambda. We've already established in equation 21 that that has to be less than or equal to epsilon plus delta. So those inequalities stack, and we can pull that down to the bottom line here. We end up with this much more elegant upper bound on the difference between the correlations of a local hidden variable model for detector settings a and b versus a and c. Now we're really getting somewhere. You can see that things are starting to clean up really nicely.

And so now Bell goes on to abruptly say that finally, using equation 18, the absolute value of a·c minus a·b, minus 2 times the quantity epsilon plus delta, has to be less than or equal to 1 minus b·c plus 2 times the quantity epsilon plus delta. And that's a bit of a leap. You can't really see that just by looking at it, so we have to take a moment to see why that's the case.

All right. So, if you look at equation 18, we find that the absolute value of P bar of a and b plus a·b is less than or equal to epsilon plus delta.

And remember what that equation means. It is the absolute value of the difference between the correlation function given by a local hidden variable model, smoothed out a little bit so we're neglecting any pathological aberrant points, and the quantum mechanical correlation negative a·b. As we saw earlier, that has to be less than or equal to epsilon plus delta, where epsilon is the upper bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation, and the small number delta encodes the mismatch between the precise quantum mechanical correlation and the slightly smeared-out one, when we average over the vectors a prime and b prime near a and b respectively.

We saw earlier why equation 18 is true. But now we can think of it in another way, which is to say that equation 18 tells us P bar of a and b is equal to negative a·b plus some error, which we'll subscript as error sub ab. This follows directly from equation 18, which tells us that the absolute value of the difference between P bar of a and b and negative a·b is bounded by the sum of two small numbers, epsilon and delta. Therefore P bar of a and b and negative a·b are going to be pretty similar numbers, and we can think of them as the same thing plus some error term.

So now, if you take that reasoning and apply it to the inequality we derived before for the absolute value of P bar of a and b minus P bar of a and c, you see that we can replace those P bars with quantum correlations plus errors. P bar of a and b becomes negative a·b plus error sub ab, and, for the same reason, negative P bar of a and c becomes plus a·c minus error sub ac, where we've distributed the minus sign through those terms.

So, thinking about equation 18 as a statement about the error between P bar and the quantum mechanical correlation, with the absolute value of the error bounded by epsilon plus delta, we can replace any expression involving P bar with the quantum mechanical correlation plus that error. Likewise, on the right-hand side we can replace P bar of b and c with negative b·c plus error sub bc.

Now, ideally we would like to replace these error terms with factors of epsilon plus delta. But when we do that, we have to be careful, because it's not guaranteed that the absolute value of the error equals epsilon plus delta; in general it's less than or equal to epsilon plus delta. So if we're starting with this inequality about the absolute value of P bar of a and b minus P bar of a and c, and we want to go to another inequality where the error terms are replaced with epsilon plus delta, and we want to make sure the new inequality really does follow logically from the previous one, then we have to consider the quote-unquote worst-case scenario, where the magnitude of each error is indeed equal to epsilon plus delta.

In a way, this is the best-case scenario for ensuring that the inequality we arrive at is true. What it means is that on the left-hand side we subtract 2 times the quantity epsilon plus delta, corresponding to the most that our two error terms could pull down the left side of the inequality. And likewise on the right-hand side, we let the error be the most it could possibly be, so we add another epsilon plus delta there, bringing the total on the right to 2 times the quantity epsilon plus delta, to push the right-hand side up as much as we possibly can.

Because we did it like that, letting the error be as big as possible so that it pulls down the small side and pushes up the big side, we know for sure that the simpler inequality, where the errors have been replaced with epsilon plus delta, is guaranteed to still be true.
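Here is that worst-case bookkeeping written out as a LaTeX sketch. The error symbols e_{ab}, e_{ac}, e_{bc} are my own labels for the "error sub" quantities, not notation from Bell's paper, and each one is only assumed to satisfy |e| \le \epsilon + \delta, which is what equation 18 gives us.

```latex
\begin{align*}
\bar{P}(a,b) &= -a\cdot b + e_{ab}, \quad
\bar{P}(a,c) = -a\cdot c + e_{ac}, \quad
\bar{P}(b,c) = -b\cdot c + e_{bc} \\[4pt]
\left|\bar{P}(a,b) - \bar{P}(a,c)\right|
  &= \left|a\cdot c - a\cdot b + e_{ab} - e_{ac}\right|
  \;\ge\; \left|a\cdot c - a\cdot b\right| - 2(\epsilon + \delta) \\[4pt]
1 + \bar{P}(b,c) + \epsilon + \delta
  &= 1 - b\cdot c + e_{bc} + \epsilon + \delta
  \;\le\; 1 - b\cdot c + 2(\epsilon + \delta) \\[4pt]
\Rightarrow\quad
\left|a\cdot c - a\cdot b\right| - 2(\epsilon + \delta)
  &\le 1 - b\cdot c + 2(\epsilon + \delta)
\end{align*}
```

The last line chains the two bounds through the inequality we already had, that the absolute value of P bar of a and b minus P bar of a and c is at most 1 + P bar of b and c plus epsilon plus delta.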

All right. Now, there's one little adjustment we're going to make, because when you look at an inequality like this, you think maybe we can clean it up a little. So let's pull all factors of epsilon and delta over to the left side of the expression, and while we're at it, flip the inequality around and put everything else on the right. With just a little bit of algebraic maneuvering we end up with this inequality: 4 times the quantity epsilon plus delta is guaranteed to be greater than or equal to the absolute value of a·c minus a·b, plus b·c, minus 1. This is equation 22 of Bell's paper.

And this is a very profound result. In fact, the term Bell's theorem is kind of a vague, generic label for Bell's observation that local hidden variable models don't work. But if you had to take a single equation, or in this case an inequality, from Bell's paper and say this is the result, this is the statement, it would be the inequality shown here. And why is that? What's the big deal? Who cares about equation 22?

Well, to see what equation 22 can tell us, let's imagine a thought experiment with the vectors a, b, and c. The vector a is going to be a fixed measurement axis at detector A, and b and c are the two different options that we imagine for detector B. Suppose, for the sake of a specific example, that a and c are perpendicular, so that a·c equals zero, and also that a·b is equal to b·c, which is 1/√2. That is to say, we have a 45° angle between the vectors a and b, as well as a 45° angle between the vectors b and c. So, for example, if a is pointing straight up and c is pointing straight to the right, then b is right in between them, pointing up and to the right at 45°.

If we apply this reasoning to that scenario, you'll find when you evaluate the dot products in equation 22 that 4 times the quantity epsilon plus delta has to be greater than or equal to √2 - 1, which is about 0.41. Divide both sides by 4, and you find that epsilon plus delta has to be greater than or equal to 0.1 something. And remember that delta is kind of an artifact of our smearing process, so you can imagine making it as small as you want. In fact, you can make it zero if you want and say forget about the averaging process. But even then, you'll find that epsilon cannot be made arbitrarily small, because in this case it would have to be at least 0.1 something.

But remember what epsilon is. It's a bound on the mismatch between the local hidden variable correlation and the quantum mechanical correlation. So if epsilon cannot be made arbitrarily small, then the quantum mechanical expectation value cannot be represented, either accurately or arbitrarily closely, in the form of equation 2, which is the definition of a generic local hidden variable correlation.
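To pin down that "0.1 something", here's a tiny Python sketch that plugs the example vectors into the right-hand side of equation 22. The helper names dot and bell_rhs are mine, and I'm working in two dimensions since all three vectors in this example lie in a plane.

```python
import math

def dot(u, v):
    """Ordinary dot product of two vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def bell_rhs(a, b, c):
    """Right-hand side of equation 22: |a.c - a.b| + b.c - 1."""
    return abs(dot(a, c) - dot(a, b)) + dot(b, c) - 1

a = (0.0, 1.0)                          # straight up
c = (1.0, 0.0)                          # straight to the right
b = (math.sqrt(0.5), math.sqrt(0.5))    # 45 degrees, right in between

rhs = bell_rhs(a, b, c)   # this is sqrt(2) - 1
lower_bound = rhs / 4     # minimum allowed value of epsilon + delta

print(round(rhs, 4))          # 0.4142
print(round(lower_bound, 4))  # 0.1036
```

So with delta taken to zero, any local hidden variable model has to miss the quantum correlation by at least about 0.104 somewhere.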

So that is the argument. Now, you can see there's a bit of algebra, and it takes a moment to soak in. When you're first encountering this argument, probably the thing to do is just focus on how each step follows logically from the previous step, and then think big picture about what our assumptions are and what the result is. Think about how generic our assumptions were, going all the way back to equation 2 defining the correlation for a local hidden variable model. We made no assumptions or restrictions on the sort of thing that our hidden variables lambda could be. And so we've proven this very generic result: for at least some measurement settings a, b, and c, there is going to be a finite, nonzero mismatch between the correlation given by a local hidden variable model and the correlation given by quantum mechanics.

And here there's a possibility of getting confused by equation 22, because you might say, wait a minute, aren't there settings of a, b, and c that make the right-hand side zero, so that this isn't a problem? That is true, but it's not surprising. Remember, as we saw earlier in part 3B of this paper, a local hidden variable model can agree with quantum mechanics for certain specific settings of our measurement directions.
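You can see both behaviors by scanning over detector settings. This is a rough Python sketch under simplifying assumptions of my own: all three vectors lie in one plane, a sits at 0 degrees, and b and c are stepped in 1 degree increments. The function name bell_rhs_coplanar is hypothetical.

```python
import math

def bell_rhs_coplanar(theta_b_deg, theta_c_deg):
    """|a.c - a.b| + b.c - 1 with a at 0 degrees and b, c in the same plane."""
    tb = math.radians(theta_b_deg)
    tc = math.radians(theta_c_deg)
    a_dot_b = math.cos(tb)
    a_dot_c = math.cos(tc)
    b_dot_c = math.cos(tc - tb)
    return abs(a_dot_c - a_dot_b) + b_dot_c - 1

same_axes = bell_rhs_coplanar(0, 0)   # a = b = c
example = bell_rhs_coplanar(45, 90)   # the 45 degree example from the text

# Largest right-hand side on the grid, with the angles that produce it.
best = max(
    (bell_rhs_coplanar(tb, tc), tb, tc)
    for tb in range(0, 181)
    for tc in range(0, 181)
)

print(same_axes)          # 0.0, so this setting puts no constraint on epsilon
print(round(example, 4))  # 0.4142
print(best)               # (value, b angle, c angle) for the tightest setting found
```

If I've set this up right, the tightest constraint on this grid shows up with b at 60 degrees and c at 120 degrees (or the mirror image), where the right-hand side is about 0.5. The takeaway is the same either way: some settings give you nothing, and others give you a strictly positive lower bound on the mismatch.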

So the fact that there exist experimental configurations where a local hidden variable model might agree with quantum mechanics is not philosophically profound. The profound thing is that there exist experimental conditions where no local hidden variable model can explain the results of quantum mechanics. All that's to say, if you as an experimenter design an experiment where local hidden variables and quantum mechanics agree, it's like, fine, okay. But if someone else designs an experiment where they orient their detectors in such a way, like the example given here, that no local hidden variable explanation makes sense, and only quantum mechanics, with its weird non-local wave function collapse or something mathematically isomorphic, is able to explain the data, well, then that's the case in point right there that reality is not described by a local hidden variable model. Even the existence of one possible experimental setup that violates local realism is all you need to know that something other than local realism is going on in this universe.

So that's a glitch in reality right there. You know, this is one of those things that the more you think about it, the more it blows your mind. You'd like to think the more you think about something, the less it blows your mind. But no, in this case, it's the opposite.

Part five, generalization.

All right. So in this part of the paper, Bell is going to make the argument that even though we've been thinking in terms of spin and the singlet state of two spin 1/2 particles

with entangled spin, the same arguments regarding non-locality and correlations and hidden variables apply much more generally in quantum mechanics in a way

that doesn't depend specifically on spin. We just thought about it in terms

of spin because that's an example that's easy to think about. So Bell begins part five, generalization, with: The example

considered above has the advantage that it requires little imagination to envisage the measurements involved actually being made, because you can imagine the Stern-Gerlach magnets and the

orientation and the spin and all of that. But in a more formal way, assuming

that any Hermitian operator with a complete set of eigenstates is an observable, the result is easily

extended to other systems. So in other words, it's not just about spin. We can

apply this reasoning to any quantum mechanical observable.

If two systems have state spaces of dimensionality greater than two, we can always consider two-dimensional subspaces and define in their direct

product operators sigma 1 and sigma 2 formally analogous to those used above and which are zero for states outside of the product subspace.

Whenever we have two quantum systems, no matter how complicated they might be, they'll always contain smaller two-state parts that we can focus in on. And

within those parts, we can define measurements that behave just like the simple spin measurements we discussed earlier. And when we do that in that

two-dimensional subspace, there's going to be a state which is analogous to the singlet spin state but pertaining to

whatever observable we're talking about in this more general context.
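Here is a minimal sketch of that construction, under some assumptions of my own: each system is taken to be a d-level system (d = 3 by default), the two-dimensional subspace is spanned by the first two basis states, and the names `embedded_sigma`, `embedded_singlet` and `correlation` are hypothetical. The operator acts like the Pauli spin operator along a unit vector n inside the subspace and is zero outside it, as in Bell's description.

```python
import math

def embedded_sigma(n, dim=3):
    """Pauli operator sigma . n placed in the top-left 2x2 block of a
    dim x dim matrix (dim >= 2); every other entry is zero."""
    nx, ny, nz = n
    m = [[0j] * dim for _ in range(dim)]
    m[0][0] = complex(nz)
    m[0][1] = complex(nx, -ny)
    m[1][0] = complex(nx, ny)
    m[1][1] = complex(-nz)
    return m

def embedded_singlet(dim=3):
    """Amplitudes psi[i][j] of (|0>|1> - |1>|0>) / sqrt(2) on the product space."""
    psi = [[0j] * dim for _ in range(dim)]
    psi[0][1] = 1 / math.sqrt(2)
    psi[1][0] = -1 / math.sqrt(2)
    return psi

def correlation(a, b, dim=3):
    """Expectation value <psi| Sigma1(a) (x) Sigma2(b) |psi>."""
    A = embedded_sigma(a, dim)
    B = embedded_sigma(b, dim)
    psi = embedded_singlet(dim)
    total = 0j
    for i in range(dim):
        for j in range(dim):
            for k in range(dim):
                for l in range(dim):
                    total += psi[i][j].conjugate() * A[i][k] * B[j][l] * psi[k][l]
    return total.real

if __name__ == "__main__":
    theta = math.radians(60)
    a = (0.0, 0.0, 1.0)
    b = (math.sin(theta), 0.0, math.cos(theta))
    # Compare with -cos(theta), the spin-1/2 singlet correlation.
    print(correlation(a, b, dim=3), -math.cos(theta))
```

Changing `dim` does not change the result, which is the point: the extra levels are just along for the ride, and the correlation inside the subspace is the same -a·b as for actual spins.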

Then for at least one quantum mechanical state, the singlet state in the combined subspaces, the statistical predictions of quantum mechanics are incompatible

with separable predetermination.

That is the kind of realism or local causality that we would expect from a local hidden variable theory or even a kind of quantum mechanical picture where the two states are separable. Like

remember earlier we were talking about the isotropic mixture of product states where each particle had an equal and opposite spin and we saw how that gave a correlation which was three times

weaker than the singlet state. Well,

that same kind of reasoning applies to this two-dimensional subspace of whatever observable we're dealing with.

You can create a state which is directly analogous to the spin singlet state. And

when you do that and you separate out the particles and you measure them in different ways, you'll find that the quantum mechanical singlet quote unquote state is always going to have weirdly

strong non-local correlations.
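As a quick check of the factor of three mentioned above, here is a sketch of the calculation. It assumes the mixture is made of product states with particle 1 spin up along a unit vector n and particle 2 spin down along n, with n uniformly distributed over the sphere. The symbols a, b, n and E are notation chosen here for the detector axes, the spin axis and the correlation.

```latex
E_{\text{mix}}(\mathbf{a},\mathbf{b})
  = \int \frac{d\Omega_{\mathbf{n}}}{4\pi}\,
    (\mathbf{a}\cdot\mathbf{n})\,(-\mathbf{b}\cdot\mathbf{n})
  = -\sum_{i,j} a_i b_j \int \frac{d\Omega_{\mathbf{n}}}{4\pi}\, n_i n_j
  = -\sum_{i,j} a_i b_j \,\frac{\delta_{ij}}{3}
  = -\tfrac{1}{3}\,\mathbf{a}\cdot\mathbf{b},
\qquad
E_{\text{singlet}}(\mathbf{a},\mathbf{b}) = -\mathbf{a}\cdot\mathbf{b}.
```

So the separable mixture has the same angular shape as the singlet correlation but only one third of its strength.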

And so all that's to say, Bell's theorem is not about spin per se. Generically,

quantum mechanics can exhibit non-local correlations in all kinds of different observables.

All right, my friends, let's go ahead and wrap things up with part six, conclusion.

In a theory in which parameters are added to quantum mechanics to determine the results of individual measurements without changing the statistical predictions, there must be a mechanism

whereby the setting of one measuring device can influence the reading of another instrument, however remote.

That is to say, if you take Einstein's perspective that quantum mechanics needs to be supplemented with hidden variables, then Bell has proven that that hidden variable model has to

contain non-local interactions which are apparently unrestricted by the normal limitations of space and time. Moreover,

the signal involved must propagate instantaneously so that such a theory could not be Lorentz invariant. And Lorentz invariance,

that's just one of the main principles of special relativity. That

is to say, once you have a non-local theory, you run into all kinds of problems with special relativity. And

really, a non-local theory just totally goes against the usual relativistic notions of space and time and causality.

Now, fortunately, because of the no-signaling theorem, the non-local correlations in quantum physics are not actually able to corrupt our universe by allowing for the transmission of

information faster than the speed of light. But still, there's a deep

philosophical tension between the non-local correlations in quantum mechanics and the way we usually think about the nature of space and time from

a relativistic perspective. And to this day, that tension remains unresolved. We

really do not have a good explanation for what's going on with the non-local correlations in quantum mechanics.
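To see how correlations can be non-local without carrying a signal, here is a small sketch using the standard singlet-state joint probabilities, P(same outcomes) = sin²(θ/2) and P(opposite outcomes) = cos²(θ/2), split evenly between the two cases of each kind. The function names are illustrative. The point to notice is that Alice's own statistics never depend on the angle θ that Bob chooses.

```python
import math

def joint_probabilities(theta):
    """Singlet joint probabilities P(A, B) for outcomes in {+1, -1},
    with detector axes separated by theta (radians)."""
    same = 0.5 * math.sin(theta / 2) ** 2      # P(+,+) = P(-,-)
    opposite = 0.5 * math.cos(theta / 2) ** 2  # P(+,-) = P(-,+)
    return {(+1, +1): same, (-1, -1): same,
            (+1, -1): opposite, (-1, +1): opposite}

def alice_marginal(theta):
    """P(A = +1), summed over Bob's outcomes."""
    p = joint_probabilities(theta)
    return p[(+1, +1)] + p[(+1, -1)]

def correlation(theta):
    """Expectation value of the product A * B."""
    p = joint_probabilities(theta)
    return sum(a * b * prob for (a, b), prob in p.items())

if __name__ == "__main__":
    for deg in (0, 45, 90, 180):
        t = math.radians(deg)
        print(f"{deg:3d} deg  P(A=+1) = {alice_marginal(t):.3f}  "
              f"E = {correlation(t):+.3f}")
```

The correlation changes with θ, but Alice's marginal stays at one half, so she can't learn Bob's setting from her results alone. The correlation only shows up when the two records are brought together and compared.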

Depending on who you ask, different people have different ideas and theories, but there's really no consensus. And the reason being, well,

one of the reasons is that all these different models are so crazy that it's like what are you going to believe in?

You want to believe in many worlds or superdeterminism or that you just give up the concept of realism? I mean, no matter how you try to explain the implications of Bell's theorem, it ends

up just blowing your mind. No one has yet found a sane explanation for what's going on here. All right, so this is basically the conclusion of Bell's paper right here. But then he goes on to add

one additional note, a little caveat, which is, of course, the situation is different if the quantum mechanical

predictions are of limited validity.

Conceivably, they might apply only to experiments in which the settings of the instruments are made sufficiently in advance to allow them to reach some mutual rapport by exchange of signals

with velocity less than or equal to that of light. In that connection,

experiments of the type proposed by Bohm and Aharonov in which the settings are changed during the flight of the

particles are crucial. And all that's to say, if you're doing an experiment where the settings of the two detectors are set in advance and then you're sending your entangled particles to each

detector, well, maybe there's some way that the two detectors have communicated with each other or established some sort of rapport somehow. And even though for each pair of particles, the measurements

are happening so fast that they're in different light cones, perhaps somehow the two detectors are already kind of in sync with each other in some sort of way, in that they somehow know the

settings of one another and therefore you don't need non-locality to explain the correlation results. Now, that would be a very hard to believe situation

because you'd be like, how can that be?

And you know, how and why would the two detectors know about each other, but I mean, in theory, that is a loophole that you could imagine possibly somehow being true. And so, that's why Bell mentions

these experiments where you change the settings of the detectors as the particles are flying along, so that there's no possible time for the two

detectors to establish a rapport with one another. And so each detector is

going to be truly independent of the other detector. And so then you're

really ensuring that these correlations are genuinely non-local.

Well, okay. So that's the end of the paper. I hope you found this

interesting. I hope it's given you

something to think about. So yeah,

thanks for watching. I really appreciate it. And I'll see you next time.


Hey, I want to say thank you to everyone who's been supporting my channel on Patreon. Your support really means a

lot. It really makes a big difference.

And genuinely without your support, I wouldn't be able to really dive into this full-time. So, I'm so grateful for

all of you. Thank you so much. It really

means a lot.
