Statistical Rethinking 2026 - Lecture A01 - Introduction to Bayesian Workflow

By Richard McElreath

Summary

Topics Covered

  • Science Before Statistics
  • Causal Inference Without Mechanisms
  • Bayesian Beats Frequentist Realism
  • Garden of Forking Data

Full Transcript

All right, good morning everybody. I'm Richard. Welcome to Statistical Rethinking for 2026. I've split the course into two sections this year, which I hope will be a useful innovation. Um, this is the general course outline.

You're, uh, here on Tuesday, so you're in the beginner section. This should not discourage you from also attending Thursday if you like. It'll just be confusing. Yeah. But maybe you'll enjoy doing both. Uh, the way this is going to work is on Tuesdays I'm doing the beginner section and on Thursdays I'm doing the experienced section. The experienced section starts where the beginner section ends, at the end of the 10 weeks.

All right. So you can lead a double life if you take both, and I know some people will. Uh, but the intention is, if you're truly a beginner to Bayesian stats and to scientific statistical modeling, you start with the beginner section. Um, we're going to go through it at a slow pace over 10 weeks and do the first half of the textbook. And the experienced section, if you want to take this again next year, um, will pick up exactly where the beginner section ends and do the second half of the book. Uh, also, since the lectures will be recorded, you can do the experienced section in your free time starting at the end of March, uh, if you so choose. Um, or come ask me questions.

Uh, the first thing you should do procedurally is make sure you write down the URL of this website.

It's there at the top, the GitHub site that I'm going to maintain the course materials on. Uh, there's a registration link there too, which is just to get your email so that I can get you on the mailing list and send you links to the book and things like that. Yeah. Um, but it's not really essential. If you just want to physically come to class and be here and hang out, that's fine as well. I'll assign one homework problem each week, and I'll put that up for you folks later this week, depending upon how far I get in the lecture today, since I will design the homework accordingly. So there's some uncertainty for all of us, uh, in that. Um, are there any questions about the course structure right now? Uh, if not, then, uh, we'll get going.

Um, what is the point of this course? It's called Statistical Rethinking, and it is a Bayesian course, and so sometimes people think, oh, what's rethinking here is we're doing Bayes instead of frequentism. And that's true, but that's not really what I mean when I say statistical rethinking. I mean statistical rethinking is putting science before statistics. Very often when statistics is taught, it's a bunch of procedures for getting your paper published, right? And the connection to some scientific theory, however described, is often incredibly vague and handwavy. And in this course, we don't work that way. We start with scientific models and we build up to statistical procedures that you can have confidence in.

Uh, so here's the kind of political stump-speech version of that. Uh, for any kind of statistical analysis to produce scientific insight, it really depends upon scientific models, as a statistical procedure by itself has no meaning.

It gets its meaning from external theories, formal or informal, but some kind of theory is always used to interpret the output of a statistical procedure. Um, and what researchers really need to be supported in, in statistical education in the sciences, I believe, is to be taught a workflow for making that connection, not leaving it up to heuristics or some sort of guesswork or a bunch of social anxieties that you get by reading published papers, but to actually have some scientific, logical workflow that will connect a scientific theory to a statistical procedure so that you can justify it and feel confident in the interpretations of the output. So that's what I want to teach in this course. Uh, it's going to take a long time because we're going to go at a pace that hopefully is learnable.

Uh, but what we're going to keep coming back to again and again in the different examples in the course is this thing that I call the core Bayesian workflow. Uh, this could be elaborated, and will be elaborated a lot, in different examples, but this core is something that's understandable that we're going to start to build and justify. Um, in this core Bayesian workflow, we transparently say what the scientific models we're using are. We transparently say what our questions about them are, and we use those to derive statistical procedures. Um, and then we produce estimates that are also, uh, transparently satisfying these questions, what we call estimands, our scientific questions. Um, this will get elaborated even in the first example I do, and so in the interest of transparency, I want to show you how elaborate this eventually gets, uh, because the core Bayesian workflow is a teachable thing and is where we begin, uh, to scaffold you up to realistic data analysis problems. And that's a general teaching strategy in the course, is that I start with something that's unrealistically simple, that doesn't actually look like your own thesis work, right? Because real data analysis is much more complicated than this. Yeah. But it's good to start here, I think, with the core, and get that logic and then add things onto it. So, in that interest, let me show you what we're going to add.

All right. But we're not going to do it, uh, in the beginning here. The first thing we're going to add is checking, basic checks. There are things that I call prior model checking. That is, before you've taken your statistical procedure and given it real data, there are things you can do to ensure that it works or that it makes scientific sense. I call that prior model checking, and I'll teach you some workflow for that. And then there's posterior model checking, which are the things you check for the model after you fit it to the data. So even if all the prior model checks pass, the model can still be bad, right? So you need posterior model checks, and we'll do those as well.
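As a toy illustration of the prior side of this, here is a minimal sketch of a prior model check in R. The model and numbers are invented for illustration, not taken from the lecture: draw parameter values from the prior, simulate data from them, and ask whether the data the prior implies are scientifically sensible.

    # Hypothetical prior check: simulate the data implied by the prior alone,
    # before touching real data. Model and numbers are illustrative only.
    p_prior <- runif(1e4, min = 0, max = 1)          # prior draws for a proportion
    y_sim <- rbinom(1e4, size = 9, prob = p_prior)   # data sets the prior implies
    hist(y_sim)  # inspect: do these simulated observations make scientific sense?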

Uh, then there's a bunch of stuff we want to do with estimates. Rarely is a parameter estimate itself the answer to a scientific question. There are a bunch of things that we need to do to post-process model output. Um, and I want to teach you those things as well. These include things called marginal effects, which are predictions. Causal effects are things called marginal effects. Don't worry about why it's called marginal yet. It's not an insult, right? Marginal means an average in statistics. In normal English, it's an insult, right? But in statistics, it just means average. Uh, and in statistics, we like averages. It's usually what we like. So, it's a compliment. Uh, marginal effects: so if you want to estimate a causal effect, that's an example of a marginal effect. Um, I want to show you specifically, in different examples, how to do that.

There's this thing called poststratification, which is incredibly useful for those of us that work not with experiments but with population-level data. Often we want to poststratify the estimates. Um, and then there are these things called sensitivity analyses to understand, uh, how the model functions and, uh, give us more information about how reliable the results are, beyond just the parameter estimates. Um, and then there's way more we could add to this as well.

This is what I call the real dirty Bayesian workflow. Um, when you do your actual projects, and this is just the truth, um, you have a bunch of different, uh, models, scientific models, statistical models, data sets, estimates. Uh, people come to me all the time, they're like halfway through their thesis, and then they decide to consult the statistician, and they come to my office and they have like 30 different models and they want help. Uh, and fine, I'm there to help you, but this is the real deal. But we work up to this, and to understanding how to deal with this complexity, as we go. Uh, but in my experience, all science on the cutting edge of any field has this kind of complexity, where you're working with multiple scientific models and multiple questions and a bunch of uncertainty about how to answer it statistically, and different data sets, and the data sets have different flaws, and you're trying to triangulate evidence. So this is the real goal. Yeah. But it's going to take us, uh, 20 weeks to get here. I'm going to do it in 10 because I'm doing two sections. But that's the idea, is to work up to this realism, right? To cope with these issues and all the other things that come from it. Okay, but forget that.

Um, let's come back to the core workflow. And what I hope to do today in this first lecture is to give you a cartoon version of this core Bayesian workflow in a simple sampling exercise, so that you can become acquainted with the different components of it. And then in subsequent weeks, we're going to take real data sets and work through the same core workflow every time, uh, so that you understand the procedures.

Uh, so the core Bayesian workflow always begins on the far left here with some, what I call, generative model. This is a scientific model that can produce data. That's what generative means. Yeah. Uh, it's a scientific theory that's been specified at a sufficient level of detail that you could make predictions with it. That makes sense? And you'll get examples. So we'll go from there. Um, and then an estimand, which is just a fancy Latin word for what you want. Yeah. And, uh, the question, the scientific questions you have. Um, and estimands can be descriptive. It can be a question about just the frequency of something in a population, right? What is the frequency of psychologists in this room? I would say half. Yeah, you don't have to raise your hands, but I would guess half. That's my prior. Um, you can also have estimands, as we often do in the sciences, which are causal estimands, things we want to know about what causes results.

So, I'm going to move through this slide from left to right. Uh, but before I keep moving, I want to say a little bit about what causal inference means, because in statistics, it means a lot less than it does in science in general. And I want to be clear about that distinction before we move on.

Um, causal inference in statistics and machine learning, uh, is narrower, uh, than the general uses of the term, which is fine, because all plain English words have technical meanings which are much narrower than we use them for in general. The point is just not to be confused. Causal inference, when we do data analysis, scientific data analysis, can mean multiple things, uh, but none of them mean mechanism. That's what I want to emphasize to you now. So for the biologists here, there must be other biologists, you think about what a causal model is, you think like the Krebs cycle or something like that. That's really a triumph of science. We don't mean that. Now, that's a scientific model, but causal inference asks narrower questions about something like the Krebs cycle. So it's not about mechanisms, and you can do causal inference, very effective causal inference, without knowing mechanisms at some resolution. So I want to be clear about that and give you some examples.

So causal inference typically means one of three things in scientific data analysis. It means some form of prediction, a specific kind of prediction, that is the consequence of an intervention in a system. This is the interventionist definition of causal inference. And I think very often this is what people implicitly mean when they talk about, um, they measured the effect of something, is that they could tell you, if you intervened in a particular way in a system, what would change. Does that make sense? Yeah. There's some nodding heads. I appreciate that always. I like the feedback. I kind of feel alone up here. Um, uh, there'll be lots of examples of that interventionist view in the course.

Uh, the second is what I call the imputation view of causal inference, and this is the idea that we would be able to guess what would have happened if things had been different. Now this is a very subjunctive sort of statement. I'll have an example in a couple of slides. And then third is the explanatory version of causal inference, where we could say not what would have happened if things had been different, or what will happen if we intervene, but why something happened.

Yeah, this is another use of it. Uh, let me give you some toy examples of each. I think this conceptual grounding is useful before we move forward. So on the prediction view of causal inference, um, I use this simple example of, uh, trees, and, uh, when the trees blow in the wind. Now you all know, um, oop, sorry, that advanced. When the trees come back. Okay, when the trees blow in the wind, you know the wind is making the trees' leaves move and not the reverse. Right? But imagine you were foolish and you didn't know that. Um, uh, you could test it by doing interventions in the system. You could climb up the tree and shake the branches, and, uh, will this go backwards now? Yes, thank you. Um, shake the branches and see how much wind is produced, right? A little bit, but not very much. And in contrast, you could blow wind on the tree and watch the leaves move. So that's an experiment you do by intervening in the system to figure out causes. You don't really understand mechanisms there, right? So what are the mechanisms by which wind makes leaves move? That's fluid dynamics. None of us understand that, right? Maybe there's a physicist here who does. Apologies if so. Navier-Stokes equations or something are needed to do this. And that's complicated. But you can make predictions then about what would be the consequence of climbing up the tree and shaking the leaves, if you understand cause. That's causal inference from the interventionist perspective. And that's the basis of randomized experiments, right? Why do experiments tell us causes? Because we've intervened in known ways in the system. At least we hope. Yeah. Okay. Um, let me get back to this next slide.

Uh, causal imputation is, um, distinct from that in the sense that if we know a cause, that means we could reconstruct counterfactual outcomes, things that didn't happen, um, but could have happened, if we understand the system. Uh, so what if some other nation had gone to the moon first, before the United States? Yeah, if you understand the causes of geopolitics and the space race and stuff, then you could impute what would have happened instead, right? Uh, that's the imputation view. This is very closely related, formally, to the interventionist view, because in the interventionist view you're also imputing statistically what would have happened if any individual unit had gotten the other treatment. Right? So that's the interventionist view. It's just a question of whether you're projecting forward or projecting back. Yeah. But mathematically they're very, very similar, uh, operations.

They're different from causal explanation, as in causal explanation we're not focused on the outcome. We're holding the outcome fixed and we're asking why it happened. It's still causal inference, because you need some sort of counterfactual model, uh, in there. Um, so for example, why did the last glacial maximum stop when it did? Right? The last ice age. There's still parts of Europe, you can go way north and you can find glaciers, right? And those glaciers used to be here, right? This is a glacier-carved landscape. So why did the ice retreat? And there are big scientific questions about that. Explanation is another thing we like to do with causal models as well.

So the tools I want to teach you let you construct all those different things, but they're different estimands. There are different kinds of things you want to say about the system, and you need to process the statistical output in different ways to achieve them.

Okay, let me keep moving here. So, um, I'm going to teach you ways to take a particular generative model and some specific question about it, some estimand, and use that to develop a statistical model that will address that particular goal. It's not a statistical model for all purposes. It's a statistical model for the purpose you've stated, with the generative model you've stated. Yeah. Which may sound narrow, but what I want to convince you of is that's all we've got. There is no general model-free inference. Yeah. That's just, unfortunately, how it is. And more than that, the sorts of things that people do in the literature are in this framework already. It's just not transparently done. So we want to give some logical, transparent grounding, uh, in a sense, to the framework almost all scientists are already using.

Uh, the point of the statistical model in this workflow is to extract information from the data to answer a particular question. Simple, no? There's a lot of computation involved here, but, uh, you were all born at a wonderful time. Can I digress for a moment and just try to convince you of that? You are lucky to be alive today. I mean, sure, the oceans are going to boil off eventually, but long after you're dead. No, you were born at a wonderful time in the sciences, when computational power is incredible.

Right? So, uh, when my PhD supervisor did his dissertation, they worked on punch cards with computers. So, some of you have heard these tales. Let me quickly bore you with what it would have been like. So, the idea is, on a piece of paper, he would write a computer program as an algorithm. This would then be keyed into some kind of terminal that would print literal cards that had holes in them that were the computer program. These cards would be delivered to a basement on the campus, where some very bored graduate student would receive them and put them in a queue. These cards would be fed into a mainframe computer, which would do the tasks in order and print out the output. You would return, sometimes days later, uh, to receive your output and discover there was an error. Then the cycle would begin again. Yes. And that's what PhD computational research was like in the 70s.

Um, I did my dissertation in the 90s. It was a lot better then. Uh, the internet was young and it seemed like a good idea, right? But still, to do the sort of techniques that we're going to use routinely in this course, things like, um, Markov chain Monte Carlo, we had to write all of the algorithms ourselves. There were no general-purpose math libraries like there are now that just black-box that away from you so you don't have to worry about it. So most of the computation, as much as possible, I'm going to push into the background, so that you can focus on the scientific structure of this workflow and piecing it together. Lots of people have spent countless hundreds and thousands of person-hours writing fantastic math libraries so that you no longer have to deal with those details. But I want you to understand in some sense, some conceptual sense, what's going on in that math library, and I will teach you that as well. But you don't have to write the algorithms, luckily, anymore. That's what I mean when I say you are lucky to be in science now: the computation is there, and we can do things that even a generation ago in science would have been thought to be foolhardy, like fit models with thousands of parameters. That's an easy thing to do now. We do it all the time, and there are really good reasons to do it. Um, anyway, that's my sermon about why you're lucky to be alive now. Thank you.

Okay. Um, we're going to be Bayesian in this course. And the reason for being Bayesian is it's practical. Uh, Bayesian inference is a general-purpose way to extract information from data. It's closely tied to generative models. Um, in simple examples, it feels like overkill, like carving a birthday cake with a chainsaw. That's the purpose of this image, right? And that can be true. Uh, I mean, now everything's black-boxed away in some math library, so it doesn't seem like it's, uh, overkill, but there's a lot more computation that goes on in fitting a simple Bayesian model than there is in, say, doing an ordinary least squares approximation. Um, and they give you very, very similar results for simple, uh, models. And that's fine. But with realistic analyses, with big data sets and measurement problems and missing data and the need to make, um, reliable predictions, then the Bayesian framework makes essential things easier than non-Bayesian frameworks. It's very easy to add these complexities, like missing data and measurement error and latent variables and this magical thing called regularization, which you need in your life. You really do. I will convince you of that. Um, all of these things are easy in Bayes, and they're hard in other frameworks. They can be done in other frameworks. So, I'm not trying to say the other frameworks can't do this, and the other frameworks do do this. It's just, once you've learned the Bayesian machinery, it seems hard at first. You pay upfront and then you reap the dividends later on. Whereas in the other frameworks, they're really simple in the beginning, and then you encounter realistic data sets, and solving those problems is much harder. So, this is like parenting in this class, right? Where I make you pay the costs up front. You go to school, and then, you know, you earn a higher wage or something like that.

Um, uh, the last line on this slide, it's important to emphasize: Bayesian models are naturally generative. That is, a Bayesian statistical model is itself a simulation of a data set. You can run it forward or back. You can produce synthetic data with it, or you can feed data into it and get parameter estimates out. It works in both directions. And that's not true of other kinds of statistical frameworks. Not always. Yeah. So that's not an essential feature of a statistical framework, but it's a really nice one for connecting to this workflow, where we want to start with generative models, because often the Bayesian statistical model and the generative model look very, very similar. They have the same expressions and the same structure, but they don't have to.

Okay, last thing I want to say about that. I'm not going to spend any more time in this course comparing Bayes to other frameworks, really, unless you have questions for me, in which case I'm happy to give my opinions. I just want to say that, um, the other nice thing about being a scientist now is the stats wars are really over. Uh, the boomers, uh, people talk about the boomers all the time. Uh, the boomers fought about Bayes and frequentism in the 20th century, starting in the early 20th century, uh, into the 80s, very strongly, and, um, those wars are over. Uh, most research stats departments are heavily Bayesian now. Uh, the sciences use Bayes in routine ways. There are many kinds of models which are only fit using Bayesian methods now, especially in biology. It's just a completely uncontroversial framework now. All those boomer wars about the epistemological dangers of being Bayesian, it just all turned out not to be true. Right? The world has not ended, and we make scientific progress. Um, at the same time, uh, since the Bayesian community isn't as embattled as it used to be in the boomer era last century, they've relaxed and chilled out a bit, right? And the philosophical overreach of certain schools of Bayesianism has ended as well. Uh, this is just a general method that forms the core of what we do with probability theory in the sciences. Essentially all of the recent gains in artificial intelligence have been done within the Bayesian framework. It's a completely uncontroversial way to do work. So that's all I want to say about it. So let them fight. Yeah.

Um, okay, back to our workflow. Uh, building statistical models, um, especially even moderately complicated ones, needs its own workflow. And I'm just going to assert that now. But when we do the examples in the course, I will build models up incrementally, and that's its own subworkflow. So even once you've defined the generative model and the estimand, and you feel very satisfied and you know the final model you want to fit, uh, by logically combining those two things, uh, you still want to incrementally build up to it, just so you can reliably engineer it and reduce the cognitive strain. And so there are going to be extra steps in many of the examples I give which are just about reliably developing that model so that you know that it's working. And often that means starting with a model that's simpler than the one you need. Yeah. And two good things come of that. Uh, the first is that you get more reliable engineering of the final model. Right. If you make a model that has five, six, seven variables in it and some set of interactions, and then you run it and something doesn't work, you did a bunch of things at once, so you don't know what doesn't work. But if you add one thing at a time, then you know it works. The other thing you get from that is comparing the simpler models to the final model teaches you things about how the final model works. Things that you can't get without that comparison. And that's not about deciding which model to use. It's about understanding how the model is functioning. And so I'll give you examples of that as we go too. Okay.

All right. Um, so the more, uh, entertaining way, hopefully, to talk about these subworkflows: I feel like it's very often true in stats courses that, uh, the way modeling is taught is like this old meme about how to draw an owl. And this is an old meme, and I know memes age very rapidly on the internet, so, but maybe some of you know this. Um, here's how it goes. No one knows the origin of this, by the way. It just appeared on Reddit at some point, as things do, and now we have it. Um, so step one of drawing an owl is to draw some circles. Okay. Um, and then step two is you draw the rest of the owl. Aha. Okay. Statistical modeling is often like that, right? What we teach you in the introductory stats course is the basics. Uh, like, well, this is what you do: you put these things on the right-hand side and this thing on the left. Uh, then have fun. Uh, but there's way, way more in between, actually, in getting the thing to work, and what if something goes wrong, and then how do I interpret the output? Um, getting to draw the owl is complicated, and so there's a bunch of intermediate steps, uh, that I want to put into this. Right?

So my mom went to art school, and so I have some family history in knowing about how many intervening steps there are in drawing an owl or a horse. God, horses are the worst. Just a messed-up animal. What is up with its legs? If you've ever tried to draw a horse, right? But artists can do these things because they do scaffolding. They draw lines that they later erase or mark over. Uh, those things aren't visible at the end, but they're essential to the reliable production of the product they want. And that's the way I view scientific modeling as well, is there are things I want to teach you which are scaffolds, ladders that you climb but you discard at the end, but they're still really important to the process, and that's what I want to teach. Does that make sense? Um, so that'll add some fluster to examples, but I think it's really essential to getting things to work.

Um, so there's this metaphor I often use, that scientific data analysis is a lot like amateur software engineering, right? It's like software engineering. It is software engineering, because you're all scripting now. I think this is nice, is that we've moved beyond point-and-click software, which is good, because it documents what you've done and it gives you more freedom. Um, so you're really doing software engineering, but you have no training in it. That's a bad situation, right, to be in. So, I'm going to give you a little bit of software engineering training. Not a lot, but a little bit, just the minimal amount. And, uh, I really believe, in general in this occupation and especially in computational tasks, I believe in this thing called the 80/20 rule, which is you get 80% of the benefit from the first 20% of the education. Right? And that last 20% of the benefit is going to take you 80% of the learning time. Uh, so you can postpone that. But I'm going to give you that 80%, is the idea. So you won't know everything about software engineering, and that's not the goal. Uh, but maybe you never need it, but you need something.

Um, and, uh, there are three reasons. The first is you really want to understand what you're doing, right? Not just play with the script and get some results. Uh, I think another nice thing about being scientists now is that we've all heard of the replication crisis, and we know that just playing around with our data until we get some p-values less than 5% is very dangerous for cumulative knowledge. Right? So we don't want to do that. We want to understand what we're doing and be able to justify the steps. Um, second, documenting your work reduces error. Even if you have a very logical and precise way to do things, and it's the right way to do it, everybody makes mistakes, and so documentation helps you reduce those mistakes. You want a way to work that, um, really not only documents error but reduces its likelihood. And third is a social thing. You want to be able to tell your colleagues that you have worked in a way which has quality assurance, right? Uh, you've all seen this meme where somebody walks into somebody's house and there's trash everywhere and they go, "You live like this?" Right? So this is what people's project folders are often like, right? You don't want to show people your project folder, because they'll be like, "You live like this?" Right? A bunch of loose spreadsheets and different copies of Word documents. Let's not live like that anymore. Okay? Um, work in a way that you won't be ashamed of, because it is a better way to work that gives your colleagues greater trust in the results of your research.

Okay, that's the second sermon of the lecture, and hopefully the last. All right, finally, uh, the last thing we want to get out of this: we've got our statistical model. We're going to combine it with some data to produce an estimate, and there's a bunch of algorithms involved in doing this. Um, and those will mainly be, uh, black-boxed away into math libraries for you. But I want to tell you what they are, because there are trade-offs involved in them. But we can postpone some of that as we go. And then there's going to be more of this in almost every example, where we process the estimates to produce some particular comparison or prediction that really satisfies the scientific question we started with.

Okay. So, I've got 30 more minutes here. And so, for the next half hour, I want to give you a basic cartoon example. This is an artificial example, but it has all of the core structure of a real scientific example. And that is to ask what proportion of this inflatable globe is covered in water. People listening at home don't know that I'm holding this globe in my hand here. So, there's a real inflatable globe. This is a question about the real planet Earth, but we don't have the planet Earth to sample from. So, uh, we're going to use the inflatable globe. If I wanted to know what proportion of this globe is covered in water, metaphorical water, right? Um, how would we do it? How could you figure that out? There are different ways to do it. And I think one of the wonders of statistical science is we can do it in the dumbest way possible, and it actually works really efficiently. And that is just by sampling. Just by sampling random points on the surface. Yeah. And so that's what we're going to do. And I'm going to use that to build up a basic Bayesian statistical model and teach you what Bayesian inference is and how it works mechanically. We won't finish this in this lecture. We'll finish it next week. Uh, but it's going to move slow, but I really want you to walk away from this with a core understanding of exactly what Bayesian inference is, and actually how simple it is. It's an incredibly simple procedure.

Um, so, uh, the idea is, if I take this and I were to throw it into the crowd, I won't do that, don't worry. If I was to throw it into the crowd at you, um, and you catch it, you could look where your right index finger is, whether it's on water or land, and then just shout out water, and then you throw it at someone else, and then they catch it, and so on. Don't worry, I'm not going to do it. There's no threat of a globe coming at you. I promise. But, uh, we do that a bunch of times, and we will get samples. So, let's go to the next slide.

Here I have animated this. Here's my virtual globe toss. I spent an entire afternoon making this animation. I'm very pleased with it. Thank you. And, uh, there's a bunch of technical problems, it turns out, to solve in getting the path to work right on a sphere. So we're getting samples here as it spins around. That's almost land. It's Florida. It's kind of land, you know, at least half the year. Uh, finally, water. So we got a land, a water, two lands, three waters, a land, and two waters. This is a sample of nine. Is that nine points? And so if I gave you this sample, just as an example, and I ask you, um, how should we use this sample to produce an estimate? Now remember, our question is: what proportion of the globe is covered in water?

Um, uh, so how should we use this sample? And there are lots of ways to use a sample to produce an estimate. Probably you can just take a guess right now how you would use it. If you're going to make just a single number to summarize what this sample says about the proportion of water on the globe, you might just say what proportion of the sample is water, right? You might think that just intuitively, and many, many people have.
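In R, that intuitive estimate is just the sample proportion. A one-line sketch, assuming the nine tosses are stored as a character vector (this particular sequence is made up for illustration):

    tosses <- c("W","L","W","W","W","L","W","L","W")  # a sample of nine tosses
    mean(tosses == "W")  # proportion of the sample that is water: 6/9, about 0.67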

The question is, how do we justify that? Right? Are there alternatives? And then how do we add uncertainty onto that estimate? Because a smaller sample and a larger sample should have different uncertainties. So how do we get that? And we need a reliable procedure for doing that. Um, so that's what we're going to build up to.

Uh, what we're going to eventually get to, and this will be next lecture, I think, by the time we get to it, is a way, for any possible proportion of water, uh, that's the horizontal axis on this slide, those are all the possible answers to the question, what proportion of the globe is covered in water? Every real number between zero and one is a possible answer. And for every one of those real numbers, we want some statistical answer of how plausible that is. And that's what we're working towards. And what I'm going to teach you is a way to do this for each sample, one at a time. The unique information that each additional data point adds to change this curve. And this is a procedure called Bayesian updating. It's kind of famous. Yeah. And it's a really simple, um, uh, completely derived from the axioms of probability, way to answer the question: what information is in this data set, to answer this question? So, every little updating step there, we're going to come back to this animation next lecture, because there's a bunch of intervening logic that gets us here. We're going to start with something much sillier. Okay, but eventually we're going to get to this, and this is a general algorithm for processing all kinds of statistical models and producing uncertainty.
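As a preview of where this is going, here is a minimal sketch in R of that one-observation-at-a-time updating over a grid of candidate proportions. The grid size, the flat starting plausibilities, and the particular sample are my own illustrative assumptions, not the lecture's code:

    # Sketch of Bayesian updating over a grid of candidate proportions of water.
    p_grid <- seq(0, 1, length.out = 101)  # possible answers between 0 and 1
    plaus <- rep(1, length(p_grid))        # start flat: all equally plausible
    tosses <- c("W","L","W","W","W","L","W","L","W")
    for (obs in tosses) {
        # each observation scales plausibility by the chance of that outcome
        plaus <- plaus * (if (obs == "W") p_grid else 1 - p_grid)
        plaus <- plaus / sum(plaus)        # renormalize after each update
    }
    plot(p_grid, plaus, type = "l",
         xlab = "proportion of water", ylab = "plausibility")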

Um, okay, back to the core Bayesian workflow. I'm going to work through this with the globe in mind and show you, uh, in a not-too-detailed way, but enough detail to give you the cartoon version of how this works in this example. So, the first thing we have to do is nominate some generative model and an estimand. What's the globe-tossing sampling model? Right? What's the generative model of the globe? And the way to think about this is, um, where does the data come from? What are the causal processes that are producing the data set? And I've already explained that to you, but just let me remind you. Um, I think this animates again. Yeah. What are we assuming when I've defined this task? I've assumed that each toss of the globe produces an independent sample. Right? Now, that may not be true, because I'm not actually throwing it at you. But if I were to throw it at you, there might be correlation, right? It's hard to say, but this is the assumption we're making, uh, in thinking about the sample. Each toss is independent. So each subsequent land and water are not correlated in any particular way. And on any particular toss of the globe, the chance that your finger lands on water is just the proportion of water. Does that make sense? That's the generative model.

>> What do you mean, a chance?

>> So, I'm not going to throw this at you. I keep saying that, and I just want to make clear: do not feel threatened. I will never throw anything at you. Right? Not even an inflatable globe. If I were to throw this and then you catch it, uh, the chance that your right index finger is on water is the proportion of the globe covered in water. Right? That's the assumption that goes with the generative model. It'd be easier to understand if I give you a counterfactual model. Suppose that's not true. Suppose that your finger is drawn inexorably towards water when you catch it. Then the chance would be greater. There would be a bias in the sampling, and we're assuming that's not true. That's the model. Does that make sense? Okay. Um, there are definitely biased sampling, uh, procedures in the sciences. So we need to think about that later in the course, but right now we won't. Okay.

Um, this can be written as a program. And you don't always have to do this, uh, but I think in this case maybe it helps to see that there is a model here that's generative: you can produce simulated globe tosses. This is not a statistical model, because it won't produce any inferences. It's just a forward simulation. And I've, uh, used the simplest kind of R code possible. There are no distributional assumptions. We're just using the sample function inside this function wrapper, so that we can run it a bunch of times. So the way this works is: we define the outcome space, the possible observations, and those are capital W and capital L, for water and land. Um, we define how many times we're going to toss the globe, and that's capital N. And then we define the probability of each of the possible outcomes, and I've defined those here as p and 1 minus p, where p is the proportion of the globe covered in water. And that's our estimand. We want to learn p.

Does this make sense? Um, so you can run this in your free time. Uh, and you will get a sample, a new sample, every time you run it, right? Of the globe. You can run it a bunch of times, in fact.
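The slide's code isn't reproduced in this transcript, but a sketch consistent with the description above might look like this (the function name sim_globe and the default values are assumptions on my part):

    # Generative simulation as described: N tosses, where the chance of water
    # on each toss is p. Reconstructed from the description, not the slide.
    sim_globe <- function(p = 0.7, N = 9) {
        sample(c("W", "L"), size = N, prob = c(p, 1 - p), replace = TRUE)
    }
    sim_globe()                        # one new sample of nine tosses
    replicate(10, sim_globe(p = 0.5))  # run it a bunch of times: one column per set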

So if we replicate it a bunch of times, each column here is a separate set of tosses. Um, uh, is that right? Yes. And so the first column is the first set of tosses, and then the next and the next and the next. So there's variation. You don't always get the same result. We want a statistical procedure which has properties which work well for samples of this type. And the question is how to do that. And the lucky answer is: we just rely on probability theory and use Bayes. Uh, but before we get to that, it's just nice to know what the generative model means. It means the view that we have a model of how the data are produced, in this case through an experiment. Yeah. But, uh, if you're sampling from nature, uh, without interventions, then you also have a generative model. You have a generative model of how nature has produced that sample, combined with how you collected the information, which may have produced biases.

Um, at various points in the course, we're going to test things, uh, to be sure that they work. And I don't want to emphasize this too much today, because we'll have lots of time in the future to talk about it. Um, but one of the ways to test your code is to try extreme settings that you know the answer to, right? And so if the code doesn't produce the right answer at extreme settings, it's not producing the right answer at less extreme settings either. So you can do these informal tests. Um, it'd be nice to have some kind of optimal test, but there is no such thing in general. Uh, so you try different ways to break, uh, your software and see if it's broken. So the first thing I've done here, for example, is I set the coverage of the globe to one. I set p equal to one, which means the globe is entirely covered in water. Um, and then I get a sample, and yes, it's all water. If that weren't true, then obviously my code is bad. Yeah. So that's what I mean by trying an extreme setting.
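Using the assumed sim_globe sketch from above, that extreme-setting test might look like:

    # Extreme setting: a globe entirely covered in water should yield only "W".
    sim_globe(p = 1, N = 11)
    # Anything other than eleven "W"s here would mean the code is broken.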

Uh, this is the quick way, and often this uncovers bugs very rapidly in your code. But some kind of testing is a good idea. If you don't test at all, you miss everything, right? And then, yes, uh, maybe someone else will find your error and you'll be embarrassed, but probably no one will find your error, and then you will just pollute the scientific literature and go to science hell. Yeah, that's a joke. I don't believe in science hell. Only in science heaven. Okay. Um, uh, and then, uh, the other test I do here is a so-called asymptotic test. If I took a really, really big sample, we're going to throw the globe 1e4 times, right? 10 to the 4th times. Um, and then we're just going to count up the proportion of the sample that is water. That should converge to the proportion of the globe that's covered in water. And it does. Uh, so it seems like the simulation is working.
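And a sketch of that asymptotic test, again using the assumed sim_globe from above:

    # Asymptotic check: with a huge sample, the proportion of water in the
    # sample should converge to the p we simulated with.
    big_sample <- sim_globe(p = 0.5, N = 1e4)
    sum(big_sample == "W") / 1e4   # should be close to 0.5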

Our sample is going to be much smaller than 10 to the 4th. It's going to be nine. Yeah, we want a statistical procedure that works the same way for large samples and small samples and gives the right answer in both cases. The right answer not being the truth, but the right answer being what information is in the sample. That's what we mean. Okay.

Um, now we need to build a statistical model. And our statistical models in this course will always be Bayesian estimators, Bayesian models for extracting information from the data. So I want to give you a sense about what that means in this case, and this is going to take some time to build up, and I'm going to do it cartoonishly, which I hope delivers better understanding. Uh, so first, the logic of Bayesian inference is incredibly simple. Often the calculations are difficult, and that's historically been an issue, one that is now largely addressed. But the idea is actually hundreds of years old. Bayesian inference is older than frequentist inference. It's the old continental way of doing probability theory. And then the English decided they needed a new way to do it. Right? They used to call Bayesian inference the French calculus. Right? And, uh, um, Gauss, all Germans here know who Gauss is, uh, he invented linear regression as a Bayesian procedure, because there were no frequentist statistics in Gauss's lifetime. Right? So this is an old thing. The calculations are hard, though. But the idea is incredibly simple and old. It is just that, for each possible explanation of the sample, we're going to count all the ways the sample could happen. I'll work this through using the globe in a moment. Um, and then those explanations, uh, with more ways to produce the sample are more plausible, and then we compare them based upon those plausibilities. Um, and it turns out that this is, uh, the optimal way to extract information from the data. It's just sometimes quite hard.

Uh, in the context of the globe-tossing example: for each possible proportion of water on the globe, those are the explanations, right? This is also our estimand in this case. For each possible proportion of water on the globe, count all the ways the sample of tosses could happen, and then those proportions with more ways to produce the sample are more plausible.

Uh, so the way I want to present this to you, so you can understand what counting means, is to think about this metaphor, with a famous book I hope you've all read. Uh, but instead of, um, forking paths into the future of your life, we're going to have a garden of forking data. Yeah. So the sample you've gotten could have turned out differently. Of all of the samples you could have gotten, how often is the one that you got going to happen? Sorry, English doesn't have enough grammar to express counterfactuals like this. So there will be cartoons as we go. We'd need a language with more grammar here.

But, uh, here's the idea. I want you to think about a four-sided globe. This is what it would look like. All right. So, globes are inconvenient because they have infinite sides. Yeah. This one doesn't quite, because it's inflatable, and you can probably see it has like eight or so actual panels sewn together, but a real globe is smooth. Um, and so counting starts to sound farcical. So, I want us to forget about the actual globe for a moment. Just think about a four-sided globe. Um, those of you who play Dungeons and Dragons know what a four-sided die looks like, right? It looks like this. People, anybody know what these look like? Yes. Um, so, uh, this is a four-sided globe. All right. So, now we're going to imagine the four-sided globe is covered 25% by water, which means exactly one of its sides is covered in water. Good. This is the peak of my drawing skills, right? Okay.

Um, now what we could imagine is our uh what I'm going to build up for you is what I call the garden of forking data.

All the possible samples um that we can get for different uh different globes with different covers coverages of water. And so what are the different D4 globes that we have

here? Uh the first option would be no

here? Uh the first option would be no water at all. That is the proportion of water is zero. All four sides are land.

White means land here and blue means water. Um the second possibility is that

water. Um the second possibility is that one of the four sides is water. So

that's 25% of the globe is covered in water or half or three4s or all of it.

These are all the possible globes. So

these are all the explanations we need to work through. And so for each of these, what we're going to do is count all the ways the sample could happen.

Which sample? I'm going to start with just three tosses. Waterland water, but we'll expand that to the full sample that I showed you before. Okay, that's

Okay, that's our project, and we've got 10 minutes, so I think I can get a little way into it. Um, let's start with, uh, just the one out of four, the 25% coverage example. We're imagining that were the truth. We don't know that's the truth. We're trying to figure out what the truth is. I know this seems backwards, but this is just how it works. Imagine that that were the truth. Condition on that being the truth, and then ask, um, how many ways are there to observe the sample water, land, water, if the truth were that one out of four sides is covered in water? So this is where the garden of forking data, uh, comes in.

Um, first, there's the first possibility, that is, the first toss of the globe. Uh, one out of four times, if it's a fair die, we're assuming it's a fair die, one out of four times you will see water. Um, but other things could happen as well. You could have seen land. Uh, on the second toss, each of those paths branches again. This is the garden of forking data. So once you've tossed the globe again, whatever happened the first time, anything could happen the second time as well. So now there are a bunch of possible, uh, different samples that arise. Uh, you're probably thinking, um, but all of those white things are the same. No, they're not. They're different sides, right? We're lumping them together as land, because we're not paying attention to the coordinate, but it's a different sample, because a different sequence of events has happened, right? So, it's easier to think about if you think about the globe. There are a bunch of points that you're just coding as water, but they're different places, right? You could take that same sample, if you recorded the latitude and longitude, and ask different questions with it. Yeah, does that make sense? They're different samples. So, the different white dots are actually different data. It's just that we're not going to care about those differences. We're going to categorize them the same. Uh, and then on a third toss, everything branches again. And now we have a big, wonderful garden, right, of possibilities. So, with three tosses of the globe, you've got, um, four times four times four, 64 possible data sets.

Yeah, so lots of possible samples can arise. But we've seen one, and the question is how many ways there are to see that data set; that is, how many water-land-waters are there in this garden? So let's take the first observation and just trace it out. See if my animation works. Yes, this also took me an afternoon, so thank you. So we've gone down that path in our garden. Now, we could have gone four different ways, but we've just gone this one, right? This is the garden of forking data. On the second observation we observe a land, and there are three paths that would have given us that. So now there are three ways to get the first two samples we've observed.

You feeling the garden now?

And then finally, on our third data point, we get a water again, and there are three ways to do that as well, because for each of the other three paths there's one path we could go down which would give us that. So there are three total ways: if our four-sided globe is 25% water, there are three ways to observe the sample.

And now you're thinking: okay, but so what? We're going to compare this to the counts we get from the other possible globes, and then we're going to compare them. And this is Bayesian updating. It really is this dumb. Yeah. But it's also genius. Now I want to point out that there's nothing about this procedure, this logic, which is chosen. What you choose are the axioms of probability. Well, you didn't choose them; some mad Russian did, right? Kolmogorov. But those axioms, nobody debates them. They're just true things about the definition of probability spaces. Once you agree to those, this logic is necessary. It's the only procedure that's consistent with them. There are some hardcore frequentists who disagree with what I said, but they're wrong. No, okay, we won't have that debate today. So: three ways to see this sample for 25% water.

So, let's make a table. Go back to the table from before, for the one-out-of-four four-sided globe: three ways to produce the sample. What about the other possibilities? Two of these we get for free. I could draw the garden, but I'm not going to. The first and the last: there are zero ways. Why? Because with the first, you'll never see water, and we've seen water. And with the last, you'll never see land, and we've seen land. So there are zero ways to get the sample from those explanations. Yeah, it's not always this easy, but sometimes it is. But you could draw the garden and count, right? The same procedure works.
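As a sketch of that procedure, not code from the lecture: the following Python snippet enumerates every path in the garden by brute force and counts the ones matching the sample. The helper name ways_to_produce is my own invention.

```python
from itertools import product

def ways_to_produce(sample, n_water, n_sides=4):
    """Count paths through the garden of forking data by brute force:
    enumerate every possible sequence of sides and keep the ones whose
    water/land pattern matches the observed sample."""
    # Label each side of the four-sided globe, water sides first.
    sides = ["W"] * n_water + ["L"] * (n_sides - n_water)
    # All n_sides ** len(sample) paths through the garden.
    paths = product(sides, repeat=len(sample))
    return sum(1 for path in paths if list(path) == list(sample))

sample = ["W", "L", "W"]
for n_water in range(5):
    print(n_water, "water sides:", ways_to_produce(sample, n_water), "ways")
# 0, 3, 8, 9, 0 -- the same counts the drawn garden gives
```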

What about the others? Let's take the third one, which is 50% water, and draw the garden for that. So this is the same kind of garden as before, but notice now that there's 50% water, and we're going to start at the bottom and trace out those paths. We get a water, a land, and a water. How many ways are there to get the sample for this globe? Eight. Thank you, audience participation greatly appreciated. Someone said eight. Yes, there are eight ways to do it. Are you feeling it now? Yeah, we've got one more possibility. Make your bets for three-quarters covered in water. Again, we draw the garden, but now with three water branches at each branching point all the way out, and then we trace out the sample: three ways, and then three again, and nine ways to get the sample for this possibility.

And then we can fill in the table: for 25% we've got three ways, for 50% eight ways, and for 75% nine ways. And those are just counts. And what I've asserted is that the relative value of these counts is an indication of the relative plausibility of each explanation of the data. And those relative plausibilities, in the Bayesian framework, they're not an estimate in and of themselves; there's no point estimate here. This is the estimate: the comparison of these counts. And it's something we call a posterior distribution. We can represent that more formally in the next lecture.

What I want to emphasize now is something different about this, which is that these plausibilities have a lot of really nice properties, even though they're a logical consequence of accepting the axioms of probability. That's a benign thing about the universe. Usually I think the universe is hostile to human life; it's a miracle we're alive at all. So it's really nice when the mathematical framework provides accidental benefits. It's just sort of like: wow, isn't that great? Usually things turn out being bad, but in this case there are really nice properties of these counts and their relative values. We interpret them as plausibilities, and their relative differences, and I'll show you ways to work with that as we go.

One of the key benefits is that it's very easy to update them as we get new data. We can just take the previous counts; we don't have to count everything again. We can just multiply by the number of ways to get the new data point that we've arrived at. And this is often called Bayesian updating. An essential insight, before I explain it in more detail, is that there's nothing about those three initial samples that requires they happen simultaneously. You could process them one at a time, as three separate experiments. You could do them on different days and analyze them with the same logic, just updating as you go.

Prior data are the prior estimate that later data update. And that's the way we do it. I'll be much more specific about what prior and posterior inference are in later lectures, but for now I want you to see that this is a necessary consequence of the fact that you don't have to process all the data at the same time, but it all gives you information about the same thing you want to know. And so you can use it all in the same statistical model, even if it arrives at different points in time. Does that make sense? That's the idea. So in this example, we have our ways to produce, in the second column there, that we've already calculated. This is from the garden of forking data. And then say we toss the globe one more time and we get a water. And then we could ask again, just for this new sample: how many ways are there to observe it? And they are zero, one, two, three, four, right? Which are just the counts of the number of sides on the die that are water. Yeah. And now to get the new ways to produce the total sample, we just multiply the previous counts by the new count.

You could draw the garden. It'd be really big now, and as the sample gets bigger the garden gets really big, because of how combinatorics works. So you don't do that. Yeah. But the multiplication is the logical reduction of that process of counting. And so now we get 0, 3, 16, 27. The differences are getting bigger, right? And the numbers are getting bigger as well.
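A minimal sketch of that update step, my own illustration rather than code from the lecture, assuming the counts above:

```python
# Proportions of water for the five possible four-sided globes.
proportions = [0.0, 0.25, 0.5, 0.75, 1.0]

# Ways to produce the first three tosses (W, L, W), from the garden.
prior_ways = [0, 3, 8, 9, 0]

# A new toss lands water. For each globe, the number of ways to see one
# water is just its number of water sides, 4 * p.
new_ways = [int(4 * p) for p in proportions]   # [0, 1, 2, 3, 4]

# Update by multiplying; no need to re-count the whole garden.
updated = [w * n for w, n in zip(prior_ways, new_ways)]
print(updated)  # [0, 3, 16, 27, 0]
```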

For the whole sample, let's take the one that I animated many, many slides ago now: water, land, water, water, water, land, water, land, water. And actually, I don't think that's the same as the one before, but it's a sample, right? And let's ask: what are the number of ways each of the possible globes could produce that?

Well, it's still zero for the first one, because, logically, the way to do this calculation is: you take the number of sides that are water and you exponentiate that by the number of times you saw water, because you're just multiplying the number of ways to see water. And then you take the number of sides that are land and you exponentiate that by the number of times you saw land. In this case that's 0^6 times 4^3, and there's a zero in there, so it's all zero, right? Which is just some math telling you what you already knew: if there was no water and you saw water, this explanation is eliminated. It's logically incompatible with the data. You don't need Bayes for that; you just need to pay attention. What you need Bayes for are the intermediate cases. So when there's one water, now again, the calculation is going to be 1^6 times 3^3, which is 27. 27 ways to see the sample.

There are 512 ways, 2^6 times 2^3, to see this sample if the globe is 50% water. That's a bigger difference yet. Notice the numbers just keep getting bigger the bigger your sample gets. 729 ways, 3^6 times 1^3, to see the sample if it's three-quarters water. And again, zero ways if it's all water.

This is Bayesian updating. That's it. Yeah. All that glamorous stuff, all those science papers that have Bayesian in the title: this is what's going on. Now, it's compressed into a much more efficient form than doing the counting, because these numbers get big really fast. There's a formula for producing these things. It's just that the number of ways for p to produce a sample with W waters and L lands is (4p)^W times (4 − 4p)^L. Why four? Because there are four sides, and p is the proportion that are covered in water. So 4p becomes one when it's 25% water, right? It's just a way to make a formula out of it. And this gives you the number of ways to observe any particular sample of W and L.
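That formula is easy to check in code. A minimal sketch, mine rather than the lecture's, for the nine-toss sample with six waters and three lands:

```python
def n_ways(p, W, L, n_sides=4):
    """Ways for a globe with proportion p water to produce
    W water observations and L land observations."""
    return (n_sides * p) ** W * (n_sides - n_sides * p) ** L

for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(p, n_ways(p, W=6, L=3))
# 0.0, 27.0, 512.0, 729.0, 0.0 -- the counts from the slides
```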

These numbers get big real fast, and I'm going to stop on this slide because I'm out of time. But what we're going to do when we start next time is I'm going to actually, finally, define probability. Probabilities aren't giant counts. They're numbers between zero and one. But we get them by taking these counts and just forcing them to be between zero and one. And that's all probability is. Yes, that's it: the basis of so much science, and that's exactly what it is, by construction. So for example, for 20 water and 10 land there are, what is that, a billion? Is that a billion? Someone tell me. Yes, that's a billion ways to observe it. We don't want to do that counting, and we don't want to write numbers like that down. We'd rather have some number between zero and one. And that's what we're going to do.
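That normalization is simple enough to sketch now, again as my own illustration rather than the lecture's code: divide each count by the total, so the values are forced between zero and one.

```python
ways = [0, 27, 512, 729, 0]                # counts for p = 0, 1/4, 1/2, 3/4, 1
total = sum(ways)                          # 1268
posterior = [w / total for w in ways]      # each value now between 0 and 1
print([round(pr, 3) for pr in posterior])  # [0.0, 0.021, 0.404, 0.575, 0.0]
```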

So I'm going to stop there, and I promise you, when you return next week, next Tuesday, I'll pick up exactly here, and we'll go from here into doing Bayesian updating and moving away from four sides to all the sides on the globe, which is an infinity of them. I want to hang on and get to this slide, just to remind you: the GitHub link at the top, please visit it and give me your email if you want to be on the mailing list. If you don't want to be on the mailing list, that's cool, too. You can just hang out. Okay. Thank you for your attention, and I'll see you next week. Oh, I'll put the homework up later today on the GitHub site. Thank you.
