
Identification and estimation of causal effects using observational data, by Professor Donald Rubin

By Forskningskonferanser

Summary

Topics Covered

  • Causation Demands Intervention
  • Causal Effects Are Missing Data
  • Randomize Again on Bad Draws
  • Neyman Invented Potential Outcomes
  • Design Observational Studies as Experiments

Full Transcript

He has authored nearly 400 publications, including 11 books. I just saw this on the page; I think I added one recent book that came out, so earlier it was 10 books but now it's 11. He has made important contributions to statistical theory and methodology, particularly in causal inference but in many other areas as well, especially missing data. He has so many awards and honors that I will not mention any, so as not to leave some out and make someone feel neglected. Professor Rubin has for many years been one of the most highly cited authors in mathematics in the world, according to ISI ScienceWatch, as well as in economics. I just Googled him and he has more than 200,000 citations. So it's a real pleasure to have Don Rubin here as a speaker today. So please

welcome him.

Thank you all. Also, to Pa: thanks so much for the invitation to be here, and for the past couple or three days of his company, along with Sophie. I've spent a couple of days with him and I've learned a lot; some of the things I've learned are from last night, from the dinner. This presentation, which is called "Essential concepts for causal inference in randomized experiments and observational studies", I'll modify a little as I go through, to change some things and make it more appropriate for this context. This talk will be slightly philosophical, although not nearly as philosophical as the previous talk. It will also have some history, and I think one of the important things about the history I provide is how recently randomized experiments have come into general use, and how recently they've been used in medicine. That part will actually be added, because it's not quite in here now. I think we were all quite

influenced by experiences we had when we were kids, and by people and fields that we admired. When I was a kid, 12, 13, 14 years old, I was really very interested in physics, like a lot of kids at the time. I was also interested in mathematics, but not really mathematics for mathematics' sake; I was interested in mathematics for the kind of problems it could solve. When I went to college, at Princeton, there was a guy there named John Wheeler, a relatively well-known physicist, probably one of the best-known American physicists. He died maybe six or seven years ago, at 93 or something like that. He's best known for two things. This is relevant because my attitude toward causal effects, and how you learn about them, was obviously highly influenced by my experience when I was a kid, and this guy was a physicist. He's best known, I think, for being credited with making up the name "black holes". He actually denied it. He said he was giving a lecture describing this phenomenon, an object in space so dense that light couldn't even escape, and somebody in an audience bigger than this one said, "Ah, a black hole." John tried to figure out who it was, and tried to find the guy afterwards, and never found him, but he kept using the name "black holes". So if you look at black holes on Wikipedia, the name is attributed to Wheeler, but Wheeler said no, it wasn't him. The other thing he's best known for is that he was the PhD adviser of probably the most famous American physicist, Richard Feynman, who was a great character and very well known in American physics. But what

they meant by causation and causal inference was that you had to intervene; you had to intervene and do something. People started doing really serious experiments in physics in the early 20th century. Before that they had some experiments going on, but there wasn't this clear idea of cause and effect, and the experimentalists probably got closer to it than most people; they really did experiments. When I went to graduate school, at Harvard, I took a course in experimental design with the Scottish statistician Bill Cochran. I took this course in 1968, and it was classical experimental design. What I mean by classical experimental design is that you actually had factors, and you randomly assigned units, objects of study, to different combinations of the factors; those were called treatments. And in order to claim that you actually knew what would happen if you did something, you had to do it: you had to actually intervene and observe the outcomes. That was just a tenet of experimental design, just as it had been a tenet of my physics background, where you actually did experiments all the time and then observed their consequences. When you did these things, one of the perspectives that was extremely clear

was that there was a clear separation between the science, the object of inference, which was sort of the question you were trying to study, and what you did to learn about that science. So the science exists. Now, I'm not saying this is true; I'm saying this is an attitude. The attitude is that the science exists, like the stars exist or black holes exist, and if you want to learn about it, you have to do something to learn about it. In our own lives we typically do more minor kinds of experiments, but in big physics you do things to intervene, and then you measure aspects of the science at certain points in time. An aspect of this approach that was consistent, from the time I was a kid through this course with Cochran, is: use the same notational representation of the science no matter how you try to learn about it or measure it. The idea is that how you measure something doesn't change what it was in the past. It had a value in the past; then you intervene in some way, maybe with a randomized experiment, maybe with an observational study, but the observational study or the experiment doesn't change what the truth was in the past. It may affect the future. And that's

another aspect: no matter how we try to measure causal effects, there is always missing data. Why? Because you cannot go back in time and undo what you did, and that's an essential feature of physics. I mean, there are all sorts of stories about time travel, Back to the Future and all that, but those were all stories; nobody really believed them. Time marches on, and missing data always exists; you cannot go back in time. That's a tenet. There's also the Heisenberg uncertainty principle in quantum mechanics: you can't measure position and momentum at the same point in time. And there's a related idea called the observer effect, where the act of measuring something changes it. In order to measure velocity or momentum or anything like that, you have to bombard particles with other particles, and the fact that you're bombarding them changes their position, changes their momentum, changes their velocity. So the act of doing something changes the thing; you can't measure two things at once, and you can't go back in time to measure the thing again. These things were always sort of obvious to me as tenets of causality. If I want to find out whether taking an aspirin makes a current headache of mine go away, I'm wondering what the quality of my headache will be in two hours if I take, or if I don't take, the aspirin. If I take the aspirin, then I get to observe what my headache will be with the aspirin; I can't go back in time, untake the aspirin, and measure my headache in two hours without the aspirin. So at the individual level you can never measure causal effects; in fact, it's difficult even to estimate them. That's just a fact of life, and it was obvious from my education. Now we'll get to what I call the

essential ideas of causal inference. Sir Ronald Fisher, in 1925, in the first edition of Statistical Methods for Research Workers, actually proposed doing randomized experiments, where you randomly assign units to different treatments; you actually do the randomization. One consequence is that you create balanced distributions of all background variables, meaning variables you could measure on all these units, these objects, in expectation: over repeated randomizations you would see balance. Some recondite advice from Fisher concerned what to do if you got a bad randomization, which is always possible. Suppose you measure bad randomizations by differences in background covariates. If you have 20 background covariates and you consider a significant difference to be one at the 0.05 level, which is often done, then in expectation one of the 20 covariates will be unbalanced between the treatment and control groups. And in the era of big data, with genomic data, with lots and lots of covariates, it's certain you'll have a bad randomization with respect to

some covariate. So what should you do? Fisher's advice, never written but spoken (this came through Cochran, my adviser, who had worked with Fisher at Rothamsted), was: yes, you should re-randomize; you don't want to live with a bad randomization. But all the theory that was developed, all the mathematics, says you just randomize once, because that's how the mathematics works out simply. All the things you read about t tests and p values are based on one randomization: if you get a bad randomization, live with it. That's bad advice, and the reason that advice was given was purely mathematical simplicity; you couldn't afford to re-randomize because of the computation. It may surprise some of you who have telephones, but in 1925 they didn't even have computers, so you didn't have the ability to randomize thousands and thousands of times to try to get a good randomization. I think now there's no doubt that Fisher would say: yes, re-randomize. In fact you can still do the mathematics, but the usual distributions aren't the usual ones.
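With modern computation, Fisher's spoken advice, re-randomize until the draw is balanced, is trivial to carry out. Here is a minimal sketch of the idea; the 2-standard-error balance criterion, the data, and all names are my own illustrative assumptions, not Fisher's procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 20                       # 100 units, 20 background covariates
X = rng.normal(size=(n, k))          # hypothetical covariate matrix

def balanced(X, w, threshold=2.0):
    """Accept a randomization only if every standardized mean difference
    between the treatment and control groups is below ~2 standard errors."""
    t, c = X[w == 1], X[w == 0]
    se = np.sqrt(t.var(axis=0, ddof=1) / len(t) + c.var(axis=0, ddof=1) / len(c))
    z = np.abs(t.mean(axis=0) - c.mean(axis=0)) / se
    return bool(np.all(z < threshold))

draws = 0
while True:
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, size=n // 2, replace=False)] = 1   # assign half to treatment
    draws += 1
    if balanced(X, w):
        break

print(draws)   # number of randomizations needed before an acceptable draw
```

Note that the 20 x 0.05 = 1 arithmetic from the talk is exactly why the loop usually rejects at least a few draws: on any single randomization, roughly one covariate is expected to look "significantly" unbalanced by chance.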

Another essential idea of Fisher's, back in 1925: if you want to assess a null hypothesis, which for Fisher really meant there is no treatment effect at all, the only thing Fisher really ever talked about, you hypothetically re-randomize for the assessment. I'll say a little more about that later; it's basically a stochastic proof by contradiction.
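The stochastic proof by contradiction can be made concrete. Under Fisher's sharp null, every unit's outcome would have been the same under either treatment, so the observed outcomes are fixed and only the random labels vary. A small sketch with entirely made-up data (the outcomes and group sizes are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical observed outcomes from a tiny randomized experiment
y = np.array([31.0, 30.0, 28.0, 26.0, 27.0, 25.0, 24.0, 23.0])
w = np.array([1, 1, 1, 1, 0, 0, 0, 0])          # 1 = treated, 0 = control

observed = y[w == 1].mean() - y[w == 0].mean()   # observed mean difference

# Under the sharp null (no effect for any unit), the outcomes are fixed;
# re-randomize the labels many times and see how often a difference at
# least this extreme arises by chance alone.
n_draws = 20_000
count = 0
for _ in range(n_draws):
    perm = rng.permutation(w)
    diff = y[perm == 1].mean() - y[perm == 0].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_draws
print(observed, p_value)
```

If differences as large as the observed one are rare across the hypothetical re-randomizations, the sharp null is contradicted, stochastically.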

Fisher was a really good mathematician, and all this stuff was kind of obvious to him. These hypothetical randomizations are really a special case of a Bayesian technique that I call posterior predictive p values. A lot of Fisher's ideas were really very Bayesian, although he gave them a different name, since he didn't like to call them Bayesian: he called it fiducial inference, but it's very, very similar. (It's also true that Neyman was very Bayesian in certain ways, but that's another topic.) And Fisher really understood this idea that a causal effect is comparing something that you can see

because you did it in the past, with something you can't see. Take the example I gave of taking an aspirin or not: I want to compare the strength of my headache in two hours, and I either get to see it with the aspirin or I get to see the strength of my headache without the aspirin. So you're comparing something you can see with something you cannot see, and the jargon I like to use for that is something Neyman made up, which I'll get to on the next slide: Jerzy Neyman, of Neyman and Pearson, called them potential outcomes, and he did that in 1923. So these ideas of needing intervention go back to Wheeler and Michelson-Morley and all the physicists of the early 20th century; it's a standard idea that goes back a long way. And in fact the idea of potential outcomes, these comparisons between things, goes back notationally to Neyman in 1923, but Fisher obviously had it in his mind when he was talking about randomized experiments. In fact he has this quote from 1918 that says the following: "If we say this boy has grown tall because he has

been well fed, we are not merely tracing out cause and effect in the individual instance; we are suggesting that he might quite probably (horrible English!) have been worse fed, and in that case he would have been shorter." So we're measuring his height now; that's the outcome. If he's been well fed in the past, that's why he's tall; if we go back in time and feed him poorly, then he would have been shorter. So Fisher is clearly viewing the causal question, the causal effect of being well fed versus worse fed, as a comparison of height at the same point in time. Now it's a missing data problem. So this is not a new idea, the idea that cause and effect have to do with a missing data problem. But remarkably, Fisher had no notation for these concepts, neither in 1918 nor in 1925 nor in any of the later editions of his book. There are other things I should mention that I think are essential concepts Fisher had. Fisher was not only a great

mathematician; he had tremendous geometric insight. One reason, maybe, that he had tremendous geometric insight is that he was essentially blind, so he imagined everything all the time and then had people write it down for him, because he could barely see. In fact, Cochran tells a story from when they were on Gower Street in London, when Cochran was a young man and Fisher was maybe 40 years old. Cochran was supposed to take Fisher across this very busy street to go to dinner, and he had him by the arm, and Fisher at one point got very impatient and said to Cochran, "Oh come on, let's go, a little natural selection never hurt anybody." So Fisher had a sense of humor, though often not. But the notation to actually talk about "under one treatment you observe this, under the other treatment or

intervention you observe that" was due to Neyman in 1923. Neyman in 1923 was writing a thesis somewhere in Poland, and he called these two things, like the quality of your headache with aspirin and the quality of your headache without aspirin two hours later, essentially what I call potential outcomes. In the context of his article he was talking about an agricultural experiment, where there was a yield for different plots of land. The units were plots of land, and you randomly assigned a unit to one treatment or another, and the yield on that plot under one treatment was y(1) and under the other treatment was y(0). So y(0) is the array of potential outcomes under one treatment, the control treatment let's suppose, and y(1) is the array of potential outcomes under the other treatment. Now this notation, this setup, implicitly made an assumption that I call the stable unit treatment value assumption; I did that in the 70s. But Neyman really did understand, using this notation, that you cannot observe both on any one unit: you cannot observe the yield under a control fertilizer and, on the same plot, the yield under a new fertilizer, for the same unit at the same point in time. So he realized there's a missing data aspect to it. That's the same as the Heisenberg uncertainty principle, the observer effect; so, no big deal. Well, maybe it is a big deal, but all these ideas were sort of in the air in the early 20th century. They were there; they didn't just arise because I wrote about them in the 70s or something like that, or

because George Box wrote about it, or others did. It was around; it was in the air. The second essential idea of Neyman's is that you should evaluate the operating characteristics of procedures over the randomization distribution: you should imagine repeated randomizations and see what would happen. That's where the word "unbiasedness" comes from; unbiasedness means that over all the randomizations you'll get the right answer. This came to be known as the Neyman and Pearson theory of operating characteristics, but the ideas were obviously there in Neyman's 1923 paper. Actually, it's interesting to

read this stuff in the original. Fisher, obviously, wrote in English in 1925, and even in 1918. Neyman wrote and published in Polish, but his paper was translated by some people at Berkeley in 1990; you can read it in Statistical Science. Like a lot of old mathematics it's really awkward to read, but it's still interesting. So one of the key contributions of Neyman was the notation, explicit mathematical notation for this, and as far as I can find out it was the first use of such notation. The philosophers didn't have it, even the experimentalists didn't have it, certainly Mill didn't have it, Fisher didn't have it, but Neyman had it in his thesis in 1923: explicit notation for these potential outcomes. It became standard after Neyman introduced it, standard in all the classical textbooks on experimental design, and it eventually became standard even in economics, and in medicine a little late, but those guys eventually got there, and I'll get to why in a minute. Now, another essential point of Neyman's contribution is that he worried about what

he called non-additive causal effects. The difference between y(1) and y(0) on one plot may be different from the difference between y(1) and y(0) on another plot; the effect of the new fertilizer may not be to add 10 kilograms of yield to each plot. If it does, it's an additive effect on that scale, the raw scale; that's called additivity. What it means if the effect is not additive is that there will be a non-perfect correlation between the potential outcomes. Now, that non-perfect correlation you can't ever estimate from data, because you never observe both of them; at least, you never observe the partial correlation given covariates. Neyman worried about that, because it

actually affects the finite-population variance. You all know about confidence intervals. Why did Neyman define a confidence interval to have at least its nominal coverage? That's the way it's defined in his 1934 JRSS paper: Neyman defined a confidence interval to have at least its nominal coverage, so a 95% confidence interval is an interval that covers the true value at least 95% of the time. He did that because he knew, in 1923, that you can't get exact coverage, even asymptotically, even in big samples. That was an important idea that I think has also basically been forgotten in the current literature. It's coming back; people now recognize the importance of this insight.
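Neyman's point shows up in a small simulation. With heterogeneous (non-additive) effects, the usual variance estimator overstates the true randomization variance, because the unestimable term involving the unit-level effects is dropped, so intervals built from it cover at least their nominal rate. This is a sketch with invented numbers, not Neyman's derivation:

```python
import numpy as np

rng = np.random.default_rng(2)

# A fixed finite population of n units with NON-additive effects: the gain
# Y(1) - Y(0) differs across units, so the potential outcomes are not
# perfectly correlated.  (All numbers here are illustrative.)
n = 40
y0 = rng.normal(10.0, 2.0, size=n)
y1 = y0 + rng.normal(3.0, 2.0, size=n)       # heterogeneous treatment effects
true_ate = (y1 - y0).mean()

est, var_hat = [], []
for _ in range(20_000):                      # repeated randomizations
    w = np.zeros(n, dtype=int)
    w[rng.choice(n, size=n // 2, replace=False)] = 1
    yt, yc = y1[w == 1], y0[w == 0]          # only one outcome seen per unit
    est.append(yt.mean() - yc.mean())
    var_hat.append(yt.var(ddof=1) / len(yt) + yc.var(ddof=1) / len(yc))

est = np.array(est)
# The usual estimator s1^2/n1 + s0^2/n0 overstates the true randomization
# variance unless effects are additive, hence "at least" nominal coverage.
print(est.var() <= np.mean(var_hat))         # prints True
```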

So there are lots of contributions that Fisher made: primarily the idea to randomize, the idea of re-randomizing if you have a bad randomization, and the idea of doing hypothetical re-randomizations to get significance levels, p values. Neyman had the notation, which was a fabulous contribution, the idea of evaluating operating characteristics, which led to all the Neyman and Pearson work, and then the role of non-additive causal effects. But this method, which completely took over experimental design, had no effect on the way observational studies were analyzed. They didn't use potential outcomes; they didn't use the insights from randomized experiments. If you read the literature in epidemiology, econometrics, economics, there's no use of this at all. Even recent economists like Heckman, when he writes in the 80s, didn't use it, and he claims, well, it's Roy's model; but if you read Roy, it's not there. In fact it wasn't in any of the economics literature. And that's why the use of these potential outcomes, and certain

extensions of them, eventually became known in a literature as "Rubin's causal model", RCM; yet it always seemed obvious to me. Just as the idea that you can't go back in time seemed obvious to me, these ideas seemed obvious to me, and I was reading all these experimental design books that Cochran used, and they always talked about potential outcomes, potential yields, and yet when doing epidemiology they never used them. Smoking and lung cancer is a good example. Cochran was on the US Surgeon General's committee that wrote the 1964 Surgeon General's report on smoking and lung cancer, which came up against the fact that there have been, still, no randomized experiments that have demonstrated any effect of smoking on lung cancer. Smoking on other kinds of cancer, yes, in dogs and rats and such, but lung cancers apparently take too long to develop. Most rats and dogs don't live to 40, 50, 60 years old. When do most heavy smokers develop lung cancer? In their 50s, 60s, 70s, after they've been smoking for 50 or 60 years. It sometimes takes a long time for lung cancers to develop, and animals don't live that long. Tortoises do, but it's tough to make a tortoise smoke, so they don't do it. So this one paper, it's really a

sequence of papers that I wrote in the 70s, all using this perspective of potential outcomes in non-randomized studies to find causal effects. This was obvious to me, because remember what I said about physics: the science exists, and what you do to learn about the science doesn't affect what the science was in the past. The past existed; you learn about it, and then you learn what interventions do, but they don't change the science. So potential outcomes, to me, just define causal estimands, quantities to be estimated in causal inference, in all situations, not just in randomized experiments. And it was kind of remarkable to me at the time when I wrote this. It seemed so strange to me, because I was doing education and psychology experiments, and a little bit of economics, and none of the econometricians, none of the epidemiologists, used this potential outcome notation in the context of non-randomized studies. For example, if you read the Surgeon General's report, there's nothing in there about that. It's all done by regression, where the outcome is one variable and the treatment intervention is an indicator variable in the regression. It could be logistic regression, it could be probit regression, it could be an ordinary least squares regression, but it's all done by regression. In fact, when I started doing this in the 70s, Neyman disagreed with me. I was fortunate enough in the mid-to-late 70s to have a

visiting appointment at the University of California, Berkeley, where Jerzy Neyman was retired; but they put the visiting faculty and the retired faculty on the same floor of Evans Hall, and it was a great experience, because I got to go to lunch with them all the time. A great character, you know, European and charming. A chain smoker, too; so was Bill Cochran, even when he was on the Surgeon General's committee. In fact, I remember being at faculty meetings at Harvard as a visitor in the 70s: faculty meetings full of smoke, everybody smoked. That changed in the 80s, that's for sure. So when I had these discussions with Neyman in the 70s, he said it's too speculative to talk about cause and effect without randomized experiments. That always puzzled me, because we'd had lots of successes, we were flying airplanes and such, before we did randomized experiments. You know, not jumping out of airplanes without a parachute, as our previous speaker mentioned. We learned lots of things; we learned about arsenic and other sorts of things, and we didn't need randomized experiments to learn about that. So you don't really need randomized experiments if the effects are big enough and obvious enough. But Neyman really disagreed; he said it's too speculative, and I'll have something to say about that later as well. OK, a second contribution of this series of papers,

besides using Neyman's notation, which he thought should only be applied in randomized experiments, to talk about cause and effect in all situations, was the assignment mechanism. Remember I said that there's a clear separation, in science, in physics at least back then, between the science and what you did to learn about the science. An assignment mechanism is a statement of how some units got assigned to one treatment and other units got assigned to another treatment, and hence why some potential outcomes are missing and some potential outcomes are not missing; that's all it's saying. And so I said: you need an assignment mechanism for causal inference, whether it's a randomized experiment or not. Notationally, it is the probability of W given X, Y(0), and Y(1), where W is a column vector of indicators for who got treatment and who got control; X are background variables, covariates, things like age, sex, race; Y(0) are the potential outcomes under control; and Y(1) are the potential outcomes under treatment. So you condition what you're going to get on the covariates and the potential outcomes. And if it's unconfounded, what does that mean? I hate the word "confounders", because whether something is a confounder completely depends on context. An assignment mechanism is confounded if it depends on the potential

outcomes. In fact, if you look in economics, Roy's model, written in 1951 I think, was all about people making choices, and the occupational choices they make are designed to maximize their income. So Y could be income in a particular job: Y(0) could be your income in a control job, being a shoemaker, and Y(1) could be your income if you go to college. People make choices based on comparing these potential outcomes, and that's confounded. And that's the big advantage of randomization: it creates unconfounded assignment mechanisms. This is from my '75 paper. So the idea is that you don't make

assignment decisions based on what you think is going to happen in the future. That right away reveals why medical research using observational data can be so difficult. When you go to a doctor, you don't ask the doctor to flip a coin to decide what to give you, do you? You talk to the doctor, and you and the doctor make some decisions, and you hope the doctor is going to make a good decision. So what the doctor is implicitly doing, and if he's really good maybe even explicitly, is comparing your potential outcomes under one treatment with your potential outcomes under another treatment, and he chooses the one he thinks is best for you. At least, that's what you hope he's doing; you hope he's not choosing the one that's bad for you. And that, in economics, is called Roy's model.
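Roy-style assignment, where the choice is driven by the potential outcomes themselves, is easy to simulate, and the simulation shows why randomization matters. All numbers and the selection rule below are my own hypothetical illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical Roy-style population: each person has an income under the
# control occupation, Y(0), and under the treatment occupation, Y(1).
n = 100_000
y0 = rng.normal(50.0, 10.0, size=n)
y1 = y0 + rng.normal(5.0, 10.0, size=n)      # heterogeneous gains
true_ate = (y1 - y0).mean()                  # roughly 5

# Roy's model: people choose whichever occupation pays THEM more, so the
# assignment depends on the potential outcomes -- it is confounded.
w_roy = (y1 > y0).astype(int)
naive = y1[w_roy == 1].mean() - y0[w_roy == 0].mean()

# A coin flip ignores the potential outcomes -- unconfounded.
w_rand = rng.integers(0, 2, size=n)
rand_est = y1[w_rand == 1].mean() - y0[w_rand == 0].mean()

# The naive comparison under self-selection is biased upward, because the
# treated group contains exactly the people who gain the most; the
# randomized comparison is close to the true average effect.
print(round(true_ate, 1), round(naive, 1), round(rand_est, 1))
```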

So that's life; that's how we make our choices. Why did you come here this morning rather than stay home? Presumably you said, well, I'll get more out of coming here and listening to these guys babble than I would staying home and having a larger breakfast. You made that choice; that's why you're sitting here rather than eating more food. So this idea of an unconfounded assignment mechanism is critically important. There's also a more complex version of it, called ignorable, which has to do with sequential experiments, where you make assignment decisions based on the observed outcomes from previous units. You do this with rare diseases. One of the extreme examples of that in the statistical literature is ECMO, extracorporeal membrane oxygenation, which was done for infants born with an inability of their lungs to oxygenate blood: they catheterize an artery and bypass the lungs through an oxygenation machine, along with other complicated things.

Another contribution of this work, actually in 1978 in the Annals of Statistics, concerns what you can add to this. See, one thing that's neat about the assignment mechanism is that there's no modeling of the data. The only model that occurs is the randomization you were doing, the assignment mechanism, and in a randomized experiment you know what you did; that's known. So someone who says there's only a minor difference between observational studies and randomized experiments, that isn't true; the difference is mathematically precise. That doesn't mean you can't do causal inference in observational studies, but it means you have to speculate about the assignment mechanism. Which is fine: I do that a lot, people do that a lot, the whole instrumental variables world does that a lot. Sure, that's OK, but you have to be explicit about it. And there's no model on the data: there's no regression model here, there's no multivariate normality, there's no multivariate t, there's no logistic distribution. What Bayesian inference does is put a model on the science, on the things being conditioned on in the assignment mechanism, the covariates and the potential outcomes. Once you do that, you have tremendous flexibility to do many more things, and I think there's a tremendous advantage in that. I've actually talked about some of those things over the last few days, and I hope we get the chance to work together on that with the registry data. So, the potential outcomes approach to causal inference, in the simplest setting, under the stable unit treatment value assumption,

is that W = 1 indicates an active treatment and W = 0 a control treatment, and you have n units. The units are the rows, you have two possible treatments, 0 and 1, and the potential outcomes are just listed there. This is the science; this is what you'd like to learn about. So you have Y(1), the outcomes if exposed to the active treatment, and Y(0), the outcomes if exposed to the control treatment, and the causal estimands are things like the individual causal effects, say for the first unit the difference between Y(1) and Y(0). It could be the average of all of those, the average causal effect; it could be the median causal effect. And in a lot of policy studies it's more interesting to try to estimate a median causal effect, like a tax policy's effect on the selling prices of houses. Have you ever seen house prices compared by the average cost of a house? Almost nobody does that; they talk about median prices.
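Here is a tiny contrived example (the numbers are mine) of why median causal effects are harder than average ones: two "sciences", i.e. full potential-outcome tables, can have identical margins for Y(0) and Y(1), which is all any experiment can reveal, yet different median causal effects.

```python
import numpy as np

# Two hypothetical sciences with IDENTICAL margins for Y(0) and Y(1)
# but different joint pairings of the potential outcomes.
science_a = [(1, 3), (2, 1), (3, 2)]   # unit-level effects:  2, -1, -1
science_b = [(1, 1), (2, 2), (3, 3)]   # unit-level effects:  0,  0,  0

for science in (science_a, science_b):
    y0 = np.array([p[0] for p in science])
    y1 = np.array([p[1] for p in science])
    median_effect = np.median(y1 - y0)               # median causal effect
    diff_of_medians = np.median(y1) - np.median(y0)  # identifiable from margins
    mean_effect = (y1 - y0).mean()
    print(median_effect, diff_of_medians, mean_effect)
# science A prints: -1.0 0.0 0.0
# science B prints:  0.0 0.0 0.0
```

The mean effect and the difference of medians agree across the two sciences, but the median causal effect does not: it depends on the correlation between Y(1) and Y(0), which the data never reveal.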

it's perfectly fine to do that why isn't that part of the statistical economic literature the math is hard becomes harder and trickier it could be 25th per

some some of the percentile regressions quantile regressions try to get it that but they get it a really clumsy way but the you can look at median causal effects or or 25th percentile causal

effects 25th percentile of the individual causal effects but you'll notice in order to estimate those things you have to worry about the correlation

between y1 and Y Z non's concern in 1923 with additive causal effects and I'm I'm doing this historically because I think it it's important to have an appreciation where
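To make that concrete, here is a toy sketch with entirely made-up numbers (not the talk's own example): the average causal effect depends only on the two column means, but the median of the individual causal effects changes if you re-pair the same Y(1) and Y(0) columns.

```python
import statistics

# A hypothetical "science" table: both potential outcomes for every unit.
# In a real study only one value per row is ever observed.
science = [
    # (Y(1), Y(0))
    (7, 5),
    (4, 6),
    (9, 2),
    (6, 6),
]

effects = [y1 - y0 for y1, y0 in science]        # individual causal effects
print(effects)                                   # [2, -2, 7, 0]
print(statistics.mean(effects))                  # 1.75 (average causal effect)
print(statistics.median(effects))                # 1.0  (median causal effect)

# The average effect needs only the two column means...
mean_y1 = statistics.mean(y1 for y1, _ in science)
mean_y0 = statistics.mean(y0 for _, y0 in science)
assert statistics.mean(effects) == mean_y1 - mean_y0

# ...but the median effect depends on how Y(1) is paired with Y(0):
# the same two columns, re-paired, give different individual effects.
y0_reversed = [y0 for _, y0 in reversed(science)]
effects_repaired = [y1 - y0 for (y1, _), y0 in zip(science, y0_reversed)]
print(statistics.median(effects_repaired))       # 1.5: same columns, new median
```

Re-pairing leaves both column means, and hence the average causal effect, untouched; the median of the individual effects moves. That is why quantile estimands force you to think about the joint distribution of the potential outcomes.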

And I'd be perfectly happy to find out that I'm wrong on the history. I've been looking for close to 50 years to find a better history, and if I'm wrong, if John Stuart Mill did it before, I'd like to know; but I don't think so. The fundamental problem facing causal inference is the thing I've kept emphasizing: for each unit you observe only Y(1) or Y(0), and the other one is missing. And here you come across this really interesting idea: if you randomly assign active versus control treatment, you get a representative sample of Y(1)s (the check marks under Y(1)), and you can compare it with a representative sample of Y(0)s. If you have a random sample of something, it represents the whole column. That's sort of obvious, isn't it? And people were doing random sampling in some sense way before 1922 or 1923 or 1925, and the people who did understood lots of this, but they never proposed doing randomization, as far as I can see.
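That "representative sample of each column" argument is easy to check by simulation; the numbers below are invented for illustration. Over repeated random assignments, the treated units form a random sample of the Y(1) column and the controls a random sample of the Y(0) column, so the difference of observed group means centers on the true average causal effect.

```python
import random
import statistics

random.seed(1)
n = 20

# Hypothetical science: the simulator (unlike any real analyst) sees both columns.
y1 = [random.gauss(12, 4) for _ in range(n)]
y0 = [random.gauss(10, 4) for _ in range(n)]
true_ace = statistics.mean(y1) - statistics.mean(y0)

# Repeatedly randomize half the units to treatment; estimate from what's observed.
estimates = []
for _ in range(20000):
    treated = set(random.sample(range(n), n // 2))
    obs_treated = [y1[i] for i in treated]                        # sample of Y(1)
    obs_control = [y0[i] for i in range(n) if i not in treated]   # sample of Y(0)
    estimates.append(statistics.mean(obs_treated) - statistics.mean(obs_control))

# Averaged over randomizations, the estimator recovers the true average effect.
print(round(true_ace, 2), round(statistics.mean(estimates), 2))
```

Note what this does not claim: randomization makes each column's observed sample representative, which is exactly what the difference of means needs, and nothing more.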

Again, I'd like to know if I'm wrong. Did Gauss do it? Did Laplace? I don't think so. They knew all about gambling; they knew the probability that you'd get a flush in a game of poker, or four hearts in a row, just drawing cards at random. They knew all about probability, but they never had the idea of randomization. In fact, I've looked hard to find the first person who proposed using randomization of any kind. Well, Neyman did something in 1923. But you look at Neyman, and in fact you ask Neyman: why don't you get credit for inventing randomized experiments, rather than Fisher? Because I think 1923 is before 1925. But in his biography he says: oh no, when I wrote my 1923 paper I was just doing the mathematics; I didn't understand that the mathematics was telling me to do something. Fisher was the first to have that idea. And they hated each other; by the 1940s they hated each other. Yet Neyman says it was Fisher's idea, and one of Fisher's most brilliant ideas: he was the first one to propose actually, physically randomizing. "I didn't do that; I was just fooling around with the mathematics."

There's something from physics that reminds me of that: the Lorentz equations that occur in relativity. People had looked at the Lorentz equations before as a sort of curiosity about the speed of light, and they knew the speed of light was finite from the Michelson-Morley experiments earlier. So why did Einstein have the insight that the equations were actually telling you something, that the speed of light is a limiting factor? He was working in the Swiss patent office. What was he trying to do? Was he trying to do math, or physics theory? He was trying to coordinate all the train clocks, in Zurich and Basel, to all go click at exactly the same time, and the time to transmit a signal from Basel to Bern mattered for whether they would read exactly the same time. He was worried about that; that's why he started worrying

about relativity: it takes time to transmit. So why do I put down Gosset? Well, Gosset was writing things in 1918 too. He has a paper in which he says that if the plots in an agricultural experiment had been randomly assigned, then the expected value of the mean squares would be such-and-such. So the idea of doing randomized experiments was sort of in the air. But Fisher was, as far as I know, by far the first to pick up on it and say: you should actually do the randomization; the math is telling you to do something, so do it. And I think Neyman would certainly have known if somebody else had done it before Fisher. I talked to him about it, and he said no: as far as he knew, Fisher was the first one ever to do that.

Something like writing down this assignment mechanism, and having it be a function of the potential outcomes, is kind of a flaky expression in some sense, because something you are going to do now, assign treatments, can depend on things you've observed in the past, but it can also depend on things you cannot observe until the future. This assignment mechanism, the probability of W given X and the potential outcomes, is sort of a flaky thing; I don't think anyone ever wrote it down before, and if somebody did, I'd love to know. But it is something where the assignment of treatments can depend not only on what you can see in the past but implicitly on the potential outcomes you cannot see until the future, and that gives you the key tool to do all of this formally and mathematically.
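As a minimal sketch (the function names and probabilities are my own, purely illustrative), an assignment mechanism is just a rule for Pr(W = 1 | X, Y(0), Y(1)). Writing it this way makes the distinction visible: an unconfounded mechanism ignores the potential outcomes it is formally allowed to depend on, while a confounded one actually uses them.

```python
import random

random.seed(2)

def unconfounded(x, y1, y0):
    """Assignment probability depends only on the observed covariate."""
    return 0.8 if x > 0 else 0.2

def confounded(x, y1, y0):
    """Assignment probability peeks at the potential outcomes:
    units who would benefit from treatment are more likely to get it."""
    return 0.9 if y1 > y0 else 0.1

def assign(mechanism, units):
    """Draw a treatment indicator W for each (x, y1, y0) unit."""
    return [int(random.random() < mechanism(x, y1, y0)) for x, y1, y0 in units]

units = [(random.gauss(0, 1), random.gauss(12, 3), random.gauss(10, 3))
         for _ in range(8)]
print(assign(unconfounded, units))
print(assign(confounded, units))
```

Under the confounded rule, a naive comparison of observed outcomes is biased precisely because W carries information about (Y(1), Y(0)).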

So why this guy, with the name that looks like "Pierce" misspelled? It's not misspelled; it's pronounced "purse". Peirce was an American philosopher. His father was a Harvard mathematician, and he was a contemporary of the psychologist William James. He actually proposed doing randomization in the context of a psychophysical experiment, but it wasn't for the purpose of inference; it wasn't for removing bias, and it wasn't for doing hypothesis tests. It was so that a subject couldn't guess whether the next weight coming up would be heavier or lighter: he was doing an experiment to determine whether people think the next weight is heavier when it is preceded by a lighter weight or by a heavier one. You can read about that in Peirce's work, and it's also in Steve Stigler's history of statistics books. So Peirce actually had some discussion of randomization, and he probably anticipated random sampling in surveys before Neyman wrote about it in 1934. Neyman wasn't aware of it, and I wasn't aware of it back then either. I'm going to skip this... well, maybe I'll

say something about it. There's the way all the epidemiologists and economists dealt with causal inference forever: they used ordinary least squares, for example, or logistic regression, and they lost the potential outcomes. The graphs do the same thing. When you write a graph with the outcome Y, what is it? Is it Y under treatment, or Y under control? What it really is, is the observed value of Y, and they start drawing arrows to make it all go away. Well, in that arrow plot we saw earlier, where is the correlation between the potential outcomes, which certainly affects the inference? Neyman showed that in 1923. Where is it in the graph? It's gone. The epidemiologists doing causal inference lost it too; they're not any better. The psychologists lost it; the economists lost it. They all wrote

Y_obs, the observed value of Y, which is the treatment indicator times Y(1) plus one minus the indicator times Y(0): the treatment value if the unit got treated, the control value if it got control. Now what does that do? Well, then you have to put W into some regression somewhere and regress the observed outcome on it, but you've lost the ability even to write down the correlation between the potential outcomes and its importance. And what is this quantity Y_obs: is it the science, or is it the assignment mechanism? It muddles them.
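The collapse is purely mechanical. With made-up numbers, Y_obs = W·Y(1) + (1 − W)·Y(0) keeps one potential outcome per unit and discards the other, and with it any handle on how Y(1) and Y(0) co-vary:

```python
# Hypothetical potential outcomes and treatment indicators.
y1 = [7, 4, 9, 6]
y0 = [5, 6, 2, 6]
w  = [1, 0, 1, 0]

# The observed outcome: Y_obs = W * Y(1) + (1 - W) * Y(0).
y_obs = [wi * a + (1 - wi) * b for wi, a, b in zip(w, y1, y0)]
print(y_obs)    # [7, 6, 9, 6]

# The other half of the science is exactly what a regression of y_obs on w
# can never see: the missing potential outcomes.
y_mis = [(1 - wi) * a + wi * b for wi, a, b in zip(w, y1, y0)]
print(y_mis)    # [5, 4, 2, 6]
```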

I thought I had learned from physics when I was a kid that you shouldn't mix up the science with what you do to learn about the science. I thought the whole idea of intervention was that you keep them separate, because the science exists back there, and then you do something to learn about it. So what is this? It's neither one. It suppresses key insights; it's a retreat to confusion. Yet it became standard in biostatistics, economics, epidemiology, everywhere, and even great statisticians like Fisher made mistakes using it. Why

didn't Fisher use this notation? He never used the potential outcomes notation. Why? I've talked to David Cox about this, and I think it's because it was Neyman who made up the notation, and Fisher hated Neyman. So why would he use Neyman's notation? They never referred to each other in their references. So Fisher never used it, and he made mistakes, in Statistical Methods for Research Workers from 1925 on, and even in The Design of Experiments. And the mistakes he made concern something that still plagues psychology and medicine: indirect effects, mediation, moderation, whatever you call it, which was totally confused and still is totally confused in psychology and in most of medicine. It's confused partly because of the graphical displays, which muddle it in general, but this horrible, confused notation became standard in biostatistics, economics, epidemiology, everywhere. Even great statisticians, Fisher, Cochran, Cornfield (Cornfield was probably the major figure in the US on smoking and lung cancer, a very smart epidemiologist), confused themselves using this notation.

So why did people get so confused, and why didn't anyone think of randomization before Fisher did? If Fisher did it in 1925, you'd think it would go back to the Greeks or the Egyptians. I mean, hell, they could measure the circumference of the earth remarkably accurately by measuring shadows; they knew the earth was round, so they weren't stupid. But they never proposed randomization to learn about the effects of things. So did it really take until the 20th century before anyone realized that taking a random sample of one set of potential outcomes gives us a representative sample? Well, I think one reason is that human beings are not wired correctly to understand experiments or randomization, and I think the reason for

that is that sloppier statistical thinking in the past could lead to more success than misapplied careful thinking. I call this the hunter's paradox; sometimes it's called the gambler's paradox. Suppose going north to hunt was more successful than going south: going north you were successful 60% of the time, at random, and going south 40% of the time. A careful, data-mining tribe noticed this and went north 60% of the time and south 40% of the time. Then there was another tribe that was statistically naive and ignored the data: "sometimes I go one way and I eat, sometimes I go the other way; I'll always go the way where I eat most often." So they went north all the time. Which tribe ate better, the tribe that went north all the time, or the tribe that went north 60% of the time because they had data suggesting that was good? The tribe that went north all the time, because they ate well 60% of the time, while the tribe that went north 60% of the time and south 40% of the time, assuming it was all random, ate well 0.6 × 0.6 + 0.4 × 0.4 = 52% of the time. So if you're going to survive, it's better to ignore data. That's what this suggests; of course, that's not really what it
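The arithmetic of the story, with the same made-up 60/40 success rates, checks out in a quick simulation:

```python
import random

random.seed(3)

P_NORTH, P_SOUTH = 0.6, 0.4   # hypothetical chance of eating by direction
DAYS = 100_000

def eats(p):
    return random.random() < p

# Naive tribe: always go the way that fed you most often, i.e. always north.
always_north = sum(eats(P_NORTH) for _ in range(DAYS)) / DAYS

# "Data-mining" tribe: match the observed frequencies, north on 60% of days.
matching = sum(eats(P_NORTH) if random.random() < 0.6 else eats(P_SOUTH)
               for _ in range(DAYS)) / DAYS

print(round(always_north, 2))   # close to 0.60
print(round(matching, 2))       # close to 0.6*0.6 + 0.4*0.4 = 0.52
```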

suggests. What it suggests is that you had better differentiate between the data and the loss function, how you use the data. If you have any decision theory, you get the right answer, because you have the data and then you make a decision. And that's why it's also called the gambler's paradox: if red comes up more often than black, bet red 100% of the time and you'll do better. So the ordinary least squares style

that came to dominate work in economics, political science, psychology, education, sociology, all the social science fields, and medicine was bad because it buried these crucial issues, like the role of additivity. The inherent missing data issue was lost: using just Y_obs, there is no longer any need to impute the missing potential outcome, Y_mis. And therefore the things to estimate must be parameters in the model, because I've got to estimate something, and if I have all the data then nothing is missing, so I must make up parameters. Well, what are parameters? They don't exist. Potential outcomes exist; they are real objects of inference. But what is a regression coefficient, or a logistic regression coefficient, or a hypothetical mixed-effects model parameter? All this nonsense is just hypothetical. They are devices to get answers, but they became the objects of inference, odds ratios and the like, and that led to all sorts of asymptotic silliness and sloppy mathematics. People lost the distinction between finite and super-populations, and the emphasis turned to analysis and estimators rather than study design and estimands. And basically, how

much time do I have, Per? Okay, great. So why did the ordinary least squares style become so dominant? I really think it's because of computational limitations in the past. Basically I'm saying: don't fall into that trap; retain the potential outcomes view. I think that style was a terrible influence on the way observational studies were treated; it muddled the difference between association and causation. Causation is distinctly different, because you are explicitly comparing these potential outcomes; you are not just looking at associations. An example is what are called case-control studies, which became a staple of much of epidemiology because both Y_obs, the observed value of an outcome such as dead or alive, and W, the indicator for treatment versus control, are dichotomous, so logistic regression sort of works either way. But again you don't have any Y_mis, and the idea of assignment mechanisms that create missing data, or sampling mechanisms, is lost. So here is a quote

from Cornfield in 1959, reprinted in a Statistics in Medicine article, I think just last year. This is Jerome Cornfield: "We now consider the distinction between the kinds of inferences that can be supported by observational studies, whether prospective or retrospective." Prospective means, like the kinds described earlier, that you look forward: I'll match people in treatment and control and then see what happens in the future. Retrospective is like the case-control study: I get a bunch of people with the disease and a bunch of people without the disease, the cases and the controls, match them up, and look to see what they were exposed to. That works well for rare diseases, but it is an extremely weak form of inference. You saw that in the pyramid that was put up earlier: case-control studies are down low in that pyramid, because they're not as good as prospective studies, even though both are observational. So Cornfield is saying: we now consider the distinction between the kinds of inferences that can be supported by observational studies, whether prospective or retrospective, and those that can be supported by experimental studies, by randomized experiments. "That there is a distinction seems undeniable, but its exact nature is elusive." Well, look at that again. There is the assignment mechanism: unconfounded, randomized assignment mechanisms cannot depend on the potential outcomes. It's not undeniable-yet-elusive; it's just sitting there. But Cornfield didn't see that in 1959, or in anything else he wrote, and neither did Fisher. They got

confused. [Host:] Yes, there was something wrong in the program; we should end at 10:15, so if you could sort of wrap up... [Rubin:] We can adjust, no problem. There's one more thing that I wanted to say, about the importance of these experimental design courses, like the course I took from Cochran: classic experimental design.

Experiments with lots of factors, and the concepts underlying them, are becoming more and more important with "big data" and registry data. For example, one thing I worked on a couple of years ago, which is medically related: you want to convert stem cells into beta cells, pancreatic beta cells, so you can cure type 1 diabetes. The process is known; it takes about ten steps for the conversion to take place. But those ten steps involve lots of different things: each step involves a temperature, different chemicals, different environments. And you're trying to convert the cells because you want to be able to implant them in people. So if each factor has five levels, you now have 5^10 combinations. That's a big number; you can't even contemplate doing a complete factorial experiment, because 2^10 is about a thousand, so 5^10 has got to be in the millions. You can't do millions of experiments; you wouldn't have the time or the people to conduct them, even though these experiments are done with little wells and trays where you spray various things in. You can't possibly do them all.
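The counting, and the flavor of the fractional-factorial fix, can be sketched as follows. The "keep level-sums divisible by 5" rule is a toy stand-in for a real fractional design, shown on 4 factors so the enumeration stays small:

```python
from itertools import product

factors, levels = 10, 5
print(levels ** factors)    # 9765625: runs needed for the complete factorial

# A fractional design runs only a structured subset of the combinations.
# Toy illustration on 4 factors: keep combinations whose level-sum is 0 mod 5,
# a regular 1/5 fraction of the 5**4 = 625 runs.
small_factors = 4
fraction = [combo for combo in product(range(levels), repeat=small_factors)
            if sum(combo) % levels == 0]
print(len(fraction), "of", levels ** small_factors)   # 125 of 625
```

Classical design theory is about choosing such subsets so the effects you care about remain estimable, which is why those courses still matter.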

So the understanding of classical experimental design is really important, and important for medical studies, many of which now involve lots of factors that you have the ability to look at. Even though the data are observational, you can conceptualize the observational data as if they had arisen from a randomized factorial experiment, and I think a lot of the future is going to be based on doing those kinds of studies.

So this is just the final slide: design observational studies to approximate randomized trials. And one thing you should do, which is what Per and I have talked about, is to include some randomization through randomized encouragement if you can; it has a tremendous advantage. This randomized encouragement design actually has an old history, because it can be tied to the idea of instrumental variables. Instrumental variables is a great idea in economics, although the method of analysis is terrible: it's pedagogically wonderful, but the statistical properties of the methods of analysis are awful. So you should retain the ideas from the past that are really important, but lose the old techniques, because many of them are based on what you could compute on the back of an envelope, and we now have computers that can do many things for you. Take advantage of that. Thanks very much.

[Applause]
