Applying the Causal Roadmap to Optimal Dynamic Treatment Rules with Lina Montoya - #506

By The TWIML AI Podcast with Sam Charrington

Topics Covered

  • Why Causal Assumptions Are Often Overlooked
  • Optimal Treatment Rules Require Unique Causal Assumptions
  • The Seven-Step Causal Roadmap
  • Counterfactuals Bridge the Gap Between Questions and Data
  • Sequential Trials Reveal Complex Treatment Dynamics

Full Transcript

[Music] all right everyone i'm here with lina montoya lina is a post-doctoral researcher at

the university of north carolina at chapel hill lina welcome to the twiml ai podcast thank you so much so great to be here i'm really looking forward to digging

into our conversation uh you've had the pleasure of kind of sitting through a little bit of pre-interview set up as i am on the road for the recording of this interview so thank you for your

patience it was fascinating we had a good time with it yeah we're going to be talking about your recent icml presentation that focuses on

your work in causal inference and uh some of your research broadly but before we do i'd love to have you share a little bit about your background

and how you came to work with machine learning yeah yeah so my background i guess i'll start actually in undergrad when i was a

psychology major and machine learning or anything statistics oriented was not really on my radar so i was a psychology major and then i worked a lot in research and

started sort of digging into the data and realized i got really excited by the data and

just decided to apply to a biostatistics master's program and got in and that was at uc berkeley um i did my master's in biostatistics

and there learned about causal inference which opened up a world of of ways that we can rigorously

answer uh questions scientific questions um and then using causal inference kind of was introduced to machine learning methods that would allow us to answer

these causal questions in very flexible ways and yeah i completed my doctorate and now i'm doing a postdoc in biostatistics as well

awesome and what was the focus of your doctorate my doctorate yeah so a lot of it was causal inference and specifically in methods within precision

medicine so specifically the optimal dynamic treatment regime so that's basically a fancy way of saying that it's an algorithm that

takes in a patient or individual or participant covariates or characteristics and then outputs the best treatment or intervention for

that person and so yeah during my doctorate i spent a lot of time um researching sort of methods that would get at estimating this optimal dynamic

treatment rule or this individualized treatment rule and in particular um applied this method these methods to i would say two big applied projects so the first one

was within criminal justice the criminal justice system um and the second one was within the um

hiv um and uh patient care space got it and yeah in fact this is the work that uh either you

presented at icml or is related to the work that you presented at icml that's right the former yeah the first one that i just talked about is exactly it so

that's basically so i talk about the the optimal dynamic treatment rule a way of estimating it called the super learner algorithm and then i present an application of

this algorithm to basically defendants who have mental illness to see which defendant should get cognitive behavioral therapy or cbt

versus treatment as usual based on their characteristics so if we can find a way of administering either cbt or treatment as usual in an individualized way got it got it we'll

dig into all of that in more detail but before we do the workshop that your talk uh you were an invited speaker at this workshop it's called the neglected assumptions

in causal inference workshop yes yeah and the there's so much in that name i'd love to to have you riff a little bit on this idea of neglected assumptions and causal inference and

kind of what it means what some of the other presentations were at the workshop that kind of thing yeah yeah so i think it the name kind of came out of um maybe the idea

that causal inference and machine learning are you know gaining this tremendous popularity and that sometimes when in machine learning or when we're trying to tackle

a problem a scientific question um often we kind of turn to these causal methods or causal tools um without sort of looking step by step

methodically to see if we're missing any causal assumptions so for example i'll give an example um so if you have a data set and you're trying to find the effect of

a variable on an outcome and you throw in all of the possible covariates that you have to find that effect using some machine learning algorithm you might be including in that set

something that's called an instrumental variable or a collider variable in which case if you put those variables into your model you're going to

introduce some significant bias and so i think the idea is that sometimes when we you know apply these causal methods to machine learning problems we either

kind of do it blindly like without this sort of method or roadmap for doing so um or we might even not the other extreme just completely give up and say oh i can't infer any sort of causality because i don't have

the proper causal assumptions and so i think the purpose of this workshop was sort of to highlight um what assumptions are needed to make causal

inferences and um also show that different assumptions are needed for different kinds of questions that there's not kind of a one-size-fits-all these are the standard sets of assumptions that are applied to

every single causal problem um and also present uh different frameworks for answering causal questions or road maps to

be as transparent as possible um about the assumptions that we're making to answer uh scientific or causal questions got it when you mentioned

uh instrumental variables and collider variables yeah yeah what are those and how do these relate i'm imagining the idea of um you know correlated variables things like

that yeah yeah so that's a that's a great question and this is um yeah causal inference speak and this comes out of the pearl um

structural equation sort of directed acyclic graph world so basically um instrumental variable and

uh collider variable so that comes out of these graphical models if you were to kind of graph the relationship between each of those variables the instrumental

variable is one that might affect an intervention but not the outcome and a collider variable is a variable

um where you have two covariates that both affect that collider and those two variables also affect the intervention and the outcome so that's kind of um

yeah these are within the sort of pearl directed acyclic graph world two kinds of variables that are out there okay uh can you make those

more concrete with an example um yeah so let's see let's see okay so let's take an example

smoking is the intervention and the outcome is lung cancer so if you were to toggle yes or no smoking then that may have an effect on

the outcome lung cancer and so let's say i don't know a variable that affects smoking let's say socioeconomic status

um and so that's for example a variable that might affect smoking but that might not necessarily affect whether or not you get lung cancer directly and so that's something that might be an instrumental

variable additionally in economics it's used a lot for example um if you were to randomize a treatment

but you don't get perfect compliance of a treatment then the instrumental variable could be the actual flip of a coin and the intervention or the treatment is what the person actually got so those

are they're directly affected um and the outcome might be whatever outcome you're interested in so it's kind of like a proxy of your um of your intervention that you

care about that doesn't directly affect your outcome okay okay yeah what's the connection between the the work that you presented optimal

dynamic treatment rule estimation and the neglected assumptions idea yeah yeah so i think this is a really great illustration the optimal dynamic

treatment rule um so let me back up and say that so the optimal dynamic treatment rule can be considered uh a causal question so the causal

question is what is the best way of assigning an intervention um and even further you might ask well what are

people's um what would have happened had everyone received their optimal intervention what would outcomes have looked like had everyone received their optimal intervention so that is a

causal question which translates into a causal parameter now the assumptions that are needed to estimate that causal parameter are

are different or very specific to that causal parameter versus for example just the average effect had we given everyone the exact same intervention or given uh the

intervention in a non-individualized way and so i think that this that the optimal dynamic treatment rule the set of assumptions that go with estimating that the causal assumptions

that go with that are unique to that question because it is a unique question uh versus for example the standard assumptions that we might all

uh be taught of for example estimating that average effect of a non-individualized treatment got it got it um so let's maybe dig in a little bit deeper

into the specifics of the method um it's related to an idea that comes out of berkeley called the causal roadmap uh can you talk a little bit about the

causal roadmap and what that is and what the connections are yes yes and it's something very near and dear to my heart and um something i'm quite passionate about so it came out of this

specific causal roadmap came out of berkeley was developed by maya petersen and mark van der laan and it's really a way

of going from a causal question or a scientific question and going all the way through it to see okay do i have in my data

the sufficient conditions to answer this causal question and then do i have the tools for answering this causal question and finally with my data let's actually answer it with a certain

parameter or let's get out a number that will actually answer that causal question and so specifically the steps of the causal roadmap are first of all state your question

your research question and that includes what's your population what are your variables what's your outcome the second thing is to specify your

model so what i had just said before that sort of graph the thing that relates all of your variables together your intervention connecting to your outcome your covariates your

um yeah your features for example relating to your outcome and your intervention so a graph that sort of relates all of the variables that you have

together um third is to translate that question that you got that you made in step one into a causal parameter

that's a function of counterfactuals of your counterfactual distribution and fourth is to specify what data you

actually have and the link between your causal model and your observed data distribution model

fifth is to actually identify your causal parameter which is a function again of counterfactuals so those are things that you can't observe counterfactuals are you know outcomes had

everyone received the same exact thing and then you might say okay i want to turn back time and give everyone the opposite thing you can't do that in real life

um and so this fifth step is to say well can i write my causal parameter as a function of what i can actually observe so not counterfactual not from the counterfactual distribution

um and in that step that step is one of the most important ones because that's the one that um where the causal assumptions really come to light what are the things that you need to

assume for example that everyone was randomized in your in your study that there's no unmeasured confounding um uh things like positivity the positivity

assumption meaning that everyone has a positive probability of actually getting that intervention so things like that and then the sixth step

is to actually estimate and uh the sixth step is maybe the thing that we think of as causal inference but it's just one of the steps of the causal roadmap so

yeah estimation yeah yeah so estimation is going to include things like machine learning um you know double robust estimation

um yeah all of the machinery that takes your finite sample and tries to estimate that statistical parameter that you got in step five um and

and of course yeah we want to you know use things like machine learning to flexibly get at this statistical parameter or this estimator at this point

and then the last step is to actually interpret the results and whether or not you can actually interpret what you got in step six your estimator as

a causal quantity depends on what you what you've assumed in the previous steps so um so yeah i think this

this roadmap i think just kind of um provides a way of really clearly seeing if you can actually go from a causal

question to seeing okay is the number that i have actually answering the causal question that i made in step one um and not just kind of

well on the one hand not making sort of biased claims and on the other hand not just throwing up our hands in the air and saying we can't infer anything
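to make the collider pitfall from earlier in the conversation concrete, here is a minimal simulation (an illustration added for this transcript, not from the episode; all variable names and probabilities are invented). the treatment truly has no effect on the outcome, yet adjusting for a collider manufactures an apparent effect:

```python
import random

random.seed(0)
n = 20000

# Data-generating process: treatment a has NO effect on outcome y,
# but both a and y cause the collider c ("explaining away" structure).
rows = []
for _ in range(n):
    a = random.random() < 0.5
    y = random.random() < 0.5                      # independent of a: true effect is zero
    c = random.random() < 0.1 + 0.4 * a + 0.4 * y  # collider, caused by both a and y
    rows.append((a, y, c))

def effect(sub):
    """Difference in P(y=1) between the treated and untreated rows of sub."""
    treated = [y for a, y, c in sub if a]
    untreated = [y for a, y, c in sub if not a]
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

crude = effect(rows)                          # unadjusted: close to the truth (zero)
collided = effect([r for r in rows if r[2]])  # within c=1: spurious negative effect
print(f"crude: {crude:+.3f}  adjusted for collider: {collided:+.3f}")
```

conditioning on c makes a and y spuriously associated, which is exactly the bias incurred by throwing every available covariate into an adjustment set without drawing the graph first.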

yeah yeah what's the difference between step three which i believe was restating your question in terms of a causal parameter and step

five which is um writing that as a function i think there's more to it but um yeah yeah yeah let me let me clarify that because it's a really important and

subtle point so let me start by saying that your causal parameter and your statistical estimand are gonna get at the same exact thing they're gonna get at the

same exact number the only difference is that your causal parameter which is what you get in step three is a function of your counterfactual

distribution meaning that it might be for example if you're interested in the what's the effect had everyone received an intervention versus if no one had

received an intervention then your causal parameter is going to be the expected outcome had everyone received the intervention minus the expected outcome

had no one received the intervention and then that world it's a hypothetical world because no one can receive both the intervention and not the intervention at the same time so

it's kind of like how i tell my students when i'm teaching causal inference it's like it's your magical world of the counterfactual distribution where you can

you know toggle these things intervene on these things um and look at outcomes under these different interventions you can't see that in real life that's different than step five which is

identifying your causal parameter this thing that you got from the magical world as a function of what you can actually observe your observed

data distribution and so now your uh your statistical parameter as opposed to your causal parameter is going to be the expected outcome given

that your intervention is to treat given your covariates and then averaged over all of those

minus for example the expected outcome given um your intervention is to not treat and your covariates and then averaged over the covariate

distribution so it's the difference between again these parameters being a function of counterfactual things so like counterfactual outcomes versus

things that we can actually observe got it and so step three is you're stating this question in terms of counterfactuals and step five

as you're stating them in terms of things that you can actually observe yes exactly exactly and importantly sorry i was just going to ask is the statement of these things in terms of

counterfactuals is that um you know given that it's this magical world is that the function of that step

so to inform our understanding of the problem or can we kind of mathematically reason via these counterfactuals through the tools that

you know we have with causal modeling and causality yeah that's a really great question so i would say that the reason for doing that the reason to

write it as a causal parameter is this is our moment to kind of get creative and take the question that we're actually interested in

and write it as something that's not um realistic to do in real life but it's kind of the thing that we would have wanted to do so kind of unburdened by exactly

exactly like we would have wanted to give everyone the treatment look at the outcomes and then turn back the clock and give no one the treatment and look at the outcomes so it's our way of sort of like really formally writing down okay this is what

we would have wanted and then in later steps we're really examining to see well can we actually do that um and making that transparent and that's that's kind of the magic of

the causal roadmap and i think the importance of it and then getting back to the neglected assumptions which are sometimes not explicitly stated got it got it
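the step-three-versus-step-five distinction can be sketched in code: the causal parameter E[Y(1)] - E[Y(0)] lives in the counterfactual world, but under randomization it is identified as the statistical parameter E_W[ E[Y|A=1,W] - E[Y|A=0,W] ], which a simple g-computation plug-in can estimate. this toy simulation was added for illustration and is not the study from the episode:

```python
import random

random.seed(1)
n = 50000

# Toy randomized trial: binary covariate w, coin-flip treatment a, binary outcome y.
# The true average treatment effect is +0.3 by construction.
data = []
for _ in range(n):
    w = random.random() < 0.4
    a = random.random() < 0.5   # randomization => the identification assumptions hold
    p_y = 0.2 + 0.3 * a + 0.2 * w
    data.append((w, a, random.random() < p_y))

def outcome_regression(a_val, w_val):
    """Empirical E[Y | A=a, W=w] -- a purely statistical (observable) quantity."""
    ys = [y for w, a, y in data if a == a_val and w == w_val]
    return sum(ys) / len(ys)

# g-computation plug-in: average Q(1, W) - Q(0, W) over the observed covariates.
q = {(a_val, w_val): outcome_regression(a_val, w_val)
     for a_val in (False, True) for w_val in (False, True)}
ate = sum(q[True, w] - q[False, w] for w, a, y in data) / n
print(f"estimated ATE: {ate:.2f}")   # close to the true 0.3
```

nothing counterfactual is ever computed: the estimate is built entirely from observed (a, w, y) rows, which is exactly what step five licenses.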

and so the optimal dynamic treatment rule that fits in that's step six the actual estimator what was that yeah yeah okay there's a lot of steps in this

there's a lot of things so i can um yeah i can try to you know go through the road map and sort of apply it to this optimal dynamic treatment rule um to this

yeah the treatment rule problem and my presentation kind of like when i give longer versions of that presentation it actually goes through the outline of the different

steps um but yeah i can kind of step through so um so yeah the research question is so i think there's two causal questions here so

the first one is what's the rule or way of assigning treatment um that yields the highest expected outcome so that's that's the

causal question what's the what's the rule or algorithm um that uses uh individual variables to assign the best treatment possible um the second question

is what would have happened what would have outcomes looked like had everyone gotten their optimal intervention so that's in contrast to what would have outcomes looked like

had everyone gotten the same treatment right so so yeah so that's the um that's the question the causal model um in the case that i

presented it's an rct if i could jump in are there assumptions that we're making about the there have to be assumptions that we're

making about the nature of treatments um you know whether they're continuous versus discrete like you know dosages or whether how many options we're making how does all that

come into play yeah yeah that completely comes into play and should be encoded in our model um and so i think that perfectly segues

into step two which is specifying our causal model which is the model in this case so i'm the the example i presented is a randomized control trial setting it's not observational

it's an experimental setting so in that case my model is that i have a set of covariates and those affect the outcome but those covariates

do not affect the treatment which is uh cbt so cbt being cognitive behavioral therapy

yeah so i have these covariates that may affect the outcome but those covariates i'm saying i'm encoding because it's an experiment that they don't affect

uh whether or not a person was given cbt because it was randomly given okay and so um the thing that does affect whether or not a person gets cbt is a flip of a coin

because it was an experiment and then i'm also saying that cbt versus treatment as usual may affect the outcome which is recidivism at one year

and so you can imagine this graph as like a triangle but without um an edge on one side so that the covariates affect the outcome but they don't affect the treatment yeah

or the intervention okay so that is my causal model um and so in that way i'm encoding that the data we're generating this is what i know about the real world

that the data were generated in this way um and so that might be uh somewhere where you would make strong assumptions if you actually know how the data were generated um so for

example i know that cbt was given with a flip of a coin with 0.5 probability of getting cbt um and so

okay so yeah that's step two and then step three is to translate the research question into a causal parameter and so that the first question that i talked

about was the question about the optimal rule so what's the best way of of um of treating an individual person with the

treatment that they should get and so we so the um causal parameter in that case is an indicator

that the conditional average treatment effect is bigger than zero so the conditional average treatment effect is a causal parameter because it's a function

of counterfactual outcomes and so specifically that's the average treatment effect given something given

covariates yeah given a specific kind of person got it yeah exactly so it's the expected expected outcome under treatment minus the expected outcome under control

all conditional on a kind of person on your covariate distribution and so we're going to define the optimal rule as an indicator that that conditional average treatment

effect is bigger than zero so in other words if i'm you know a 31 year old woman based on my profile and my treatment effect is bigger than

zero then treat me if not don't treat me and that's going to be the rule which is this causal parameter got it so that's that causal parameter
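that definition can be sketched directly (a simulated example added for illustration, not the cbt study): estimate the blip E[Y|A=1,W] - E[Y|A=0,W] within each covariate stratum, and treat exactly when it is positive.

```python
import random

random.seed(2)
n = 40000

# Simulated trial where treatment helps the w=1 subgroup (+0.2) and
# hurts the w=0 subgroup (-0.2), so the optimal rule depends on w.
data = []
for _ in range(n):
    w = random.random() < 0.5
    a = random.random() < 0.5
    p_y = 0.5 + (0.2 if w else -0.2) * a
    data.append((w, a, random.random() < p_y))

def mean_outcome(a_val, w_val):
    """Empirical E[Y | A=a, W=w] within one covariate stratum."""
    ys = [y for w, a, y in data if a == a_val and w == w_val]
    return sum(ys) / len(ys)

# Blip (conditional average treatment effect) per stratum; the optimal
# rule is the indicator that the blip is bigger than zero.
blip = {w_val: mean_outcome(True, w_val) - mean_outcome(False, w_val)
        for w_val in (False, True)}
for w_val, b in blip.items():
    print(f"w={int(w_val)}: blip={b:+.2f} -> treat: {b > 0}")
```

a rule that treats everyone (or no one) would help one subgroup and hurt the other, which is why the individualized rule can have a higher value than any static rule.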

and then i talked about also another one which is the value of that rule so what would have happened had everyone in the population gotten the optimal rule so what's the

expected outcome under the optimal rule right so in that way you can kind of see that i took the first question

and created two causal parameters out of those two questions um and then step four is to specify what data

are available and the link between the causal and statistical model and so so what we often say is that um so in this case

this in this rct that i'm talking about i can say that um my data my covariates treatment and outcome were

generated by sampling 720 iid independent identically distributed times from a model a distribution compatible with

the causal model that i described above and 720 because that's the sample size that was

used in this study sure and then that in that step you know if you have um dependence between people you might encode that assumption in there as well but in this case we're

going to assume that it's iid okay and you mentioned uh you know kind of a verbal small print

uh compatibility between the assumptions and the model i'm sorry the distribution and the model is that uh is that challenging to enforce

uh and to what degree does that limit um you know your choice of distributions or things like that yeah yeah well in this case it's not very

we haven't made very strong assumptions we haven't really said anything we've only said how the variables are related to each other we haven't said anything about the functional form

like the outcome is a linear function of the covariates and the intervention we haven't imposed anything so at this point i would say it's quite easy to make that link from the observed data

the um the observed distribution to the counterfactual distribution because really the only thing we've imposed at this point is that uh that relationship between the variables

which we know we observe to you know that's how things actually happen um so i think that's kind of the beauty of this too is like the flexibility

of this um and also the ability to make things transparent like maybe you do know that it's a linear that there's a linear relationship between the variables somewhere but at this point

we have not said anything about that okay yeah but great question yeah um and then okay so then the next step is to actually

identify the um the causal parameter as a function of the observed data distribution and the first so to get at the optimal rule we can identify

the conditional average treatment effect as something that's called the blip function and so that's yes the blip function

and i think the blip function got its name because um so this goes back to uh jamie robins' paper and i think the idea was

it's kind of like a blip in the treatment effect for an individual kind of person um and so we can identify it oh and let me actually back up for a second so the assumptions that are

needed to identify these two parameters are first the randomization assumption so no unmeasured confounders and the positivity assumption which says

that for every kind of person in your covariates that there's a positive probability of getting treatment or cbt in this case and because we're in the

experimental setting we're in the rct setting those actually both hold by design and so we can assume those to be true
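as a side note on positivity: outside an rct the assumption can at least be probed empirically, by checking that the estimated P(A=1 | W) stays away from 0 and 1 in every covariate stratum. a minimal sketch with made-up observational data (all names and numbers invented for illustration):

```python
import random

random.seed(3)

# Hypothetical observational data where one stratum almost never gets treated,
# i.e. a near-violation of the positivity assumption.
data = []
for _ in range(6000):
    w = random.choice(["low", "medium", "high"])
    p_treat = {"low": 0.01, "medium": 0.5, "high": 0.6}[w]
    data.append((w, random.random() < p_treat))

# Empirical P(A=1 | W=w) per stratum, flagged when it drifts toward 0 or 1.
prob = {}
for stratum in ("low", "medium", "high"):
    arm = [a for w, a in data if w == stratum]
    prob[stratum] = sum(arm) / len(arm)
    flag = "  <-- positivity concern" if not 0.05 < prob[stratum] < 0.95 else ""
    print(f"P(treated | w={stratum}) = {prob[stratum]:.3f}{flag}")
```

strata where the treatment probability is estimated near zero are exactly the "kinds of people" for whom the data carry almost no information about the treated counterfactual.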

and so now that we have um we've kind of explicitly stated that and you know in the observational setting you can explicitly kind of make transparent what you don't think to be

true and i think that's the beauty of the road map is that you can actually um yeah be transparent about where you

think your assumptions might not hold okay um and so yeah so then we can finally identify it as this blip function um

the blip function being the outcome regression so the expected outcome given uh your treatment equals

cbt and your covariates minus the expected outcome given treatment as usual and the covariates and now if that's bigger than zero

so if the indicator of that is one then treat that person if not don't treat that person so now we've identified that as a function of what we can actually observe

as opposed to counterfactuals which is what we did in the third step if that makes sense uh yeah it sounds like a

fairly straightforward encoding of what you want to see exactly exactly yeah and so we've um and then we can additionally identify

the value of the rule as well um in a similar way um and then the next step is to estimate and a lot of my talk goes through that as well and so

specifically to estimate the optimal rule we use the super learner algorithm which is this ensemble machine learning method

that takes into account different kinds of ways to estimate the optimal rule um there's been an explosion of methods in the literature of ways to estimate the individualized

treatment rule or optimal dynamic treatment rule and also an explosion of different names for the same exact thing [Laughter] what are some other names that we may have come across for similar ideas

yeah okay so optimal dynamic treatment rule optimal dynamic treatment regime individualized treatment rule or regime

um personalized intervention uh jeez yeah the list kind of goes on depending on your discipline

but yeah there's been an explosion of methods for basically algorithms that get at ways to personalize what interventions people

should get and so the super learner algorithm has this philosophy of well there's so many great algorithms out there why not combine them

in a smart way and the original super learner actually came out of prediction and so the original super learner what it does is aim to estimate the outcome

regression really well so the expected outcome you know given some features um and in the same way kind of takes all of the amazing algorithms out there

for pure prediction and combines them um uh using yeah it's an ensemble machine learning algorithm

um and so specifically the super learner for the optimal rule combines different optimal rule algorithms using three different ingredients so the

first ingredient is your library so what are you know the different algorithms that you might have in there so you might have um you know regression approaches that estimate the blip

or you might have outcome weighted learning or residual weighted learning or um there's one called earl

there's a lot of different ones out there okay um and so yeah so you define kind of your library and let me also say that you might have uh optimal rule algorithms

so these algorithms that kind of take in patient covariates or people's covariates and spit out um a treatment decision you may also include in your library static

rules meaning rules that don't take into account individual characteristics at all so for example the rule give everyone cbt or give everyone treatment regardless of who you are and

so in your library you can you know have whatever algorithms at your disposal that you may want to have ranging from for example really simple linear

parametric um regressions to really you know aggressive flexible machine learning algorithms you can have a diversity of these algorithms

the second step that you need is a meta learning step and so that's basically the way that you combine your machine learning your optimal rule algorithms and so you may

combine them so for example if your uh rules all estimate the blip function so a blip function is going to spit out a continuous number

um you may just take a convex combination of all of your blip predictions and combine all of your algorithms in that way but if your library has

algorithms that all kind of output a decision rule you may take a majority vote or weighted majority vote of your algorithms so that's step two is a way

to combine all of your algorithms together and then the last step is that you need a loss function or a risk function to choose the best weighted combination

or choose the best algorithm and so there's different options for that so if again if you have algorithms that all output a blip

so which is going to be a continuous number you may use the mean squared error as your risk or you may use the value of the rule meaning the

expected outcome of each of the candidates because i mean that's ultimately what you're trying to maximize right the mean outcome so it can you know make sense to use

the mean outcome as the um sort of way of evaluating each of the candidate algorithms is there a methodology specified

as part of super learner for if you've got um if your model library includes both these uh you know blip functions for example and

classifiers for creating a loss function that is appropriate that incorporates all of these different functions or like hierarchically structuring your loss function or

something yeah that's a great question um so i would say the only sort of um way of choosing that is really for practical

purposes so the way that it's currently implemented right now is that you have to specify it yourself and if you have a library

um with algorithms that only output a decision rule but you say that you want to use mse it's just going to throw an error it won't let you

do that and so um so maybe another way to state the question is practically speaking do you have to either choose between a decision rule a predictor of a decision or

a predictor of you know whatever the number is like a blip number right right yeah yeah good question okay so that really depends on what you want out of it so if you just want out of it

yes or no treat or not treat then you may go with the library that you know includes the static rules the ones that output a yes or no to treat or not

treat it might be of interest to actually see what the distribution of the estimated blip looks like and so in that case you would want to restrict your library to

algorithms that estimate the blip and so yeah you would only want in your library to have algorithms that would let you do that because it is informative to see

the distribution of the conditional average treatment effect for your sample got it okay yeah um yeah so those are the three

things that um that should go in your super learner i also just want to mention that this method so the theory and the methods were developed

um also at berkeley by mark vandalin and alex luedtke and they really were the ones who kind of paved the way for with the theory um and the methods for for doing this

optimal rule super learner and kind of yeah very very groundbreaking method i would say um
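To make the two scoring choices above concrete, here is a minimal Python sketch of ranking candidate rules by their cross-validated value, the estimated mean outcome under each rule, using inverse probability weighting on data from a randomized trial with a known treatment probability. This is a toy illustration on simulated data, not the actual Super Learner software; the function names and the fixed candidate rules are hypothetical simplifications.

```python
import numpy as np

def ipw_value(rule, W, A, Y, p_treat=0.5):
    """IPW estimate of the value E[Y if everyone followed rule d]:
    keep people whose observed treatment matches d(W), weighting by
    the inverse of the known randomization probability."""
    d = rule(W)
    match = (A == d).astype(float)
    prob = np.where(d == 1, p_treat, 1.0 - p_treat)
    return float(np.mean(match / prob * Y))

def select_rule_cv(candidates, W, A, Y, n_folds=5, seed=0):
    """Score each candidate rule by its cross-validated value and
    return the index of the best one. Value is maximized here; for a
    blip estimator one would minimize MSE instead."""
    folds = np.random.default_rng(seed).integers(0, n_folds, size=len(Y))
    scores = [np.mean([ipw_value(r, W[folds == k], A[folds == k], Y[folds == k])
                       for k in range(n_folds)])
              for r in candidates]
    return int(np.argmax(scores)), scores
```

In the real method each candidate algorithm is itself fit on the training folds before being evaluated on the validation folds; fixed rules are used here only to keep the sketch short.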

And with the Super Learner, the models you're working with are not pre-trained models; you have to train your Super Learner, right? How is that done?

That's a great question: through cross-validation. All of this happens within the cross-validation scheme.

On training, I'm thinking: in the strictly causal formulation of this problem, you don't have experimental or observational data, do you? Is it created through simulation, or are you training your model on observed data? I guess I'm thinking of it as a supervised learning kind of problem, where you have an observation and a label, and training is correlating the two, or training based on the two, and that may not be the case here.

It is supervised learning in the sense that you're trying to find the candidate rule that maximizes the expected outcome. The way you do it, through the cross-validation scheme, is that on the training set you fit a candidate rule, and on the validation set you look at the value of the rule, how well that candidate performs, using either of the loss functions I mentioned. Then you circle around.

So I think my question is, where does the training set come from?

Oh, okay. The way we've done it so far is just with the sample we have at hand, using sample splitting.

Okay. So earlier, when you emphasized that this is not observational, I took that to mean you didn't have actual outcomes, but you do have them; you're just not using them early, in your causal formulation of the problem?

What I meant by "not observational" is, and again this is terminology from my epidemiology and public-health training, that it's not observational in the sense that the treatment was randomized. It wasn't that we just collected the data.

Right, you did observe the actual application of the intervention and the outcome and all of that, and you have that data.

Yes, exactly. It's the study design, and this is exactly why we need to have these conversations about terminology.

So observational, in your context, would be: here are some people, I've got their data, and I'm looking at what happened, as opposed to designing an experiment with a randomized controlled trial.

Exactly. It's not that people just got CBT and I have no idea, or only some sense, of how they got it; I randomly assigned people to get CBT. That makes total

sense. Okay, so step six is that you've got this model, and you train it using Super Learner. Do I understand the relationship between your directed graph from step two and your model in step six? Have we talked about that?

Okay, so the model in step two allowed us to pair that model with the question we actually wanted to ask, the optimal-rule question, and to say that, in fact, when we estimate it, we get a valid estimate of the optimal rule.

Got it. So "model" in step two is about problem formulation, understanding the problem and the relationships, and the model in step six is a machine learning model that does prediction and estimation, the things we usually think of in the context of models.

The way I use "model," it's a collection of distributions. In step two, when I talk about a causal model, it's formalizing the relationships between the different variables without imposing any distributional assumptions on those variables. Step six, which is to estimate, takes all of these optimal dynamic treatment rule algorithms within the models I talked about before and says: okay, we can get valid estimates of the optimal dynamic treatment rule, and even further, we can evaluate the rule, meaning get the expected outcome under everyone receiving the optimal rule. We can do that in different ways, with what I think of as the standard causal estimators: the g-computation formula, inverse probability of treatment weighting, and targeted maximum likelihood estimation. Those are all ways of estimating the expected outcome under the optimal rule, the value of the optimal rule.

Okay, and that's separate from the original estimator we talked about, the Super Learner?

Yes. One relates to the first question, what is a person's optimal treatment, and the other to how we measure the average expected outcome if everyone got their optimal treatment. And that is key, because the value of the optimal rule is what's most clinically relevant, most policy relevant. Is it in fact better to give people CBT in a more individualized way? Are average outcomes better under this individualized way of giving interventions, which might be more costly to administer, might require costly variables, and might be more complicated, versus a non-individualized way, which is simply to give everyone CBT or give no one CBT? In this estimation step we can make those contrasts: the expected outcome under the individualized rule minus, for example, the expected outcome had we given everyone CBT. In that way we can see the added value of giving treatment in an individualized way.

Right, and that gets us to step seven, which is the evaluation of your model.

Exactly, the interpretation of what this actually means. You might infer, for example, that giving CBT in a more individualized way is significantly more effective than not giving it in an individualized way, meaning that some people benefit more from CBT versus treatment as usual, which I think is important and interesting from a policy and clinical perspective.
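The contrast described above, the value of the individualized rule versus the value of giving everyone CBT, can be sketched with a simple g-computation estimator. This is an illustrative toy using a linear working model on simulated data, not the estimator from the study (which could equally be IPTW or TMLE); the function names are hypothetical.

```python
import numpy as np

def g_comp_value(rule, W, A, Y):
    """G-computation: fit a working outcome model E[Y | A, W]
    (here, linear with an A*W interaction), then average its
    predictions with treatment set deterministically to d(W)."""
    X = np.column_stack([np.ones_like(W), A, W, A * W])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    d = rule(W)
    Xd = np.column_stack([np.ones_like(W), d, W, d * W])
    return float(np.mean(Xd @ beta))

def value_contrast(rule, W, A, Y):
    """E[Y under the individualized rule] minus E[Y under 'treat everyone']."""
    treat_all = lambda W: np.ones(len(W), dtype=int)
    return g_comp_value(rule, W, A, Y) - g_comp_value(treat_all, W, A, Y)
```

A contrast near zero is exactly the "no significant added value of individualizing" situation discussed later in the conversation; a clearly positive contrast would indicate that individualized assignment beats treating everyone.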

And is that ultimately what you found?

Good question. Let me say, with a big disclaimer, that what I presented at the conference used half of the sample size, so I don't want to make any sweeping or definitive conclusions at all. But really interestingly, what we saw was that the optimal rule said people with high substance-use levels should get treatment as usual, and people with low substance-use levels should get cognitive behavioral therapy. That's what the rule said. That said, when we did the contrast, asking what the expected outcomes would look like had we applied this rule, there doesn't seem to be a significant difference between this individualized way of giving CBT and, for example, giving everyone CBT or giving no one CBT. That might reflect an absence of treatment-effect heterogeneity, or it might be that we're still underpowered because we don't have the entire sample yet. At the moment, we don't see any significant differences between the three groups.

And it could be that CBT can only help, not hurt: for the folks who are further along, it's not going to help them, but it's not going to hurt them either, so the average outcome is going to be the same.

That's right. Another way to say it is that the conditional average treatment effect for that kind of person is close to zero;

it's not huge.

Right. And it seems like a next step for this research direction would be to incorporate the notion of a fixed resource in terms of cost: CBT is going to cost something for each person you give it to, so with some fixed budget, it seems like you could get better overall outcomes under a fixed constraint if you gave CBT to the people who actually benefit from it. Am I putting this together correctly?

Absolutely, I love that question, and that's totally one of the next steps: asking what happens if we have a finite amount of resources, say only 60% of the population can get CBT, just because we can't give CBT to everyone. There's also work out there by Alex Luedtke and Mark van der Laan, who developed a resource-constrained optimal rule method that further says: with this constraint that only 60% can get CBT, who out of that 60% should be getting it? Basically, find the people who are going to benefit the most from CBT and allocate CBT to them, subject to the cap that only 60% of people can get it.

And is the optimal dynamic treatment rule in some way a fundamentally causal rule, or is it a rule that's sometimes applied in a non-causal setting, where part of what you've done here is apply it in a causal setting? What I mean is: was it developed in the context of all the tooling, machinery, and methodology of causal inference, or could it be applied using some other type of statistics, where it can also be applied causally and gains all the benefits that come from the causal approach?

I think that for one to interpret it as we want to interpret it, as an individualized treatment rule, the true optimal way to give treatment, it is fundamentally a causal parameter. For us to actually get at what we want, which is this causal parameter, it's important that we think about the assumptions required for estimating the rule. If those assumptions aren't examined, it could lead to bias, and to us not really interpreting it as what it actually is.

Future directions in terms of your research on this?

Exactly what you mentioned: the resource constraints, to see what expected outcomes would look like if only a certain percentage of the population were allowed to get CBT. Of course, we also want to do this on the entire sample, and I think we've almost wrapped up data collection, which is very exciting. I've also been implementing this method on a SMART, a sequential multiple assignment randomized trial, that wrapped up in Kenya and was looking at different kinds of interventions to keep people in HIV care in rural Kenya.

The design of a SMART is quite interesting. Basically, people were initially randomized to a low-intensity intervention to stay in treatment: for example, SMS messages, standard of care, or a voucher. So at the beginning they were randomized to one of those three treatments. If a person missed a visit in their first year of care, they were re-randomized to a more intensive treatment: SMS plus voucher, a peer navigator, or a more intensive standard of care. If a person stayed in care in the first year, they were re-randomized to either stay on their initial low-intensity treatment or come off it.

This SMART design is really awesome because, by design, you're allowed to answer these sequential optimal dynamic treatment rule questions. By design, you have the causal assumptions baked in, so you can get at a sequential optimal dynamic treatment rule: what's the best way to assign the initial intervention and the secondary intervention to, for example, maximize people's time in care and keep people from dropping out of their HIV care?
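The two-stage structure just described can be sketched as a small simulation of one participant's treatment sequence. The arm names and the response probability below are illustrative placeholders, not the actual Kenya trial protocol.

```python
import random

# Hypothetical arm labels for the two randomization stages.
FIRST_STAGE = ["SMS", "standard_of_care", "voucher"]
INTENSIFIED = ["SMS_plus_voucher", "peer_navigator", "intensive_SOC"]

def simulate_smart_participant(rng):
    """One participant under the SMART design: an initial low-intensity
    arm; people who miss a visit are re-randomized to an intensified
    arm, while those who stay in care are re-randomized to continue
    or discontinue their initial arm."""
    a1 = rng.choice(FIRST_STAGE)
    missed_visit = rng.random() < 0.3  # illustrative non-response rate
    if missed_visit:
        a2 = rng.choice(INTENSIFIED)
    else:
        a2 = rng.choice([a1, "discontinue"])
    return {"a1": a1, "missed_visit": missed_visit, "a2": a2}
```

Because both stages are randomized, every sequence of decisions a rule could recommend is actually observed with known probability, which is what "the causal assumptions are baked in by design" amounts to.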

And going back to step two in our causal roadmap: are you applying the optimal dynamic treatment rule at each of these steps separately, or do you have a larger, more expressive graph that you formulate around this problem, so that you're solving all of the steps at once?

Awesome question. Within this SMART design we can ask all different kinds of optimal-rule questions. We may ask what's the best way to assign the initial intervention. We could ask what's the best way to assign the secondary intervention among people who were lost to follow-up, or, among people who actually stayed in care, what's the best way to give or take away that initial intervention. Those are point-treatment optimal-rule questions, and in those cases the causal model is fairly simple, with just the three variables I talked about at the beginning. If we look instead at the sequential optimal rule, so that our causal question is now what's the best way to give the primary and secondary interventions in sequence, then our causal model is going to be very complex: covariates at the beginning, time-varying covariates, two different interventions, an indicator of loss to follow-up, an outcome, and all of the arrows either going into each other or not. So it totally depends on the question you're asking. Are you asking about just one time point, or about the sequential intervention? That's going to inform what kind of model you have in step two.

Awesome. Well, thanks so much for taking the time to share with us a bit about what you're up to. Very fascinating stuff, and I appreciate it.

Oh, thank you so much for having me.

Thanks so much. Thank you. Bye.
