Interpret & Summarize Mixture Models with Auxiliary Variables: Distal & Moderation Model Examples
By IMMERSE Training Program
Summary
Topics Covered
- Three Classes Reveal School Harassment Patterns
- Manual Three-Step Beats BCH on Low Entropy
- Distal Outcomes Differ by Class Significantly
- Covariate Control Equalizes Class Means
- Lunch Program Moderates Test Scores by Class
Full Transcript
Hello and welcome to this IMMERSE video series. My name is Adam, and in this video I'm going to cover how to interpret and summarize mixture models with auxiliary variables. This video is supported by an IES-funded training grant. You can learn more about the project by visiting our website or following us on Twitter, and there are lots of other resources on our GitHub account.
Here is a short description of what we're going to cover in this tutorial. If you're interested in learning more, I recommend these resources on MplusAutomation, the technical documentation for the three-step method, and the other multi-step approaches we rely on to conduct this analysis.
All the materials associated with this tutorial can be found on our GitHub page. They include everything you need to follow along: the data, the scripts, and the figures produced in this video. If you want to practice, you can download them there. We're continuing with our ongoing example from the Civil Rights Data Collection repository, and there's more information about this data set if you're interested. For this example we use LCA indicators measured for the state of Arizona in 2017, and our distal outcome variables are measured in 2018.
Here is a quick look at our variables. There are six LCA indicators. Three are binary variables for whether the school reported students being harassed on the basis of disability, race, or sex. The other three are based on full-time-equivalent (FTE) staff counts of counselors, psychologists, and law enforcement officers, that is, whether or not these staff are present at the school. Then there are our auxiliary variables. The first, lunch program, will be a covariate in our analyses; it is a binary variable indicating whether or not the school has a lunch program. The next two, reading test and math test, will be specified as outcome variables in our models; they are the school's average reading and math assessment scores.
This video builds on previous videos in this series. If you haven't followed along already, there is a video on enumeration and one on the three-step method. These steps precede specifying auxiliary relations, because enumeration needs to be conducted first. In the three-step video I go into greater detail on the syntax for conducting the three-step method, so in this video I'll skip quickly through that syntax. For context, here is the solution we arrived at in the enumeration step: the three-class model fit best, and you can see the class sizes as well as the indicator patterns for each class.
In the green class, which represents about 11% of the sample, we see relatively high endorsement across both the harassment variables and the school staff variables. The pink (or red) class, about a quarter of the sample, has lower endorsement of the three harassment variables and relatively higher endorsement of the school staff variables. The blue class is the majority of the sample, approximately 65%, and has low endorsement across all indicators relative to the other two classes.
Now I'm going to switch over to RStudio and get started with these analyses. First I load the packages: MplusAutomation for estimating the models, the tidyverse for manipulating data sets, and some packages for making tables and plotting. Next I read in the data, which is in the data subfolder, and name it data three-step.
Next we conduct the manual three-step. We're using this instead of the BCH approach because this solution has relatively low entropy, that is, high classification error, and here BCH returned an inadmissible solution because of negative weights. That is why we use the manual three-step in this example.
Here is the code for running the Step 1 and Step 2 models, which I'll skip through quickly. Our auxiliary variables are listed here. In the next step we extract the logits for the classification probabilities and use them, together with the nominal most-likely-class variable, to fix the parameters in the Step 2 model. I'll run this quickly. The models are generated in the step 3 Mplus folder, where we can see our Step 1 and Step 2 models. In a real research context we would want to look closely to make sure these models estimated correctly and that there is no dramatic shifting of the class sizes or item probability parameters.
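The logits extracted here are log ratios taken from the Step 1 classification-probability table. The tutorial does this step in R with MplusAutomation; below is a minimal Python sketch of the arithmetic only, assuming a table in which row k is the latent class, column j is the most likely class, and the last class is the reference category. The probabilities are made up for illustration and are not from this analysis.

```python
import math

def classification_logits(probs):
    """For each latent class (row), return log(p[k][j] / p[k][last]),
    i.e. logits of the most likely class with the last class as reference."""
    logits = []
    for row in probs:
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("each row of classification probabilities must sum to 1")
        ref = row[-1]  # reference category: last most-likely class
        logits.append([math.log(p / ref) for p in row])
    return logits

# Hypothetical 3-class table (rows: latent class, columns: most likely class).
probs = [
    [0.80, 0.15, 0.05],
    [0.10, 0.85, 0.05],
    [0.05, 0.05, 0.90],
]

for k, row in enumerate(classification_logits(probs), start=1):
    print(f"class {k}:", [round(x, 3) for x in row])
```

Each row of logits would then be fixed as the intercepts of the nominal most-likely-class variable in the matching class-specific statement, for example `[N#1@2.773 N#2@1.099];` under `%C#1%` for the first row above (N is a placeholder variable name).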
Now we can go on to our first example of auxiliary variable relations, a simple distal outcome model. In this model, our three-class latent class variable predicts the reading test and math test assessments. Let's specify this model.
We're again fixing our parameters with the three-step, and the new syntax is in the model statement. Under the overall statement we just list our distal outcomes so that their variances are estimated. Then, under each class-specific statement, we freely estimate the distal outcome means (intercepts) for the reading and math test assessments as well as their variances, so the means and variances are conditional on latent class. We label these parameters so that we can estimate pairwise mean differences.
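Model constraint turns each labeled pair of means into a new parameter and obtains its standard error from the parameter covariance matrix (the delta method, which is exact for a simple difference). A minimal Python sketch of that arithmetic for one outcome, assuming three class means and their sampling covariance matrix; the numbers are hypothetical, and Mplus performs this internally.

```python
import math
from itertools import combinations

def pairwise_differences(means, cov):
    """All pairwise differences m_i - m_j with delta-method standard errors,
    z statistics, and two-sided normal p-values."""
    results = []
    for i, j in combinations(range(len(means)), 2):
        diff = means[i] - means[j]
        # Var(m_i - m_j) = Var(m_i) + Var(m_j) - 2 Cov(m_i, m_j)
        se = math.sqrt(cov[i][i] + cov[j][j] - 2 * cov[i][j])
        z = diff / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
        results.append((f"class {i + 1} - class {j + 1}", diff, se, z, p))
    return results

# Hypothetical class-specific means and their sampling covariance matrix.
means = [20.0, 30.0, 45.0]
cov = [[4.0, 0.5, 0.0],
       [0.5, 9.0, 0.0],
       [0.0, 0.0, 2.25]]

for label, diff, se, z, p in pairwise_differences(means, cov):
    print(f"{label}: diff={diff:.2f}, SE={se:.2f}, z={z:.2f}, p={p:.4f}")
```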
This syntax is repeated under the Class 2 and Class 3 statements. In the model constraint section we create new, additional parameters, which we've named here: the pairwise differences for the reading test and for the math test. All possible pairwise differences are listed; for example, for the reading test, this one compares the Class 1 mean with the Class 2 mean. Under model test we also conduct a Wald test, an omnibus test of whether at least one of these pairwise differences is different from zero. However, we can only conduct one Wald test per estimated model. To work around this, you can comment out this syntax, uncomment the other block, and run the model again under a different name so it doesn't overwrite your previous model. That way you get a Wald test for both the reading and math test outcomes. Now I'll go ahead and run this model.
We've named it example one distal model. Scrolling down to the model results, we see that the model estimated normally. The main focus of a distal model is the distal intercepts (means), and we can see that they are freely estimated by class, so they differ across classes. In magnitude, the means look smaller for Class 1 than for Class 2, and Class 3 has the highest average reading and math test assessments. Next we want to determine whether these mean differences are significant.
We can see this here for the reading and math test assessments: in this case all pairwise differences are significant. Another way to present these results is in a table or a plot, so I'll show how to plot these distal outcome mean differences using R and MplusAutomation. We read in the example one distal model output file and extract the relevant parameters into a data frame we call model step three. It contains all the parameters, but we only need some of them for the distal plot, so we manipulate the data frame a bit: we filter the rows for the means of these variables, then change some labels so the plot looks cleaner and is easier to read. You can see what we've produced here; it has all the information we need, and we use ggplot to create the plot.
Here we can see that the pink (or red) class, which is characterized by low reported harassment and high values on the staff indicators, has the lowest average reading and math test assessment values. The green class has moderately higher assessment values, and the blue class has much higher values, almost twice as high as those of the pink (or red) class. This is a potentially interesting result. You could also add information about significant pairwise differences; I haven't done that in this plot yet, but it would be pretty easy to add.
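The filtering step keeps only the distal outcome mean rows from the parameter table. A minimal Python sketch of the same idea, assuming a table with `paramHeader`, `param`, `est`, `se`, and `LatentClass` columns (the layout of MplusAutomation's unstandardized parameter output for mixture models); the variable names `READ_TEST` and `MATH_TEST` and all estimates are hypothetical. The `header` argument is there because once a covariate is added (Example 2), these rows are labeled intercepts rather than means.

```python
# Hypothetical rows mimicking an unstandardized parameter table; values are invented.
params = [
    {"paramHeader": "Means", "param": "READ_TEST", "est": 21.0, "se": 1.9, "LatentClass": "1"},
    {"paramHeader": "Means", "param": "MATH_TEST", "est": 18.5, "se": 1.7, "LatentClass": "1"},
    {"paramHeader": "Variances", "param": "READ_TEST", "est": 150.2, "se": 12.0, "LatentClass": "1"},
    {"paramHeader": "Means", "param": "READ_TEST", "est": 30.4, "se": 2.2, "LatentClass": "2"},
    {"paramHeader": "Means", "param": "MATH_TEST", "est": 27.9, "se": 2.0, "LatentClass": "2"},
    {"paramHeader": "Means", "param": "READ_TEST", "est": 44.8, "se": 1.1, "LatentClass": "3"},
    {"paramHeader": "Means", "param": "MATH_TEST", "est": 41.3, "se": 1.0, "LatentClass": "3"},
]

# Hypothetical mapping from Mplus variable names to cleaner plot labels.
OUTCOMES = {"READ_TEST": "Reading", "MATH_TEST": "Math"}

def extract_distal_means(rows, outcomes, header="Means"):
    """Keep rows whose paramHeader matches `header` and whose param is a
    distal outcome; return tidy records with readable labels for plotting."""
    out = []
    for r in rows:
        if r["paramHeader"] == header and r["param"] in outcomes:
            out.append({
                "class": f"Class {r['LatentClass']}",
                "outcome": outcomes[r["param"]],
                "est": r["est"],
                "se": r["se"],
            })
    return out

for rec in extract_distal_means(params, OUTCOMES):
    print(rec)
```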
For our next example, we specify a distal outcome model with a covariate control. In this path diagram, the covariate lunch program predicts the latent class variable (the harassment and staff mixture classes), and the latent classes predict the reading and math test assessments. We also have the direct (main) effect of X predicting both distal outcomes. Depending on your research question, it is common to include these direct relations, because omitting them can bias the estimates. Let's continue on to specifying this model.
Skipping down to the model statement, we estimate the covariate as a predictor of latent class, which is the regression of C on X. In the overall statement we estimate the direct effect of X on the distal outcomes (the Y on X regression), so these effects are held constant across classes. We also list the variances for the distal outcomes. Under each class-specific statement, as in the previous example, we list the intercept and variance terms, and the model constraint and model test sections are the same. In this example I'll estimate the Wald test for the math test assessment; I realized I forgot to go over the Wald test results in the previous example, so I'll look at them here.
I'll run this model, which we named ex2 distal cov model. Here is the model output. Scrolling to the Wald test, we can see that for the math test assessment the p-value is non-significant, so in this model the omnibus test does not indicate any significant mean differences across classes. That's an interesting result that contrasts with the previous example.
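The omnibus Wald test asks whether all class means are equal at once. A minimal Python sketch of the statistic for three classes, assuming the class means and their sampling covariance matrix are available (any values you plug in here are hypothetical). It uses the contrasts m1 - m2 and m2 - m3; any two non-redundant contrasts give the same statistic, and with two constraints the reference distribution is chi-square with 2 degrees of freedom.

```python
import math

def wald_equal_means(means, cov):
    """Omnibus Wald test of H0: m1 = m2 = m3.
    W = d' (R V R')^-1 d with d = R m, compared to chi-square(df = 2)."""
    R = [[1, -1, 0], [0, 1, -1]]  # contrasts m1 - m2 and m2 - m3
    d = [sum(R[a][k] * means[k] for k in range(3)) for a in range(2)]
    # S = R V R' (2 x 2 covariance matrix of the contrasts)
    S = [[sum(R[a][k] * cov[k][l] * R[b][l] for k in range(3) for l in range(3))
          for b in range(2)]
         for a in range(2)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    S_inv = [[S[1][1] / det, -S[0][1] / det],
             [-S[1][0] / det, S[0][0] / det]]
    W = sum(d[a] * S_inv[a][b] * d[b] for a in range(2) for b in range(2))
    p = math.exp(-W / 2)  # chi-square survival function for df = 2
    return W, p

# Hypothetical class means and covariance matrix for one outcome.
W, p = wald_equal_means([20.0, 30.0, 45.0],
                        [[4.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 2.25]])
print(f"Wald = {W:.2f}, df = 2, p = {p:.4g}")
```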
Scrolling down to the model results, we see the regressions of Y on X for the reading and math tests, which are held constant, the same across classes. We also have our intercepts, and after controlling for the covariate they are much closer in magnitude across Classes 1, 2, and 3.
We can again create a distal outcome plot showing the mean differences. I forgot to show this a moment ago: here we can see that all the pairwise mean differences are non-significant. Since we've already gone over the syntax for creating these plots, I'll just look at them in the R Markdown file for Example 2, where they look a little nicer.
Scrolling down, here's the syntax. Note one change: when you add covariates to the model, the label for these parameters changes. In the previous plot syntax they were labeled means, and now they are intercepts, so you need to look within the model step three data object to see how the parameters are labeled so that you extract the correct values. We reuse the same object name here, so it overwrites the one from the previous example, and again use ggplot to create the plot.
Here we can see that the distal means are fairly close and the standard error bars overlap, consistent with the non-significant pairwise differences.
In our third example, we conduct a moderation model. Our covariate lunch program is a predictor of the reading and math test assessments, and these main effects are moderated by the latent class variable. This is specified simply by putting the Y on X regression under each of the class-specific statements, so the conditional slopes are freely estimated and can differ by class.
Returning to RStudio, I'll skip down to the syntax for specifying this moderation model. First, we center our covariate. This is often considered recommended practice in moderation, so that our distal intercepts are evaluated at the grand mean of lunch program rather than at the reference category, which would be indicated by zero. Under the model statement we have our outcomes, our covariate, and our moderator C. I think the syntax in the overall statement is redundant, and the model would run without it, because we also specify these regression slopes and variances in the class-specific statements. For Class 1, we estimate the mean (intercept), the variance, and the slope coefficient of reading test on lunch program; we do the same for math test and repeat this syntax under each class-specific statement. We've labeled each slope coefficient so that we can estimate pairwise slope differences and use them to evaluate whether there is in fact a moderation effect, that is, whether the slopes differ between classes. In the model constraint section we've named our pairwise distal mean differences as well as our pairwise slope differences. Under model test there are now four omnibus tests we would be interested in, and in a real research context we would need multiple runs of this file to get the value of each Wald test. I'll just quickly run this model.
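Centering changes the intercepts but not the slopes. Within any class, y = a + b·x can be rewritten as y = (a + b·x̄) + b·(x - x̄), so grand-mean centering moves the intercept to the predicted value at the average of lunch program and leaves the slope unchanged. A minimal Python sketch of that identity using ordinary least squares on a single made-up group; this is an illustration of the algebra, not the mixture model itself, and the data values are hypothetical.

```python
def center(values):
    """Grand-mean center a covariate; return centered values and the mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values], mean

def ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

lunch = [0, 0, 1, 1, 1]         # hypothetical binary covariate
reading = [20, 24, 38, 40, 42]  # hypothetical outcome values

a_raw, b_raw = ols(lunch, reading)   # intercept = predicted value at lunch = 0
lunch_c, lunch_mean = center(lunch)
a_c, b_c = ols(lunch_c, reading)     # intercept = predicted value at the mean

print(f"uncentered: intercept={a_raw:.2f}, slope={b_raw:.2f}")
print(f"centered:   intercept={a_c:.2f}, slope={b_c:.2f}")
print(f"a_raw + b_raw * mean = {a_raw + b_raw * lunch_mean:.2f}")
```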
Additionally, to create the simple slopes plot we need a model with the covariate uncentered. This gives us the estimates when lunch program is at the reference category, zero, which we use to create the simple slopes plot. In this case we need six points: for each class, the value when lunch program is zero and when lunch program is one. We can do this with the update function from MplusAutomation, which takes the previous model's input syntax and lets us update just the DEFINE section, removing the centering. Then we rerun the model and call it example three uncentered. That's a convenient way to create new models in an iterative fashion. I'll run this now.
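Because the uncentered model's intercept is the predicted outcome when lunch program is 0, each class's two simple-slope points are the intercept (lunch program = 0) and the intercept plus the slope (lunch program = 1). A minimal Python sketch of that computation, including the long-to-wide step, assuming long-format rows of (class, outcome, parameter type, estimate); all estimates are hypothetical.

```python
# Hypothetical long-format estimates from an uncentered moderation model:
# (class, outcome, parameter type, estimate)
long_rows = [
    ("Class 1", "Reading", "intercept", 15.0),
    ("Class 1", "Reading", "slope", 20.0),
    ("Class 2", "Reading", "intercept", 30.0),
    ("Class 2", "Reading", "slope", 8.0),
    ("Class 3", "Reading", "intercept", 50.0),
    ("Class 3", "Reading", "slope", -6.0),
]

def to_wide(rows):
    """Long to wide: one entry per (class, outcome) with intercept and slope."""
    wide = {}
    for cls, outcome, kind, est in rows:
        wide.setdefault((cls, outcome), {})[kind] = est
    return wide

def simple_slope_points(wide):
    """Predicted outcome at lunch program = 0 and 1 for each class."""
    points = []
    for (cls, outcome), p in sorted(wide.items()):
        for x in (0, 1):
            points.append((cls, outcome, x, p["intercept"] + p["slope"] * x))
    return points

for point in simple_slope_points(to_wide(long_rows)):
    print(point)
```

With three classes this yields the six points per outcome needed for each simple slopes panel.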
Now we have our uncentered model, and I'll skip back to the markdown to look at the plot results. Scrolling down to Example 3, here is the syntax for the distal outcome plot. I won't go over it in detail, because we've already used the same syntax for the previous two examples, so let's look at the output. The letters a and b indicate which means are significantly different: the pink class mean is significantly different from the green and blue class means, but the green and blue classes are not significantly different from each other. So this result has changed from the previous example now that moderation is specified.
Now we can create simple slopes plots to present the slope differences across classes. We read in the uncentered model and extract the relevant parameters, the slopes and intercepts, and do a little manipulation of the labels that will show up in the plot. To prepare the data we also need to convert it from long format to wide format. We create a simple slopes plot for both the reading outcome and the math outcome and then put the plots together. Here is the reading simple slopes plot. We use similar syntax to create the data frame for the math outcome and its plot, and then combine the two plots with the patchwork package, placing the reading plot on top of the math plot.
Here we can see our simple slopes plot, and there is a potentially interesting result: in the pink (or red) class, schools that do not have a lunch program perform very low on the reading and math test assessments. This is something we might want to look into further to understand why these schools perform so much lower than the schools in the blue and green classes. It's also interesting that schools with a lunch program all cluster around the 40 mark for both reading and math test assessments, so it's really the schools without a lunch program that vary, probably reflecting a range in socioeconomic status (SES). My guess is that some of these are higher-SES schools performing well without a lunch program because one isn't needed, while others may lack the resources or infrastructure for a lunch program and are performing much lower. Thank you for your time, and follow along for future videos. This video relies heavily on the MplusAutomation package and Mplus, as well as other packages; see the references listed for these sources.