
Interpret & Summarize Mixture Models with Auxiliary Variables: Distal & Moderation Model Examples

By IMMERSE Training Program

Summary

Topics Covered

  • Three Classes Reveal School Harassment Patterns
  • Manual Three-Step Beats BCH on Low Entropy
  • Distal Outcomes Differ by Class Significantly
  • Covariate Control Equalizes Class Means
  • Lunch Program Moderates Test Scores by Class

Full Transcript

Hello and welcome to this IMMERSE video series. My name is Adam, and in this video I'm going to cover how to interpret and summarize mixture models with auxiliary variables. This video is supported by an IES-funded training grant; you can learn more about this project by visiting our website or following us on Twitter, and there are lots of other resources on our GitHub account.

Here is a short description of what we're going to cover in this tutorial, and I recommend looking at these sources if you're interested in learning more about MplusAutomation, the technical documentation for the three-step method, and the other multi-step approaches that we're going to rely on to conduct this analysis.

All the materials associated with this tutorial can be found on our GitHub page, including the data, scripts, and figures produced in this video, so if you want to practice you can find those materials there and download them. We are going to continue with the ongoing example from the Civil Rights Data Collection repository, and there's more information about this data set if you're interested. For this example we're going to use LCA indicators measured for the state of Arizona in 2017, and our distal outcome variables are measured in 2018.

Here we can take a quick look at our variables. These are six LCA indicators, which include three binary variables on whether the school reported students being harassed on the basis of disability, race, or sex, as well as three indicators on whether full-time-equivalent staff (counselors, psychologists, and law enforcement officers) are present at the school. And here are the auxiliary variables: the first one, lunch program, will be a covariate in our analyses, and this is a binary variable that indicates whether or not the school has a lunch program. The next two variables, reading test and math test, we will specify as outcome variables in our models; these are the average reading and math assessment scores at the school.

This video builds on previous videos in this series, so if you haven't followed along already, there is a video on enumeration and one on the three-step method. These are steps that precede specifying auxiliary relations, so enumeration needs to be conducted first, and in the three-step video I go into greater detail on the syntax for conducting the three-step method. In this video I'm going to move quickly through that syntax, since I've already covered it. Just for context, here I'm showing the solution that was arrived at in the enumeration step: we conducted an enumeration, and the three-class model fit best.

You can see here the class sizes as well as the class indicator patterns. In this green class, which represents about 11% of the sample, we have relatively high endorsement across the harassment variables as well as the school staff variables. In this pink or red class, which is about a quarter of the sample, we have lower endorsement across the three harassment variables and relatively higher endorsement across the school staff variables. And this blue class is the majority of the sample, approximately 65%, and has low endorsement across all indicators relative to the other two classes.

So I'm going to switch over now to RStudio and get started with these analyses. Let me scroll up here. First I'm going to load in these packages: we have MplusAutomation for estimating the models, the tidyverse for manipulating data sets, and some packages for making tables and plotting. Next I'm going to read in the data; you can see our data is in this data subfolder, and we're going to name it data three-step.
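A minimal sketch of this setup step might look like the following; the file and folder names, and the choice of table package, are assumptions rather than the tutorial's actual script:

```r
# Sketch of the package-loading and data-reading step.
library(MplusAutomation)  # estimating Mplus models from R
library(tidyverse)        # data manipulation and plotting
library(gt)               # one possible choice for making tables

# Read the Civil Rights Data Collection extract from the "data"
# subfolder and name it data_3step (file name is illustrative)
data_3step <- read_csv("data/crdc_lca_data.csv")
```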

Now we're going to conduct the manual three-step. The reason we're doing this, as opposed to using the BCH approach, is that this solution has relatively low entropy, or high classification error, and the BCH approach returned an inadmissible solution because of negative weights. That is the reason we're using the three-step in this example.

Here is the code for running the step one and step two models, and I'm going to move quickly through this. We have our auxiliary variables listed here, and in this next step we're going to extract the logits and then use them, with the nominal N variable, to fix the measurement parameters in the step two model. So I'm going to run this really quickly, and these models are generated here in this step 3 Mplus folder. We can see our step one and step two models; in a real research context we would want to look closely to make sure that these models estimated correctly and that there's no dramatic shifting of the class sizes or item probability parameters.
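As a rough sketch of that logit-extraction step (the output path and list-element names are assumptions about how the script is organized, based on `readModels()`'s usual output structure):

```r
library(MplusAutomation)

# Read the Step 1 output back into R (path is illustrative) and pull
# the classification logits for most-likely class membership. These
# are the values used to fix the nominal N measurement parameters in
# the Step 2 input (e.g., [N#1@...] under each class-specific
# statement).
step1_out <- readModels("three_step_mplus/step1.out")
logit_cprobs <- step1_out$class_counts$logitProbs.mostLikely
logit_cprobs
```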

Now we can go on to our first example of auxiliary variable relations, which is a simple distal outcome model. In this model we have our latent class variable, with three classes, predicting the reading test assessment and the math test assessment, so we can go on to specify this model.

We're again fixing our parameters with the three-step, but the new syntax here is in this model statement. Under the overall statement we're just going to list our distal outcome variances, so those variances are estimated. Then under each class-specific statement we're going to freely estimate the distal outcome intercepts (means) for the reading test and math test assessments, as well as the variances. So the means and variances are going to be conditional on latent class, and we're going to label them so that we can estimate pairwise mean differences.

This is just repeated under the class 2 and class 3 statements. In the model constraint we're creating some new, additional parameters, which we've named here: our reading test pairwise differences and our math test pairwise differences. All possible pairwise differences are listed; for example, for the reading test this is the mean for class one compared to class two. Under model test we're also going to conduct a Wald test, an omnibus test, to see whether at least one of these pairwise differences is non-zero. We can only conduct one Wald test per estimated model, so to work around this you can comment out this syntax, uncomment this, and run a different model, changing the name so it doesn't overwrite your previous model. That way you can get a Wald test for both the reading and math test assessment outcomes. So I'm going to go ahead and run this model.
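A condensed sketch of what that input might look like, written with MplusAutomation's `mplusObject()`. The variable names, parameter labels, and file names are illustrative, and the Step 2 logit-fixing lines under each class are omitted for brevity:

```r
library(MplusAutomation)

# Example 1, distal outcome model (sketch): distal variances in
# %OVERALL%; class-specific means/variances labeled (m1-m6) so pairwise
# differences can be defined in MODEL CONSTRAINT and given an omnibus
# Wald test in MODEL TEST.
ex1_distal <- mplusObject(
  TITLE    = "Example 1 - distal outcome model;",
  VARIABLE = "classes = c(3);",
  ANALYSIS = "type = mixture; estimator = mlr; starts = 0;",
  MODEL = "
  %OVERALL%
  read;  math;             ! estimate distal outcome variances

  %c#1%
  [read] (m1);  read;      ! class-1 reading mean and variance
  [math] (m4);  math;
  %c#2%
  [read] (m2);  read;
  [math] (m5);  math;
  %c#3%
  [read] (m3);  read;
  [math] (m6);  math;",
  MODELCONSTRAINT = "
  new (rd12 rd13 rd23 md12 md13 md23);
  rd12 = m1 - m2;  rd13 = m1 - m3;  rd23 = m2 - m3;
  md12 = m4 - m5;  md13 = m4 - m6;  md23 = m5 - m6;",
  MODELTEST = "
  ! one Wald test per run: reading here; rerun with m4-m6 for math
  m1 = m2;  m2 = m3;",
  usevariables = c("read", "math"),
  rdata = data_3step)   # assuming the data frame loaded earlier

fit1 <- mplusModeler(ex1_distal, dataout = "ex1_distal.dat",
                     modelout = "ex1_distal.inp", run = TRUE)
```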

We named this model example one distal model. We can take a look here at the model output, and I'm just scrolling down to it. The model estimated normally, and the main focus of these results in this distal model will be the distal intercepts (means). We can see that these are freely estimated by class, so they're different in each class; the magnitude looks smaller for class one compared to class two, and class 3 has the highest average reading test and math test assessments. Now we would want to determine whether these mean differences are significant or not, and we can see here, for the reading test assessment and the math test assessment, that in this case all pairwise differences are significant. Next, another way to present these results is in a table or a plot, so I'm going to show how to create a plot using R and MplusAutomation to display these distal outcome mean differences. What we're going to do is read in this example one distal model output file and then extract the relevant parameters.

We call this data frame model step three, and we can see we have all the parameters here, but we only need some of them to create the distal plot, so we're going to manipulate this data frame a bit. Here we're just filtering the rows containing these means, and then changing some of the labels so that it makes a cleaner-looking plot that is easier to read. You can see here what we've produced; we have all the information we need to create our distal mean plot, and using ggplot we're going to create the plot.
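The extraction-and-plot workflow just described might look roughly like this; the output path, parameter labels, and plot geometry are assumptions about the script, not verbatim from it:

```r
library(MplusAutomation)
library(tidyverse)

# Read the estimated model back in (path is illustrative)
model_step3 <- readModels("three_step_mplus/ex1_distal.out")

# Keep only the class-specific distal means and relabel the classes
distal_means <- model_step3$parameters$unstandardized %>%
  filter(paramHeader == "Means", param %in% c("READ", "MATH")) %>%
  mutate(LatentClass = recode(LatentClass, "1" = "Class 1",
                              "2" = "Class 2", "3" = "Class 3"))

# Bar plot of distal means by class, with +/- 1 SE error bars
ggplot(distal_means, aes(x = param, y = est, fill = LatentClass)) +
  geom_col(position = position_dodge(0.9)) +
  geom_errorbar(aes(ymin = est - se, ymax = est + se),
                position = position_dodge(0.9), width = 0.2) +
  labs(x = NULL, y = "Average assessment score", fill = NULL) +
  theme_minimal()
```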

Here we can see that for this pink or red class, which is characterized by low reported harassment and high endorsement on the staff indicators, we have the lowest average reading and math assessment values in those schools. Then we see moderately higher assessment values for this green class, and much higher values for the blue class, which are almost twice as high as the values for the pink or red class. So this is a potentially interesting result. You could also add information about significant pairwise differences; I haven't done that in this plot, but it would be pretty easy to add.

For our next example we're going to specify a distal outcome model with a covariate control. Here we can take a look at this model. In this path diagram we can see that the covariate, lunch program, predicts the latent class variable of the harassment-and-staff mixture model, and then the latent classes predict the reading test and math test assessments. We also have the direct, or main, effect of X predicting both distal outcomes, and it is common, depending on your research question, to include these relations, or you might get biased estimates. So we can continue on to specifying this model.

In this model, skipping down to the model statement, we're estimating the covariate as a predictor of latent class, so that's the regression of C on X, and then in the overall statement we're estimating the direct effect, the regression of Y on X, so these will be held constant across classes. We're also going to list the variances for the distal outcomes. Under each class-specific statement, this is going to be the same as the previous example: we have listed our intercept and variance terms, and these are repeated under each class-specific statement. The model constraint and model test will be the same, and in this example I will estimate the Wald test for the math test assessment; I realized I forgot to look over the Wald test results in the previous example, so I will take a look at that in this example.
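Sketched out, the new pieces of this model statement might look like the following (variable names are illustrative; the class-specific statements, model constraint, and model test carry over unchanged from the first example):

```r
# Example 2 %OVERALL% additions (sketch): the covariate predicts class
# membership (C ON X), and its direct effects on the distal outcomes
# (Y ON X) sit in %OVERALL% so they are held equal across classes.
overall_ex2 <- "
  %OVERALL%
  c    on lunch;     ! covariate predicts latent class
  read on lunch;     ! direct effect, constant across classes
  math on lunch;     ! direct effect, constant across classes
  read;  math;       ! distal outcome variances
"
```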

I'm going to run this model really quickly; this model we named example two distal covariate model. We can see here our model output, and I'm going to scroll up to the Wald test. I can find it here: we have the Wald test, and we can see that for the math test assessment the p-value is non-significant, indicating that there are no significant pairwise mean differences in this model. That's an interesting result, which contrasts with the previous example. Scrolling down to the model results, we see the regressions of Y on X for the reading test and math test, and these are held constant, the same across classes. We also have our intercepts here, which, after controlling for the covariate, we can see are now much closer in magnitude for class one, class two, and class three.

We again can create a distal outcome plot showing those mean differences; in this case there are no significant mean differences. Oh, I think I forgot to show that: here we can see that all the pairwise mean differences are non-significant, as indicated here. Since we've already gone over the syntax for creating these plots, I'm just going to look at them in the markdown, where it'll look a little nicer, for example two.

Scrolling down, here's the syntax. Note that one thing has changed: when you add covariates to the model, the labels for these intercepts change. In the previous plot syntax these were labeled as means, and now they're labeled as intercepts, so you need to look within this model step three data object and see how those parameters are labeled so that you can extract the correct values. Here we're just using the same name, so it's going to overwrite the object from the previous example, and again we use ggplot to create the plot. We can see that those distal means are fairly close and the standard errors are overlapping, which indicates that those pairwise differences are non-significant.

In our third example we're going to conduct a moderation model. Here we can see that we have our covariate, lunch program, regressed as a predictor of the reading test and math test assessments; we have these main effects, and then they're going to be moderated by our latent class variable. This is specified by simply putting the Y on X regression under each of the class-specific statements and estimating those conditional slopes, so the slopes will be freely estimated and can differ by class.

Returning to RStudio, I'm going to skip down here, and here is our syntax for specifying this moderation model. First, we can see that we're going to center our covariate. This is often considered recommended practice in moderation, so that our distal intercepts are held at the grand mean of lunch program, as opposed to a reference category, which would be indicated by zero. Under the model statement we have our outcomes, our covariate, and our moderator C. In the overall statement we have these regression slopes, which I think are redundant syntax (I think the model would run without them), because we also have the regression slopes and variances specified in our class-specific statements. So here, for class 1, we're going to estimate our intercept (mean) again, our variance, and then the slope coefficient of reading test on lunch program; we're going to do the same for the math test and repeat this syntax under each class-specific statement. We've also labeled each of these slope coefficients so that we can estimate the pairwise slope differences and use them to evaluate our results, to see whether there is in fact a moderation effect, that is, whether the slopes differ between classes. In the model constraint we've named our pairwise distal mean differences as well as our slope differences, so we additionally have these pairwise slope differences estimated here. In the model test there are now four omnibus tests that we would be interested in, and in a real research context we would need to create multiple runs of this file to estimate each Wald test and get the values for those tests. So I'm just going to quickly run this model.
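Putting those pieces together, here is a sketch of what the specification might look like; the labels m1-m6 and s1-s6 are illustrative, and the three-step logit-fixing lines are again omitted:

```r
library(MplusAutomation)

# Example 3, moderation model (sketch): center the covariate in DEFINE,
# then put the Y-on-X regressions under each class-specific statement,
# labeled s1-s6 so pairwise slope differences can be tested.
ex3_mod <- mplusObject(
  TITLE    = "Example 3 - moderation model;",
  DEFINE   = "center lunch (grandmean);",
  VARIABLE = "classes = c(3);",
  ANALYSIS = "type = mixture; estimator = mlr; starts = 0;",
  MODEL = "
  %OVERALL%
  read on lunch;  math on lunch;
  read;  math;

  %c#1%
  [read] (m1);  read;  read on lunch (s1);
  [math] (m4);  math;  math on lunch (s4);
  %c#2%
  [read] (m2);  read;  read on lunch (s2);
  [math] (m5);  math;  math on lunch (s5);
  %c#3%
  [read] (m3);  read;  read on lunch (s3);
  [math] (m6);  math;  math on lunch (s6);",
  MODELCONSTRAINT = "
  new (sd12 sd13 sd23);              ! reading slope differences
  sd12 = s1 - s2;  sd13 = s1 - s3;  sd23 = s2 - s3;",
  MODELTEST = "
  s1 = s2;  s2 = s3;    ! omnibus Wald test for the reading slopes",
  usevariables = c("read", "math", "lunch"),
  rdata = data_3step)   # assuming the data frame loaded earlier
```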

Additionally, to create the simple slopes plot we're going to need a model with the covariate uncentered. This allows us to get the estimates when our covariate, lunch program, is at the reference category, zero, and use those to create the simple slopes plots. We need those for, in this case, six points: for each class when lunch program is zero, and for each class when lunch program is one. We can do this using the update function from MplusAutomation, which allows us to take the previous model's input syntax, update just the define section by removing the centering argument, and then rerun the model, calling it example three uncentered. That's a convenient way to update and create new models in an iterative fashion, so I'm just going to run this.
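That update call might look roughly like the following; the object name is hypothetical, and `update()`'s formula interface replaces only the input sections you name:

```r
library(MplusAutomation)

# Assume `ex3_mod` is the mplusObject for the centered moderation model
# (name hypothetical). update() swaps out only the listed sections:
ex3_uncentered <- update(ex3_mod,
  TITLE  = ~ "Example 3 - moderation model, uncentered;",
  DEFINE = ~ " ")   # drop the "center lunch (grandmean);" statement

fit3_unc <- mplusModeler(ex3_uncentered,
                         dataout = "ex3_uncentered.dat",
                         modelout = "ex3_uncentered.inp", run = TRUE)
```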

We have our uncentered model here, and now I'm going to skip back to the markdown to take a look at these plot results, scrolling down to example three. Here we have our plot syntax, and for our distal outcome plot I'm not going to go over it in detail, because we've already used this same syntax to create distal plots for the previous two examples. We can just see the output results here, and these A and B indices refer to which means are significantly different: this pink class mean is significantly different from the green and blue classes, but the pairwise difference between the green and blue classes is not significant. So this result has changed from the previous example with the moderation specified.

Now we can create the simple slopes plots to present the slope differences across classes. We're going to read in that uncentered model and then extract the relevant parameters, so we're going to need the slope parameters and intercepts. I'm doing a little bit of manipulation of the labels that will show up in the plot, and to prepare the data we also need to convert it from long format to wide format. We'll create a simple slopes plot for both the reading outcome and the math outcome and then put the plots together. So here we have the reading simple slopes plot produced; we're going to use similar syntax to create the data frame for the math outcome simple slopes plot and then create that plot. Then we're going to combine the two plots using the patchwork package, with the reading plot on top of the math plot, and that's done here with this syntax.
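The stacking step uses patchwork's `/` operator; here is a toy sketch with stand-in plots (the real plots are built from the extracted Mplus parameters):

```r
library(ggplot2)
library(patchwork)

# Stand-ins for the reading and math simple-slopes plots, using a
# built-in data set just to have something to draw
read_plot <- ggplot(mtcars, aes(wt, mpg)) + geom_line() + ggtitle("Reading")
math_plot <- ggplot(mtcars, aes(wt, hp)) + geom_line() + ggtitle("Math")

# patchwork's `/` operator stacks the first plot on top of the second
read_plot / math_plot
```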

We can see here our combined simple slopes plot, and a potentially interesting result is that for this pink or red class, schools that do not have a lunch program are performing really low on these reading and math test assessments. This is something we might want to look into further, to understand why these schools are performing so much lower than the schools in the blue and green classes. It's also interesting that schools with a lunch program are all clustered around the 40 mark for both reading and math test assessments, so it's really the schools without a lunch program that range widely, probably in SES. I would guess that some are higher-SES schools that are performing well without a lunch program because it's not needed, while the others are schools which may not have the resources or infrastructure for a lunch program, and these schools are performing much lower. So thank you for your time, and follow along for future videos. This video relies heavily on the MplusAutomation package and Mplus, as well as other packages, so please reference these sources. Thank you.
