PSY9185 - Multilevel models
Longitudinal data analysis using multilevel models
Nikolai Czajkowski
Today
- Introduction to longitudinal data and LMM.
- Plotting longitudinal data.
- Basic two-level longitudinal models
- Fixed and random coefficients
- Non-linear and piecewise continuous change
- Including predictors in the model
- Time-invariant predictors
- Time-varying predictors
Books
- Several examples borrowed from Mirman (2014).
- Lesa Hoffman has links to several full video courses in longitudinal modelling on her homepage.
R resources
Fitting mixed models in R:
- https://m-clark.github.io/mixed-models-with-R/introduction.html
- https://rpsychologist.com/r-guide-longitudinal-lme-lmer
Books on R:
- Cookbook for R: http://www.cookbook-r.com/
- R for data science: https://r4ds.had.co.nz/
Online resources
- R-bloggers; R news and tutorials contributed by hundreds of R bloggers.
- Stackoverflow: question and answer site for programmers.
Multilevel tutorials
Two tutorials are available here:
Clustered data: https://shinyibv02.uio.no/connect/#/apps/433caf55-e0da-477a-920e-bc044ef5030f/access
Longitudinal analysis: https://shinyibv02.uio.no/connect/#/apps/bf3cac08-66ca-45d0-a45c-9b9402aa6c84/access
SECTION: Introducing longitudinal analysis
Multilevel models for clustered data (1)
Two sides of all models
Models are an idealized account of:
- How the expected value is related to an independent variable (mean).
- How people vary (variance).
Linear regression only models the mean value, not the variance.
Assumptions in linear models
Yi = b0 + b1⋅Xi + ϵi,   ϵi ∼ N(0, σ²)
- The relationship between X and Y is linear.
- The error variance (around the regression line) is normally distributed and is constant (i.e. the sd is the same) at every level of X.
- The observations are independent.
- The independent variables are measured without error.
Which assumption is really important?
- We often focus on the extent to which the residuals are normally distributed, but in practice this rarely has a major impact on the standard errors (or p-values).
- Dependency in the data can severely bias all p-values.
Multilevel data structures
- Data can be nested within higher order units.
- Different variables are measured at different levels.
- Traditionally handled by aggregation or disaggregation.
Longitudinal data structures
In longitudinal multilevel models, observations are nested within persons.
- Level 1 units are individual observations
- Level 2 units are usually individuals
- People may be nested under higher order units (therapists, treatment centers, etc).
Time will be included as a level 1 predictor.
What are longitudinal models?
Time course data are the result of repeated measurements at multiple (at least two) time points.
- These sorts of data are also called longitudinal.
Two key properties distinguish time course data from other kinds of data:
- Groups of observations all come from one source (nested data).
- The repeated measurements are related by a continuous variable, usually that variable is time, but any continuous variable will do.
- We will also look at other variables, like set-size.
For analyses we will use multilevel (mixed) models that include time course data as a level-1 predictor.
- If you asked participants to name letters printed in different sizes, you could examine the outcome (letter recognition accuracy) as a function of the continuous predictor size.
Benefits of multilevel models in general
Flexibility in modelling dependency across observations
- LMMs provide a flexible framework to model and account for various sources of non-independence, reducing biased estimates and increasing the accuracy of inferences.
Avoid choosing between individuals or groups as the unit of analysis
- Cross-level interactions are uniquely possible in MLM.
Increased statistical power
- By including random effects, LMMs can improve the precision of estimates and increase statistical power compared to traditional regression models.
Generalizability
- LMMs allow researchers to investigate both the average effect of predictors and their variations across different levels, helping to generalize findings across various contexts and populations.
Study variation between and within people
- Ask more complex and interesting questions
Benefits of MLM for longitudinal data
- Handle dependency
- Traditional analyses only designed to handle dependency due to constant mean differences.
- Handles missing
- Number of observations can vary between individuals.
- Flexible treatment of time
- Measures can be taken at fixed or at varying occasions.
- No assumption that an equal amount of time has elapsed.
- Investigate within-person relationships
- Between and within effects can differ, even in direction.
Longitudinal studies provide the opportunity to test hypotheses at multiple levels of analysis simultaneously.
- The models in this text allow us to examine both between-person and within-person relationships in the same variables at the same time.
- For example, we might posit a link between stress and negative mood, such that greater stress results in greater negative mood. But at what level of analysis is this relationship likely to hold: between persons, within persons, or both?
SECTION: Plotting longitudinal data
Plotting longitudinal data
Plotting is an even more critical first step in longitudinal analysis than in cross-sectional studies.
We will make the following classes of figures:
- Individual change curves.
- Plots of group mean levels and CI/confidence regions.
- Stratified plots (assess interactions).
Why ggplot?
- Plots are publication grade quality.
- The syntax offers a flexible and powerful framework for visualizing data.
- Summary statistics like means and standard errors can be computed while plotting.
Introduction to ggplot2
- ggplot2 is a powerful R package for data visualization
- Built on the principles of the "Grammar of Graphics"
- First, assign variables in your data to properties of the graph. These assignments, or mappings, are called the aesthetics of your graph. Then select "geometries", or geoms (points, lines, bars, etc.), for those aesthetics.
- Create complex, multi-layered plots using a consistent syntax
- Easily customizable and extensible
- Widely used in data analysis and research
Spaghetti Plots for Longitudinal Data
- Visualize individual trajectories over time
- Identify patterns and trends in the data
- Assess variability between and within groups
- Detect potential outliers or influential cases
- Compare individual trajectories to the overall fixed effect of time
Generating a Spaghetti Plot with ggplot2
Below is the R-code used to generate the plot.
library(ggplot2)

# Create spaghetti plot: one line per individual, plus the overall trend
ggplot(data, aes(x = time, y = y, group = id, color = group)) +
  geom_line(alpha = 0.5) +
  geom_line(data = fixed_effect_data, aes(x = time, y = y, group = 1),
            color = "black", size = 1.5, linetype = "solid") +
  labs(title = "Example Spaghetti Plot", x = "Time", y = "Outcome") +
  theme_minimal()

# Save plot to a file
ggsave("example_spaghetti_plot.png", width = 7, height = 5)
SECTION: Time-only (Unconditional) models
Features of longitudinal data
Multilevel longitudinal models permit analyses at different levels:
Between-person variance:
- Inter-individual variation.
- Ex. Biological sex, ethnicity.
Within-person variance:
- Intraindividual variation.
- Ex. Sleep the previous night.
- Eg. resting heart rate vs. exercise.
Features of longitudinal data (2)
- Within person change: Specific type of within person variation, that refers to any systematic change that is expected as a result of meaningful passage of time.
- Within person fluctuation: Undirected variation over repeated assessments, seen in contexts in which one would not expect systematic change.
Unconditional longitudinal models
In unconditional longitudinal models, time is the only predictor.
- Included as a level-1 variable.
Critical first questions:
- What units should time be measured in?
- What constitutes "time=0"?
- E.g. if time is measured as age, it is best to center age at a meaningful value rather than 0.
- How do you expect development over time to be?
- Linear, non-linear, abrupt/discontinuous?
Introducing idealized growth curves
Between-person and Within-person empty models
Empty models partition the variance, but don't account for it.
Between person empty model
yti = β0 + ϵti   (fixed intercept + error)
Within person empty model
yti = β0 + U0i + ϵti   (fixed intercept + random intercept + error)
Fixed and random effects
Level 2: β0i=γ00+U0i
Level 1: yti=β0i+ϵti
Composite: yti=(γ00+U0i)+ϵti
yti=β0+u0i+ϵti
- We instead argue that deviation from the grand mean is attributable to a normally distributed residual at level 2.
- This lets us account for the variance by estimating one parameter (the variance of U0i) instead of the fixed effects of 49 dummy variables.
- Random effects are not designed to make inferences about the specific units included in the study (we can't compare participant 1 to participant 50).
- Could we include a random effect of time? This would require at least three measurements per person.
Why random intercept?
Differences between individuals could be handled by N-1 dummy variables.
- Many estimated parameters and loss of power.
With MLM, clustering is accounted for by only one parameter, the variance of a normal distribution.
Model 1: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite:
yti = (γ00 + U0i) + ϵti   (fixed intercept + random intercept + error)
Model 1: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite:
yti = (γ00 + U0i) + ϵti   (fixed intercept + random intercept + error)

# Load necessary libraries
library(lme4)

# Fit the random intercept model
model1 <- lmer(y ~ 1 + (1 | subject), data = data)

# Print the model summary
summary(model1)
Fixed and random effects
Multilevel (mixed) models contain two classes of coefficients.
Fixed effects: The «structural» part of the model, specifying the expected conditional mean.
- Empty model: Only intercepts are included.
- Unconditional models: Intercepts + passage of time.
- Conditional models: Intercepts + variables accounting for differences in the passage of time.
Random effects: Coefficients that specify the stochastic (error) part of the model.
- Describes how the residuals of the Y outcome are distributed and related across the observations.
Model 2: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite: yti = (γ00 + U0i) + ϵti
Intraclass Correlation Coefficient (ICC)
- Measures the proportion of total variance in the outcome that is attributable to the grouping structure (e.g., individuals within groups)
- Ranges from 0 to 1:
- 0: No correlation between observations within the same group
- 1: Observations within the same group are identical
- Important for assessing group-level effects in mixed models
- A high ICC indicates a strong grouping effect and the need for a multilevel or mixed model to account for the clustered data structure
Intraclass Correlation Coefficient (ICC) 2
ICC = BP variation / (BP variation + WP variation) = Var(U0i) / (Var(U0i) + Var(ϵti))
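As a quick numerical sketch, the ICC can be computed directly from the two variance components. The values below are assumed for illustration; in practice they would be extracted from a fitted lmer model, e.g. via as.data.frame(VarCorr(model1))$vcov.

```r
# Illustrative ICC calculation with assumed variance components
var_u0  <- 4   # between-person variance, Var(U0i)  (assumed value)
var_eps <- 6   # within-person residual variance, Var(eps) (assumed value)

icc <- var_u0 / (var_u0 + var_eps)
icc  # 0.4: 40% of the total variance lies between persons
```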
Model 3: Fixed slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+U0i)+γ10⋅Timeti+ϵti
yti = (γ00 + U0i) + γ10⋅Timeti + ϵti, where γ00 is the fixed intercept, U0i the random intercept, γ10 the fixed slope, and ϵti the error.
Comparing Linear Mixed Models
Likelihood Ratio Tests:
- Performed with the anova() function.
- Only valid under maximum likelihood (ML) estimation, not restricted maximum likelihood (REML).
Information Criteria:
- The Akaike Information Criterion (AIC) can be calculated with the AIC() function.
Pseudo R2:
- Purpose: Assess the proportion of variance explained by the predictors in the model, similar to R2 in linear regression.
- Calculated separately for Level 1 and Level 2 effects.
- Range: 0 to 1; higher values indicate better model fit.
- Limitation: Not directly comparable to R2 in linear regression. Interpretation should be cautious.
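The mechanics of a likelihood ratio test and an AIC comparison can be sketched with ordinary lm() fits on the built-in mtcars data; the same anova()/AIC() calls apply to lmer models, provided both are fitted with REML = FALSE. The choice of predictors here is purely illustrative.

```r
# Two nested models on built-in data
m0 <- lm(mpg ~ wt, data = mtcars)       # simpler (nested) model
m1 <- lm(mpg ~ wt + hp, data = mtcars)  # one extra predictor

# Likelihood ratio test "by hand": 2 * difference in log-likelihoods
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
p   <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Information criteria: the model with the lower AIC is preferred
AIC(m0, m1)
```

For nested lmer fits, anova(m1, m2) performs the equivalent test and (in lme4) refits REML models with ML before comparing.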
Over vs. underfitting (Scylla and Charybdis)
Overfitting: Poor prediction as a result of learning too much from your data.
Underfitting: Poor prediction as a result of learning too little from your data.
- You can always get better fit to your data by adding more predictors.
- Every dataset contains both systematic and unsystematic variance (noise), so overly complex models may fit well to the current data, but predict less variance in new data.
https://xcelab.net/rm/statistical-rethinking/
What happens as the model becomes too complex?
k-fold cross validation
- Cross-validation involves partitioning the data into training and test subsets.
- The model is built on the training sets and evaluated on the test sets.
- Cornerstone in machine learning approaches.
(Introduction to statistical learning)
k-fold cross validation (2)
Notice:
- It is easy to explain variance in the training data, but far more difficult to explain it in (new) test data.
- Initially, as the model grows in complexity, explained variance increases. However, as it gets overly complex, explained variance in test data starts to decrease.
Method 3: Information criteria (AIC / BIC)
- Information criteria (IC) are a class of statistics devised to strike the optimal balance between over- and underfitting.
- The model with the lowest AIC/BIC value is chosen as the best one.
- Non-nested models can be compared.
Model 4: Fixed slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+U0i)+γ10⋅Timeti+ϵti
Model 5: Random slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite:
yti = (γ00 + U0i) + (γ10 + U1i)⋅Timeti + ϵti, where γ00/U0i are the fixed/random intercept, γ10/U1i the fixed/random slope, and ϵti the error.
How do these differ?
Random effects can be correlated
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
G = [ Var(U0i)       Cov(U0i,U1i)
      Cov(U0i,U1i)   Var(U1i)    ]
Composite: yti=(γ00+U0i)+(γ10+U1i)⋅Timeti+ϵti
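To make G concrete, here is a base-R sketch (with assumed values, not estimates from any real fit) that builds the 2×2 G matrix from two random-effect standard deviations and their correlation, which is how such a covariance matrix decomposes:

```r
# Assumed random-effect SDs and intercept-slope correlation
sd_u0 <- 2.0   # SD of random intercepts (assumed)
sd_u1 <- 0.5   # SD of random slopes (assumed)
rho   <- -0.3  # intercept-slope correlation (assumed)

D    <- diag(c(sd_u0, sd_u1))                 # SDs on the diagonal
corr <- matrix(c(1, rho, rho, 1), nrow = 2)   # correlation matrix
G    <- D %*% corr %*% D                      # covariance matrix G
G
```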
Sensitivity of the intercept-slope correlation to the centering of time
Impact on Interpretability:
- Centering time can affect the interpretability of the (fixed) intercept.
Intercept-Slope Correlation:
- The interpretation of any (random) intercept-slope correlation is conditional on the location of the intercept.
Example
- Centering at t = 0 will result in an estimated correlation less than 0.
- Centering at t = 3.5 will result in an estimated correlation equal to 0.
- Centering at t = 8 will result in an estimated correlation greater than 0.
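A small simulation (with assumed values, not the slide's data) shows why this happens: if the intercept is defined at time c, each person's intercept becomes b0 + c·b1, so its covariance with the slope is Cov(b0, b1) + c·Var(b1), which changes sign as c grows.

```r
set.seed(1)
n  <- 5000
b1 <- rnorm(n)                         # person-specific slopes
b0 <- -0.5 * b1 + rnorm(n, sd = 0.5)   # intercepts at t = 0 (assumed model)

# Correlation between the slope and the intercept defined at time c
cor_at <- function(c) cor(b0 + c * b1, b1)

cor_at(0)    # negative
cor_at(0.5)  # near zero, since here Cov(b0, b1) = -0.5 and Var(b1) = 1
cor_at(3)    # positive
```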
G and R matrices
The model implied covariance matrix is a function of two matrices:
- G is a covariance matrix for level-2 random coefficients.
- R is a covariance matrix for level-1 random coefficients.
- Usually the R matrix is diagonal, but some programs allow you to specify a different structure for this matrix.
Calculating the implied covariance (1)
The implied covariance matrix is a function of both the G and R matrices.
Technical detail: Calculating the implied covariance
Var(y) = Z G Zᵀ + R
Modelling non-linear change over time
Modelling Non-linear Change Over Time
Change considered until now has been purely linear.
General Approaches to Dealing with Non-linearity
- Polynomial models
- Piecewise-discontinuous models
- Splines
Using polynomials to model non-linear change
- Linear regression is more flexible than the name implies, and can be used to model (some) non-linear relationships.
- Traditionally polynomials used to model non-linear relationships.
E(Y|X) = b0 + b1⋅X + b2⋅X² + ... + bp⋅Xᵖ
- Polynomial functions are only valid in a restricted range.
- Intercept and linear terms must be included in the model for it to be meaningful (regardless of significance).
Model 7: Polynomial change
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 2: β2i = γ20 + U2i
Level 1: yti = β0i + β1i⋅Timeti + β2i⋅Time²ti + ϵti
Composite:
yti = (γ00 + U0i) + (γ10 + U1i)⋅Timeti + (γ20 + U2i)⋅Time²ti + ϵti   (fixed + random intercept, linear slope, and quadratic slope, plus error)
Choosing Polynomial Degree in Longitudinal Models
Consider Data:
- Think about the data in terms of the number of times the curve changes direction (corresponding to the number of inflection points).
Statistical Approach:
- Include only and all of the polynomial orders that improve model fit.
Theoretical Approach:
- Include only those terms for which the experimenter predicted an effect.
Orthogonal polynomials
Natural Polynomials:
- Allow testing for differences at "Time 0."
- Useful when such differences need to be tested.
Orthogonal Polynomials:
- Provide the same estimates as natural polynomials.
- Uncorrelated, so p-values may differ (sometimes considerably).
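A quick base-R check (on simulated quadratic data, so the numbers are purely illustrative): raw and orthogonal polynomials yield the same fitted curve, but the orthogonal terms are uncorrelated with each other.

```r
# Simulated quadratic growth data (assumed coefficients)
set.seed(42)
t <- rep(0:9, times = 20)
y <- 2 + 0.8 * t - 0.05 * t^2 + rnorm(length(t))

m_raw  <- lm(y ~ poly(t, 2, raw = TRUE))  # natural polynomials
m_orth <- lm(y ~ poly(t, 2))              # orthogonal polynomials

# Identical fitted curves...
max(abs(fitted(m_raw) - fitted(m_orth)))  # effectively zero

# ...but the orthogonal terms are uncorrelated with each other
cor(poly(t, 2)[, 1], poly(t, 2)[, 2])
```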
Splines
- Splines allow for flexibility in modeling complex, non-linear relationships
- They can capture local patterns and smooth changes over time
- Reduce overfitting compared to high-order polynomials
- Cubic splines are commonly used for their smoothness and continuity
- Natural splines can impose constraints to reduce extreme behavior at the endpoints
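A minimal sketch using the splines package that ships with base R (simulated, clearly non-linear data; the sine curve and df = 4 are assumptions for illustration): a natural cubic spline captures curvature that a straight line misses.

```r
library(splines)  # ships with base R

set.seed(7)
t <- seq(0, 10, length.out = 200)
y <- sin(t) + rnorm(length(t), sd = 0.3)   # assumed non-linear trend

m_lin    <- lm(y ~ t)              # straight line
m_spline <- lm(y ~ ns(t, df = 4))  # natural cubic spline, 4 df

# The spline leaves much less unexplained variance
c(linear = sum(resid(m_lin)^2), spline = sum(resid(m_spline)^2))
```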
Piecewise linear models
- Better fit for discontinuous data:
- More accurately capture abrupt changes
- Interpretability:
- Coefficients of piecewise continuous models have simple interpretations
- Parsimonious representation:
- May require fewer parameters than high-degree polynomial
- Avoid overfitting:
- Less likely to overfit the data compared to high-degree polynomial
Piecewise linear models: coding
These two coding schemes only differ in the interpretation of the regression coefficients.
- In scheme 1 the two slope coefficients represent the actual slope in the respective time period.
- In scheme 2 the coefficient for time 2 represents the deviation from the slope in period 1; i.e. if the estimate is 0, the rate of change is the same in both periods.
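The two coding schemes can be verified on a small deterministic example (assumed slopes of 2 before and 5 after a knot at t = 3):

```r
# Deterministic piecewise data: slope 2 up to the knot, slope 5 after
t    <- 0:6
knot <- 3
y    <- 2 * pmin(t, knot) + 5 * pmax(t - knot, 0)

# Scheme 1: each coefficient is the actual slope in its period
time1 <- pmin(t, knot)
time2 <- pmax(t - knot, 0)
s1 <- coef(lm(y ~ time1 + time2))
s1  # time1 = 2, time2 = 5

# Scheme 2: second coefficient is the deviation from the period-1 slope
s2 <- coef(lm(y ~ t + time2))
s2  # t = 2, time2 = 3 (i.e. 5 - 2)
```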
Getting p-values
- P-values are not reported by default, as the number of degrees of freedom is hard to calculate.
- Some approximations can be made, for example using the lmerTest package.
# P-values for individual coefficients
require(lmerTest)
m1 <- lmer(y ~ x + (x | id), data = dt)
coef(summary(m1))

# Alternatively, confidence intervals
confint(m1)                               # Profile CI
confint(m1, method = "boot", nsim = 100)  # Bootstrapped CI

# Alternatively, compare models with anova()
anova(m1, m2)
Reporting growth curve results
Model selection:
- Report the criteria used to select the best-fitting model, e.g., AIC, BIC, or likelihood ratio tests.
Fixed effects:
- Report the estimated coefficients, standard errors, t-values, and p-values for each fixed effect predictor.
- Interpret the direction and magnitude of the relationships between the predictors and the outcome variable.
Random effects:
- Report the estimated variances and standard deviations for random intercepts and slopes.
- Interpret the amount of variability in the intercepts and slopes across the different levels of the model.
Model fit:
- Report goodness-of-fit statistics, such as pseudo R-squared or deviance explained.
- Compare the selected model with alternative or null models, if applicable.
Visualizations:
- Include relevant plots, such as individual growth curves, group-level trajectories, or predicted values versus observed values.
Model 8
Why is there random intercept variance?
Model 8
Level 2: β0i=γ00+γ01⋅Groupi+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+γ01⋅Groupi+U0i)+(γ10+U1i)⋅Timeti+ϵti
Model 9
Level 2: β0i=γ00+γ01⋅Groupi+U0i
Level 2: β1i=γ10+γ11⋅Groupi+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+γ01⋅Groupi+U0i)+(γ10+γ11⋅Groupi+U1i)⋅Timeti+ϵti
Inferential consequences of including random effects
Effect on t/p-values:
- The impact on t/p-values of estimating random components depends on the level.
- Moving variance from Level 1 to Level 2 can provide more power to detect Level 1 predictors.
Improved Model Fit:
- Including random effects can help account for unobserved heterogeneity in the data, leading to better model fit.
Generalizability:
- Including random effects allows for more accurate generalizations to the larger population.
Cautions:
- Adding random effects can increase model complexity.
- It is important to carefully consider the theoretical justifications for including random effects in the model.
Technical issues: Parameter estimation (ML vs. REML)
ML (Maximum Likelihood)
- Estimates fixed and random effects simultaneously
- Can result in biased estimates of random effect variances
- Suitable for model selection and comparison
REML (Restricted Maximum Likelihood)
- Estimates fixed effects and random effects separately
- Provides unbiased estimates of random effect variances
- Less suitable for model selection and comparison
Which to use?
- For unbiased variance estimates, use REML
- For model selection or comparison, use ML
- Sometimes, both are used in a two-step process: REML to estimate variance components and ML to compare models
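The bias issue is the same one known from ordinary regression: the ML variance estimate divides the residual sum of squares by n, ignoring the degrees of freedom spent on the fixed effects, while the REML-style estimate divides by n − p. A base-R sketch on simulated data (all values assumed):

```r
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)   # assumed data-generating model

m   <- lm(y ~ x)
rss <- sum(resid(m)^2)
p   <- length(coef(m))      # 2 fixed effects: intercept and slope

s2_ml   <- rss / n          # ML estimate (biased downward)
s2_reml <- rss / (n - p)    # REML-style estimate (unbiased)
c(ML = s2_ml, REML = s2_reml)  # ML is always the smaller of the two
```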
Technical issues: Standardization
Purpose: To compare the relative importance of predictors and facilitate interpretation.
Procedure:
- Standardize continuous predictor variables (center and scale)
- Refit the linear mixed model with standardized predictors
- Interpret the standardized coefficients as effect sizes
Interpretation:
- Each standardized coefficient represents the change in outcome variable (in standard deviations) for a one standard deviation change in the predictor
- Larger absolute values indicate stronger relationships between predictors and the outcome variable
Considerations:
- Only applicable to continuous predictor variables
- Ensure that model assumptions are met before interpreting standardized coefficients
- Be cautious when comparing standardized coefficients across models with different fixed and random effects structures
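The relationship between raw and standardized coefficients can be checked in base R; scale() centers and scales a variable, and for a single predictor the standardized slope equals the raw slope rescaled by sd(x)/sd(y). The data below are simulated purely for illustration.

```r
set.seed(11)
x <- rnorm(100, mean = 50, sd = 10)
y <- 3 + 0.2 * x + rnorm(100)   # assumed data-generating model

b_raw <- coef(lm(y ~ x))[2]                # raw slope
b_std <- coef(lm(scale(y) ~ scale(x)))[2]  # standardized slope

# Standardized slope equals the raw slope rescaled by sd(x)/sd(y)
c(standardized = unname(b_std), rescaled_raw = unname(b_raw) * sd(x) / sd(y))
```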
Technical issues: Pseudo R2
- There is no MLM statistic entirely equivalent to R2 in ordinary regression.
- While R2 increases monotonically as independent variables are added in linear regression, this may not hold in multilevel models.
- In MLM we can calculate an estimate of explained variance at each level.
- This is referred to as pseudo R2, and it can behave in unusual ways.
- It can decrease as a predictor at another level is included.
Example: Modelling popularity in pupils
SECTION: A closer look at random effects
Keeping it maximal
A full or maximal random effect structure is the case where all of the factors that could hypothetically vary across individual observational units are allowed to do so.
The general principle is keep it maximal (Barr et al., 2013): the random effects should include as much of the structure of the data as possible.
m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject),
            data = WordLearnEx, REML = FALSE)
m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject),
            data = WordLearnEx, REML = FALSE)
Removing a time term from the random effects primarily reduces the standard error of the corresponding fixed-effect estimate, making it appear more significant.
Example: Cognitive Performance and Age
Research question: Does cognitive performance decline with age and does this decline rate vary between individuals?
Data structure:
- Repeated measures of cognitive performance for each individual
- Age as a time-varying predictor
Linear mixed model:
- Random intercepts and slopes for age
- Correlation between intercepts and slopes
Reason to Drop the Intercept-Slope Correlation
Hypothesis: Initial cognitive performance and rate of decline are unrelated.
- Example: Individuals with high initial performance decline at the same rate as those with low initial performance.
Model modification: Set the correlation between intercepts and slopes to zero.
- This enforces the hypothesis that initial performance and decline rate are independent.
Interpretation: If the modified model fits the data well, it suggests that the rate of cognitive decline is not dependent on initial performance levels.
How to specify non-correlated intercept and slope
m_corr    <- lmer(y ~ time + (time | id), data = dt)                 # correlated intercept and slope
m_noncorr <- lmer(y ~ time + (1 | id) + (0 + time | id), data = dt)  # uncorrelated
Effect of omitting random coefficients
m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject))

G = [ Var(U0)       Cov(U0,U1)   Cov(U0,U2)
      Cov(U0,U1)    Var(U1)      Cov(U1,U2)
      Cov(U0,U2)    Cov(U1,U2)   Var(U2)   ]

m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject))

G = [ Var(U0)       Cov(U0,U1)   0
      Cov(U0,U1)    Var(U1)      0
      0             0            0 ]

m_3 <- lmer(Accuracy ~ (ot1+ot2)*TP + (1 | Subject) + (0+ot1 | Subject) + (0+ot2 | Subject))

G = [ Var(U0)   0         0
      0         Var(U1)   0
      0         0         Var(U2) ]
Pooling
Depending on the variation among clusters, which is itself learned from the data, the model pools information across clusters. This pooling tends to improve estimates about each cluster.
Complete pooling: Ignore the varying intercepts and just use the overall mean across all clusters.
No pooling: Estimate a separate fixed effect in each cluster.
Partial pooling: The multilevel approach. Extreme values are pulled towards the overall average, and the pull is stronger for smaller groups.
1. Complete Pooling
This approach assumes the treatment has the same effect on everyone, ignoring individual-specific variation. It estimates an average treatment effect across all individuals.
Consequences:
- Biased estimates: Complete pooling may yield biased treatment effect estimates due to ignored individual variation.
- Inaccurate inferences: Hypothesis tests and confidence intervals may be misleading, leading to incorrect conclusions.
- Lack of personalization: No information about the treatment's effectiveness for specific individuals, hindering tailored interventions.
2. Partial Pooling
This approach estimates separate effects for each individual while "borrowing strength" from the group, typically using random effects or mixed models. It allows for individual-specific treatment effects and considers the overall population effect.
Consequences:
- More accurate estimates: Partial pooling accounts for individual variation, providing more accurate treatment effect estimates.
- Improved inferences: Hypothesis tests and confidence intervals are more accurate, leading to reliable conclusions.
- Personalized insights: Provides information on the treatment's effectiveness for specific individuals, aiding personalized interventions.
Shrinkage in partial pooling
Pooling data across clusters tends to shrink their deviation from the overall mean levels.
Shrinkage effect on individual participant intercept and linear term parameter estimates. For each participant, the arrow shows the change in the parameter estimate from a model that treats participants as fixed effects (open circles) to a model that treats participants as random effects (filled circles). The black vertical and horizontal lines indicate the population-level fixed effect.
3. No Pooling
This approach estimates separate effects for each individual without sharing information between them, fitting separate models and ignoring potential similarities.
Consequences:
- Overfitting: No pooling may overfit the data, resulting in estimates that don't generalize well.
- Inefficient data use: Not sharing information between individuals leads to less efficient use of data and less precise estimates.
- Interpretation challenges: Separate models for each individual make it hard to draw overall conclusions about treatment effectiveness.
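The three approaches can be contrasted with the precision-weighted shrinkage formula that partial pooling implies for a cluster mean. All quantities below are assumed values chosen for illustration, not estimates from any dataset.

```r
# Assumed quantities for one small cluster
ybar_j <- 10   # cluster's own mean: the no-pooling estimate
mu     <- 4    # grand mean: the complete-pooling estimate
n_j    <- 3    # cluster size
s2     <- 9    # within-cluster variance (assumed)
tau2   <- 4    # between-cluster variance (assumed)

# Partial pooling: precision-weighted average of the two extremes
w <- (n_j / s2) / (n_j / s2 + 1 / tau2)
partial <- w * ybar_j + (1 - w) * mu
partial  # lies between mu and ybar_j; closer to mu when n_j is small
```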
Constructing a longitudinal model (Hoffman)
- Building an unconditional model of change
- Decide what your metric of time will be
- Decide at what occasion time 0 should be located
- Plot individual trajectories over time
- What kind of change, if any, do you see on average?
- Do you see individual differences in that pattern of change?
- Fit a null model (random intercept)
- Calculate the ICC. If it is close to 1.0, there is no longitudinal variation to model.
- Add level-1 fixed predictors
- Add level-2 explanatory variables
- Examine whether a particular slope varies between groups
- Add cross-level interactions to explain variation in the slope.
Within-participant effects
SECTION: Categorical time invariant predictors
Time varying and time invariant predictors
- A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
- In the context of education, a time-varying predictor might be the number of hours in the previous 30 days a student has spent studying.
- Time-varying predictors appear at level 1 because they are associated with specific measurements.
On the other hand, a predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.
- An example of this type of predictor would be gender.
- Time-invariant predictors appear at level 2 or higher, because they are associated with the individual.
Unless time-varying predictors are group-mean centered, BP and WP variance may not be cleanly separated.
Time-Varying and Time-Invariant Predictors in Longitudinal Models
Time-varying predictors
- A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
- Change over time within individuals
- Predict within-person fluctuations
- Can be used to model individual trajectories
- Need to consider potential lagged effects, autocorrelation, and time-dependent confounders
Time-invariant predictors
- A predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.
- Do not change over time within individuals
- Predict between-person differences
- Can be used to model group differences in trajectories
- Need to consider potential multicollinearity with other time-invariant predictors
SECTION: Higher order longitudinal models
Introducing higher level variability
Three-Level Linear Mixed Model: Depression Over Time
Outcome: Depression score over time
Data structure:
- Repeated measures of depression for each patient
- Patients nested within therapists
Model components:
- Fixed effects: Population-level effects of predictors (e.g., time, patient characteristics, therapist characteristics)
- Random effects: Individual-level variability in intercepts (baseline depression) and slopes (depression change over time)
- Third-level random effects: Therapist-level variability in intercepts and slopes
Model interpretation:
- Fixed effects describe the average relationships between predictors and depression scores
- Random effects capture the variability in depression scores and change over time across patients
- Third-level random effects account for variability in patient outcomes and change over time due to therapist differences
Benefits:
- Acknowledges the hierarchical structure of the data, resulting in more accurate inferences
- Provides insights into the sources of variability in depression scores and change over time
- Can be used to inform interventions targeted at the patient or therapist level
L3 - Model 1
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 2
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 3
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 4
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100
Level 1: ytib=β0ib+β1ib⋅Time+ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + γ100⋅Timetib + ϵtib
L3 - Model 5
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100+U1ib
Level 1: ytib=β0ib+β1ib⋅Time+ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + (γ100 + U1ib)⋅Timetib + ϵtib
L3 - Model 6
Level 3: γ00b=δ000+V00b
Level 3: γ10b = δ100 + V10b
Level 2: β0ib = γ00b + U0ib
Level 2: β1ib = γ10b
Level 1: ytib = β0ib + β1ib⋅Timetib + ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + (δ100 + V10b)⋅Timetib + ϵtib
GPA exercise (1)
GPA exercise (2)
Level-3 exercise (2)
SECTION: Transforming data from wide to long format
Wide vs. Long format
Left: Wide format; each measurement of the same variable is in a separate column. Right: Long format; each measurement of the same variable is in a separate row.
Pivot_longer and pivot_wider
The tidyverse is a collection of open source R packages that share an underlying design philosophy, grammar, and data structures.
In the tidyverse approach to R syntax, these two functions are used to transform a dataset between wide and long format.
- pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.
- pivot_wider() is the opposite of pivot_longer(): it makes a dataset wider by increasing the number of columns and decreasing the number of rows.
See these pages for documentation and examples:
Creating a synthetic wide dataset
library(tidyverse)

dt <- tibble(
  id   = 1:20,
  male = sample(0:1, 20, replace = TRUE),
  dep1 = rnorm(20), dep2 = rnorm(20), dep3 = rnorm(20),
  anx1 = rnorm(20), anx2 = rnorm(20), anx3 = rnorm(20)
)
print(dt, n = 10)

## # A tibble: 20 x 8
##       id  male   dep1    dep2    dep3    anx1    anx2   anx3
##    <int> <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1     1     1 -0.709  0.0301  0.189   0.357   0.407   0.199
##  2     2     0 -1.10  -0.696  -0.0357  1.37   -0.0255  0.880
##  3     3     0 -1.06  -0.104  -0.315   0.226   0.585  -1.17
##  4     4     1  0.490  1.42    0.0567 -0.182  -0.347   1.17
##  5     5     1  0.149 -1.25    0.284  -0.354   0.0755  0.948
##  6     6     0  1.39  -0.828   0.392  -0.197   1.55    0.847
##  7     7     0 -1.41  -1.94   -2.02   -0.0964 -0.182   1.58
##  8     8     0 -0.592 -0.356  -1.30    0.434  -0.600   0.352
##  9     9     0  1.03   0.359   0.0586  0.980   0.921   0.793
## 10    10     1 -1.02   1.81    0.776   0.252   0.556  -0.589
## # i 10 more rows
Pivot from wide to long
dt_l <- dt %>%
  pivot_longer(
    cols = dep1:anx3,
    names_sep = 3,
    names_to = c("disorder", "number")
  ) %>%
  pivot_wider(names_from = "disorder") %>%
  rename("wave" = "number", "anxiety" = "anx", "depression" = "dep")
print(dt_l, n = 10)

## # A tibble: 60 x 5
##       id  male wave  depression anxiety
##    <int> <int> <chr>      <dbl>   <dbl>
##  1     1     1 1        -0.709   0.357
##  2     1     1 2         0.0301  0.407
##  3     1     1 3         0.189   0.199
##  4     2     0 1        -1.10    1.37
##  5     2     0 2        -0.696  -0.0255
##  6     2     0 3        -0.0357  0.880
##  7     3     0 1        -1.06    0.226
##  8     3     0 2        -0.104   0.585
##  9     3     0 3        -0.315  -1.17
## 10     4     1 1         0.490  -0.182
## # i 50 more rows