
PSY9185 - Multilevel models

Longitudinal data analysis using multilevel models

Nikolai Czajkowski

1 / 104

Today


  • Introduction to longitudinal data and LMM.
  • Plotting longitudinal data.
  • Basic two-level longitudinal models
    • Fixed and random coefficients
  • Non-linear and piecewise continuous change
  • Including predictors in the model
    • Time invariant predictors
    • Time varying predictors
2 / 104

Books


  • Several examples borrowed from Mirman (2014).
  • Lesa Hoffman has links to several full video courses in longitudinal modelling on her homepage.
3 / 104

R resources

Fitting mixed models in R:

Books on R:

Online resources

4 / 104

SECTION: Introducing longitudinal analysis

6 / 104

Multilevel models for clustered data (1)

7 / 104

Classes 2

8 / 104

Classes 3

9 / 104

Classes 4

10 / 104

Two sides of all models

  • Models are an idealized account of:

    1. How the expected value is related to an independent variable (mean).
    2. How people vary (variance).
  • Linear regression only models the mean value, not the variance.

11 / 104

Assumptions in linear models

Yi = b0 + b1·Xi + ϵi,   ϵi ~ N(0, σ²)

  1. The relationship between X and Y is linear.
  2. The error variance (around the regression line) is normally distributed and is constant (i.e. the sd is the same) at every level of X.
  3. The observations are independent.
  4. The independent variables are measured without error.
12 / 104

Which assumption is really important?


  • We often focus on the extent to which the residuals are normally distributed, but in practice this rarely has a major impact on the standard errors (or p-values).
  • Dependency in the data, however, can severely bias all p-values.
13 / 104

Multilevel data structures


  • Data can be nested within higher order units.
  • Different variables are measured at different levels.
    • Traditionally handled by aggregation or disaggregation.

14 / 104

Longitudinal data structures


  • In longitudinal multilevel models, observations are nested within persons.

    • Level 1 units are individual observations
    • Level 2 units are usually individuals
    • People may be nested under higher order units (therapists, treatment centers, etc).
  • Time will be included as a level 1 predictor.

15 / 104

What are longitudinal models?


  • Time course data are the result of repeated measurements at multiple (at least two) time points.

    • These sorts of data are also called longitudinal.
  • Two key properties distinguish time course data from other kinds of data:

    1. Groups of observations all come from one source (nested data).
    2. The repeated measurements are related by a continuous variable; usually that variable is time, but any continuous variable will do.
    • We will also look at other variables, like set-size.
  • For analyses we will use multilevel (mixed) models that include the time course as a level-1 predictor.

16 / 104
  • If you asked participants to name letters printed in different sizes, you could examine the outcome (letter-recognition accuracy) as a function of the continuous predictor size.

Benefits of multilevel models in general

  • Flexibility in modelling dependency across observations

    • LMMs provide a flexible framework to model and account for various sources of non-independence, reducing biased estimates and increasing the accuracy of inferences.
  • Avoid choosing between individuals or groups as the unit of analysis

    • Cross-level interactions are uniquely possible in MLM.
  • Increased statistical power

    • By including random effects, LMMs can improve the precision of estimates and increase statistical power compared to traditional regression models.
  • Generalizability

    • LMMs allow researchers to investigate both the average effect of predictors and their variations across different levels, helping to generalize findings across various contexts and populations.
  • Study variation between and within people

    • Ask more complex and interesting questions
17 / 104

Benefits of MLM for longitudinal data

  • Handles dependency
    • Traditional analyses are designed to handle only dependency due to constant mean differences.
  • Handles missing data
    • The number of observations can vary between individuals.
  • Flexible treatment of time
    • Measures can be taken at fixed or at varying occasions.
    • No assumption that an equal amount of time has elapsed.
  • Investigate within-person relationships
    • Between and within effects can differ, even in direction.

18 / 104
  • Longitudinal studies provide the opportunity to test hypotheses at multiple levels of analysis simultaneously.

    • These models allow us to examine both between-person and within-person relationships in the same variables at the same time.
    • For example, we might posit a link between stress and negative mood, such that greater amounts of stress will result in greater negative mood. But at what level of analysis is this relationship likely to hold: between persons, within persons, or both?

SECTION: Plotting longitudinal data

20 / 104

Plotting longitudinal data

Plotting is an even more critical first step in longitudinal analysis than in cross-sectional studies.

We will make the following classes of figures:

  • Individual change curves.
  • Plots of group mean levels and CI/confidence regions.
  • Stratified plots (assess interactions).

Why ggplot?

  1. Plots are of publication-grade quality.
  2. The syntax is a flexible and powerful framework for visualizing data.
  3. Summary statistics like means and standard errors can be computed while plotting.
21 / 104

Introduction to ggplot2

  • ggplot2 is a powerful R package for data visualization
  • Built on the principles of the "Grammar of Graphics"
    • First, assign variables in your data to properties of the graph; these assignments (mappings) are called the aesthetics of your graph. Then select "geometries," or geoms (points, lines, bars, etc.), for those aesthetics.
  • Create complex, multi-layered plots using a consistent syntax
  • Easily customizable and extensible
  • Widely used in data analysis and research
22 / 104

Spaghetti Plots for Longitudinal Data

  • Visualize individual trajectories over time
  • Identify patterns and trends in the data
  • Assess variability between and within groups
  • Detect potential outliers or influential cases
  • Compare individual trajectories to the overall fixed effect of time
23 / 104

Generating a Spaghetti Plot with ggplot2

Below is the R code used to generate the plot.

library(ggplot2)

# Create spaghetti plot: one line per individual, plus the overall fixed effect of time
ggplot(data, aes(x = time, y = y, group = id, color = group)) +
  geom_line(alpha = 0.5) +
  geom_line(data = fixed_effect_data, aes(x = time, y = y, group = 1),
            color = "black", linewidth = 1.5, linetype = "solid") +
  labs(title = "Example Spaghetti Plot", x = "Time", y = "Outcome") +
  theme_minimal()

# Save plot to a file
ggsave("example_spaghetti_plot.png", width = 7, height = 5)
24 / 104

SECTION: Time-only (Unconditional) models

25 / 104

Features of longitudinal data

Multilevel longitudinal models permit analyses at different levels:

  • Between-person variance:

    • Inter-individual variation.
    • Ex. Biological sex, ethnicity.
  • Within-person variance:

    • Intraindividual variation.
    • Ex. Sleep the previous night.

Relationships observed at the within-person level need not (and often will not) mirror those at the between-person level of analysis.
  • E.g. resting heart rate vs. exercise.
26 / 104

Features of longitudinal data (2)


  • Within-person change: A specific type of within-person variation that refers to any systematic change expected as a result of the meaningful passage of time.
  • Within-person fluctuation: Undirected variation over repeated assessments, seen in contexts in which one would not expect systematic change.

27 / 104

Unconditional longitudinal models


In unconditional longitudinal models, time is the only predictor.

  • Included as a level-1 variable.

Critical first questions:

  • What units should time be measured in?
  • What constitutes "time=0"?
    • E.g., if time is age, it is best to center age at a meaningful value rather than 0.
  • What form do you expect change over time to take?
    • Linear, non-linear, abrupt/discontinuous?
28 / 104

Introducing idealized growth curves

29 / 104

Between-person and Within-person empty models

Empty models partition the variance, but don't account for it.

Between person empty model

yti = β0 + ϵti, where β0 is the fixed intercept and ϵti the error.

Within person empty model

yti = β0 + U0i + ϵti, where β0 is the fixed intercept, U0i the random intercept, and ϵti the error.

30 / 104

Fixed and random effects


Level 2: β0i=γ00+U0i
Level 1: yti=β0i+ϵti

Composite: yti=(γ00+U0i)+ϵti

yti=β0+u0i+ϵti

  • We instead assume that deviation from the grand mean is attributable to a normally distributed residual at level 2. This variance is then accounted for by estimating a single parameter (the variance of U0i) instead of the fixed effects of, say, 49 dummy variables. Random effects are not designed for inferences about the specific units included in the study (you cannot compare participant 1 to participant 50). Could we include a random effect of time? That would require at least three measurements per person.
31 / 104

Why random intercept?


  • Differences between individuals could be handled by N-1 dummy variables.

    • Many estimated parameters and loss of power.
  • With MLM, clustering is accounted for by only one parameter, the variance of a normal distribution.

32 / 104

Model 1: Random intercept

Level 2: β0i = γ00 + U0i,   U0i ~ N(0, σ²U0)
Level 1: yti = β0i + ϵti,   ϵti ~ N(0, σ²ϵ)

Composite: yti = (γ00 + U0i) + ϵti, where γ00 is the fixed intercept, U0i the random intercept, and ϵti the error.

33 / 104

Model 1: Random intercept


Level 2: β0i = γ00 + U0i,   U0i ~ N(0, σ²U0)
Level 1: yti = β0i + ϵti,   ϵti ~ N(0, σ²ϵ)

Composite: yti = (γ00 + U0i) + ϵti, where γ00 is the fixed intercept, U0i the random intercept, and ϵti the error.

# Load necessary libraries
library(lme4)
# Fit the random intercept model
model1 <- lmer(y ~ 1 + (1|subject), data=data)
# Print the model summary
summary(model1)
34 / 104

Fixed and random effects


Multilevel (mixed) models contain two classes of coefficients.

Fixed effects: The "structural" part of the model, specifying the expected conditional mean.

  • Empty model: Only intercepts are included.
  • Unconditional models: Intercepts + passage of time.
  • Conditional models: Intercepts + variables accounting for differences in the passage of time.

Random effects: Coefficients that specify the stochastic (error) part of the model.

  • Describes how the residuals of the Y outcome are distributed and related across the observations.
35 / 104

Model 2: Random intercept

Level 2: β0i = γ00 + U0i,   U0i ~ N(0, σ²U0)
Level 1: yti = β0i + ϵti,   ϵti ~ N(0, σ²ϵ)

Composite: yti=(γ00+U0i)+ϵti

36 / 104

Intraclass Correlation Coefficient (ICC)


  • Measures the proportion of total variance in the outcome that is attributable to the grouping structure (e.g., individuals within groups)
  • Ranges from 0 to 1:
    • 0: No correlation between observations within the same group
    • 1: Observations within the same group are identical
  • Important for assessing group-level effects in mixed models
  • A high ICC indicates a strong grouping effect and the need for a multilevel or mixed model to account for the clustered data structure
37 / 104

Intraclass Correlation Coefficient (ICC) 2


ICC = BP variation / (BP + WP variation) = Var(U0i) / (Var(U0i) + Var(ϵti))
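As a quick numeric illustration of the formula above, a minimal base-R sketch with invented variance components (these values are not from any fitted model):

```r
# ICC from assumed variance components (illustrative values only):
var_u0  <- 4.0   # Var(U_0i), between-person variance of the random intercept
var_eps <- 6.0   # Var(e_ti), within-person residual variance
icc <- var_u0 / (var_u0 + var_eps)
icc  # 0.4: 40% of the total variance lies between persons
```

In practice these components would come from the summary of a fitted random-intercept model.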
38 / 104

Model 3: Fixed slope, Random intercept

Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1iTimeti+ϵti

Composite: yti = (γ00 + U0i) + γ10·Timeti + ϵti, where γ00 is the fixed intercept, U0i the random intercept, γ10 the fixed slope, and ϵti the error.

39 / 104

Comparing Linear Mixed Models

  • Likelihood Ratio Tests:

    • Performed with the anova() function.
    • Only valid under maximum likelihood (ML) estimation, not restricted maximum likelihood (REML).
  • Information Criteria:

    • Akaike Information Criterion (AIC) can be calculated using the AIC() function.
  • Pseudo R2:

    • Purpose: Assess the proportion of variance explained by the predictors in the model, similar to R2 in linear regression.
    • Calculated separately for Level 1 and Level 2 effects.
    • Range: 0 to 1; higher values indicate better model fit.
    • Limitation: Not directly comparable to R2 in linear regression. Interpretation should be cautious.
40 / 104

Over vs. underfitting (Scylla and Charybdis)

Overfitting: Poor prediction as a result of learning too much from your data.
Underfitting: Poor prediction as a result of learning too little from your data.

  • You can always get better fit to your data by adding more predictors.
  • Every dataset contains both systematic and unsystematic variance (noise), so overly complex models may fit well to the current data, but predict less variance in new data.

https://xcelab.net/rm/statistical-rethinking/

41 / 104

What happens as the model becomes too complex?

Overfitting: The green curve is the true (generating) function. The red curve fits better, but adapts to much of the noise in the data.
42 / 104

k-fold cross validation


  • Cross-validation involves partitioning the data into training and test subsets.
  • The model is built on the training sets and evaluated on the test sets.
    • Cornerstone in machine learning approaches.

(Introduction to statistical learning)

43 / 104

k-fold cross-validation (2)

Notice:

  1. It is easy to explain variance in the training data, but far more difficult to explain it in (new) test data.
  2. Initially, as the model grows in complexity, explained variance increases. However, as it gets overly complex, explained variance in test data starts to decrease.
44 / 104

Method 3: Information criteria (AIC / BIC)

  • Information criteria (IC) are a class of statistics devised to strike the optimal balance between over- and underfitting.
  • The model with the lowest AIC/BIC value is chosen as the best one.
    • Non-nested models can be compared.
45 / 104

Model 4: Fixed slope, Random intercept

Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1iTimeti+ϵti

Composite: yti=(γ00+U0i)+γ10Timeti+ϵti

46 / 104

Model 5: Random slope, Random intercept

Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1iTimeti+ϵti

Composite: yti = (γ00 + U0i) + (γ10 + U1i)·Timeti + ϵti, where γ00/γ10 are the fixed intercept and slope, U0i/U1i the random intercept and slope, and ϵti the error.
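The composite equation above can be made concrete by simulating from it. A base-R sketch with invented parameter values (the lme4 call in the final comment is how such a model would typically be fit):

```r
# Simulate data from a random-intercept, random-slope model.
# All parameter values are invented for illustration.
set.seed(1)
n_id <- 50; n_t <- 5
gamma00 <- 10; gamma10 <- 2               # fixed intercept and slope
u0 <- rnorm(n_id, mean = 0, sd = 2)       # random intercepts U_0i
u1 <- rnorm(n_id, mean = 0, sd = 0.5)     # random slopes U_1i
dat <- expand.grid(time = 0:(n_t - 1), id = seq_len(n_id))
dat$y <- (gamma00 + u0[dat$id]) + (gamma10 + u1[dat$id]) * dat$time +
  rnorm(nrow(dat), mean = 0, sd = 1)      # level-1 residual e_ti
# Such a model would be fit with, e.g., lme4::lmer(y ~ time + (time | id), data = dat)
```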

47 / 104

How do these differ?

48 / 104

Random effects can be correlated

Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1iTimeti+ϵti

G = [ Var(U0i)        Cov(U0i, U1i) ]
    [ Cov(U0i, U1i)   Var(U1i)      ]
Composite: yti=(γ00+U0i)+(γ10+U1i)Timeti+ϵti

49 / 104

Sensitivity of the intercept-slope correlation to the centering of time

  • Impact on Interpretability:

    • Centering time can affect the interpretability of the (fixed) intercept.
  • Intercept-Slope Correlation:

    • The interpretation of any (random) intercept-slope correlation is conditional on the location of the intercept.
  • Example

    • Centering at t = 0 yields an estimated correlation below zero.
    • Centering at t = 3.5 yields an estimated correlation of zero.
    • Centering at t = 8 yields an estimated correlation above zero.
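This sensitivity follows from the fact that re-centering time at c turns the intercept into U0 + c·U1. A base-R sketch with an invented (but positive-definite) G matrix chosen so the sign of the correlation flips around t = 3.5:

```r
# How the intercept-slope correlation changes with the location of time 0.
# Illustrative G values: Var(U0) = 8, Var(U1) = 0.5, Cov(U0, U1) = -1.75.
var_u0 <- 8; var_u1 <- 0.5; cov_01 <- -1.75
corr_at <- function(c) {
  # Re-centering time at c makes the intercept U0 + c*U1
  cov_c <- cov_01 + c * var_u1                      # new intercept-slope covariance
  var_c <- var_u0 + 2 * c * cov_01 + c^2 * var_u1   # new intercept variance
  cov_c / sqrt(var_c * var_u1)
}
corr_at(0)    # negative
corr_at(3.5)  # zero: centering at c = -Cov(U0, U1) / Var(U1)
corr_at(8)    # positive
```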
50 / 104

G and R matrices



The model implied covariance matrix is a function of two matrices:

  • G is a covariance matrix for level-2 random coefficients.
  • R is a covariance matrix for the level-1 residuals.
    • Usually the R matrix is diagonal, but some programs allow you to specify a different structure for this matrix.
51 / 104

Calculating the implied covariance (1)



The implied covariance matrix is a function of both the G and R matrices.
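A minimal base-R sketch of the computation V = Z G Zᵀ + R for one person with four occasions; the G, R, and time values are invented for illustration:

```r
# Model-implied covariance of y_i for a random intercept + slope model.
times <- 0:3
Z <- cbind(1, times)                 # level-1 design matrix for the random effects
G <- matrix(c(4.0, 0.5,
              0.5, 1.0), 2, 2)       # covariance of (U0, U1); invented values
R <- diag(1.5, length(times))        # diagonal level-1 residual covariance
V <- Z %*% G %*% t(Z) + R            # implied covariance matrix of the 4 observations
V
```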
52 / 104

Technical detail: Calculating the implied covariance




53 / 104

Modelling non-linear change over time

54 / 104

Modelling Non-linear Change Over Time

Non-linear change


  • Change considered until now has been purely linear.

  • General Approaches to Dealing with Non-linearity

    • Polynomial models
    • Piecewise-discontinuous models
    • Splines
55 / 104

Using polynomials to model non-linear change

  • Linear regression is more flexible than the name implies, and can be used to model (some) non-linear relationships.
  • Traditionally, polynomials have been used to model non-linear relationships.

E(Y|X) = b0 + b1X + b2X² + ... + bpX^p

  • Polynomial functions are only valid in a restricted range.
  • Intercept and linear terms must be included in the model for it to be meaningful (regardless of significance).
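A base-R sketch of fitting the quadratic case of the equation above with ordinary least squares on simulated data (coefficients and noise level are invented):

```r
# Fit a quadratic trend with ordinary polynomial terms.
set.seed(2)
x <- seq(0, 10, length.out = 100)
y <- 1 + 0.5 * x - 0.05 * x^2 + rnorm(100, sd = 0.2)  # true curve plus noise
fit <- lm(y ~ x + I(x^2))   # E(Y|X) = b0 + b1*X + b2*X^2
coef(fit)
```

The same quadratic term carries over to a mixed model simply by adding `I(Time^2)` to the lmer formula.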
56 / 104

Model 7: Polynomial change

Level 2: β0i = γ00 + U0i
Level 2: β1i = γ10 + U1i
Level 2: β2i = γ20 + U2i
Level 1: yti = β0i + β1i·Timeti + β2i·Time²ti + ϵti

Composite: yti = (γ00 + U0i) + (γ10 + U1i)·Timeti + (γ20 + U2i)·Time²ti + ϵti, where γ00/γ10/γ20 are the fixed intercept, linear slope, and quadratic slope; U0i/U1i/U2i the corresponding random effects; and ϵti the error.

57 / 104

Choosing Polynomial Degree in Longitudinal Models

  • Consider Data:

    • Think about the data in terms of the number of times the curve changes direction (a polynomial of degree p can change direction up to p − 1 times).
  • Statistical Approach:

    • Include only and all of the polynomial orders that improve model fit.
  • Theoretical Approach:

    • Include only those terms for which the experimenter predicted an effect.
58 / 104

Orthogonal polynomials

  • Natural Polynomials:

    • Allow testing for differences at "Time 0."
    • Useful when such differences need to be tested.
  • Orthogonal Polynomials:

    • Produce the same overall fitted curve as natural polynomials.
    • Uncorrelated, so the individual estimates and p-values may differ (sometimes considerably).
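The uncorrelatedness can be verified directly in base R with poly(); six equally spaced time points are used here as an example:

```r
# Natural vs. orthogonal polynomial codes for 6 time points.
t_nat  <- 0:5
t_orth <- poly(t_nat, degree = 2)   # orthogonal linear and quadratic codes
crossprod(t_orth)                   # identity matrix: the codes are uncorrelated
cor(t_nat, t_nat^2)                 # natural linear and quadratic terms correlate highly
```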
59 / 104

Splines


  • Splines allow for flexibility in modeling complex, non-linear relationships
  • They can capture local patterns and smooth changes over time
  • Reduce overfitting compared to high-order polynomials
  • Cubic splines are commonly used for their smoothness and continuity
  • Natural splines can impose constraints to reduce extreme behavior at the endpoints

60 / 104

Piecewise linear models

  • Better fit for discontinuous data:
    • More accurately capture abrupt changes
  • Interpretability:
    • Coefficients of piecewise continuous models have simple interpretations
  • Parsimonious representation:
    • May require fewer parameters than a high-degree polynomial
  • Avoid overfitting:
    • Less likely to overfit the data compared to high-degree polynomials

61 / 104

Piecewise linear models: coding

These two coding schemes only differ in the interpretation of the regression coefficients.

  • In scheme 1 the two slope coefficients represent the actual slope in the respective time period.
  • In scheme 2 the coefficient for period 2 represents the deviation from the slope in period 1, i.e. if the estimate is 0 then the rate of change is the same in both periods.
https://rpsychologist.com/r-guide-longitudinal-lme-lmer#piecewise-growth-curve
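The two coding schemes can be constructed directly in base R; the breakpoint at t = 3 is an invented example:

```r
# Two codings of a breakpoint at t = 3 for time points 0..6.
t <- 0:6
# Scheme 1: each coefficient is the actual slope in its own period
t1a <- pmin(t, 3)         # time before the breakpoint (flat afterwards)
t2a <- pmax(t - 3, 0)     # time after the breakpoint (zero before)
# Scheme 2: the second coefficient is the *change* in slope after the breakpoint
t1b <- t                  # overall linear time
t2b <- pmax(t - 3, 0)     # deviation from the period-1 slope
cbind(t, t1a, t2a, t1b, t2b)
```

Either pair of columns can then be entered as predictors in place of a single Time term.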
62 / 104

Getting p-values

  • P-values are not reported by default, as the number of degrees of freedom is hard to calculate.
  • Some approximations can be made, for example using the lmerTest package.
# P-values for individual coefficients (Satterthwaite approximation)
require(lmerTest)
m1 <- lmer(y ~ x + (x | id), data = dt)
coef(summary(m1))

# Alternatively, confidence intervals
confint(m1)                               # profile CI
confint(m1, method = "boot", nsim = 100)  # bootstrapped CI

# Alternatively, compare nested models with anova()
anova(m1, m2)
63 / 104

Reporting growth curve results

  • Model selection:

    • Report the criteria used to select the best-fitting model, e.g., AIC, BIC, or likelihood ratio tests.
  • Fixed effects:

    • Report the estimated coefficients, standard errors, t-values, and p-values for each fixed effect predictor.
    • Interpret the direction and magnitude of the relationships between the predictors and the outcome variable.
  • Random effects:

    • Report the estimated variances and standard deviations for random intercepts and slopes.
    • Interpret the amount of variability in the intercepts and slopes across the different levels of the model.
  • Model fit:

    • Report goodness-of-fit statistics, such as pseudo R-squared or deviance explained.
    • Compare the selected model with alternative or null models, if applicable.
  • Visualizations:

    • Include relevant plots, such as individual growth curves, group-level trajectories, or predicted values versus observed values.
64 / 104

Model 8

Why is there random intercept variance?

65 / 104

Model 8

Level 2: β0i=γ00+γ01Groupi+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1iTimeti+ϵti

Composite: yti=(γ00+γ01Groupi+U0i)+(γ10+U1i)Timeti+ϵti

66 / 104

Model 9

Level 2: β0i=γ00+γ01Groupi+U0i
Level 2: β1i=γ10+γ11Groupi+U1i
Level 1: yti=β0i+β1iTimeti+ϵti

Composite: yti=(γ00+γ01Groupi+U0i)+(γ10+γ11Groupi+U1i)Timeti+ϵti

67 / 104

Inferential consequences of including random effects

  • Effect on t/p-values:

    • The impact on t/p-values of estimating random components depends on the level.
    • Moving variance from Level 1 to Level 2 can provide more power to detect Level 1 predictors.
  • Improved Model Fit:

    • Including random effects can help account for unobserved heterogeneity in the data, leading to better model fit.
  • Generalizability:

    • Including random effects allows for more accurate generalizations to the larger population.
  • Cautions:

    • Adding random effects can increase model complexity.
    • It is important to carefully consider the theoretical justifications for including random effects in the model.
68 / 104

Technical issues: Parameter estimation (ML vs. REML)

  • ML (Maximum Likelihood)

    • Estimates fixed and random effects simultaneously
    • Can result in biased estimates of random effect variances
    • Suitable for model selection and comparison
  • REML (Restricted Maximum Likelihood)

    • Estimates fixed effects and random effects separately
    • Provides unbiased estimates of random effect variances
    • Less suitable for model selection and comparison
  • Which to use?

    • For unbiased variance estimates, use REML
    • For model selection or comparison, use ML
    • Sometimes, both are used in a two-step process: REML to estimate variance components and ML to compare models
69 / 104

Technical issues: Standardization

  • Purpose: To compare the relative importance of predictors and facilitate interpretation.

  • Procedure:

    1. Standardize continuous predictor variables (center and scale)
    2. Refit the linear mixed model with standardized predictors
    3. Interpret the standardized coefficients as effect sizes
  • Interpretation:

    • Each standardized coefficient represents the change in outcome variable (in standard deviations) for a one standard deviation change in the predictor
    • Larger absolute values indicate stronger relationships between predictors and the outcome variable
  • Considerations:

    • Only applicable to continuous predictor variables
    • Ensure that model assumptions are met before interpreting standardized coefficients
    • Be cautious when comparing standardized coefficients across models with different fixed and random effects structures
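A base-R sketch of steps 1–3 above on simulated data with a single continuous predictor (a plain lm() stands in for the mixed model to keep the example self-contained; variable names are invented):

```r
# Standardize a continuous predictor and refit.
set.seed(3)
x <- rnorm(200, mean = 50, sd = 10)
y <- 2 + 0.3 * x + rnorm(200)
b_raw <- coef(lm(y ~ x))[["x"]]           # change in y per 1-unit change in x
x_std <- as.numeric(scale(x))             # center and scale: mean 0, sd 1
b_std <- coef(lm(y ~ x_std))[["x_std"]]   # change in y per 1-SD change in x
all.equal(b_std, b_raw * sd(x))           # the two slopes differ by a factor of sd(x)
```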
70 / 104

Technical issues: Pseudo R2


  • There is no MLM statistic entirely equivalent to R2 in ordinary regression.
  • While R2 increases monotonically as independent variables are added in linear regression, this need not hold in multilevel models.
  • In MLM we can calculate an estimate of explained variance at each level.
    • This is referred to as pseudo R2, and it can behave in unusual ways.
    • It can decrease when a predictor at another level is included.
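One common pseudo R2 is the proportional reduction in a variance component between a null and a conditional model; a numeric sketch with invented residual-variance estimates:

```r
# Level-1 pseudo R2 as proportional reduction in residual variance
# (illustrative values, not from fitted models).
var_eps_null <- 6.0   # level-1 residual variance, intercept-only model
var_eps_cond <- 4.5   # level-1 residual variance after adding a level-1 predictor
pseudo_r2_l1 <- (var_eps_null - var_eps_cond) / var_eps_null
pseudo_r2_l1  # 0.25: the predictor "explains" 25% of within-person variance
```

The same ratio applied to Var(U0i) across models gives a level-2 pseudo R2.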

71 / 104

Example: Modelling popularity in pupils

72 / 104

SECTION: A closer look at random effects

73 / 104

Keeping it maximal

  • A full or maximal random effect structure is the case where all of the factors that could hypothetically vary across individual observational units are allowed to do so.

  • The general principle is keep it maximal (Barr et al., 2013): the random effects should include as much of the structure of the data as possible.

m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject), data=WordLearnEx, REML=FALSE)
m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject), data=WordLearnEx, REML=FALSE)

Removing a time term from the random effects primarily reduces the standard error of the corresponding fixed-effect estimate, making it appear more significant.

74 / 104

Example: Cognitive Performance and Age

  • Research question: Does cognitive performance decline with age and does this decline rate vary between individuals?

  • Data structure:

    • Repeated measures of cognitive performance for each individual
    • Age as a time-varying predictor
  • Linear mixed model:

    • Random intercepts and slopes for age
    • Correlation between intercepts and slopes

    Reason to Drop the Intercept-Slope Correlation

  • Hypothesis: Initial cognitive performance and rate of decline are unrelated.

    • Example: Individuals with high initial performance decline at the same rate as those with low initial performance.
  • Model modification: Set the correlation between intercepts and slopes to zero.

    • This enforces the hypothesis that initial performance and decline rate are independent.
  • Interpretation: If the modified model fits the data well, it suggests that the rate of cognitive decline is not dependent on initial performance levels.

75 / 104

How to specify uncorrelated intercept and slope

m_corr <- lmer(y ~ time + (1 + time | id), data=dt)
m_noncorr <- lmer(y ~ time + (1 | id) + (0 + time | id), data=dt)  # equivalent shorthand: (time || id)


76 / 104

Effect of omitting random coefficients

m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject))

G = [ Var(U0)       Cov(U0, U1)   Cov(U0, U2) ]
    [ Cov(U0, U1)   Var(U1)       Cov(U1, U2) ]
    [ Cov(U0, U2)   Cov(U1, U2)   Var(U2)     ]


m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject))

G = [ Var(U0)       Cov(U0, U1)   0 ]
    [ Cov(U0, U1)   Var(U1)       0 ]
    [ 0             0             0 ]


m_3 <- lmer(Accuracy ~ (ot1+ot2)*TP +
            (1 | Subject) +
            (0 + ot1 | Subject) +
            (0 + ot2 | Subject))

G = [ Var(U0)   0         0       ]
    [ 0         Var(U1)   0       ]
    [ 0         0         Var(U2) ]

77 / 104

Pooling

Depending on the variation among clusters, which is itself learned from the data, the model pools information across clusters. This pooling tends to improve the estimates for each cluster.

Complete pooling: Ignore the varying intercepts and just use the overall mean across all clusters.

No pooling: Estimate a separate fixed effect in each cluster.

Partial pooling: The multilevel approach. Extreme values are pulled towards the overall average, and this pull is stronger for smaller groups.
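For a random-intercept model, the partially pooled cluster mean is a precision-weighted average of the cluster's own mean and the grand mean; a base-R sketch with invented variances:

```r
# Partial pooling as a precision-weighted average (invented values).
grand_mean <- 50
tau2   <- 4    # between-cluster variance
sigma2 <- 16   # within-cluster variance
shrunken <- function(ybar_j, n_j) {
  w <- tau2 / (tau2 + sigma2 / n_j)   # weight on the cluster's own mean
  w * ybar_j + (1 - w) * grand_mean   # the rest of the weight goes to the grand mean
}
shrunken(60, n_j = 2)    # small cluster: pulled strongly toward 50
shrunken(60, n_j = 50)   # large cluster: stays close to its own mean of 60
```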

78 / 104

1. Complete Pooling

This approach assumes the treatment has the same effect on everyone, ignoring individual-specific variation. It estimates an average treatment effect across all individuals.

Consequences:

  • Biased estimates: Complete pooling may yield biased treatment effect estimates due to ignored individual variation.
  • Inaccurate inferences: Hypothesis tests and confidence intervals may be misleading, leading to incorrect conclusions.
  • Lack of personalization: No information about the treatment's effectiveness for specific individuals, hindering tailored interventions.
79 / 104

2. Partial Pooling

This approach estimates separate effects for each individual while "borrowing strength" from the group, typically using random effects or mixed models. It allows for individual-specific treatment effects and considers the overall population effect.

Consequences:

  • More accurate estimates: Partial pooling accounts for individual variation, providing more accurate treatment effect estimates.
  • Improved inferences: Hypothesis tests and confidence intervals are more accurate, leading to reliable conclusions.
  • Personalized insights: Provides information on the treatment's effectiveness for specific individuals, aiding personalized interventions.
80 / 104

Shrinkage in partial pooling

  • Pooling data across clusters tends to shrink their deviation from the overall mean levels.

  • Shrinkage effect on individual participant intercept and linear term parameter estimates. For each participant, the arrow shows the change in the parameter estimate from a model that treats participants as fixed effects (open circles) to a model that treats participants as random effects (filled circles). The black vertical and horizontal lines indicate the population-level fixed effect.

81 / 104

3. No Pooling

This approach estimates separate effects for each individual without sharing information between them, fitting separate models and ignoring potential similarities.

Consequences:

  • Overfitting: No pooling may overfit the data, resulting in estimates that don't generalize well.
  • Inefficient data use: Not sharing information between individuals leads to less efficient use of data and less precise estimates.
  • Interpretation challenges: Separate models for each individual make it hard to draw overall conclusions about treatment effectiveness.
82 / 104

Constructing a longitudinal model (Hoffman)

  1. Building an unconditional model of change
    • Decide what your metric of time will be
    • Decide at what occasion time 0 should be located
  2. Plot individual trajectories over time
    • What kind of change, if any, do you see on average?
    • Do you see individual differences in that pattern of change?
  3. Fit a null model (random intercept)
    • Calculate the ICC. If it is close to 1.0, there is no longitudinal variation to model.
  4. Add level-1 Fixed predictors
  5. Add level-2 Explanatory Variables
  6. Examine Whether a Particular slope varies between groups
  7. Add cross-level interactions to explain variation in the slope.


Within-participant effects

83 / 104

SECTION: Categorical time invariant predictors

84 / 104

Time varying and time invariant predictors

  • A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
    • In the context of education, a time-varying predictor might be the number of hours in the previous 30 days a student has spent studying.
    • time-varying predictors will appear at level 1 because they are associated with specific measurements
  • On the other hand, a predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.

    • An example of this type of predictor would be gender.
    • Time-invariant predictors will appear at level 2 or higher, because they are associated with the individual.
  • Unless time-varying predictors are group-mean centered, BP and WP variance may not be cleanly separated.

85 / 104

Time-Varying and Time-Invariant Predictors in Longitudinal Models

  • Time-varying predictors

    • A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
    • Change over time within individuals
    • Predict within-person fluctuations
    • Can be used to model individual trajectories
    • Need to consider potential lagged effects, autocorrelation, and time-dependent confounders
  • Time-invariant predictors

    • A predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.
    • Do not change over time within individuals
    • Predict between-person differences
    • Can be used to model group differences in trajectories
    • Need to consider potential multicollinearity with other time-invariant predictors
86 / 104

SECTION: Higher order longitudinal models

87 / 104

Introducing higher level variability



88 / 104

Third Order Linear Mixed Model: Depression Over Time

  • Outcome: Depression score over time

  • Data structure:

    • Repeated measures of depression for each patient
    • Patients nested within therapists
  • Model components:

    1. Fixed effects: Population-level effects of predictors (e.g., time, patient characteristics, therapist characteristics)
    2. Random effects: Individual-level variability in intercepts (baseline depression) and slopes (depression change over time)
    3. Third-level random effects: Therapist-level variability in intercepts and slopes
  • Model interpretation:

    • Fixed effects describe the average relationships between predictors and depression scores
    • Random effects capture the variability in depression scores and change over time across patients
    • Third-level random effects account for variability in patient outcomes and change over time due to therapist differences
  • Benefits:

    • Acknowledges the hierarchical structure of the data, resulting in more accurate inferences
    • Provides insights into the sources of variability in depression scores and change over time
    • Can be used to inform interventions targeted at the patient or therapist level
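The model described above might be fit in lme4 roughly as follows. This is a hedged sketch, not the slides' own code: the data frame dep_dat and the columns depression, time, patient, and therapist are hypothetical names.

```r
library(lme4)

# Hypothetical long-format data `dep_dat`:
# one row per measurement occasion, patients nested within therapists.
m <- lmer(
  depression ~ time +             # fixed effect: average change over time
    (time | therapist) +          # level 3: therapist intercepts and slopes
    (time | therapist:patient),   # level 2: patients nested in therapists
  data = dep_dat
)
```

The `therapist:patient` term makes the nesting explicit, which is safest when patient IDs are reused across therapists.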
89 / 104

L3 - Model 1

Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib

Composite: ytib=δ000+V00b+U0ib+ϵtib

90 / 104

L3 - Model 2

Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib

Composite: ytib=δ000+V00b+U0ib+ϵtib

91 / 104


L3 - Model 3

Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib

Composite: ytib=δ000+V00b+U0ib+ϵtib

93 / 104

L3 - Model 4

Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100
Level 1: ytib=β0ib+β1ibTime+ϵtib

Composite: ytib=δ000+γ100Time+V00b+U0ib+ϵtib

94 / 104

L3 - Model 5

Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100+U1ib
Level 1: ytib=β0ib+β1ibTime+ϵtib

Composite: ytib=δ000+γ100Time+V00b+U0ib+U1ibTime+ϵtib

95 / 104

L3 - Model 6

Level 3: γ00b=δ000+V00b
Level 3: γ10b=δ100+V10b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ10b
Level 1: ytib=β0ib+β1ibTime+ϵtib

Composite: ytib=δ000+δ100Time+V00b+V10bTime+U0ib+ϵtib
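These three-level models have natural lme4 counterparts. The formulas below are a sketch under assumed names (outcome y, time variable Time, individual ID id, level-3 grouping factor b, data frame dat); none of them come from the slides.

```r
library(lme4)

# Models 1-3 (random intercepts at levels 2 and 3, no growth term):
#   lmer(y ~ 1 + (1 | b) + (1 | b:id), data = dat)
# Model 4 (fixed slope for Time):
#   lmer(y ~ Time + (1 | b) + (1 | b:id), data = dat)
# Model 5 (slope varies across individuals, level 2):
#   lmer(y ~ Time + (1 | b) + (Time | b:id), data = dat)
# Model 6 (slope varies across level-3 units):
#   lmer(y ~ Time + (Time | b) + (1 | b:id), data = dat)
```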

96 / 104

GPA exercise (1)

97 / 104

GPA exercise (2)

98 / 104

Level-3 exercise (2)

99 / 104

SECTION: Transforming data from wide to long format

100 / 104

Wide vs. Long format

Left: Wide format; each measure of the same variable is in a separate column. Right: Long format; each measure of the same variable is in a separate row.

101 / 104

pivot_longer() and pivot_wider()


  • The tidyverse is a collection of open source R packages that share an underlying design philosophy, grammar, and data structures.

  • In the tidyverse approach to R syntax, these two functions are used to transform a dataset between wide and long format.

    • pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.

    • pivot_wider() is the opposite of pivot_longer(): it makes a dataset wider by increasing the number of columns and decreasing the number of rows.

  • See these pages for documentation and examples:
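A minimal round-trip illustration with invented toy data (the column names id, t1, t2, wave, and score are all made up for this example):

```r
library(tidyr)
library(tibble)
library(dplyr)

w <- tibble(id = 1:2, t1 = c(5, 7), t2 = c(6, 8))  # wide: one row per id

# Longer: one row per id-by-wave combination (4 rows).
l <- w %>% pivot_longer(t1:t2, names_to = "wave", values_to = "score")

# Wider: back to the original shape.
w2 <- l %>% pivot_wider(names_from = wave, values_from = score)
```

pivot_wider() exactly inverts pivot_longer() here, so w2 has the same dimensions and values as w.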

102 / 104

Creating a synthetic wide dataset

library(tidyverse)
dt <- tibble(
  id = 1:20,
  male = sample(0:1, 20, replace = TRUE),
  dep1 = rnorm(20), dep2 = rnorm(20), dep3 = rnorm(20),
  anx1 = rnorm(20), anx2 = rnorm(20), anx3 = rnorm(20)
)
print(dt, n = 10)
## # A tibble: 20 x 8
## id male dep1 dep2 dep3 anx1 anx2 anx3
## <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 -0.709 0.0301 0.189 0.357 0.407 0.199
## 2 2 0 -1.10 -0.696 -0.0357 1.37 -0.0255 0.880
## 3 3 0 -1.06 -0.104 -0.315 0.226 0.585 -1.17
## 4 4 1 0.490 1.42 0.0567 -0.182 -0.347 1.17
## 5 5 1 0.149 -1.25 0.284 -0.354 0.0755 0.948
## 6 6 0 1.39 -0.828 0.392 -0.197 1.55 0.847
## 7 7 0 -1.41 -1.94 -2.02 -0.0964 -0.182 1.58
## 8 8 0 -0.592 -0.356 -1.30 0.434 -0.600 0.352
## 9 9 0 1.03 0.359 0.0586 0.980 0.921 0.793
## 10 10 1 -1.02 1.81 0.776 0.252 0.556 -0.589
## # i 10 more rows
103 / 104

Pivot from wide to long

dt_l <- dt %>%
  pivot_longer(
    cols = dep1:anx3,
    names_sep = 3,                       # split column names after 3 characters
    names_to = c("disorder", "number")
  ) %>%
  pivot_wider(names_from = "disorder") %>%
  rename(wave = number, anxiety = anx, depression = dep)
print(dt_l, n = 10)
## # A tibble: 60 x 5
## id male wave depression anxiety
## <int> <int> <chr> <dbl> <dbl>
## 1 1 1 1 -0.709 0.357
## 2 1 1 2 0.0301 0.407
## 3 1 1 3 0.189 0.199
## 4 2 0 1 -1.10 1.37
## 5 2 0 2 -0.696 -0.0255
## 6 2 0 3 -0.0357 0.880
## 7 3 0 1 -1.06 0.226
## 8 3 0 2 -0.104 0.585
## 9 3 0 3 -0.315 -1.17
## 10 4 1 1 0.490 -0.182
## # i 50 more rows
104 / 104

Shrinkage in partial pooling

  • Pooling data across clusters shrinks the cluster-specific estimates toward the overall mean.

  • Shrinkage effect on individual participant intercept and linear term parameter estimates. For each participant, the arrow shows the change in the parameter estimate from a model that treats participants as fixed effects (open circles) to a model that treats participants as random effects (filled circles). The black vertical and horizontal lines indicate the population-level fixed effect.

81 / 104