PSY9185 - Multilevel models
Longitudinal data analysis using multilevel models
Nikolai Czajkowski
Today
- Introduction to longitudinal data and LMM.
- Plotting longitudinal data.
- Basic two-level longitudinal models
- Fixed and random coefficients
- Non-linear and piecewise continuous change
- Including predictors in the model
- Time-invariant predictors
- Time-varying predictors
Books
- Several examples borrowed from Mirman (2014).
- Lesa Hoffman has links to several full video courses in longitudinal modelling on her homepage.
R resources
Fitting mixed models in R:
- https://m-clark.github.io/mixed-models-with-R/introduction.html
- https://rpsychologist.com/r-guide-longitudinal-lme-lmer
Books on R:
- Cookbook for R: http://www.cookbook-r.com/
- R for data science: https://r4ds.had.co.nz/
Online resources
- R-bloggers; R news and tutorials contributed by hundreds of R bloggers.
- Stackoverflow: question and answer site for programmers.
Multilevel tutorials
Two tutorials are available here:
Clustered data: https://shinyibv02.uio.no/connect/#/apps/433caf55-e0da-477a-920e-bc044ef5030f/access
Longitudinal analysis: https://shinyibv02.uio.no/connect/#/apps/bf3cac08-66ca-45d0-a45c-9b9402aa6c84/access
SECTION: Introducing longitudinal analysis
Multilevel models for clustered data (1)
Two sides of all models
Models are an idealized account of:
- How the expected value is related to an independent variable (mean).
- How people vary (variance).
Linear regression only models the mean value, not the variance.
Assumptions in linear models
Yi = b0 + b1⋅Xi + ϵi,   ϵi ∼ N(0, σ²)
- The relationship between X and Y is linear.
- The error variance (around the regression line) is normally distributed and is constant (i.e. the sd is the same) at every level of X.
- The observations are independent.
- The independent variables are measured without error.
Which assumption is really important?
- We often focus on the extent to which the residuals are normally distributed, but in practice this rarely has a major impact on the standard errors (or p-values).
- Dependency in the data can severely bias all p-values.
Multilevel data structures
- Data can be nested within higher order units.
- Different variables are measured at different levels.
- Traditionally handled by aggregation or disaggregation.
Longitudinal data structures
In longitudinal multilevel models, observations are nested within persons.
- Level 1 units are individual observations
- Level 2 units are usually individuals
- People may be nested under higher order units (therapists, treatment centers, etc).
Time will be included as a level 1 predictor.
What are longitudinal models?
Time course data are the result of repeated measurements at multiple (at least two) time points.
- These sorts of data are also called longitudinal.
Two key properties distinguish time course data from other kinds of data:
- Groups of observations all come from one source (nested data).
- The repeated measurements are related by a continuous variable, usually that variable is time, but any continuous variable will do.
- We will also look at other variables, like set-size.
For analyses we will use multilevel (mixed) models that include time course data as a level-1 predictor.
- If you asked participants to name letters printed in different sizes, you could examine the outcome (letter recognition accuracy) as a function of the continuous predictor size.
Benefits of multilevel models in general
Flexibility in modelling dependency across observations
- LMMs provide a flexible framework to model and account for various sources of non-independence, reducing biased estimates and increasing the accuracy of inferences.
Avoid choosing between individuals or groups as the unit of analysis
- Cross-level interactions are uniquely possible in MLM.
Increased statistical power
- By including random effects, LMMs can improve the precision of estimates and increase statistical power compared to traditional regression models.
Generalizability
- LMMs allow researchers to investigate both the average effect of predictors and their variations across different levels, helping to generalize findings across various contexts and populations.
Study variation between and within people
- Ask more complex and interesting questions
Benefits of MLM for longitudinal data
- Handle dependency
- Traditional analyses only designed to handle dependency due to constant mean differences.
- Handles missing
- Number of observations can vary between individuals.
- Flexible treatment of time
- Measures can be taken at fixed or at varying occasions.
- No assumption that an equal amount of time has elapsed.
- Investigate within-person relationships
- Between and within effects can differ, even in direction.
Longitudinal studies provide the opportunity to test hypotheses at multiple levels of analysis simultaneously.
- The models in this text allow us to examine both between-person and within-person relationships in the same variables at the same time.
- For example, we might posit a link between stress and negative mood, such that greater stress results in greater negative mood. But at what level of analysis is this relationship likely to hold: between persons, within persons, or both?
SECTION: Plotting longitudinal data
Plotting longitudinal data
Plotting is an even more critical first step in longitudinal analysis than in cross-sectional studies.
We will make the following classes of figures:
- Individual change curves.
- Plots of group mean levels and CI/confidence regions.
- Stratified plots (assess interactions).
Why ggplot?
- Plots are publication grade quality.
- The syntax offers a flexible and powerful framework for visualizing data.
- Summary statistics like means and standard errors can be computed while plotting.
Introduction to ggplot2
- ggplot2 is a powerful R package for data visualization
- Built on the principles of the "Grammar of Graphics"
- First, assign variables in your data to properties of the graph. These assignments, or mappings, are called the aesthetics of your graph. Then select "geometries", or geoms (points, lines, bars, etc.), for those aesthetics.
- Create complex, multi-layered plots using a consistent syntax
- Easily customizable and extensible
- Widely used in data analysis and research
Spaghetti Plots for Longitudinal Data
- Visualize individual trajectories over time
- Identify patterns and trends in the data
- Assess variability between and within groups
- Detect potential outliers or influential cases
- Compare individual trajectories to the overall fixed effect of time
Generating a Spaghetti Plot with ggplot2
Below is the R-code used to generate the plot.
library(ggplot2)

# Create spaghetti plot: one line per individual, plus the overall trend
ggplot(data, aes(x = time, y = y, group = id, color = group)) +
  geom_line(alpha = 0.5) +
  geom_line(data = fixed_effect_data, aes(x = time, y = y, group = 1),
            color = "black", size = 1.5, linetype = "solid") +
  labs(title = "Example Spaghetti Plot", x = "Time", y = "Outcome") +
  theme_minimal()

# Save plot to a file
ggsave("example_spaghetti_plot.png", width = 7, height = 5)
SECTION: Time-only (Unconditional) models
Features of longitudinal data
Multilevel longitudinal models permit analyses at different levels:
Between-person variance:
- Inter-individual variation.
- Ex. Biological sex, ethnicity.
Within-person variance:
- Intraindividual variation.
- Ex. Sleep the previous night.
- Eg. resting heart rate vs. exercise.
Features of longitudinal data (2)
- Within person change: Specific type of within person variation, that refers to any systematic change that is expected as a result of meaningful passage of time.
- Within person fluctuation: Undirected variation over repeated assessments, seen in contexts in which one would not expect systematic change.
Unconditional longitudinal models
In unconditional longitudinal models, time is the only predictor.
- Included as a level-1 variable.
Critical first questions:
- What units should time be measured in?
- What constitutes "time=0"?
- E.g. if time is measured as age, it is best to center age at a meaningful value rather than 0.
- How do you expect development over time to be?
- Linear, non-linear, abrupt/discontinuous?
Introducing idealized growth curves
Between-person and Within-person empty models
Empty models partition the variance, but don't account for it.
Between person empty model
yti = β0 + ϵti   (fixed intercept + error)
Within person empty model
yti = β0 + U0i + ϵti   (fixed intercept + random intercept + error)
Fixed and random effects
Level 2: β0i=γ00+U0i
Level 1: yti=β0i+ϵti
Composite: yti=(γ00+U0i)+ϵti
yti=β0+u0i+ϵti
- We instead argue that deviation from the grand mean is attributable to a normally distributed residual at level 2.
- This lets us account for the variance by estimating one parameter (the variance of U0i) instead of the fixed effects of 49 dummy variables.
- Random effects are not designed to make inferences about the specific units included in the study (we can't compare participant 1 to participant 50).
- Could we include a random effect of time? This would require at least three measurements per person.
Why random intercept?
Differences between individuals could be handled by N-1 dummy variables.
- Many estimated parameters and loss of power.
With MLM, clustering is accounted for by only one parameter, the variance of a normal distribution.
Model 1: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite:
yti = (γ00 + U0i) + ϵti   (fixed intercept + random intercept + error)
Model 1: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite:
yti = (γ00 + U0i) + ϵti   (fixed intercept + random intercept + error)

# Load necessary libraries
library(lme4)

# Fit the random intercept model
model1 <- lmer(y ~ 1 + (1 | subject), data = data)

# Print the model summary
summary(model1)
Fixed and random effects
Multilevel (mixed) models contain two classes of coefficients.
Fixed effects: The «structural» part of the model, specifying the expected conditional mean.
- Empty model: Only intercepts are included.
- Unconditional models: Intercepts + passage of time.
- Conditional models: Intercepts + variables accounting for differences in the passage of time.
Random effects: Coefficients that specify the stochastic (error) part of the model.
- Describes how the residuals of the Y outcome are distributed and related across the observations.
Model 2: Random intercept
Level 2: β0i = γ00 + U0i,   U0i ∼ N(0, σU0i)
Level 1: yti = β0i + ϵti,   ϵti ∼ N(0, σϵ)
Composite: yti = (γ00 + U0i) + ϵti
Intraclass Correlation Coefficient (ICC)
- Measures the proportion of total variance in the outcome that is attributable to the grouping structure (e.g., individuals within groups)
- Ranges from 0 to 1:
- 0: No correlation between observations within the same group
- 1: Observations within the same group are identical
- Important for assessing group-level effects in mixed models
- A high ICC indicates a strong grouping effect and the need for a multilevel or mixed model to account for the clustered data structure
Intraclass Correlation Coefficient (ICC) 2
ICC = BP variation / (BP variation + WP variation) = Var(U0i) / (Var(U0i) + Var(ϵti))
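As a quick numerical sketch, the ICC can be computed directly from the two variance components. The values below are assumed for illustration; in practice they would be extracted from a fitted lmer model, e.g. via as.data.frame(VarCorr(model1))$vcov.

```r
# Illustrative ICC calculation with assumed variance components
var_u0  <- 4   # between-person variance, Var(U0i)  (assumed value)
var_eps <- 6   # within-person residual variance, Var(eps) (assumed value)

icc <- var_u0 / (var_u0 + var_eps)
icc  # 0.4: 40% of the total variance lies between persons
```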
Model 3: Fixed slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+U0i)+γ10⋅Timeti+ϵti
yti = (γ00 + U0i) + γ10⋅Timeti + ϵti, where γ00 is the fixed intercept, U0i the random intercept, γ10 the fixed slope, and ϵti the error.
Comparing Linear Mixed Models
Likelihood Ratio Tests:
- Performed with the anova() function.
- Only valid under maximum likelihood (ML) estimation, not restricted maximum likelihood (REML).
Information Criteria:
- The Akaike Information Criterion (AIC) can be calculated with the AIC() function.
Pseudo R2:
- Purpose: Assess the proportion of variance explained by the predictors in the model, similar to R2 in linear regression.
- Calculated separately for Level 1 and Level 2 effects.
- Range: 0 to 1; higher values indicate better model fit.
- Limitation: Not directly comparable to R2 in linear regression. Interpretation should be cautious.
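The mechanics of a likelihood ratio test and an AIC comparison can be sketched with ordinary lm() fits on the built-in mtcars data; the same anova()/AIC() calls apply to lmer models, provided both are fitted with REML = FALSE. The choice of predictors here is purely illustrative.

```r
# Two nested models on built-in data
m0 <- lm(mpg ~ wt, data = mtcars)       # simpler (nested) model
m1 <- lm(mpg ~ wt + hp, data = mtcars)  # one extra predictor

# Likelihood ratio test "by hand": 2 * difference in log-likelihoods
lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))
p   <- pchisq(lrt, df = 1, lower.tail = FALSE)

# Information criteria: the model with the lower AIC is preferred
AIC(m0, m1)
```

For nested lmer fits, anova(m1, m2) performs the equivalent test and (in lme4) refits REML models with ML before comparing.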
Over vs. underfitting (Scylla and Charybdis)
Overfitting: Poor prediction as a result of learning too much from your data.
Underfitting: Poor prediction as a result of learning too little from your data.
- You can always get better fit to your data by adding more predictors.
- Every dataset contains both systematic and unsystematic variance (noise), so overly complex models may fit well to the current data, but predict less variance in new data.
https://xcelab.net/rm/statistical-rethinking/
What happens as the model becomes too complex?
k-fold cross validation
- Cross-validation involves partitioning the data into training and test subsets.
- The model is built on the training sets and evaluated on the test sets.
- Cornerstone in machine learning approaches.
(Introduction to statistical learning)
k-fold cross validation (2)
Notice:
- It is easy to explain variance in the training data, but far more difficult to explain it in (new) test data.
- Initially, as the model grows in complexity, explained variance increases. However, as it gets overly complex, explained variance in test data starts to decrease.
Method 3: Information criteria (AIC / BIC)
- Information criteria (IC) are a class of statistics devised to strike the optimal balance between over- and underfitting.
- The model with the lowest AIC/BIC value is chosen as the best one.
- Non-nested models can be compared.
Model 4: Fixed slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+U0i)+γ10⋅Timeti+ϵti
Model 5: Random slope, Random intercept
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite:
yti = (γ00 + U0i) + (γ10 + U1i)⋅Timeti + ϵti, where γ00/U0i are the fixed/random intercept, γ10/U1i the fixed/random slope, and ϵti the error.
How do these differ?
Random effects can be correlated
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
G = [ Var(U0i)       Cov(U0i,U1i)
      Cov(U0i,U1i)   Var(U1i)    ]
Composite: yti=(γ00+U0i)+(γ10+U1i)⋅Timeti+ϵti
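To make G concrete, here is a base-R sketch (with assumed values, not estimates from any real fit) that builds the 2×2 G matrix from two random-effect standard deviations and their correlation, which is how such a covariance matrix decomposes:

```r
# Assumed random-effect SDs and intercept-slope correlation
sd_u0 <- 2.0   # SD of random intercepts (assumed)
sd_u1 <- 0.5   # SD of random slopes (assumed)
rho   <- -0.3  # intercept-slope correlation (assumed)

D    <- diag(c(sd_u0, sd_u1))                 # SDs on the diagonal
corr <- matrix(c(1, rho, rho, 1), nrow = 2)   # correlation matrix
G    <- D %*% corr %*% D                      # covariance matrix G
G
```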
Sensitivity of the intercept-slope correlation to the centering of time
Impact on Interpretability:
- Centering time can affect the interpretability of the (fixed) intercept.
Intercept-Slope Correlation:
- The interpretation of any (random) intercept-slope correlation is conditional on the location of the intercept.
Example
- Centering at t = 0 will result in an estimated correlation less than 0.
- Centering at t = 3.5 will result in an estimated correlation equal to 0.
- Centering at t = 8 will result in an estimated correlation greater than 0.
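A small simulation (with assumed values, not the slide's data) shows why this happens: if the intercept is defined at time c, each person's intercept becomes b0 + c·b1, so its covariance with the slope is Cov(b0, b1) + c·Var(b1), which changes sign as c grows.

```r
set.seed(1)
n  <- 5000
b1 <- rnorm(n)                         # person-specific slopes
b0 <- -0.5 * b1 + rnorm(n, sd = 0.5)   # intercepts at t = 0 (assumed model)

# Correlation between the slope and the intercept defined at time c
cor_at <- function(c) cor(b0 + c * b1, b1)

cor_at(0)    # negative
cor_at(0.5)  # near zero, since here Cov(b0, b1) = -0.5 and Var(b1) = 1
cor_at(3)    # positive
```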
G and R matrices
The model implied covariance matrix is a function of two matrices:
- G is a covariance matrix for level-2 random coefficients.
- R is a covariance matrix for level-1 random coefficients.
- Usually the R matrix is diagonal, but some programs allow you to specify a different structure for this matrix.
Calculating the implied covariance (1)
The implied covariance matrix is a function of both the G and R matrices.
Technical detail: Calculating the implied covariance
Var(y) = Z G Zᵀ + R
Modelling non-linear change over time
Modelling Non-linear Change Over Time
Change considered until now has been purely linear.
General Approaches to Dealing with Non-linearity
- Polynomial models
- Piecewise-discontinuous models
- Splines
Using polynomials to model non-linear change
- Linear regression is more flexible than the name implies, and can be used to model (some) non-linear relationships.
- Traditionally polynomials used to model non-linear relationships.
E(Y|X) = b0 + b1⋅X + b2⋅X² + ... + bp⋅Xᵖ
- Polynomial functions are only valid in a restricted range.
- Intercept and linear terms must be included in the model for it to be meaningful (regardless of significance).
Model 7: Polynomial change
Level 2: β0i=γ00+U0i
Level 2: β1i=γ10+U1i
Level 2: β2i = γ20 + U2i
Level 1: yti = β0i + β1i⋅Timeti + β2i⋅Time²ti + ϵti
Composite:
yti = (γ00 + U0i) + (γ10 + U1i)⋅Timeti + (γ20 + U2i)⋅Time²ti + ϵti   (fixed + random intercept, linear slope, and quadratic slope, plus error)
Choosing Polynomial Degree in Longitudinal Models
Consider Data:
- Think about the data in terms of the number of times the curve changes direction (corresponding to the number of inflection points).
Statistical Approach:
- Include only and all of the polynomial orders that improve model fit.
Theoretical Approach:
- Include only those terms for which the experimenter predicted an effect.
Orthogonal polynomials
Natural Polynomials:
- Allow testing for differences at "Time 0."
- Useful when such differences need to be tested.
Orthogonal Polynomials:
- Provide the same estimates as natural polynomials.
- Uncorrelated, so p-values may differ (sometimes considerably).
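A quick base-R check (on simulated quadratic data, so the numbers are purely illustrative): raw and orthogonal polynomials yield the same fitted curve, but the orthogonal terms are uncorrelated with each other.

```r
# Simulated quadratic growth data (assumed coefficients)
set.seed(42)
t <- rep(0:9, times = 20)
y <- 2 + 0.8 * t - 0.05 * t^2 + rnorm(length(t))

m_raw  <- lm(y ~ poly(t, 2, raw = TRUE))  # natural polynomials
m_orth <- lm(y ~ poly(t, 2))              # orthogonal polynomials

# Identical fitted curves...
max(abs(fitted(m_raw) - fitted(m_orth)))  # effectively zero

# ...but the orthogonal terms are uncorrelated with each other
cor(poly(t, 2)[, 1], poly(t, 2)[, 2])
```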
Splines
- Splines allow for flexibility in modeling complex, non-linear relationships
- They can capture local patterns and smooth changes over time
- Reduce overfitting compared to high-order polynomials
- Cubic splines are commonly used for their smoothness and continuity
- Natural splines can impose constraints to reduce extreme behavior at the endpoints
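A minimal sketch using the splines package that ships with base R (simulated, clearly non-linear data; the sine curve and df = 4 are assumptions for illustration): a natural cubic spline captures curvature that a straight line misses.

```r
library(splines)  # ships with base R

set.seed(7)
t <- seq(0, 10, length.out = 200)
y <- sin(t) + rnorm(length(t), sd = 0.3)   # assumed non-linear trend

m_lin    <- lm(y ~ t)              # straight line
m_spline <- lm(y ~ ns(t, df = 4))  # natural cubic spline, 4 df

# The spline leaves much less unexplained variance
c(linear = sum(resid(m_lin)^2), spline = sum(resid(m_spline)^2))
```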
Piecewise linear models
- Better fit for discontinuous data:
- More accurately capture abrupt changes
- Interpretability:
- Coefficients of piecewise continuous models have simple interpretations
- Parsimonious representation:
- May require fewer parameters than high-degree polynomial
- Avoid overfitting:
- Less likely to overfit the data compared to high-degree polynomial
Piecewise linear models: coding
These two coding schemes only differ in the interpretation of the regression coefficients.
- In scheme 1 the two slope coefficients represent the actual slope in the respective time period.
- In scheme 2 the coefficient for time 2 represents the deviation from the slope in period 1; i.e. if the estimate is 0, the rate of change is the same in both periods.
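The two coding schemes can be verified on a small deterministic example (assumed slopes of 2 before and 5 after a knot at t = 3):

```r
# Deterministic piecewise data: slope 2 up to the knot, slope 5 after
t    <- 0:6
knot <- 3
y    <- 2 * pmin(t, knot) + 5 * pmax(t - knot, 0)

# Scheme 1: each coefficient is the actual slope in its period
time1 <- pmin(t, knot)
time2 <- pmax(t - knot, 0)
s1 <- coef(lm(y ~ time1 + time2))
s1  # time1 = 2, time2 = 5

# Scheme 2: second coefficient is the deviation from the period-1 slope
s2 <- coef(lm(y ~ t + time2))
s2  # t = 2, time2 = 3 (i.e. 5 - 2)
```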
Getting p-values
- P-values are not reported by default, as the number of degrees of freedom is hard to calculate.
- Some approximations can be made, for example using the lmerTest package.
# P-values for individual coefficients
require(lmerTest)
m1 <- lmer(y ~ x + (x | id), data = dt)
coef(summary(m1))

# Alternatively, confidence intervals
confint(m1)                               # Profile CI
confint(m1, method = "boot", nsim = 100)  # Bootstrapped CI

# Alternatively, compare models with anova()
anova(m1, m2)
Reporting growth curve results
Model selection:
- Report the criteria used to select the best-fitting model, e.g., AIC, BIC, or likelihood ratio tests.
Fixed effects:
- Report the estimated coefficients, standard errors, t-values, and p-values for each fixed effect predictor.
- Interpret the direction and magnitude of the relationships between the predictors and the outcome variable.
Random effects:
- Report the estimated variances and standard deviations for random intercepts and slopes.
- Interpret the amount of variability in the intercepts and slopes across the different levels of the model.
Model fit:
- Report goodness-of-fit statistics, such as pseudo R-squared or deviance explained.
- Compare the selected model with alternative or null models, if applicable.
Visualizations:
- Include relevant plots, such as individual growth curves, group-level trajectories, or predicted values versus observed values.
Model 8
Why is there random intercept variance?
Model 8
Level 2: β0i=γ00+γ01⋅Groupi+U0i
Level 2: β1i=γ10+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+γ01⋅Groupi+U0i)+(γ10+U1i)⋅Timeti+ϵti
Model 9
Level 2: β0i=γ00+γ01⋅Groupi+U0i
Level 2: β1i=γ10+γ11⋅Groupi+U1i
Level 1: yti=β0i+β1i⋅Timeti+ϵti
Composite: yti=(γ00+γ01⋅Groupi+U0i)+(γ10+γ11⋅Groupi+U1i)⋅Timeti+ϵti
Inferential consequences of including random effects
Effect on t/p-values:
- The impact on t/p-values of estimating random components depends on the level.
- Moving variance from Level 1 to Level 2 can provide more power to detect Level 1 predictors.
Improved Model Fit:
- Including random effects can help account for unobserved heterogeneity in the data, leading to better model fit.
Generalizability:
- Including random effects allows for more accurate generalizations to the larger population.
Cautions:
- Adding random effects can increase model complexity.
- It is important to carefully consider the theoretical justifications for including random effects in the model.
Technical issues: Parameter estimation (ML vs. REML)
ML (Maximum Likelihood)
- Estimates fixed and random effects simultaneously
- Can result in biased estimates of random effect variances
- Suitable for model selection and comparison
REML (Restricted Maximum Likelihood)
- Estimates fixed effects and random effects separately
- Provides unbiased estimates of random effect variances
- Less suitable for model selection and comparison
Which to use?
- For unbiased variance estimates, use REML
- For model selection or comparison, use ML
- Sometimes, both are used in a two-step process: REML to estimate variance components and ML to compare models
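The bias issue is the same one known from ordinary regression: the ML variance estimate divides the residual sum of squares by n, ignoring the degrees of freedom spent on the fixed effects, while the REML-style estimate divides by n − p. A base-R sketch on simulated data (all values assumed):

```r
set.seed(3)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)   # assumed data-generating model

m   <- lm(y ~ x)
rss <- sum(resid(m)^2)
p   <- length(coef(m))      # 2 fixed effects: intercept and slope

s2_ml   <- rss / n          # ML estimate (biased downward)
s2_reml <- rss / (n - p)    # REML-style estimate (unbiased)
c(ML = s2_ml, REML = s2_reml)  # ML is always the smaller of the two
```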
Technical issues: Standardization
Purpose: To compare the relative importance of predictors and facilitate interpretation.
Procedure:
- Standardize continuous predictor variables (center and scale)
- Refit the linear mixed model with standardized predictors
- Interpret the standardized coefficients as effect sizes
Interpretation:
- Each standardized coefficient represents the change in outcome variable (in standard deviations) for a one standard deviation change in the predictor
- Larger absolute values indicate stronger relationships between predictors and the outcome variable
Considerations:
- Only applicable to continuous predictor variables
- Ensure that model assumptions are met before interpreting standardized coefficients
- Be cautious when comparing standardized coefficients across models with different fixed and random effects structures
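The relationship between raw and standardized coefficients can be checked in base R; scale() centers and scales a variable, and for a single predictor the standardized slope equals the raw slope rescaled by sd(x)/sd(y). The data below are simulated purely for illustration.

```r
set.seed(11)
x <- rnorm(100, mean = 50, sd = 10)
y <- 3 + 0.2 * x + rnorm(100)   # assumed data-generating model

b_raw <- coef(lm(y ~ x))[2]                # raw slope
b_std <- coef(lm(scale(y) ~ scale(x)))[2]  # standardized slope

# Standardized slope equals the raw slope rescaled by sd(x)/sd(y)
c(standardized = unname(b_std), rescaled_raw = unname(b_raw) * sd(x) / sd(y))
```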
Technical issues: Pseudo R2
- There is no MLM statistic entirely equivalent to R2 in ordinary regression.
- While R2 increases monotonically as independent variables are added in linear regression, this may not hold in multilevel models.
- In MLM we can calculate an estimate of explained variance at each level.
- This is referred to as pseudo R2, and it can behave in unusual ways.
- It can decrease as a predictor at another level is included.
Example: Modelling popularity in pupils
SECTION: A closer look at random effects
Keeping it maximal
A full or maximal random effect structure is the case where all of the factors that could hypothetically vary across individual observational units are allowed to do so.
The general principle is keep it maximal (Barr et al., 2013): the random effects should include as much of the structure of the data as possible.
m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject),
            data = WordLearnEx, REML = FALSE)
m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject),
            data = WordLearnEx, REML = FALSE)
Removing a time term from the random effects primarily reduces the standard error of the corresponding fixed-effect estimate, making it appear more significant.
Example: Cognitive Performance and Age
Research question: Does cognitive performance decline with age and does this decline rate vary between individuals?
Data structure:
- Repeated measures of cognitive performance for each individual
- Age as a time-varying predictor
Linear mixed model:
- Random intercepts and slopes for age
- Correlation between intercepts and slopes
Reason to Drop the Intercept-Slope Correlation
Hypothesis: Initial cognitive performance and rate of decline are unrelated.
- Example: Individuals with high initial performance decline at the same rate as those with low initial performance.
Model modification: Set the correlation between intercepts and slopes to zero.
- This enforces the hypothesis that initial performance and decline rate are independent.
Interpretation: If the modified model fits the data well, it suggests that the rate of cognitive decline is not dependent on initial performance levels.
How to specify non-correlated intercept and slope
m_corr    <- lmer(y ~ time + (time | id), data = dt)                 # correlated intercept and slope
m_noncorr <- lmer(y ~ time + (1 | id) + (0 + time | id), data = dt)  # uncorrelated
Effect of omitting random coefficients
m_2 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1+ot2 | Subject))

G = [ Var(U0)       Cov(U0,U1)   Cov(U0,U2)
      Cov(U0,U1)    Var(U1)      Cov(U1,U2)
      Cov(U0,U2)    Cov(U1,U2)   Var(U2)   ]

m_1 <- lmer(Accuracy ~ (ot1+ot2)*TP + (ot1 | Subject))

G = [ Var(U0)       Cov(U0,U1)   0
      Cov(U0,U1)    Var(U1)      0
      0             0            0 ]

m_3 <- lmer(Accuracy ~ (ot1+ot2)*TP + (1 | Subject) + (0+ot1 | Subject) + (0+ot2 | Subject))

G = [ Var(U0)   0         0
      0         Var(U1)   0
      0         0         Var(U2) ]
Pooling
Depending on the variation among clusters, which is itself learned from the data, the model pools information across clusters. This pooling tends to improve estimates about each cluster.
Complete pooling: Ignore the varying intercepts and just use the overall mean across all clusters.
No pooling: Estimate a separate fixed effect in each cluster.
Partial pooling: The multilevel approach. Extreme values are pulled towards the overall average, and the pull is stronger for smaller groups.
1. Complete Pooling
This approach assumes the treatment has the same effect on everyone, ignoring individual-specific variation. It estimates an average treatment effect across all individuals.
Consequences:
- Biased estimates: Complete pooling may yield biased treatment effect estimates due to ignored individual variation.
- Inaccurate inferences: Hypothesis tests and confidence intervals may be misleading, leading to incorrect conclusions.
- Lack of personalization: No information about the treatment's effectiveness for specific individuals, hindering tailored interventions.
2. Partial Pooling
This approach estimates separate effects for each individual while "borrowing strength" from the group, typically using random effects or mixed models. It allows for individual-specific treatment effects and considers the overall population effect.
Consequences:
- More accurate estimates: Partial pooling accounts for individual variation, providing more accurate treatment effect estimates.
- Improved inferences: Hypothesis tests and confidence intervals are more accurate, leading to reliable conclusions.
- Personalized insights: Provides information on the treatment's effectiveness for specific individuals, aiding personalized interventions.
Shrinkage in partial pooling
Pooling data across clusters tends to shrink their deviation from the overall mean levels.
Shrinkage effect on individual participant intercept and linear term parameter estimates. For each participant, the arrow shows the change in the parameter estimate from a model that treats participants as fixed effects (open circles) to a model that treats participants as random effects (filled circles). The black vertical and horizontal lines indicate the population-level fixed effect.
3. No Pooling
This approach estimates separate effects for each individual without sharing information between them, fitting separate models and ignoring potential similarities.
Consequences:
- Overfitting: No pooling may overfit the data, resulting in estimates that don't generalize well.
- Inefficient data use: Not sharing information between individuals leads to less efficient use of data and less precise estimates.
- Interpretation challenges: Separate models for each individual make it hard to draw overall conclusions about treatment effectiveness.
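The three approaches can be contrasted with the precision-weighted shrinkage formula that partial pooling implies for a cluster mean. All quantities below are assumed values chosen for illustration, not estimates from any dataset.

```r
# Assumed quantities for one small cluster
ybar_j <- 10   # cluster's own mean: the no-pooling estimate
mu     <- 4    # grand mean: the complete-pooling estimate
n_j    <- 3    # cluster size
s2     <- 9    # within-cluster variance (assumed)
tau2   <- 4    # between-cluster variance (assumed)

# Partial pooling: precision-weighted average of the two extremes
w <- (n_j / s2) / (n_j / s2 + 1 / tau2)
partial <- w * ybar_j + (1 - w) * mu
partial  # lies between mu and ybar_j; closer to mu when n_j is small
```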
Constructing a longitudinal model (Hoffman)
- Building an unconditional model of change
- Decide what your metric of time will be
- Decide at what occasion time 0 should be located
- Plot individual trajectories over time
- What kind of change, if any, do you see on average?
- Do you see individual differences in that pattern of change?
- Fit a null model (random intercept)
- Calculate the ICC. If it is close to 1.0, there is no longitudinal variation to model.
- Add level-1 fixed predictors
- Add level-2 explanatory variables
- Examine whether a particular slope varies between groups
- Add cross-level interactions to explain variation in the slope.
Within-participant effects
SECTION: Categorical time invariant predictors
Time varying and time invariant predictors
- A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
- In the context of education, a time-varying predictor might be the number of hours in the previous 30 days a student has spent studying.
- Time-varying predictors appear at level 1 because they are associated with specific measurements.
On the other hand, a predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.
- An example of this type of predictor would be gender.
- Time-invariant predictors appear at level 2 or higher, because they are associated with the individual.
Unless time-varying predictors are group-mean centered, BP and WP variance may not be cleanly separated.
Time-Varying and Time-Invariant Predictors in Longitudinal Models
Time-varying predictors
- A predictor is time varying when it is measured at multiple points in time, just as is the outcome variable.
- Change over time within individuals
- Predict within-person fluctuations
- Can be used to model individual trajectories
- Need to consider potential lagged effects, autocorrelation, and time-dependent confounders
Time-invariant predictors
- A predictor is time invariant when it is measured at only one point in time, and its value does not change across measurement occasions.
- Do not change over time within individuals
- Predict between-person differences
- Can be used to model group differences in trajectories
- Need to consider potential multicollinearity with other time-invariant predictors
SECTION: Higher order longitudinal models
Introducing higher level variability
Three-Level Linear Mixed Model: Depression Over Time
Outcome: Depression score over time
Data structure:
- Repeated measures of depression for each patient
- Patients nested within therapists
Model components:
- Fixed effects: Population-level effects of predictors (e.g., time, patient characteristics, therapist characteristics)
- Random effects: Individual-level variability in intercepts (baseline depression) and slopes (depression change over time)
- Third-level random effects: Therapist-level variability in intercepts and slopes
Model interpretation:
- Fixed effects describe the average relationships between predictors and depression scores
- Random effects capture the variability in depression scores and change over time across patients
- Third-level random effects account for variability in patient outcomes and change over time due to therapist differences
Benefits:
- Acknowledges the hierarchical structure of the data, resulting in more accurate inferences
- Provides insights into the sources of variability in depression scores and change over time
- Can be used to inform interventions targeted at the patient or therapist level
L3 - Model 1
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 2
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 3
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 1: ytib=β0ib+ϵtib
Composite: ytib=δ000+V00b+U0ib+ϵtib
L3 - Model 4
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100
Level 1: ytib=β0ib+β1ib⋅Time+ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + γ100⋅Timetib + ϵtib
L3 - Model 5
Level 3: γ00b=δ000+V00b
Level 2: β0ib=γ00b+U0ib
Level 2: β1ib=γ100+U1ib
Level 1: ytib=β0ib+β1ib⋅Time+ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + (γ100 + U1ib)⋅Timetib + ϵtib
L3 - Model 6
Level 3: γ00b=δ000+V00b
Level 3: γ10b = δ100 + V10b
Level 2: β0ib = γ00b + U0ib
Level 2: β1ib = γ10b
Level 1: ytib = β0ib + β1ib⋅Timetib + ϵtib
Composite: ytib = (δ000 + V00b + U0ib) + (δ100 + V10b)⋅Timetib + ϵtib
GPA exercise (1)
GPA exercise (2)
Level-3 exercise (2)
SECTION: Transforming data from wide to long format
Wide vs. Long format
Left: Wide format; each measurement of the same variable is in a separate column. Right: Long format; each measurement of the same variable is in a separate row.
Pivot_longer and pivot_wider
The tidyverse is a collection of open source R packages that share an underlying design philosophy, grammar, and data structures.
In the tidyverse approach to R syntax, these two functions are used to transform a dataset between wide and long format.
- pivot_longer() makes datasets longer by increasing the number of rows and decreasing the number of columns.
- pivot_wider() is the opposite of pivot_longer(): it makes a dataset wider by increasing the number of columns and decreasing the number of rows.
See these pages for documentation and examples:
Creating a synthetic wide dataset
library(tidyverse)

dt <- tibble(
  id   = 1:20,
  male = sample(0:1, 20, replace = TRUE),
  dep1 = rnorm(20), dep2 = rnorm(20), dep3 = rnorm(20),
  anx1 = rnorm(20), anx2 = rnorm(20), anx3 = rnorm(20)
)
print(dt, n = 10)

## # A tibble: 20 x 8
##       id  male   dep1    dep2    dep3    anx1    anx2   anx3
##    <int> <int>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
##  1     1     1 -0.709  0.0301  0.189   0.357   0.407   0.199
##  2     2     0 -1.10  -0.696  -0.0357  1.37   -0.0255  0.880
##  3     3     0 -1.06  -0.104  -0.315   0.226   0.585  -1.17
##  4     4     1  0.490  1.42    0.0567 -0.182  -0.347   1.17
##  5     5     1  0.149 -1.25    0.284  -0.354   0.0755  0.948
##  6     6     0  1.39  -0.828   0.392  -0.197   1.55    0.847
##  7     7     0 -1.41  -1.94   -2.02   -0.0964 -0.182   1.58
##  8     8     0 -0.592 -0.356  -1.30    0.434  -0.600   0.352
##  9     9     0  1.03   0.359   0.0586  0.980   0.921   0.793
## 10    10     1 -1.02   1.81    0.776   0.252   0.556  -0.589
## # i 10 more rows
Pivot from wide to long
dt_l <- dt %>%
  pivot_longer(
    cols = dep1:anx3,
    names_sep = 3,
    names_to = c("disorder", "number")
  ) %>%
  pivot_wider(names_from = "disorder") %>%
  rename("wave" = "number", "anxiety" = "anx", "depression" = "dep")
print(dt_l, n = 10)

## # A tibble: 60 x 5
##       id  male wave  depression anxiety
##    <int> <int> <chr>      <dbl>   <dbl>
##  1     1     1 1        -0.709   0.357
##  2     1     1 2         0.0301  0.407
##  3     1     1 3         0.189   0.199
##  4     2     0 1        -1.10    1.37
##  5     2     0 2        -0.696  -0.0255
##  6     2     0 3        -0.0357  0.880
##  7     3     0 1        -1.06    0.226
##  8     3     0 2        -0.104   0.585
##  9     3     0 3        -0.315  -1.17
## 10     4     1 1         0.490  -0.182
## # i 50 more rows