Introduction;
Planning a simulation study;
Coding a simulation study;
Analysing a simulation study;
Reporting and visualising results;
Wrap-up.
Principal Statistical Methodologist at Red Door Analytics, here in Stockholm;
PhD in Biostatistics from the University of Leicester, UK;
Affiliated researcher at the Department of Medical Epidemiology and Biostatistics at Karolinska Institutet;
Co-chair of the openstatsware working group, recognised by both ASA BIOP and EFSPI;
Associate Editor for the journal Biostatistics.
Note that, from now on, I will be using the terms Monte Carlo simulation study and simulation study interchangeably.
In a simulation study, we design an experiment in which we:
1. Generate data from a known distribution/model (so that we know the truth);
2. Analyse the data;
3. Compare the analysis results with the truth.
We (generally) repeat steps 1-3 B times;
Finally, after running the experiment B times, we collect and analyse the results to draw our conclusions.
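A minimal sketch of this experiment in Python; the data-generating model, the analysis step, and the true value below are placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
B = 1000           # number of repetitions of the experiment
theta_true = 5.0   # the known truth (here: a population mean)
n = 50             # observations per simulated dataset

estimates = np.empty(B)
for b in range(B):
    # Step 1: generate data from a known model
    y = rng.normal(loc=theta_true, scale=2.0, size=n)
    # Step 2: analyse the data (here: estimate the mean)
    estimates[b] = y.mean()

# Step 3, over all repetitions: compare results with the truth
bias = estimates.mean() - theta_true
print(f"Estimated bias: {bias:.4f}")
```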
Monte Carlo simulation studies provide an invaluable tool for statistical and biostatistical research — and are great for learning/understanding.
They can help to answer questions such as:
Is an estimator biased in a finite sample?
Do confidence intervals for a given parameter achieve the desired nominal level of coverage?
How does a newly developed method compare to an established one?
What is the power to detect a desired effect size under complex experimental settings and analysis methods?
…you name it.
Monte Carlo standard errors of performance measures are often not computed/reported;
Reproducibility of results;
Dissemination of results.
As statisticians, we should follow our own advice (and could do better) when it comes to the design, reporting, and reproducibility of our own experiments.
The problem is that the average statistical training does not teach enough about practical aspects of simulation studies.
A is for Aims
D is for Data-generating mechanisms
E is for Estimands
M is for Methods
P is for Performance measures
I is for Implementation
Carefully thinking and planning a simulation study makes your life easier (and more reproducible) in the long run.
Before starting with a simulation study, we need to know what we want to learn. For instance:
Whether/when an estimation procedure converges?
Consistency? (large-sample properties)
Small-sample issues?
Correct variance formula? (e.g. when derived from asymptotic theory)
Comparison with other approaches?
Robustness to model misspecification?
The data-generating mechanisms [DGM] are processes that we assume generate the data under study. For instance, the process underlying a clinical trial (eligibility, randomisation, etc.) or an observational study.
Defining each DGM is aim-specific, but we often vary several factors at once: sample size, true value of the estimand of interest, additional DGM-related parameters, …
We can vary them factorially, or use e.g. fractional factorial designs for greater efficiency: we do need to be pragmatic, after all.
The estimand is what we are estimating, the target of our inference.
It is typically a parameter (or a function of parameters), but it might also be a summary measure (e.g. predictive performance) or a null hypothesis.
For instance, the estimand of interest in a simulation study mimicking a clinical trial might be the treatment effect.
This depends once again on the aims of the simulation study, hence the importance of defining that from the start.
Methods are often what we are studying, e.g. a newly-developed method or a method that we aim to apply in unusual settings.
A comparator, e.g. a gold-standard method, is included more often than not; however, it is not always necessary to include one.
It is also recommended to include familiar methods as a check.
Let’s define the chosen estimand as \(\theta\), which is estimated by a method as \(\hat{\theta}\) with standard error \(\widehat{\text{se}}(\hat{\theta})\). Performance measures of interest might target:
Properties of the estimator, such as bias, empirical standard error, mean squared error;
Properties of the standard error of the estimator, such as average model-based standard error;
Properties of a confidence interval for \(\hat{\theta}\), such as coverage probability and power;
Convergence rates or computational speed.
Bias quantifies how far the estimator is, on average, from the true value \(\theta\): \(\text{bias} = E(\hat{\theta}) - \theta\).
For instance, the maximum likelihood estimator is asymptotically unbiased.
The empirical standard error describes the precision of an estimator, and is defined as \(\sqrt{\text{var}(\hat{\theta})}\).
All else being equal, we prefer more precise estimators, i.e. those with lower variance.
The mean squared error (MSE) combines bias and emp. SE in an overall performance measure
It is defined as \(\text{MSE} = \text{bias}^2 + (\text{emp. SE})^2\).
Coverage probability of a confidence interval is defined as the proportion of confidence intervals containing the true value: \(P(\hat{\theta}_{\text{low}} \le \theta \le \hat{\theta}_{\text{upp}})\).
For a 95% CI, this probability should be 95%; poor coverage might arise because of (1) a biased estimator, (2) biased model-based SEs, or (3) poor distributional assumptions.
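As a sketch, these performance measures can be computed in Python from the vectors of estimates and standard errors returned by a simulation; here both are mocked up by drawing \(\hat{\theta}\) directly from an assumed normal sampling distribution:

```python
import numpy as np

rng = np.random.default_rng(1)
B = 10_000
theta = 2.0   # true value of the estimand
sd = 0.5      # true sampling SD of the estimator

# Stand-ins for real simulation output: B estimates and model-based SEs
theta_hat = rng.normal(theta, sd, size=B)
se_hat = np.full(B, sd)  # here the model-based SE happens to be correct

bias = theta_hat.mean() - theta
emp_se = theta_hat.std(ddof=1)
mse = bias**2 + emp_se**2
# Coverage of 95% Wald-type confidence intervals
low, upp = theta_hat - 1.96 * se_hat, theta_hat + 1.96 * se_hat
coverage = np.mean((low <= theta) & (theta <= upp))

print(f"bias={bias:.3f} empSE={emp_se:.3f} MSE={mse:.3f} coverage={coverage:.3f}")
```

With a correctly specified standard error, as in this mock-up, coverage should land close to the nominal 95%.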
Implementation covers some of the very practical details of simulation studies:
Number of repetitions (not the number of observations in the simulated data!);
Software package;
Computational details.
We have a study which can be summarised by the following DAG:
Where \(X\) is the exposure, \(Y\) the outcome, and \(Z\) a confounder.
Note that the exposure and the outcome are independent: there is no edge between \(X\) and \(Y\).
The aim of this simulation study is pedagogical.
We know that there should be no association between \(X\) and \(Y\) once we adjust for \(Z\) in our analysis.
However:
Is it actually true that once we adjust for \(Z\) we observe no association between \(X\) and \(Y\)?
What happens if we fail to adjust for a \(Z\) in our analysis?
\(X\) is a binary treatment variable, \(Z\) is a continuous confounder, and \(Y\) is a continuous outcome;
\(Z \sim N(0, 1)\);
\(X\) depends on \(Z\) according to a logistic model: \[ \log \frac{P(X = 1 \mid Z)}{P(X = 0 \mid Z)} = \gamma_0 + \gamma_1 Z \]
\(Y\) depends on \(Z\) according to a linear model: \[ Y = \alpha_0 + \alpha_1 Z + \varepsilon \] with \(\varepsilon \sim N(0, 1)\).
We fix the parameters \(\gamma_0 = 1\), \(\gamma_1 = 3\), \(\alpha_0 = 10\).
For \(\alpha_1\), we actually simulate two scenarios:
\(\alpha_1 = 5\): in this scenario (DGM = 1), \(Z\) is actually a confounder;
\(\alpha_1 = 0\): in the second scenario (DGM = 2), we remove the edge between \(Z\) and \(Y\) as well. Here \(Z\) is not a confounder anymore.
Finally, we generate \(N = 200\) independent subjects for each simulated dataset.
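This DGM can be sketched in Python as a small generator function (the function name is my own choice, not part of the study):

```python
import numpy as np

def simulate_data(n=200, alpha1=5.0, rng=None):
    """One dataset from the example DGM.

    Z ~ N(0, 1); X | Z from a logistic model with gamma0 = 1, gamma1 = 3;
    Y = alpha0 + alpha1 * Z + eps with alpha0 = 10, eps ~ N(0, 1).
    There is no direct effect of X on Y, so the true beta1 = 0.
    """
    rng = rng if rng is not None else np.random.default_rng()
    z = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(1.0 + 3.0 * z)))  # P(X = 1 | Z)
    x = rng.binomial(1, p)
    y = 10.0 + alpha1 * z + rng.normal(size=n)
    return x, y, z

# DGM 1 (Z is a confounder); pass alpha1=0.0 for DGM 2
x, y, z = simulate_data(rng=np.random.default_rng(2024))
```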
We have a single estimand of interest here, the regression coefficient \(\beta_1\) (which we define how to estimate in the next slide) that quantifies the association between \(X\) and \(Y\).
Under certain structural assumptions (see e.g. John et al. (2019), DOI: 10.1186/s12874-019-0858-x), the regression coefficient \(\beta_1\) can be interpreted as the average causal effect (ACE) of the exposure \(X\) on the outcome \(Y\).
Remember that, according to our data-generating mechanisms, we expect no association between \(X\) and \(Y\).
We estimate \(\beta_1\) using linear regression. Specifically, we fit the following two models:
A model that fails to adjust for the observed confounder \(Z\) (model 1): \[ Y = \beta_0 + \beta_1 X + \varepsilon \]
A model that properly adjusts for \(Z\) (model 2): \[ Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon \]
The key performance measure of interest is bias, defined as \[ E(\hat{\beta}_1) - \beta_1 = E(\hat{\beta}_1), \] since (recall our DGMs) the true \(\beta_1 = 0\).
We might also be interested in the rejection rate of a significance test for \(\beta_1\) at the \(\alpha = 0.05\) level, i.e. a test of the null hypothesis \(\beta_1 = 0\). Since the true \(\beta_1 = 0\) under both DGMs, this rejection rate is the empirical type I error rate rather than power.
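Putting DGM 1, both models, and these performance measures together, a self-contained Python sketch (plain least squares with Wald tests; B is kept modest here for speed):

```python
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients and their standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

rng = np.random.default_rng(2023)
B, n = 2000, 200
est = np.empty((B, 2))  # beta1-hat under model 1 and model 2
rej = np.empty((B, 2))  # rejection of H0: beta1 = 0 at alpha = 0.05

for b in range(B):
    # DGM 1: Z is a genuine confounder (alpha1 = 5)
    z = rng.normal(size=n)
    x = rng.binomial(1, 1 / (1 + np.exp(-(1 + 3 * z))))
    y = 10 + 5 * z + rng.normal(size=n)
    designs = [np.column_stack([np.ones(n), x]),      # model 1: Y ~ X
               np.column_stack([np.ones(n), x, z])]   # model 2: Y ~ X + Z
    for m, X in enumerate(designs):
        beta, se = ols(X, y)
        est[b, m] = beta[1]
        rej[b, m] = abs(beta[1] / se[1]) > 1.96  # Wald test for beta1 = 0

bias = est.mean(axis=0)       # true beta1 = 0, so bias = mean estimate
rejection = rej.mean(axis=0)  # empirical type I error rate
print("bias:", bias, "rejection rate:", rejection)
```

The unadjusted model 1 should show substantial bias and reject almost always, while the adjusted model 2 should be approximately unbiased with a rejection rate near the nominal 0.05.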
The rationale for picking the number of replications is that we need enough to estimate the key performance measures with good precision.
Think of this as having a large enough dataset for an epidemiological study, or power calculations for a clinical trial. It is roughly the same concept.
Our example is simple so we don’t need to worry too much about computational time. We can then run a large number of replications (\(B = 10,000\) or more).
In general, choosing the number of replications \(B = n_{\text{sim}}\) is not trivial: it requires a tradeoff between precision and computational effort.
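One common rule of thumb: the Monte Carlo SE of the estimated bias is \(\text{SD}(\hat{\theta}) / \sqrt{B}\), so given an anticipated SD of the estimator and a target MCSE we can solve for \(B\). A sketch in Python; the numbers are illustrative assumptions:

```python
# MCSE(bias) = sd_theta_hat / sqrt(B)  =>  B = (sd_theta_hat / target_mcse)^2
sd_theta_hat = 0.5    # anticipated SD of the estimator (an assumption)
target_mcse = 0.005   # desired Monte Carlo SE for the bias estimate

B = round((sd_theta_hat / target_mcse) ** 2)
print(B)  # -> 10000
```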
Here is an example from a paper we recently published:
We should also list all practical software details that are required to replicate the simulation:
Software package: R, Stata, etc.
User-written commands?
Arguments of functions or of estimation algorithms?
Numerical precision?
Key (and sometimes implicit) assumptions when implementing a method?
… and so on.
The most important thing to remember is that there is no right way of coding a simulation study.
However, your code should be:
Reproducible;
Immune to the bus factor;
Openly available (as much as possible).
The analysis of a simulation study is not that different from the analysis of a clinical study. You often start with descriptive statistics:
Inspect your data,
Compute summary statistics (mean, median, etc.),
Create some descriptive plots e.g. to check for outliers.
Then, you can move onto estimating the performance measures of interest: bias, coverage, etc.
Now, if you feel fancy, you can go into regression modelling, but that’s a much less established practice.
There are potentially four dimensions (DGMs × estimands × methods × performance measures), so complexity can grow quickly.
You may want to think about:
Which is the comparison of interest?
Across DGMs or across methods?
Can my reporting strategy effectively tell the story of the study?
As with coding before, there is no right way to do this.
Generally, I recommend using plots rather than tables, as tables can easily become daunting.
See the supplementary material for the paper titled Impact of model misspecification in shared frailty survival models by Gasparini et al., available online at:
Warning: it’s not pretty!
With large simulation studies, we really want/need interactive graphics that allow the user to control what is displayed (or not).
Furthermore, more info (e.g., actual estimated values) could appear when hovering over plots.
Frameworks that let you do that are (among others) d3.js, Shiny, Plotly, etc.
We actually did develop and release an interactive Shiny app that does some of that:
GitHub repository: ellessenne/interest
Paper: DOI: 10.52933/jdssv.v1i4.9
Compared to, e.g., clinical trials or observational studies, simulation studies offer one of the most straightforward opportunities for replication.
Therefore:
Always write up your study assuming that other people will try to replicate it. Many actually do: see, e.g., the RepliSims Project (DOI: 10.1098/rsos.231003). The ADEMPI framework can help with this.
Another suggestion is to release the full code, e.g. on GitHub or other repositories. This allows direct replicability (and reproducibility!), and forces you to check (and organise, and document) your code thoroughly. Some journals even require you to share code nowadays.
Heinze et al. proposed a framework for thinking about how methodological work contributes to the evidence base for a method.
Is a method trustworthy enough to be used in practice?
Similar to the well-known phases of drug development.
Aims and simulation design can depend on the phase of research: scope, finite vs large sample, etc.
Recommended reading, DOI: 10.1002/bimj.202200222
Start by planning your simulation study on paper;
Structure and describe your study assuming that other people will try to reproduce it;
Build your code slowly, start small and only then generalise/expand it;
Report Monte Carlo standard errors to quantify uncertainty;
Make sure you’re selling the story of your study well, with engaging and effective visualisations and reporting.
…yes! Let’s recap some key uses:
Invaluable tool for learning (and teaching!) statistics;
Helpful to test/challenge your understanding of methods;
Helpful to test/evaluate new methods you develop;
Helpful to compare alternative methods;
Helpful to assess performance when key assumptions are violated;
…and many more.
All material you have seen today is available online.
Slides: ellessenne.xyz/talks