A short ode to simulation

Tags: simulation, statistics, learning
Some thoughts about the importance and value of statistical simulation, inspired by talks from the rstudio::global conference.
Published: February 26, 2021

I have been slowly catching up with talks from rstudio::global, and I was so impressed with a couple of presentations on statistical simulation that I had to write something about it!

Yes, you read that right, you’re not dreaming: statistical simulation is now cool!

As you might know, I am a strong supporter of statistical simulation. Traditionally you might use it to test a new statistical method you are developing, or to compare different methods in unusual settings. You might also want to probe where, when, and whether a method breaks in edge cases.

However, two other (very important) use-cases are highlighted in the talks that I mentioned above:

  1. Simulation for learning and teaching,

  2. Simulation to drive agile infrastructures.

The first scenario is described very well by Chelsea Parlett-Pelleriti:

There are three main take-home messages:

  1. Statistical simulation encourages exploration,

  2. it tests your intuition,

  3. and it empowers a deeper understanding of complex statistical methods.

I stand by all of these points.

In fact, that’s a great and accessible way to learn: by trying, and trying, and trying again until you finally get it. And if you follow a simulation exercise, it’s even easier: you can modify the parameters at will (starting from a solid foundation), explore, and follow your intuition. I mean, isn’t this just the scientific method?
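To make this concrete, here is a minimal sketch of the kind of exercise I have in mind (in Python, using only the standard library; the function name and parameter values are my own invention). It checks a classic piece of intuition by brute force: the spread of a sample mean should shrink roughly like 1/√n, so quadrupling the sample size should halve it. You can tweak `n`, `reps`, or the underlying distribution and watch what happens.

```python
import random
import statistics

def sd_of_sample_mean(n, reps=2000, seed=42):
    """Estimate, by simulation, the standard deviation of the mean
    of n independent Uniform(0, 1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Quadrupling n should roughly halve the spread of the sample mean.
for n in (25, 100, 400):
    print(n, round(sd_of_sample_mean(n), 4))
```

Nothing here is specific to this toy example: swap in any estimator and any data-generating process, and the same loop lets you explore its behaviour empirically.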

The second talk is by Richard Vogg:

During the talk, he gives a couple of examples that highlight the importance of being able to compose data at will:

  1. When you cannot share the real dataset (e.g. for privacy reasons), it is useful to have real-like data that can be shared more freely. There’s a decent amount of research (and growing interest) on the topic, see e.g. this paper by Dan Quintana;

  2. When you don’t have the data you’re supposed to be building pipelines for (yet), it is useful to have data that is similar to what you expect to receive. That gives you a head start on prototyping, for example;

  3. When you’re running internal training courses for staff, it is useful to use datasets that resemble what you actually work with (e.g. transactions, clients, etc.).
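As a rough illustration of the third point, here is a small sketch (again Python, standard library only) that composes a fake transactions table of the kind you might use in an internal training course. Every field name, client count, and value range below is made up purely for illustration.

```python
import random

def fake_transactions(n, seed=0):
    """Generate n fake transaction records with a realistic-looking shape.

    All field names and value ranges are invented for illustration only.
    """
    rng = random.Random(seed)
    clients = [f"C{i:04d}" for i in range(1, 51)]  # 50 made-up client ids
    rows = []
    for i in range(n):
        rows.append({
            "transaction_id": f"T{i:06d}",
            "client_id": rng.choice(clients),
            # Log-normal amounts: many small transactions, a few large ones.
            "amount": round(rng.lognormvariate(3.0, 1.0), 2),
            "channel": rng.choice(["online", "branch", "phone"]),
        })
    return rows

sample = fake_transactions(5)
```

Because the data is generated from scratch, it carries no privacy risk and can be handed to trainees, shared in a repository, or piped straight into a prototype.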

…the more you think about it, the more you realise how useful that can be!

In conclusion, both talks are excellent, straight to the point, and engaging. I recommend you give them a look: each will take just 10 minutes of your time.

See you soon!