The Statistical Software Revolution in the Pharmaceutical Industry and Minimum Viable Good Practices for High-Quality Statistical Software Packages
Disclaimer:
Views are solely my own and do not express the opinions or views of Red Door Analytics AB or the openstatsware working group.
This is a presentation in two acts:
The statistical software revolution in the pharmaceutical industry
Minimum viable good practices for high-quality statistical software packages
Challenges with productivity, efficiency, and innovation in the pharmaceutical industry
Flat number of new drug approvals per year, but increasing R&D spend per approval
Several potential reasons:
Methodological innovation is required due to:
New data modalities and endpoints
Higher volume of data
Novel designs
[…]
In this context, open-source software development has emerged as an agile solution for keeping pace with advances in statistical methodology.
Risk-based validation: align package risk level with validation effort and documentation
Usability first: API design, examples, and user testing as core deliverables
Break silos: cross-functional governance and shared infrastructure teams
Create RSE roles: embed software engineering expertise in biostatistics groups
Develop engineering standards: version control, CI, automated tests, release management
Leverage the community: openstatsware, pharmaverse, and other cross-company initiatives
The open-source revolution in the pharmaceutical industry is in full swing
This is exemplified by successful submissions by Novo Nordisk and Roche (among others)
Open-source software offers faster innovation, transparency, and shared efficiency if paired with governance, validation, and good software engineering practices
Documentation
Vignettes
Tests
Functions
Style
Life cycle
Mnemonic for the six topics above: Developers Value Tests For Software Longevity
Documentation lets users and developers understand every object in your package without reading and interpreting the underlying source code.
Use inline comments next to functions, classes and other objects to generate their corresponding documentation
Do document internal functions and classes for maintenance by future developers
Add code comments for ambiguous or complex pieces of internal code
Vignettes are documents that complement the documentation by providing a comprehensive and long-form overview of the package from a user perspective.
Provide an introduction vignette that introduces the package to new users
Include code examples and automatically compile the vignette to ensure reproducibility
Include deep dive vignettes that go into depth on specific use cases, functionalities or underlying theory
Host your vignettes on a dedicated website
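One way to compile a vignette's code examples automatically, sketched here under the assumption of a Python package, is to run them with the standard-library doctest module so that a stale example fails the build. The vignette text and the run_vignette helper are hypothetical; a real project would more likely point its documentation tooling at the vignette files.

```python
import doctest

# A hypothetical vignette: prose mixed with interactive examples.
VIGNETTE = """
Getting started
===============

Compute the mean response of three subjects:

>>> from statistics import mean
>>> mean([1.0, 2.0, 3.0])
2.0
"""


def run_vignette(text: str) -> doctest.TestResults:
    """Execute every example in ``text`` and compare it with the shown output."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name="vignette", filename=None, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # run() returns a (failed, attempted) named tuple for this test.
    return runner.run(test)


if __name__ == "__main__":
    results = run_vignette(VIGNETTE)
    print(f"{results.attempted} examples run, {results.failed} failed")
```

If the output of an example ever drifts from what the vignette shows, the failed count becomes non-zero and the mismatch is printed.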
Tests are a fundamental safety net and development tool to ensure that your package works as expected, both during development and on user systems.
Write unit tests for all functions and classes in your package, to ensure that all building blocks work correctly on their own
Write functional tests for all user-facing functionality, to ensure that the package is stable when refactoring internal code
Ensure adequate coverage of your code
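A minimal sketch of a unit test, assuming Python and the standard-library unittest module. The function sample_variance is a hypothetical building block; the tests check one hand-computed value and one error path.

```python
import unittest


def sample_variance(values: list[float]) -> float:
    """Unbiased sample variance (denominator n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


class TestSampleVariance(unittest.TestCase):
    def test_known_value(self):
        # Mean is 2.5, squared deviations sum to 5, so the variance is 5 / 3.
        self.assertAlmostEqual(sample_variance([1.0, 2.0, 3.0, 4.0]), 5.0 / 3.0)

    def test_rejects_single_observation(self):
        with self.assertRaises(ValueError):
            sample_variance([1.0])


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSampleVariance)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

A functional test would look the same but would call only user-facing entry points, so that internal code can be refactored without touching the test.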
Function definitions should be short and simple, and should enforce argument types with assertions.
Write short functions for a single and well-defined purpose, with few arguments, and low complexity
Use type hints to explain to users which argument of the function expects which type of input
Enforce types and other expected properties of function arguments with assertions
Catch errors and fail early with human-readable error messages
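These points can be sketched as follows, assuming Python: type hints tell the user what each argument expects, and explicit checks at the top of the function fail early with a readable message. The function wald_interval and its arguments are hypothetical.

```python
from numbers import Real


def wald_interval(estimate: float, std_error: float, z: float = 1.96) -> tuple[float, float]:
    """Return the interval estimate +/- z * std_error as (lower, upper)."""
    # Enforce types first; bool is excluded because it is a subclass of int.
    for name, value in (("estimate", estimate), ("std_error", std_error), ("z", z)):
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    # Then enforce the other expected properties of the arguments.
    if std_error < 0:
        raise ValueError(f"std_error must be non-negative, got {std_error}")
    if z <= 0:
        raise ValueError(f"z must be positive, got {z}")
    half_width = z * std_error
    return estimate - half_width, estimate + half_width
```

Passing a string such as "1.0" for estimate raises a TypeError that names the offending argument, instead of failing later inside the arithmetic.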
Code style is important because it makes software easier to read, safer to change, and more consistent for everyone who works on it.
Use idiomatic code and follow clean code rules
Use a formatting tool to automatically implement a consistent and readable code format
Use style checking tools to enforce a consistent and readable code style
Life cycle management is simpler with fewer dependencies and should be built around a central code repository.
Reduce dependencies to simplify maintenance
Only depend on other packages that you trust
Give clear information on user-facing changes in the package, and first deprecate functionality before removing it
Use a central repository for version control, collecting and resolving issues, and managing releases
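Deprecating before removing can be sketched like this, assuming Python and the standard-library warnings module. The old name geo_mean and the new name geometric_mean are hypothetical; the old function keeps working for now but tells the user what to call instead.

```python
import math
import warnings


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geo_mean(values: list[float]) -> float:
    """Deprecated alias of geometric_mean(), kept for one release cycle."""
    warnings.warn(
        "geo_mean() is deprecated and will be removed in a future release; "
        "use geometric_mean() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return geometric_mean(values)
```

The same message would normally also appear in the package's change log, so that users learn about the user-facing change before the old function disappears.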
openstatsware website:
https://www.openstatsware.org
Current version of our openstatsguide:
https://www.openstatsware.org/guide
Good Software Engineering Practice for R Packages course material:
https://openstatsware.github.io/shortcourse-iscb2025/
Check our website: reddooranalytics.se/career