Setup & Prerequisites
Getting Started with the Workshop Materials
Welcome, and thanks for joining the workshop, Prediction-Based Inference: Methods & Applications!
This page is a quick checklist to help you get set up before the session. If you’d like to follow along interactively during the workshop, these steps will make sure everything runs smoothly.
Quick Checklist (10-15 minutes)
Complete these before the workshop:
- Confirm you have a stable internet connection and a laptop.
- Choose your environment:
  - Recommended: Docker + browser-based RStudio
  - Alternative: local R/RStudio install
- Verify ipd and core R packages install successfully (Option B only).
Prerequisites
You should be comfortable with:
- Base R and tidyverse syntax (dplyr, ggplot2, basic pipes).
- Basic regression modeling (lm, glm).
- Basic predictive modeling concepts (train/test split, predictions, model error).
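As a rough gauge of the expected level, the sketch below walks through a train/test split, a linear model, predictions, and a test-set error in base R. The built-in mtcars data, the 70/30 split, and the chosen predictors are illustrative assumptions only; none of them come from the workshop materials.

```r
# Illustrative self-check only: mtcars ships with base R and is
# not one of the workshop datasets.
set.seed(1)

# Train/test split (70% train is an arbitrary choice for this sketch)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Basic regression model fit on the training rows
fit <- lm(mpg ~ wt + hp, data = train)

# Predictions on the held-out rows
pred <- predict(fit, newdata = test)

# Model error: root mean squared error on the test set
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```

If each step reads naturally, you likely have the background listed above.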
Helpful but optional (for the supplemental modules):
- Bioconductor familiarity (ExpressionSet, AnnotationDbi, MLInterfaces).
Software Requirements
Option A (Recommended): Docker Workshop Environment
Install Docker Desktop:
Then run:
docker run -e PASSWORD=<your_chosen_password> -p 8787:8787 ghcr.io/salernos/ipd-workshop:latest
- Open: http://localhost:8787/
- Login: username = rstudio, password = <your_chosen_password>
Option B: Local R + RStudio
You need R 4.4.1 or newer.
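If R is already installed, a quick check along these lines can confirm that it meets the 4.4.1 minimum. This is a minimal sketch using base R's getRversion(); the variable names are illustrative.

```r
# Compare the running R version against the workshop minimum.
min_version <- "4.4.1"
r_ok <- getRversion() >= min_version

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the minimum (", min_version, ").")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version,
          "; please update before the workshop.")
}
```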
Install:
R Packages to Install Ahead of Time
If you use Option A (Docker), all required packages are already included in the workshop image and you can skip installation.
Core packages (required for Option B)
install.packages(c(
  "ipd", "MASS", "broom", "tidyverse", "future", "furrr"
))
Supplemental packages (optional, used outside Units 00 and 01)
During the workshop, we will only cover Unit 00 and Unit 01. You can install the following only if you want to explore modules outside Unit 00 and Unit 01.
# CRAN packages for optional supplemental units
install.packages(c(
  "patchwork", "scales", "janitor", "GGally", "randomForest",
  "ranger", "mgcv", "pROC", "DALEX", "neuralnet", "partykit"
))
# Bioconductor packages for optional supplemental biological modules
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c(
  "ALL", "golubEsets", "AnnotationDbi", "hgu95av2.db",
  "hu6800.db", "MLInterfaces"
))
60-Second Setup Test
Run this in R/RStudio (for Option B, this confirms local setup is complete):
library(ipd)
library(MASS)
library(tidyverse)
library(broom)
library(future)
library(furrr)
sessionInfo()
Optional check (if one fails, install that package and rerun this chunk):
required <- c("ipd", "MASS", "broom", "tidyverse", "future", "furrr")
missing <- required[!vapply(required, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing) == 0L) {
  message("All required packages are available.")
} else {
  stop(sprintf("Missing required packages: %s", paste(missing, collapse = ", ")))
}
If this runs without errors, you are ready.
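If the check reports missing packages, one possible follow-up is to install only those rather than the full list. The sketch below assumes a hypothetical helper, missing_packages(), which is not part of ipd or the workshop materials, and it only installs in an interactive session so that a sourced script never triggers downloads unexpectedly.

```r
# Hypothetical helper (not from ipd or the workshop repo):
# returns the subset of `pkgs` that cannot be loaded from the current library.
missing_packages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

required <- c("ipd", "MASS", "broom", "tidyverse", "future", "furrr")
to_install <- missing_packages(required)

if (length(to_install) == 0L) {
  message("Nothing to install.")
} else if (interactive()) {
  # Install only what is missing
  install.packages(to_install)
} else {
  message("Still missing: ", paste(to_install, collapse = ", "))
}
```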
Data
We provide datasets for the modules that use real data. For Option B, download the data folder from either GitHub or Google Drive into your local working directory. For Option A, the data are already available in the Docker image.
Link to Data Folder on GitHub: https://github.com/salernos/ipd-workshop/tree/main/content/data
Link to Data Folder on Google Drive: https://drive.google.com/drive/folders/1ubmvB43a7zYgwrZ93-BSAjfbDOaEuH9r?usp=sharing
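For Option B, a short check like the following can confirm that the downloaded folder is where R expects it. This is a sketch under two assumptions: that the folder keeps the name data (as in the GitHub path above) and that it sits directly in your working directory. The helper name data_folder_ready() is hypothetical.

```r
# Hypothetical helper: TRUE if `path` is an existing directory
# containing at least one file or subfolder.
data_folder_ready <- function(path = "data") {
  dir.exists(path) && length(list.files(path)) > 0L
}

# Assumption: the folder is named "data" inside the working directory.
if (data_folder_ready("data")) {
  message("Found data folder with ", length(list.files("data")), " item(s).")
} else {
  message("No non-empty 'data' folder found in ", getwd())
}
```

If the message reports that no folder was found, check getwd() and move the folder or change the working directory accordingly.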
Links You’ll Need During the Workshop
- Workshop site: https://salernos.github.io/ipd-workshop
- ipd package repo: https://github.com/ipd-tools/ipd
- Unit 00 (start here): Getting Started
Support
If you hit setup issues before the session, contact Stephen Salerno (ssalerno@fredhutch.org).