Setup & Prerequisites

Getting Started with the Workshop Materials

Quick checklist for software setup and package installation.

Welcome, and thanks for joining the workshop, Prediction-Based Inference: Methods & Applications!

This page is a quick checklist to help you get set up before the session. If you’d like to follow along interactively during the workshop, these steps will make sure everything runs smoothly.

Quick Checklist (10-15 minutes)

Complete these before the tutorial:

  • Confirm you have a stable internet connection and a laptop.
  • Choose your environment:
    • Option A (recommended): Docker + browser-based RStudio
    • Option B (alternative): local R/RStudio install
  • Verify that ipd and the core R packages install successfully (Option B only).

Prerequisites

You should be comfortable with:

  • Base R and tidyverse syntax (dplyr, ggplot2, basic pipes).
  • Basic regression modeling (lm, glm).
  • Basic predictive modeling concepts (train/test split, predictions, model error).
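
As a rough self-check of the items above, the following sketch uses only base R and the built-in mtcars dataset. The dataset, the 70/30 split, and the model formula are illustrative assumptions and are not part of the workshop materials. If each line reads comfortably, you likely have the background you need.

```r
# Illustrative self-check (assumed example, not workshop code):
# train/test split, linear model, predictions, and test error.
set.seed(1)

n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.7 * n))  # 70% of rows for training

train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit a linear regression on the training rows only.
fit <- lm(mpg ~ wt + hp, data = train)

# Predict on held-out rows and compute root mean squared error.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
rmse
```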

Helpful but optional (for the supplemental modules):

  • Bioconductor familiarity (ExpressionSet, AnnotationDbi, MLInterfaces).

Software Requirements

Option B: Local R + RStudio

You need R 4.4.1 or newer.
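
To confirm the version requirement from inside R, a minimal check along these lines should work. The variable names are illustrative; getRversion() is part of base R.

```r
# Compare the running R version against the workshop minimum (4.4.1).
min_r <- "4.4.1"
current_r <- as.character(getRversion())
r_ok <- getRversion() >= min_r

if (r_ok) {
  message("R ", current_r, " meets the minimum (", min_r, ").")
} else {
  warning("R ", current_r, " is older than ", min_r, "; please upgrade.")
}
```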

Install R and RStudio, then install the packages listed below.

R Packages to Install Ahead of Time

If you use Option A (Docker), all required packages are already included in the workshop image and you can skip installation.

Core packages (required for Option B)

install.packages(c(
  "ipd", "MASS", "broom", "tidyverse", "future", "furrr"
))

Supplemental packages (optional, used outside Units 00 and 01)

During the workshop, we will cover only Unit 00 and Unit 01. Install the following packages only if you want to explore the other modules.

# CRAN packages for optional supplemental units
install.packages(c(
  "patchwork", "scales", "janitor", "GGally", "randomForest",
  "ranger", "mgcv", "pROC", "DALEX", "neuralnet", "partykit"
))

# Bioconductor packages for optional supplemental biological modules
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install(c(
  "ALL", "golubEsets", "AnnotationDbi", "hgu95av2.db",
  "hu6800.db", "MLInterfaces"
))

60-Second Setup Test

Run this in R/RStudio (for Option B, this confirms local setup is complete):

library(ipd)
library(MASS)
library(tidyverse)
library(broom)
library(future)
library(furrr)
sessionInfo()

Optional check (if any library() call above fails, this lists the missing packages; install them and rerun this chunk):

# Core packages required for Units 00 and 01
required <- c("ipd", "MASS", "broom", "tidyverse", "future", "furrr")

# Keep only the packages that cannot be loaded
missing_pkgs <- required[!vapply(required, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) == 0L) {
  message("All required packages are available.")
} else {
  stop(sprintf("Missing required packages: %s", paste(missing_pkgs, collapse = ", ")))
}

If this runs without errors, you are ready.

Data

We provide datasets for the modules that use real data. For Option B, download the data folder from either GitHub or Google Drive into your local working directory. For Option A, the data are already available in the Docker image.

Link to Data Folder on GitHub: https://github.com/salernos/ipd-workshop/tree/main/content/data

Link to Data Folder on Google Drive: https://drive.google.com/drive/folders/1ubmvB43a7zYgwrZ93-BSAjfbDOaEuH9r?usp=sharing
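
For Option B, a quick way to confirm that the download landed in the right place is sketched below. It assumes the folder is named "data" and sits directly in your R working directory; adjust the path if you saved it elsewhere.

```r
# Assumption: the downloaded folder is called "data" and lives in the
# current working directory. Change data_dir if yours differs.
data_dir <- file.path(getwd(), "data")

if (dir.exists(data_dir)) {
  data_files <- list.files(data_dir)
  message(length(data_files), " file(s) found in ", data_dir)
} else {
  data_files <- character(0)
  message("No 'data' folder found in ", getwd(),
          "; download it there or change the working directory with setwd().")
}
```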

Support

If you hit setup issues before the session, contact Stephen Salerno (ssalerno@fredhutch.org).