Prediction-Based Inference: Methods & Applications

What do we do after we have machine learned everything?

Authors

Jesse Gronsbell, PhD (University of Toronto)

Jianhui Gao, MS (University of Toronto)

Stephen Salerno, PhD (Fred Hutchinson Cancer Center)

Published

Tuesday, March 17, 2026 | 6:45 pm UTC

Workshop Goals and Objectives

Learning Goals:

  • Understand limitations in using predicted data for inference.
  • Learn about methods that correct for bias and recover valid uncertainty estimates.
  • Gain practical skills using the ipd R package.

Learning Objectives:

  • Explore data with AI/ML-predicted outcomes and diagnose bias/variance in predictions.
  • Apply ipd::ipd() to continuous and binary outcomes.
  • Interpret prediction-based (PB) inference outputs and visualize model results.

Time Outline (105 minutes)

Activity                    Time
Overview                    30 min
Short Break                 5 min
Unit 00: Getting Started    30 min
Unit 01: AlphaFold          30 min
Wrap-Up and Q&A             10 min

Quick Start

The companion website for this workshop is available at:

https://salernos.github.io/ipd-workshop

To use the workshop image:

docker run -e PASSWORD=<your_chosen_password> -p 8787:8787 ghcr.io/salernos/ipd-workshop:latest

Once the container is running, open http://localhost:8787/ and log in with username rstudio and the password you chose (<your_chosen_password>).

Then begin!

Workshop Overview

In this workshop, we explore the consequences of conducting inference on predicted data across several applications and present a suite of prediction-based (PB) inference methods that adjust for prediction-related uncertainty to improve inference validity and efficiency. We also introduce ipd, a user-friendly R package that implements the PB inference methods through a unified interface. The package supports modular integration into existing workflows and includes tidy methods for model inspection and diagnostics.
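The core idea behind correcting for prediction error shows up in the simplest case: estimating a population mean. The sketch below uses the prediction-powered inference (PPI) mean estimator, which is one published PB approach. It is written in plain Python for illustration and is not the ipd package's implementation or API. It assumes the following:

  • The labeled and unlabeled samples are independent draws from the same population.
  • A normal-approximation interval is adequate.

The function names (naive_mean, ppi_mean, simulate) and the simulated data are hypothetical.

```python
import math
import random
import statistics


def naive_mean(preds_unlabeled):
    """Naive approach: treat predictions as if they were observed outcomes."""
    return statistics.fmean(preds_unlabeled)


def ppi_mean(y_labeled, preds_labeled, preds_unlabeled, alpha=0.05):
    """Prediction-powered estimate of E[Y] with a (1 - alpha) normal-approximation CI.

    y_labeled, preds_labeled: observed outcomes and model predictions on the
        (small) labeled sample.
    preds_unlabeled: model predictions on the (large) unlabeled sample.
    """
    n, big_n = len(y_labeled), len(preds_unlabeled)
    # Rectifier: the average prediction error, estimated where Y is observed.
    residuals = [f - y for f, y in zip(preds_labeled, y_labeled)]
    estimate = statistics.fmean(preds_unlabeled) - statistics.fmean(residuals)
    # The variance combines sampling variability from both samples,
    # which are assumed independent of each other.
    var = (statistics.variance(preds_unlabeled) / big_n
           + statistics.variance(residuals) / n)
    half_width = statistics.NormalDist().inv_cdf(1 - alpha / 2) * math.sqrt(var)
    return estimate, (estimate - half_width, estimate + half_width)


def simulate(m):
    """Hypothetical data: Y ~ N(2, 1); predictions are shrunk toward 3 and noisy,
    so their mean is 2.5 rather than the true 2.0."""
    y = [random.gauss(2.0, 1.0) for _ in range(m)]
    f = [0.5 * yi + 1.5 + random.gauss(0.0, 0.3) for yi in y]
    return y, f


random.seed(2026)                  # fixed seed for reproducibility
y_lab, f_lab = simulate(200)       # small labeled sample: Y and predictions
_, f_unlab = simulate(10_000)      # large unlabeled sample: predictions only

est, (lo, hi) = ppi_mean(y_lab, f_lab, f_unlab)
print(f"naive mean of predictions: {naive_mean(f_unlab):.3f}")
print(f"PPI estimate: {est:.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
```

With these settings, the naive estimate centers on 2.5, the mean of the predictions. The PPI estimate centers on the true mean of 2.0, because the labeled sample measures the prediction bias and subtracts it. The PPI interval is wider than one computed from the predictions alone would be. That width reflects how little information the 200 labeled observations carry.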

Modules

This workshop covers two modules, each illustrated with the ipd package:[1]

  • Unit 00: Getting Started
  • Unit 01: AlphaFold

Supplemental Modules

We have also included some supplemental modules for you to explore on your own:

Participation

This workshop uses a blended format of instruction and hands-on coding exercises. Participants should:

  • Follow along in the virtual RStudio environment (see Quick Start above).
  • Attempt to complete brief exercises or run the solution code snippets in real time.
  • Engage in Q&A at module boundaries to troubleshoot and discuss concepts.

Prerequisites

  • A computer with internet access.
  • Familiarity with base R and tidyverse syntax (e.g., dplyr, broom).
  • Basic understanding of predictive modeling (e.g., randomForest) and regression modeling (e.g., lm, glm).
  • Optional: Exposure to Bioconductor’s ExpressionSet, AnnotationDbi, and MLInterfaces is helpful for one of the supplemental modules.

Contributors

Presenters: Jesse Gronsbell, Jianhui Gao, Stephen Salerno

All Contributors (Alphabetical Order): Awan Afiaz, David Cheng, Jianhui Gao, Jesse Gronsbell, Kentaro Hoffman, Jeff Leek, Qiongshi Lu, Tyler McCormick, Jiacheng Miao, Anna Neufeld, Stephen Salerno

Footnotes

  1. Module card cover images were generated by GPT-5.2.