Prediction-Based Inference: Methods & Applications

What do we do after we have machine learned everything?

Author	Affiliation
Jesse Gronsbell, PhD	University of Toronto
Jianhui Gao, MS	University of Toronto
Stephen Salerno, PhD	Fred Hutchinson Cancer Center

Published

Tuesday, March 17, 2026 | 6:45 pm UTC

Source: https://arxiv.org/pdf/2411.19908

Workshop Goals and Objectives

Learning Goals:

Understand limitations in using predicted data for inference.
Learn about methods that correct for bias and recover valid uncertainty estimates.
Gain practical skills using the ipd R package.

Learning Objectives:

Explore data with AI/ML-predicted outcomes and diagnose bias/variance in predictions.
Apply ipd::ipd() to continuous and binary outcomes.
Interpret prediction-based (PB) inference outputs and visualize model results.

Time Outline (105 minutes)

Activity	Time
Overview	30 m
Short Break	5 m
Unit 00: Getting Started	30 m
Unit 01: AlphaFold	30 m
Wrap-Up and Q&A	10 m

Quick Start

The companion website for this workshop is available at:

https://salernos.github.io/ipd-workshop

To use the workshop image:

docker run -e PASSWORD=<your_chosen_password> -p 8787:8787 ghcr.io/salernos/ipd-workshop:latest

Once the container is running, open http://localhost:8787/ and log in with username = rstudio, password = <your_chosen_password>.

Then begin!

Workshop Overview

In this workshop, we explore the consequences of conducting inference on predicted data across several applications and present a suite of prediction-based (PB) inference methods that adjust for prediction-related uncertainty to improve inference validity and efficiency. We also introduce ipd, a user-friendly R package that implements the PB inference methods through a unified interface. The package supports modular integration into existing workflows and includes tidy methods for model inspection and diagnostics.
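As a rough illustration of the unified interface described above, the following sketch shows what a single call to ipd::ipd() might look like. The details are assumptions to check against the installed package's help pages (?ipd::ipd and ?ipd::simdat): the arguments passed to ipd::simdat(), the method name "ppi", the `Y - f ~ X1` formula layout (observed outcome minus predicted outcome on the left-hand side), and the `set_label` column name. The block skips itself if the package is not installed.

```r
# Sketch only: argument names, method names, and the data layout are
# assumptions; verify them with ?ipd::ipd and ?ipd::simdat before relying on this.
if (requireNamespace("ipd", quietly = TRUE)) {
  set.seed(123)

  # Simulated data: training, labeled, and unlabeled subsets stacked together,
  # with a column (assumed to be `set_label`) marking each row's subset
  dat <- ipd::simdat(n = c(500, 500, 1000), effect = 1, sigma_Y = 1,
                     model = "ols")

  # Y is the observed outcome, f is the predicted outcome, X1 is a covariate
  fit <- ipd::ipd(
    Y - f ~ X1,
    method = "ppi",       # assumed name of one of the available PB methods
    model  = "ols",       # linear regression as the downstream model
    data   = dat,
    label  = "set_label"  # column distinguishing labeled from unlabeled rows
  )

  print(fit)
} else {
  message("Package 'ipd' is not installed; skipping the example.")
}
```

Switching to a different PB method should then only require changing the `method` argument, which is the practical benefit of a unified interface.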

Modules

This workshop covers two modules, each illustrated with the ipd package:¹

Getting Started

Prediction-Based Inference: Methods & Applications

Build intuition for prediction-based inference by simulating data and comparing different methods.

Proteomics with AlphaFold

Protein Disorder and PTMs

Apply PB methods in a proteomics setting to estimate associations when key outcomes are model-predicted rather than directly measured.
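To make the idea of simulating data and comparing methods concrete, here is a minimal base-R sketch. It is not taken from the workshop materials, and every name in it (`one_rep`, `slope`, the sample sizes, the data-generating model) is hypothetical. It compares three estimates of a regression slope: a naive fit that treats predictions as if they were observed outcomes, a classical fit that uses only the small labeled set, and a simple prediction-powered-style correction that subtracts the labeled-set bias of the predictions from the naive estimate. That correction is one simple instance of PB inference and is not necessarily what ipd implements internally.

```r
set.seed(2026)

# One simulated study: returns the slope on X from three analysis strategies
one_rep <- function(n_train = 500, n_lab = 100, n_unlab = 2000, beta = 1) {
  n <- n_train + n_lab + n_unlab
  X <- rnorm(n)
  Z <- rnorm(n)
  Y <- beta * X + 2 * Z + rnorm(n)  # true slope on X is `beta`
  set <- rep(c("train", "lab", "unlab"), c(n_train, n_lab, n_unlab))
  dat <- data.frame(X, Z, Y, set)

  # A prediction model that ignores X, so predictions carry no X signal
  pred_fit <- lm(Y ~ Z, data = dat[dat$set == "train", ])
  dat$f <- predict(pred_fit, newdata = dat)

  lab   <- dat[dat$set == "lab", ]
  unlab <- dat[dat$set == "unlab", ]  # in practice Y is unobserved here

  # Helper: fitted OLS coefficient on X
  slope <- function(formula, data) unname(coef(lm(formula, data = data))["X"])

  naive     <- slope(f ~ X, unlab)       # predictions treated as outcomes
  classical <- slope(Y ~ X, lab)         # labeled data only
  rectifier <- slope(I(f - Y) ~ X, lab)  # bias of predictions, labeled data

  c(naive = naive, classical = classical, corrected = naive - rectifier)
}

# Repeat the study to compare bias (mean) and variability (sd)
results <- t(replicate(200, one_rep()))
summary_table <- data.frame(
  mean = colMeans(results),
  sd   = apply(results, 2, sd)
)
print(round(summary_table, 3))
```

In this setup the naive slope should sit near zero rather than near the true value of 1, while the classical and corrected estimates should both center near 1, with the corrected estimate varying less across replications because it also draws on the large unlabeled set.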

Supplemental Modules

We have also included some supplemental modules for you to explore on your own:

Measuring Adiposity

BMI vs. DXA

Compare BMI and DXA-based adiposity measures and use PB corrections to improve regression inference under predicted outcomes.

BCR-ABL Fusion

in B-Cell Leukemia

Study BCR-ABL prediction from gene-expression profiles and evaluate how PB inference methods calibrate downstream inference in genomics.

The Rashomon Quartet

i.e., Performance is not Enough

Examine how equally predictive models can yield different scientific conclusions and compare naive, classical, and PB inference.

Participation

This workshop uses a blended format of instruction and hands-on coding exercises. Participants should:

Follow along in the virtual RStudio environment (see Quick Start).
Attempt to complete brief exercises or run the solution code snippets in real time.
Engage in Q&A at module boundaries to troubleshoot and discuss concepts.

Prerequisites

A computer with internet access.
Familiarity with base R and tidyverse syntax (e.g., dplyr, broom).
Basic understanding of predictive (e.g., randomForest) and regression modeling (e.g., lm, glm).
Optional: Exposure to Bioconductor’s ExpressionSet, AnnotationDbi, and MLInterfaces is helpful for one of the supplemental modules.

Contributors

Presenters: Jesse Gronsbell, Jianhui Gao, Stephen Salerno

All Contributors (Alphabetical Order): Awan Afiaz, David Cheng, Jianhui Gao, Jesse Gronsbell, Kentaro Hoffman, Jeff Leek, Qiongshi Lu, Tyler McCormick, Jiacheng Miao, Anna Neufeld, Stephen Salerno

Footnotes

¹ Module card cover images were generated by GPT-5.2.