# Install these packages if you have not already:
# install.packages(c("ipd", "broom", "scales", "tidyverse"))
library(ipd) # Inference with predicted data
library(broom) # Convert model objects (lm, glm, ipd) into tidy data.frames
library(scales) # Formatting scales and labels for ggplot2
library(tidyverse) # Meta‐package for data manipulation and visualizationMeasuring Adiposity
BMI vs. DXA
Compare BMI and DXA-based adiposity measures and use PB corrections to improve regression inference under predicted outcomes.
Background and Motivation
How Do We Measure Adiposity?
Body mass index (BMI) is the most commonly used anthropometric proxy for adiposity in epidemiologic studies and clinical settings. It is simple to calculate, weight in kilograms divided by height in meters squared, and has well-established cut-points for classifying patients as overweight or obese. However, BMI does not distinguish between fat mass and lean mass, nor does it capture fat distribution (visceral versus subcutaneous). As a result, BMI can both under- and over-estimate true adiposity in key subgroups. For example muscular individuals (e.g., athletes) may be misclassified as “obese,” while older adults with sarcopenia (low muscle mass) may fall below BMI thresholds despite having high percent body fat.

Waist circumference (WC) is another simple measure of central adiposity that may better reflect visceral fat, a key driver of metabolic risk. Standard WC cut-points (e.g., ≥ 102 cm in men, ≥ 88 cm in women) identify individuals at increased cardiometabolic risk, even when their BMI is in the normal or overweight range. Yet WC also does not directly quantify total body fat and is influenced by body build, posture, and measurement error.
Dual-energy X-ray absorptiometry (DXA) provides a more accurate “gold-standard” measure of body composition by directly quantifying total and regional fat and lean mass. DXA-derived percent fat thresholds (e.g., > 30% in men, > 42% in women) have been validated against metabolic outcomes. Comparing BMI-based obesity (BMI ≥ 30 kg/m²) and WC-based obesity (WC ≥ 102 cm men, ≥ 88 cm women) with DXA-defined obesity reveals where and how often these common anthropometric proxies misclassify true adiposity.
It is worth noting that BMI is often defended as a valid measure for population-level inference, under the assumption that individual-level misclassification errors are harmless when estimating group-level trends. This justification appears frequently in epidemiology, where BMI is treated as a convenient stand-in for adiposity in regression models. But this reasoning masks a deeper issue central to the prediction-based (PB) inference framework: BMI is itself a prediction model, a crude one, but a model nonetheless. It encodes a deterministic function (weight divided by height squared) to approximate an unobserved latent quantity (body fat), and like any prediction model, it introduces systematic biases that may not vanish with aggregation. PB inference adopts a prediction-agnostic perspective: whether the surrogate is a simple index like BMI or a high-dimensional neural network, the challenge is the same: how to draw valid statistical inference when the outcome is a model-based proxy. By treating BMI as a prediction rather than a ground truth, we clarify the role of uncertainty, calibration, and correction, and demonstrate how PB methods can help recover valid estimates even when only such surrogates are available.
In this module, we will use data from the National Health and Nutrition Examination Survey (NHANES) to:
- Load BMI, WC, and DXA-based measures of adiposity and their associated features
- Define obesity by three standards:
- BMI (≥ 30 kg/m²)
- Waist circumference (men ≥ 102 cm; women ≥ 88 cm)
- DXA % body fat (> 30% men; > 42% women)
- Visualize misclassification rates overall and by age, sex, and race/ethnicity
- Calibrate both BMI and WC-based obesity measures to DXA using the
ipdpackage - Discuss implications for research and practice
By the end, we may have a better understanding of the strengths and limitations of BMI and WC as proxies for adiposity, know how to assess their sensitivity and specificity versus DXA, and be equipped to correct for measurement error using PB methods when only BMI/WC are available.
NHANES
The National Health and Nutrition Examination Survey (NHANES) is a nationally representative program conducted by the U.S. Centers for Disease Control and Prevention that combines in-home interviews with standardized physical examinations and laboratory tests to assess the health and nutritional status of Americans. Initiated in the early 1960s and carried out continuously since 1999 in two-year cycles, NHANES informs public health policy, tracks trends in chronic conditions and risk factors, and support research on diet, disease, and health disparities. NHANES provides:
- DXA-measured percent body fat (
DXDTOPF) for a subset of participants - BMI (
BMXBMI), derived from measured height and weight - Waist circumference (
BMXWAIST), a potential measure of central fat
As DXA scans are costly and time-consuming, many studies only record BMI and WC. When our scientific question involves true adiposity (e.g., its association with certain risk factors), but we only have BMI and WC in most participants, we can use ipd to correct our downstream inference on % body fat to account for the fact that these proxies are being used in place of the true DXA measurement.
Note: DXA scans were collected through the 2017 - 2018 NHANES cycle but were suspended during the COVID-19 pandemic. As a result, subsequent waves, including the August 2021 - August 2023 data, do not include DXA measurements. In this tutorial, we will therefore:
- Use the 2017-2018 wave as our labeled data (which includes
DXDTOPF).
- Use PB inference to correct our inference when estimating associations between % body fat and other covariates in the absence of direct DXA measurements in the unlabeled August 2021 - August 2023 wave.
This approach lets us leverage historic DXA data to predict adiposity in more recent participants, enabling unbiased inference across pre- and post-pandemic cohorts.
Data Preparation and Exploration
We will now load a pre-compiled NHANES dataset, explore key variables (BMI, waist circumference, DXA % body fat) by cohort, and define obesity categories for later misclassification analyses and PB inference.
Loading the Pre-Compiled NHANES Data
For convenience, we prepared and saved a data file, data/NHANES.rda, which contains a tibble, NHANES, with the following columns:
SEQN- Respondent ID
Cohort- Factor: “2017-2018” or “2021-2023”
Age- Factor: “Under 20”, “20-39”, “40-59”, or “60+”Sex- Factor: “Male” or “Female”
Race- Factor: “Non-Hispanic White”, “Hispanic”, “Non-Hispanic Black”, “Non-Hispanic Asian”, or “Other Race - Including Multi-Racial”Smoking- Factor: “Never Smoker”, “Former Smoker”, or “Current Smoker”Education- Factor: “Less than high school”, “High school graduate/GED or equivalent”, “Some college or AA degree”, “College graduate or above”, or “Refused/Unknown”BMXBMI- Body Mass Index (kg/m²)
BMXWAIST- Waist Circumference (cm)
DXDTOPF- DXA Percent Body Fat (only measured in 2017-2018;NAin 2021-2023)WTMEC4YR- Four-Year Adjusted Survey Weights
Note: The the code used to produce this dataset is available in this repository at inst/NHANES_DATA.R.
Exercise 1: Install and Load the Necessary Packages
Load packages once here so all remaining exercises run cleanly.
Exercise 2: Load the Data
We first load the prepared dataset and take a look at its features:
# Load the dataset
load("data/NHANES.RData")
# Inspect its structure
glimpse(NHANES)Rows: 11,782
Columns: 11
$ SEQN <dbl> 93706, 93707, 93711, 93712, 93714, 93717, 93719, 93725, 9372…
$ Cohort <fct> 2017-2018, 2017-2018, 2017-2018, 2017-2018, 2017-2018, 2017-…
$ Age <fct> Under 20, Under 20, 40-59, Under 20, 40-59, 20-39, Under 20,…
$ Sex <fct> Male, Male, Male, Male, Female, Male, Female, Female, Male, …
$ Race <fct> Non-Hispanic Asian, Other Race - Including Multi-Racial, Non…
$ Smoking <fct> NA, NA, NA, NA, Former Smoker, NA, NA, NA, NA, NA, NA, NA, N…
$ Education <fct> Refused/Unknown, Refused/Unknown, College graduate or above,…
$ BMXBMI <dbl> 21.5, 18.1, 21.3, 19.7, 39.9, 24.5, 26.0, 16.1, 27.6, 28.6, …
$ BMXWAIST <dbl> 79.3, 64.1, 86.6, 72.0, 118.4, 86.2, 86.0, 66.8, 101.5, 96.3…
$ DXDTOPF <dbl> 22.7, 19.0, 22.8, 16.7, 42.1, 20.4, 33.4, 26.9, 29.4, 22.8, …
$ WTMEC4YR <dbl> 4361.720, 3532.305, 6195.460, 15168.327, 7739.791, 30058.968…
Verify the data schema now to avoid debugging later steps.
Exercise 3: Overview by Cohort
We can now get a sense of the different cohorts and our variables of interest:
Sample Sizes
NHANES |>
count(Cohort) |>
mutate(pct = n / sum(n) * 100) Cohort n pct
1 2017-2018 3617 30.69937
2 2021-2023 8165 69.30063
DXA Missingness
NHANES |>
group_by(Cohort) |>
summarize(
total = n(),
pct_missing = mean(is.na(DXDTOPF)) * 100)# A tibble: 2 × 3
Cohort total pct_missing
<fct> <int> <dbl>
1 2017-2018 3617 0
2 2021-2023 8165 100
Expected:
- 2017-2018: DXA (
DXDTOPF) present for analytic participants. - 2021-2023:
DXDTOPFentirely missing (post-pandemic no DXA).
Discussion Questions
- Why is this a natural labeled/unlabeled split for PB inference?
- What inferential bias would appear if we ignored the structured missingness?
Exercise 4: Descriptive Statistics
BMI, Waist Circumference, and DXA % Fat
NHANES |>
group_by(Cohort) |>
summarize(
BMI_mean = mean(BMXBMI),
BMI_sd = sd(BMXBMI),
WC_mean = mean(BMXWAIST),
WC_sd = sd(BMXWAIST),
DXA_mean = mean(DXDTOPF),
DXA_sd = sd(DXDTOPF)
) |>
pivot_longer(-Cohort)|>
separate(name, c("Metric", "Statistic")) |>
pivot_wider(names_from = Statistic, values_from = value)# A tibble: 6 × 4
Cohort Metric mean sd
<fct> <chr> <dbl> <dbl>
1 2017-2018 BMI 26.5 7.24
2 2017-2018 WC 89.3 18.8
3 2017-2018 DXA 32.3 8.70
4 2021-2023 BMI 27.1 7.96
5 2021-2023 WC 92.1 22.0
6 2021-2023 DXA NA NA
Distributions of Continuous Measures of Adiposity
NHANES |>
pivot_longer(c(BMXBMI, BMXWAIST, DXDTOPF)) |>
ggplot(aes(x = value, fill = Cohort)) +
facet_wrap(~ name, scales = "free") +
geom_histogram(alpha = 0.6, position = "identity", bins = 30) +
scale_fill_manual(values = c("#00C1D5", "#AA4AC4")) +
labs(
title = "Measures of Adiposity by Cohort",
x = "BMI (kg/m2), WC (cm), or % Fat",
y = "Count") +
theme_minimal()
DXA vs Anthropometry in 2017-2018
Only the 2017-2018 cohort has DXDTOPF, so we examine how BMI and WC relate to true % body fat among the study participants in this wave:
NHANES |>
filter(Cohort == "2017-2018") |>
pivot_longer(c(BMXBMI, BMXWAIST)) |>
ggplot(aes(x = value, y = DXDTOPF)) +
facet_wrap(~ name, scales = "free_x") +
geom_point(alpha = 0.4) +
geom_smooth(method = "lm", color = "#1B365D") +
labs(
title = "DXA % Body Fat vs BMI and WC (2017-2018)",
x = "BMI (kg/m2) or WC (cm)", y = "DXDTOPF (%)") +
theme_minimal()
Let’s also compare these measures by Sex. We can also add reference lines to see where the measure-specific obesity cut-offs would be:
cutoffs <- tibble(
name = c("BMXBMI", "BMXBMI", "BMXWAIST", "BMXWAIST"),
Sex = c("Male", "Female", "Male", "Female"),
xint = c(30, 30, 102, 88),
yint = c(30, 42, 30, 42)
)
NHANES |>
filter(Cohort == "2017-2018") |>
pivot_longer(c(BMXBMI, BMXWAIST)) |>
ggplot(aes(x = value, y = DXDTOPF, group = Sex, fill = Sex, color = Sex)) +
facet_wrap(~ name, scales = "free_x") +
geom_point(alpha = 0.4) +
geom_smooth(method = "lm") +
geom_vline(data = cutoffs,
aes(xintercept = xint, color = Sex), linetype = "dashed") +
geom_hline(data = cutoffs,
aes(yintercept = yint, color = Sex), linetype = "dashed") +
scale_fill_manual(values = c("#00C1D5", "#AA4AC4")) +
scale_color_manual(values = c("#00C1D5", "#AA4AC4")) +
labs(
title = "DXA % Body Fat vs BMI and WC (2017-2018)",
x = "BMI (kg/m2) or WC (cm)", y = "DXDTOPF (%)") +
theme_minimal()
- BMI and WC both show strong positive association with DXA % fat.
- Slopes differ by sex, suggesting different proxy behavior across groups.
- Mismatched quadrants indicate real proxy misclassification.
Discussion Questions
- Which proxy looks better aligned with DXA in this cohort?
- How might sex-specific thresholding improve proxy calibration?
Exercise 5: Defining Obesity Categories
Let us create three binary indicators:
obese_BMI: BMI ≥ 30 kg/m²obese_WC: WC ≥ 102 cm (men) or ≥ 88 cm (women)obese_DXA: DXA % Fat > 30% (men) or > 42% (women)
NHANES <- NHANES |>
mutate(
obese_BMI = BMXBMI >= 30,
obese_WC = case_when(
Sex == "Male" & BMXWAIST >= 102 ~ TRUE,
Sex == "Female" & BMXWAIST >= 88 ~ TRUE,
.default = FALSE),
obese_DXA = case_when(
is.na(DXDTOPF) ~ NA,
Sex == "Male" & DXDTOPF > 30 ~ TRUE,
Sex == "Female" & DXDTOPF > 42 ~ TRUE,
.default = FALSE)
)These indicators define proxy outcomes (obese_BMI, obese_WC) and the reference outcome (obese_DXA) for downstream analyses.
Exercise 6: Misclassification in 2017-2018
Compare BMI-defined vs DXA-defined obesity:
# Overall
NHANES |>
filter(Cohort == "2017-2018") |>
count(obese_DXA, obese_BMI) |>
mutate(percent = sprintf("%.1f%%", 100 * n / sum(n))) obese_DXA obese_BMI n percent
1 FALSE FALSE 2180 60.3%
2 FALSE TRUE 307 8.5%
3 TRUE FALSE 415 11.5%
4 TRUE TRUE 715 19.8%
# By Sex
NHANES |>
filter(Cohort == "2017-2018") |>
count(Sex, obese_DXA, obese_BMI) |>
group_by(Sex) |>
mutate(percent = sprintf("%.1f%%", 100 * n / sum(n))) |>
ungroup()# A tibble: 8 × 5
Sex obese_DXA obese_BMI n percent
<fct> <lgl> <lgl> <int> <chr>
1 Male FALSE FALSE 1003 56.1%
2 Male FALSE TRUE 138 7.7%
3 Male TRUE FALSE 302 16.9%
4 Male TRUE TRUE 345 19.3%
5 Female FALSE FALSE 1177 64.4%
6 Female FALSE TRUE 169 9.2%
7 Female TRUE FALSE 113 6.2%
8 Female TRUE TRUE 370 20.2%
And WC-defined vs DXA-defined:
# Overall
NHANES |>
filter(Cohort == "2017-2018") |>
count(obese_DXA, obese_WC) |>
mutate(percent = sprintf("%.1f%%", 100 * n / sum(n))) obese_DXA obese_WC n percent
1 FALSE FALSE 1959 54.2%
2 FALSE TRUE 528 14.6%
3 TRUE FALSE 319 8.8%
4 TRUE TRUE 811 22.4%
# By Sex
NHANES |>
filter(Cohort == "2017-2018") |>
count(Sex, obese_DXA, obese_WC) |>
group_by(Sex) |>
mutate(percent = sprintf("%.1f%%", 100 * n / sum(n))) |>
ungroup()# A tibble: 8 × 5
Sex obese_DXA obese_WC n percent
<fct> <lgl> <lgl> <int> <chr>
1 Male FALSE FALSE 1020 57.0%
2 Male FALSE TRUE 121 6.8%
3 Male TRUE FALSE 286 16.0%
4 Male TRUE TRUE 361 20.2%
5 Female FALSE FALSE 939 51.3%
6 Female FALSE TRUE 407 22.3%
7 Female TRUE FALSE 33 1.8%
8 Female TRUE TRUE 450 24.6%
- BMI vs DXA: meaningful misclassification, with sex-specific asymmetry.
- WC vs DXA: also misclassifies, with a different error profile from BMI.
- Proxy choice changes both bias direction and uncertainty in downstream models.
Discussion Questions
- Which error pattern would most distort regression coefficients?
- How might subgroup-specific calibration reduce these errors?
Exercise 7: Additional Stratified Comparisons by Measure
Let us reshape the data to long format so that BMI, WC, and DXA-defined obesity are all in one “Measure” column:
# Assume NHANES is already loaded and has Cohort, Age, Sex, Race,
# and the three obesity variables
nhanes_long <- NHANES |>
select(Cohort, Age, Sex, Race, obese_BMI, obese_WC, obese_DXA) |>
pivot_longer(
cols = starts_with("obese_"),
names_to = "Measure",
values_to = "Obese"
) |>
mutate(
Measure = recode(Measure,
obese_BMI = "BMI",
obese_WC = "Waist Circumference",
obese_DXA = "DXA"
)
) |>
filter(!is.na(Obese))By Age Group
prop_age <- nhanes_long |>
group_by(Cohort, Age, Measure, Obese) |>
summarize(n = n(), .groups = "drop") |>
group_by(Cohort, Age, Measure) |>
mutate(prop = n / sum(n))
ggplot(prop_age, aes(x = Age, y = prop, fill = Obese)) +
geom_col(position = "stack") +
facet_grid(Cohort ~ Measure) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_manual(values = c("#00C1D5", "#AA4AC4")) +
scale_color_manual(values = c("#00C1D5", "#AA4AC4")) +
labs(
x = "Age Group",
y = "Percent",
fill = "Obesity Status",
title = "Obesity Classification by Age, Cohort, and Measure"
) +
theme_minimal()
By Sex
prop_sex <- nhanes_long |>
group_by(Cohort, Sex, Measure, Obese) |>
summarize(n = n(), .groups = "drop") |>
group_by(Cohort, Sex, Measure) |>
mutate(prop = n / sum(n))
ggplot(prop_sex, aes(x = Sex, y = prop, fill = Obese)) +
geom_col(position = "stack") +
facet_grid(Cohort ~ Measure) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_manual(values = c("#00C1D5", "#AA4AC4")) +
scale_color_manual(values = c("#00C1D5", "#AA4AC4")) +
labs(
x = "Sex",
y = "Percent",
fill = "Obesity Status",
title = "Obesity Classification by Sex, Cohort, and Measure"
) +
theme_minimal()
By Race/Ethnicity
prop_race <- nhanes_long |>
group_by(Cohort, Race, Measure, Obese) |>
summarize(n = n(), .groups = "drop") |>
group_by(Cohort, Race, Measure) |>
mutate(prop = n / sum(n))
ggplot(prop_race, aes(x = Race, y = prop, fill = Obese)) +
geom_col(position = "stack") +
facet_grid(Cohort ~ Measure) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_manual(values = c("#00C1D5", "#AA4AC4")) +
scale_color_manual(values = c("#00C1D5", "#AA4AC4")) +
labs(
x = "Race / Ethnicity",
y = "Percent",
fill = "Obesity Status",
title = "Obesity Classification by Race/Ethnicity, Cohort, and Measure"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Stratified charts reveal where proxy-defined obesity diverges most from DXA by subgroup.
Discussion Questions
BMI misclassification varies systematically across subgroups:
- Age: Young adults (18-34) often have lower lean mass, so normal-BMI individuals can still have high body fat (“normal-weight obesity”).
- Sex: Women generally have higher percent-fat at the same BMI; sex-specific DXA thresholds help adjust for this.
- Race/Ethnicity: Differences in body composition and fat distribution mean that a single BMI cut-point may not correspond to the same adiposity level across groups.
Implications: Reliance on BMI alone can bias epidemiologic associations with true adiposity-driven outcomes (e.g., diabetes, cardiovascular disease). When DXA or other body-composition measures are impractical, consider calibration equations or subgroup-specific BMI thresholds.
Next: we will split the combined NHANES data into labeled (2017-2018 with DXDTOPF) and unlabeled (2021-2023 without DXA), and then proceed with PB inference using BMI or WC as our proxy.
Demographic Associations: Naive vs Classical vs PB Inference Methods
Exercise 8: Naive vs Classical vs PB Inference
We are interested in studying the association between true percent body fat (DXA % fat) and risk factors such as age, sex, and race. We model the binary obesity outcome (DXA-defined “true” vs BMI/WC “proxy”) as a function of Age, Sex, and Race using logistic regression:
- Naive: proxy-only model on unlabeled data (2021-2023)
- Classical: true-only model on labeled data (2017-2018)
- PB Inference: combine both while correcting for proxy error
Exercise 8a: Setup Labeled vs Unlabeled Data
# Split NHANES into labeled (DXA available) vs unlabeled
labeled <- NHANES |> filter(Cohort == "2017-2018")
unlabeled <- NHANES |> filter(Cohort == "2021-2023")
# Stack for PB Inference
combined <- bind_rows(
labeled |> mutate(set_label = "labeled"),
unlabeled |> mutate(set_label = "unlabeled")
)This split defines the labeled and unlabeled sets used by PB methods.
Exercise 8b: Naive versus Classical Regressions
We fit:
- Naive-BMI:
lm(BMI ~ Age + Sex + Race, data = unlabeled) - Naive-WC:
lm(WC ~ Age + Sex + Race, data = unlabeled) - Classical:
lm(DXDTOPF ~ Age + Sex + Race, data = labeled)
For the naive method, we treat BMI- and WC-based obesity (our ’predictions) as the true outcomes, fit on the unlabeled participants:
# Naive on unlabeled using BMI
naive_bmi_fit <- glm(obese_BMI ~ Age + Sex + Race,
family = binomial, data = unlabeled)
naive_bmi_df <- broom::tidy(naive_bmi_fit) |>
mutate(method = "Naive (BMI)")
# Naive on unlabeled using WC
naive_wc_fit <- glm(obese_WC ~ Age + Sex + Race,
family = binomial, data = unlabeled)
naive_wc_df <- broom::tidy(naive_wc_fit) |>
mutate(method = "Naive (WC)")Interpretation:
- Because we ‘predicted’ obesity using BMI or WC, the “naive” coefficient are biased and the confidence intervals are artifically too narrow.
For the classical approach, we fit the true model on the labeled subset (with actual DXA % fat). That is:
# Classical on labeled using DXA
class_fit <- glm(obese_DXA ~ Age + Sex + Race,
family = binomial, data = labeled)
class_df <- broom::tidy(class_fit) |>
mutate(method = "Classical (DXA)")
# Combine
coef_df <- bind_rows(naive_bmi_df, naive_wc_df, class_df) |>
filter(term != "(Intercept)") |>
mutate(
term = recode(term,
"Age20-39" = "Age 20-39 (vs Under 20)",
"Age40-59" = "Age 40-59 (vs Under 20)",
"Age60+" = "Age 60+ (vs Under 20)",
"SexFemale" = "Female (vs Male)",
"RaceNon-Hispanic Black" = "Non-Hispanic Black (vs Non-Hispanic White)",
"RaceNon-Hispanic Asian" = "Non-Hispanic Asian (vs Non-Hispanic White)",
"RaceHispanic" = "Hispanic (vs Non-Hispanic White)",
"RaceOther Race - Including Multi-Racial" = "Other Race - Including Multi-Racial (vs Non-Hispanic White)"
),
method = factor(method, levels = c("Classical (DXA)", "Naive (BMI)", "Naive (WC)"))
)
# Forest plot
ggplot(coef_df, aes(x = estimate, y = term, color = method)) +
geom_point(position = position_dodge(width = 0.6)) +
geom_errorbarh(aes(xmin = estimate - 1.96 * std.error,
xmax = estimate + 1.96 * std.error),
height = 0.2,
position = position_dodge(width = 0.6)) +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_y_discrete(limits = rev) +
scale_fill_manual(values = c("#1B365D", "#00C1D5", "#AA4AC4")) +
scale_color_manual(values = c("#1B365D", "#00C1D5", "#AA4AC4")) +
labs(
x = "Log-Odds Estimate (95% CI)",
y = "Covariate",
title = "Comparison of Obesity Model Coefficients",
color = "Model Type"
) +
theme_minimal()
- Naive models can appear precise but are biased by proxy error.
- Classical DXA-only estimates are valid but use fewer observations.
- Compare sign/magnitude shifts across covariates to diagnose proxy bias.
Discussion Questions
- Which coefficients are most sensitive to proxy choice?
- Do BMI and WC induce similar bias patterns?
Exercise 9: PB Inference Corrections
We now apply PB inference to leverage all participants (labeled + unlabeled) while adjusting for prediction error. We run two IPD analyses, one using BMI as our ‘f’ and one using WC.
A note on two calling styles:
We can either provide a single, combined dataset to the
dataargument and the name of the column that gives the set labels inlabel:ipd_bmi_1 <- ipd( formula = obese_DXA - obese_BMI ~ Age + Sex + Race, data = combined, label = “set_label”, model = “logistic”, method = “pspa” )
or we can provide the labeled set to
dataand the unlabeled set tounlabeled_dataseparately:ipd_bmi_2 <- ipd( formula = obese_DXA - obese_BMI ~ Age + Sex + Race, data = labeled, unlabeled_data = unlabeled, model = “logistic”, method = “pspa” )
Now we can try to run the PB model:
# Note: This code chunk results in an error, but we have an informative warning
# We can try running it and see!
# ipd_bmi_fit <- ipd(
# formula = obese_DXA - obese_BMI ~ Age + Sex + Race,
# method = "pspa",
# model = "logistic",
# data = combined,
# label = "set_label"
# )
#
# ipd_wc_fit <- ipd(
# formula = obese_DXA - obese_WC ~ Age + Sex + Race,
# method = "pspa",
# model = "logistic",
# data = labeled,
# unlabeled_data = unlabeled
# )The warning is expected: labeled and unlabeled datasets must share factor levels for PB model fitting.
Discussion Questions
- Why do mismatched factor levels break PB inference fitting?
- What preprocessing checks should you run before modeling?
We can see that the labeled subset does not include any "60+" observations, but ipd expects the same factor levels in both sets in order to fit the model using all the data. To fix this, let us refactor both the labeled and unlabeled data so they have consistent age categories:
# Recode age
NHANES2 <- NHANES |>
mutate(Age_recode = fct_collapse(Age, `40+` = c("40-59", "60+")))
# Split NHANES into labeled (DXA available) vs unlabeled
labeled2 <- NHANES2 |>
filter(Cohort == "2017-2018") |>
select(obese_DXA, obese_BMI, obese_WC, Age_recode, Sex, Race)
unlabeled2 <- NHANES2 |> filter(Cohort == "2021-2023") |>
select(obese_DXA, obese_BMI, obese_WC, Age_recode, Sex, Race)
# Stack for PB Inference
combined2 <- bind_rows(
labeled2 |> mutate(set_label = "labeled"),
unlabeled2 |> mutate(set_label = "unlabeled")
)Exercise 10: Run PB Inference and Compare Against Naive/Classical
Now let us rerun ipd::ipd() and compare our results to the Naive and Classical models:
# Note: This code chunk now runs without error!
ipd_bmi_fit <- ipd(
formula = obese_DXA - obese_BMI ~ Age_recode + Sex + Race,
method = "pspa",
model = "logistic",
data = combined2,
label = "set_label"
)
ipd_wc_fit <- ipd(
formula = obese_DXA - obese_WC ~ Age_recode + Sex + Race,
method = "pspa",
model = "logistic",
data = labeled2,
unlabeled_data = unlabeled2
)
# Collect results using the tidy() method
ipd_bmi_df <- tidy(ipd_bmi_fit) |>
mutate(method = "PSPA (BMI)")
ipd_wc_df <- tidy(ipd_wc_fit) |>
mutate(method = "PSPA (WC)")# Rerun previous models
# Naive on unlabeled using BMI
naive_bmi_fit <- glm(obese_BMI ~ Age_recode + Sex + Race,
family = binomial, data = unlabeled2)
naive_bmi_df <- broom::tidy(naive_bmi_fit) |>
mutate(method = "Naive (BMI)")
# Naive on unlabeled using WC
naive_wc_fit <- glm(obese_WC ~ Age_recode + Sex + Race,
family = binomial, data = unlabeled2)
naive_wc_df <- broom::tidy(naive_wc_fit) |>
mutate(method = "Naive (WC)")
# Classical on labeled using DXA
class_fit <- glm(obese_DXA ~ Age_recode + Sex + Race,
family = binomial, data = labeled2)
class_df <- broom::tidy(class_fit) |>
mutate(method = "Classical (DXA)")
# Combine
coef_df <- bind_rows(
ipd_bmi_df, ipd_wc_df, naive_bmi_df, naive_wc_df, class_df) |>
filter(term != "(Intercept)") |>
mutate(
term = recode(term,
"Age_recode20-39" = "Age 20-39 (vs Under 20)",
"Age_recode40+" = "Age 40+ (vs Under 20)",
"SexFemale" = "Female (vs Male)",
"RaceNon-Hispanic Black" = "Non-Hispanic Black (vs Non-Hispanic White)",
"RaceNon-Hispanic Asian" = "Non-Hispanic Asian (vs Non-Hispanic White)",
"RaceHispanic" = "Hispanic (vs Non-Hispanic White)",
"RaceOther Race - Including Multi-Racial" = "Other Race - Including Multi-Racial (vs Non-Hispanic White)"
),
method = factor(method,
levels = c("PSPA (BMI)", "PSPA (WC)",
"Classical (DXA)", "Naive (BMI)", "Naive (WC)"))
)
# Forest plot
ggplot(coef_df, aes(x = estimate, y = term,
color = method, fill = method, shape = method)) +
geom_point(position = position_dodge(width = 0.6)) +
geom_errorbarh(aes(xmin = estimate - 1.96 * std.error,
xmax = estimate + 1.96 * std.error),
height = 0.2,
position = position_dodge(width = 0.6)) +
geom_vline(xintercept = 0, linetype = "dashed") +
scale_y_discrete(limits = rev) +
scale_fill_manual(
values = c(
"Classical (DXA)" = "#1B365D",
"Naive (BMI)" = "#00C1D5",
"Naive (WC)" = "#AA4AC4",
"PSPA (BMI)" = "#00C1D5",
"PSPA (WC)" = "#AA4AC4"
)
) +
scale_color_manual(
values = c(
"Classical (DXA)" = "#1B365D",
"Naive (BMI)" = "#00C1D5",
"Naive (WC)" = "#AA4AC4",
"PSPA (BMI)" = "#00C1D5",
"PSPA (WC)" = "#AA4AC4"
)
) +
scale_shape_manual(
values = c(
"Classical (DXA)" = 16,
"Naive (BMI)" = 16,
"Naive (WC)" = 16,
"PSPA (BMI)" = 17,
"PSPA (WC)" = 17
)
) +
labs(
x = "Log-Odds Estimate (95% CI)",
y = "Covariate",
title = "Comparison of Obesity Model Coefficients",
color = "Model Type", fill = "Model Type", shape = "Model Type"
) +
theme_minimal()
PB inference combines both cohorts to correct proxy-induced bias while leveraging unlabeled sample size for efficiency.
Discussion Questions
- Which covariates move closest to classical after PB inference correction?
- Do PSPA (BMI) and PSPA (WC) differ meaningfully in uncertainty?
Summary and Key Takeaways
- BMI and WC are imperfect proxies for DXA-based % body fat.
- Naive regression on BMI or WC leads to biased estimates for associations between obesity and risk factors such as age, sex, and race.
- Classical regression on true DXA-based % fat is unbiased but costly.
- PB inference allows you to leverage a large cohort with only BMI + covariates, correct for prediction error, and recover unbiased estimates with more precision than the Classical approach.
- Understanding how different proxies like BMI and WC affect population-level inference is important for epidemiologic and clinical studies.
References
- Visokay, Adam, et al. “How to measure obesity in public health research? Problems with using BMI for population inference.” medRxiv (2025): 2025-04.
This is the end of the module. We hope this was informative! For question/concerns/suggestions, please reach out to ssalerno@fredhutch.org