This page lists all of my publications, organized by year. Click a paper’s title to visit the publisher page and access the official version. Use Abstract to reveal the summary, Citation for the APA 7th-edition reference, and BibTeX to view the BibTeX entry. Click the same link again to hide the display. Free Text links to a free full-text version, and Materials leads to an open repository with data, code, and other open-science resources.

★ = (co)first-authored work

For the most up-to-date listing of my work, please see my CV, Google Scholar, or My Bibliography.

2024

From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies
Fan, Visokay, Hoffman, Salerno, Liu, Leek & McCormick
First Conference on Language Modeling
In settings where most deaths occur outside the healthcare system, verbal autopsies (VAs) are a common tool to monitor trends in causes of death (COD). VAs are interviews with a surviving caregiver or relative that are used to predict the decedent’s COD. Turning VAs into actionable insights for researchers and policymakers requires two steps: (i) predicting likely COD using the VA interview and (ii) performing inference with predicted CODs (e.g., modeling the breakdown of causes by demographic factors using a sample of deaths). In this paper, we develop a method for valid inference using outcomes (in our case, COD) predicted from free-form text using state-of-the-art NLP techniques. This method, which we call multiPPI++, extends recent work in “prediction-powered inference” to multinomial classification. We leverage a suite of NLP techniques for COD prediction and, through empirical analysis of VA data, we demonstrate the effectiveness of our approach in handling transportability issues. multiPPI++ recovers ground truth estimates, regardless of which NLP model produced predictions and regardless of whether they were produced by a more accurate predictor like GPT-4-32k or a less accurate predictor like KNN. Our findings demonstrate the practical importance of inference correction for public health decision-making and suggest that if inference tasks are the end goal, having a small amount of contextually relevant, high-quality labeled data is essential regardless of the NLP algorithm.
@inproceedings{fan2024from,
  title = {From Narratives to Numbers: Valid Inference Using Language Model Predictions from Verbal Autopsies},
  author = {Shuxian Fan and Adam Visokay and Kentaro Hoffman and Stephen Salerno and Li Liu and Jeffrey T. Leek and Tyler McCormick},
  year = {2024},
  booktitle = {First Conference on Language Modeling}
}

2018

Data for Good in Your Neighborhood: A Case Study on How Data Can Benefit Your Local Community
NeCamp, Morris, Reynolds & Salerno
Bloomberg Data for Good Exchange
Data are beneficial resources for big businesses, national and state governments, and large nonprofits. In this article, we describe how the benefits of data can be similarly reaped by local community organizations. We exemplify this benefit through Statistics in the Community (STATCOM) at the University of Michigan. STATCOM is an organization that connects statistics graduate students to local community organizations to provide free assistance with data organization, analysis, and interpretation. We describe three specific projects where STATCOM members partnered with a food assistance program, a crisis center, and a community foundation. For each project, we provide an overview of the community partner, the data available, the statistical methods used, and the findings, and we discuss how the analyses impacted the partner and the local community.

This page was adapted from https://github.com/jmgirard/affcomlab/blob/main/publications.qmd


References

Fan, S., Visokay, A., Hoffman, K., Salerno, S., Liu, L., Leek, J. T., & McCormick, T. (2024). From narratives to numbers: Valid inference using language model predictions from verbal autopsies. First Conference on Language Modeling.
NeCamp, T., Morris, E., Reynolds, E., & Salerno, S. (2018). Data for good in your neighborhood: A case study on how data can benefit your local community. Bloomberg Data for Good Exchange.