A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-09062017-101455


Type of Document Dissertation
Author Mercaldo, Sarah Fletcher
URN etd-09062017-101455
Title On Optimal Prediction Rules With Prospective Missingness and Bagged Empirical Null Inference in Large-Scale Data
Degree PhD
Department Biostatistics
Advisory Committee
  • Robert A. Greevy, Committee Chair
  • Jeffrey D. Blume, Committee Member
  • Matthew S. Shotwell, Committee Member
  • Melinda C. Aldrich, Committee Member
  • Thomas G. Stewart, Committee Member
Keywords
  • missing data
  • imputation
  • prediction models
  • large-scale inference
  • p-values
Date of Defense 2017-08-24
Availability restrictone
Abstract
This dissertation consists of three papers related to missing data, prediction, and large-scale inference. The first paper defines the problem of obtaining predictions from an existing clinical risk prediction model when covariates are missing. We introduce the Pattern Mixture Kernel Submodel (PMKS): submodels, fit within each missing-data pattern, that minimize prediction error in the presence of missingness. PMKS is explored in simulations and a case study, where it outperforms standard simple and multiple imputation techniques. The second paper introduces the Bagged Empirical Null p-value, a new algorithm that combines the existing methodologies of bagging and empirical null techniques to identify important effects in massive high-dimensional data. We illustrate the approach using a well-known leukemia gene dataset, where we uncovered new findings that are supported by previously published benchwork, and we evaluate the algorithm's performance in novel pseudo-simulations. The third paper gives recommendations for including the outcome in the imputation model during model construction, validation, and application. We suggest including the outcome in the imputation of missing covariate values only during model construction, to obtain unbiased parameter estimates. When the outcome is used in the imputation algorithm during the validation step, we show through simulation that the model's prediction metrics are optimistically inflated and that the model's actual performance in practice would be inferior to the validated results. While the three papers presented here provide foundations for missing data and large-scale inferential techniques, these ideas are applicable to a wide range of biomedical settings.
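The abstract describes PMKS only at a high level: submodels fit within each missing-data pattern. The Python sketch below illustrates that idea under assumptions that are not taken from the dissertation: each submodel is ordinary least squares with an intercept, fit only on the rows sharing a pattern and only on the covariates observed in that pattern. The function names `missing_pattern`, `fit_pattern_submodels`, and `predict_pattern_submodels` are hypothetical, and the sketch simply raises an error for a pattern with no training rows; how the dissertation handles rare or unseen patterns is not stated here.

```python
import numpy as np

# Assumptions (not from the dissertation): each submodel is ordinary least
# squares with an intercept; all function names here are hypothetical.


def missing_pattern(row):
    """Return a tuple of booleans, True where the covariate is observed."""
    return tuple(bool(observed) for observed in ~np.isnan(row))


def fit_pattern_submodels(X, y):
    """Fit one least-squares submodel per missing-data pattern.

    Each submodel is estimated only on the rows that share that pattern and
    only on the covariates observed in it, so no imputation is needed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    patterns = [missing_pattern(row) for row in X]
    models = {}
    for pattern in set(patterns):
        rows = [i for i, p in enumerate(patterns) if p == pattern]
        cols = [j for j, observed in enumerate(pattern) if observed]
        # Design matrix: intercept column plus the covariates observed in this pattern.
        design = np.column_stack([np.ones(len(rows)), X[np.ix_(rows, cols)]])
        coef, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        models[pattern] = (cols, coef)
    return models


def predict_pattern_submodels(models, x_new):
    """Predict one new observation with the submodel matching its pattern."""
    x_new = np.asarray(x_new, dtype=float)
    pattern = missing_pattern(x_new)
    if pattern not in models:
        # This sketch has no fallback for patterns unseen during fitting.
        raise KeyError(f"no submodel was fit for pattern {pattern}")
    cols, coef = models[pattern]
    return float(coef[0] + x_new[cols] @ coef[1:])


# Toy data from y = 1 + 2*x1 + 3*x2; x2 (equal to 1) is missing in the last four rows.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3],
     [0, np.nan], [1, np.nan], [2, np.nan], [3, np.nan]]
y = [1, 3, 4, 6, 14, 4, 6, 8, 10]
models = fit_pattern_submodels(X, y)
print(predict_pattern_submodels(models, [2, 2]))       # uses the complete-case submodel
print(predict_pattern_submodels(models, [5, np.nan]))  # uses the x2-missing submodel
```

In the toy data, the submodel for rows missing x2 recovers an intercept of 4 and a slope of 2: it absorbs the constant contribution of the unobserved x2 within that pattern instead of imputing it, so prediction uses only the covariates actually available.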
Files
  [campus] Mercaldo.pdf, 19.58 MB. Approximate download time (hours:minutes:seconds):
  • 28.8 Modem: 01:30:39
  • 56K Modem: 00:46:37
  • ISDN (64 Kb): 00:40:47
  • ISDN (128 Kb): 00:20:23
  • Higher-speed access: 00:01:44
[campus] indicates that a file or directory is accessible from the campus network only.
