A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-08232018-184014

Type of Document Dissertation
Author Giganti, Mark Joseph
URN etd-08232018-184014
Title Statistical Methods for the Analysis of Error-Prone Electronic Health Records: Impact of Source Data Verification, Time Discretized Multiple Imputation, and Variance Estimation with Incompatible Imputation and Analysis Models
Degree PhD
Department Biostatistics
Advisory Committee
Advisor Name Title
Jonathan Schildcrout Committee Chair
Bryan Shepherd Committee Member
Peter Rebeiro Committee Member
Qingxia (Cindy) Chen Committee Member
Keywords
  • electronic health records
  • missing data
  • time-to-event outcomes
  • measurement error
  • variance estimation
Date of Defense 2018-08-16
Availability restricted
Observational data from electronic health records (EHRs) are prone to errors that are often correlated across multiple variables. One strategy to assess EHR data quality is to compare the research study data to the original source documents for a subset of records and document discrepancies. Given the resource-intensiveness of this source data verification (SDV), it is imperative to be able to justify its continued implementation. Using a data audit from an international HIV setting as a practical example, I propose a framework for assessing the impact of audits on study results and illustrate its implementation. Given that the discrepancies in the originally collected data are substantial enough to impact epidemiological inferences, I propose a method to obtain unbiased and efficient estimates in time-to-event analyses while incorporating both the original error-prone data for all subjects and the audited data for the subsample of subjects. This time-discretized modeling and imputation (TDMI) approach uses discrete-time models built in a validation sample to multiply impute covariate and outcome values in the remaining unvalidated records. Imputation variance estimates were calculated using an approach proposed by Robins and Wang (2000) that allows for incompatibility between imputation and analysis models. I provide a tutorial for calculating this imputation variance estimator, with multiple examples and comprehensive R code.
Filename              Size     Approximate Download Time (Hours:Minutes:Seconds)
                               28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
[campus] Giganti.pdf  1.56 Mb  00:07:14    00:03:43   00:03:15      00:01:37       00:00:08
[campus] indicates that a file or directory is accessible from the campus network only.
