A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-07252016-100314

Type of Document Dissertation
Author VanHouten, Jacob Paul
Author's Email Address jacob.p.vanhouten@vanderbilt.edu
URN etd-07252016-100314
Title Using Abstraction to Overcome Problems of Sparsity, Irregularity, and Asynchrony in Structured Medical Data
Degree PhD
Department Biomedical Informatics
Advisory Committee
Advisor Name, Title
Thomas A. Lasko, Committee Chair
Christopher J. Fonnesbeck, Committee Member
Katherine E. Hartmann, Committee Member
Michael E. Matheny, Committee Member
Nancy M. Lorenzi, Committee Member
Keywords
  • clinical informatics
  • data mining
  • medical records
  • data representation
  • machine learning
Date of Defense 2016-06-29
Availability unrestricted
Abstract
Electronic health records (EHRs) are rich data sources that can be analyzed to discover new, clinically relevant patterns of disease manifestations. However, sparsity, irregularity, and asynchrony in health records pose challenges for their use in such discovery tasks, as standard statistical and machine learning techniques possess limited ability to handle these complications. Abstracting the clinical data into models and then using elements of those models as input to statistical and machine learning algorithms is one approach to overcoming these challenges. This dissertation provides insight into the use of different models for this purpose.

First, I examine the effect of model complexity on algorithm performance. Specifically, I examine how well different models capture the low-specificity information distributed throughout electronic health data. For several predictive algorithms, low-complexity models turn out to be nearly as powerful as high-complexity models and much less costly.

I then explore the use of continuous longitudinal models of laboratory results and diagnosis billing codes to discover clinically relevant patterns between and among these data. I look for associations between clusters of specific laboratory values and single billing codes, and identify known associations as well as others that are consistent with current medical knowledge but not expected a priori.

Finally, I use the same longitudinal abstraction models as inputs into more complex probabilistic models that adjust for indirect associations, and find that diagnosis codes can be used to predict the laboratory status of a patient.

Filename VanHouten_Dissertation.pdf
Size 1.69 Mb
Approximate Download Time (Hours:Minutes:Seconds)
  • 28.8 Modem: 00:07:50
  • 56K Modem: 00:04:01
  • ISDN (64 Kb): 00:03:31
  • ISDN (128 Kb): 00:01:45
  • Higher-speed Access: 00:00:09
