Type of Document: Dissertation
Author: VanHouten, Jacob Paul
Author's Email Address: firstname.lastname@example.org
URN: etd-07252016-100314
Title: Using Abstraction to Overcome Problems of Sparsity, Irregularity, and Asynchrony in Structured Medical Data
Degree: PhD
Department: Biomedical Informatics
Advisory Committee
- Thomas A. Lasko (Committee Chair)
- Christopher J. Fonnesbeck (Committee Member)
- Katherine E. Hartmann (Committee Member)
- Michael E. Matheny (Committee Member)
- Nancy M. Lorenzi (Committee Member)
Keywords
- clinical informatics
- data mining
- medical records
- data representation
- machine learning
Date of Defense: 2016-06-29
Availability: restricted
Abstract
Electronic health records (EHRs) are rich data sources that can be analyzed to discover new, clinically relevant patterns of disease manifestations. However, sparsity, irregularity, and asynchrony in health records pose challenges for their use in such discovery tasks, as standard statistical and machine learning techniques possess limited ability to handle these complications. Abstracting the clinical data into models and then using elements of those models as input to statistical and machine learning algorithms is one approach to overcoming these challenges. This dissertation provides insight into the use of different models for this purpose.
First, I examine the effect of model complexity on algorithm performance. Specifically, I examine how well different models capture the low-specificity information distributed throughout electronic health data. For several predictive algorithms, low-complexity models turn out to be nearly as powerful as high-complexity models, and much less costly.
I then explore the use of continuous longitudinal models of laboratory results and diagnosis billing codes to discover clinically relevant patterns between and among these data. I look for associations between clusters of specific laboratory values and single billing codes, and identify known associations as well as others that were not expected a priori but are consistent with current medical knowledge.
Finally, I use the same longitudinal abstraction models as inputs into more complex probabilistic models that adjust for indirect associations, and find that diagnosis codes can be used to predict the laboratory status of a patient.
Filename: VanHouten_Dissertation.pdf
Size: 1.69 Mb
Approximate Download Time (Hours:Minutes:Seconds)
- 28.8 Modem: 00:07:50
- 56K Modem: 00:04:01
- ISDN (64 Kb): 00:03:31
- ISDN (128 Kb): 00:01:45
- Higher-speed Access: 00:00:09