Type of Document: Master's Thesis
Author: Carroll, Robert James
Author's Email Address: Robert.J.Carroll@Vanderbilt.edu
URN: etd-10192011-143352
Title: Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis
Degree: Master of Science
Department: Biomedical Informatics
Advisory Committee:
- Josh Denny, Committee Chair
- Hua Xu, Committee Member
- Tom Lasko, Committee Member
Keywords:
- genetic associations
- automated patient cohorts
- phenotype algorithm
Date of Defense: 2011-09-23
Availability: unrestricted
Abstract:
Electronic Health Records (EHRs) are valuable tools in clinical and genomic research, providing the ability to generate large patient cohorts for study. Traditionally, EHR-based research is carried out through manual review of patient charts, which is expensive and time consuming, and limits the scalability of EHR-derived genetic or clinical research. The recent introduction of automated phenotype identification algorithms has sped cohort identification, but these algorithms also require significant investment to develop.
In these studies, we evaluated three aspects of the process of phenotype algorithm implementation and application in the context of Rheumatoid Arthritis (RA), a chronic inflammatory arthritis with known genetic risk factors. The first aspect was whether using a naïve set of features to train a support vector machine (SVM) would have similar performance to models trained using an expert-defined feature set. The second aspect was the effect of training set size on the predictive power of the algorithm for both the naïve and expert-defined sets. The third aspect was the evaluation of the portability across institutions of a trained model using expert-derived features.
We show that training an SVM with all available attributes maintains strong performance compared to an SVM trained using an expert-defined set of features. Using an expert-defined feature set allowed for a much smaller training set compared to the naïve feature set, although the required training set sizes were much smaller than those often used for phenotype algorithm training. We also show the portability of a previously published logistic regression model trained at Partners HealthCare to Vanderbilt and Northwestern Universities. While the original model was portable, retraining the model on local data can further improve performance.
This research shows the potential for rapid development of new phenotype identification algorithms that may be portable to different EHR systems and institutions. With the application of clinical knowledge in the design, very few training records are required to create strongly predictive models, which could ease the development of models for new conditions. Fast, accurate development of portable phenotype algorithms offers the potential to engender a new era of EHR-based research.
Filename: CarrollThesis.pdf
Size: 1.28 Mb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:05:55
- 56K Modem: 00:03:02
- ISDN (64 Kb): 00:02:40
- ISDN (128 Kb): 00:01:20
- Higher-speed Access: 00:00:06