A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-10192011-143352

Type of Document Master's Thesis
Author Carroll, Robert James
Author's Email Address Robert.J.Carroll@Vanderbilt.edu
URN etd-10192011-143352
Title Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis
Degree Master of Science
Department Biomedical Informatics
Advisory Committee
  Advisor Name    Title
  Josh Denny      Committee Chair
  Hua Xu          Committee Member
  Tom Lasko       Committee Member
Keywords
  • genetic associations
  • automated patient cohorts
  • phenotype algorithm
Date of Defense 2011-09-23
Availability unrestricted
Electronic Health Records (EHRs) are valuable tools in clinical and genomic research, providing the ability to generate large patient cohorts for study. Traditionally, EHR-based research is carried out through manual review of patient charts, which is expensive and time-consuming and limits the scalability of EHR-derived genetic or clinical research. The recent introduction of automated phenotype identification algorithms has sped cohort identification, but these algorithms also require significant investment to develop.

In these studies, we evaluated three aspects of the process of phenotype algorithm implementation and application in the context of Rheumatoid Arthritis (RA), a chronic inflammatory arthritis with known genetic risk factors. The first aspect was whether using a naïve set of features to train a support vector machine (SVM) would have similar performance to models trained using an expert-defined feature set. The second aspect was the effect of training set size on the predictive power of the algorithm for both the naïve and expert-defined sets. The third aspect was the evaluation of the portability across institutions of a trained model using expert-derived features.

We show that an SVM trained with all available attributes maintains strong performance compared to an SVM trained using an expert-defined set of features. Using an expert-defined feature set allowed for a much smaller training set than the naïve feature set, although with either feature set the required training set size was much smaller than is often used for phenotype algorithm training. We also show the portability of a previously published logistic regression model trained at Partners HealthCare to Vanderbilt and Northwestern Universities. While the original model was portable, models retrained using local data can further improve performance.

This research shows the potential for rapid development of new phenotype identification algorithms that may be portable to different EHR systems and institutions. With the application of clinical knowledge in the design, very few training records are required to create strongly predictive models, which could ease the development of models for new conditions. Fast, accurate development of portable phenotype algorithms offers the potential to engender a new era of EHR-based research.

Approximate Download Time (Hours:Minutes:Seconds)

  Filename            Size      28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  CarrollThesis.pdf   1.28 Mb   00:05:55     00:03:02    00:02:40       00:01:20        00:00:06
