Type of Document: Dissertation
Author: Sulieman, Lina Mahmoud
Author's Email Address: firstname.lastname@example.org
URN: etd-11192018-165929
Title: Learning Clinical Data Representations for Machine Learning
Degree: PhD
Department: Biomedical Informatics
Advisory Committee:
- Daniel Fabbri, Committee Chair
- Bradley Malin, Committee Member
- Christopher Fonnesbeck, Committee Member
- Colin Walsh, Committee Member
- Tom Lasko, Committee Member
Keywords:
- clinical documents
- information extraction
- text features
- dynamic features
- outcome prediction
- natural language processing
- text mining
- feature representation
- clinical models
- prediction models
- electronic health records
- deep learning
- machine learning
Date of Defense: 2018-09-26
Availability: restricted
Abstract:
The use of machine learning in healthcare has grown in recent years. How clinical data are represented is the crux of machine learning: learning informative features can improve the performance of the trained models. This dissertation describes methods to learn representations of temporal and text data that improve machine learning results.
Three data representations are discussed across three aims, each addressing a biomedical informatics problem: 1) identifying patients at high risk of a negative outcome (readmission or death) so that intervention resources can be allocated efficiently; 2) triaging patients’ messages and identifying their needs, a task that demands staff time and effort; 3) locating information about a phenotype in clinical documents, a task that requires human resources and adds to the information overload on healthcare providers.
In the first aim, a representation leveraged post-discharge data to predict patients’ outcomes over the year following discharge. Training the outcome prediction model on both post-discharge and pre-discharge data significantly improved performance compared with a model trained on pre-discharge clinical data only.
In the second aim, the dissertation describes methods to learn representations that incorporate the semantics and context of words. These representations outperformed traditional features in identifying patients’ needs in portal messages sent to healthcare providers. The results demonstrate that machine learning models trained on these learned representations perform better than models trained on representations that lack semantic and contextual information.
In the third aim, a deep learning model leveraged the contents of clinical documents and billing codes to learn sentence representations. The model used these representations to extract the sentences that contain phenotype information (i.e., relevant sentences) without requiring an annotated dataset. The extraction model achieved higher performance than a comparable keyword-based extraction method and KnowledgeMap, a clinical concept extraction tool.
The representations described in this dissertation are extensible to other electronic medical records. The proposed models can learn new representations that improve the performance of clinical machine learning models and can be applied to other medical informatics problems.
File: SuliemanPhDDissertationETDV2.pdf (4.03 Mb)