A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-11192018-165929

Type of Document Dissertation
Author Sulieman, Lina Mahmoud
Author's Email Address sulieman.lina@gmail.com
URN etd-11192018-165929
Title Learning Clinical Data Representations for Machine Learning
Degree PhD
Department Biomedical Informatics
Advisory Committee
Advisor Name             Title
Daniel Fabbri            Committee Chair
Bradley Malin            Committee Member
Christopher Fonnesbeck   Committee Member
Colin Walsh              Committee Member
Tom Lasko                Committee Member
Keywords
  • clinical documents
  • information extraction
  • text features
  • dynamic features
  • readmission
  • outcome prediction
  • natural language processing
  • text mining
  • NLP
  • feature representation
  • clinical models
  • prediction models
  • EMR
  • electronic health records
  • deep learning
  • machine learning
Date of Defense 2018-09-26
Availability restricted
The use of machine learning in healthcare has increased in recent years. Representing clinical data is the crux of machine learning: learning informative features can improve the performance of trained models. This dissertation describes methods to learn representations for temporal and text data to improve machine learning results.

Three data representations are discussed across three aims to tackle three biomedical informatics problems: 1) identifying patients at high risk of a negative outcome (readmission or death) so that intervention resources can be allocated efficiently; 2) triaging patients’ messages and identifying their needs, a task that consumes staff time and effort; 3) locating information about a phenotype in clinical documents, a task that requires human resources and increases information overload on healthcare providers.

In the first aim, a representation leveraged post-discharge data to predict patients’ outcomes over one year after discharge. Training the outcome prediction model on both post-discharge and before-discharge data improved performance significantly compared with the model trained on before-discharge clinical data only.

In the second aim, the dissertation describes methods to learn representations that incorporate the semantics and the context of words. These representations outperformed traditional features in identifying patients’ needs in portal messages sent to healthcare providers. The results demonstrate that machine learning models trained on these learned representations perform better than models trained on representations that lack semantic and contextual information.

In the third aim, a deep learning model leveraged the contents of clinical documents and the billing codes to learn representations for sentences. The model used these representations to extract the sentences that include phenotype information (i.e., relevant sentences) without using an annotated dataset. The extraction model achieved higher performance than a comparable keyword-based extraction approach and KnowledgeMap, a clinical concept extraction tool.

The representations described in this dissertation are extensible to other electronic medical records. The proposed models can learn new representations that improve clinical machine learning performance and can be applied to other medical informatics problems.

Filename                                   Size     Approximate Download Time (Hours:Minutes:Seconds)
                                                    28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
[campus] SuliemanPhDDissertationETDV2.pdf  4.03 Mb  00:18:39    00:09:35   00:08:23      00:04:11       00:00:21
[campus] indicates that a file or directory is accessible from the campus network only.
