A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-02112018-221351

Type of Document: Dissertation
Author: Yin, Zhijun
Author's Email Address: zhijun.yin@vanderbilt.edu
URN: etd-02112018-221351
Title: Automated Learning Of Health Behaviors Through Consumer Authored Natural Language Text
Degree: PhD
Department: Computer Science
Advisory Committee
  Advisor Name            Title
  Bradley Malin           Committee Chair
  Ching-Hua Chen          Committee Member
  Daniel Fabbri           Committee Member
  Jeremy Warner           Committee Member
  Yevgeniy Vorobeychik    Committee Member
  Yuan Xue                Committee Member
Keywords
  • Hormonal Therapy
  • Data Mining
  • Machine Learning
  • Statistical Inference
  • Natural Language Processing
  • Treatment Adherence
  • User Generated Content
  • Health Behavior
  • Patient Portal
  • Privacy
  • Social Media
  • Online Health Community
Date of Defense: 2018-01-16
Availability: unrestricted
Abstract
Traditional methods for collecting data in support of clinical research include prospectively collected surveys, retrospective analyses of existing medical records, and a combination of the two. Yet these approaches tend to focus on a medical-centric worldview and, as a result, provide only a partial view of a patient's life. As distributed systems, cloud services, and mobile devices grow in sophistication and market penetration, large amounts of personal data are generated every day, particularly in online environments, where individuals disclose a range of aspects of their lives, including information related to their health. This situation provides an opportunity for healthcare providers and biomedical researchers to learn about patients in their own voice and beyond traditional data sources. However, collecting, processing, and acting upon self-authored natural language text poses challenges for automatically extracting health-related information, including, but not limited to, ambiguity in communication, noisy data, long expositions that contain many different types of health information, and high-dimensionality in predictive model interoperability.

This dissertation applies a data-driven approach to investigate how self-authored information in three different online environments can be relied upon to learn about health-related behaviors. Specifically, this dissertation investigates three foundational questions. First, how do individuals disclose health status through a general social media platform (e.g., Twitter)? Second, can patients' long-term treatment adherence be inferred through online health communities (e.g., forums on breastcancer.org)? Third, how can we learn patients' needs based on the messages they send to healthcare providers over a patient portal that is connected to an electronic medical record (EMR) system and ingrained in the everyday functions of a large academic medical center? To process consumer-authored natural language text, this dissertation illustrates how to combine text mining, machine learning, and statistical inference to 1) extract health-related events (e.g., adherence status), 2) create interpretable factors (e.g., semantic groups), 3) build efficient predictive models (e.g., predicting medication interruption events), and 4) learn meaningful health-related associations (e.g., semantics and health status disclosure, emotions and portrayal of adherence status, topics and medication adherence). It is shown that many factors communicated through self-authored text (e.g., emotions, personalities, and other factors that are not captured in structured EMRs) can be applied to explain an individual's health-related behavior. This research provides evidence that self-generated information can be applied to supplement traditional data sources to facilitate healthcare research.

  Filename    Size      Approximate Download Time (Hours:Minutes:Seconds)
                        28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Z_Yin.pdf   4.01 Mb   00:18:33     00:09:32    00:08:21       00:04:10        00:00:21
