A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-07142017-193150

Type of Document Master's Thesis
Author Osterman, Travis John
Author's Email Address travis.osterman@vanderbilt.edu
URN etd-07142017-193150
Title Extracting Detailed Tobacco Exposure From The Electronic Health Record
Degree Master of Science
Department Biomedical Informatics
Advisory Committee
Advisor Name              Title
Josh Denny, M.D., M.S.    Committee Chair
Mia Levy, M.D., Ph.D.     Committee Member
Pierre Massion, M.D.      Committee Member
Keywords
  • data extraction
  • lung cancer screening
  • phewas
  • gxe
  • smoking
  • natural language processing
Date of Defense 2017-07-22
Availability restricted
Abstract
Lung cancer is the leading cause of cancer-related death in the United States and worldwide. Natural language processing (NLP) tools exist to determine smoking status (ever-smoker vs. never-smoker) from electronic health record data, but no system to date extracts the detailed smoking data needed to assess a patient’s eligibility for lung cancer screening. Here we describe the Smoking History And Pack-year Extraction System (SHAPES), a rule-based NLP system that quantifies tobacco exposure from electronic clinical notes.
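To make "quantify tobacco exposure" concrete, the following is a minimal sketch of how extracted smoking fields could be combined into pack-years and a screening flag. It is not the SHAPES implementation. The `SmokingHistory` type and both function names are hypothetical, and the default thresholds assume the 2013 USPSTF criteria (age 55 to 80, at least 30 pack-years, current smoker or quit within 15 years), which the abstract itself does not specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SmokingHistory:
    """Hypothetical container for fields a system like SHAPES might extract."""
    age: int
    packs_per_day: float                 # rate of smoking
    years_smoked: float                  # duration of smoking
    years_quit: Optional[float] = None   # None means the patient still smokes


def pack_years(history: SmokingHistory) -> float:
    """Pack-years = packs smoked per day multiplied by years smoked."""
    return history.packs_per_day * history.years_smoked


def screening_eligible(
    history: SmokingHistory,
    min_age: int = 55,              # assumed threshold (USPSTF 2013)
    max_age: int = 80,              # assumed threshold (USPSTF 2013)
    min_pack_years: float = 30.0,   # assumed threshold (USPSTF 2013)
    max_years_quit: float = 15.0,   # assumed threshold (USPSTF 2013)
) -> bool:
    """Return True when the history meets the assumed screening criteria."""
    smokes_or_quit_recently = (
        history.years_quit is None or history.years_quit <= max_years_quit
    )
    return (
        min_age <= history.age <= max_age
        and pack_years(history) >= min_pack_years
        and smokes_or_quit_recently
    )


former_smoker = SmokingHistory(age=62, packs_per_day=1.5, years_smoked=25, years_quit=5)
light_smoker = SmokingHistory(age=62, packs_per_day=1.0, years_smoked=20)
print(pack_years(former_smoker), screening_eligible(former_smoker))
print(pack_years(light_smoker), screening_eligible(light_smoker))
```

Every input to the rule (rate, duration, years quit) must be recovered from free text, which is why ever/never smoking status alone is not sufficient for this use.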

SHAPES was developed based on 261 patient records with 9,573 clinical notes and validated on 352 randomly selected patient records with 4,040 notes. F-measures for never-smoking status, ever-smoking status, rate of smoking, duration of smoking, quantity of cigarettes, and years quit were 0.86, 0.82, 0.79, 0.62, 0.64, and 0.61, respectively. Sixteen of 22 individuals eligible for lung cancer screening were identified (precision = 0.94, recall = 0.73).
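The screening figures above can be checked with a short calculation. This sketch assumes 16 true positives and 6 false negatives (16 of 22 eligible patients found), and further assumes a single false positive, which is inferred from the reported precision of 0.94 and is not stated in the abstract. The function name is illustrative.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and the F-measure (harmonic mean of the two)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# tp and fn follow from "sixteen of 22"; fp=1 is an assumption inferred
# from the reported precision, not a figure given in the abstract.
precision, recall, f1 = precision_recall_f1(tp=16, fp=1, fn=6)
print(f"precision={precision:.2f} recall={recall:.2f} F-measure={f1:.2f}")
```

Under these assumed counts the precision and recall round to the reported 0.94 and 0.73.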

SHAPES was compared to a previously validated smoking classification system using a phenome-wide association study (PheWAS). SHAPES predicted similar significant associations with a 66% smaller sample (10,000 vs. 35,788), and detected 411 (268%) more associations in the full dataset than when using just ever/never smoking status.

Using smoking data from SHAPES, a smoking genome-by-environment interaction study found 57 statistically significant interactions between smoking and diseases, including previously described interactions between ischemic heart disease and rs1746537, obesity and rs10871777, and type 2 diabetes and rs2943641.

These studies support the use of SHAPES for lung cancer screening and other research requiring quantitative smoking history. External validation needs to be performed prior to implementation at other medical centers.

Filename                  Size      Approximate Download Time (Hours:Minutes:Seconds)
                                    28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
[campus] TOsterman.pdf    2.97 Mb   00:13:43     00:07:03    00:06:10       00:03:05        00:00:15
[campus] indicates that a file or directory is accessible from the campus network only.
