A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-03282011-223440


Type of Document Dissertation
Author Wehbe, Firas Hazem
URN etd-03282011-223440
Title A Machine Learning-Based Information Retrieval Framework for Molecular Medicine Predictive Models
Degree PhD
Department Biomedical Informatics
Advisory Committee
  Advisor Name          Title
  Constantin Aliferis   Committee Co-Chair
  Cynthia S. Gadd       Committee Co-Chair
  Daniel R. Masys       Committee Member
  Hua Xu                Committee Member
  Pierre Massion        Committee Member
  Steven H. Brown       Committee Member
Keywords
  • information retrieval
  • molecular medicine
  • translational bioinformatics
  • machine learning
Date of Defense 2011-03-15
Availability unrestricted
Abstract
Molecular medicine encompasses the application of molecular biology techniques and knowledge to the prevention, diagnosis, and treatment of diseases and disorders. Statistical and computational models can predict clinical outcomes, such as prognosis or response to treatment, based on the results of molecular assays. For advances in molecular medicine to translate into clinical results, clinicians and translational researchers need up-to-date access to high-quality predictive models. The number of such models reported in the literature is growing at a pace that overwhelms the human ability to assimilate this information manually. Therefore, the problem of retrieving and organizing the vast amount of published information within this domain needs to be addressed. The inherent complexity of the domain and the fast pace of scientific discovery make this problem particularly challenging.

This dissertation describes a framework for the retrieval and organization of clinical bioinformatics predictive models. A semantic analysis of this domain was performed and informed the design of the framework. Specifically, it allowed the development of a specialized annotation scheme for published articles that can be used for meaningful organization, indexing, and efficient retrieval. This annotation scheme was codified in an annotation form and an accompanying guidelines document, which multiple human experts used to annotate over 1000 articles. The resulting annotated datasets were then used to train and test support vector machine (SVM) classifiers. The classifiers were designed to provide a scalable mechanism to replicate human experts’ ability (1) to retrieve relevant MEDLINE articles and (2) to annotate these articles using the specialized annotation scheme. The classifiers showed very good predictive ability, which was also shown to generalize to different disease domains and to datasets annotated by independent experts. The experiments highlighted the need for unambiguous operational definitions of the complex concepts used for semantic annotation. The impact of the semantic definitions on the quality of manual annotations and on the performance of the classifiers is discussed.

Files
  Filename                                Size      Approximate Download Time (Hours:Minutes:Seconds)
                                                    28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Firas_Wehbe_Dissertation_20110328.pdf   3.80 Mb   00:17:35     00:09:02    00:07:54       00:03:57        00:00:20
