A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-12042009-124713

Type of Document Dissertation
Author Brown, Laura Elizabeth
URN etd-12042009-124713
Title Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation
Degree PhD
Department Biomedical Informatics
Advisory Committee
Advisor Name Title
Douglas Hardin Committee Co-Chair
Ioannis Tsamardinos Committee Co-Chair
Constantin Aliferis Committee Member
Daniel Masys Committee Member
  • support vector machines
  • causal discovery
  • Bayesian network
  • variable selection
Date of Defense 2009-12-01
Availability unrestricted
The focus of my research was to develop several novel computational techniques for discovering informative patterns and complex relationships in biomedical data. First, an efficient, heuristic method was developed to search for the features with largest absolute weight in a polynomial Support Vector Machine (SVM) model. This algorithm provides a new ability to understand, conceptualize, visualize, and communicate polynomial SVM models. Second, a new variable selection algorithm, called Feature Space Markov Blanket (FSMB), was designed. FSMB combines the advantages from kernel methods and Markov Blanket-based techniques for variable selection. FSMB was evaluated on several simulated, "difficult" distributions where it identified the Markov Blankets with high sensitivity and specificity. Additionally, it was run on several real world data sets; the resulting classification models are parsimonious (for two data sets, the models consisted of only 2-3 features). On another data set, the Markov Blanket-based method performed poorly; FSMB's improved performance suggests the existence of a complex, multivariate relationship in the underlying domain. Third, a well-cited algorithm for learning Bayesian networks (Max-Min Hill-Climbing, MMHC) was extended to locally learn a region of a Bayesian network. This local method was compared to MMHC in an empirical evaluation. The local method took, as expected, a fraction of the time to learn regions compared to MMHC; of particular interest, the local technique learned regions with equal or better quality. Finally, an approach using the formalism of causal Bayesian networks was designed to make predictions under manipulations; this approach was used in a submission to the Causality Challenge. The approach required the use and combination of the three methods from this research and many state-of-the-art techniques to build and evaluate models. The results of the competition (the submission performed best on one of the four tasks presented) illustrate some of the strengths and weaknesses of causal discovery methods and point to new directions in the field. The methods explored are introductory steps along research paths to explore understanding SVM models, variable selection in non-faithful problems, identifying causal relations in large domains, and learning with manipulations.
  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  leb_phd_SUBMITTED.pdf 1.37 Mb 00:06:21 00:03:16 00:02:51 00:01:25 00:00:07

Browse All Available ETDs by ( Author | Department )

If you have more questions or technical problems, please Contact LITS.