A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-12042008-121803


Type of Document Dissertation
Author Statnikov, Alexander Romanovich
Author's Email Address alexander.statnikov@vanderbilt.edu
URN etd-12042008-121803
Title Algorithms for discovery of multiple Markov boundaries: application to the molecular signature multiplicity problem
Degree PhD
Department Biomedical Informatics
Advisory Committee
Advisor Name Title
Constantin F. Aliferis Committee Chair
Daniel R. Masys Committee Member
Douglas P. Hardin Committee Member
Gregory F. Cooper Committee Member
Ioannis Tsamardinos Committee Member
Keywords
  • variable selection
  • Markov boundary
  • gene expression microarray
  • high-throughput molecular data
  • signature multiplicity
  • classification
  • molecular signature
Date of Defense 2008-08-26
Availability unrestricted
Abstract
Algorithms for discovery of a Markov boundary from data constitute one of the most important recent developments in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight into local causal structure. Even though there is always a single Markov boundary of the response variable in faithful distributions, distributions with violations of the intersection property may have multiple Markov boundaries. Such distributions are abundant in practical data-analytic applications, and there are several reasons why it is important to induce all Markov boundaries from such data. However, there are currently no practical algorithms that can provably accomplish this task. To this end, I propose a novel generative algorithm (termed TIE*) that can discover all Markov boundaries from data. The generative algorithm can be instantiated to discover Markov boundaries independent of the data distribution. I prove the correctness of the generative algorithm and provide several admissible instantiations. The new algorithm is then applied to identify the set of maximally predictive and non-redundant molecular signatures. TIE* identifies exactly the set of true signatures in simulated distributions and yields signatures with significantly better predictivity and reproducibility than prior algorithms in human microarray gene expression datasets. The results of this thesis also shed light on the causes of the molecular signature multiplicity phenomenon.
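
The claim that a distribution violating the intersection property can have more than one Markov boundary can be illustrated with a toy example. The following Python sketch is not TIE* and is not taken from the thesis. It assumes a hypothetical three-variable distribution in which Y is an exact copy of X and the response T is a noisy copy of X. All names (joint_distribution, marginal, cond_mutual_info, the noise parameter) are illustrative assumptions. Conditional independence is checked through exact conditional mutual information computed from the joint table.

```python
from itertools import product
from math import log2

# Variable positions in each joint state tuple (hypothetical toy model).
X, Y, T = 0, 1, 2


def joint_distribution(noise=0.1):
    """Exact joint P(X, Y, T): Y copies X, and T equals X with prob. 1 - noise."""
    joint = {}
    for x, t in product((0, 1), repeat=2):
        p_t_given_x = 1 - noise if t == x else noise
        joint[(x, x, t)] = 0.5 * p_t_given_x  # P(X = x) = 0.5, Y = X
    return joint


def marginal(joint, idx):
    """Marginalize the joint onto the variable positions listed in idx."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def cond_mutual_info(joint, a, b, cond=()):
    """I(A; B | C) in bits; a, b, cond are tuples of variable positions."""
    p_abc = marginal(joint, a + b + cond)
    p_ac = marginal(joint, a + cond)
    p_bc = marginal(joint, b + cond)
    p_c = marginal(joint, cond)
    na, nb = len(a), len(b)
    total = 0.0
    for key, p in p_abc.items():
        if p == 0.0:
            continue
        ka, kb, kc = key[:na], key[na:na + nb], key[na + nb:]
        total += p * log2(p * p_c[kc] / (p_ac[ka + kc] * p_bc[kb + kc]))
    return total


if __name__ == "__main__":
    joint = joint_distribution()
    # T is independent of Y given X, and of X given Y ...
    print("I(T; Y | X) =", cond_mutual_info(joint, (T,), (Y,), (X,)))
    print("I(T; X | Y) =", cond_mutual_info(joint, (T,), (X,), (Y,)))
    # ... yet T is not independent of {X, Y}, so intersection fails.
    print("I(T; X, Y)  =", cond_mutual_info(joint, (T,), (X, Y)))
```

In this toy model each of {X} and {Y} shields T from the remaining variable, and neither can be reduced to the empty set because T still depends on the pair. Both are therefore Markov boundaries of T, which is the kind of multiplicity the abstract refers to.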
Files
  Filename     Size      Approximate Download Time (Hours:Minutes:Seconds)
                         28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Thesis.pdf   2.81 Mb   00:12:59     00:06:40    00:05:50       00:02:55        00:00:14
