A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-12022005-171510


Type of Document Master's Thesis
Author Fu, Lawrence Dachen
Author's Email Address lawrence.fu@vanderbilt.edu
URN etd-12022005-171510
Title A Comparison of State-of-the-Art Algorithms for Learning Bayesian Network Structure from Continuous Data
Degree Master of Science
Department Biomedical Informatics
Advisory Committee
  Advisor Name           Title
  Ioannis Tsamardinos    Committee Chair
  Constantin Aliferis    Committee Member
  Doug Hardin            Committee Member
Keywords
  • continuous
  • structure learning
  • bayesian network
  • discretization
  • Computer algorithms
  • Medicine -- Research -- Statistical methods
Date of Defense 2005-10-20
Availability unrestricted
Abstract
In biomedical and biological domains, researchers typically study continuous data sets. In these domains, an increasingly popular tool for understanding the relationships between variables is Bayesian network structure learning. There are three approaches to learning Bayesian network structure from continuous data. The most popular is to discretize the data prior to structure learning. The alternatives are to integrate discretization with structure learning or to learn directly from the continuous data. It is not known which approach is best, since there has been no unified study of the three. The purpose of this work was to perform an extensive experimental evaluation of them. For large data sets consisting of originally discrete variables, discretization-based approaches learned the most accurate structures. With smaller sample sizes or data without an underlying discrete mechanism, a method that learns directly from continuous data performed best. Also, for some quality metrics, the integrated methods did not improve on simple discretization methods. In terms of time efficiency, the integrated approaches were the most computationally intensive, while methods from the other categories were the least intensive.
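The abstract's first approach, discretizing the data before structure learning, can be illustrated with a minimal sketch. The abstract does not name the discretization methods the thesis evaluated, so the two schemes below (equal-width and equal-frequency binning) are assumptions chosen only because they are common unsupervised options. The function names and the sample values are hypothetical.

```python
from bisect import bisect_right


def equal_width_cutpoints(values, n_bins):
    """Return n_bins - 1 interior cut points that split [min, max] evenly."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def equal_frequency_cutpoints(values, n_bins):
    """Return n_bins - 1 interior cut points so bins hold roughly equal counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def discretize(values, cutpoints):
    """Map each value to its bin index; a value equal to a cut point goes right."""
    return [bisect_right(cutpoints, v) for v in values]


# Hypothetical continuous measurements for a single variable (illustration only).
measurements = [0.1, 0.4, 0.5, 2.0, 2.2, 9.5]

width_bins = discretize(measurements, equal_width_cutpoints(measurements, 3))
freq_bins = discretize(measurements, equal_frequency_cutpoints(measurements, 3))

print(width_bins)  # [0, 0, 0, 0, 0, 2]
print(freq_bins)   # [0, 0, 1, 1, 2, 2]
```

On this skewed sample, equal-width binning places five of the six values in one bin, while equal-frequency binning spreads them evenly. A structure learner run on the discretized columns would therefore see different data depending on the scheme chosen.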
Files
  Filename        Size        Approximate Download Time (Hours:Minutes:Seconds)
                              28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  ms_thesis.pdf   366.84 Kb   00:01:41     00:00:52    00:00:45       00:00:22        00:00:01
