A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-07182005-122343


Type of Document Master's Thesis
Author Thornton-Wells, Tricia Ann
Author's Email Address t.thornton-wells@vanderbilt.edu
URN etd-07182005-122343
Title Comparison of Three Clustering Methods for Dissecting Trait Heterogeneity in Genotypic Data
Degree Master of Science
Department Biomedical Informatics
Advisory Committee
  Advisor Name             Title
  Mike McDonald            Committee Chair
  Constantin F. Aliferis   Committee Member
  Jason H. Moore           Committee Member
  Jonathan L. Haines       Committee Member
  Thomas J. Palmeri        Committee Member
Keywords
  • unsupervised learning
  • simulation study
  • method comparison
  • clustering
  • complex disease
  • genetics
  • trait heterogeneity
  • locus heterogeneity
Date of Defense 2005-06-17
Availability unrestricted
Abstract
Trait heterogeneity, which exists when a trait has been defined with insufficient specificity such that it is actually two or more distinct traits, has been implicated as a confounding factor in the traditional statistical genetics of complex human disease. In the absence of detailed phenotypic data collected consistently in combination with genetic data, unsupervised computational methodologies offer the potential for discovering underlying trait heterogeneity. The performance of three such methods appropriate for categorical data (Bayesian Classification, Hypergraph-Based Clustering, and Fuzzy k-Modes Clustering) was compared. Also tested was the ability of these methods to detect trait heterogeneity in the presence of locus heterogeneity and gene-gene interaction, two other complicating factors in discovering genetic models of complex human disease. Bayesian Classification performed well under the simplest of the simulated genetic models and outperformed the other two methods, except on the most complex genetic model, where Fuzzy k-Modes Clustering performed best. Permutation testing showed that Bayesian Classification controlled Type I error very well but produced less desirable Type II error rates. Methodological limitations and future directions are discussed.
Files
  Filename MS_Thesis_17June2005_eversion_edited18July2005.pdf
  Size 614.06 Kb
  Approximate Download Time (Hours:Minutes:Seconds)
    28.8 Modem            00:02:50
    56K Modem             00:01:27
    ISDN (64 Kb)          00:01:16
    ISDN (128 Kb)         00:00:38
    Higher-speed Access   00:00:03
