A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04082013-143931

Type of Document Master's Thesis
Author Sivley, Robert Michael
Author's Email Address mike.sivley@vanderbilt.edu
URN etd-04082013-143931
Title Clustering Rare Event Features to Increase Statistical Power
Degree Master of Science
Department Computer Science
Advisory Committee
  Advisor Name                    Title
  Dr. Douglas H. Fisher           Committee Chair
  Dr. Tricia A. Thornton-Wells    Committee Member
  Dr. William S. Bush             Committee Member
Keywords
  • power
  • statistics
  • rvclust
  • clustering
  • rare event
  • rare variant
Date of Defense 2013-03-25
Availability unrestricted
Abstract
Rare genetic variation has been put forward as a major contributor to the development of disease; however, it is inherently difficult to associate rare variants with disease, as the low number of observations greatly reduces statistical power. Binning is a method that groups several variants together and merges them into a single feature, sacrificing resolution to increase statistical power. Binning strategies are applicable to rare variant analysis in any field, though their effectiveness depends on the method used to group variants. This thesis presents a flexible workflow for rare variant analysis, consisting of five sequential steps: identifying rare variants, annotating those variants, clustering the variants, collapsing those clusters, and performing statistical analysis. There are no restrictions on which clustering algorithms are applied, so a review of the core clustering paradigms is provided as an introduction for readers unfamiliar with the field. Also presented is RVCLUST, an R package that facilitates all stages of the described workflow and provides a collection of interfaces to common clustering algorithms and statistical tests. The utility of RVCLUST is demonstrated in a genetic analysis of rare variants in gene regulatory regions and their effect on gene expression. The results of this analysis suggest that informed clustering is an effective alternative to existing strategies, discovering the same associations while avoiding the statistical complications introduced by other binning methods.
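The five steps named in the abstract can be illustrated with a minimal Python sketch. This is not RVCLUST and does not reproduce its API (RVCLUST is an R package whose interface is not described on this page). Every name, threshold, and data value below is a hypothetical assumption chosen for illustration. Clustering is done with a simple position-gap rule, and a plain difference in means stands in for a real statistical test.

```python
from statistics import mean

# Toy data (entirely hypothetical): allele counts (0/1/2) for 10 samples.
GENOTYPES = {
    "v1": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "v2": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "v3": [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "v4": [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    "v5": [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    "v6": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0],  # common variant, filtered out
}
# Hypothetical annotation: a genomic position for each variant.
POSITIONS = {"v1": 100, "v2": 150, "v3": 180, "v4": 5000, "v5": 5100, "v6": 120}
# Hypothetical phenotype: one expression value per sample.
EXPRESSION = [9.0, 8.5, 9.5, 5.0, 5.5, 4.5, 5.0, 5.2, 4.8, 5.0]


def identify_rare(genotypes, max_maf=0.1):
    """Step 1: keep variants whose allele frequency is below max_maf."""
    rare = {}
    for name, counts in genotypes.items():
        maf = sum(counts) / (2 * len(counts))
        if 0 < maf < max_maf:
            rare[name] = counts
    return rare


def annotate(rare, positions):
    """Step 2: attach an annotation (here, position) and sort by it."""
    return sorted((positions[name], name) for name in rare)


def cluster_by_gap(annotated, max_gap=500):
    """Step 3: start a new cluster whenever the position gap exceeds max_gap."""
    clusters = []
    last_pos = None
    for pos, name in annotated:
        if last_pos is None or pos - last_pos > max_gap:
            clusters.append([])
        clusters[-1].append(name)
        last_pos = pos
    return clusters


def collapse(cluster, genotypes):
    """Step 4: one feature per cluster; 1 if a sample carries any rare allele."""
    n_samples = len(genotypes[cluster[0]])
    return [int(any(genotypes[v][i] > 0 for v in cluster)) for i in range(n_samples)]


def mean_difference(feature, phenotype):
    """Step 5 (placeholder): carrier mean minus non-carrier mean."""
    carriers = [p for f, p in zip(feature, phenotype) if f]
    others = [p for f, p in zip(feature, phenotype) if not f]
    return mean(carriers) - mean(others)


rare = identify_rare(GENOTYPES)
clusters = cluster_by_gap(annotate(rare, POSITIONS))
features = [collapse(c, rare) for c in clusters]
effects = [mean_difference(f, EXPRESSION) for f in features]
```

Each of the five rare variants has a single carrier in this toy data, so none could be tested alone. After collapsing, the first cluster has three carriers and the second has two, which is the trade of resolution for statistical power that the abstract describes.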
Files (approximate download time in Hours:Minutes:Seconds)
  Filename                    Size      28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Sivley_Masters_Thesis.pdf   1.03 Mb   00:04:46     00:02:27    00:02:09       00:01:04        00:00:05
