A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-07142014-072802


Type of Document Master's Thesis
Author Chen, Yaoyi
Author's Email Address chenyaoyi@gmail.com
URN etd-07142014-072802
Title Using Statistical Learning Methods for Better Spectrum Classification
Degree Master of Science
Department Biostatistics
Advisory Committee
  • Ming Li, Committee Chair
  • Chris Fonnesbeck, Committee Member
Keywords
  • statistical learning
  • data mining
  • machine learning
Date of Defense 2014-06-13
Availability unrestricted
Abstract
Shotgun proteomics has become a widely used technology for identifying large numbers of peptides and proteins in complex biological samples. However, no single score function from most search algorithms is adequate to evaluate the quality of peptide-spectrum matches (PSMs) and discriminate between correct and incorrect spectrum identifications. Here, we used and compared multiple logistic regression models with different degrees of flexibility, support vector machines with various kernel functions, and random forests to combine multiple scores from search engines. New features, such as retention time differences and a number of other modifications, were also incorporated to build a better binary classifier. We validated these methods through bootstrapping and compared their performance. This study shows that these methods, each with its own strengths, improve the separation of correct from incorrect peptide-spectrum matches, specifically yielding a higher area under the ROC curve and better discrimination indices.
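The evaluation idea in the abstract (score each PSM, measure the area under the ROC curve, and use bootstrapping to judge how stable that measure is) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the synthetic scores, the names search_score and rt_feature, the unweighted sum used as a combined score, and the plain percentile bootstrap. The thesis's actual models, features, and bootstrap procedure are not described in this record.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 for a correct PSM, 0 for an incorrect one.
    Tied scores between a correct and an incorrect PSM count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect PSM")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_boot=200, seed=0):
    """Return sorted AUC values computed on n_boot resamples with replacement."""
    rng = random.Random(seed)
    idx = range(len(labels))
    out = []
    while len(out) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # Skip the rare resample that contains only one class.
        if 0 < sum(ys) < len(ys):
            out.append(auc(ys, [scores[i] for i in sample]))
    return sorted(out)


def percentile_interval(sorted_values, level=0.95):
    """Percentile interval taken from an already sorted list of replicates."""
    n = len(sorted_values)
    alpha = (1.0 - level) / 2.0
    lo = sorted_values[min(n - 1, int(alpha * n))]
    hi = sorted_values[min(n - 1, int((1.0 - alpha) * n))]
    return lo, hi


# Synthetic, hypothetical data: 60 correct and 60 incorrect PSMs.
rng = random.Random(1)
labels = [1] * 60 + [0] * 60
search_score = [rng.gauss(1.0, 1.0) for _ in range(60)] + \
               [rng.gauss(0.0, 1.0) for _ in range(60)]
rt_feature = [rng.gauss(1.0, 1.0) for _ in range(60)] + \
             [rng.gauss(0.0, 1.0) for _ in range(60)]
# Stand-in for a fitted classifier: an unweighted sum of the two features.
combined = [a + b for a, b in zip(search_score, rt_feature)]

for name, scores in [("single score", search_score), ("combined", combined)]:
    reps = bootstrap_auc(labels, scores)
    lo, hi = percentile_interval(reps)
    print(f"{name}: AUC={auc(labels, scores):.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

In practice the combined score would come from a fitted model such as a logistic regression, a support vector machine, or a random forest, and the bootstrap would refit that model on each resample rather than only recomputing the AUC.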
Files
  Filename   Size
  Chen.pdf   3.15 MB
