A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04022013-104900

Type of Document Master's Thesis
Author Wu, Dili
Author's Email Address dili.wu@vanderbilt.edu
URN etd-04022013-104900
Title A profiling and performance analysis based self-tuning system for optimization of Hadoop MapReduce cluster configuration
Degree Master of Science
Department Computer Science
Advisory Committee
  Advisor Name            Title
  Dr. Aniruddha Gokhale   Committee Chair
  Dr. Yi Cui              Committee Member
Keywords
  • Self-Tuning
  • Performance Optimization
  • MapReduce
  • Cluster Configuration
  • Hadoop
Date of Defense 2013-03-26
Availability unrestricted
Abstract
As a parallel data processing framework, MapReduce has been one of the most popular topics in the age of Cloud Computing since it was first proposed by Google. Despite its advantages, such as scalability, reliability and flexibility, managing the resources of a MapReduce cluster so as to optimize the performance of the MapReduce applications running on it remains a major issue in this field. This thesis introduces PPABS, a profiling and performance analysis based system that optimizes the performance of a Hadoop cluster by automatically tuning its configuration settings. PPABS works in four steps. First, profiling of MapReduce job performance is combined with a Data Mining technique to dynamically divide jobs into groups. Second, Simulated Annealing, a probabilistic metaheuristic algorithm for global optimization, is adapted to search for an optimal cluster configuration for each job group obtained in the first step. Third, an incoming job is run on only a small part of its input data set, and a Pattern Recognition technique is used to classify this new job into one of the groups. Finally, PPABS updates the cluster configuration to match the job's features before the whole job is run. The experimental results were promising and showed that our approach improved the performance of several Hadoop jobs running on a cluster built on Amazon EC2.
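The second step of the abstract can be illustrated with a minimal sketch of textbook Simulated Annealing over a small, discrete configuration space. This is not the modified algorithm from the thesis, whose details are not given here. The search space, the parameter values, the `synthetic_cost` function (a stand-in for a measured job running time), and the cooling schedule are all assumptions made for illustration only.

```python
import math
import random

# Hypothetical search space: parameter name -> allowed values.
# The names resemble Hadoop settings but are used purely for illustration.
SEARCH_SPACE = {
    "io.sort.mb": [50, 100, 150, 200, 250],
    "mapred.reduce.tasks": [1, 2, 4, 8, 16],
    "io.sort.factor": [10, 25, 50, 100],
}


def synthetic_cost(config):
    """Stand-in for a measured job running time (synthetic, not real data)."""
    return (
        abs(config["io.sort.mb"] - 200) / 50
        + abs(math.log2(config["mapred.reduce.tasks"]) - 3)
        + abs(config["io.sort.factor"] - 50) / 25
    )


def neighbor(config, rng):
    """Move one randomly chosen parameter to an adjacent allowed value."""
    new = dict(config)
    name = rng.choice(sorted(SEARCH_SPACE))
    values = SEARCH_SPACE[name]
    i = values.index(new[name])
    # Clamp at the ends of the list, so a boundary move may leave the value unchanged.
    j = min(len(values) - 1, max(0, i + rng.choice([-1, 1])))
    new[name] = values[j]
    return new


def simulated_annealing(cost, start, rng, t0=5.0, cooling=0.95, steps=500):
    """Return the lowest-cost configuration seen and its cost."""
    current, current_cost = start, cost(start)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_cost = cost(cand)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Geometric cooling, floored to keep the division above well defined.
        t = max(t * cooling, 1e-6)
    return best, best_cost
```

In a real tuner the cost function would run or simulate a representative job under the candidate configuration; a call such as `simulated_annealing(synthetic_cost, start_config, random.Random(0))` only exercises the search loop on synthetic numbers.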
Files
  Filename                    Size      Approximate Download Time (Hours:Minutes:Seconds)
                                        28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Master_Thesis_Dili_Wu.pdf   1.48 MB   00:06:50     00:03:31    00:03:04       00:01:32        00:00:07
