A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04092013-182409

Type of Document Dissertation
Author Mack, Daniel Leif Campana
Author's Email Address dmack@isis.vanderbilt.edu
URN etd-04092013-182409
Title Anomaly Detection from Complex Temporal Sequences in Large Data
Degree PhD
Department Computer Science
Advisory Committee
Advisor Name         Title
Gautam Biswas        Committee Chair
Doug Fisher          Committee Member
Gabor Karsai         Committee Member
Julie A. Adams       Committee Member
Xenofon Koutsoukos   Committee Member
Keywords
  • baseball
  • aviation safety
  • complexity measures
  • anomaly detection
Date of Defense 2013-03-25
Availability unrestricted
Abstract
As systems become more complex and the amount of operational data collected from them grows proportionally, new challenges arise in how this data can be used to better understand system operations and detect unsafe behavior. For large systems made up of a number of interacting subsystems, detecting anomalous behavior while avoiding false alarms becomes an important problem. Anomaly detection in such systems must navigate large amounts of data that span many operational runs under a variety of operating conditions, many sensors, and long time series sequences covering different aspects of system operation. From a safety viewpoint, we wish to use this data to improve the effectiveness of existing fault detection schemes. Of equal importance is the development of methods that can detect previously unknown and undetected anomalies from the vast amounts of available operational data.

In this thesis, we have developed two approaches for anomaly detection in complex systems. The first approach uses supervised learning methods to improve the detection efficiency and accuracy of known anomalies in available diagnostic reasoners. The second approach uses unsupervised learning methods applied to the large amounts of data to identify previously undiscovered anomalies in system operations. Once anomalous instances are identified, we find the most discriminatory features, which then provide targeted information to help characterize the nature of the newly found anomalies for further study.

The methodologies developed in this thesis have been successfully applied to two big data domains. In the first domain, aircraft flight operations data is used to improve the detection of known anomalies and thereby the diagnostic accuracy of a vehicle reasoner. The same data is also used to identify previously undetected or unknown anomalies during the takeoff phase of flight, which are then evaluated for their potential impact on aviation safety. In the second domain, pitch data recorded from Major League Baseball games is used with our exploratory approach to identify anomalous games for individual pitchers, and then to characterize these games in terms of the specific pitch types that differed from the nominal set thrown by these pitchers.

Approximate Download Time (Hours:Minutes:Seconds)

Filename               Size      28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
MackDissertation.pdf   5.46 MB   00:25:16     00:13:00    00:11:22       00:05:41        00:00:29
