Type of Document: Dissertation
Author: Mack, Daniel Leif Campana
Author's Email Address: email@example.com
URN: etd-04092013-182409
Title: Anomaly Detection from Complex Temporal Sequences in Large Data
Degree: PhD
Department: Computer Science
Advisory Committee:
- Gautam Biswas, Committee Chair
- Doug Fisher, Committee Member
- Gabor Karsai, Committee Member
- Julie A. Adams, Committee Member
- Xenofon Koutsoukos, Committee Member
Keywords:
- aviation safety
- complexity measures
- anomaly detection
Date of Defense: 2013-03-25
Availability: unrestricted
Abstract:
As systems become more complex and the amount of operational data collected from them grows proportionally, new challenges arise in using this data to better understand system operations and detect unsafe behavior. For large systems made up of many interacting subsystems, detecting anomalous behavior while avoiding false alarms becomes an important problem. Anomaly detection in such systems must navigate large amounts of data: many operational runs under a variety of operating conditions, many sensors, and long time series that cover different aspects of system operation. From a safety viewpoint, we wish to use this data to improve the effectiveness of existing fault detection schemes. Of equal importance is the development of methods that can detect previously unknown and undetected anomalies in the vast amounts of available operational data.
In this thesis, we develop two approaches for anomaly detection in complex systems. The first uses supervised learning methods to improve the efficiency and accuracy with which existing diagnostic reasoners detect known anomalies. The second applies unsupervised learning methods to large amounts of operational data to identify previously undiscovered anomalies in system operations. Once anomalous instances are identified, we determine the features that best discriminate them from nominal operation; these features provide targeted information for characterizing the nature of the newly found anomalies and guiding further study.
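The unsupervised pipeline described above (score every operational run, flag the outliers, then rank the features that separate them from nominal runs) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it swaps in a simple k-nearest-neighbour distance score on z-scored features and a standardized mean-difference ranking, and the feature names, data values, k = 2, and the mean-plus-two-standard-deviations threshold are all hypothetical choices made only for illustration.

```python
import math
from statistics import mean, pstdev


def standardize(X):
    """Z-score each column so that no single feature dominates the distances."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def knn_scores(X, k=2):
    """Anomaly score per run: mean Euclidean distance to its k nearest other runs."""
    Z = standardize(X)
    scores = []
    for i, a in enumerate(Z):
        dists = sorted(math.dist(a, b) for j, b in enumerate(Z) if j != i)
        scores.append(mean(dists[:k]))
    return scores


def rank_features(X, flagged, names):
    """Rank features by |mean(flagged) - mean(nominal)| / std(nominal), largest first."""
    nominal = [row for i, row in enumerate(X) if i not in flagged]
    anomalous = [X[i] for i in flagged]
    ranked = []
    for f, name in enumerate(names):
        nom = [row[f] for row in nominal]
        spread = pstdev(nom) or 1.0  # guard against a constant nominal feature
        gap = abs(mean(row[f] for row in anomalous) - mean(nom))
        ranked.append((name, gap / spread))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Hypothetical per-run takeoff summaries: one row per flight.
names = ["rotation_speed_kt", "pitch_rate_deg_s", "engine_n1_pct"]
X = [
    [150, 2.5, 92],
    [151, 2.6, 93],
    [149, 2.4, 91],
    [150, 2.5, 92],
    [151, 2.4, 92],
    [149, 2.6, 92],
    [150, 2.5, 93],
    [150, 5.0, 91],  # unusually high pitch rate
]

scores = knn_scores(X, k=2)
threshold = mean(scores) + 2 * pstdev(scores)  # hypothetical cut-off
flagged = {i for i, s in enumerate(scores) if s > threshold}
ranking = rank_features(X, flagged, names)
print(flagged)     # {7}
print(ranking[0])  # pitch_rate_deg_s ranks first
```

A distance-based score is used here only because it needs no labels, which matches the unsupervised setting; with real flight data, per-run features would first have to be extracted from the long time series, a step this sketch does not attempt.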
The methodologies developed in this thesis have been successfully applied to two big data domains. In the first domain, aircraft flight operations data is used to make targeted improvements to a vehicle reasoner's diagnostic accuracy on known anomalies. The same data is also used to identify previously undetected or unknown anomalies during the takeoff phase of flight, which are then evaluated for their potential impact on aviation safety. In the second domain, our exploratory approach is applied to data recorded from pitches thrown in Major League Baseball games to identify anomalous games for individual pitchers and then characterize these games by the specific pitch types that differed from each pitcher's nominal set.
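The baseball characterization step can be sketched in the same spirit, assuming (hypothetically) that each game is reduced to a list of pitch-type labels and that one game has already been flagged as anomalous. The sketch compares only how often each pitch type was thrown in the flagged game against the pitcher's pooled nominal games; the dissertation's characterization may rely on other pitch attributes, and the function names, labels, and counts below are invented for illustration.

```python
from collections import Counter


def pitch_mix(pitches):
    """Fraction of pitches of each type in a list of pitch-type labels."""
    counts = Counter(pitches)
    total = sum(counts.values())
    return {ptype: n / total for ptype, n in counts.items()}


def mix_deviation(game, nominal_games):
    """Difference between one game's pitch mix and the pooled nominal mix,
    sorted by absolute size, largest first."""
    nominal = pitch_mix([p for g in nominal_games for p in g])
    current = pitch_mix(game)
    types = set(nominal) | set(current)
    diffs = {t: current.get(t, 0.0) - nominal.get(t, 0.0) for t in types}
    return sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical pitch-type labels for one pitcher (FF = four-seam fastball,
# SL = slider, CH = changeup, CU = curveball).
nominal_games = [
    ["FF"] * 60 + ["SL"] * 25 + ["CH"] * 15,
    ["FF"] * 55 + ["SL"] * 30 + ["CH"] * 15,
]
flagged_game = ["FF"] * 30 + ["SL"] * 20 + ["CH"] * 10 + ["CU"] * 40

for ptype, delta in mix_deviation(flagged_game, nominal_games):
    print(f"{ptype}: {delta:+.3f}")
```

In this made-up example the curveball, which never appears in the nominal games, shows the largest shift, followed by a drop in four-seam fastballs; a real analysis would also need to account for how many pitches each game contains before treating a shift as meaningful.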
File: MackDissertation.pdf (5.46 MB)