A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04022012-093720

Type of Document Dissertation
Author Sekmen, Ali Safak
Author's Email Address ali.sekmen@vanderbilt.edu
URN etd-04022012-093720
Title Subspace Segmentation and High-Dimensional Data Analysis
Degree PhD
Department Mathematics
Advisory Committee
Advisor Name Title
Akram Aldroubi Committee Chair
Alexander Powell Committee Member
Douglas Hardin Committee Member
Larry Schumaker Committee Member
Mitchell Wilkes Committee Member
Keywords
  • motion segmentation
  • subspace segmentation
  • union of subspaces
Date of Defense 2012-03-27
Availability unrestricted
Abstract
This thesis develops theory and associated algorithms to solve the subspace segmentation problem. Given a set of data W={w_1,...,w_N} in R^D that comes from a union of subspaces, we focus on determining a nonlinear model of the form U={S_i}_{i in I}, where each S_i is a subspace, that is nearest to W. The model is then used to classify W into clusters.

Our first approach is based on the binary reduced row echelon form of the data matrix. We prove that, in the absence of noise, this approach can find the number of subspaces, their dimensions, and an orthonormal basis for each subspace S_i. We provide a comprehensive analysis of the theory and determine its limitations and strengths in the presence of outliers and noise.

Our second approach is based on nearness to local subspaces. It handles noise effectively, but it works only in special cases of the general subspace segmentation problem (i.e., subspaces of equal and known dimensions). The approach computes a binary similarity matrix for the data points. A local subspace is first estimated for each data point. Then, a distance matrix is generated by computing the distances between the local subspaces and the points. The distance matrix is converted to the similarity matrix by applying a data-driven threshold. The problem is thereby transformed into the segmentation of subspaces of dimension 1 instead of subspaces of dimension d. The algorithm was applied to the Hopkins 155 Dataset and generated the best results to date.
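To illustrate the first approach, here is a minimal sketch (not the thesis's implementation) of how a binary reduced row echelon form can separate noise-free data. It assumes the subspaces are independent, the data are in general position, and no data point is the zero vector. The function names `rref` and `segment_by_binary_rref`, the use of exact rational arithmetic, and the toy data are all assumptions made for this example. Under these assumptions, each column of the reduced matrix is nonzero only in rows whose pivot columns lie in the same subspace, so linking every data point to the pivots in its support groups the points by subspace.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivots, r = [], 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # Find a row at or below r with a nonzero entry in column c.
        pr = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]
        # Clear column c in every other row.
        for i in range(n_rows):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots


def segment_by_binary_rref(points):
    """Cluster noise-free points; return (labels, dimension of each cluster)."""
    n, dim = len(points), len(points[0])
    # Data matrix W is D x N, with one data point per column.
    w = [[points[j][i] for j in range(n)] for i in range(dim)]
    reduced, pivots = rref(w)

    parent = list(range(len(pivots)))  # union-find over pivot rows

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    # Binary pattern: which pivot rows are nonzero in each column.
    supports = []
    for j in range(n):
        support = [k for k in range(len(pivots)) if reduced[k][j] != 0]
        supports.append(support)
        for k in support[1:]:
            parent[find(k)] = find(support[0])

    roots, labels = {}, []
    for support in supports:
        if not support:  # zero vector: left unassigned
            labels.append(-1)
            continue
        labels.append(roots.setdefault(find(support[0]), len(roots)))

    # Each pivot column is a basis vector of its cluster's subspace.
    dims = [0] * len(roots)
    for k in range(len(pivots)):
        dims[roots[find(k)]] += 1
    return labels, dims


# Toy data in R^3: a plane (z = 0) and the line spanned by (1, 1, 1), interleaved.
points = [(1, 2, 0), (1, 1, 1), (3, 1, 0), (2, 2, 2),
          (2, 5, 0), (-3, -3, -3), (1, -1, 0)]
labels, dims = segment_by_binary_rref(points)
print(labels, dims)  # [0, 1, 0, 1, 0, 1, 0] [2, 1]
```

Exact fractions are used here so that "zero" and "nonzero" entries are unambiguous. With floating-point or noisy data, the binary pattern would need a threshold, which is where the limitations analyzed in the thesis come in.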
Filename                      Size      Approximate Download Time (Hours:Minutes:Seconds)
                                        28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
Sekmen_PhD_Dissertation.pdf   1.59 Mb   00:07:22    00:03:47   00:03:18      00:01:39       00:00:08
