A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-04042011-120300

Type of Document Master's Thesis
Author Podgursky, Benjamin T
Author's Email Address benjamin.t.podgursky@vanderbilt.edu
URN etd-04042011-120300
Title Practical k-Anonymity on large datasets
Degree Master of Science
Department Computer Science
Advisory Committee
Advisor Name      Title
Gautam Biswas     Committee Chair
Douglas Fisher    Committee Member
Keywords
  • clustering
  • anonymity
  • privacy
Date of Defense 2011-04-04
Availability unrestricted
Abstract
The implicit contract between an individual and a website is that a viewer will remain anonymous unless they choose to identify themselves. At the same time, there are many advantages to allowing websites to tailor content to viewers based on hints about a person's likely interests and habits. As people spend increasing amounts of time networked and online, however, the line between a person's online presence and their offline identity has blurred. Ideally, the goal of providing personalized internet content can be reconciled with the implicit contract of net-anonymity. This thesis studies which research from the field of privacy-preserving data publishing can be applied to use offline data anonymously for web personalization.

The anonymity models of k-Anonymity and (k,1)-Anonymity, or k-Unlinkability, turn out to be promising for this problem, and this work studies how to anonymize insight data using these models. Rapleaf is a company that helps websites personalize their content, with the goal of anonymizing content while still keeping the data specific enough to be insightful. Rapleaf's personalization dataset is used as a case study for investigating the challenges of anonymizing such a dataset.

It is hoped that the findings reported here will help web data be anonymized while remaining useful, and that organizations will be encouraged to view anonymity and insight as goals that can be equitably balanced rather than as mutually exclusive.

Files (approximate download time in Hours:Minutes:Seconds)

Filename                          Size        28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
podgursky_electronic_thesis.pdf   443.16 Kb   00:02:03     00:01:03    00:00:55       00:00:27        00:00:02
