A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-01052017-105740

Type of Document Dissertation
Author Xia, Weiyi
Author's Email Address xwy0220@gmail.com
URN etd-01052017-105740
Title Optimizing the Privacy Risk - Utility Framework in Data Publication
Degree PhD
Department Computer Science
Advisory Committee
  Name                    Title
  Bradley Malin           Committee Chair
  Daniel Fabbri           Committee Member
  Eric Johnson            Committee Member
  Murat Kantarcioglu      Committee Member
  Yevgeniy Vorobeychik    Committee Member
Keywords
  • Data sharing policy
  • Frontier optimization
  • Data utility
  • Re-identification risk
  • Sharing individual data
  • Data privacy
Date of Defense 2016-12-16
Availability unrestricted
In the past decade, we have witnessed a rapid growth in the quantity, quality, and diversity of personal data we shed through our daily activities. At the same time, it is increasingly recognized that personal data has tremendous value in supporting a variety of endeavors, which leads to sharing of such data in a de-identified form for secondary uses. This practice has raised numerous concerns about privacy, in particular that de-identified data will be linked to, or reveal sensitive information about, the corresponding named individual. As a result, disclosure control methods have been developed to mitigate risks when sharing such data. Yet disclosure risk is often measured simply as the probability that a record can be linked.

Traditional views on disclosure risk neglect the fact that an adversary is typically driven by gain (which may be economic in nature), has limited resources, and must undertake a series of actions to achieve a successful attack. As a result, a demonstration of possible disclosures does not necessarily indicate the likely disclosure risk level. Applying de-identification methods without considering this fact can lead to overprotection, which causes unnecessary data distortion and diminishes the legitimate usage of the data, as well as underprotection, which could harm relationships between data subjects and data publishers.
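The economic intuition above can be sketched in a few lines: a rational, cost-aware adversary attempts re-identification only when the expected payoff is positive. This is a minimal illustration, not the dissertation's actual FMDP model; the function names and the dollar figures are illustrative assumptions.

```python
# Hedged sketch (not the dissertation's model): an economically rational
# adversary attacks only when the expected net payoff is positive.
# All values below are illustrative assumptions.

def expected_payoff(gain, success_prob, cost):
    """Expected net payoff of attempting a re-identification attack."""
    return gain * success_prob - cost

def rational_attack(gain, success_prob, cost):
    """A cost-aware adversary attacks iff the expected payoff is positive."""
    return expected_payoff(gain, success_prob, cost) > 0

# A record worth $50 with a 10% linkage chance is not worth a $20 attempt:
print(rational_attack(gain=50.0, success_prob=0.10, cost=20.0))  # False
# The same record at a 60% linkage probability is:
print(rational_attack(gain=50.0, success_prob=0.60, cost=20.0))  # True
```

Under this view, a data publisher can deter attacks either by lowering the success probability (more de-identification) or by raising the adversary's cost (e.g., penalties in a data use agreement), which is the tradeoff the dissertation formalizes.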

The goal of this thesis is to build a framework for reasoning about privacy disclosure risk given a dataset and the context in which it is published. The context includes the adversary's decision making process, the adversary's gain from a successful attack, and the adversary's cost for carrying out the attack. The thesis also develops methods to find data publishing solutions that provide desirable tradeoffs between identity disclosure risk and data utility. This dissertation is partitioned into three main contributions. First, it proposes a novel re-identification risk framework, which formalizes incentive and deterrence mechanisms in the real-world environment where a de-identified dataset is released. This framework explicitly models the adversary as an optimal planning agent using a factored Markov decision process (FMDP). Second, the dissertation investigates the deterrent effect of certain types of penalties. Specifically, it considers a temporal penalty for violating the terms of a data use agreement (i.e., barring a data user from a system for a prespecified period of time). This is accomplished by analyzing the value of datasets over time, through the impact of publications based upon them. The analysis suggests that such temporal penalties may not be sufficient to protect data from attack. Finally, the dissertation develops a de-identification policy discovery platform that selects high-performing de-identification policies by accounting for the tradeoff between risk and utility criteria.
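The "frontier optimization" named in the keywords can be illustrated with a simple Pareto-frontier filter over candidate policies scored as (risk, utility) pairs: a policy survives only if no other policy offers lower risk and higher utility at the same time. This is a hedged sketch; the policy names and scores are invented for illustration and are not taken from the dissertation.

```python
# Hedged sketch of frontier selection: from candidate de-identification
# policies scored as (risk, utility), keep the non-dominated ones.
# Policy names and scores below are illustrative assumptions.

def pareto_frontier(policies):
    """Return the names of policies not dominated by any other.

    Policy A dominates policy B when A's risk <= B's and A's
    utility >= B's, with at least one inequality strict.
    """
    frontier = []
    for name, risk, utility in policies:
        dominated = any(
            r <= risk and u >= utility and (r < risk or u > utility)
            for _, r, u in policies
        )
        if not dominated:
            frontier.append(name)
    return frontier

candidates = [
    ("suppress-all",    0.01, 0.10),  # very safe, but little utility
    ("generalize-age",  0.05, 0.70),  # a middle-ground policy
    ("no-protection",   0.90, 1.00),  # full utility, high risk
    ("strict-and-lossy", 0.95, 0.60), # dominated by "no-protection"
]
print(pareto_frontier(candidates))
# → ['suppress-all', 'generalize-age', 'no-protection']
```

A policy discovery platform of the kind described would search a much larger (often combinatorial) policy space and use richer risk and utility measures, but the selection criterion is the same: report only the non-dominated tradeoffs and let the publisher choose among them.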

  Approximate download time (Hours:Minutes:Seconds)

  Filename   Size      28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Xia.pdf    1.79 Mb   00:08:18     00:04:16    00:03:44       00:01:52        00:00:09
