
Title page for ETD etd-03262012-144837

Type of Document: Dissertation
Author: Durham, Elizabeth Ashley
Author's Email Address: ea.durham@vanderbilt.edu
URN: etd-03262012-144837
Title: A framework for accurate, efficient private record linkage
Degree: PhD
Department: Biomedical Informatics
Advisory Committee:
  • Bradley Malin (Committee Chair)
  • Dario Giuse (Committee Member)
  • Mark Frisse (Committee Member)
  • Murat Kantarcioglu (Committee Member)
  • Yuan Xue (Committee Member)
Keywords:
  • private record linkage
  • data sharing
  • entity resolution
Date of Defense: 2012-03-12
Availability: unrestricted
Record linkage is the task of identifying records from multiple data sources that refer to the same individual. Private record linkage (PRL) is a variant of the task in which data holders wish to perform linkage without revealing identifiers associated with the records. PRL is desirable in various domains, including health care, where confidentiality requirements may prohibit revealing an individual’s identity. In medicine, PRL can be applied when datasets from multiple care providers are aggregated for biomedical research, improving data quality by reducing duplicate and fragmented information. By bringing together all of a patient’s information, PRL also has the potential to improve patient care and minimize the costs associated with replicated services.

This dissertation is the first to address the entire life cycle of PRL, and it introduces a framework for the design and application of PRL in practice. It also addresses how PRL relates to policies that govern the use of medical data, such as the HIPAA Privacy Rule. To accomplish these goals, the framework addresses three crucial and competing aspects of PRL: 1) computational complexity, 2) accuracy, and 3) security. The dissertation is divided into four parts. First, it evaluates current approaches for encoding data for PRL and identifies a Bloom filter-based approach that provides a good balance of these competing aspects. Second, because such encodings may reveal information when subjected to cryptanalysis, it presents a refinement of the encoding strategy that mitigates this vulnerability without sacrificing linkage accuracy. Third, it introduces a method that applies locality-sensitive hash functions to significantly reduce the number of record pair comparisons required for PRL, and thus its computational complexity. Finally, it reports on an extensive evaluation of the combined application of these methods with real datasets, which illustrates that they outperform existing approaches.
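
To make the two techniques named in the abstract more concrete, the following is a minimal Python sketch of the general ideas, not the dissertation's actual implementation. It assumes character bigrams as the hashed tokens, HMAC-SHA256 as the keyed hash, the Dice coefficient as the similarity measure, and bit sampling as the locality-sensitive hash family. The function names, the parameter values (1000 bits, 30 hashes, 20 tables, 8 bits per key), and the shared key are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import random


def bigrams(value):
    """Character bigrams of a lower-cased string padded with '_' on both ends."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret_key, num_bits=1000, num_hashes=30):
    """Return the set of bit positions set to 1 in the field's Bloom filter."""
    positions = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            # Keyed hash: rebuilding an encoding requires knowing secret_key.
            digest = hmac.new(secret_key, f"{i}:{gram}".encode(), hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return positions


def dice(a, b):
    """Dice coefficient of two sets of bit positions (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def lsh_keys(positions, num_bits=1000, num_tables=20, bits_per_key=8, seed=0):
    """Bit-sampling LSH: one blocking key per table, built from sampled bits."""
    rng = random.Random(seed)  # a shared seed makes every party sample the same bits
    keys = []
    for table in range(num_tables):
        sampled = rng.sample(range(num_bits), bits_per_key)
        keys.append((table, tuple(int(p in positions) for p in sampled)))
    return keys


key = b"shared-secret"  # hypothetical key agreed on by the data holders
smith = bloom_encode("Smith", key)
smyth = bloom_encode("Smyth", key)
jones = bloom_encode("Jones", key)
```

In this sketch, two records would be compared with `dice` only if their encodings share at least one `(table, bits)` key from `lsh_keys`, which is how locality-sensitive hashing avoids comparing every record pair. Similar strings such as "Smith" and "Smyth" share most bigrams, so their encodings overlap far more than those of unrelated strings.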

Filename          Size     Approximate Download Time (Hours:Minutes:Seconds)
                           28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
dissertation.pdf  2.90 Mb  00:13:25    00:06:54   00:06:02      00:03:01       00:00:15
