A joint project of the Graduate School, Peabody College, and the Jean & Alexander Heard Library

Title page for ETD etd-09062017-100115

Type of Document: Dissertation
Author: Mercaldo, Nathaniel David
URN: etd-09062017-100115
Title: Design and Analysis Considerations for Complex Longitudinal and Survey Sampling Studies
Degree: PhD
Department: Biostatistics
Advisory Committee:
  Frank E. Harrell Jr. (Committee Chair)
  Bryan E. Shepherd (Committee Member)
  Jonathan S. Schildcrout (Committee Member)
  Loren P. Lipworth (Committee Member)
  Matthew S. Shotwell (Committee Member)
Keywords:
  • efficient study designs
  • outcome-dependent sampling
  • survey analysis
  • misclassification
  • electronic health records
  • R software
Date of Defense: 2017-08-23
Availability: unrestricted
Pre-existing cohort data (e.g., electronic health records) are increasingly available, and the need for novel and efficient uses of these data is paramount given resource constraints. This dissertation consists of three chapters on the design and analysis of longitudinal and survey sampling studies that use these types of data.

In chapter one, we extend outcome-dependent sampling (ODS) designs for longitudinal binary data to permit data collection in two stages. We consider two subclasses of designs: fixed designs, in which the design at each stage is pre-specified, and adaptive designs, which use stage-one data to improve the choice of design at stage two. We demonstrate that data from both stages can be aggregated to generate valid parameter estimates using ascertainment-corrected maximum likelihood methods. Efficiency gains are observed relative to random sampling and, in certain situations, relative to single-stage ODS designs.
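To make the two-stage idea concrete, here is a minimal Python sketch (not the dissertation's code) of one common ODS scheme for longitudinal binary data. It assumes subjects are stratified by whether their response series contains no, some, or all 1s. The cohort, the stratum sample sizes, and the helper names `ods_stratum` and `ods_sample` are hypothetical. Stage two samples only from subjects not selected at stage one, and the per-stratum inclusion probabilities that an ascertainment correction would need are recorded along the way.

```python
import random
from collections import defaultdict

def ods_stratum(y):
    """Classify a binary response series by whether it has no, some, or all 1s."""
    total = sum(y)
    if total == 0:
        return "none"
    if total == len(y):
        return "all"
    return "some"

def ods_sample(cohort, n_per_stratum, rng, exclude=frozenset()):
    """Fixed-size simple random sample within each response-pattern stratum.

    cohort: dict mapping subject id -> list of 0/1 responses (hypothetical data).
    n_per_stratum: dict mapping stratum label -> requested sample size.
    exclude: ids already sampled (e.g., at stage one), which are not eligible.
    Returns (sampled ids, per-stratum sampling probabilities among eligible ids).
    """
    strata = defaultdict(list)
    for sid, y in cohort.items():
        if sid not in exclude:
            strata[ods_stratum(y)].append(sid)
    sampled, probs = [], {}
    for label, ids in strata.items():
        m = min(n_per_stratum.get(label, 0), len(ids))
        sampled.extend(rng.sample(sorted(ids), m))
        probs[label] = m / len(ids)
    return sampled, probs

rng = random.Random(2017)
# Hypothetical cohort: 500 subjects, 5 binary outcomes each, event rate 0.2.
cohort = {i: [int(rng.random() < 0.2) for _ in range(5)] for i in range(500)}

# Stage one: a fixed design that oversamples subjects with response variation.
stage1, p1 = ods_sample(cohort, {"none": 20, "some": 40, "all": 5}, rng)
# Stage two: sample only from subjects not already selected at stage one.
stage2, p2 = ods_sample(cohort, {"none": 10, "some": 20, "all": 5}, rng,
                        exclude=set(stage1))

# Overall inclusion probability: chosen at stage one, or not and then at stage two.
overall = {k: p1[k] + (1 - p1[k]) * p2.get(k, 0.0) for k in p1}
print(len(stage1), len(stage2), overall)
```

Because stage two draws without replacement from the remaining subjects, the overall inclusion probability in a stratum reduces to (n1 + n2) / N_k. An adaptive design would instead choose the stage-two sizes after inspecting the stage-one data; the ascertainment-corrected likelihood itself is not shown here.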

In chapter two, we investigate the effects of using an imperfect sampling frame on the design and analysis of complex survey data. We explore the impact of stratum misclassification on the choice of study design, on the operating characteristics of survey estimators, and on the appropriateness of two common approaches to survey data analysis. Stratified sampling is recommended over random sampling when interest lies in making inferential statements about rare subgroups. In the presence of misclassification, the relative efficiency of stratified sampling depends on the subgroup prevalence, and analytic methods that account for the design are still required for valid inference.
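The following minimal simulation sketch illustrates why design-based analysis remains necessary when the frame misclassifies strata. The prevalences, flip rate, sample sizes, and helper names are hypothetical, and this is not the dissertation's simulation design. Sampling and weights are based on the observed frame label, while the subgroup of interest is defined by true membership measured on sampled units; the weights are the standard stratified design weights N_h / n_h.

```python
import random

def stratified_sample(frame_labels, n_per_stratum, rng):
    """Fixed-size simple random sample within each frame stratum.

    Returns sampled indices and design weights N_h / n_h, where N_h is the
    number of frame units carrying label h (the label used for sampling).
    """
    members = {}
    for i, h in enumerate(frame_labels):
        members.setdefault(h, []).append(i)
    idx, weights = [], []
    for h, units in members.items():
        n_h = min(n_per_stratum[h], len(units))
        idx.extend(rng.sample(units, n_h))
        weights.extend([len(units) / n_h] * n_h)
    return idx, weights

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

rng = random.Random(1)
N = 20000
# Hypothetical population: a 5% subgroup with outcome prevalence 0.30, others 0.10.
true_group = [int(rng.random() < 0.05) for _ in range(N)]
outcome = [int(rng.random() < (0.30 if g else 0.10)) for g in true_group]
# Hypothetical imperfect frame: each stratum label is flipped with probability 0.10.
frame = [1 - g if rng.random() < 0.10 else g for g in true_group]

# Equal allocation across frame strata oversamples the frame-labelled subgroup.
idx, w = stratified_sample(frame, {0: 2000, 1: 2000}, rng)
y = [outcome[i] for i in idx]

# Overall prevalence: design-weighted vs naive unweighted estimate.
weighted_overall = weighted_mean(y, w)
naive_overall = sum(y) / len(y)

# Subgroup (domain) prevalence using true membership observed on sampled units.
dom = [(outcome[i], wi) for i, wi in zip(idx, w) if true_group[i] == 1]
weighted_subgroup = weighted_mean([v for v, _ in dom], [wi for _, wi in dom])
print(round(weighted_overall, 3), round(naive_overall, 3), round(weighted_subgroup, 3))
```

With equal allocation, the unweighted mean gives the frame-labelled subgroup stratum far more influence than its population share, while the weighted estimates target the population values. Misclassified subgroup members drawn from the large stratum carry large weights, which inflates the variance of the subgroup estimate; this is one way the relative efficiency of stratification can depend on subgroup prevalence.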

In chapter three, we introduce the MMLB R package, which estimates parameters of marginalized regression models for longitudinal binary data. We describe these models and outline estimation procedures under random and ODS sampling schemes. We provide examples demonstrating how to fit these models and how to generate data under a pre-specified marginal mean model.
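The data-generation idea can be sketched outside of R. The following Python code does not use or reproduce MMLB's functions; it assumes a first-order marginalized transition model, in which a marginal mean model logit(mu_t) = beta0 + beta1 * t (hypothetical coefficients) is paired with a conditional model logit P(Y_t = 1 | Y_{t-1}) = Delta_t + gamma * Y_{t-1}. Each Delta_t is solved numerically so that averaging the conditional model over Y_{t-1} reproduces mu_t exactly. The helper names are hypothetical.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def solve_delta(mu_t, mu_prev, gamma, lo=-30.0, hi=30.0):
    """Bisection for Delta_t in
    mu_t = expit(Delta_t + gamma) * mu_prev + expit(Delta_t) * (1 - mu_prev).
    The right-hand side increases in Delta_t, so the root is unique."""
    def excess(d):
        return expit(d + gamma) * mu_prev + expit(d) * (1 - mu_prev) - mu_t
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def mtm_intercepts(mu, gamma):
    """Conditional intercepts Delta_t for t >= 1 (None at t = 0, where P(Y_0 = 1) = mu[0])."""
    return [None] + [solve_delta(mu[t], mu[t - 1], gamma) for t in range(1, len(mu))]

def simulate_subject(mu, deltas, gamma, rng):
    """One binary series whose marginal means are mu under the first-order model."""
    y = [int(rng.random() < mu[0])]
    for t in range(1, len(mu)):
        y.append(int(rng.random() < expit(deltas[t] + gamma * y[-1])))
    return y

# Hypothetical marginal mean model: logit(mu_t) = -1.0 + 0.3 * t, t = 0..3.
beta0, beta1, gamma = -1.0, 0.3, 1.5
mu = [expit(beta0 + beta1 * t) for t in range(4)]
deltas = mtm_intercepts(mu, gamma)

rng = random.Random(42)
data = [simulate_subject(mu, deltas, gamma, rng) for _ in range(20000)]
empirical = [sum(y[t] for y in data) / len(data) for t in range(4)]
print([round(m, 3) for m in mu], [round(e, 3) for e in empirical])
```

Because Delta_t is determined by mu_t, mu_{t-1}, and gamma, the coefficients beta0 and beta1 keep their marginal interpretation regardless of the strength of the serial dependence gamma, which is the property that motivates marginalized models.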

We hope these chapters provide specific and general insights that will improve our ability to conduct efficient research studies under resource constraints.

File: mercaldo.pdf (943.12 Kb)
