Publication Abstract

Authors: Lu M, Rupp LB, Moorman AC, Li J, Zhang T, Lamerato LE, Holmberg SD, Spradling PR, Teshale EH, Vijayadeva V, Boscarino JA, Schmidt MA, Nerenz DR, Gordon SC

Title: Comparative effectiveness research of chronic hepatitis B and C cohort study (CHeCS): improving data collection and cohort identification.

Journal: Dig Dis Sci 59(12):3053-61

Date: 2014 Dec

Abstract: BACKGROUND AND AIMS: The Chronic Hepatitis Cohort Study (CHeCS) is a longitudinal observational study of risks and benefits of treatments and care in patients with chronic hepatitis B (HBV) and C (HCV) infection from four US health systems. We hypothesized that comparative effectiveness methods, including a centralized data management system and an adaptive approach for cohort selection, would improve cohort selection while controlling data quality and reducing cost. METHODS: Cohort selection and data collection were performed primarily via the electronic health record (EHR); cases were confirmed via chart abstraction. Two parallel sources fed data to a centralized data management system: direct EHR data collection with common data elements, and chart abstraction via electronic data capture. An adaptive Classification and Regression Tree (CART) identified a set of electronic variables to improve case ascertainment accuracy. RESULTS: Over 16 million patient records were collected on 23 case report forms in 2006-2008. The vast majority of data (99.2%) were collected electronically from the EHR; only 0.8% were collected via chart abstraction. Initial electronic criteria identified 12,144 chronic hepatitis patients; 10,098 were confirmed via chart abstraction, with positive predictive values (PPV) of 79% and 83% for HBV and HCV, respectively. CART-optimized models significantly increased PPV to 88% for HBV and 95% for HCV. CONCLUSIONS: CHeCS is a comparative effectiveness research project that leverages electronic centralized data collection and adaptive cohort identification approaches to enhance study efficiency. The adaptive CART model significantly improved the positive predictive value of cohort identification methods.
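
Worked example: The PPV figures in the RESULTS are the proportion of electronically flagged patients whose diagnosis was confirmed on chart abstraction. The minimal Python sketch below applies that definition to the two pooled counts given in the abstract. The function and variable names are hypothetical, and the pooled value is derived here for illustration only; the abstract reports PPV separately for HBV and HCV and does not give the per-infection counts.

```python
def positive_predictive_value(confirmed: int, flagged: int) -> float:
    """Return confirmed cases divided by all cases flagged by the electronic criteria."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must lie between 0 and flagged")
    return confirmed / flagged


# Pooled counts taken from the abstract (HBV and HCV combined).
FLAGGED_BY_ELECTRONIC_CRITERIA = 12_144
CONFIRMED_BY_CHART_ABSTRACTION = 10_098

# Assumption: every confirmed patient is counted among the flagged patients.
pooled_ppv = positive_predictive_value(
    CONFIRMED_BY_CHART_ABSTRACTION, FLAGGED_BY_ELECTRONIC_CRITERIA
)
print(f"Pooled PPV: {pooled_ppv:.1%}")  # prints "Pooled PPV: 83.2%"
```

The pooled result of about 83% falls within the range of the reported 79% (HBV) and 83% (HCV), which is what would be expected if most flagged patients were in the HCV group; that split is an inference, not something the abstract states.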