Publication Abstract

Authors: Miglioretti DL, Ichikawa L, Smith RA, Bassett LW, Feig SA, Monsees B, Parikh JR, Rosenberg RD, Sickles EA, Carney PA

Title: Criteria for identifying radiologists with acceptable screening mammography interpretive performance on basis of multiple performance measures.

Journal: AJR Am J Roentgenol 204(4):W486-91

Date: 2015 Apr

Abstract: OBJECTIVE: Using a combination of performance measures, we updated previously proposed criteria for identifying physicians whose performance interpreting screening mammography may indicate suboptimal interpretation skills. MATERIALS AND METHODS: In this study, six expert breast imagers used a method based on the Angoff approach to update criteria for acceptable mammography performance on the basis of two sets of combined performance measures: set 1, sensitivity and specificity for facilities with complete capture of false-negative cancers; and set 2, cancer detection rate (CDR), recall rate, and positive predictive value of a recall (PPV1) for facilities that cannot capture false-negative cancers but have reliable cancer follow-up information for positive mammography results. Decisions were informed by normative data from the Breast Cancer Surveillance Consortium (BCSC). RESULTS: Updated combined ranges for acceptable sensitivity and specificity of screening mammography are sensitivity ≥ 80% and specificity ≥ 85%, or sensitivity 75-79% and specificity 88-97%. Updated ranges for CDR, recall rate, and PPV1 are CDR ≥ 6 per 1000, recall rate 3-20%, and any PPV1; CDR 4-6 per 1000, recall rate 3-15%, and PPV1 ≥ 3%; or CDR 2.5-4.0 per 1000, recall rate 5-12%, and PPV1 3-8%. Using the original criteria, 51% of BCSC radiologists had acceptable sensitivity and specificity; 40% had acceptable CDR, recall rate, and PPV1. Using the combined criteria, 69% had acceptable sensitivity and specificity and 62% had acceptable CDR, recall rate, and PPV1. CONCLUSION: The combined criteria improve previous criteria by considering the interrelationships of multiple performance measures and broaden the acceptable performance ranges compared with previous criteria based on individual measures.
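
The combined ranges reported in the RESULTS can be read as two decision rules. The following is a minimal sketch of that reading, not the authors' implementation. The function names are hypothetical, and the handling of range boundaries is an assumption, because the abstract does not say how values between the stated ranges are treated (for example, a sensitivity of 79.5%, or a CDR of exactly 4 or 6 per 1000). Sensitivity, specificity, recall rate, and PPV1 are taken as percentages, and CDR as cancers per 1000 examinations.

```python
def acceptable_set1(sensitivity, specificity):
    """Set 1: sensitivity and specificity, both in percent."""
    if sensitivity >= 80 and specificity >= 85:
        return True
    # Assumption: "75-79%" is read as 75 <= sensitivity < 80,
    # so fractional values such as 79.5 fall in this band.
    return 75 <= sensitivity < 80 and 88 <= specificity <= 97


def acceptable_set2(cdr, recall_rate, ppv1):
    """Set 2: CDR per 1000 exams; recall rate and PPV1 in percent."""
    # Assumption: where the stated CDR bands share an endpoint (4 and 6),
    # the value is assigned to the higher band.
    if cdr >= 6:
        return 3 <= recall_rate <= 20  # any PPV1 is accepted
    if 4 <= cdr < 6:
        return 3 <= recall_rate <= 15 and ppv1 >= 3
    if 2.5 <= cdr < 4:
        return 5 <= recall_rate <= 12 and 3 <= ppv1 <= 8
    return False


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(acceptable_set1(77, 90))
    print(acceptable_set2(5.0, 18, 4))
```

The sketch shows why the criteria are described as combined: a lower value on one measure can be acceptable only when the other measures fall within a narrower range.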