Overview of the Process for Obtaining the Data
The SEER-CAHPS linked data are available to outside investigators for research purposes. Although personal identifiers for all patients and medical care providers have been removed from the SEER-CAHPS data, a remote risk of re-identification remains because of the large amount of data available. Given the sensitive nature of the data, maintaining patient and provider confidentiality is a primary concern of the National Cancer Institute (NCI), SEER, and the Centers for Medicare and Medicaid Services (CMS). The SEER-CAHPS data are therefore not public use data files, and investigators must obtain approval before receiving the data. We strongly recommend that investigators schedule a phone call with the SEER-CAHPS team after submitting their draft proposal.
The primary purpose of the approval process is not to critique the methodology or merits of proposed projects, but to ensure the confidentiality of the patients and providers in SEER areas. However, reviewers from NCI and SEER may comment on, and may base review decisions on, aspects of the research plan that affect project feasibility and scientific rigor. NCI will work with investigators requesting data files to balance their research needs with those of the individuals and institutions included in the data. Multiple requests to use SEER-CAHPS data may be received, and approval should not be understood as a guarantee that a project's research aims will not overlap with those of other investigators. Reviewers intend to be good stewards of the data and will make efforts to notify investigators when such overlap may occur.
For reasons of confidentiality, selected variables are not routinely released on the SEER-CAHPS files. These variables include the Health Plan ID and Contract number. Additionally, the patient's census tract identifier and ZIP code reported by SEER at the time of first cancer diagnosis, and the ZIP code at the time of the CAHPS survey, have been encrypted. Separate files containing geographically based (ZIP code and census tract level) socioeconomic information from the 1990 and 2000 Censuses and the 2008-2012 American Community Survey are provided and can be matched to patient records by the encrypted census tract and ZIP code. These aggregated census variables have been slightly altered to prevent matching back to the Census data and identifying the actual census tract or ZIP code. Please review the Privacy and Confidentiality Issues section for more information on these variables.
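The matching described above can be sketched as a simple key lookup. This is a minimal illustration only: the column names (patient_id, enc_tract, median_income) and the toy values are hypothetical and are not taken from the SEER-CAHPS file layouts, which should be checked in the data documentation.

```python
import csv
import io


def attach_census_variables(patient_rows, census_rows, key):
    """Left-join area-level census variables onto patient rows by an encrypted key."""
    # Index the census file by the encrypted geographic identifier.
    census_by_key = {row[key]: row for row in census_rows}
    merged = []
    for row in patient_rows:
        combined = dict(row)
        # Patients with no matching area keep only their original fields.
        for name, value in census_by_key.get(row[key], {}).items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged


# Hypothetical toy data standing in for the patient file and a census file.
patients = list(csv.DictReader(io.StringIO(
    "patient_id,enc_tract\nP1,A9F3\nP2,ZZZZ\n")))
census = list(csv.DictReader(io.StringIO(
    "enc_tract,median_income\nA9F3,52000\n")))

merged = attach_census_variables(patients, census, key="enc_tract")
```

Because the key is encrypted, the join links records to area-level variables without exposing the actual census tract or ZIP code.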
Once a data request has been approved and all appropriate documents are on file, IMS (NCI's programming contractor) will send the investigator an invoice covering the cost of creating the requested data files (see Cost of Acquiring SEER-CAHPS Data). In accordance with an NCI-IMS contractual agreement, IMS requires pre-payment of all invoices and will begin processing a data request only upon receipt of payment. Extracted files are provided as column-delimited files and in SAS c-port format. To protect patient information while the files are in transit, the data files are encrypted and delivered on a password-protected thumb drive. The data files are also compressed using the GZIP compression utility. A program will be made available to unzip the files into a directory that the user specifies; this program requires a PC running the Windows operating system. Users on UNIX or Linux machines will need GUNZIP to unzip the files.
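For investigators who prefer a scripted alternative, a GZIP-compressed file can also be decompressed with Python's standard library. The sketch below is an assumption-laden illustration, not the program supplied with the data: the function name and any file names are hypothetical, and it handles only GZIP decompression, not the encryption or password protection applied to the thumb drive.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gzip(source, target_dir):
    """Decompress one .gz file into target_dir and return the output path."""
    source = Path(source)
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Drop a trailing ".gz" to name the output; otherwise append ".out".
    if source.suffix == ".gz":
        target = target_dir / source.stem
    else:
        target = target_dir / (source.name + ".out")
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(source, "rb") as compressed, open(target, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return target
```

A call such as decompress_gzip("example_file.txt.gz", "unzipped") would write example_file.txt into the unzipped directory; both names here are placeholders.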
Last Updated: 14 Sep 2018