Obtaining the Data

The SEER-CAHPS linked data are available to outside investigators for research purposes. Although personal identifiers for all patients and medical care providers have been removed from the SEER-CAHPS data, there remains a remote risk of re-identification (given the large amount of data available). In light of the sensitive nature of the data, maintaining patient and provider confidentiality is a primary concern of the National Cancer Institute (NCI), SEER, and the Centers for Medicare & Medicaid Services (CMS). Therefore, the SEER-CAHPS data are not public use data files, and investigators must obtain approval before the data are released to them. We strongly recommend that investigators schedule a phone call with the SEER-CAHPS team prior to submitting their draft proposal.

Representatives from NCI and CMS will be responsible for reviewing each proposal. SEER PI review may be necessary after NCI/CMS review. The review and approval process generally takes 4-6 weeks from receipt of completed proposals. This is an iterative process with multiple steps, as shown in the figure below.

SEER-CAHPS Application Process

[Figure: SEER-CAHPS Application Process]

Purpose of Review Process

The primary purpose of the approval process is not to critique the methodology or merits of proposed projects, but to ensure the confidentiality of the patients and providers in SEER areas. However, reviewers from NCI and SEER may comment and/or base review decisions on aspects of the research plan that may affect project feasibility and scientific rigor. NCI will work with investigators requesting data files to balance their research needs with those of the individuals and institutions included in the data. Multiple requests to use SEER-CAHPS data may be received, and approval should not be understood as a guarantee that a project's research aims will not overlap with those of other investigators. Reviewers intend to be good stewards of the data and will make efforts to notify investigators when such overlap may exist.

Requests for SEER-CAHPS data will generally be approved unless one of the following conditions occurs:

  • The proposed research involves data that may compromise the privacy or confidentiality of patients, providers, or institutions.
  • The central purpose of the study is not cancer-related research.
  • The study does not require CAHPS survey data.
  • The SEER-CAHPS data are not of sufficient quality or completeness to provide accurate data to address a specific research question or aim.
  • The data processing necessary to produce the requested files places an unusually heavy burden on data processing staff.

If the reviewing agencies have concerns about confidentiality arising from the project, investigators may revise their proposal. However, if there are ongoing concerns about confidentiality, SEER-CAHPS data will not be released, regardless of whether an investigator has already been funded by another agency or organization to conduct an analysis using the data. Therefore, investigators planning to use the SEER-CAHPS data as part of a grant are encouraged to obtain approval for release of the data before submitting their grant proposal.

Specific Confidential SEER-CAHPS Variables

For reasons of confidentiality, selected variables are not routinely released on the SEER-CAHPS files. These variables include the Health Plan ID and Contract number. Additionally, the patient's census tract identifier and ZIP code reported by SEER at the time of first cancer diagnosis, and the ZIP code at the time of the CAHPS survey, have been encrypted. Separate files that contain geographically based (ZIP code and census tract level) socioeconomic information from the 1990 and 2000 Censuses and the 2008-2012 American Community Survey are provided and can be matched to patient records by the encrypted census tract and ZIP code. These aggregated census variables have been slightly altered to prevent matching back to the Census data and identifying the actual census tract or ZIP code.
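The matching described above amounts to a left join on the encrypted geographic key. The following is a minimal sketch of that idea using only the Python standard library. The column names (`patient_id`, `enc_tract`, `median_income`) and all values are made up for illustration; the actual SEER-CAHPS file layouts and variable names are not given here and may differ.

```python
import csv
import io

# Hypothetical column names and values; real SEER-CAHPS layouts may differ.
PATIENTS_CSV = """patient_id,enc_tract
P1,A9F3
P2,77C0
P3,ZZZZ
"""

TRACT_SES_CSV = """enc_tract,median_income
A9F3,41250
77C0,58300
"""


def attach_ses(patients_text, ses_text, key="enc_tract"):
    """Left-join census-level variables onto patient rows by encrypted key."""
    ses_rows = list(csv.DictReader(io.StringIO(ses_text)))
    lookup = {row[key]: row for row in ses_rows}
    # Every census column except the join key gets copied onto the patient row.
    ses_fields = [f for f in ses_rows[0] if f != key] if ses_rows else []

    merged = []
    for row in csv.DictReader(io.StringIO(patients_text)):
        match = lookup.get(row[key])
        out = dict(row)
        for field in ses_fields:
            # None marks a patient whose encrypted key has no census record.
            out[field] = match[field] if match else None
        merged.append(out)
    return merged


merged = attach_ses(PATIENTS_CSV, TRACT_SES_CSV)
for row in merged:
    print(row)
```

Because the keys are encrypted, the join works only between files that share the same encryption; an unmatched key (patient `P3` in this sketch) is kept with empty census values rather than dropped, so the patient count is preserved.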

Please review the Privacy and Confidentiality Issues section for more information on these variables.

Process for Obtaining Data after Approval

Once a data request has been approved and all appropriate documents are on file, IMS (NCI's programming contractor) will provide an invoice to the investigator to cover the costs of creating the requested data files (see Cost of Acquiring SEER-CAHPS Data). In accordance with an NCI-IMS contractual agreement, IMS will begin processing data requests upon receipt of payment. IMS requires pre-payment of all invoices. Extracted files are sent as column-delimited files and in SAS c-port format. To ensure the security of patient information during transfer, the data files will be encrypted and placed on a password-protected thumb drive. The data files will also be compressed using the GZIP compression utility. On a PC running the Windows operating system, programs such as 7-Zip and WinZip can unzip the compressed files into a directory that the user specifies. On a UNIX or Linux machine, GUNZIP is necessary to unzip the files.
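For investigators who prefer to script the decompression step, GZIP-compressed files can also be expanded with Python's standard `gzip` module. This is an alternative not mentioned in the instructions above, shown only as a sketch; the file name `extract.txt.gz` and its contents are invented for the demonstration, and the function `gunzip_file` is a hypothetical helper.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_file(src, dest_dir):
    """Decompress a .gz file into dest_dir and return the output path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    out_path = dest_dir / src.stem  # "extract.txt.gz" -> "extract.txt"
    with gzip.open(src, "rb") as fin, open(out_path, "wb") as fout:
        # Copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(fin, fout)
    return out_path


# Demonstration with a made-up file standing in for a delivered extract.
work = Path(tempfile.mkdtemp())
sample = work / "extract.txt.gz"
with gzip.open(sample, "wb") as f:
    f.write(b"id|value\n1|42\n")

restored = gunzip_file(sample, work / "unzipped")
print(restored.name)
```

The compressed original is left in place, which keeps an untouched copy of each delivered file alongside the expanded version.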

Please review Cost of Acquiring SEER-CAHPS Data for more information on the cost of creating the requested data files.

Last Updated: 19 Apr, 2024