Overview of the Process for Obtaining the Data

The SEER-MHOS data are available to outside investigators for research purposes. Although personal identifiers for all patients and medical care providers have been removed from the SEER-MHOS data, a remote risk of re-identification remains, given the large amount of data available. In light of the sensitive nature of the data, maintaining patient, hospital, and health plan confidentiality is a primary concern of the National Cancer Institute (NCI), SEER, and the Centers for Medicare & Medicaid Services (CMS). Therefore, the SEER-MHOS data are not public use data files, and investigators must obtain approval before receiving the data.

SEER-MHOS Application Process

[Figure: SEER-MHOS application process flow]

The primary purpose of the approval process is not to critique the methodology or merits of proposed projects, but to ensure the confidentiality of the patients and providers in SEER areas. Reviewers from NCI and SEER may comment, however, on aspects of the research plan that may affect project feasibility and scientific rigor. NCI will work with investigators requesting data files to balance their research needs with those of the individuals and institutions included in the data. NCI may receive multiple requests to use SEER-MHOS data, and approval of a request should not be understood as a guarantee that no other project has overlapping research aims. Reviewers intend to be good stewards of the data and will make efforts to notify investigators when research aims may overlap.

For reasons of confidentiality, selected variables are not routinely released on the SEER-MHOS files. These variables include the Managed Care Plan ID and Contract number. Additionally, the patient's Census tract identifier and ZIP code reported by SEER at the time of first cancer diagnosis, and the ZIP code at the time of the MHOS survey, have been encrypted. Separate files containing geographically based (ZIP code and Census tract level) socioeconomic information from the 1990 and 2000 Censuses and the 2008-2012 American Community Survey are provided and can be matched to patients using the encrypted Census tract and ZIP code. These aggregated Census variables have been slightly altered to prevent matching back to the Census data and identifying the actual Census tract or ZIP code. Please review the Privacy and Confidentiality Issues section for more information on these variables.
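As a rough illustration of matching area-level socioeconomic data to patient records through an encrypted key, the following Python sketch performs a simple left join. All field names (`patient_id`, `enc_tract`, `median_income`) and values are hypothetical placeholders; the actual variable names and layouts are defined in the SEER-MHOS file documentation.

```python
# Hypothetical records; real SEER-MHOS variable names and values will differ.
patients = [
    {"patient_id": "P1", "enc_tract": "T-9f3a"},
    {"patient_id": "P2", "enc_tract": "T-17c0"},
    {"patient_id": "P3", "enc_tract": "T-0000"},  # no matching Census row
]
census = [
    {"enc_tract": "T-9f3a", "median_income": 41200},
    {"enc_tract": "T-17c0", "median_income": 58900},
]


def attach_census(patients, census, key="enc_tract"):
    """Left-join area-level Census rows onto patient rows by an encrypted key."""
    lookup = {row[key]: row for row in census}
    merged = []
    for patient in patients:
        area = lookup.get(patient[key], {})
        # Patient fields take precedence; unmatched patients keep only their own fields.
        merged.append({**area, **patient})
    return merged


merged = attach_census(patients, census)
for row in merged:
    print(row["patient_id"], row.get("median_income"))
```

Because the key is encrypted, the join links a patient to area-level characteristics without exposing the actual Census tract or ZIP code. Patients with no matching area row are kept, with the socioeconomic fields simply absent.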

Once a data request has been approved and all appropriate documents are on file, IMS (NCI's programming contractor) will provide an invoice to the investigator to cover the costs of creating the requested data files (see Cost of Acquiring SEER-MHOS Data). IMS requires pre-payment of all invoices and, in accordance with an NCI-IMS contractual agreement, will begin processing a data request upon receipt of payment.

Extracted files are sent in SAS Cport format. To ensure the security of patient information during transmission, the data files will be encrypted using WinZip (256-bit AES encryption) and password-protected. The data files will also be compressed using the GZIP compression utility. On a PC running the Windows operating system, programs such as 7-Zip and WinZip can be used to unzip the compressed files into a directory that the user specifies. On a UNIX or Linux machine, GUNZIP is necessary to unzip the files.
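For investigators who prefer to script the decompression step, the sketch below shows one possible way to expand a GZIP-compressed file using only the Python standard library. It assumes the WinZip AES-encrypted, password-protected layer has already been opened with a tool such as 7-Zip or WinZip, and the function name `gunzip_file` is illustrative rather than part of any SEER-MHOS tooling.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dest_dir):
    """Decompress a .gz file into dest_dir and return the output path."""
    src = Path(src)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Drop the trailing ".gz" to name the output; otherwise append ".out".
    out_name = src.stem if src.suffix == ".gz" else src.name + ".out"
    out_path = dest_dir / out_name
    # Stream the data so large files are not loaded into memory at once.
    with gzip.open(src, "rb") as fin, open(out_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return out_path
```

A call such as `gunzip_file("delivery/example.cpt.gz", "data")` (hypothetical paths) would write `data/example.cpt`. The resulting SAS Cport transport file still has to be restored in SAS, typically with PROC CIMPORT.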

Last Updated: 05 Oct, 2023