****************************************************************************;
*Survey file for cancer of interest has multiple survey records per person *;
*SEER file has multiple cancer records per person, one for each cancer site*;
*Additional variables were added to the survey records:                    *;
*CA_SEQ_INDX,CATEXT,CA_SITE,CA_YEAR,CA_MON,FIRSTCA,ONLYPRIM,MOSTRECTCA     *;
*Please refer to the data dictionary for details.                          *;
****************************************************************************;

*********************************************************************;
*A. extract SEER info for patients in the survey file with the      *;
*   selected cancer (lung, prostate, etc.)                          *;
*********************************************************************;
Proc sort data=SEER_FILE; **replace with the actual SEER file name**;
  By mhos_encrypID sequence_number;
Run;

*delete multiple survey records, keep one record per person;
Proc sort nodupkey data=survey_file out=person(keep=mhos_encrypID ca_seq_indx);
  By mhos_encrypID;
Run;

*to retrieve all SEER info for the cancer of interest**;
Data get_SEER; *one record per person*;
  Merge person(in=in_surv) SEER_FILE(in=in_sr);
  By mhos_encrypID;
  If in_surv;
  If CA_SEQ_INDX = sequence_number; *CA_SEQ_INDX has the sequence_number for the selected cancer;
Run;

*to retrieve all cancer sites for patients in the survey file with the selected cancer**;
Data get_other_ca; *multiple records per person, one for each cancer site**;
  Merge person(in=in_surv keep=mhos_encrypID ca_seq_indx) SEER_FILE(in=in_sr);
  By mhos_encrypID;
  If in_surv;
Run;

************************************************************************;
*B. Combining Survey files for cancer of interest(lung, prostate, etc.)*;
*NOTE: variables CATEXT, CA_SITE, FIRSTCA, ONLYPRIM, CA_SEQ_INDX,      *;
*      CA_YEAR, CA_MON, MOSTRECTCA need to be renamed when             *;
*      merging the individual survey files.                            *;
*      Please see example below                                        *;
************************************************************************;
Proc sort data=lung;
  by mhos_encrypID srvdate srvseq;
run;

Proc sort data=prostate;
  by mhos_encrypID srvdate srvseq;
run;

**only CA_SEQ_INDX is renamed in this example; rename the other variables listed in the NOTE the same way**;
data combined;
  merge lung(in=in_lung rename=(ca_seq_indx=lung_seq_indx))
        prostate(in=in_prostate rename=(ca_seq_indx=prostate_seq_indx));
  by mhos_encrypID srvdate srvseq;
run;

*********************************************************************;
*C. Survey file for each cancer of interest (lung, prostate, etc.)  *;
*********************************************************************;
**NOTE: the IF statements in items 1-6 are subsetting conditions; place them inside a DATA step**;

**1. Keep surveys before any cancer**;
If INSEER = 1;
If NUMCABEF = 0;

**2. Keep all surveys before the selected cancer, where the cancer is the first primary (seq no. = 00 or 01)**;
If INSEER = 1;
If FIRSTCA=1;
If NUMCABEF = 0;

**3. Keep the most recent (last) survey before the selected cancer, where the cancer is the first primary**;
**a. select all surveys before the cancer as in #2**;
If INSEER = 1;
If FIRSTCA=1;
If NUMCABEF = 0;

**b. take the most recent or last survey before the cancer**;
proc sort data=input_data; **replace input_data with actual dataset name**;
  by mhos_encrypID srvdate srvseq;
run;

data input_data;
  set input_data;
  by mhos_encrypID srvdate srvseq;
  if last.mhos_encrypID;
run;

**4. Keep all surveys after the selected cancer, where the patient did not have any other cancer (only one primary, with SEQ = 00)**;
If INSEER = 1;
If ONLYPRIM=1;
If NUMCABEF = 1 and NUMCAAFT = 0;

**5. Keep the first (earliest) survey after the selected cancer**;
**a. select all surveys where the selected cancer is the most recent cancer before the survey**;
If INSEER = 1;
If MostRectCA=1; *most recent cancer before the survey*;

**b. take the first (earliest) survey after the selected cancer**;
proc sort data=input_data; **replace input_data with actual dataset name**;
  by mhos_encrypID srvdate srvseq;
run;

data input_data;
  set input_data;
  by mhos_encrypID srvdate srvseq;
  if first.mhos_encrypID;
run;

**6. Patient must have lived in SEER area at time of survey**;
If SEERAREA = 1;

**7. Delete duplicate survey records for patients in more than one cohort. This occurs when the same survey**;
**   completed by a patient is used as the follow-up survey in one cohort and the baseline survey in another cohort.**;
proc sort nodupkey data=input_data; **replace input_data with actual dataset name**;
  by mhos_encrypID srvdate;
run;

**8. Identify patients with surveys before and after cancer diagnosis.**;
**   In this example, select patients with no cancer before their first survey,**;
**   and their first cancer must = cancer of interest**;
title2 "SEER patients with no cancer before the first survey and that the first cancer is the cancer of interest";

**cancer specific(ex. Breast cancer) survey level file for cohorts 1-8*;
filename in pipe "gunzip -c mhos.ch1to20.requests.breast.v9x.gz";
proc cimport infile=in data=in;
run;

**cancer of interest is the first cancer**;
data fstca;
  set in;
  where firstca=1;
run;

**survey records before the cancer**;
data before;
  set fstca;
  if numcabef=0;
run;

**to get the most recent(last) survey before the cancer diagnosis**;
proc sort data=before;
  by mhos_encrypID srvdate cohort srvtype;
run;

data mrc_bef;
  set before;
  by mhos_encrypID srvdate cohort srvtype;
  if last.mhos_encrypID;
run;

**survey records after the cancer**;
data after;
  set fstca;
  if 0