SEER-Medicare: Medicare Claims Files

For Medicare beneficiaries with fee-for-service coverage, claims data are available for both the cancer and non-cancer cohorts from 1991 – 2016 (see Summary Table of Available Data). Beginning in October 2015, the coding system for diagnosis codes and procedure codes was switched from ICD-9 to ICD-10. Two new variables, one for diagnosis code version and the other for procedure code version, were added to 2015 and later claims to indicate whether the ICD-9 or ICD-10 coding system was used.

The Medicare files provided as part of SEER-Medicare are described below and reflect input from staff at NCI and CMS.

Medicare Provider Analysis and Review (MEDPAR)

The MEDPAR file includes all Part A short stay, long stay, and skilled nursing facility (SNF) bills for each calendar year. MEDPAR contains one summarized record per admission. Each record includes up to 25 ICD-9 diagnoses and 25 ICD-9 procedures provided during the hospitalization. MEDPAR files are finalized 3 years following the close of the calendar year.

Researchers interested in only short-stay hospitalizations will need to subset the MEDPAR file using the variable 'MEDPAR short stay/long stay/skilled nursing facility (SNF) indicator code' located in column 48 ('S' = short stay, 'L' = long stay and 'N' = skilled nursing stay).
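The subsetting step described above can be sketched in Python as follows. This is a minimal illustration, not an official extraction program: it assumes the MEDPAR file is read as fixed-width text lines and that "column 48" is 1-indexed; the helper names `medpar_stay_type` and `short_stays` and the synthetic records are hypothetical.

```python
def medpar_stay_type(line: str, column: int = 48) -> str:
    """Return the stay-type indicator from one fixed-width MEDPAR record.

    `column` is assumed to be 1-indexed, as in the file documentation.
    """
    return line[column - 1 : column]


def short_stays(lines):
    """Yield only the records flagged 'S' (short stay)."""
    for line in lines:
        if medpar_stay_type(line) == "S":
            yield line


# Synthetic records: 47 filler characters, then the indicator in column 48.
records = ["x" * 47 + code + "rest" for code in ("S", "L", "N", "S")]
kept = list(short_stays(records))
```

Replacing "S" with "L" or "N" in the comparison would select long stays or skilled nursing stays instead.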

In almost all cases, a single MEDPAR record reflects a summary of all care provided during an institutional stay. However, if the stay is long, there may be more than one claim per stay. This occurs most frequently for stays in SNFs, as these often span several months. SNF records often have no discharge date because persons remain in institutions beyond the period of Medicare coverage.

Several fields on the MEDPAR file are not considered reliable:

  • source of admission;
  • discharge destination; and
  • group health organization payment code.

View or download MEDPAR documentation. (PDF, 136 KB)

Carrier Claims (old file name Physician/Supplier Part B (NCH))

Since 1991, the Centers for Medicare & Medicaid Services (CMS) has collected physician/supplier (Part B) bills for 100 percent of all claims. These bills, known as the National Claims History (NCH) records, are largely from physicians, although the file also includes claims from other non-institutional providers such as physician assistants, clinical social workers, nurse practitioners, independent clinical laboratories, ambulance providers, and stand-alone ambulatory surgical centers. The claims are processed by carriers working under contract to CMS. Each carrier claim must include a Healthcare Common Procedure Coding System (HCPCS) code to describe the nature of the billed service. The HCPCS is composed primarily of CPT-4 codes developed by the American Medical Association, with additional codes specific to CMS. Each HCPCS code on the carrier bill must be accompanied by an ICD-9 or ICD-10 diagnosis code, providing a reason for the service. In addition, each bill has fields for the dates of service, reimbursement amount, encrypted provider numbers (e.g., UPIN), and beneficiary demographic data.

Because of the large number of carrier claims, CMS maintains the data in variable length files. IMS, NCI's programming contractor, has converted these records into fixed length files by creating a record for each service that appears as a trailer on the CMS record. As a result, there may be multiple records for the same date of service. For example, if a person saw his/her doctor for an office visit and had a chest x-ray as part of the visit, there would be two different records for the same date of service, one for the office visit and the other for the chest x-ray. The variable claim_id was created to index unique claims. The variable rec_count is a counter that enumerates each record associated with a claim, where rec_count = 1 is also the first HCPCS in the first segment of a claim. The file is sorted by patient_id, year, claim_id, and rec_count.
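As a rough sketch of how the fixed-length records relate back to claims, the Python below regroups service-level records using the sort order given above. The field names patient_id, year, claim_id, and rec_count come from the text; the `hcpcs` field name, the dictionary layout, and the sample values are illustrative assumptions, as is treating claim_id as unique only within patient and year.

```python
from itertools import groupby
from operator import itemgetter

# Synthetic records: one office visit with a chest x-ray on the same claim,
# plus a second claim. The "hcpcs" field name is hypothetical.
records = [
    {"patient_id": "A1", "year": 2010, "claim_id": 2, "rec_count": 1, "hcpcs": "99214"},
    {"patient_id": "A1", "year": 2010, "claim_id": 1, "rec_count": 2, "hcpcs": "71020"},
    {"patient_id": "A1", "year": 2010, "claim_id": 1, "rec_count": 1, "hcpcs": "99213"},
]

# Sort order described for the carrier file.
sort_key = itemgetter("patient_id", "year", "claim_id", "rec_count")
# One group per claim (assumed key: patient, year, claim_id).
claim_key = itemgetter("patient_id", "year", "claim_id")

# Map each claim to its billed services, in rec_count order.
claims = {
    key: [r["hcpcs"] for r in group]
    for key, group in groupby(sorted(records, key=sort_key), key=claim_key)
}
```

Sorting before `groupby` matters because `groupby` only merges adjacent records with the same key.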

Carrier Claims Details:

  • Carrier claims are non-institutional claims; however, this does not mean that they are outpatient claims. Providers, such as physicians, can bill for services provided in the office, hospital, or other sites. To identify where a service was provided, use the variable "line place of service" (columns 422 – 423).
  • There are two pairs of date fields on the carrier file. The "claim from" and "claim through" dates (cols 18 and 26) cover a period of service (usually but not always a single date of service), while the "line first expense date" and "line last expense date" (cols 426 and 434) reflect the specific day of service.
  • For every billed procedure (using a HCPCS code), there should be a corresponding ICD-9 or ICD-10 diagnosis code (often called the line item diagnosis) that provides the reason for the billed service. In the case of lab tests, the diagnosis will often be XXOOO because the outside lab has no information from the physician about the reason for the test. In addition, the carrier claim contains space for 12 diagnoses, known as "header" diagnoses. These are not necessarily linked with any of the procedures but may reflect co-existing health conditions. The accuracy of the diagnoses on the carrier data has not been determined.
  • Selected services may not appear in the carrier claims, even if they have been provided. For example, CMS pays physicians a fixed amount for surgeries, a payment practice known as bundling. As part of bundling, CMS expects that certain care will be included in the payment amount, such as the first one or two office visits following surgery or a biopsy just before surgery. Bundled services will not appear in the physician data. Interpretation of the rules on bundling varies by carrier.
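The column positions cited in the list above can be used to pull fields from a fixed-width carrier record. The sketch below is hedged: the start columns are taken from the text, but the 1-indexed interpretation, the 8-character date width, the YYYYMMDD date format, and the names `FIELDS`, `field`, `parse_date`, and `build` are assumptions made for illustration. Check the Carrier Claims documentation for the actual layout of a given linkage.

```python
from datetime import date, datetime

# (start column, width), 1-indexed. Start columns come from the text;
# the 8-character date width is an assumption.
FIELDS = {
    "claim_from": (18, 8),
    "claim_thru": (26, 8),
    "place_of_service": (422, 2),
    "line_first_expense": (426, 8),
    "line_last_expense": (434, 8),
}


def field(line: str, name: str) -> str:
    """Slice one named field out of a fixed-width record."""
    start, width = FIELDS[name]
    return line[start - 1 : start - 1 + width]


def parse_date(text: str, fmt: str = "%Y%m%d") -> date:
    """Parse a date field; the YYYYMMDD format is assumed, not documented here."""
    return datetime.strptime(text, fmt).date()


def build(values: dict, length: int = 441) -> str:
    """Build a synthetic record for testing (hypothetical helper)."""
    buf = [" "] * length
    for name, value in values.items():
        start, width = FIELDS[name]
        buf[start - 1 : start - 1 + width] = value.ljust(width)
    return "".join(buf)


line = build({
    "claim_from": "20100315",
    "claim_thru": "20100315",
    "place_of_service": "11",  # illustrative value
    "line_first_expense": "20100315",
    "line_last_expense": "20100315",
})
pos = field(line, "place_of_service")
service_day = parse_date(field(line, "line_first_expense"))
```

Keeping the layout in one table (`FIELDS`) makes it easier to adjust positions when variable order changes between linkages.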

View or download Carrier Claims documentation. (PDF, 76 KB)

Outpatient Claims

The outpatient file contains 100 percent of Part B claims from institutional outpatient providers for each calendar year. Examples of institutional outpatient providers include hospital outpatient departments, rural health clinics, renal dialysis facilities, outpatient rehabilitation facilities, comprehensive outpatient rehabilitation facilities, and community mental health centers. In-and-out surgeries performed in a hospital will be in the hospital outpatient file, while bills for surgeries performed in freestanding surgical centers appear in the carrier claims, not in the outpatient file.

Some of the information contained in this outpatient file includes diagnosis and procedure codes, dates of service, reimbursement amounts, facility provider number, revenue center codes and beneficiary demographic information. Although the outpatient file contains data fields for ICD-9 or ICD-10 procedure codes, the reporting of these codes has been sporadic since 2000 when CPT/HCPCS codes replaced ICD-9 procedure codes as the basis of billing procedures here. Since 2000, services from the outpatient bill have been captured from CPT/HCPCS codes and from the revenue centers. Definitions for revenue center codes may be obtained by contacting ResDAC or CMS directly.

As with the carrier data, there may be multiple records for the same date of service. Additionally, data related to each revenue center on a claim are written to a separate record. The variable claim_id was created to index unique claims. The variable rec_count is a counter that enumerates each record associated with a claim, where rec_count = 1 is also the first revenue center in the first segment of a claim. Payment amount specific to a revenue center is available beginning in 1998. The claim total charge amount and payment amount are repeated on every record that originated from the same claim. It is important, therefore, that the file is sorted by patient_id, year, claim_id, and rec_count, and that the first record in the sort (if first.claim_id;) is kept in order to eliminate duplicate charge and payment amounts from the file. Keeping the first record is also necessary to extract total charge and payment amounts, as CMS inserted "0" in these fields on records with seg_num > 1.
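The "keep the first record" step, written in SAS as `if first.claim_id;`, could look like the following in Python. This is a sketch under stated assumptions: records are held as dictionaries, the `claim_payment` field name is hypothetical, and claims are identified by patient_id, year, and claim_id together.

```python
def first_record_per_claim(records):
    """Return the first record of each claim after sorting by
    patient_id, year, claim_id, and rec_count."""
    ordered = sorted(
        records,
        key=lambda r: (r["patient_id"], r["year"], r["claim_id"], r["rec_count"]),
    )
    seen = set()
    first = []
    for r in ordered:
        key = (r["patient_id"], r["year"], r["claim_id"])
        if key not in seen:  # first record encountered for this claim
            seen.add(key)
            first.append(r)
    return first


# Synthetic data: the claim-level payment repeats on every record of claim 1.
records = [
    {"patient_id": "A1", "year": 2005, "claim_id": 1, "rec_count": 2, "claim_payment": 120.0},
    {"patient_id": "A1", "year": 2005, "claim_id": 1, "rec_count": 1, "claim_payment": 120.0},
    {"patient_id": "A1", "year": 2005, "claim_id": 1, "rec_count": 3, "claim_payment": 120.0},
    {"patient_id": "A1", "year": 2005, "claim_id": 2, "rec_count": 1, "claim_payment": 80.0},
]

naive_total = sum(r["claim_payment"] for r in records)  # double-counts claim 1
total = sum(r["claim_payment"] for r in first_record_per_claim(records))
```

Summing over all records inflates the total because the claim-level amount is counted once per revenue center record; summing over first records counts each claim once.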

View or download Outpatient Claims documentation. (PDF, 86 KB)

Home Health Agency (HHA)

The Home Health Agency file contains 100 percent of all claims for home health services. Some of the information contained in this file includes the number of visits, type of visit (skilled-nursing care, home health aides, physical therapy, speech therapy, occupational therapy, and medical social services), diagnosis (ICD-9 or ICD-10 diagnosis), the dates of visits, reimbursement amount, HHA provider number, and beneficiary demographic information. An HHA bill may cover services provided over a period of time, not a single day. Because the claim total payment amount is repeated on every record associated with a claim, sort the file on patient_id, year, claim_id, and rec_count, and keep the first record in the sort (if first.claim_id;).

Researchers using the Home Health Agency data need to be aware of changes over time in how CMS codes HHA services. Prior to January 1998, all HHA visits associated with a particular type of service were entered once for each month. The Revenue Center Unit Count on the claim captured the number of visits in that month. Beginning with May 1998, CMS entered each visit separately, meaning that when the variable-length record is rewritten to a fixed-length file, there will be a separate record for each HHA visit. Claims from February through April of 1998 were coded under either the old or the new method during these transitional months, with April showing the biggest change. In addition, in July 1999, there was a change in what is captured in the Revenue Center Unit Count field. This field now captures the number of fifteen (15) minute segments of time spent on each visit. However, it is not thought to be reliable until October 1999.

Based on these observations, we recommend that the number of visits for a specific service (revenue center code) be counted in the following way:

  • For data through April 1998, the number of visits is the sum of the Revenue Center Unit Counts.
  • For data beginning in May 1998, the number of visits is the number of fixed-length records with that revenue center code.
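The two counting rules above can be combined into one function. The sketch below is illustrative only: the field names `revenue_center`, `service_date`, and `unit_count`, the revenue center code values, and the choice of the service date as the date that decides which rule applies are all assumptions.

```python
from datetime import date

# First day on which each visit is assumed to be its own record.
CUTOVER = date(1998, 5, 1)


def count_visits(records, revenue_center):
    """Count HHA visits for one revenue center code.

    Through April 1998: sum the Revenue Center Unit Counts.
    From May 1998 on: count one visit per fixed-length record.
    """
    total = 0
    for r in records:
        if r["revenue_center"] != revenue_center:
            continue
        if r["service_date"] < CUTOVER:
            total += r["unit_count"]
        else:
            total += 1
    return total


# Synthetic records; codes and values are made up for illustration.
records = [
    {"revenue_center": "0550", "service_date": date(1998, 3, 1), "unit_count": 8},
    {"revenue_center": "0550", "service_date": date(1998, 6, 2), "unit_count": 1},
    {"revenue_center": "0550", "service_date": date(1998, 6, 9), "unit_count": 1},
    # After July 1999 the unit count holds 15-minute segments, so it is ignored.
    {"revenue_center": "0550", "service_date": date(1999, 11, 4), "unit_count": 4},
    {"revenue_center": "0420", "service_date": date(1998, 6, 2), "unit_count": 1},
]
visits = count_visits(records, "0550")
```

Counting records rather than unit counts after the cutover also avoids misreading the fifteen-minute segment counts introduced in July 1999 as visits.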

View or download documentation for the HHA file. (PDF, 78 KB)

Note: For the HHA, Hospice and Outpatient claims records, if more than one record from the same claim (sorted by patient_id, year, claim_id, and rec_count) is selected, be sure to keep the claim payment amount from the first record only.


Hospice

The Hospice file contains claims data submitted by Hospice providers. Some of the information contained in this file includes the level of hospice care received (e.g., routine home care, inpatient respite care), terminal diagnosis (ICD-9 or ICD-10 diagnosis), the dates of service, reimbursement amount, Hospice provider number, and beneficiary demographic information.

View or download documentation for the Hospice file. (PDF, 76 KB)

Durable Medical Equipment (DME)

The Durable Medical Equipment (DME) file contains final action claims data submitted to Durable Medical Equipment Regional Carriers (DMERCs). Some of the information contained in this file includes diagnosis (ICD-9 or ICD-10 diagnosis), services provided (HCFA Common Procedure Coding System (HCPCS) codes), dates of service, reimbursement amount, DME provider number, and beneficiary demographic information. Claims for DME services that are processed by a carrier will be found in the NCH file. Claims for DME services that are processed by DMERCs will be found in the DME file. For example, claims for oral equivalents of IV chemotherapies will be found in the DME file.

View or download documentation for the DME file. (PDF, 70 KB)

Medicare Part D Data

Since July 2006, when Medicare coverage was expanded to include prescription drugs under Medicare Part D, approximately 60% of Medicare beneficiaries have enrolled in Part D, either because they have opted to pay the Part D premium out of pocket or because their premiums are paid for them, as is the case for low-income persons receiving medical assistance from their state. The Part D Enrollment file (PDF, 50 KB) includes variables for each year beginning with 2007 to indicate which Medicare beneficiaries are enrolled in Part D and the dates of coverage. Information about drug utilization is obtained from the Prescription Drug Event file (PDE) (PDF, 37 KB).

Chronic Condition Flags

The Chronic Condition Flags file includes yearly, mid-year, and ever flags to indicate the presence or absence of 27 conditions, based on Medicare services provided beginning in 1999. This file is analogous to the CMS Chronic Conditions Data Warehouse (CCW) Chronic Condition segment of the Master Beneficiary Summary File (MBSF). As a proxy of evidence for the presence of a condition, these flags are determined based on the presence of treatment for the conditions using claims-based algorithms that were created by CMS. Because the flags are determined using claims data, it is not possible to ascertain the information for beneficiaries enrolled in managed care/HMOs. This limitation also applies to newly-eligible Medicare beneficiaries who may have only a partial year of FFS coverage. Thus, in order for the flag to indicate the presence of a condition, the claims for the beneficiary must indicate treatment for that condition and the beneficiary must also have had continuous fee-for-service (FFS), Part A and B coverage during the specified time period. It is important to note that the major objective for creating the flags was to allow for a quick, initial identification and extraction of beneficiaries with a given condition from the larger Medicare population. The flag definitions were intended to be broad, so that researchers could extract the data based on the flag definitions and then refine their specifications as needed for their specific analyses. The condition definitions were not intended to calculate population statistics.

View or download Chronic Condition Flags documentation. (PDF, 207 KB)

Changes in the 2018 Linkage

Many variable names in the claims files for the 2018 SEER/Medicare Linkage have changed. The order of variables in these files has also changed from previous linkages. This affects the DME, Home Health Agency (HHA), Hospice, Outpatient, MEDPAR, and Carrier (NCH) claims files. These changes were made to align the SEER/Medicare data more closely with the file descriptions available from CMS. For a list of variables that have new names, see the Variable Name Changes (XLSX, 25 KB) spreadsheet. The spreadsheet includes a separate sheet for each claim type. Each sheet lists the new SAS variable name, the variable label description, the variable number from the CMS version K documentation, the old SAS variable name (SEER/Medicare previous linkages), and the output length of the variable. Variables with the same name in the 2018 linkage as in previous linkages are not listed. Variables deleted from and new variables added to the claims files are listed in the Data File Changes in the 2018 Linkage (PDF, 202 KB) document.


Appendix

The appendix file contains information on variables found in the claims data, such as BIC, revenue center code, and state.

View or download the appendix file. (PDF, 706 KB)
