Survey Weights

Sample Design and Weights in the CAHPS Survey Data

One of the features of the SEER-CAHPS data resource is the availability of data on Medicare fee-for-service (FFS) and Medicare Advantage (MA) enrollees, with and without Prescription Drug Plans (PDPs). However, because each MA plan must survey a representative sample of its own enrollees, MA enrollees are over-sampled relative to FFS enrollees. To produce estimates that better represent the distribution of FFS and MA enrollees in the Medicare population, SEER-CAHPS provides two weight variables.

Please note that use of the survey weights is not required for SEER-CAHPS analyses.

Below are the weight variables available in SEER-CAHPS:

Weight variable   MA surveys                                      FFS surveys
WGT_SIMPLE        1997-2019, with missing data in 2001 and 2002   1997-2019
WGT_RAKED         1997-2019, with no weight data for 2004         2007-2019

WGT_SIMPLE is a base weight calculated to make the sample representative of the beneficiary populations in the units of the original sample design. For the MA and standalone PDP samples, these units were contracts; for the FFS sample, they were states. Estimates produced with the base weight are therefore representative of the beneficiary population of each contract or state. All survey types have this weight; see the table above for the years covered.
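To make the idea of a base weight concrete, the sketch below computes design weights under a simplifying assumption that the SEER-CAHPS documentation does not state: that beneficiaries were drawn by simple random sampling within each unit (contract or state), so each sampled beneficiary's weight is the inverse of the selection probability, N/n. The function name, unit codes, and population counts are hypothetical; in practice, use the WGT_SIMPLE values supplied with the data rather than recomputing them.

```python
from collections import Counter


def base_weights(sampled_units, population_sizes):
    """Return one base weight per sampled beneficiary (hypothetical helper).

    sampled_units: unit ID (contract or state code) for each sampled beneficiary.
    population_sizes: dict of unit ID -> number of enrollees in that unit.

    Assumes simple random sampling within each unit, so every sampled
    beneficiary in unit u gets weight N_u / n_u.
    """
    n_sampled = Counter(sampled_units)
    return [population_sizes[u] / n_sampled[u] for u in sampled_units]


# Hypothetical example: contract "H0001" has 10,000 enrollees with 4 sampled;
# state "06" has 50,000 FFS enrollees with 2 sampled.
units = ["H0001", "H0001", "H0001", "H0001", "06", "06"]
pop = {"H0001": 10_000, "06": 50_000}
weights = base_weights(units, pop)
print(weights)  # [2500.0, 2500.0, 2500.0, 2500.0, 25000.0, 25000.0]
```

Within each unit the weights sum to that unit's population size, which is what makes weighted estimates representative of each contract or state.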

WGT_RAKED was constructed with a raking procedure (loglinear weights calculated by iterative proportional fitting), which adjusts the respondents' weights so that their weighted distributions match control distributions estimated from the first-round sample using the base weights. In some cases, small cells were collapsed with adjacent cells to avoid extreme weights. Using the raked weight corrects for biases due to differential nonresponse associated with beneficiary characteristics and reduces the effects of random variation in nonresponse. MA and FFS surveys from 2000-2004 do not have this weight because the group calculating the weights was unable to obtain data on nonrespondents from that period. Currently, raked weights are only available for respondents with surveys in 2007 or later.
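The raking step can be illustrated with a minimal iterative proportional fitting sketch in Python. It starts from base weights and repeatedly rescales them so that the weighted totals on each margin match a set of control totals. The function, margin names, respondents, and control totals are all hypothetical; the sketch omits the collapsing of small cells and does not reproduce the variables or control distributions actually used to build WGT_RAKED.

```python
def rake(weights, margins, targets, max_iter=100, tol=1e-9):
    """Rake weights by iterative proportional fitting (IPF).

    weights: starting (base) weights, one per respondent.
    margins: dict of margin name -> list of category labels, one per respondent.
    targets: dict of margin name -> {category: control total}.
    Assumes every category has at least one respondent with positive weight.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for name, labels in margins.items():
            # Current weighted total for each category of this margin.
            totals = {}
            for wi, cat in zip(w, labels):
                totals[cat] = totals.get(cat, 0.0) + wi
            # One multiplicative factor per category brings this margin
            # exactly onto its control totals.
            factors = {cat: targets[name][cat] / t for cat, t in totals.items()}
            w = [wi * factors[cat] for wi, cat in zip(w, labels)]
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
        # Stop once a full pass leaves every margin essentially unchanged.
        if max_change < tol:
            break
    return w


# Hypothetical respondents: base weights plus sex and age-group labels.
sex = ["F", "F", "M", "M"]
age = ["65-74", "75+", "65-74", "75+"]
raked = rake(
    [1.0, 2.0, 3.0, 4.0],
    {"sex": sex, "age": age},
    {"sex": {"F": 60.0, "M": 40.0}, "age": {"65-74": 50.0, "75+": 50.0}},
)
# Weighted totals now match the hypothetical control totals on both margins.
```

Because each pass matches one margin at a time, adjusting a later margin can disturb an earlier one; the loop therefore repeats until no margin needs a scaling factor that differs from 1 by more than the tolerance.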

To specify the sample design when analyzing the data, the following variables are suggested:

  • Primary Sampling Unit: PHIC
  • Strata
    • FFS without a PDP: SA_FIPS_STATE
    • FFS with PDP or standalone PDP: SA_CONTRACT
    • MA: SA_PLAN_ID
  • Weights
    • Surveys from 2007 and later: WGT_RAKED
    • Surveys before 2007: WGT_SIMPLE
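One way to apply these suggestions is to derive the PSU, stratum, and weight for each respondent before estimation. The sketch below assumes a hypothetical record layout (the survey_type codes and the survey_year field are illustrative; only the upper-case names come from SEER-CAHPS) and computes a weighted point estimate. It does not estimate variances: standard errors that account for the PSU and strata require survey software, such as svydesign() in the R survey package or svyset in Stata.

```python
from dataclasses import dataclass
from typing import Optional

# Suggested stratum variable for each survey type. The type codes on the
# left are hypothetical; map them from however survey type is coded in your file.
STRATUM_VARIABLE = {
    "FFS": "SA_FIPS_STATE",    # FFS without a PDP
    "FFS_PDP": "SA_CONTRACT",  # FFS with a PDP
    "PDP": "SA_CONTRACT",      # standalone PDP
    "MA": "SA_PLAN_ID",        # Medicare Advantage
}


@dataclass
class Respondent:
    survey_type: str           # hypothetical field: one of the keys above
    survey_year: int           # hypothetical field: year of the survey
    PHIC: str
    SA_FIPS_STATE: str
    SA_CONTRACT: str
    SA_PLAN_ID: str
    WGT_RAKED: Optional[float]
    WGT_SIMPLE: Optional[float]


def design_variables(r: Respondent):
    """Return (PSU, stratum, weight) following the suggested design."""
    var = STRATUM_VARIABLE[r.survey_type]
    # Tag the stratum with its source variable so a state code and a
    # contract code can never collide in one combined stratum column.
    stratum = (var, getattr(r, var))
    weight = r.WGT_RAKED if r.survey_year >= 2007 else r.WGT_SIMPLE
    return r.PHIC, stratum, weight


def weighted_mean(respondents, outcome):
    """Weighted mean of outcome(r), skipping respondents with no weight."""
    num = den = 0.0
    for r in respondents:
        _, _, w = design_variables(r)
        if w is None:
            continue
        num += w * outcome(r)
        den += w
    return num / den


# Hypothetical records (values are illustrative only).
people = [
    Respondent("MA", 2012, "A", "", "", "PLAN1", 300.0, 250.0),
    Respondent("FFS", 2012, "B", "06", "", "", 900.0, 800.0),
    Respondent("PDP", 2005, "C", "", "S0001", "", None, 400.0),
]
share_ma = weighted_mean(people, lambda r: 1.0 if r.survey_type == "MA" else 0.0)
print(share_ma)  # 0.1875
```

Tagging each stratum with its source variable keeps FFS state codes, contract numbers, and plan IDs distinct when they are stored in a single stratum column, while FFS-with-PDP and standalone PDP respondents in the same contract still share a stratum.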

Additional information can be found in the yearly Medicare CAHPS reports related to weighting:

The following text may be used in describing the weights briefly in manuscripts:

“Data were weighted to represent the enrolled population of each state (for FFS) or contract (for MA and PDP). For respondents in 2007 and later, weights were generated by applying a raking procedure (loglinear weights by iterative proportional fitting) to respondents to match weighted sample distributions within each contract (or state, for FFS beneficiaries) of gender, age, race/ethnicity, Medicaid and low-income supplement eligibility, Special Needs Plan status, PD enrollment, and zip-code-level distributions of income, education, and race/ethnicity.

“For respondents prior to 2007, weights were generated to produce estimates that are representative of the beneficiary populations in the units of the original design. For the MA and standalone PDP sample, these units were contracts; for the FFS sample, these were states.”

Last Updated: 15 Jun, 2022