Survey Weights
Sample Design and Weights in the CAHPS Survey Data
One feature of the SEER-CAHPS data resource is the availability of data on Medicare fee-for-service (FFS) and Medicare Advantage (MA) enrollees, with and without Prescription Drug Plans (PDPs). Because each MA plan must survey a representative sample of its own enrollees, MA enrollees are over-sampled relative to FFS enrollees. To produce estimates that better represent the distribution of FFS and MA enrollees in the Medicare population, SEER-CAHPS provides two different weight variables.
Please note that use of the survey weights is not required for SEER-CAHPS analyses.
Below are the weight variables available in SEER-CAHPS:
Weight variable | MA surveys | FFS surveys |
---|---|---|
WGT_SIMPLE | 1997-2019, with missing data in 2001 and 2002 | 1997-2019 |
WGT_RAKED | 1997-2019, with no weight data for 2004 | 2007-2019 |
WGT_SIMPLE is a base weight that makes the sample representative of the beneficiary populations in the units of the original sampling design: contracts for the MA and standalone PDP samples, and states for the FFS sample. All years and survey types have this type of weight, and analyses that use it produce estimates representative of the beneficiary population within each of those units.
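To illustrate what a base weight does, the following Python sketch assumes the simplest case: respondents are drawn by simple random sampling within each design unit, so each respondent's weight is the unit's population size divided by the number of respondents sampled from it. The contract ID `H0001`, the state code `CA`, and the population counts are hypothetical, and the actual WGT_SIMPLE construction may involve adjustments not shown here.

```python
from collections import Counter


def base_weights(units, population_sizes):
    """Give each respondent the weight N_h / n_h of its sampling unit h.

    units: the design unit (contract or state) of each sampled respondent.
    population_sizes: enrolled beneficiaries per unit (hypothetical here).
    Assumes simple random sampling within each unit.
    """
    sample_sizes = Counter(units)
    return [population_sizes[u] / sample_sizes[u] for u in units]


# Hypothetical data: a small MA contract sampled heavily, a large FFS state lightly.
units = ["H0001"] * 4 + ["CA"] * 2
population = {"H0001": 400, "CA": 1000}
weights = base_weights(units, population)
print(weights)  # [100.0, 100.0, 100.0, 100.0, 500.0, 500.0]

# Share of MA enrollees: unweighted sample share vs. weighted population share.
is_ma = [u == "H0001" for u in units]
unweighted_share = sum(is_ma) / len(is_ma)
weighted_share = sum(w for w, m in zip(weights, is_ma) if m) / sum(weights)
print(round(unweighted_share, 3), round(weighted_share, 3))  # 0.667 0.286
```

The weights sum to the combined population (1,400), and the MA share falls from two-thirds of the sample to about 29% of the weighted population, which is the kind of correction for MA over-sampling described above.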
WGT_RAKED was constructed using a raking procedure (loglinear weights calculated by iterative proportional fitting) that adjusts the respondents' weights to match control distributions estimated from the first-round sample with base weights. In some cases, small cells were collapsed with adjacent cells to avoid extreme weights. MA and FFS surveys from 2000-2004 do not have this type of weight because the group calculating the weights could not obtain data on non-respondents from that period. Using the raked weight allows analyses to correct for biases arising from differential nonresponse associated with beneficiary characteristics and reduces the effects of random variation in nonresponse. Currently, raked weights are available only for respondents with surveys in 2007 or later.
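The raking step can be sketched in Python as plain iterative proportional fitting: each pass rescales the weights so that the weighted total in every category of one control variable matches its target, cycling through the variables until the adjustments become negligible. The control variables (`gender`, `age`), their categories, the targets, and the starting weights below are hypothetical; the actual WGT_RAKED procedure used many more controls and collapsed small cells, which this sketch does not attempt.

```python
def weighted_totals(weights, categories):
    """Sum the weights within each category."""
    totals = {}
    for w, c in zip(weights, categories):
        totals[c] = totals.get(c, 0.0) + w
    return totals


def rake(weights, margins, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    margins: variable name -> list with each respondent's category.
    targets: variable name -> {category: target weighted total}.
    Stops when every scaling factor in a full pass is within tol of 1.
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, cats in margins.items():
            totals = weighted_totals(w, cats)
            # Rescale each respondent so this variable's totals hit the targets.
            for i, c in enumerate(cats):
                factor = targets[var][c] / totals[c]
                max_change = max(max_change, abs(factor - 1.0))
                w[i] *= factor
        if max_change < tol:
            break
    return w


# Hypothetical respondents, starting (base) weights, and control totals.
base = [10.0, 20.0, 30.0, 40.0]
margins = {
    "gender": ["F", "F", "M", "M"],
    "age": ["65-74", "75+", "65-74", "75+"],
}
targets = {
    "gender": {"F": 60.0, "M": 40.0},
    "age": {"65-74": 30.0, "75+": 70.0},
}
raked = rake(base, margins, targets)
for var, cats in margins.items():
    print(var, {c: round(t, 6) for c, t in weighted_totals(raked, cats).items()})
# gender {'F': 60.0, 'M': 40.0}
# age {'65-74': 30.0, '75+': 70.0}
```

Each individual weight changes, but after raking the weighted margins of every control variable match their targets simultaneously.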
Both sets of SEER-CAHPS weights described above have been calibrated to the survey populations and sub-populations. The weights take nonresponse and strata characteristics into account. The calculation algorithm ensures that variance estimates for survey responses within a subset of the data are preserved* regardless of the size and characteristics of the dataset to which that subset belongs. Thus, no further calculation of survey weights is necessary. Further, the primary sampling units (PSUs) and strata that correspond to those weights are included in the SEER-CAHPS data linkage; no additional information regarding population or subpopulation size is required. The variables used to specify PSUs, strata, and weights are listed below.
*Note: small differences in variance estimates may be observed when calculations are performed on different software platforms (e.g., SAS vs. SUDAAN vs. Stata vs. R). These differences are typically too small to meaningfully affect analytic results. However, because default options for calculating variance estimates differ across platforms, researchers who need exact consistency across statistical packages should review the variance-estimation options in each package being used; setting those options consistently may resolve these analytically minor differences.
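To show how strata, PSUs, and weights enter a variance estimate, here is a minimal Python sketch of the Taylor-linearization standard error of a weighted mean, assuming PSUs sampled with replacement within strata and no finite population correction. It is not the algorithm of any particular package; how single-PSU strata are treated is one example of the options that can make results differ slightly between platforms. All data values are hypothetical.

```python
import math
from collections import defaultdict


def weighted_mean_se(y, w, strata, psu):
    """Weighted mean and Taylor-linearization standard error.

    Assumes PSUs sampled with replacement within strata, no finite population
    correction, and that strata with a single PSU contribute no variance
    (one of several conventions; software packages differ on this).
    """
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized values of the ratio sum(w*y) / sum(w).
    z = [wi * (yi - mean) / total_w for wi, yi in zip(w, y)]
    # Total the linearized values within each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for zi, h, j in zip(z, strata, psu):
        psu_totals[h][j] += zi
    var = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # single-PSU stratum: skipped in this sketch
        vals = list(totals.values())
        zbar = sum(vals) / n_h
        var += n_h / (n_h - 1) * sum((v - zbar) ** 2 for v in vals)
    return mean, math.sqrt(var)


# Hypothetical respondents: binary outcome, base weights, strata, and PSUs.
# Each respondent is its own PSU, as when patient_ID is the PSU.
y = [1, 0, 1, 1, 0, 0]
w = [100.0, 100.0, 100.0, 100.0, 500.0, 500.0]
strata = ["H0001"] * 4 + ["CA"] * 2
psu = ["p1", "p2", "p3", "p4", "p5", "p6"]
mean, se = weighted_mean_se(y, w, strata, psu)
print(round(mean, 4), round(se, 4))  # 0.2143 0.0714
```

In this example only the `H0001` stratum contributes variance, because both `CA` respondents have identical linearized values.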
To specify the sample design when analyzing the data, the following variables are suggested:
- Primary Sampling Unit: patient_ID
- Strata
  - FFS without a PDP: SA_FIPS_STATE
  - FFS with PDP or standalone PDP: SA_CONTRACT
  - MA: SA_PLAN_ID
- Weights
  - Surveys from 2007 and later: WGT_RAKED
  - Surveys before 2007: WGT_SIMPLE
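The suggested design variables can be encoded as a small lookup, sketched here in Python. The `survey_type` codes (`"FFS"`, `"MA"`, `"PDP"`) and the `has_pdp` flag are hypothetical inputs, not SEER-CAHPS variable names; only the returned column names come from the list above.

```python
def design_variables(survey_type: str, has_pdp: bool, year: int) -> dict:
    """Return the PSU, stratum, and weight column names for one survey record.

    survey_type: hypothetical code, one of "FFS", "MA", or "PDP" (standalone PDP).
    has_pdp: whether an FFS respondent also has a PDP (ignored for MA and PDP).
    year: survey year.
    """
    if survey_type == "MA":
        stratum = "SA_PLAN_ID"
    elif survey_type == "PDP" or (survey_type == "FFS" and has_pdp):
        stratum = "SA_CONTRACT"
    elif survey_type == "FFS":
        stratum = "SA_FIPS_STATE"
    else:
        raise ValueError(f"unknown survey type: {survey_type!r}")
    # Raked weights for 2007 and later; base weights before 2007.
    weight = "WGT_RAKED" if year >= 2007 else "WGT_SIMPLE"
    return {"psu": "patient_ID", "stratum": stratum, "weight": weight}


print(design_variables("FFS", False, 2005))
# {'psu': 'patient_ID', 'stratum': 'SA_FIPS_STATE', 'weight': 'WGT_SIMPLE'}
print(design_variables("MA", True, 2012))
# {'psu': 'patient_ID', 'stratum': 'SA_PLAN_ID', 'weight': 'WGT_RAKED'}
```

When records of different survey types are analyzed together, the stratum values come from different columns, so it may help to build a single stratum key (for example, by prefixing each value with its column name) so that codes from different columns cannot collide.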
Additional information on weighting can be found in the yearly Medicare CAHPS reports:
- Medicare Advantage and Prescription Drug Plan CAHPS® Survey
- MA & PDP CAHPS Individual-Level Weight Construction (PDF)
The following text may be used in describing the weights briefly in manuscripts:
“Data were weighted to represent the enrolled population of each state (for FFS) or contract (for MA and PDP). For respondents in 2007 and later, weights were generated by applying a raking procedure (loglinear weights by iterative proportional fitting) to respondents to match weighted sample distributions within each contract (or state, for FFS beneficiaries) of gender, age, race/ethnicity, Medicaid and low-income supplement eligibility, Special Needs Plan status, prescription drug (PD) enrollment, and zip-code-level distributions of income, education, and race/ethnicity.
“For respondents prior to 2007, weights were generated to produce estimates that are representative of the beneficiary populations in the units of the original design. For the MA and standalone PDP sample, these units were contracts; for the FFS sample, these were states.”