This study developed and evaluated a method for ascertaining newly diagnosed breast cancer cases using multiple sources of data from the Medicare claims system. Predictors of an incident case were operationally defined as codes for breast cancer-related diagnoses and procedures from hospital inpatient, hospital outpatient, and physician claims. The optimal combination of predictors was then determined from a logistic regression model using 1992 data from the linked SEER registries-Medicare claims database and a sample of noncancer controls drawn from the SEER areas. While the ROC curve demonstrates that the model can produce levels of sensitivity and specificity above 90%, the positive predictive value is comparatively low (67-70%). This low predictive value is largely the result of the model's limited ability to distinguish recurrent and secondary malignancies from incident cases, and possibly of the model's identifying true incident cases not identified by SEER. Nevertheless, the logistic regression approach is a useful method for ascertaining incident cases because it allows the performance characteristics to be changed by selecting different cut-points depending on the application (e.g., high sensitivity for registry validation, high specificity for outcomes research). It also allows us to make specific adjustments to population-based estimates of breast cancer incidence derived from claims. Copyright © 2000 Elsevier Science Inc.
Keywords
- Breast neoplasms
- Sensitivity and specificity
ASJC Scopus subject areas
- Public Health, Environmental and Occupational Health