United States Department of Veterans Affairs

VA Information Resource Center

Data Quality: Race and Ethnicity Information in Medical SAS Datasets

Variables discussed on this page:

Variable Name   MedSAS Dataset                Years Available
RACE            Inpatient (PTF Main File)     FY1970 - present
                Outpatient (Visit File)       FY1997 - present
                Outpatient (Event File)       FY1998 - present
RACE1-RACE6     Inpatient (PTF Main)          FY2003 - present
RACE1-RACE7     Outpatient (Visit, Event)     FY2004 - present
ETHNIC          Inpatient (PTF Main)          FY2003 - present
                Outpatient (Visit, Event)     FY2004 - present

Data quality issues in VHA race and ethnicity data include missing values and inconsistency over time, the latter caused by changes in the data collection method and in the allowable response categories and formats.

Responding to the revision of OMB Directive 15, the VHA adopted a new standard for race and ethnicity data collection in FY2003 (VHA Directive 2003-027). Under this policy, self-identification of race and ethnicity is preferred, multiple race reporting is allowed, and ethnicity data must be collected separately from race.

Prior to FY2003, race and ethnicity information was most frequently extracted from clinical documentation and/or recorded from observation by administrative staff. The databases held just one race field and no indicator of the method of data collection. Since FY2003, self-identification of race and ethnicity has been the preferred method of data collection. Up to seven races can be recorded at a time (RACE1-RACE7 in the outpatient files; RACE1-RACE6 in the inpatient file), and there is a separate field for ethnicity. In addition, the race categories have changed, and the values of the race variables are coded to incorporate the means of data collection (e.g., white-observer, white-proxy, white-self-identification).
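
As an illustration of working with values that combine a race category and a collection method, the following Python sketch splits each recorded value into its two parts and gathers the non-missing entries across the RACE1-RACE7 fields. It is a minimal sketch under stated assumptions: the text labels (such as "white-observer") are placeholders taken from the examples above, not the actual codes stored in the Medical SAS Datasets, and the function names are hypothetical.

```python
from typing import Optional

# Assumed collection-method suffixes, based only on the examples in the text.
# The real datasets use their own coded values; consult the data documentation.
METHODS = ("self-identification", "observer", "proxy")


def split_race_value(label: str) -> tuple[str, str]:
    """Split a placeholder label like 'white-proxy' into (race, method)."""
    for method in METHODS:
        suffix = "-" + method
        if label.endswith(suffix):
            return label[: -len(suffix)], method
    raise ValueError(f"unrecognized race value: {label!r}")


def recorded_races(record: dict[str, Optional[str]],
                   n_fields: int = 7) -> list[tuple[str, str]]:
    """Collect (race, method) pairs from RACE1..RACE<n_fields>, skipping blanks."""
    pairs = []
    for i in range(1, n_fields + 1):
        value = record.get(f"RACE{i}")
        if value is None or not value.strip():
            continue  # treat None and blank strings as missing
        pairs.append(split_race_value(value.strip()))
    return pairs


# Hypothetical outpatient record with one self-identified race recorded.
example = {"RACE1": "white-self-identification", "RACE2": None}
print(recorded_races(example))  # [('white', 'self-identification')]
```

For the inpatient file, which carries RACE1-RACE6, the same function could be called with n_fields=6.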

Missing values on race variables are not uncommon, and their frequency has increased since the new standard was implemented (Sohn, et al., 2006). Researchers are advised to supplement race data in the Medical SAS Datasets from FY2003 onward with race data from VA datasets prior to FY2003 and/or from Medicare data. VIReC is currently investigating methods to fill in missing race information, including backfilling race and ethnicity values from prior years and obtaining race values from other data, such as Medicare. For more information on supplementing race data, please see the VIReC Data Issues Brief for March 2004.
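
The supplementation order described above (current value first, then the most recent prior-year VA value, then Medicare) can be sketched in Python as follows. This is only an illustrative sketch, not VIReC's method: the function name, the input layout, and the source labels are hypothetical, and because the race categories changed in FY2003, a real implementation would also need a crosswalk between the old and new categories.

```python
from typing import Optional


def _is_missing(value: Optional[str]) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or value.strip() == ""


def fill_missing_race(
    current: Optional[str],
    prior_years: list[tuple[int, Optional[str]]],
    medicare: Optional[str],
) -> tuple[Optional[str], str]:
    """Return (race value, source label) for one patient.

    current     : race value in the FY2003-or-later record (may be missing)
    prior_years : (fiscal year, race value) pairs from pre-FY2003 VA data
    medicare    : race value from Medicare data (may be missing)
    """
    if not _is_missing(current):
        return current, "current"
    # Backfill from the most recent prior year that has a value.
    for year, value in sorted(prior_years, key=lambda p: p[0], reverse=True):
        if not _is_missing(value):
            return value, f"backfill_FY{year}"
    # Fall back to Medicare only when VA data has nothing.
    if not _is_missing(medicare):
        return medicare, "medicare"
    return None, "missing"
```

Returning the source label alongside the value keeps track of where each filled-in value came from, so analyses can be repeated with and without the supplemented records.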

Sohn, M.W., Zhang, H., Arnold, N., Stroupe, K., Taylor, B.C., Wilt, T.J., Hynes, D.M. (2006). Transition to the new race/ethnicity data collection standards in the Department of Veterans Affairs. Population Health Metrics, 4(7). Available at: http://www.pophealthmetrics.com/content/4/1/7.