This task reviews how to recode variables so that they are appropriate for your analytic needs and how to check your derived variables.
Recoding is an important step for preparing an analytical dataset. In this step, you will view programs that recode variables using different techniques for each of the scenarios listed on the Clean & Recode Data: Key Concepts about Recoding Variables in NHANES page. In the summary table below, each statement required for recoding is listed on the left with explanations on the right.
Statements | Explanation |
---|---|
data demo_BP3; set demo_BP2b; | Use the data and set statements to refer to your analytic dataset. |
if ridreth1=3 then raceth=1; /*Non-Hispanic White*/ else if ridreth1=4 then raceth=2; /*Non-Hispanic Black*/ else if ridreth1=1 then raceth=3; /*Mexican American*/ else raceth=4; /*Other*/ | Use the if, then, and else statements to create a new, derived variable (e.g., raceth) by re-grouping the ridreth1 values. |
if (20 <= ridageyr <= 39) then age3cat=1; else if (40 <= ridageyr <= 59) then age3cat=2; else if ridageyr >= 60 then age3cat=3; | Use the if, then, and else statements to create a categorical age variable (age3cat) from a continuous variable (ridageyr). |
n_sbp = n(of bpxsy1-bpxsy4); | Use these function statements to count the number of systolic and diastolic blood pressure readings. Then use the array statement (where _DBP is the name of the array) to set any diastolic blood pressure readings of "0" to missing, so that a reading of "0" does not affect the blood pressure means. |
mean_sbp = mean(of bpxsy1-bpxsy4); mean_dbp = mean(of bpxdi1-bpxdi4); | Use these function statements to calculate mean systolic and diastolic blood pressures. |
if BPQ050a=1 then HBP_trt=1; else if BPQ020 in (1,2) and BPQ050a < 7 then HBP_trt=0; if n_sbp>0 and n_dbp>0 then do; | Use the if, then, and else statements to define a new variable, hbp (high blood pressure = 1 or 0), based on a series of conditions that indicate hypertension from the questionnaire and examination variables. |
if BPQ100d=1 then HLP_trt=1; end; | Use the if, then, and else statements to define a new variable, hlp (hyperlipidemia = 1 or 0), based on a series of conditions that indicate high lipid levels from the questionnaire and examination variables. |
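
The recoding statements for race/ethnicity, age category, and blood pressure readings in the table above can be combined into one data step. The following is a minimal sketch, not the tutorial's full program. The toy input rows are invented for illustration. The n_dbp line, the explicit array loop, and the n_sbp/n_dbp guards on the means are assumptions inferred from the explanations in the table. The hbp and hlp definitions are left out because their conditions are not shown in full.

```sas
/* Toy input standing in for demo_BP2b; all values are invented for illustration */
data demo_BP2b;
  input seqn ridreth1 ridageyr bpxsy1-bpxsy4 bpxdi1-bpxdi4;
  datalines;
1 3 25 118 120 . . 76 0 . .
2 4 47 142 138 140 . 92 90 88 .
3 1 63 . . . . . . . .
4 5 18 110 112 108 . 70 68 72 .
;
run;

data demo_BP3;
  set demo_BP2b;

  /* Re-group race/ethnicity into four categories */
  if ridreth1=3 then raceth=1;       /* Non-Hispanic White */
  else if ridreth1=4 then raceth=2;  /* Non-Hispanic Black */
  else if ridreth1=1 then raceth=3;  /* Mexican American */
  else raceth=4;                     /* Other */

  /* Categorize age; participants under 20 keep a missing age3cat */
  if (20 <= ridageyr <= 39) then age3cat=1;
  else if (40 <= ridageyr <= 59) then age3cat=2;
  else if ridageyr >= 60 then age3cat=3;

  /* Count nonmissing readings (the n_dbp line is an assumed counterpart of n_sbp) */
  n_sbp = n(of bpxsy1-bpxsy4);
  n_dbp = n(of bpxdi1-bpxdi4);

  /* Set diastolic readings of 0 to missing so they do not pull down the mean */
  array _DBP {4} bpxdi1-bpxdi4;
  do i = 1 to 4;
    if _DBP{i} = 0 then _DBP{i} = .;
  end;
  drop i;

  /* mean() ignores missing values; the guards avoid calling it with no readings */
  if n_sbp > 0 then mean_sbp = mean(of bpxsy1-bpxsy4);
  if n_dbp > 0 then mean_dbp = mean(of bpxdi1-bpxdi4);
run;
```

In this sketch the readings are counted before the zero values are set to missing, following the order given in the table, so n_dbp still counts a reading of 0 while mean_dbp excludes it.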
In this step, you will confirm that the derived and recoded variables correspond correctly to the original variables.
Statements | Explanation |
---|---|
proc freq data=demo_BP3; | Use the proc freq procedure to create a cross tabulation of the original categorical variables for race/ethnicity, high blood pressure, and hyperlipidemia by their respective recoded variables. Use the where statement to select the participants who were interviewed and examined in the MEC and who were age 20 years and older. |
proc means data=demo_BP3 N min max; | Use the proc means procedure to calculate the number of nonmissing observations (N) and the minimum and maximum values for the original continuous variables. Use the where statement to select the participants who were interviewed and examined in the MEC and who were age 20 years and older. The class statement separates the original continuous variable into the categories of the derived variable. This checks that the coding of the derived variable, based on cut-off points of the continuous variable, is correct. |
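
The two checks described above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's full program. The toy rows are invented. The where condition assumes that ridstatr=2 identifies participants who were both interviewed and examined in the MEC. The tables, class, and var statements show one plausible pairing of an original variable with its derived variable.

```sas
/* Toy stand-in for demo_BP3; all values are invented for illustration */
data demo_BP3;
  input ridstatr ridreth1 raceth ridageyr age3cat;
  datalines;
2 3 1 25 1
2 4 2 47 2
2 1 3 63 3
2 5 4 39 1
1 2 4 52 2
2 3 1 18 .
;
run;

/* Cross-tabulate the original race/ethnicity variable by its recode */
proc freq data=demo_BP3;
  where ridstatr=2 and ridageyr>=20;   /* assumed analytic subset */
  tables ridreth1*raceth / list missing;
run;

/* N, minimum, and maximum of age within each derived age category */
proc means data=demo_BP3 N min max;
  where ridstatr=2 and ridageyr>=20;   /* assumed analytic subset */
  class age3cat;
  var ridageyr;
run;
```

If the recode is correct, each original value in the proc freq listing maps to exactly one derived value, and the minimum and maximum ages reported for each age3cat level fall within that category's cut-off points.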
Highlighted items comparing recoded or derived variables to original variables: