2004: Findings from the National Sample Survey of Registered Nurses

HRSA - U.S. Department of Health and Human Services, Health Resources and Services Administration

U.S. Department of Health and Human Services

The Registered Nurse Population: Findings from the 2004 National Sample Survey of Registered Nurses

Appendix B. Survey Methodology

The eighth cycle of the National Sample Survey of Registered Nurses (NSSRN) followed the same basic sample design as its predecessors. The sample design was originally developed by Westat, Inc. under a contract with the Division of Nursing, BHPr, HRSA in 1975-76 and can be best described as a systematic sample of alphabetic clusters of names in each State using a ‘nested alpha segment design’. Prior to sampling, each State was ranked by the sampling rate such that the highest priority States were those with the highest sampling rate (for the most part, small States). As a result, the alphabetic clusters of names for lower priority States are ‘nested’, or included, within those of higher priority States. This means that a sample name selected in one State (such as California) will also have been selected in every State with a higher priority (in the case of California, this is all other States).

This design approach takes into account two key characteristics of the sampling frame. First, no single list of all individuals with licenses to practice as registered nurses in the United States exists, although lists of those who have licenses in any one State are available. Second, a nurse may be licensed in more than one State. The advantage of the nested alpha-segment design is that one can determine the probabilities of selection and appropriate multiplicity-adjusted weights for nurses who are listed in more than one State. The design also permits the use of each sampled registered nurse’s data for State estimates of each of her/his States of licensure.

This appendix provides a brief summary of the methodology of the NSSRN including the sampling frame, sample design and the statistical techniques used in summarizing the data. It also includes a discussion of sampling errors, provides the standard errors for key variables in the study and presents a simplified methodology for estimating standard errors.

Sampling Frame

The target population for the eighth NSSRN included all registered nurses with an active license in the United States as of March 2004. A sampling frame was required to select a probability sample of nurses from which valid inferences could be made to the target population. The sampling frame for the eighth NSSRN consisted of all registered nurses who were eligible to practice as an RN in the U.S. when the frame was assembled. This sampling frame included RNs who had received a specialty license or had been certified by a State agency as an advanced practice nurse (APN), such as a nurse practitioner, certified nurse midwife, certified registered nurse anesthetist, or clinical nurse specialist, and excluded licensed practical nurses (LPNs)/licensed vocational nurses (LVNs).

State Boards of Nursing in the 50 States and in the District of Columbia (hereafter also referred to as a State) provided files containing the name, address, and license number of every RN currently holding an active license in that State. These files formed the basis of the sampling frame from which the RNs for each State were selected. The licensure files provided by the States were submitted on diskette or compact disk (twenty States), or electronically as an attachment to an e-mail message (twenty-seven States). Three States sent the data via FTP and another provided the data on their website. For this study, States were also asked to identify nurses for whom the State provided advanced practice nurse (APN) status. In some cases, the State identified these nurses on the basic list provided. However, some APNs were identified on separate lists and their APN status was appended to the information on the RN sampling frame.

Each of the 51 State files was checked for consistency, names were standardized, and duplicates and ineligible records were removed from the State list to prepare the list for sampling.

Sample Design

The NSSRN 2004, the eighth in the series, continued to oversample nurses in small States in order to better support HRSA’s National Center for Health Workforce Analysis’ State level supply and demand projections for registered nurses. The basic design was enhanced by using sample design optimization methodology developed by Chromy ^[1] to determine the sample allocation to the States that would simultaneously satisfy variance constraints defined by the 51 States and the total U.S.

In the original sample design, and in the 1988 redesign, the universe of RNs was sorted alphabetically by last name and approximately equal-sized clusters of RNs were constructed by partitioning the alphabetically ordered list into 250 alpha-segment clusters with equal (or nearly equal) numbers of RNs. An alpha-segment was defined as all alphabetically adjacent names falling within pre-specified boundaries: that is, all names beginning with the lower boundary, up to but not including the name that defined the upper boundary.

From the frame of 250 equally divided alpha-segments, a total of 40 alpha-segments were randomly selected, representing a 16 percent sampling rate overall. Registered nurses are selected in the sample based on their name, with an RN being included in the sample if the name of licensure falls into one of the alphabetic segments that are in sample for that State.

Although each State had 40 sample segments, the sample size of each State differed depending on the State’s sampling rate. While uniform sampling rates would have produced the best national estimates, the resulting sample sizes for the smallest States would have been inadequate to support State-level estimates. Since both national and State-level estimates are required for the 2004 NSSRN, sampling rates were increased in the smaller States to obtain larger State-level sample sizes, as was done in prior surveys. While this disproportionate sampling improved the precision of estimates in the smaller States, it also reduced the precision of national estimates due to unequal weighting effects.

To accommodate the differing State sampling rates, a planned variation in the size of the segments, i.e., “portions of alpha segments” was used. Each of the 40 alpha-segments selected for sample was divided into ½-, ¼-, 1/8-, 1/16-, and 1/32- portions. These fractions indicate the size of the alpha segment portion relative to the size of the basic alpha-segment.

The sampling rate for a particular State was achieved using a combination of the alpha-segment portions. As a result, each State contains some sample (i.e., a portion) from each of the 40 alpha-segments, depending on the sampling rate for the State. For example, selecting the entire 40 complete alpha segments on a State list is expected to constitute a 16 percent sampling rate (40 ÷ 250 = 0.16) in the State. This is because each alpha segment contained an expected 0.4% of the State’s RN names (40 X 0.4 percent = 16 percent). Likewise, the sample for a State with an 8 percent sampling rate consisted of the 40 ½ portion selections. Several sampling rates use a combination of portions for each alpha-segment in sample (rather than one fractional portion for all alpha-segments). For example, a 5 percent sampling rate was achieved by first randomly dividing the 40 alpha-segments into two groups, the first containing 30 alpha-segments and the other containing 10; and by using the ¼ portions from the first group and the ½ portions from the second group (0.4 percent x [(30 x ¼) + (10 x ½)] = 5 percent).
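The arithmetic in these examples can be checked with a short sketch. The function name `state_sampling_rate` and the dictionary layout are illustrative assumptions, not part of the survey documentation; only the 250-segment frame, the 40 sampled segments, and the portion sizes come from the text.

```python
from fractions import Fraction

# Each complete alpha-segment is expected to hold 1/250 (0.4 percent) of a State's RN names.
SEGMENT_SHARE = Fraction(1, 250)

def state_sampling_rate(portion_counts):
    """Expected State sampling rate from a mix of alpha-segment portions.

    portion_counts maps a portion size (e.g. Fraction(1, 2)) to the number of
    sampled alpha-segments that use that portion; the counts must total 40.
    """
    if sum(portion_counts.values()) != 40:
        raise ValueError("each of the 40 sampled alpha-segments needs a portion")
    return SEGMENT_SHARE * sum(size * count for size, count in portion_counts.items())

full = state_sampling_rate({Fraction(1): 40})                          # 16 percent
half = state_sampling_rate({Fraction(1, 2): 40})                       # 8 percent
mixed = state_sampling_rate({Fraction(1, 4): 30, Fraction(1, 2): 10})  # 5 percent
print(float(full), float(half), float(mixed))
```

Exact fractions are used so that the three rates quoted in the text are reproduced without floating-point rounding.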

To identify and account for nurses appearing in more than one of the 51 State lists, the portions were constructed such that each portion was “nested” (or included) in the boundaries of the larger portion. As a result, the alpha segment clusters from the States with lower sampling rates (typically larger States) were automatically included in the alpha segment clusters selected from the States with higher sampling rates (typically smaller States).

As a result, an RN who was licensed under the same name in two States with identical sampling rates was selected (or not selected) for both States, since the alphabetic name boundaries defining the portions are the same for both States. However, if the RN was licensed under the same name in two States that were sampled at different rates, then, if the RN was sampled in the State with the lower sampling rate, they were also included in the sample for the State with the higher sampling rate (as the alphabetic name boundaries defining the portions for the State with the lower sampling rate are nested within those of the State with the higher sampling rate). This nesting property of the sample design maximizes the chances that the RN will be selected in all States in which they have an active license. A nurse who is licensed in two or more States under the same name will have a probability of selection corresponding to the State with the highest sampling rate.

Sample design optimization techniques developed by Chromy (1996) were used to determine how to allocate the sample of 54,000 RNs to the 51 State lists. Each State’s allocated sample size was then converted to a sampling rate, and the rate was rounded to one of the admissible rates for the nesting design. For example, the original rate for the State of Washington was 1.59%, and the closest admissible rate was 1.5%. Rates were rounded down only when the reduced sampling rate still left the State’s effective sample size at or above the 1996 NSSRN level.

After determination of frame sizes and expected sampling rates, the States were assigned a priority order to properly determine selection probabilities for nurses appearing on more than one of the 51 State lists. Traditionally, States were ordered by size, with larger States having lower sampling rates and smaller States having higher sampling rates. However, as in the 2000 NSSRN, States were priority ordered based on their sampling rate. As such, it is mostly, but not necessarily, the case that States with larger RN populations had lower sampling rates.

Essentially the same procedure was followed for sample selection for all States. Once a State provided a licensure file containing all appropriate names of individuals with active RN licenses and meeting all specifications, the required sample names in that file were selected. Regardless of the way a State alphabetized and standardized the names in its files, the sample names were selected according to the standards established by the survey design. That is, sample selections ignored blanks and punctuation in the last names (except a dash in hyphenated names) and ignored titles (e.g., “Sister”).
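A minimal sketch of such a name standard follows. It is an illustration under stated assumptions, not the survey’s actual routine: the function name `sort_key` is hypothetical, folding to upper case and restricting to unaccented letters are assumptions, and the title list contains only the one example given in the text.

```python
import re

# Only "Sister" is named in the text; any further titles would be assumptions.
TITLES = {"SISTER"}

def sort_key(last_name):
    """Build a comparison key that ignores blanks, punctuation, and titles.

    Hyphens are kept, matching the rule that the dash in hyphenated names counts.
    Upper-casing and dropping non A-Z characters are assumptions of this sketch.
    """
    words = [w for w in last_name.upper().split() if w.strip(".,") not in TITLES]
    return re.sub(r"[^A-Z\-]", "", "".join(words))

print(sort_key("O'Brien"), sort_key("De La Cruz"), sort_key("Sister Smith-Jones"))
```

Applying one key function to every State file means that the same licensed name maps to the same position in the alphabet regardless of how each State formatted its list.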

Registered nurses were selected in the sample on the basis of name, with an RN being included in the sample if the name of licensure fell within a specific alpha-segment portion as defined by the State sampling rate. In other words, the sample for a given State consisted of all RN names falling into any one of the State’s pre-designated 40 alphabetic portions that corresponded to the State sampling rate (one portion from each of the complete 40 alpha-segments in sample).

The pairs of names that defined the alpha-segment portion constituted the lower and upper boundaries corresponding to the sampling rate. Thus, the membership of the alpha-segment portion was defined by all names, beginning with the lower boundary (i.e., the last name in alphabetical order of all the names included in that segment), up to but not including a name that defined the upper boundary. This latter name fell into the next alpha-segment. As was done in the NSSRN 2000, any deviations of more than 8 percent were candidates for either an increased or decreased rate.
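The boundary rule (lower boundary included, upper boundary excluded) can be sketched as a lookup over half-open intervals. The function name `in_sample` and the boundary names below are hypothetical and chosen only for illustration; real boundaries are set by the survey design.

```python
from bisect import bisect_right

def in_sample(name_key, boundaries):
    """Return True if name_key falls in any [lower, upper) alpha-segment portion.

    boundaries is a list of (lower, upper) key pairs, sorted and non-overlapping.
    """
    lowers = [lower for lower, _ in boundaries]
    i = bisect_right(lowers, name_key) - 1  # last portion whose lower bound <= key
    return i >= 0 and name_key < boundaries[i][1]

# Hypothetical boundaries for two portions.
portions = [("BAKER", "BARNES"), ("MILLER", "MOORE")]
print(in_sample("BALL", portions), in_sample("BARNES", portions))
```

Because the upper boundary belongs to the next segment, a name equal to it is not selected, while a name equal to the lower boundary is.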

Because the survey is longitudinal in nature, a panel structure was constructed to allow for several of the sample alpha-segments to be systematically replaced each survey. Under the original survey design, the 40 sample alpha-segments were arranged in alphabetical order and then partitioned into eight groups of five successive alpha-segments each. One segment from each group was randomly assigned to each panel, so that each panel consisted of segments that spanned the entire alphabet. For each successive survey, a new panel (consisting of eight new alpha-segments or 20 percent of the sample) was entered into the sample, replacing one of the five panels from the previous survey. Under this scheme, a nurse who maintained an active license in the same State(s) could be retained in the sample for up to five surveys.
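The panel construction can be sketched as follows. This is an illustration of the grouping described above, not the original procedure: the function name `assign_panels`, the use of integer segment labels, and the fixed random seed are assumptions.

```python
import random

def assign_panels(segments, n_groups=8, n_panels=5, seed=0):
    """Split sorted alpha-segments into groups of successive segments and
    randomly deal one segment from each group to each panel."""
    ordered = sorted(segments)
    if len(ordered) != n_groups * n_panels:
        raise ValueError("expected n_groups * n_panels segments")
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    panels = [[] for _ in range(n_panels)]
    for g in range(n_groups):
        group = ordered[g * n_panels:(g + 1) * n_panels]
        rng.shuffle(group)
        for panel, segment in zip(panels, group):
            panel.append(segment)
    return panels

# Segments labeled 1..40 in alphabetical order (labels are placeholders).
panels = assign_panels(range(1, 41))
print([len(p) for p in panels])
```

Each of the five panels receives eight segments, one from every group, so each panel spans the alphabet and replacing one panel replaces 20 percent of the sample.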

The planned NSSRN 2004 sample size was 54,000 cases, similar to that of the NSSRN 2000, and up from the 45,000 used in previous studies. Planned sampling rates ranged from 1.125 percent in several of the largest States to 15 percent in Wyoming. This translated into planned sample sizes ranging from 3,225 RNs in California to approximately 796 in Wyoming. The initial round of sampling, however, yielded a much smaller sample than expected due to the variable size of the alpha-segments in each State. Thus, a second round of sampling was done by increasing the sampling rates from 1 to 1.125 percent in the eleven largest States and “adding to” the sample selected in the first round, yielding a total of 56,917 sample cases. After eliminating cross-State duplications, the expected sample size to be fielded was still approximately 54,000 cases.

Table B-1 in Appendix B shows the sampling rates and sample sizes that were planned and actually obtained for the 51 States in the survey. Differences between planned and actual sampling rates result from State-specific variation in the distribution of nurses’ names. States are priority ordered by sampling rate and size.

Because many nurses are licensed in more than one State, their names could be selected in the sample more than once. In accordance with the sample design, we ensured that each sampled RN was retained in the outgoing sample file exactly once to avoid multiple questionnaires being sent to nurses. If we identified an exact duplicate, the nurse in the lower priority State was coded as a duplicate of the sample member in the higher priority State. For example, an Alaska record was coded as a duplicate of the sample record in Wyoming. Following data collection, these expected duplicates were reviewed to ensure that the nurse reported a license in both of the States.
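A sketch of this duplicate coding is shown below. The record layout (keys `nurse_key`, `state`, `priority`) and the function name `flag_duplicates` are hypothetical, and the matching key stands in for whatever exact-match rule was actually applied.

```python
def flag_duplicates(records):
    """Keep one record per nurse: the one from the highest-priority State.

    records is a list of dicts with hypothetical keys 'nurse_key' (an exact-match
    identity key), 'state', and 'priority' (1 = highest priority).
    Returns (retained, duplicates).
    """
    retained, duplicates = {}, []
    for rec in sorted(records, key=lambda r: r["priority"]):
        key = rec["nurse_key"]
        if key in retained:
            # Lower-priority copy: code it as a duplicate of the retained record.
            duplicates.append({**rec, "duplicate_of": retained[key]["state"]})
        else:
            retained[key] = rec
    return list(retained.values()), duplicates

# Made-up records for illustration only.
sample = [
    {"nurse_key": "nurse-1", "state": "Alaska", "priority": 2},
    {"nurse_key": "nurse-1", "state": "Wyoming", "priority": 1},
    {"nurse_key": "nurse-2", "state": "Alaska", "priority": 2},
]
kept, dups = flag_duplicates(sample)
print(len(kept), dups[0]["state"], dups[0]["duplicate_of"])
```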

Table B-1. State Sampling Rates and Sample Sizes (Priority Ordered)

State	Priority Order	Frame Size	Planned Sampling Rate	Actual Sampling Rate^[2]	Actual Sample Size
TOTAL		3,252,548			56,917
Wyoming	1	5,309	15.00%	15.60%	828
Alaska	2	7,389	13.00%	11.88%	878
Vermont	3	8,728	10.00%	9.53%	832
District of Columbia	4	17,104	10.00%	9.71%	1,661
North Dakota	5	8,139	9.00%	9.74%	793
Delaware	6	10,407	9.00%	8.87%	923
Montana	7	10,885	8.00%	8.15%	887
South Dakota	8	10,773	7.00%	6.88%	741
Idaho	9	12,769	7.00%	6.75%	862
Hawaii	10	13,548	7.00%	7.44%	1,008
Nevada	11	19,201	7.00%	6.25%	1,200
Rhode Island	12	17,203	5.50%	5.37%	923
New Mexico	13	17,544	5.00%	4.98%	874
New Hampshire	14	19,108	5.00%	4.71%	900
Utah	15	19,210	4.50%	4.97%	954
Maine	16	19,869	4.50%	4.50%	894
Nebraska	17	20,100	3.50%	3.56%	716
Arkansas	18	27,878	3.50%	3.52%	982
West Virginia	19	21,295	3.50%	3.13%	667
Mississippi	20	31,734	3.00%	3.13%	994
Oklahoma	21	32,185	3.00%	2.93%	944
Kansas	22	34,047	3.00%	3.10%	1,057
Iowa	23	40,312	2.50%	2.31%	933
South Carolina	24	38,265	2.50%	2.47%	944
Oregon	25	38,453	2.00%	1.95%	750
Louisiana	26	43,299	2.00%	1.75%	757
Colorado	27	48,586	2.00%	2.14%	1,042
Connecticut	28	52,364	2.00%	1.96%	1,025
Alabama	29	46,974	1.75%	1.81%	852
Kentucky	30	47,123	1.75%	1.77%	832
Arizona	31	51,482	1.75%	1.72%	887
Maryland	32	56,922	1.50%	1.47%	835
Washington	33	66,397	1.50%	1.44%	954
Minnesota	34	66,434	1.50%	1.59%	1,056
Wisconsin	35	63,865	1.25%	1.24%	793
Tennessee	36	65,827	1.25%	1.29%	849
Indiana	37	70,488	1.25%	1.23%	867
Missouri	38	74,508	1.25%	1.28%	953
Georgia	39	86,369	1.25%	1.26%	1,086
Virginia	40	85,705	1.25%	1.21%	1,036
North Carolina	41	96,877	1.125%	1.146%	1,110
Massachusetts	42	105,206	1.125%	1.350%	1,420
New Jersey	43	109,726	1.125%	1.067%	1,171
Michigan	44	117,360	1.125%	1.161%	1,363
Ohio	45	140,689	1.125%	1.124%	1,581
Illinois	46	154,572	1.125%	1.124%	1,738
Texas	47	176,652	1.125%	1.066%	1,883
Pennsylvania	48	191,628	1.125%	1.037%	1,988
Florida	49	201,113	1.125%	1.086%	2,184
New York	50	244,288	1.125%	1.061%	2,592
California	51	286,639	1.125%	1.018%	2,918

Weighting Procedures

The probability sample design of the survey permits the computation of unbiased estimates of characteristics of the RN population at the National and State level. These estimates are based on weights that reflect the complex design and compensate for the potential risk of nonresponse bias to the extent feasible. The weights that are assigned to each sample nurse may be interpreted as the number of nurses in the target population that the sample nurse represents. The sampling weight for an RN is the reciprocal of the nurse’s probability of selection in her/his priority State, adjusted to account for nonresponse and multiple licenses.

Before computing the weights, the original State frame sizes (shown above) were adjusted to account for duplicate licenses within States and ineligible licenses (i.e., frame errors) found in the sample. Most within-State duplicates were identified at the time of initial list processing, but a few were identified after sample selection. The ineligible licenses were identified in the process of reconciling the State and nurse reported licenses. Some of the inconsistencies between the State reported data and the nurse reported data are due to the time period that elapsed between frame construction and data collection (a period during which changes and license expirations naturally occur). Other differences are due to errors in either the State list or the nurse’s questionnaire. Cases that could not be reconciled by Gallup were sent to the State Boards of Nursing for resolution.

In both cases, the frame total is computed by subtracting the estimated number of ineligible and duplicate licenses from the State’s original frame count. The adjusted frame total used to compute the resulting weights for State i can be computed as:

$$N_i^{\mathrm{adj}} = N_i - \hat{D}_i - \hat{E}_i$$

where:

N_i = the total number of licenses on the State i list,

$\hat{D}_i$ = the estimated number of within-State duplicates in State i, and

$\hat{E}_i$ = the estimated number of frame errors in State i (e.g., licenses listed by the State that were not reported by a responding nurse).

Each responding nurse was assigned a weight corresponding to their unique ‘priority State’; that is, the State with the highest sampling rate in which he or she was licensed and selected into the sample. In other words, the weight reflects the probability of selecting the sampled nurse in their ‘priority State’. All nurses with the same priority State have an equal probability of being selected and, consequently, have equal initial sampling weights. The sum of the weights for all nurse respondents assigned to a specific priority State will equal, approximately, the total number of active licenses on the list (at the time the sample was drawn) less the number of those licenses assigned to higher priority lists.

The weights were computed sequentially for each State A, B, etc., where A was the highest-priority State, and B the next-highest-priority State. The weight for an RN sampled from the highest priority State, State A, was the ratio of the adjusted count of licenses in the sampling frame for State A to the number of eligible respondents licensed in State A. For State B, and the remaining States, the numerator and denominator of this ratio were adjusted to account for State A and other higher-priority States. To describe the basic method, the following terms are defined:

N(i) = total number of licenses for State i (adjusted for within-State duplicates and frame errors)

m(i) = number of eligible respondents for State i that did not have a license in a higher-priority State

n(i,j) = number of eligible respondents with a license in both State i and State j [note n(i,i) denotes the number of eligible respondents with a license only in State i]

W(i) = the adjusted weight for eligible respondents who were assigned to the higher priority State i

The weight for State A was computed as follows:

W(A) = N(A) / m(A).

For the State B weight, W(B), the numerator was the adjusted frame count of licenses for State B, N(B), after removing the estimated total count of State B nurses who were also licensed in State A (i.e., W(A) n(A,B)). Similarly, the numerator of W(C) excluded State C nurses who were also licensed in either State A or State B (i.e., W(A) n(A,C) + W(B) n(B,C)). That is, for the State B weight and the State C weight, the computations were:

W(B) = [N(B) - W(A) n(A,B)] / m(B)

W(C) = [N(C) - W(A) n(A,C) - W(B) n(B,C)] / m(C) .

In either case, the denominator was the number (m(B) or m(C)) of respondents in the State not licensed in a higher-priority State.

In general, the numerator of a State I weight, W(I), was the total adjusted frame count of RN licenses in State I after removing the estimated total count of State I nurses also licensed in higher-priority States. The denominator, m(I), was the number of State I respondents not licensed in a higher-priority State. This weighting scheme incorporated both a nonresponse adjustment that inflated the respondents’ data to account for those that did not respond to the survey and a duplication adjustment to account for duplication in the sampling frame across States. These final analysis weights will serve to differentially weight responding nurses to reflect the level of disproportionality in the final respondent sample relative to the population.
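The sequential computation can be sketched as a loop over States in priority order. This is a minimal illustration, not the production weighting code: the function name `sequential_weights` and the counts are made up, and n[(j, i)] is read here as the number of eligible respondents assigned to higher-priority State j who also hold a license in State i, which is an interpretation of the definition above.

```python
def sequential_weights(states, N, m, n):
    """Compute priority-State weights W(i) in priority order.

    states: State labels, highest priority first.
    N[i]:   adjusted frame count of licenses for State i.
    m[i]:   eligible respondents in State i with no license in a higher-priority State.
    n[(j, i)]: eligible respondents assigned to higher-priority State j who also
               hold a license in State i (missing pairs count as zero).
    """
    W = {}
    for pos, i in enumerate(states):
        # Estimated State i licenses already represented by higher-priority States.
        covered = sum(W[j] * n.get((j, i), 0) for j in states[:pos])
        W[i] = (N[i] - covered) / m[i]
    return W

# Made-up counts for three States, for illustration only.
W = sequential_weights(
    ["A", "B", "C"],
    N={"A": 1000, "B": 2000, "C": 3000},
    m={"A": 100, "B": 95, "C": 57},
    n={("A", "B"): 10, ("A", "C"): 5, ("B", "C"): 5},
)
print(W)
```

With these inputs, W(A) = 1000 / 100, W(B) = (2000 - 10 x 10) / 95, and W(C) = (3000 - 10 x 5 - 20 x 5) / 57, mirroring the three formulas given for States A, B, and C.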

Estimation Procedure

Final NSSRN estimates can be computed using the final set of sampling weights, W_k (for sample nurse k). For example, an estimate of the total number of RNs working in a particular State is based on the following indicator variable, X_k:

X_k = 1 if nurse k worked in a particular State,

X_k = 0 otherwise.

The desired estimated total may then be written as

$$\hat{X} = \sum_k W_k X_k,$$

the sum being over all sample nurses.

Estimates of ratios and averages are obtained as the ratio of estimated totals.
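As a sketch, an estimated total is the weighted sum of an indicator over all sample nurses, and a ratio is one such total divided by another. The function names, weights, and indicator values below are made up for illustration.

```python
def weighted_total(weights, indicator):
    """Estimated total: sum of W_k * X_k over all sample nurses."""
    return sum(w * x for w, x in zip(weights, indicator))

def weighted_ratio(weights, numerator, denominator):
    """Estimated ratio or average: a ratio of two estimated totals."""
    return weighted_total(weights, numerator) / weighted_total(weights, denominator)

# Made-up weights and indicators for four sample nurses.
W = [50.0, 50.0, 20.0, 10.0]
works_in_state = [1, 1, 0, 1]          # X_k
in_state_hospital = [1, 0, 0, 1]       # hypothetical second indicator (subset of X_k)

total = weighted_total(W, works_in_state)
share = weighted_ratio(W, in_state_hospital, works_in_state)
print(total, share)
```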

Sampling and Nonsampling Errors

To the extent that samples are sufficiently large, relatively precise estimates of characteristics of the licensed RN population of the United States can be made because of the underlying probability structure of the sample data. Such estimates are, nevertheless, only approximations of the true population values. Several sources of error could cause sample estimates to differ from the corresponding true population value. These sources of error are commonly classified into two major categories: sampling errors and nonsampling errors.

A probability sample such as the one used in this study is designed so that estimates of the magnitude of the sampling error can be computed from the sample data. In addition, nonsystematic components of nonsampling error are also reflected in the sampling error estimates.

Nonsampling Errors

Some sources of error, such as unusable responses to vague or sensitive questions; no responses from some nurses; and errors in coding, scoring, and processing the data are, to a considerable extent, beyond the control of the sampling statistician. They are called “nonsampling errors” and also occur in cases where there is a complete enumeration of a target population, such as the U.S. Census. The activities directed at reducing nonsampling errors to the lowest level feasible for this survey included careful planning, keeping nonresponse to the lowest feasible level, and careful coding and processing of the sample data.

If nonsampling errors are random, in the sense that they are independent and tend to be compensating from one respondent to another, then they do not cause bias in estimates of totals, percents, or averages. Furthermore, the contribution from such nonsampling errors will automatically be included in the sampling errors that are estimated from the sample data. However, correlations or relationships in cross-tabulations are often decreased by such errors, and sometimes substantially. Thus, random errors that tend to be compensated for in estimates of simple aggregates or averages may (but not necessarily will) introduce systematic errors or biases in measures of relationships or cross-tabulations.

Nonsampling errors that are systematic (rather than random and compensating) are a source of bias for sample estimates. Such errors are not reduced by increasing the size of the sample, and the sample data do not provide an assessment of the magnitude of these errors. Systematic errors are reduced in this study by such efforts as careful wording of questionnaire items, respondent motivation, and well-designed data-collection and data-management procedures. However, such errors sometimes occur in subtle ways and are less subject to design control than is the case for sampling errors.

Nonresponse to the survey is one of the largest sources of nonsampling error because a characteristic being estimated may differ, on average, between respondents and nonrespondents. For this reason, considerable effort has been expended in this survey to obtain a high response rate by respondent motivation and follow-up procedures. A high response rate reduces both random and systematic nonsampling errors. After taking into account duplicates and frame errors, the overall response rate to this survey was 70.47 percent. State-level response rates ranged from 61.98 percent to 81.57 percent except for the District of Columbia where the response rate (46.12 percent) was significantly lower.

Sampling Errors

All sample survey estimates are subject to sampling error. The magnitude of the sampling error for an estimate, as indicated by measures of variability such as its variance or its standard error (the square root of its variance), provides a basis for judging the precision of the sample estimates.

Systematic sampling, which was the selection procedure used in choosing the alpha-segments for this study, is convenient from certain practical points of view, including providing for panel rotation. However, it does not permit unbiased estimation of the variability of survey estimates unless some assumptions are made. Thus, standard errors were estimated based upon the assumption that the systematic sample of 40 alpha-segments is equivalent to a stratified random sample of two alpha-segments from each of 20 strata of adjacent alpha-segments. Ordinarily, this assumption should lead to overestimates of the sampling error for systematic sampling, but in this case (with alpha-segments as the sampling units) the magnitude of the overestimate is believed to be trivial.

Regarding the sample as consisting of 20 pairs of alpha-segments (thus obtaining 20 degrees of freedom) for variance estimation, the probability is approximately 0.95 that the statistic of interest differs from the value of the population characteristic that it estimates by not more than 2.086 standard deviations.

Specifically, a 95 percent confidence interval for an estimated statistic $\hat{X}$ takes the form:

$$\hat{X} \pm 2.086\, s_{\hat{X}}$$

where $s_{\hat{X}}$ is the estimated standard error for $\hat{X}$.
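A short sketch of the interval computation follows, using the 2.086 multiplier stated above for 20 degrees of freedom. The function name and the input values are illustrative assumptions.

```python
# Multiplier quoted in the text: the two-sided 95 percent critical value of
# Student's t distribution with 20 degrees of freedom.
T_CRITICAL_20_DF = 2.086

def confidence_interval_95(estimate, standard_error):
    """Return (lower, upper) limits of the 95 percent confidence interval."""
    half_width = T_CRITICAL_20_DF * standard_error
    return estimate - half_width, estimate + half_width

# Made-up estimate and standard error.
low, high = confidence_interval_95(25_000.0, 1_000.0)
print(low, high)
```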

Direct Variance Estimation

Similar to prior cycles of the NSSRN, direct estimates of sampling variance were obtained for a set of important variables for each State and for the United States using the jackknife variance estimation procedure with 20 replicates of the sample. Variance estimates using the jackknife approach require the computation of a set of weights for the full sample and a set for each replicate using the established weight computation procedure (i.e., 20 additional sets of weights). Having 20 sets of weights permits construction of 20 replicate estimates to compare with the estimate produced from all of the data; each replicate estimate is based on about 39/40ths of the data.

Each replicate was formed from 19 pairs of alpha-segments (38 alpha-segments total) and 1 alpha-segment from the 20th pair. Alpha-segments were randomly removed from each pair to form the replicate estimates. This procedure was performed 20 times, once for each pair of alpha-segments. Thus, actual respondent count in the included segments for a particular replicate was approximately 39/40ths of the full respondent sample and was weighted to represent the full population.

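The variance of $\hat{X}$, Var($\hat{X}$), is estimated by computing:

$$\widehat{\mathrm{Var}}(\hat{X}) = \sum_{i=1}^{20} (\hat{X}_i - \hat{X})^2$$

where:

$\hat{X}_i$ = an estimated total for replicate i associated with alpha-segment pair i, and

$\hat{X}$ = an estimated total obtained over the full sample.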

If the estimate of interest is a ratio of two estimated totals (e.g., the total number of RNs resident in Florida between 25 and 29 years old to the total number of RNs resident in Florida), the variance estimate for the estimated ratio would be of the following form:

$$\widehat{\mathrm{Var}}\left(\frac{\hat{X}}{\hat{Y}}\right) = \sum_{i=1}^{20} \left(\frac{\hat{X}_i}{\hat{Y}_i} - \frac{\hat{X}}{\hat{Y}}\right)^2$$

Following the example, the $\hat{X}$ and $\hat{X}_i$ measurements would be full sample and replicate estimates, respectively, of the number of RNs resident in Florida who were 25 to 29 years old, while $\hat{Y}$ and $\hat{Y}_i$ would be the corresponding estimates of the total number of RNs resident in Florida. The variance of any other statistic, simple or complex, can be similarly estimated by computing the statistic for each replicate.
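The replicate-based computation can be sketched as below. The sketch assumes the paired-jackknife form in which the variance is the plain sum, over the 20 replicates, of squared deviations of each replicate estimate from the full-sample estimate, with no additional multiplier; the function names and numbers are illustrative.

```python
def jackknife_variance(full_estimate, replicate_estimates):
    """Sum of squared deviations of replicate estimates from the full-sample estimate."""
    return sum((r - full_estimate) ** 2 for r in replicate_estimates)

def jackknife_ratio_variance(x_full, y_full, x_reps, y_reps):
    """Same estimator applied to the ratio X/Y computed within each replicate."""
    ratios = [x / y for x, y in zip(x_reps, y_reps)]
    return jackknife_variance(x_full / y_full, ratios)

# Made-up full-sample and replicate estimates (20 replicates).
x_reps = [51.0, 49.0] * 10
y_reps = [100.0] * 20
var_total = jackknife_variance(50.0, x_reps)
var_ratio = jackknife_ratio_variance(50.0, 100.0, x_reps, y_reps)
print(var_total, var_ratio)
```

Any other statistic can be handled the same way by computing it once per replicate and passing the results to `jackknife_variance`.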

The jackknife variance estimator can be centered on either the full sample estimate or the average of the replicate estimates. While usually little difference exists between the two, the estimator based on the full sample estimate, $\hat{X}$, was used, which tends to provide more conservative estimates of variance.

Direct estimates of the variance were computed for a variety of variables. These variables were chosen not only due to their importance, but also to represent the range of expected design effects. The average of these design effects (on a State-by-State basis) provides the basis for the variance estimate for variables not included in the set for which direct variance estimates were computed. Table B-2 in Appendix B presents direct estimates of the standard error (the square root of the variance) for a selected set of variables. Table B-3 in Appendix B shows the estimated population of nurses in each State and the standard error of these population totals.

Design Effects and Generalized Variances

The generalized variance is a model-based approximation of the sampling variance estimate, which is less computationally complex than the direct variance estimator but is also less accurate. The generalized variance equations use the national-level or State-level estimates of the design effect and, for some estimates, the coefficient of variation (CV) to estimate the sampling variance. The design effect, F, for an estimated proportion $\hat{p}$ is determined by taking the ratio of the estimated sampling variance, obtained by the jackknife method, to the sampling variance of $\hat{p}$ in a simple random sample of the same size. This design effect, F, can be computed as follows:

$$F = \frac{\widehat{\mathrm{Var}}(\hat{p})}{\hat{p}(1 - \hat{p})/n}$$

where n is the unweighted number of respondents used to determine the denominator of $\hat{p}$.

Direct estimates of the design effect were computed for a set of variables for each State. The median of the design effects was then computed for each State and the nation. These median design effects can be used in formulas for estimating generalized variances or standard errors. This procedure uses median design effects for a class of estimates instead of calculating direct estimates, which saves time and cost at the expense of some accuracy in the variance estimates.

A generalized standard error estimate for an estimated proportion $\hat{p}$, for a State or for the United States, is provided by the equation:

$$SE(\hat{p}) = \sqrt{F\,\frac{\hat{p}\,(1-\hat{p})}{n}} \qquad (1)$$

where n is the number of survey respondents used to determine the estimate $\hat{p}$. The multiplier F, the median^² design effect, depends upon the State for which the estimated proportion was generated. The median design effects are listed in Table B-4 in Appendix B.
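As a hedged illustration of equation (1), the sketch below assumes the generalized standard error is the square root of F times p(1 - p)/n. The national design effect (1.63, Table B-4) and the percent employed in nursing (83.23, Table B-2) come from the tables in this appendix; the respondent count `n` is a hypothetical placeholder, since the actual denominator is not given here.

```python
import math


def generalized_se_proportion(p_hat: float, n: int, design_effect: float) -> float:
    """Generalized standard error of a proportion: sqrt(F * p * (1 - p) / n)."""
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0 or design_effect <= 0:
        raise ValueError("n and design_effect must be positive")
    return math.sqrt(design_effect * p_hat * (1.0 - p_hat) / n)


# F and p_hat are from Tables B-4 and B-2; n is a hypothetical respondent count.
se = generalized_se_proportion(p_hat=0.8323, n=30000, design_effect=1.63)
print(f"generalized SE: {100 * se:.2f} percentage points")
```

Because n here is assumed, the printed value should not be expected to reproduce the direct estimate in Table B-2.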

Generalized estimates of standard errors can also be computed for estimated numbers (or totals) of RNs in a State with a particular characteristic (such as those employed in hospitals). The estimate $\hat{X}$ is a subtotal of the estimate $\hat{Y}$, the estimated total of RNs working and/or living in the State. Note that the standard error and coefficient of variation of $\hat{Y}$ (represented by $CV(\hat{Y})$) were determined for the nation and for each State (see Table B-3).

To calculate the standard error of a total, one must first compute the relative variance (or square of the coefficient of variation) of the ratio of $\hat{X}$ to $\hat{Y}$ (called $\hat{p}$). The relative variance can be calculated as:

$$V^2(\hat{p}) = \frac{F\,(1-\hat{p})}{n\,\hat{p}}$$

where F is the design effect for the State of interest and n is the number of respondents to the survey that were weighted to obtain the estimate $\hat{Y}$.

Then, from the relative variance of the ratio, one can approximate the relative variance of the total, denoted by $V^2(\hat{X})$, using:

$$V^2(\hat{X}) = V^2(\hat{p}) + CV^2(\hat{Y})$$

This approximation is based on the first-order Taylor series approximation to the variance of a product and the assumption of zero correlation between the estimate of the ratio and the denominator of the ratio. Here $CV(\hat{Y})$ must be expressed as a proportion, not as the percent shown in Table B-3.

Finally, the standard error of the total can be estimated by multiplying the estimate by the square root of the relative variance defined above. The standard error of $\hat{X}$ is thus estimated as:

$$SE(\hat{X}) = \hat{X}\,\sqrt{\frac{F\,(1-\hat{p})}{n\,\hat{p}} + CV^2(\hat{Y})} \qquad (2)$$
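The three steps above (relative variance of the ratio, relative variance of the total, standard error of the total) can be sketched as follows. This is a minimal sketch, assuming the relative variance of the ratio is F(1 - p)/(n p) and that the relative variance of the total is that quantity plus the squared CV of the State total. The California total, CV, and design effect are from Tables B-3 and B-4; the subtotal `x_hat` and the respondent count `n` are hypothetical.

```python
import math


def se_total(x_hat: float, y_hat: float, cv_y: float, n: int, design_effect: float) -> float:
    """Generalized standard error of a subtotal x_hat of a State total y_hat.

    cv_y is the coefficient of variation of y_hat as a proportion (not percent).
    """
    if not 0.0 < x_hat <= y_hat:
        raise ValueError("x_hat must be positive and no larger than y_hat")
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = x_hat / y_hat
    # Step 1: relative variance of the ratio p_hat = x_hat / y_hat.
    relvar_ratio = design_effect * (1.0 - p_hat) / (n * p_hat)
    # Step 2: first-order approximation, assuming zero correlation between
    # the ratio and its denominator.
    relvar_total = relvar_ratio + cv_y ** 2
    # Step 3: scale the estimate by the square root of its relative variance.
    return x_hat * math.sqrt(relvar_total)


# y_hat, cv_y, and design_effect are California's values from Tables B-3 and B-4
# (CV converted from 0.68 percent); x_hat and n are hypothetical.
se = se_total(x_hat=120_000, y_hat=255_858, cv_y=0.0068, n=1_500, design_effect=1.11)
print(f"approximate SE of the subtotal: {se:,.0f}")
```

Passing the CV as a percent rather than a proportion would inflate the second term by a factor of 10,000, so the conversion matters.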

The standard error of an estimated percentage for a region of the United States depends upon a linear combination of the variances of the same estimated percentages for the States making up that particular region. The estimated proportion for the region is:

$$\hat{p}_R = \frac{\sum_{i=1}^{h} \hat{Y}_i\,\hat{p}_i}{\sum_{i=1}^{h} \hat{Y}_i}$$

Here, h is the number of States in region R, and $\hat{Y}_i$ (the estimated total of RNs) and $\hat{p}_i$ (the estimated proportion) are estimates for a particular State i. The formula used to approximate the standard error of an estimated proportion for a region is:

$$SE(\hat{p}_R) = \frac{\sqrt{\sum_{i=1}^{h} \hat{Y}_i^{2}\,SE(\hat{p}_i)^{2}}}{\sum_{i=1}^{h} \hat{Y}_i} \qquad (3)$$

where $SE(\hat{p}_i)$ represents the standard error of the estimated proportion for State i, and the standard errors are estimated from equation (1) or from direct estimation.
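A short sketch of the regional calculation follows. It assumes the regional proportion is the State proportions weighted by the State totals, and that the regional standard error treats those totals as fixed weights and the State estimates as uncorrelated. All numeric inputs are hypothetical, and the function names are illustrative.

```python
import math


def regional_proportion(state_totals, state_proportions):
    """Average of State proportions, weighted by the State totals."""
    weighted_sum = sum(y * p for y, p in zip(state_totals, state_proportions))
    return weighted_sum / sum(state_totals)


def regional_proportion_se(state_totals, state_proportion_ses):
    """Approximate SE of the regional proportion, treating State totals as fixed weights."""
    variance_sum = sum((y * se) ** 2 for y, se in zip(state_totals, state_proportion_ses))
    return math.sqrt(variance_sum) / sum(state_totals)


# Hypothetical two-State region.
totals = [100, 300]          # estimated RN totals per State
proportions = [0.20, 0.40]   # estimated proportion with the characteristic
ses = [0.03, 0.04]           # State-level standard errors of those proportions

print(round(regional_proportion(totals, proportions), 4))
print(round(regional_proportion_se(totals, ses), 4))
```

Larger States dominate both the regional proportion and its standard error, because each State enters in proportion to its estimated total.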

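The direct standard error for an estimated number for a region of the United States also depends upon a linear combination of the variances of the same estimated numbers for the States that make up the region. Writing $\hat{X}_i$ for the estimated number in State i and $\hat{X}_R$ for the regional total over the h States in region R, the formula used is:

$$SE(\hat{X}_R) = \sqrt{\sum_{i=1}^{h} SE(\hat{X}_i)^{2}} \qquad (4)$$

where the standard error of the estimated number for each State, $SE(\hat{X}_i)$, is available either from the direct procedures or from equation (2).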
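The regional total calculation can be illustrated with the State standard errors in Table B-3. The sketch assumes the regional standard error is the square root of the sum of the squared State standard errors, which treats the State estimates as uncorrelated. Grouping these six States into one region is an illustrative choice, not a region definition from the survey.

```python
import math


def regional_total_se(state_standard_errors):
    """Root sum of squares of the State-level standard errors."""
    return math.sqrt(sum(se ** 2 for se in state_standard_errors))


# Standard errors of the estimated State nurse populations (Table B-3).
# The grouping of States is an assumption made for this example.
example_region = {
    "Connecticut": 1199,
    "Maine": 465,
    "Massachusetts": 972,
    "New Hampshire": 493,
    "Rhode Island": 337,
    "Vermont": 254,
}
print(round(regional_total_se(example_region.values())))
```

The combined standard error is smaller than the simple sum of the six State standard errors because independent errors partly offset one another.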

Table B-2. Estimates and Standard Errors (S.E.) For Selected Variables of U.S. Registered Nurse Population

Description	Estimated Number	S.E. of Estimated Number		Estimated Percent	S.E. of Estimated Percent
UNITED STATES, Total Number Of Nurses	2,909,357	7,000

Basic Nursing Education
Diploma Program	733,377	9,749		25.21	0.32
Associate Degree	1,227,256	16,571		42.18	0.54
Baccalaureate Degree	887,114	13,366		30.49	0.47
Master’s Degree	14,979	1,412		0.51	0.05
Doctorate	532	271		0.02	0.01
Not Reported	46,098	2,568		1.58	0.09

Employed in Nursing
Yes	2,421,351	10,124		83.23	0.27
No	488,006	7,792		16.77	0.27

Racial/Ethnic Background
White (non-hispanic)	2,380,529	28,004		81.82	0.89
Black/African American (non-hispanic)	122,495	16,737		4.21	0.57
Asian (non-hispanic)	84,383	15,540		2.90	0.54
American Indian/Alaskan Native (non-hispanic)	9,453	972		0.32	0.03
Native Hawaiian/Pacific Islander (non-hispanic)	5,594	1,091		0.19	0.04
Two or more races (non-hispanic)	41,244	2,641		1.42	0.09
Hispanic/Latino (White)	38,530	7,745		1.32	0.27
Hispanic/Latino (Black/African American)	2,924	633		0.10	0.02
Hispanic/Latino (Two or more races)	3,096	741		0.11	0.03
Hispanic, Other	3,460	921		0.12	0.03
Not Reported	217,651	5,689		7.48	0.19

Employment Status in 2004
Employed In Nursing Full Time	1,696,807	12,210		58.32	0.44
Employed In Nursing Part Time	720,283	11,059		24.76	0.35
Employed In Nursing, Full/Part Time Unknown	4,261	523		0.15	0.02
Not Employed In Nursing	488,006	7,793		16.77	0.27

Graduation Year
Before 1961	150,147	4,332		5.16	0.15
1961 To 1965	146,805	4,047		5.05	0.14
1966 To 1970	203,313	4,150		6.99	0.14
1971 To 1975	300,072	7,685		10.31	0.26
1976 To 1980	378,607	7,543		13.01	0.25
1981 To 1985	385,145	7,064		13.24	0.24
1986 To 1990	321,070	6,472		11.04	0.22
1991 To 1995	406,125	5,902		13.96	0.22
1996 To 2000	367,557	6,094		12.63	0.20
After 2000	196,086	5,069		6.74	0.17
Not Reported	54,430	2,524		1.87	0.09

Employment Setting
Hospital	1,360,847	13,063		46.77	0.43
Nursing Home Extended Care	153,172	3,369		5.26	0.12
Nursing Education	63,444	2,879		2.18	0.10
Public Health/Community Health	259,911	4,347		8.93	0.15
School Health Service	78,022	3,095		2.68	0.10
Occupational Health	22,447	1,820		0.77	0.06
Ambulatory Care (Except Nurse Owned/Operated)	265,273	5,346		9.12	0.18
Nurse Owned/Operated Ambulatory Care Setting	12,500	1,112		0.43	0.04
Insurance Claims/Benefits	43,641	1,976		1.50	0.07
Planning/ Regul /Licensing Agency	8,733	933		0.30	0.03
Other	103,310	3,974		3.55	0.13
Not Reported	538,058	8,227		18.49	0.29

Type of Position
Administrator Or Assistant Administrator	125,011	2,522		4.30	0.08
Consultant	35,617	1,707		1.22	0.06
Supervisor	74,201	2,976		2.55	0.10
Instructor/Faculty	62,255	2,403		2.14	0.08
Head Nurse Or Assistant Nurse	148,210	3,880		5.09	0.13
Staff Nurse	1,431,053	11,735		49.19	0.39
Nurse Practitioner	84,042	3,424		2.89	0.12
Nurse Midwife	7,274	990		0.25	0.03
Clinical Specialist	28,623	1,900		0.98	0.07
Nurse Clinician	32,954	1,908		1.13	0.07
Certified Nurse Anesthetist	27,287	1,452		0.94	0.05
Research	19,263	1,250		0.66	0.04
Private Duty	11,762	1,280		0.40	0.04
Informatic Nurse	8,570	929		0.29	0.03
Home Health	45,621	1,834		1.57	0.06
Survey Or Auditors/Regulator	12,097	1,031		0.42	0.04
Patient Coordinator	138,404	3,205		4.76	0.11
Other	82,352	3,226		2.83	0.11
Not Reported	534,760	7,774		18.38	0.27

Highest Nursing Education
Diploma In Nursing	510,209	8,062		17.54	0.27
Associate Degree In Nursing Or Related Field	981,238	14,852		33.73	0.49
Baccalaureate In Nursing	922,696	12,963		31.71	0.45
Baccalaureate In Related Field	71,580	1,946		2.46	0.07
Masters In Nursing	256,415	5,251		8.81	0.18
Masters In Related Field	94,386	3,057		3.24	0.10
Doctorate In Nursing	11,548	645		0.40	0.02
Doctorate In Related Field	14,552	1,192		0.50	0.04
Not Reported	46,733	2,300		1.61	0.08

Age of Nurse
<25	61,778	1,486		2.12	0.05
25 To 29	171,659	3,751		5.90	0.13
30 To 34	243,182	5,572		8.36	0.19
35 To 39	289,525	6,598		9.95	0.23
40 To 44	408,248	6,721		14.03	0.23
45 To 49	508,708	7,695		17.49	0.26
50 To 54	463,565	9,646		15.93	0.32
55 To 59	338,078	6,534		11.62	0.22
60 To 64	210,196	5,764		7.22	0.20
65+	185,254	5,092		6.37	0.17
Not Reported	29,165	1,525		1.00	0.05

Marital Status and Children
Married, Children < 6	225,572	5,474		7.75	0.19
Married, Children >= 6	650,793	8,062		22.37	0.28
Married, Children All Ages	162,791	3,393		5.60	0.11
Married, No Children	994,588	10,942		34.19	0.34
Married, Children Unknown	16,916	1,275		0.58	0.04
Widowed/Separated/Divorced, Children < 6	13,300	1,023		0.46	0.04
Widowed/Separated/Divorced, Children >= 6	137,283	4,514		4.72	0.15
Widowed/Separated/Divorced, Children All Ages	14,683	898		0.50	0.03
Widowed/Separated/Divorced, No Children	355,309	8,582		12.21	0.29
Widowed/Separated/Divorced, Children Unknown	5,795	817		0.20	0.03
Never Married, Children < 6	9,131	1,063		0.31	0.04
Never Married, Children >= 6	18,657	1,606		0.64	0.06
Never Married, Children All Ages	2,854	609		0.10	0.02
Never Married, No Children	234,208	5,167		8.05	0.18
Never Married, Children Unknown	3,897	680		0.13	0.02
Not Reported	63,581	2,497		2.19	0.09

Mean Gross Annual Salary for Full-Time RNs	57,784.86	180.85

Mean Hours Worked per year	2,160.00	5.63

Mean Hours Worked in Last Full Workweek	38.55	0.13

Table B-3. Direct Estimates of State Nurse Population, Standard Error, and Coefficient of Variation by State, 2004

State	2004 Estimated State Nurse Population	Standard Error	Coefficient of Variation (in Percent)
United States	2,909,357	7,001	0.24
Alabama	42,894	472	1.10
Alaska	7,567	420	5.54
Arizona	48,284	910	1.89
Arkansas	23,818	569	2.39
California	255,858	1,734	0.68
Colorado	43,719	695	1.59
Connecticut	42,894	1,199	2.80
DC	9,352	324	3.47
Delaware	12,118	675	5.57
Florida	169,460	2,168	1.28
Georgia	78,898	1,070	1.36
Hawaii	11,146	387	3.47
Idaho	11,068	256	2.32
Illinois	138,092	1,236	0.90
Indiana	64,396	858	1.33
Iowa	37,777	614	1.63
Kansas	29,892	790	2.64
Kentucky	42,971	812	1.89
Louisiana	39,449	731	1.85
Maine	17,785	465	2.61
Maryland	53,061	759	1.43
Massachusetts	89,358	972	1.09
Michigan	103,697	1,406	1.36
Minnesota	60,214	621	1.03
Mississippi	27,303	517	1.89
Missouri	66,551	973	1.46
Montana	9,416	149	1.58
Nebraska	20,026	604	3.01
Nevada	16,206	427	2.63
New Hampshire	18,473	493	2.67
New Jersey	92,425	1,476	1.60
New Mexico	15,027	435	2.89
New York	215,309	2,377	1.10
North Carolina	92,391	1,238	1.34
North Dakota	7,966	206	2.58
Ohio	133,064	1,224	0.92
Oklahoma	29,268	574	1.96
Oregon	34,946	713	2.04
Pennsylvania	164,433	1,834	1.12
Rhode Island	13,847	337	2.44
South Carolina	35,204	741	2.11
South Dakota	10,223	213	2.09
Tennessee	62,266	989	1.59
Texas	168,368	1,363	0.81
Utah	18,169	413	2.27
Vermont	7,137	254	3.56
Virginia	73,526	1,361	1.85
Washington	59,761	913	1.53
West Virginia	17,742	452	2.55
Wisconsin	62,044	640	1.03
Wyoming	4,498	122	2.72

Table B-4. Median Design Effects for Percentages Estimated from the Eighth National Sample Survey of Registered Nurses, 2004

State	Median Design Effect
United States	1.63
Alabama	1.06
Alaska	1.24
Arizona	1.01
Arkansas	0.98
California	1.11
Colorado	1.04
Connecticut	1.05
Delaware	0.97
DC	1.33
Florida	1.08
Georgia	1.03
Hawaii	0.99
Idaho	0.98
Illinois	1.01
Indiana	1.02
Iowa	1.10
Kansas	0.98
Kentucky	1.08
Louisiana	1.04
Maine	1.04
Maryland	1.16
Massachusetts	1.02
Michigan	0.95
Minnesota	1.01
Mississippi	1.01
Missouri	1.05
Montana	0.99
Nebraska	0.99
Nevada	1.07
New Hampshire	1.09
New Jersey	1.00
New Mexico	1.04
New York	1.04
North Carolina	1.01
North Dakota	0.97
Ohio	1.05
Oklahoma	1.02
Oregon	1.03
Pennsylvania	0.98
Rhode Island	1.00
South Carolina	1.03
South Dakota	1.06
Tennessee	0.98
Texas	1.04
Utah	1.02
Vermont	0.98
Virginia	1.13
Washington	1.07
West Virginia	0.93
Wisconsin	1.07
Wyoming	0.95

^[1] Chromy, James R. “Design Optimization with Multiple Objectives”. Proceedings of the Section on Survey Research Methods, American Statistical Association, Arlington, VA, pp. 194-199.

^[2] Since the actual distribution of names differs for each State from the frame distribution used to develop the 250 alpha-segments, some variation occurs between the planned and actual sampling rates.

^² The median design effect was based on all design effects for estimates of proportions computed on selected variables. Using a median instead of a mean avoids the influence of extreme estimates of standard errors, which can occur for some relatively rare attributes. In prior years, an average (mean) design effect was computed for selected variables. Given that the distribution of design effects is skewed to the right, the true median is expected to be less than the true mean.