National Science Foundation Division of Science Resources Statistics

SESTAT Survey Design and Methodology


Sample Designs

Probability sampling was used for the SESTAT component surveys to create a defensible basis for generalizing from the combined samples to the SESTAT target population. Selecting a probability sample means establishing a frame through which members of the target population can be identified -- either directly or via linkage to other units (e.g., individuals to housing units). Because scientists and engineers constitute only a small percentage of the U.S. population, it would have been cost prohibitive to survey the entire nation to identify members of the target population who could be interviewed. Instead, a multiple-frame sampling approach to surveying U.S. scientists and engineers was used (see "Component Surveys").

While the SESTAT surveys have somewhat different sample designs, owing to differences in the information available from the sample frames, they share common goals. The samples are designed to enhance the reliability of the estimates through oversampling. Stratification for oversampling takes into consideration field and level of S&E degree and demographic characteristics. Increased sample is allocated to women, underrepresented minorities, the disabled, and individuals in the early part of their careers.

Sample Design: 1993 National Survey of College Graduates (NSCG)

The sampling frame for the 1993 National Survey of College Graduates (NSCG) was constructed from the 1990 Decennial Census Long Form sample. Sampling was restricted to Long Form sampled individuals with at least a bachelor's degree who, as of April 1, 1990, were age 72 or younger. A total of 4,728,000 Long Form sampled individuals met these criteria; 214,643 were selected for the NSCG sample.

The sample design was a two-phase, stratified random sample of individuals with at least a bachelor's degree. Phase 1 consisted of sampling from the Long Form using a stratified systematic sample. Phase 2 consisted of subsampling the Long Form cases, in which a stratified design with probability-proportional-to-size, systematic selection within strata was used. The Long Form sampling weight was used as the size measure in selection to come as close as possible to a self-weighting sample within Phase 2 strata.
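The Phase 2 selection can be illustrated with a minimal sketch of systematic probability-proportional-to-size selection within a single stratum, assuming the first-phase weight is the size measure. The function name `pps_systematic` and the toy weights are illustrative only and do not come from the survey documentation. Under this scheme a case with Phase 1 weight w is selected with conditional probability n*w/W, where W is the stratum total of the weights, so its combined weight w / (n*w/W) = W/n is the same for every case in the stratum. That is the sense in which the subsample is close to self-weighting.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Pick n units by systematic PPS; return the indices of selected units.

    sizes: positive size measures (here, hypothetical first-phase weights).
    Assumes no size exceeds the sampling interval total/n; larger units
    would normally be set aside as certainty selections beforehand.
    """
    rng = rng or random.Random()
    cum = list(accumulate(sizes))      # running totals of the size measure
    interval = cum[-1] / n             # sampling interval
    start = rng.random() * interval    # random start in [0, interval)
    picks, i = [], 0
    for k in range(n):
        point = start + k * interval
        # advance to the unit whose cumulative range contains the point
        while i < len(cum) - 1 and cum[i] <= point:
            i += 1
        picks.append(i)
    return picks


# Toy stratum: six cases with made-up Phase 1 weights, select two of them.
weights = [20.0, 20.0, 25.0, 30.0, 15.0, 10.0]
selected = pps_systematic(weights, 2, random.Random(1990))
combined_weight = sum(weights) / 2     # identical for every selected case
print(selected, combined_weight)
```

Because the list order determines which cases can be selected together, sorting the stratum before selection acts as an implicit stratification.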

Phase 2 strata were defined according to demographic characteristics, highest degree achieved, occupation, and sex. The maximum sampling rate was 3.00 percent, but most strata were sampled at rates of between 2.03 and 2.82 percent. Successively lower rates were used for each of the following groups: whites with bachelor's or master's degrees and employed in a science and engineering (S&E) occupation; nonwhites with bachelor's or master's degrees and employed in a non-S&E occupation; non-foreign-born doctorate recipients; and whites with bachelor's or master's degrees and employed in a non-S&E occupation.

The unweighted response rate for the 1993 NSCG was 78 percent, yielding 148,932 interviews with individuals who had at least a bachelor's degree and identifying an additional 19,224 cases not eligible for interview (e.g., those who were deceased, over 75, not an S&E, no longer living in the U.S.). Interview data were then used to determine whether the respondents fit into SESTAT's target population of scientists and engineers -- a total of 74,693 of the survey respondents fit the description and were incorporated into the SESTAT integrated database.

Sample Design: 1993 National Survey of Recent College Graduates (NSRCG)

The 1993 National Survey of Recent College Graduates (NSRCG) used a two-stage sample design. Educational institutions were sampled in the first stage, and bachelor's and master's graduates were sampled from within these institutions for the second stage. The Integrated Postsecondary Education Data System (IPEDS) was used to construct the sampling frame for educational institutions.

IPEDS is a system of surveys sponsored by the National Center for Education Statistics to collect data from all U.S. educational institutions whose primary purpose is postsecondary education. The frame for the NSRCG was restricted to IPEDS data records associated with four-year U.S. colleges and universities offering bachelor's or master's degrees in one or more S&E fields. Of these institutions, 196 had such large numbers of the nation's S&E graduates that they were selected with certainty.

From the remaining institutions, 79 were selected using systematic, probability-proportional-to-size sampling after the file was sorted by ethnicity, region, public/private status, and presence of agricultural courses. The measures of size were devised to account for the rareness of certain fields of study and for the incidence of Hispanic, African-American, and foreign students.
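One common way to separate certainty units from the rest of a probability-proportional-to-size frame is sketched below. It assumes the usual rule that a unit is taken with certainty when its size measure is at least the sampling interval; the source does not state the exact rule used for the NSRCG, so both the rule and the name `split_certainty` are assumptions made for illustration.

```python
def split_certainty(sizes, n):
    """Split a frame into certainty and noncertainty units.

    A unit is a certainty selection when its size is at least the sampling
    interval (remaining total size / remaining sample size). The check is
    repeated because removing large units shrinks the interval.
    Returns (certainty_indices, remaining_indices, n_remaining).
    """
    remaining = list(range(len(sizes)))
    certainty = []
    while remaining and n - len(certainty) > 0:
        interval = sum(sizes[i] for i in remaining) / (n - len(certainty))
        big = [i for i in remaining if sizes[i] >= interval]
        if not big:
            break
        certainty.extend(big)
        remaining = [i for i in remaining if i not in big]
    return sorted(certainty), remaining, n - len(certainty)


# Toy frame of eight institutions with made-up size measures, sample of 3.
sizes = [100, 50, 5, 5, 5, 5, 5, 5]
certainty, noncertainty, n_left = split_certainty(sizes, 3)
print(certainty, noncertainty, n_left)
```

The remaining sample size would then be drawn from the noncertainty units, for example by systematic selection from the sorted file.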

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between April 1, 1990 and June 30, 1992. From the 273 selected institutions, 25,785 students were selected using stratified sampling. Sampling rates ranged from 1 in 144 (for those receiving bachelor's degrees in psychology, or degrees in nonspecified fields) to 1 in 2 (for those receiving bachelor's and master's degrees in materials engineering). Of the 25,785 selected students, a total of 19,426 eligible scientists and engineers responded to the 1993 NSRCG. Another 2,670 sample members were deemed ineligible.

Sample Design: 1993 Survey of Doctorate Recipients (SDR)

The Survey of Doctorate Recipients (SDR) is a longitudinal survey of doctorate recipients. Samples of new cohorts are added to the base sample every two years. The sampling frame for the SDR is constructed from the Doctorate Records File (DRF), a historical database derived from the Survey of Earned Doctorates, an ongoing census of all U.S. doctorate recipients since 1942.

The SDR frame is restricted to two groups: (1) S&E doctorates under 76 years of age who are U.S. citizens and (2) non-U.S. citizens who plan to remain in the U.S. after they receive their degree. For the 1993 SDR, the sampling frame contained 568,726 individuals, 49,228 of whom were sampled.

A two-phase sample design has been used for the SDR since 1991. Before then, the SDR design was a highly stratified, simple random sample of doctorate S&Es. Strata were defined on the basis of frame information and a "cohort" variable associated with the year the doctorate was received.

Beginning in 1991, the number of strata was reduced, primarily by collapsing over the pre-1991 cohorts and then introducing new stratification variables to facilitate oversampling of the disabled and certain minority groups. Also at that time, a new 1991 cohort sample was selected using the Phase 1 stratum definitions and sampling rates. This new cohort was added to the older cohort samples to create the Phase 1 sample for the 1991 SDR and subsequent years. This Phase 1 sample was then restratified using the newer stratum definitions. Because minority and disability information was not known for older cohorts, a combination of frame and survey responses was used to assign members of the older cohorts to Phase 2 strata. These Phase 2 sample cases were then subsampled in 1991 (and to a lesser extent in 1993) to yield the desired sample allocations for each stratum. For the 1993 SDR, the sample for the new cohort (1992-93 graduates) was selected as an independent supplement to the older cohort sample. The new cohort sample was selected using stratified simple random sampling.

The sampling rates and stratum definitions were comparable to those of the Phase 2 older cohort sample. The overall 1993 sampling rate was 8.8 percent, but rates for individual sampling strata ranged from 4.5 percent to 66.7 percent. Strata sampled at 66.7 percent included Native American female doctorate recipients in the earth/ocean/atmospheric sciences and handicapped, female, doctorate recipients in electrical/electronics/communications engineering. Strata with the lowest sampling rates were white males with doctorates in economics or other social sciences. A total of 39,495 eligible scientists and engineers responded to the 1993 SDR.

Sample Design: 1995 National Survey of College Graduates (NSCG)

Subsamples for the 1995 NSCG were drawn from a frame consisting of the combined samples of eligible respondents to the 1993 NSCG and the 1993 National Survey of Recent College Graduates (NSRCG).

Cases that overlapped surveys were removed from the 1995 NSCG frame according to a "unique linkage rule." Those 1993 NSCG cases who had a chance of being selected for the 1993 NSRCG or 1993 Survey of Doctorate Recipients (SDR) were removed from the frame; 1993 NSRCG cases that had a chance of being selected for the 1993 SDR were also removed from the frame; and finally, 1993 NSCG or 1993 NSRCG cases known to have a chance of being selected for the 1995 NSRCG or the 1995 SDR were removed from the frame.

The frame was stratified by demographic group, highest S&E degree, highest S&E major, and sex. A sample consisting of 62,004 individuals for the mail survey was selected using probability-proportional-to-size sampling within these strata. The 1993 analysis weight was used as the size measure. Of these cases, 403 were deemed ineligible, resulting in an initial sample size of 61,891.

There were 41,522 eligible respondents to the mail survey. Nonrespondents were subsampled for computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up, again using stratified probability-proportional-to-size sampling. Across all data collection modes, a total of 53,448 eligible scientists and engineers responded to the 1995 NSCG.

Sample Design: 1995 National Survey of Recent College Graduates (NSRCG)

The 1995 design for the NSRCG was similar to the 1993 design. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling, but a composite size measure, designed to facilitate oversampling of rare domains, was used for the first time in 1995. The 1991-1992 Integrated Postsecondary Education Data System (IPEDS) was used to construct the sampling frame for institutions. The rules for including institutions were the same as the 1993 rules.

One hundred and two institutions were so large that they had to be selected with certainty, and then 173 institutions were sampled from the "less certain" portion of the frame after stratifying by region, public-versus-private, and percentage of S&E degrees. Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1992 and June 30, 1994.

From the 266 responding institutions, 23,771 students were selected using stratified sampling. Strata were defined according to the year the degree was received, major field, degree status, and Native American status. Initial nonrespondents and those who had to be traced were subsampled to yield the desired sample size of 21,000 cases. A total of 16,338 eligible scientists and engineers responded to the 1995 NSRCG. 1,630 sample members were deemed ineligible.

Sample Design: 1995 Survey of Doctorate Recipients (SDR)

The sample design for the 1995 SDR was much like that of the 1993 SDR with some exceptions. In 1995, a sample of new cohorts -- those earning doctorate degrees at U.S. institutions between July 1, 1992 and June 30, 1994 -- was added, and the previous sample of doctorate recipients (degrees received January 1, 1942 to June 30, 1992) was subsampled. The combined sample was about the same size as the 1993 sample. New versus old cohorts were sampled at similar rates within strata defined by demographic group, field of study, and sex.

Probability-proportional-to-size sampling was used to select each stratum sample. The sampling weight was used as the size measure for old cohorts and a value of "1" was used as the size measure for the new cohort population. An initial sample of 49,829 cases was selected for the mail survey, 31,243 of which responded. Nonrespondents were subsampled for CATI follow-up, again using stratified, proportional-to-size sampling procedures. A total of 11,327 mail nonrespondents were followed up by CATI. Across all modes of data collection, 35,370 eligible doctorate recipients completed interviews.

Sample Design: 1997 National Survey of College Graduates (NSCG)

The 1997 NSCG sample was drawn from a frame consisting of eligible respondents to the 1995 National Survey of College Graduates (1993 NSCG and 1993 NSRCG Panel) and the 1995 National Survey of Recent College Graduates (1995 NSRCG Panel). The survey contractors, the Census Bureau and Westat, Inc., administered this survey.

  • Census Bureau: administered the 1993 NSCG portion
    This portion of the sample was drawn from a frame consisting of original 1993 NSCG respondents who were also respondents to the 1995 NSCG.
  • Westat, Inc.: administered the 1993 and 1995 NSRCG Panel portions
    This portion of the sample was drawn from a frame consisting of respondents to the 1993 and 1995 NSRCG.

The Census portion of the 1997 NSCG included 45,877 individuals who were initially sent the mail survey. Mail nonrespondents were sent to computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up. A total of 33,435 cases were deemed complete mail interviews (respondents less ineligibles and noninterviews). The remaining complete interviews were obtained by CATI (7,067) and CAPI (2,004), for a total of 42,506 eligible respondents.

The Westat portion of the 1997 NSCG had a total sample size of 15,048, all of which were sent to CATI. Of these, 12,307 were eligible completes and 485 were ineligible, giving an unweighted response rate of 85% for the 1997 cycle (eligible completes as a percent of eligible sample). About 4 percent of the completed Panel surveys were received by mail.
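The unweighted response rate quoted for the Westat portion can be reproduced from the counts in the paragraph above. The sketch below assumes that the eligible sample is simply the total sample less the known ineligibles, with no adjustment for cases of unknown eligibility; the function name is illustrative.

```python
def unweighted_response_rate(sample_size, eligible_completes, ineligible):
    """Eligible completes as a fraction of the eligible sample."""
    eligible_sample = sample_size - ineligible
    return eligible_completes / eligible_sample


# Counts reported for the Westat portion of the 1997 NSCG.
rate = unweighted_response_rate(15_048, 12_307, 485)
print(f"{rate:.1%}")  # 84.5%, reported as 85% after rounding
```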

NSCG cases that overlapped the other SESTAT surveys were removed from the 1997 NSCG frame according to a "unique linkage rule." This meant that those 1995 NSCG cases who had a chance of being selected for the 1993 or 1995 or 1997 NSRCG or SDR were removed from the frame, as were individuals now over 75 years of age.

Sample Design: 1997 National Survey of Recent College Graduates (NSRCG)

The 1997 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. The institution sample was the same as used for the 1995 cycle, with 102 certainty selections and 173 selected with probability proportional to size. Of the 275 institutions, 1 was ineligible and 274 responded (100% response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1994 and June 30, 1996. A total of 14,057 graduates were sampled. Of these, 10,452 were eligible and completed the survey and 1,032 were ineligible. The unweighted response rate was 82%. This was both the graduate level response rate and the overall response rate because the institution response rate was 100%. Of the completed surveys, about 3 percent were received by mail.

Sample Design: 1997 Survey of Doctorate Recipients (SDR)

The sample design for the 1997 SDR was much like that of the 1993 and 1995 SDR. In 1997, an oversample of new cohorts -- those earning doctorate degrees at U.S. institutions between July 1, 1994 and June 30, 1996 -- was added, and the previous sample of doctorate recipients (degrees received January 1, 1942 to June 30, 1994) was subsampled. The combined sample was 55,367. New and old cohorts were stratified by demographic group, field of study, and sex.

Probability-proportional-to-size sampling was used to select each stratum sample. For strata consisting of rare groups, cases were selected with certainty to maintain sufficient sample size for analysis. An initial sample of 55,367 cases was selected for the mail survey, of which 38,309 responded. Of these cases, 35,667 were deemed complete interviews, with the remainder either permanently or temporarily out-of-scope. Nonrespondents were subsampled for CATI follow-up: permanent random numbers (PRNs) were assigned to all cases in the sample, and subsampling was then performed only on eligible pending cases. A total of 15,809 mail nonrespondents were followed up by CATI. CATI data collection generated 8,285 complete interviews. Across all modes of data collection, 35,667 eligible doctorate recipients completed interviews.
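A subsampling step driven by permanent random numbers might look roughly like the following sketch. The source does not describe the selection rule, so the threshold rule used here (keep a pending case when its PRN falls below the subsampling rate) is an assumption, as are the function names, the seed, and the case identifiers.

```python
import random


def assign_prns(case_ids, seed=0):
    """Give every sample case a permanent random number in [0, 1)."""
    rng = random.Random(seed)
    return {case_id: rng.random() for case_id in case_ids}


def subsample_pending(prns, pending_ids, rate):
    """Keep the eligible pending cases whose PRN is below the rate.

    Assumed rule, for illustration only. Because the PRNs never change,
    raising the rate later only adds cases to the earlier subsample.
    """
    return [case_id for case_id in pending_ids if prns[case_id] < rate]


# Hypothetical sample of 1,000 cases; the even-numbered ones are still pending.
prns = assign_prns(range(1000), seed=1997)
pending = [case_id for case_id in range(1000) if case_id % 2 == 0]
followup = subsample_pending(prns, pending, 0.5)
print(len(followup))
```

Assigning the numbers to all cases up front, rather than only to nonrespondents, means the subsample does not depend on the order in which cases become pending.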

Sample Design: 1999 National Survey of College Graduates (NSCG)

The 1999 NSCG sample was drawn from a frame consisting of eligible respondents to the 1995 National Survey of College Graduates (1993 NSCG, 1993 NSRCG Panel, 1995 NSRCG Panel) and the 1997 National Survey of Recent College Graduates (1997 NSRCG Panel). Two survey contractors, the Census Bureau and Westat, Inc., administered this survey.

  • Census Bureau:  administered the 1993 NSCG and 1993 NSRCG Panel portions
    This portion of the sample was drawn from a frame consisting of original 1993 NSCG or 1993 NSRCG respondents who were also respondents to the 1997 NSCG.
  • Westat, Inc.:  administered the 1995 and 1997 NSRCG Panel portions
    This portion of the sample was drawn from a frame consisting of respondents to the 1995 and 1997 NSRCG.

The Census portion included 39,989 individuals who were initially sent the mail survey. Mail nonrespondents were sent to computer-assisted telephone interview (CATI) or computer-assisted personal interview (CAPI) follow-up. A total of 28,275 cases were deemed complete mail interviews (respondents less ineligibles and noninterviews). The remaining complete interviews were obtained by CATI (7,273), for a total of 35,548 eligible respondents, giving an unweighted response rate of 90%.

The Westat portion had a total sample size of 14,527, all of which were sent to CATI. Of these, 11,397 were eligible completes and 357 were ineligible, giving an unweighted response rate of 81% for the 1999 cycle (eligible completes as a percent of eligible sample). About 4 percent of the completed Panel surveys at Westat were received by mail, and another 9% were completed on the Web.

NSCG cases that overlapped the other SESTAT surveys were removed from the 1999 NSCG frame according to a "unique linkage rule." This meant that those 1999 NSCG cases who had a chance of being selected for the 1993, 1995, 1997, or 1999 NSRCG or SDR were removed from the frame, as were individuals now over 75 years of age.

Sample Design: 1999 National Survey of Recent College Graduates (NSRCG)

The 1999 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. The institution sample was the same as used for the 1997 cycle, with 102 certainty selections and 173 selected with probability proportional to size. Four institutions were added to the sample with certainty to compensate for the undercoverage problem. Of the 279 institutions, 1 was ineligible, 1 declined to provide a graduate list, and 277 responded (99.6% unweighted response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1996 and June 30, 1998. A total of 13,918 graduates were sampled. Of these, 9,984 were eligible and completed the survey and 987 were ineligible. The unweighted response rate was 78.8%. The overall response rate was 78.5%, the product of the unweighted institutional response rate of 99.6% and the unweighted graduate rate of 78.8%. Of the completed surveys, about 5 percent were received by mail.
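The overall response rate for a two-stage design is the product of the stage-level rates, which can be checked with the 1999 figures. In this sketch the institutional rate is recomputed from the reported counts (277 responding out of 279 sampled, less 1 ineligible), while the graduate-level rate is taken as reported; the variable names are illustrative.

```python
# Stage 1: responding institutions over eligible institutions.
institution_rate = 277 / (279 - 1)

# Stage 2: graduate-level unweighted response rate, as reported.
graduate_rate = 0.788

# Overall rate: a graduate is observed only if both stages respond.
overall_rate = institution_rate * graduate_rate

print(f"{institution_rate:.1%}")  # 99.6%
print(f"{overall_rate:.1%}")      # 78.5%
```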

Sample Design: 1999 Survey of Doctorate Recipients (SDR)

The 1999 SDR sample design divides the 1999 SDR sampling frame cases into three mutually exclusive groups: the old cohort, the nearly new cohort, and the new cohort. These groups were defined by the doctoral degree academic years. Frame cases with doctoral degrees earned prior to July 1, 1992 were included in the old cohort, cases with doctoral degrees earned between July 1, 1992 and June 30, 1996 were included in the nearly new cohort, and cases earning a doctoral degree between July 1, 1996 and June 30, 1998 were included in the new cohort.
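The cohort definitions amount to a simple date classification, sketched below. The offsets from the survey year (7, 3, and 1 years) are inferred from the 1999 cutoffs above and also match the cutoffs this document gives for the 2001 SDR; whether they hold for any other cycle is an assumption. The function name `sdr_cohort` is illustrative.

```python
from datetime import date


def sdr_cohort(degree_date, survey_year):
    """Classify a frame case as 'old', 'nearly new', or 'new'.

    Cutoffs follow the academic-year boundaries described for the 1999
    and 2001 SDR; other survey years are outside what the source states.
    """
    old_cutoff = date(survey_year - 7, 7, 1)       # old: before this date
    nearly_new_end = date(survey_year - 3, 6, 30)  # nearly new: through here
    new_end = date(survey_year - 1, 6, 30)         # new: through here
    if degree_date < old_cutoff:
        return "old"
    if degree_date <= nearly_new_end:
        return "nearly new"
    if degree_date <= new_end:
        return "new"
    raise ValueError("degree date falls after the frame reference period")


print(sdr_cohort(date(1992, 6, 30), 1999))  # old
print(sdr_cohort(date(1992, 7, 1), 1999))   # nearly new
print(sdr_cohort(date(1997, 5, 15), 1999))  # new
```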

The total sample size for the 1999 SDR was 40,000. Of these cases, 4,000 were allocated to the new cohort to ensure that the sampling rate of the new cohort was at least 15 percent higher than that of the old cohort. The remaining 36,000 sample cases were then divided so that the nearly new cohort would have a 10 percent higher sample allocation than the old cohort.

The 1999 SDR used a stratified design in which strata were defined by demographic group, degree field, and sex. The strata were formed by the multiway cross of these variables. A number of sample cases was allocated to each stratum, following a seven-step process. For strata where the allocated sample size was equal to the frame size, all cases were selected for the sample. For all other strata, sample cases were selected using the probability-proportional-to-size (PPS) selection method separately for each cohort group (with the sampling weights as the size measure).
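The distinction between take-all strata and PPS-sampled strata can be expressed through the selection probability of each case. The sketch below is a simplified illustration with made-up weights, not the survey's allocation procedure, and the seven allocation steps are not reproduced because the source does not list them. Capping probabilities at 1 is a shortcut; a fuller treatment would set such cases aside as certainty selections and recompute the rest.

```python
def selection_probabilities(weights, allocation):
    """Per-case selection probability within one stratum.

    weights: size measures (here, hypothetical sampling weights).
    allocation: number of sample cases allocated to the stratum.
    """
    if allocation >= len(weights):
        # Allocation equals the frame size: every case is selected.
        return [1.0] * len(weights)
    total = sum(weights)
    # PPS: probability proportional to the case's share of total size.
    return [min(1.0, allocation * w / total) for w in weights]


stratum_weights = [10.0, 20.0, 30.0, 40.0]
take_all = selection_probabilities(stratum_weights, 4)
sampled = selection_probabilities(stratum_weights, 2)
print(take_all)
print(sampled)
```

When no probability is capped, the probabilities in a sampled stratum sum to the allocation, which is a quick consistency check on the size measures.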

From an initial sample of 40,000 cases, 27,269 responded by mail. Of these cases, 26,216 were deemed complete interviews, with the remainder either permanently or temporarily out-of-scope. Nonrespondents in the mail phase (14,407 cases) were followed up by CATI. CATI data collection generated 5,102 complete interviews. Across all modes of data collection, 31,318 eligible doctorate recipients completed interviews.

Sample Design: 2001 National Survey of Recent College Graduates (NSRCG)

The 2001 design followed the design of the earlier NSRCG surveys. In the two-stage sampling approach, educational institutions were again sampled in the first stage using probability-proportional-to-size sampling. In addition to the same institution sample used for the 1999 cycle (106 certainty selections and 173 selected with probability proportional to size), one institution was added to the sample with certainty to deal with the undercoverage. Of the 280 institutions, 2 were ineligible, 2 declined to provide graduate lists, and 276 responded (99.3% response rate).

Each sampled institution was asked to provide a roster of students receiving a bachelor's or master's degree in an S&E field between July 1, 1998 and June 30, 2000. A total of 13,516 graduates were sampled. Of these, 9,887 were eligible and completed the survey and 937 were ineligible. The unweighted response rate was 80.1%. The overall response rate was 79.5%, the product of the unweighted institutional response rate of 99.3% and the unweighted graduate rate of 80.1%. Of the completed surveys, about 7 percent were received by mail.

Sample Design: 2001 Survey of Doctorate Recipients (SDR)

The 2001 SDR sample design divides the 2001 SDR sampling frame cases into three mutually exclusive groups: the old cohort, the nearly new cohort, and the new cohort. These groups were defined by the doctoral degree academic years. Frame cases with doctoral degrees earned prior to July 1, 1994 were included in the old cohort, cases with doctoral degrees earned between July 1, 1994 and June 30, 1998 were included in the nearly new cohort, and cases earning a doctoral degree between July 1, 1998 and June 30, 2000 were included in the new cohort.

The total sample size for the 2001 SDR was 40,001. Of these cases, 4,000 were allocated to the new cohort to ensure that the sampling rate of the new cohort was at least 15 percent higher than that of the old cohort. The remaining 36,001 sample cases were then divided so that the nearly new cohort would have a 10 percent higher sample allocation than the old cohort.

The 2001 SDR used a stratified design in which strata were defined by demographic group, degree field, and sex. The strata were formed by the multiway cross of these variables. A number of sample cases was allocated to each stratum, following a seven-step process. For strata where the allocated sample size was equal to the frame size, all cases were selected for the sample. For all other strata, sample cases were selected using the probability-proportional-to-size (PPS) selection method separately for each cohort group (with the sampling weights as the size measure).

From an initial sample of 40,001 cases, 26,702 responded by mail. Of these cases, 25,814 were deemed complete interviews, with the remainder either permanently or temporarily out-of-scope. Nonrespondents in the mail phase (13,086 cases) were followed up by CATI. CATI data collection generated 5,552 complete interviews. Across all modes of data collection, 31,366 eligible doctorate recipients completed interviews.

National Science Foundation Division of Science Resources Statistics (SRS)
The National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA
Tel: (703) 292-8780, FIRS: (800) 877-8339 | TDD: (800) 281-8749
Last Updated: Jul 10, 2008