National Science Foundation Division of Science Resources Statistics

Use of Web-based Data Collection

 

Web-based techniques offer a promising new mode of data collection for the SESTAT surveys. Both the general availability of Web access and the public's familiarity with it continue to grow rapidly. Web-based methods can outperform other methods in some aspects of data collection (Dillman 2000, pp. 352–6, 372–401; Poynter 2001). For example, edit and logic checks can be built into the software; these checks verify that responses agree with earlier responses and ask for clarification if they do not. In addition, respondents can complete the survey at a convenient time, and experience shows that a large proportion of total responses can arrive within days of the instrument's deployment on the Web. Furthermore, SRS has a long and successful history of moving toward Web-based data collection (Meeks et al. 1998).
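
The edit and logic checks mentioned above can be illustrated with a minimal Python sketch. The item names (`employed`, `hours_per_week`, `birth_year`, `degree_year`), the two rules, and the prompt wording are hypothetical and chosen only for illustration; they are not taken from any SESTAT instrument.

```python
from dataclasses import dataclass
from typing import Callable

Answers = dict[str, object]


@dataclass(frozen=True)
class EditCheck:
    """One consistency rule plus the clarification prompt shown when it fails."""
    name: str
    is_consistent: Callable[[Answers], bool]
    prompt: str


def hours_agree_with_employment(answers: Answers) -> bool:
    # Hypothetical rule: a respondent who reported not working should not
    # also report weekly hours worked.
    if answers.get("employed") is False:
        return answers.get("hours_per_week") in (None, 0)
    return True


def degree_follows_birth(answers: Answers) -> bool:
    # Hypothetical rule: the degree year must be later than the birth year.
    birth, degree = answers.get("birth_year"), answers.get("degree_year")
    if isinstance(birth, int) and isinstance(degree, int):
        return degree > birth
    return True  # one of the two items is still unanswered, nothing to compare


CHECKS = [
    EditCheck(
        "hours_vs_employment",
        hours_agree_with_employment,
        "You reported not working but also reported hours worked. Which is correct?",
    ),
    EditCheck(
        "degree_vs_birth",
        degree_follows_birth,
        "Your degree year is not later than your birth year. Please review both.",
    ),
]


def run_edit_checks(answers: Answers, checks=CHECKS) -> list[str]:
    """Return the clarification prompts for every rule the answers violate."""
    return [check.prompt for check in checks if not check.is_consistent(answers)]
```

In a Web instrument, a function like `run_edit_checks` would typically run when a page is submitted, so that the respondent sees the prompt while the earlier answer is still easy to correct.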

Because of low response rates from general populations, however, it is unlikely that Web-based data collection will completely replace the traditional modes of data collection used in the SESTAT surveys. Even so, recent experience suggests that it may provide a useful way to supplement traditional methods in the future. Such use is explored further in the sections below. The first section discusses recent research in Web collection, and the second discusses specific issues related to the three surveys.

Web Collection Research

During the 1999 National Survey of Recent College Graduates Followup Survey (NSRCG Panel), Westat conducted an experiment to test Web-based collection (Collins and Tsapogas 2000). The experimental effort was focused on the panel survey, rather than the NSRCG baseline (new graduate) survey, because the panel members are easier to locate and the 1997 cycle provided information on their access to the World Wide Web and their willingness to respond using such a method. A targeted, rather than representative, sample of 3,500 panel members was selected for the experiment. Only panel members who completed the 1997 survey, said they had Web access, said they would be willing to respond to a Web-based survey, and had a "mailable" address (an address that was complete and had not been identified as invalid during 1997) were eligible for the experiment.

Experimental sample members received two mailings, about a month apart, asking them to complete the Web survey. The mail packages included a letter from NSF and a question and answer sheet, both designed to encourage response, assure respondents of confidentiality, and provide the instructions needed to complete the Web survey, including the sample member's personal identification number (PIN) and individual password. If no completed survey was received via the Web after about 2 months, then the experimental case was sent to CATI followup. Respondents who indicated during CATI followup that they wanted to complete the survey on the Web rather than on the telephone were given about a week to do so. If the survey was not completed within a week, then followup calls were resumed. Of the experimental sample of 3,500 panel members, about 27% used the Web application to respond; an additional 60% of the experimental sample was completed through telephone interviewing and (in a few cases) by mail.

The experience with the NSRCG Panel appears to be in line with what other survey researchers reported at the May 2000 Annual Conference of the American Association for Public Opinion Research in a seminar entitled "Facing the Challenges of the New Millennium." At that conference, several major findings regarding the use of Web-based data collection became apparent. First, regardless of the population being surveyed (college students, marketing managers, businesses and professionals, federal employees, or teachers), researchers employing Web-based data collection were obtaining response rates of about 25% or less. Second, most of the participants who responded to the Web-based survey did so within a very short period of time after the start of data collection. After that initial flurry of response, the number of respondents dropped off dramatically. A second mailing appeared to do little to increase the response rate after the initial flurry, even if the data collection period was extended. Third, several of the researchers noted that item nonresponse rates were lower for the Web-based survey than for some other modes of data collection.

Several conference sessions addressed Internet usage in general. Researchers reported that the number of individuals with access to the Internet is increasing, the population of users is ethnically diverse, and the number of low income and less educated Internet users is increasing. For example, for the first time a substantial number of high school graduates are online. A number of researchers reported that they expect to see response rates for Web-based surveys increase substantially over the next few years.

Westat researchers attended conferences (sponsored by Fed-CASIC in Washington, DC, and by the Association for Survey Computing in London) at which presentations echoed these trends (e.g., see Couper 2001 and Flatley 2001). These impressions also match Westat's past and current experience with dozens of Web-based data collection efforts on a variety of topics and with a variety of respondent populations.

Other reviews of research on Web-based data collection affirm that it is growing substantially in popularity, but mostly in two general situations: (1) surveys that are not particularly concerned about achieving a high response rate from a representative sample or (2) surveys in which all respondents are part of a single organization, association, or network that communicates through Web-based interfaces regularly (Dillman 2000, pp. 354–5; Deutschmann and Faulbaum 2001). Because neither of these situations applies to the SESTAT surveys, it appears that for the time being Web-based surveys will at best provide an alternate mode of data collection to supplement traditional methods.

Moreover, other problems cited in recent literature might be expected to offset the potential advantages of Web-based data collection. For example, differences in Web browser capabilities and line transmission speeds necessarily limit creativity and flexibility in survey design. Ironically, this could mean that the Web-based version of a survey instrument might need to be of a simpler design than the paper questionnaire. Also, some respondents' unfamiliarity with the technical capabilities of their computer equipment may result in inaccurate responses or complete failure to respond. Furthermore, e-mail addresses are less reliable for contact than street addresses or residential telephone numbers: one person often has several, and they change more often. Finally, Web-based data collection can raise privacy concerns that may affect the quality of the data being collected (Dillman 2000, pp. 352–8, 372–6).


Web Collection Issues for SESTAT Surveys

The most effective use of Web collection for the three SESTAT surveys is expected to vary by survey, just as the use of other data collection modes varies by survey. Web collection is dependent on contacting sample members by mail or e-mail to send the request to complete the survey (the Web address, PIN, and password are needed for the survey to be completed). Therefore, the quality of the addresses available for sample members early in the data collection period is critical for this approach. All three SESTAT surveys have been using mail preceded by address updating activities to contact sample members.

The NSCG and the SDR collect data by mail with CATI followup. These studies use the addresses obtained from previous survey cycles as well as address updating activities to reach a large proportion of their samples by mail. Historically, each survey completed over 60% of the sample by mail prior to telephone followup, with additional mail responses received after telephone contacts.

In contrast, the NSRCG has historically collected most data with CATI but has used mail to help locate sample members. The NSRCG has used a mail flier designed to introduce the study and to request the return of an address/telephone update form by mail. These fliers were mailed to addresses obtained from the colleges and universities and from address update activities conducted prior to and during mail flier update collection. In the 1997 NSRCG baseline, initial and followup flier mailings were conducted prior to CATI collection. Completed fliers were received from 26% of those mailed, which was 24% of the sample (fliers could not be mailed to some graduates because no locating information was available for them without extensive CATI tracing activities). The early NSRCG surveys that used mail collection with telephone followup experienced difficulty in contacting recent graduates by mail; final response rates of 68% and 73% were achieved in 1988 and 1990, respectively. By comparison, response rates of 82% to 86% were achieved during the 1993 through 1997 cycles using CATI collection.
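
The two flier percentages reported above for the 1997 NSRCG baseline (26% of those mailed, 24% of the sample) together imply roughly what share of the sample could be mailed a flier at all. The short Python sketch below works through that arithmetic; the function name is hypothetical, and because the published percentages are rounded, the result is only approximate.

```python
def implied_mailed_share(completed_pct_of_mailed: float,
                         completed_pct_of_sample: float) -> float:
    """Share of the sample that was mailed a flier.

    If completed = p_mailed * mailed and completed = p_sample * sample,
    then mailed / sample = p_sample / p_mailed.
    """
    return completed_pct_of_sample / completed_pct_of_mailed


share = implied_mailed_share(26, 24)
print(f"{share:.0%}")  # prints "92%"
```

On these rounded figures, fliers could be mailed to a little over nine in ten sampled graduates, with the remainder lacking usable locating information at that stage.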

In addition to postal mail, e-mail contacts can be used to request completion of a Web survey. E-mail messages could be sent either as the initial contact or as reminders. Using e-mail as an initial contact causes some confidentiality concerns about sending respondents' PINs and passwords by e-mail. There is less control over who reads e-mail messages than who reads sealed letters. Although it is a federal offense to tamper with postal service mail, employers may legally monitor e-mail messages sent to work addresses. Although most postal addresses used in the past have been home addresses, a number of survey e-mail messages will likely reach sample members at work. The availability of e-mail addresses also varies by survey. The continuing panel components of the SDR and the NSCG have e-mail addresses that are available from the previous survey cycle, but the NSRCG does not have access to these e-mail addresses. The NSRCG faces some of the same difficulties in obtaining current e-mail addresses as in obtaining current postal addresses early in the data collection. At this time, there do not appear to be any searches for updated e-mail addresses that can be done in "batch mode" in the way that the National Change of Address can be used for postal addresses. Instead, searches for e-mail addresses are usually done on a case-by-case basis, which can be time consuming and expensive. In the 2001 NSRCG, colleges and universities were asked to supply e-mail addresses along with other contact information on the graduate sampling lists. E-mail addresses were obtained from colleges for about 22% of the sample and were used to send e-mail fliers. In the first batch of e-mail fliers, about 2,500 messages were sent; about 40% were returned as undeliverable. Completed responses were received from 5% of the messages that were sent or about 8% of those not returned as undeliverable.
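
The relationship between the two e-mail flier response rates quoted above (5% of messages sent, about 8% of messages not returned as undeliverable) can be checked with a short Python sketch. The function name is hypothetical, and because the source figures are approximate ("about 2,500", "about 40%"), the counts are illustrative rather than exact.

```python
def email_flier_rates(sent: int, undeliverable_pct: int, completed_pct_of_sent: int):
    """Return (delivered, completed, completion rate among delivered messages)."""
    # Integer arithmetic keeps the illustrative counts exact.
    delivered = sent * (100 - undeliverable_pct) // 100
    completed = sent * completed_pct_of_sent // 100
    if delivered == 0:
        raise ValueError("no messages were delivered")
    return delivered, completed, completed / delivered


delivered, completed, rate = email_flier_rates(2500, 40, 5)
print(delivered, completed, f"{rate:.1%}")  # prints "1500 125 8.3%"
```

The same completed responses look larger as a rate once the undeliverable messages are removed from the denominator, which is why the two percentages differ.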

These experiences demonstrate that the ability to reach sample members by mail and e-mail is very different for the NSRCG than for the other two SESTAT surveys. Because of these differences as well as population differences, Web collection will be discussed separately for each survey.

Survey of Doctorate Recipients

The SDR survey population can be expected to have wide access to and familiarity with the Web, especially because 40% of the sample members are employed in academic institutions. This feature, combined with the ability to reach a large portion of the sample by mail and e-mail, makes Web collection very attractive for this survey. The Web collection test planned by NSF for the 2003 SDR will provide important information for this survey. Additional tests may be needed to determine the most efficient approach for including the Web collection with the other data collection modes. For example, the following questions need to be answered: (1) should the letter requesting completion of the Web survey be sent before the mail questionnaire or with the questionnaire, giving the respondent a choice of response mode? and (2) should the mail questionnaire collection be scaled back drastically (i.e., is it expected that the Web will replace mail collection in the future)? These types of questions may be answered through observations of response patterns and through formal experiments. It is expected that CATI followup will still be needed to reach the response rate goal for the survey.

National Survey of College Graduates

The NSCG survey has also reached a large portion of the sample through mail, but this population may have a smaller proportion of Web users than the SDR population. Requests to complete the Web survey will probably reach most sample members, but the proportion that will complete the survey on the Web is unknown. The experience of the 1999 NSRCG panel Web survey experiment may provide some guidance, although there are differences in the populations. As discussed earlier, that experiment obtained a 27% Web response rate among those who said they had Web access, said they would be willing to respond to a Web-based survey, and had a "mailable" address from the previous cycle. Although this experience is not directly applicable to the NSCG, it would be most relevant for redesign option 2, in which the same panel of NSCG cases selected in the 1990s is continued into the 2000s. The Web collection issues that are dependent on the redesign options are discussed below.

With redesign option 1, a new sample would be selected for the NSCG from the 2000 census data. A large screening effort would be needed to identify the eligible sample members. A Web survey has several desirable characteristics to assist with this screening effort. A significant portion of the sample could be reached by mail using the census addresses with appropriate address updating; however, this would be a smaller proportion than that for the current NSCG panel. The Web survey could be designed to ask the screening questions early in the questionnaire and then follow different paths based on the responses to the screening questions. Because the respondent would not know the path his or her response would follow, the potential for bias (compared with a similarly designed mail survey) would be reduced. However, the ability to change responses for these questions would need to be limited. There are also issues related to field of study, which is one of the primary screening items. Respondents do not always self-select the most appropriate code for field of study and their eligibility could change if the education code is corrected. One way to address this issue is to inform respondents that they might be contacted for more information later and to collect all locating information at the end of the Web survey or screener. Completed NSCG Web screeners/surveys are expected to have lower processing costs than mail surveys and lower collection costs than CATI surveys, assuming that a large enough number of Web responses are collected to offset the Web development costs.
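
The screening design described above (screening questions first, a path chosen from the answers, limited ability to change screening responses, and locating information collected at the end on every path) can be sketched in Python as follows. The item names, the field-of-study list, the eligibility rule, and the section names are all hypothetical and serve only as illustration; the actual NSCG screening criteria are not specified here.

```python
from dataclasses import dataclass, field

# Hypothetical set of eligible fields of study, for illustration only.
ELIGIBLE_FIELDS = frozenset({"engineering", "physical sciences", "life sciences"})

# Items whose answers determine the path and are frozen once routing happens.
SCREENING_ITEMS = ("has_bachelors", "field_of_study")


@dataclass
class ScreenerSession:
    answers: dict = field(default_factory=dict)
    screening_locked: bool = False

    def record(self, item: str, value) -> None:
        """Store an answer, refusing changes to screening items after routing."""
        if self.screening_locked and item in SCREENING_ITEMS:
            raise PermissionError(f"{item} can no longer be changed")
        self.answers[item] = value

    def route(self) -> list:
        """Lock the screening answers and return the sections to present next."""
        self.screening_locked = True
        eligible = (
            self.answers.get("has_bachelors") is True
            and self.answers.get("field_of_study") in ELIGIBLE_FIELDS
        )
        path = ["full_questionnaire"] if eligible else ["closing_items"]
        # Locating information is collected on every path, so that a respondent
        # whose field-of-study code is later corrected can be recontacted.
        path.append("locating_information")
        return path
```

For example, a session that records `has_bachelors=True` and `field_of_study="engineering"` is routed to the full questionnaire, while any later attempt to change `field_of_study` raises an error. Because the routing happens on the server, the respondent does not see which answers lead to which path.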

Redesign options 2, 3, and 4 all involve the possibility of including individuals who were either dropped due to nonresponse during the 1990s or were never in the sample (in the case of option 4). In general, the more extensive the telephone followup needed to locate and contact these sample members, the less efficient Web collection becomes. If extensive telephone contacts are needed, then it is likely to be most efficient to complete the survey by telephone.

National Survey of Recent College Graduates

The NSRCG survey population consists of recent graduates who are expected to use the Web frequently. However, telephone contacts are needed to reach a large proportion of the sample. During the 1999 NSRCG, 62% of the new graduates required some tracing and 43% required intensive tracing. Identifying cases that need tracing early in the data collection period improves the efficiency and results of the tracing activities. Unlike mail or Web collection, telephone collection provides immediate feedback on the graduate's location, thus allowing efforts to be directed to the most useful collection activities for each case—either tracing, questionnaire collection, or refusal conversion. Collection of referral information is another important aspect of tracing in which the telephone is more effective than mail or the Web. Referrals include locating information supplied by the graduates' relatives, friends, or other contacts. During the 1997 cycle, referrals were the second most productive source of addresses and telephone numbers for new graduates. Twenty-four percent of final survey responses were completed at referral telephone numbers. These statistics, however, do not reflect the full impact of referral information. Contacts who cannot provide the graduate's exact telephone number or address may provide other important information such as the city of residence, employer, or college/university that the graduate attends. This information can then be used to obtain the address and telephone number through other sources, such as directory assistance and college contacts. Although these other sources then appear as the source of the final response, the referral information provides a critical link in the tracing chain.

Web Collection Summary

Assuming that issues concerning comparability of data can be resolved, it appears that a Web-based survey may provide a useful way of augmenting the traditional modes of data collection used in the SESTAT surveys, especially the SDR. The NSCG is also expected to benefit from the addition of Web collection, but this will vary depending on the redesign option chosen. The benefits to the NSRCG are less certain due to the extensive telephone contacts needed for tracing graduates.

 
Design Options for SESTAT for the Current Decade: Statistical Issues
Working Paper | SRS 07-201 | June 2007
National Science Foundation Division of Science Resources Statistics (SRS)
The National Science Foundation, 4201 Wilson Boulevard, Arlington, Virginia 22230, USA
Tel: (703) 292-8780, FIRS: (800) 877-8339 | TDD: (800) 281-8749
Last Updated:
Jul 10, 2008