COMMENTS OF ERIC JOHNSON CONCERNING CONSUMER ON-LINE PRIVACY-P954807 

An Examination of the Role of Clickstream Data
in Marketing through the Internet

May 12, 1997

An Advanced Study Project
in the MBA Program at the Wharton School
of the University of Pennsylvania by:

Karen Ando
Lilian Clemena
Michael Robbins
Jennifer Strebeck

Table of Contents

Abstract

Section I: Clickstream Data

Background
Components of Clickstream Data
Site Activity Statistics
User-Level Statistics
Marketing Applications of Clickstream Data
Aggregate Traffic Tracking
Customer Profiling
Semi-Customization
Full-Customization
Clickstream Data Versus Traditional Marketing Research

Section II: Supplier Interviews
Site Activity Tracking Suppliers
BroadVision
W3.com
NetGenesis
Accrue
Auditing and Verification Suppliers
Audit Bureau of Circulations
Conclusion of Findings

Section III: Company Interviews
Survey Results
Starwave Corporation
Public Broadcasting System
Federal Express
Liberty Financial Companies, Inc.
Conclusion of Findings

Section IV: Conclusions

ABSTRACT
The following paper is the product of an Advanced Study Project, fulfilling the requirements of the Marketing and Operations Majors within the MBA program at the Wharton School of the University of Pennsylvania.

The Internet offers great potential for changing the way marketers interact with consumers. One of the main attractions of the Web is the ability to use it as an instrument for one-to-one marketing through the full-customization of product and service offerings. One powerful way in which this potential for personalization is made possible is through the analysis and application of clickstream data.

This paper explores the role of clickstream data in marketing through the Internet. First, the components of clickstream data - namely Web site activity measures and user-level statistics - are identified and explained. A typology of current marketing applications leading up to full-customization is offered. Next, current tools and services available for the analysis and application of clickstream data are examined. The paper then offers profiles of several companies on the leading edge of using clickstream data in their marketing efforts. The paper concludes with a look at current barriers to the application of clickstream data, as well as possible future developments.

SECTION I: CLICKSTREAM DATA

Background
Marketing through the Internet is one of the most talked about topics today, with many companies rushing to include the medium in their marketing strategy, as either a distribution outlet for their products and services, or as an advertising vehicle. One of the main attractions of the Web is the ability to monitor the visitor's actions at the site. Web marketers who capture and analyze this data can then adjust the Web page offerings accordingly, establishing a one-on-one relationship with the consumer. This real-time opportunity for interaction with the consumer replicates or even exceeds the experience a consumer may have with a salesperson at a traditional retail outlet.

Components of Clickstream Data
The opportunities for interaction between a visitor and the Web site are made possible through the collection and analysis of clickstream data. Clickstream refers to the flow of information that Web page visitors knowingly or unknowingly provide about themselves every time they visit a company's site.(1)

Clickstream data fall very broadly into two categories:

  • Site Activity Statistics: Aggregate measures of daily site traffic and usage, including features examined, information requested and files downloaded.
  • User-Level Statistics: Profiling or tracking individual visitors, such as the user's actual movement through the site, or records of his previous visits to the same site.

Site Activity Statistics
Perhaps the most commonly examined statistics are the aggregate site activity numbers, collected automatically by the Web page server in a common log file. These include simple traffic measures, such as the number of hits or files downloaded, the size of requested files, and the frequency of specific error messages occurring during file transmission. The specific types of files downloaded are also identified as text, picture, moving picture, sound, executable file, etc. These aggregate figures can easily be downloaded from the Web server onto the page maintainer's personal computer for analysis. With the proper software, this data can also be analyzed real-time, offering the page maintainer the same options as a retail merchant observing traffic patterns through his store.
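As an illustration of the aggregate measures described above, the following is a minimal sketch that tallies hits, bytes transferred, error responses and file types from a few log lines. It assumes the server writes the Common Log Format; the sample lines, the extension-to-type table and the function name summarize are hypothetical and are not taken from any site discussed in this paper.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical mapping from file extension to the file types named in the text.
FILE_TYPES = {
    ".html": "text", ".htm": "text", ".txt": "text",
    ".gif": "picture", ".jpg": "picture",
    ".mov": "moving picture", ".wav": "sound", ".exe": "executable",
}

def summarize(lines):
    """Return aggregate site activity statistics for an iterable of log lines."""
    hits = 0
    bytes_sent = 0
    errors = Counter()
    types = Counter()
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the expected layout
        hits += 1
        if match["size"] != "-":
            bytes_sent += int(match["size"])
        status = int(match["status"])
        if status >= 400:
            errors[status] += 1  # count each error code separately
        path = match["path"].lower()
        dot = path.rfind(".")
        ext = path[dot:] if dot != -1 else ""
        types[FILE_TYPES.get(ext, "other")] += 1
    return {"hits": hits, "bytes": bytes_sent,
            "errors": dict(errors), "types": dict(types)}

# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    'host1.example.com - - [12/May/1997:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 2048',
    'host1.example.com - - [12/May/1997:10:00:01 -0400] "GET /logo.gif HTTP/1.0" 200 512',
    'host2.example.edu - - [12/May/1997:10:05:00 -0400] "GET /missing.html HTTP/1.0" 404 -',
]
stats = summarize(SAMPLE_LOG)
```

In this sample, three hits are recorded even though only two pages were requested, which anticipates the measurement problems discussed next.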

Modeling site activity from these statistics, however, is an imprecise science. For instance, misleading activity may be reported when the user clicks the "back" button to reload a page, or interrupts the page mid-transmission. In addition, the Web community has yet to agree on standards of interpretation or terminology for the activity statistics.

The measure of "hits," for instance, presents a problem in accurate measurement of Web site audience. A "hit" is recorded for every file - both graphic and text - downloaded off a server. Since most Web pages have multiple graphic and text files on a single page view, a visit to a single location will inevitably result in multiple recorded hits. Hits, therefore, almost always exceed the actual number of "visits," or separate user accesses, to a Web site. As a result, with current Web server technology, the number of visitors as well as their session times - the length of time a user viewed a site - are inferred using statistical modeling techniques on the number of hits.
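The gap between hits and visits can be made concrete with a small sketch. Note that this is not the statistical modeling referred to above; it substitutes a simple and commonly used heuristic, in which hits from the same host are grouped into one visit until a period of inactivity passes. The 30-minute timeout, the sample hits and the names infer_visits and t are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Assumed inactivity gap after which a new visit begins; not from the source.
SESSION_TIMEOUT = timedelta(minutes=30)

def infer_visits(hits):
    """Group (host, timestamp) hits into visits.

    Returns a list of (host, start, end) tuples, one per inferred visit.
    """
    visits = []
    open_visits = {}  # host -> index of that host's latest visit in `visits`
    for host, when in sorted(hits, key=lambda h: h[1]):
        idx = open_visits.get(host)
        if idx is not None and when - visits[idx][2] <= SESSION_TIMEOUT:
            visits[idx] = (host, visits[idx][1], when)  # extend the visit
        else:
            open_visits[host] = len(visits)
            visits.append((host, when, when))  # start a new visit
    return visits

def t(hhmm):
    """Hypothetical helper: build a timestamp on a single sample day."""
    return datetime.strptime("1997-05-12 " + hhmm, "%Y-%m-%d %H:%M")

# Made-up hits: host1 loads three files, leaves, and returns later.
hits = [
    ("host1", t("10:00")), ("host1", t("10:01")), ("host1", t("10:04")),
    ("host2", t("10:02")), ("host1", t("11:30")),
]
visits = infer_visits(hits)
session_minutes = [(end - start).total_seconds() / 60 for _, start, end in visits]
```

Here five hits collapse into three inferred visits. A visit consisting of a single hit has a measured session time of zero, which shows why session times derived from log files are estimates rather than observations.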

Another complicating factor in Web traffic measurement is the process of caching. Many large Internet service providers, such as America Online, store copies of popular Web site pages on their own servers in order to minimize the time it takes for their customers to load the pages. This practice is known as caching. While beneficial to Web surfers, caching results in hits that go unrecorded by the publishers of the popular pages. Says Eileen Kent, Vice President of New Media for Playboy Enterprises, "The AOL audience for us is huge, but I don't see the hits. No piece of software is going to capture your traffic from there because it is not hitting your server."(2)

User-Level Statistics
In addition to the aggregate measures, the Web is able to capture many figures on the individual visitors. There are various ways in which these data exchanges take place. One prevalent Web mechanism is known as the "cookie." The cookie (sometimes called a "magic cookie," reportedly in reference to a power-granting charm from the game Dungeons and Dragons) is simply a text instruction placed by the Web site onto the visitor's computer hard drive. The cookie file allows the Web publisher to store information about the visit, and to retrieve that information upon the user's return visit to the same site.

Cookies are used to track the user's movements or clicks throughout the site, or to retain the user's password and user name at sites that require registration. This enables the user to skip entering the information upon every visit to the site. Cookies can be programmed with expiration dates, to erase themselves off the hard drive after a certain period of time. Companies currently using cookies range from retail sites such as Amazon.com booksellers and CDnow, to content providers such as The New York Times and Disney. Not surprisingly, a visit to the Netscape or Microsoft sites will also leave cookies on your hard drive.
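The mechanics of storing a user name with an expiration period can be sketched as follows. This is only an illustration using the cookie support in Python's standard library; the cookie name user_name, the 30-day lifetime and both function names are hypothetical, and the sketch does not describe how any of the sites named above actually implement their cookies.

```python
from http.cookies import SimpleCookie

def build_set_cookie(user_name, days_valid):
    """Return the value of a Set-Cookie header that remembers a user name."""
    cookie = SimpleCookie()
    cookie["user_name"] = user_name  # hypothetical cookie name
    cookie["user_name"]["path"] = "/"
    # Lifetime in seconds; the browser discards the cookie afterwards.
    cookie["user_name"]["max-age"] = days_valid * 24 * 60 * 60
    return cookie["user_name"].OutputString()

def read_user_name(cookie_header):
    """Parse the Cookie header a browser sends back on a return visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("user_name")
    return morsel.value if morsel is not None else None

# First visit: the site asks the browser to store the name for 30 days.
header_value = build_set_cookie("lilian07", days_valid=30)

# Return visit: the browser sends the stored value back to the same site.
returning_user = read_user_name("user_name=lilian07")
```

The site sends the first value to the browser once, and on each later visit reads the name back from the request instead of asking the user to enter it again.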

While cookies' current uses are benign in nature and mostly unknown to the Web site visitor, the technology has raised some concerns among privacy advocates. Hypothetically, the cookies could violate a user's Internet privacy by allowing enterprising Web page maintainers to keep track of where a user has been and exactly what he has done there, without the user's knowledge or consent. A proposal for cookies standards is currently being reviewed by the Internet Engineering Task Force. The proposal, which can be found at http://ds.internic.net/rfc/rfc2109.txt, would require Web publishers to reveal to users when a cookie was being transmitted. Says Lucent Technologies' David Kristol, an author of the proposal, "The default assumption is that people using third-party cookies are doing bad things and that therefore they should be shut off unless someone decides they're willing to accept them."(3)

While cookies are housed on the user's actual computer (in a file named "cookies.txt" in the browser folder), Web publishers also collect information about individual visitors on their own computer, or server. Visitor information that can currently be collected includes:

Time and date of visit
Domain: The domain is the part of the visitor's email address that identifies the home network or Internet Service Provider (ISP). Issued by the National Science Foundation, the end extensions identify the home network as:

  • a commercial enterprise (.com)
  • an educational institution (.edu)
  • a government body (.gov)
  • the military (.mil)
  • a network (.net), or
  • a non-profit organization (.org)

Additional extensions, such as .jp or .uk, may indicate country of origin. Notably, the increased popularity of the Internet has resulted in the necessity of creating additional extensions. Late 1997 will bring the following new extensions: .store, .firm, .arts, .info, and .nom (for individuals).
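A Web publisher could use these extensions to produce a rough breakdown of its visitors. The sketch below assumes the visitor's host name is available in the server log; the country table is an illustrative subset, and the host names and the function name classify_host are hypothetical.

```python
from collections import Counter

# Extensions and labels as listed in the text above.
DOMAIN_TYPES = {
    "com": "commercial enterprise",
    "edu": "educational institution",
    "gov": "government body",
    "mil": "military",
    "net": "network",
    "org": "non-profit organization",
}

# Illustrative subset of country extensions; the text names .jp and .uk.
COUNTRY_CODES = {"jp": "Japan", "uk": "United Kingdom"}

def classify_host(host):
    """Classify a visitor's host name by its final extension."""
    ext = host.rsplit(".", 1)[-1].lower()
    if ext in DOMAIN_TYPES:
        return DOMAIN_TYPES[ext]
    if ext in COUNTRY_CODES:
        return "country: " + COUNTRY_CODES[ext]
    return "unknown"  # e.g. a bare numeric address with no name

# Made-up visitor host names for illustration only.
visitors = ["dialup7.aol.com", "lab.wharton.upenn.edu", "www.ac.jp", "10.0.0.5"]
breakdown = Counter(classify_host(h) for h in visitors)
```

Visitors that appear only as numeric addresses fall into the "unknown" group, one reason that domain-based segmentation is approximate.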

Browser type: The software the visitor is using to surf the Internet. Netscape Navigator and Microsoft Internet Explorer are among the most prevalent.

Previous or referring site: Such as a search engine, a link or URL from another page, or a banner ad click-through.

Connection type: The speed of the visitor's communications hardware, such as a modem or dedicated line (14.4, 28.8, T-1, etc.).

Marketing Applications of Clickstream Data

Current marketing applications of clickstream data can be broken out into four interrelated approaches - Aggregate Traffic Tracking, Customer Profiling, Semi-Customization, and Full-Customization. The four approaches range in sophistication, with each one building on learning from the previous. Different marketing applications are the result of each level of analysis, with fewer Web page publishers conducting the more sophisticated analyses.

Aggregate Traffic Tracking
This is the simplest of the clickstream analyses, and most Web page publishers currently use clickstream data for this purpose. A marketing application of this analysis is simple Web maintenance, using the traffic figures to optimize site organization for customer satisfaction and ease of use. For instance, the recording label Windham Hill (http://www.windham.com) used traffic analysis to identify patterns among the sampling of artists from their Web site. Windham Hill recognized that the artists at the beginning of the alphabet were getting more clicks than those at the end of the alphabet. As a result, they chose to reorganize the listed order of their artists, listing the artists by the label's promotional priorities rather than by last name.(4)

In another example, National Semiconductor (http://www.national.com/), whose site provides detailed product information on the company's 27,000 parts, examined its visitors by domain and discovered that many of their prospective customers were visiting from international addresses. As a result, the site was reorganized to operate with the use of internationally recognized icons, as opposed to English language instructions.(5)

Aggregate traffic tracking analyses are also used to gauge advertising effectiveness, approximating traditional measures of advertising reach for banner ads by measuring the number of ad exposures (also known as impressions or page views), or number of click-throughs. A click-through reflects the number of people who actually click on a banner ad to visit the advertised site. It is estimated that the average banner ad has click-through rates of less than 10% of total page views.(6)
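The click-through rate described above is a simple ratio of click-throughs to ad exposures. The sketch below shows the arithmetic; the figures are hypothetical and are not drawn from any advertiser discussed in this paper.

```python
def click_through_rate(click_throughs, page_views):
    """Fraction of ad exposures (page views) that led to a click-through."""
    if page_views <= 0:
        raise ValueError("page_views must be positive")
    return click_throughs / page_views

# Hypothetical banner: 20,000 page views and 400 click-throughs.
rate = click_through_rate(400, 20_000)
print(f"{rate:.1%}")  # prints 2.0%
```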

With estimates of 1996 advertising on the Web ranging anywhere from $130 to $300 million,(7) advertisers have been anxious to compare the cost effectiveness of advertising on the Internet with other traditional forms of media, such as television and print. CASIE, the Coalition for Advertising Supported Information & Entertainment (www.commercepark.com/aaaa/bc/casie/guide.html/), for instance, is working with the Advertising Research Foundation to establish criteria for interactive media measurement to create fair, standardized measures of Web site audience.

Customer Profiling
A common marketing application of this level of analysis is market segmentation or building profiles of users in order to give marketers a better understanding of who is interested in their products. A user's actions while surfing the Web provide very little traditional demographic information beyond the simple segmentation variables such as geography, domain, site activity and preferences. As a result, unless the site requires the visitor to complete a registration form and voluntarily provide personal information, in-depth customer profiling requires the Web page publisher to make inferences from the visitors' behaviors and attributes.

InfoSeek, for instance, tracks users by areas of interest, as revealed by keyword searches and previous sites visited. Making assumptions based on this information, InfoSeek then classifies users into familiar categories, such as "Business People, Frequent Travelers, Power Computer Users or Home & Leisure Enthusiasts."(8)

While InfoSeek completes this customer profiling primarily for selling advertising space, other commercial sites may use similar practices to profile their prospective customers by their traditional market segments. Similarly, demographics can be inferred about particular users, depending on whether they are searching for information on sophisticated financial products or child rearing.

Semi-Customization
One example of semi-customization is page view personalization. At the most basic level, some sites, such as The New York Times (http://www.nytimes.com), greet the visitors by their self-provided registration name or e-mail address (e.g., "Hello, lilian07"). In more "collaborative" cases, the user is asked to select from given choices of content to determine his preferences, and then the page is automatically customized accordingly. For instance, the MSNBC news site (http://www.msnbc.com) offers registered users a product called "Personal Front Page". The user selects from a menu of news areas, such as world news, sports and opinion, and these preferences are retained for that user. As a result, the user sees just those news items every time she logs onto the site. Notably, the site also allows for personalization of weather, horoscopes and traffic news, taking into account demographic and regional variables.

Targeted advertising is another common application of semi-customization. With targeted advertising, banner advertisements are designated to be exposed to specific visitors on the basis of the preferences revealed in their Web surfing. Many search engines have sold certain trigger keywords to advertisers for this purpose. For instance, when a visitor to the Web initiates a search with the word "automobile" in the Webcrawler or InfoSeek search engines, the results of the search will be returned with banner advertisements for Saturn, Toyota or Lexus automobiles. Similarly, searches including the words "flowers" and "watches" will bring up ads for FTD and Seiko, respectively.
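Keyword-triggered advertising of this kind amounts to a lookup from search terms to advertisers. The following is a minimal sketch of that idea, using the keyword and advertiser pairs from the examples above; the matching rule and the function name select_banners are assumptions, and the sketch does not represent how Webcrawler or InfoSeek implement their ad selection.

```python
# Keyword-to-advertiser pairs taken from the examples in the text.
TRIGGER_KEYWORDS = {
    "automobile": ["Saturn", "Toyota", "Lexus"],
    "flowers": ["FTD"],
    "watches": ["Seiko"],
}

def select_banners(query):
    """Return advertisers whose trigger keyword appears in the search query."""
    banners = []
    for word in query.lower().split():
        # Strip common punctuation so that "flowers," still matches.
        banners.extend(TRIGGER_KEYWORDS.get(word.strip('.,"?!'), []))
    return banners

ads = select_banners("Used automobile prices")
```

A query containing none of the sold keywords returns an empty list, in which case a site would presumably fall back to an untargeted banner.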

Full-Customization
Full product or service customization is the most sophisticated of the clickstream data applications. In theory, such customization would automatically serve up a personalized product to the user based on that user's demographics or needs, without any effort put forth by the user. In this case, the cookie technology may be employed to record the user's history of preferences and actions over the course of multiple visits to the Web site. This application would fulfill the Web's potential for achieving a one-on-one relationship with the consumer, with each person receiving a unique presentation.

At this time, few if any Web sites are offering this service to its fullest potential. One company that is approaching full-customization through the use of clickstream data is the Internet bookseller, Amazon.com (http://www.amazon.com). Amazon offers a searchable database of more than 2.5 million titles and prices at 10% to 40% off publishers' list prices. The company actively tracks individual customers' preferences through their title searches and buying behavior. This information is then used to send consumers e-mail messages promoting books that fit the consumers' preferences. Notably, Amazon recognizes the value in the application of clickstream data, perceiving itself not as a retailer, but as a type of market research analyst. Says Jeff Bezos, Amazon CEO, "Ultimately, we're an information broker. On the left side we have lots of products, on the right side we have lots of customers. We're in the middle making the connections."(9)

One of the main barriers to the more prevalent offering of full-customization through the Internet may be the huge computing resources required for long-term information tracking, real-time analysis and subsequent product or page view alteration. Computer power may not remain a barrier for long, however, given the rapid evolution in computer capabilities as predicted by Moore's Law, which postulates that computer processing capability doubles every eighteen months.
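The doubling rule cited above implies a simple projection: relative capability after a given number of months is 2 raised to the number of eighteen-month periods elapsed. The sketch below works through that arithmetic; it illustrates the rule of thumb as stated here and is not a forecast.

```python
def projected_capability(months, doubling_period=18):
    """Relative processing capability after `months`, starting from 1.0."""
    return 2 ** (months / doubling_period)

# Three years is two doubling periods, so capability quadruples.
three_years = projected_capability(36)
```

Under this assumption, a tracking and analysis workload that is four times too large for current hardware would become feasible in about three years.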

Clickstream Data Versus Traditional Marketing Research
It is important to note that if you strip away the computer technology that is responsible for capturing it, clickstream data can be viewed as basic marketing research that would be tracked by any marketer using traditional non-electronic outlets. As a result, marketing applications of clickstream data are similar to traditional applications of marketing research.

Take for example a hypothetical clothing chain that operates retail outlets across the country, as well as a Web page offering product information. At the Web site, the clothing chain could be collecting and analyzing the actions of visitors to conduct the kind of clickstream marketing applications described above.

At its retail sites, similar actions would be taking place using traditional marketing research. First, advertising effectiveness would be gauged using the traditional CPM (cost per thousand exposed) methodology. Media planning based on readership/viewership profiles would provide the basis for targeted advertising. Aggregate traffic analysis would be conducted through observation of store traffic, possibly resulting in merchandising decisions such as the sale racks being moved to the front of the store, or matching tops and bottoms being displayed together. Customer profiling would be conducted through exit interviews or tracking callers to Consumer Affairs.

Semi-customization may take place through personalized direct mailings or special product offerings to frequent purchasers through continuity programs. Finally, full-customization would require a full analysis of customer needs and preferences, perhaps resulting in personal fit blue jeans, tailored to fit each customer's body.
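The CPM measure mentioned above applies equally to a print placement and a banner advertisement, which is what allows the two to be compared. The sketch below shows the calculation; the cost and exposure figures are hypothetical.

```python
def cpm(ad_cost, exposures):
    """Cost per thousand exposures."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return ad_cost * 1000 / exposures

# Hypothetical figures: a $5,000 placement that is seen 250,000 times.
banner_cpm = cpm(5_000, 250_000)  # 20.0 dollars per thousand exposures
```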

Recognizing the great potential inherent in clickstream data applications, a whole industry has sprung up to supply tools and strategies for clickstream data analysis. Section II profiles key players in the new industry, and investigates the magnitude of the use of their tools in the marketplace.

SECTION II: SUPPLIER INTERVIEWS

A comprehensive survey was conducted of leading magazines and Web sites to develop a list of companies currently selling clickstream data analysis technology. These companies generally offer two types of products: technology for tracking site activity and advertising management; and technology for auditing and verifying site activity for Web advertisers and publishers. We then reviewed each of the company Web sites and personally contacted them for an in-depth interview, asking the following questions:

1. Name and title of person interviewed.

2. What percentage of companies are collecting clickstream data, analyzing this data, and are actually using it for continuous improvement?

3. Referring to the basic clickstream marketing applications that our group has identified, which specific industries and/or companies are using clickstream data for the following purposes:

a. Aggregate Traffic Tracking, including Web maintenance and advertising effectiveness
b. Customer profiling, including market segmentation
c. Semi-customization, including page view personalization and targeted advertising
d. Full-customization, including product and service customization
e. Other

4. What industries and/or companies are on the leading edge of using clickstream data analysis technology?

First, findings from interviews with four clickstream analysis technology providers are presented: BroadVision, W3.com, NetGenesis and Accrue Software. Then, another aspect of the clickstream data analysis industry is presented from the perspective of companies providing third party verification of the collected data: Audit Bureau of Circulations, BPA International and I/Pro.

Site Activity Tracking Suppliers
BroadVision, Inc. (http://www.broadvision.com)
Through its One-to-One technology product, BroadVision helps companies facilitate electronic commerce, including everything from ordering and payment to customer tracking. Scott Eschenroeder, Channel Sales Manager at BroadVision, felt that this technology was still in the "early adopter" phase. He added that progressive firms within each industry are evaluating the technology, but few are currently doing anything with it today. In its sales efforts, BroadVision is focused on financial services, merchandising/retailers and media/publishing because they believe that these are the industries that will eventually initiate the use of tracking software.

Eschenroeder reports that initially, BroadVision estimated that 100% of its revenues would come from electronic-commerce (merchandising and retailers), but it has become obvious that this will not be the largest segment of Web utilization. Non-traditional retailers with virtually no overhead, such as CDnow and Amazon.com books, are emerging as leading Web retailers while the largest traditional retailing companies are using the Web for other reasons. BroadVision's "sell" is that this technology finally allows companies to use the one-to-one marketing techniques that have been preached over the past few years. Currently, BroadVision's best customers have either just recently begun to use the technology, or are experimenting with it. BroadVision is filling that gap by offering its consulting services to these companies.

W3.com (http://www.w3.com)
W3.com develops and markets a number of products for Web banner management, visitor tracking and Web development. While clickstream data is automatically captured in a computer's log file, President Dr. Andrew Conru estimated that only about 2% of companies on the Internet actually review clickstream data, and even fewer analyze and use the information. From his perspective, it appeared that the companies that did collect the data were able to do so because of the availability of full-time staff resources. Conru felt that most companies on the Internet were struggling to turn a profit and could not finance a full effort for clickstream data analysis. Not surprisingly, companies that provide software for this very purpose have an easy sales pitch, emphasizing the low cost of their technology compared to the high fixed cost investment of a full-time staff.

NetGenesis (http://www.netgen.com)
NetGenesis, founded in 1994, markets site analysis software allowing companies to gather numerous site statistics, including who is visiting, where they are from, what they do when they're there, etc. Last month, Software Magazine cited NetGenesis as "one of the top 25 significant companies delivering enterprise software for Internet applications."(10)

Matthew Cutler, NetGenesis founder and Director of Business Development, further confirmed the low levels of clickstream data analysis. He told us that although all Web sites collect clickstream data in log files, not all companies analyze this information. Cutler estimated that of all companies with Web sites, between 1% and 4% are actually analyzing their data. NetGenesis' customers typically use clickstream data for maintaining and improving their Web sites, and gathering customer information. Beyond that, using clickstream data for market segmentation is as sophisticated as the company has seen. Using this data for market segmentation is seemingly unique to on-line media companies such as Yahoo, and traditional media companies such as Ziff Davis. Cutler also pointed out, however, that financial services and software companies are exceptionally savvy at exploiting new technological opportunities.

Accrue Software, Inc. (http://www.accrue.com)
Accrue Insight was established in February 1996 to address site performance and improvement for users such as Web site managers, media buyers and marketing decision makers. Theresa Marcroft in Marketing at Accrue told us that the site monitoring technology has only been available since early this year, resulting in lots of interest but little meaningful action. She believes that what differentiates those companies seriously interested in the software is whether or not they view the Web as a long-term strategic initiative. In contrast, small companies using the Web to give clients directions to their nearest site, company history and contact phone numbers are not very interested in the power of the new software. One company, in their opinion, that is "pushing the envelope" by using the Web as an integral part of a comprehensive marketing strategy is the Public Broadcasting System (see PBS interview in Section III).

Auditing and Verification Suppliers
As clickstream data becomes more widely applied, particularly for Internet advertising, a growing number of companies have emerged which provide third party verification of circulation data to companies and their advertisers. These companies will evaluate Web sites for advertisers to confirm that the information being claimed (i.e., number of visitors, number of visits, duration of stay, who is visiting) is accurate. Some have even begun to evaluate the technology itself. In January, Web traffic auditor BPA International gave its seal of approval to NetGenesis' software product, net.analysis, for meeting independent standards of Web site auditing.

We interviewed three leading auditing companies, The Audit Bureau of Circulations (ABC), BPA International, and I/Pro. Our initial findings from ABC were also confirmed by BPA International (http://www.bpai.com) and I/Pro (http://www.ipro.com).

Audit Bureau of Circulations (http://www.accessabc.com)
ABC, already established in the publishing and advertising industries, launched its Web department in June 1996 and already has 55 clients. According to Doug Krauss of ABC's Web Site Audit Group, the group loads encrypted software on sites so the Web managers cannot adulterate the data. The software looks for a few hundred "agents" that are like mini-viruses built for the purpose of defrauding the ignorant. These "agents" can trick the counting software into thinking that 300 different users visited the site yesterday, when in fact only 40 actually visited. ABC's software will have an independent count and any discrepancies will be pursued. A full audit report is sent to the company and its advertisers at the completion of the study. ABC told us that of the $170 billion in advertising spent in 1996, only $200 million was spent on the Internet, although that number continues to increase annually.

Conclusion of Findings
Nearly every supplier spoken with told us that very few firms, if any, are using this clickstream data analysis technology to its fullest potential, and the technology is not being pursued fully by any one industry. Within each industry, usually the most aggressive and progressive firm is currently looking at the technology but no one really knows what to do with it at this point. Most companies with Web sites are collecting basic measurement data, but only a handful are using any of this data to their advantage. Most suppliers of analytical software reported that their selling point was to convince these companies that they should be using this technology for all four reasons we outlined in our questionnaire. Not one person could name a client that was extracting the full value of the technology available.

During the course of the interviews, the contacts identified several companies, each in very different industries, which are significantly advanced in the analysis and application of clickstream data. Section III profiles four of these companies who are on the leading edge of clickstream data analysis as part of an overall marketing strategy.

SECTION III: COMPANY INTERVIEWS

In order to assess the state-of-the-art of clickstream data analysis and application, several companies with a significant presence on the Web were interviewed. These companies were mentioned as leaders in clickstream analysis and use by the suppliers of Web-tracking tools and Web-tracking services that were interviewed in the previous phase of the paper.

Four companies were interviewed: Starwave, PBS, FedEx, and Liberty Financial. Each company was asked the following questions:

1. Name and title of person interviewed. How many visitors does your site receive per day?

2. Does your company currently use and/or collect clickstream data and analyze it? If so, how long have you been doing so? If not, do you have plans to do so in the future and when?

3. Are you using a technology/software tools that you purchased to collect and/or analyze the clickstream data? Are you partnering with any providers in particular? If not, are your tools developed in-house?

4. Referring to the basic clickstream marketing applications that our group has identified, which applications are you actively pursuing? Please describe any examples if you can.

a. Aggregate Traffic Tracking, including Web maintenance and advertising effectiveness
b. Customer profiling, including market segmentation
c. Semi-customization, including page view personalization and targeted advertising
d. Full-customization, including product and service customization
e. Other

5. In your opinion, what is the most valuable application of clickstream data analysis?

6. Is the data that you are getting so far, valuable? e.g., used in content decisions or resource allocation?

7. Do you know of any other companies using clickstream analysis for non-advertising purposes as much as you are?

Note that the choice of organizations was in no way meant to be a scientific or all-encompassing representation of those sites actively using clickstream analysis in their marketing strategy, but rather a sampling of leading edge companies in this area to provide us with a better understanding of the most advanced activity on the Web.(11)

An overwhelming amount of clickstream information is within the reach of electronic marketers. But while the data is automatically captured in the log file, the level of analysis and actual use of clickstream data by companies range from none to very sophisticated. Though most of the existing work in clickstream analysis focuses on the advertising related benefits, this section includes interviews with companies that use clickstream analysis in terms of both the advertising and non-advertising applications within a comprehensive marketing and Internet strategy.

During the course of the interviews, it was discovered that some organizations were reluctant to reveal many details about their efforts. This sensitivity to sharing information publicly may suggest that the organizations view their particular Web site activities as a competitive advantage. This belief is further reinforced by the fact that, though the individuals that we spoke with are leaders in this field, when asked to identify other Web sites on the leading edge of clickstream data, none could definitively name even one other site.

Survey Results

Starwave Corporation (http://www.starwave.com)
Starwave Corporation, based in Seattle, Washington, is a Web site publisher that publishes and manages nine sports and entertainment related Web sites such as ESPNet SportsZone (jointly created with Walt Disney's ESPN unit), Family Planet, and the official NBA and NFL sites. Walt Disney recently announced plans to buy about a 5% stake in Starwave.(12)

The ESPNet SportsZone site is often rated as one of the most frequently accessed Web sites, attracting at least as many as 500,000 visitors per day.(13)

According to John Morel, Head of Market Research, Starwave has been collecting clickstream data and performing analysis since almost the beginning of their Web site efforts in 1993. Like some other sites we talked with, Starwave uses a mix of both proprietary technology created in-house plus purchased technology from a tool or service provider such as those mentioned earlier, in order to collect data and perform analysis.

Starwave is active in several of the possible clickstream data marketing applications. Clickstream data are collected and used for basic measurement of Web site usage for advertising and tracking purposes. Data are also used for targeted advertising customization in a limited sense. Starwave's ad customization is not based on gender or age, but on other demographic variables that are readily available such as geography or domain.

Starwave is also using clickstream data for Web maintenance. Both content and design decisions are made based on clickstream analysis in order to improve content offerings and the ease of use of the site. Though Starwave had one early effort underway to use the data for market segmentation purposes, they are currently due to revisit this application. Finally, Starwave's customized product offering efforts are currently under development. For example, Starwave is considering allowing users to determine and notify Starwave of features or content areas that they like most. As a result, Starwave could deliver content to its users based on those desires on either a "push" or "pull" basis.(14)

No other uses of clickstream data were mentioned.

In Mr. Morel's opinion, the two most valuable applications of clickstream data lie first, in using the data to determine what people want to see the most, in order to meet those demands in the most efficient and effective manner, and, second, in enhancing ease of use of the site. Morel concludes that the data collection and analysis at Starwave have been valuable in that they are used in decisions that are made regarding content and internal resource allocation.

Public Broadcasting System (http://www.pbs.org)
PBS (Public Broadcasting System) On-line is the Web site for the Public Broadcasting Network, a nonprofit television broadcast entity based in Arlington, Virginia. There are local PBS member stations in all fifty of the United States, and there are stations that broadcast in Canada. According to Dave Johnston, Manager of Information Technology for PBS On-line, the PBS On-line site attracts over one million visitors per month. PBS On-line is a complex site that features more than 25,000 pages of content.(15)

PBS On-line began to collect clickstream data in earnest with the April 2, 1997 relaunch of their site. However, PBS On-line had been conducting clickstream analysis on a smaller, experimental scale before that point. PBS expects that the data it will now collect will be valuable in decisions on content and resource allocation.

PBS On-line uses a combination of proprietary programming, as well as software from Accrue Software such as the Accrue Insight product, to collect data and conduct clickstream analysis. Using in-house techniques, PBS looks at information from, for example, the common log format file on their Web server, to determine factors such as:

  • Number of hits to home page in a 24-hour or weekly period
  • Number of unique IP addresses over a 24-hour period

PBS recognizes that by using the common log file technique, some of the results may not represent individual users, but at least PBS has an absolute minimum number of unique IP addresses. PBS has found that the actual number of users tends to be 20% to 50% greater than the number of IP addresses represented. Currently, PBS' proprietary programming is not sophisticated enough to distinguish between individual users and sessions, but with Accrue's Insight, PBS says they will be able to analyze their traffic on an individual basis. Since PBS just recently began to use Accrue's software, it was difficult for PBS to share with us any feedback on their experiences using the technology.
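The kind of in-house tally described above can be sketched in a few lines of Python. This is only an illustration, not PBS's actual programming: the function names, the sample log lines, and the choice of "/" as the home page path are assumptions, and the 20% to 50% adjustment simply applies the range PBS reported.

```python
import re
from collections import defaultdict
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+$'
)


def summarize_log(lines, home_path="/"):
    """Return {date: {"home_hits": n, "unique_ips": n}} for CLF log lines."""
    home_hits = defaultdict(int)
    hosts = defaultdict(set)
    for line in lines:
        match = CLF_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that are not valid Common Log Format
        stamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        day = stamp.date().isoformat()
        hosts[day].add(match["host"])
        if match["path"] == home_path:
            home_hits[day] += 1
    return {
        day: {"home_hits": home_hits[day], "unique_ips": len(ips)}
        for day, ips in hosts.items()
    }


def estimated_user_range(unique_ips):
    """Apply the 20% to 50% uplift PBS reported to a unique-IP count."""
    return round(unique_ips * 1.2), round(unique_ips * 1.5)


# Hypothetical log lines, for illustration only.
SAMPLE = [
    '192.0.2.1 - - [02/Apr/1997:20:00:12 -0500] "GET / HTTP/1.0" 200 5120',
    '192.0.2.1 - - [02/Apr/1997:20:00:40 -0500] "GET /frontline/ HTTP/1.0" 200 7300',
    '192.0.2.7 - - [02/Apr/1997:20:01:05 -0500] "GET / HTTP/1.0" 200 5120',
    '192.0.2.9 - - [03/Apr/1997:09:15:00 -0500] "GET / HTTP/1.0" 200 5120',
]
summary = summarize_log(SAMPLE)
```

Because several people can share one IP address (for example, behind a proxy), the unique-IP count is a floor rather than a count of individuals, which is the limitation PBS describes.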

Mr. Johnston pointed out that the situation of PBS On-line may be unusual because PBS is a television broadcast entity whose Web traffic is directly related to "on-air tags," or tags on the PBS television broadcast to check out the Web site for further information. PBS has observed that much of their traffic is generated through and directly linked to these one-time television tags.(16)

For example, the log file following a compelling Frontline program, such as one on the Gulf War, will show that twelve to sixteen seconds after the tag, the server gets "pounded" with people wanting to come onto the Web site. Like other sites, PBS also wants to know more about what happens after a user actually gets to the Web site. They ask themselves questions such as: Are people finding the information that they want? How much time do they spend on a page once they get there?
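One way to look for the surge that follows an on-air tag is to count requests in short time buckets after the moment the tag aired. The sketch below assumes the request timestamps have already been parsed from the log; the function name, the bucket width, and the sample times are illustrative assumptions rather than anything PBS described.

```python
from collections import Counter
from datetime import datetime, timedelta


def requests_after_tag(request_times, tag_time, window_seconds=60, bucket_seconds=4):
    """Count requests per fixed-width bucket in the window following a tag.

    Returns a list of (seconds_after_tag, request_count) pairs, one per bucket.
    """
    counts = Counter()
    for stamp in request_times:
        offset = (stamp - tag_time).total_seconds()
        if 0 <= offset < window_seconds:
            bucket_start = int(offset // bucket_seconds) * bucket_seconds
            counts[bucket_start] += 1
    return [(start, counts[start]) for start in range(0, window_seconds, bucket_seconds)]


# Hypothetical tag time and request offsets (in seconds), for illustration only.
tag = datetime(1997, 4, 2, 21, 0, 0)
offsets = [-5, 1, 13, 14, 15, 15, 30, 70]
times = [tag + timedelta(seconds=s) for s in offsets]
profile = requests_after_tag(times, tag, window_seconds=20, bucket_seconds=4)
# profile == [(0, 1), (4, 0), (8, 0), (12, 4), (16, 0)]
```

In this made-up sample the busiest bucket starts twelve seconds after the tag, the kind of pattern the log file would show after a broadcast mention.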

PBS On-line is currently active in several of the possible clickstream data marketing applications. PBS does collect and use data for basic measurement purposes. However, PBS itself is not involved in any advertising-related activities, including tracking advertising effectiveness, or targeting advertising to specific users. DoubleClick(17) handles most of the sponsorship and advertising efforts for PBS. PBS has chosen to outsource much of its advertising-related activities because advertising currently is not an area of focus.

Interestingly, PBS On-line is not subject to the same FCC sponsorship and advertising restrictions on its Web site as it is on its TV broadcast. PBS On-line is currently undergoing a 60-day test period to gauge public reaction to the sponsorship/advertising banners on their Web site. So far there has not been significant negative reaction. This may be due to several reasons, including the fact that sponsors have been selectively screened by PBS. Also, viewers are used to seeing and hearing during the TV broadcast that certain programs are funded by particular commercial as well as non-commercial entities.

Regarding non-advertising applications, PBS uses clickstream data for maintenance of their Web site. For example, PBS watches for cues, such as whether or not users always begin their travels on PBS's Web site at one place, and incorporates that information into their site strategy. When asked about market segmentation, PBS responded that they are concerned about their viewers' right to privacy. Therefore they have no interest at this time in building user profiles or segmenting their users for such purposes. However, at the Shop PBS portion of the site, users can, on a voluntary basis, give specific information about themselves if they would like a more targeted or customized shopping experience.

Though PBS did not share any specifics about their efforts in product customization, they did mention that customized content offerings could be based on following a user's clicks through the PBS site to deliver an aggregation of certain Web site features that might be appealing to the user. Mr. Johnston pointed out, however, that the processing overhead required by individual, customized efforts enters into the decision to use clickstream. Considering PBS On-line gets more than one million hits per month, the resources required to scale customization efforts up to that level cannot be ignored. Individual customized offerings are not something PBS has introduced at this time, but if intelligent filtering of information is demanded by the user, perhaps PBS would give individual customization more consideration.

In Mr. Johnston's opinion, the most valuable applications for clickstream analysis were twofold: first, to analyze the responsiveness of the server, and second, to determine which content is more valuable to users and to make sure that the content is interesting to the users once they get to it. Mr. Johnston plans to use Accrue's technology to see how long a user spends at one page, what elements they are interested in, and which pages are abandoned. Previously, PBS simply used the common log format or tracked certain environmental variables to get a rough idea of the answers to those questions. The Accrue software enables PBS to listen to their Web site traffic in a wide variety of ways, indicating when a user hits the "stop" button, when the data times out, or when a user abandons a page download.
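A rough, log-based approximation of time spent on a page takes the gap between one page view and the same visitor's next page view. The sketch below is not how Accrue's software works (the paper describes it as listening to Web site traffic); it only illustrates the "rough idea" obtainable from logged page views. The function name, the 30-minute cutoff, and the sample data are assumptions.

```python
from collections import defaultdict


def average_dwell_seconds(page_views, session_timeout=1800):
    """Average seconds spent per page, from (visitor, seconds, path) tuples.

    Dwell time is the gap until the same visitor's next page view. Gaps longer
    than session_timeout are treated as a new visit and ignored. The last page
    of each visit has no following view, so its dwell time is unknown.
    """
    by_visitor = defaultdict(list)
    for visitor, seconds, path in page_views:
        by_visitor[visitor].append((seconds, path))

    totals = defaultdict(lambda: [0, 0])  # path -> [total seconds, view count]
    for views in by_visitor.values():
        views.sort()
        for (start, path), (end, _next_path) in zip(views, views[1:]):
            gap = end - start
            if gap <= session_timeout:
                totals[path][0] += gap
                totals[path][1] += 1
    return {path: total / count for path, (total, count) in totals.items()}


# Hypothetical page views: (visitor id, seconds since midnight, path).
views = [
    ("a", 0, "/"), ("a", 30, "/nova/"), ("a", 150, "/shop/"),
    ("b", 10, "/"), ("b", 100, "/nova/"), ("b", 5000, "/"),
]
dwell = average_dwell_seconds(views)
# dwell == {"/": 60.0, "/nova/": 120.0}
```

The blind spot on the final page of a visit is one reason a tool that observes stopped or abandoned downloads can answer questions that the log alone cannot.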

Mr. Johnston also stressed the consideration of the tradeoff between analysis and collection. Though technology like Accrue's allows PBS to slice and dice the data many ways, in building an effective system that collects and analyzes clickstream data, robustness must be balanced with scalability. This may not be an issue if you have 200-300 users on your site each day. But when PBS receives, as mentioned, more than 1 million hits per month, robustness and scalability become important considerations. Typically, in Mr. Johnston's experience, the more data that is collected, the less scalable the system becomes.

Federal Express (http://www.fedex.com)
FedEx On-line is the Web site for FedEx, the document and small package shipping and transportation company that reported about $10.3 billion in revenues for 1996.(18)

According to Steve Braun, Manager of Electronic Commerce Marketing at FedEx, over 2.5 million visitors go to the FedEx Web site each month. FedEx uses a combination of both proprietary technology and market technology for their clickstream efforts. Mr. Braun could not reveal more specific details about their technology.

FedEx has been collecting and analyzing its clickstream data since they started their site about 2 1/2 years ago. However, Mr. Braun cautioned that just verifying that someone clicked does not tell you a lot from a marketer's point of view, and indicated that the most valuable application of clickstream data is the ability to focus on marketing on a one-to-one basis. As technological tools for such collection and analysis evolve, FedEx continues to upgrade the site's capabilities. Since the beginning of its Web efforts, measurement and assessment have been major priorities for the FedEx site. FedEx is primarily interested in understanding the behavioral path that leads users to their site and the behaviors that characterize their visits. FedEx On-line is currently active in most of the potential clickstream data marketing applications to some degree.

FedEx uses fairly specific tracking tools that enable them to track the behavioral path of a user within their Web site. Mr. Braun's view is that a particular page should drive a particular behavior. Therefore, FedEx uses tools that allow them to see if the behavior did occur, and whether it was completed before exit. For example, the company records behavioral path data from their tracking page on a daily basis. FedEx is able to then sort out whether or not users in fact want to and are able to track packages on this page, or whether they're just looking at the capabilities.

Regarding Web site maintenance, Mr. Braun mentioned that in a truly interactive, virtual space, you want to get customers on the site performing the actions that you would like them to perform. Again, using the example of FedEx's tracking page, one of FedEx's goals is to use the data to enable users to easily track their package. By performing clickstream analysis on the clicks around and on the FedEx tracking page, FedEx can gather feedback on whether or not the intended behavior, going to the page for tracking information and being able to retrieve that information from the Web site, is occurring. FedEx complements its clickstream data analysis with other research on-line, as well as by going back to its customers to talk about their experiences on the site through usability studies and focus groups. This information is then integrated into site marketing and management.
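The question of whether a page drives its intended behavior can be framed as a simple completion rate over visitor sessions. The sketch below is an illustration only: FedEx did not disclose its tools, and the page paths "/tracking" and "/tracking/result", the function name, and the sample sessions are all hypothetical.

```python
def tracking_completion(sessions, entry_page="/tracking", goal_page="/tracking/result"):
    """Measure how often a visit to entry_page is followed by goal_page.

    Each session is the ordered list of pages one visitor viewed before exit.
    """
    reached = 0
    completed = 0
    for pages in sessions:
        if entry_page not in pages:
            continue
        reached += 1
        first_entry = pages.index(entry_page)
        # The goal only counts if it occurs after the visitor reached the entry page.
        if goal_page in pages[first_entry + 1:]:
            completed += 1
    rate = completed / reached if reached else 0.0
    return {"reached": reached, "completed": completed, "completion_rate": rate}


# Hypothetical sessions, for illustration only.
sessions = [
    ["/", "/tracking", "/tracking/result"],       # tracked a package
    ["/", "/tracking"],                           # looked, then left
    ["/", "/rates"],                              # never reached the page
    ["/tracking", "/", "/tracking", "/tracking/result"],
]
result = tracking_completion(sessions)
# result["reached"] == 3 and result["completed"] == 2
```

A low completion rate would not by itself say why visitors left, which is consistent with FedEx pairing clickstream analysis with usability studies and focus groups.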

When asked about market segmentation, we learned that FedEx uses sophisticated customized segmentation tools that allow the page maintainers to perform segmentation directly from their desktop. FedEx's Web strategy does include semi-customized product marketing efforts. For example, users are able to personalize their learning experiences at the FedEx Learning Lab on the Web site. Mr. Braun declined to comment more specifically on these initiatives. Moreover, FedEx's efforts to date have proven valuable for decisions regarding content management and resource allocation. The importance of clickstream data collection and analysis at FedEx is reinforced by the fact that FedEx does have specific resources in their budget allocated for such activities.

Liberty Financial Companies, Inc. (http://www.lib.com)
Liberty Financial, based in Boston, Massachusetts, is an integrated asset accumulation and management organization that manages $48 billion of assets for more than 1.4 million customers worldwide. The company's services include investment management for individuals and institutions through fixed, indexed and variable annuities, private and institutional accounts, and 61 mutual funds.(19)

Mr. Iang Jeon, Liberty's Vice President of Electronic Commerce, could not disclose how many visitors the Liberty Web site attracts per day.

Liberty has been collecting and using its clickstream data since its Web site went live about three months ago. Liberty's personalized and customizable sites (including those of some of its operating units) were built using BroadVision's One-To-One application system. Liberty's Web sites perform real-time analysis on their Web site customers by way of intelligent agent capabilities built into their software. The company is currently active in most of the potential clickstream marketing applications. Liberty's approach is to establish a relationship with its users up front before focusing on the six marketing applications, in order to create an on-line interaction within the context of a user's, or a customer's, specific relationship with Liberty. Then, Liberty's activity in the six applications flows out of that relationship. They see building relationships as the core of their Web strategy.

For example, brokers for annuity products are licensed on a state-by-state basis. As a result, brokers may or may not need to see different products because of their particular state affiliations. Liberty's real-time capabilities make it easier to make the decisions on which products to present by factoring regulatory guidelines into their on-line relationship with the broker. Incorporating what our report defines as clickstream data is only a piece of Liberty's integrated on-line strategy.

Liberty develops relationships with its customers on-line through such services as the real-time access Liberty grants its customers to their accounts over the Web. The accounts store information about the customer's portfolio, including holdings customers may have at other funds, for an on-line consolidated view of a customer's net worth. Then, Liberty can make inferences about the customer's preferences from the on-line relationship in order to trigger different messages to different customers. For example, if Liberty knows that the customer does not have any children, it will not direct the customer to the college planning calculator feature. Also, if a customer has already visited a calculator feature, Liberty knows it does not need to remind him about that particular calculator again.
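The two examples above amount to simple rules over what is known about a customer. The sketch below shows one way such rules could be expressed; it is not BroadVision's One-To-One system or Liberty's implementation. The class, the feature names (including the "retirement_calculator" entry), and the rule table are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    has_children: bool
    visited_features: set = field(default_factory=set)


# Hypothetical feature catalog: feature name -> rule deciding whether it is relevant.
FEATURES = {
    "college_planning_calculator": lambda customer: customer.has_children,
    "retirement_calculator": lambda customer: True,
}


def features_to_promote(customer):
    """List features worth a reminder: relevant to the customer and not yet visited."""
    return [
        name
        for name, is_relevant in FEATURES.items()
        if is_relevant(customer) and name not in customer.visited_features
    ]


no_children = Customer(has_children=False)
returning_parent = Customer(has_children=True, visited_features={"retirement_calculator"})
# features_to_promote(no_children) == ["retirement_calculator"]
# features_to_promote(returning_parent) == ["college_planning_calculator"]
```

Keeping the rules in a table separate from the selection logic is one design choice among many; it makes it easy to add a rule without touching the code that picks messages.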

When asked about what he thought was the most valuable application of clickstream data, again Mr. Jeon pointed out that when considering such issues, the larger, strategic context of how the Web medium is used should be considered. This is especially critical as marketing moves from a traditional product push to a marketing approach that considers the lifetime value of a customer and invests in customer relationships. Compared to traditional, static, marketing research that analyzes information after the fact, the dynamic nature of the Web will enable marketers to reach the segment of one by adapting their actions and relating to customers, real time, on an individual basis.

Regarding the information that Liberty Financial learns from its Web site through the clicks, or behaviors that customers exhibit on-line, Liberty does use the information to make business decisions such as content and resource allocation decisions. Beyond such uses and in further determining how valuable Web efforts are to Liberty, Liberty is very early in the life of its Web site and mentioned that it is too soon to measure progress against specific milestones that it has set for itself.

Summary of Findings
Consistent with findings from the supplier interviews, the company interviews confirmed that very few sites are using clickstream technology to its fullest potential. Because the technologies are relatively new, the leading Web sites in clickstream usage have either just begun to experiment with what can be done, or are just beginning to establish a clickstream analysis methodology.

As emphasized in the discussions with Liberty Financial, it is also important to note that differences in Web site marketing strategy among sites, including clickstream analysis, may be partially attributed to the inherent nature of the relationship between the customer and the business. Not every relationship calls for the same level of involvement or interactivity to be established through the Web, or necessarily the same level of clickstream analysis. As Mr. Jeon of Liberty mentioned, the type of relationship that a consumer packaged goods company wishes to establish with its customers over the Web may be different from that of a financial services company.

Also, because organizations active in using clickstream analysis are in the early stages of use, the degree to which each site is active in each of these activities may or may not be obvious to the user at this time. Most of the efforts in these areas are not apparent to the user, but happen behind the scenes.

The table below summarizes which of the six basic marketing applications each of the companies is involved in.

Participation in Marketing Applications of Clickstream Data Analysis

                                        COMPANY OR ORGANIZATION
MARKETING APPLICATION                   Starwave  PBS    FedEx   Liberty Financial
Aggregate Traffic Tracking
  Web site management and maintenance   ✓         ✓      ✓       ✓
  Advertising effectiveness             ✓         ✓      ✓       ✓
Customer Profiling
  Market segmentation                   ✓                ✓       ✓
Semi-Customization
  Page view customization               ✓         ✓      ✓       ✓
  Targeted advertising                  ✓                        ✓
Full Customization
  Product Customization                                          ✓

Below the surface of each of the six applications, we found that there are varying degrees to which individual organizations collect and use their data for each of the applications. The degrees of engagement can be thought of as falling along a continuum of activity for each of the six applications. A Web site's position on any of the continuums should again be a direct function of the strategic role of the Web site within the greater context of a company's or an organization's marketing plan. As mentioned above, the type of relationship an organization wishes to develop with its customer would be a primary driver of the strategic role of the Web site. For example, if the Web site is meant primarily to engage the customer in a dialogue, that site will be constructed differently than one whose primary purpose is advertising.

Ideally, the table would indicate the degree to which each organization is involved in each of the six marketing applications. However, because the company representatives whom we spoke with were not at liberty to disclose the specific extent of their clickstream efforts to us, and because clickstream analysis and use is less apparent to the user, organization involvement is represented in the table simply by a check for involvement, or no check for no involvement.

Finally, it appears that clickstream activity can be meaningfully differentiated on a time-of-use basis. Some analyses and applications are performed after a user has visited the site. Other analyses and applications may be performed real-time, with the analysis resulting in changes to the site while the user is still involved in the session. Such immediacy approaches true interaction, and begins to imitate the conversation a retailer may have with their customer at a store or on the phone.

SECTION IV: CONCLUSIONS

Our survey of the current state of clickstream data collection, analysis and application has revealed one overall important finding: Of the countless companies on the Internet, very few are using clickstream data for any of the potential marketing applications. Below, six potential reasons for this situation are suggested.

The technology is relatively new.
Even leading-edge companies only recently started to implement and use technology to analyze and apply clickstream data. Still others are experimenting while some have adopted a "wait and see" stance to observe the suppliers' staying power and, hopefully, see costs fall.

Processing capabilities are limited.
Internet use in general still presents problems around accessibility and the speed and reliability of connections, for both the visitor and the Web page maintainer. Some of the most sophisticated clickstream data applications - such as semi- or full-customization - require huge amounts of computer power.

Company resources are limited.
A Web site's utilization of clickstream data is constrained by the amount of resources it can bring to bear. There appears to be a direct relationship between a company's clickstream analysis efforts and the staff and resources dedicated to this effort. This finding was confirmed on two fronts. The leading companies profiled in this paper all have made a conscious effort to put a dedicated staff towards these initiatives. Further, the suppliers of the analysis technology stated that the companies with the most resources are the few leading the charge in using clickstream data. As technology evolves, the automation of such processes may allow for more efficient and effective use of such data, possibly resulting in a general shift down in the cost curve.

The return on investment for all Internet ventures has yet to be measured.
Most companies with a presence on the Web have yet to see their expenditures prove their worth. According to Brian P. Tierney, President and Chief Executive Officer of Tierney & Partners, "It's no secret that the Internet has yet to prove its effectiveness as an advertising medium....While no one doubts the Internet's potential for growth, as with other mediums companies are looking for immediate, quantifiable results and a substantial return on their investment. Fortunately, the industry is making headway in that regard, and with that, we will see an increase in the success of Internet advertising."(20)

Moreover, companies must first concentrate on setting up their Web sites, creating content, and getting the server up before they can focus on clickstream. However, according to Matthew Cutler of NetGenesis, it is the clickstream data itself that can help provide the payback from a Web site (e.g. making it more efficient and valuable), and demonstrate that the payback exists (e.g. provide data that proves ROI is positive).

Consumer privacy issues loom on the horizon.
The Federal Trade Commission (FTC) is currently considering whether sites should notify users about the amount of personal information being collected about them, and whether users should be given a choice as to whether or how their personal information is to be used. Intervention by the government or any other standard-setting body could affect the future applications for clickstream analysis.

Some companies have yet to identify the larger, strategic role of a Web site in an overall marketing program.
A decision to enter the Internet should flow out of the company's comprehensive marketing strategy, which in turn should reflect the organization's overall corporate strategy. Clickstream is one tool that contributes to the effectiveness of the Web site strategy. But a company must know what the goal of the Web site is in order to focus its clickstream data analyses and application.

It is important to note that these barriers may diminish over time. For instance, marketers' familiarity with the Web's capabilities, as well as general computer processing capabilities, will inevitably increase. Similarly, privacy concerns may be reduced with increased company presence and consumer use of the Internet. Also, not all companies face these barriers to the same degree.


1. A formal definition of "clickstream" data, according to CASIE, the Consortium for Advertising Supported Information and Entertainment: "The database created by the date-stamped and time-stamped, coded/interpreted, button-pushing events enacted by users of interactive media, controlling their systems via remote control channel changers, alphanumeric PC keyboards and mice, numeric keyboards of PDAs and similar devices, and voice command of screen media."
2. "Web Searches for a Yardstick", Advertising Age, October 9, 1995.
3. "'Cookie' Proposal Could Hinder On-line Advertising", Advertising Age, March 31, 1997.
4. "Hits that Rate Attention", Inc., September, 1995.
5. Technology Column, The New York Times, April 21, 1997.
6. "Web Ad-mosphere: Still a Challenge", Inter@ctive Week, April 8, 1996.
7. Technology Column, The New York Times, April 21, 1997.
8. "Web Ad-mosphere: Still a Challenge", Inter@ctive Week, April 8, 1996.
9. "A River Runs Through It", The Economist, May 10, 1997.
10. "Top 25 Internet Companies," Software Magazine, April 1, 1997.
11. Considering that high-traffic Web sites make up a small share of all Web sites, and that only sites with a certain level of traffic will find data collection and analysis worthwhile, it is reasonable to say that relatively few Web sites are active in clickstream data collection and analysis, making our census of Web sites an appropriate way to look at what clickstream activity is actually occurring at this time.
12. "When You Wish Upon a Starwave," Business Week, April 14, 1997.
13. "PERSONAL TECHNOLOGY; Tech Media; THIS WEEK'S HIGHLIGHTS," by Charles Haddad, The Atlanta Journal and Constitution, April 6, 1997.
14. Content delivered on a "push" basis refers to technologies that will send content out to users without them having to retrieve content themselves. Popular push technologies include PointCast's broadcast network of information delivered through screensaver technology, or news that may be delivered directly to a user's e-mail account. A "pull" method of content delivery refers to a user actively retrieving the content for themselves.
15. "PBS ON-LINE: Down on the farm with THE AMERICAN EXPERIENCE/ PBS ON-LINE's new Web site," M2 Communications, M2 Presswire, April 11, 1997.
16. Johnson & Johnson experienced similar response with their on-air tags during the Olympics.
17. DoubleClick (http://www.doubleclick.com), the first Internet advertising network, aggregates over 60 Web sites by categorizing them under seven categories, including business and finance, directories, search engines & ISPs, and entertainment (some sites are listed more than once because of overlapping relevance to a category). Sites in the DoubleClick network include the AltaVista Search site, the Travelocity site, and the Dilbert Zone site. Advertisers can then channel their ads through DoubleClick and access sites in certain categories. DoubleClick also provides ad targeting and reporting technology and services.
18. 1996 Federal Express Annual Report, http://www.fedex.com/annual-report/.
19. Liberty Financial Home Page, http://www.lib.com/.
20. "Business Reliance on the Internet Will Jump over 500% by 1999; Survey Reveals Trends in Purchasing and Selling, Direct Internet Access; Cites Security Issues and Slow Access to Information as Obstacles to Usage," PR Newswire, April 9, 1997. The article refers to a study released April 9, 1997 by the American Management Association (AMA) and Tierney & Partners, a Philadelphia-based strategic communications company. The survey is the largest to date on Internet use for business purposes.