February 8, 1993

Economic Classification Policy Committee
Issues Paper No. 2
Aggregation Structures and Hierarchies

The present U.S. Standard Industrial Classification (SIC) system is hierarchical in that each level of the system provides an aggregation of detail at the next lower level. The SIC system arrays the economy into 11 divisions, which are divided into 83 2-digit major groups, which are further subdivided into 416 3-digit industry groups, and finally disaggregated into 1,005 4-digit industries.

2.1 Classification Hierarchies

Overview

What role should hierarchies have in a revised economic classification system? One way of organizing the discussion of this question is to distinguish between "top-down" and "bottom-up" approaches to constructing a classification system.

A top-down approach is one that begins from the division or 2-digit level--that is to say, from a hierarchical concept of classification--and assigns 3-digit or 4-digit industries to major sectors and industry groups. The top-down approach probably originated in the habit of partitioning the economy into "primary" (agriculture, mining), "secondary" (manufacturing), and "tertiary" (nongoods-producing) sectors, sectors which can be distinguished clearly in the SIC. With a top-down approach, the hierarchy is inherently a vital part of the classification system.

In contrast, a bottom-up approach concentrates on forming 4-digit industries from the individual producing units. Thus, a bottom-up approach implies that the second- or third-level aggregations--grouping industries into hierarchies--involve questions that are primarily extensions of questions that are confronted when the first-level aggregations are determined. If hierarchies of the basic 4-digit industries are to be formed at all, they are formed to facilitate in some way the use of the full classification system.
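The roll-up from 4-digit industries to 3-digit industry groups and 2-digit major groups described above can be sketched as a simple prefix aggregation. This is only an illustration under stated assumptions: the function name roll_up and the dollar values are hypothetical, and the sketch assumes that a group's code is the leading digits of its member industries' codes. Divisions are not covered, because they would require a separate lookup table rather than a code prefix.

```python
from collections import defaultdict


def roll_up(values_by_sic4, digits):
    """Sum 4-digit industry values into groups keyed by the leading `digits` of each code.

    Hypothetical helper: assumes a 3-digit industry group or 2-digit major group
    is identified by the first 3 or 2 digits of the 4-digit industry code.
    """
    if digits not in (2, 3, 4):
        raise ValueError("digits must be 2, 3, or 4")
    totals = defaultdict(float)
    for code, value in values_by_sic4.items():
        if len(code) != 4 or not code.isdigit():
            raise ValueError(f"not a 4-digit code: {code!r}")
        totals[code[:digits]] += value
    return dict(totals)


# Made-up values for three 4-digit industries mentioned in this paper.
shipments = {"2011": 50.0, "2013": 20.0, "3711": 200.0}

industry_groups = roll_up(shipments, 3)  # 3-digit industry groups
major_groups = roll_up(shipments, 2)     # 2-digit major groups
```

In this sketch the higher levels are derived entirely from the 4-digit detail, which corresponds to the bottom-up view; a top-down design would instead fix the 2-digit groups first and assign industries to them.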
Conceptual and Data Use Positions on the Question of Hierarchies

At the International Conference on the Classification of Economic Activities at Williamsburg, Virginia ([1], hereafter, "Williamsburg Conference"), some participants explicitly stated that a hierarchical structure was needed, or implicitly assumed its importance in the context of explaining an existing classification system and/or of constructing a revised system.

From an analytical perspective, Paula Young (Williamsburg Conference [26]), seconded by Allan Young (Williamsburg Conference [25], p. 573), urges that emphasis be accorded to the hierarchical structure. She:

o Stresses that "The present SIC, although designed as a hierarchy, in fact does not provide a hierarchical structure useful for analysis. Past revisions to the SIC seem to have focused on adding (or eliminating) 4-digit SIC's rather than reviewing the overall structure of the classification system..." (Paula Young, Williamsburg Conference [26], p. 433).

o Recommends that "The major emphasis of the next SIC revision should be placed on...the 2-digit and 3-digit structures [which] require a complete review... . The guideline that these 2-digit and 3-digit groupings must be relevant and useful in economic analysis must be enforced" (ibid.).

After reviewing how units should be aggregated in the SIC to industries (these users specify industries based primarily on their inputs), after tracing the aggregation of 4-digit SIC industries into 3-digit industry groups and thence to 2-digit major groups, and after noting anomalies at higher levels of the current hierarchy (such as the absence of 2-digit major groups to reflect the importance of plastics and electronics), these Conference participants see an improved hierarchy as the priority for future improvement of the classification system.
Moreover, because they viewed the formation of the hierarchy as a requirement for analytical use of economically-classified data, they therefore viewed it as a conceptual issue.

Joel Popkin and Stanley Feldman assume the need for a hierarchy, and for each the hierarchy was a cornerstone in improving classification systems. Popkin (Williamsburg Conference [19], p. 160) proposes a relatively major revision for the system's hierarchy, while Feldman (Williamsburg Conference [6], p. 267) stresses that his hierarchy would "represent...an extension of the current SIC structure and not its abandonment." Both devoted major portions of their papers to reviewing the existing hierarchical structure and to explaining their respective proposed new or improved hierarchies.

Other Williamsburg Conference participants took the position that the formation of hierarchies should be of secondary importance in future classification systems. Michael Gort (Williamsburg Conference [8], p. 491) stresses the importance of microdata and relegates the issue of hierarchical structure to a level of "minor interest," except to ensure aggregations that will permit comparisons across countries, or for production of derived data such as income originating, capital stocks, and productivity measures. Again reducing the issue to one of priority, William Johnston (Williamsburg Conference [9], p. 78) discourages the search for a "perfect hierarchy," because he wishes statistical agencies "...to create a data structure that can be aggregated and disaggregated at will."

Jack Triplett (Williamsburg Conference [23], p. 30) also contends that hierarchies should receive diminished attention, presenting several complementary points:

o For some classification concepts, hierarchies may not be appropriate and/or may be of limited use. For example, a classification system that groups by similarity in production processes (a supply-side or production-oriented aggregation concept--see ECPC Issues Paper No.
1) will group establishments that share similar or identical production processes or technologies within a 4-digit industry; these industries are distinguished from one another by their having different production processes. Whether one finds sufficient production similarities across 4-digit industries to justify grouping them into higher-level aggregations by a production aggregation concept is a research issue that should not be decided at the beginning of the classification design process.

o He urges that the perceived need for a hierarchy not be used to eliminate a conceptual classification contender: "Sometimes one hears that a hierarchical structure is a requirement: If a particular classification concept does not yield a hierarchical structure...that concept is rejected. That puts the cart before the horse... ."

o The guiding concept is foremost and the hierarchy--if appropriate--is secondary: "Within the appropriate theoretically consistent framework...does a hierarchical structure make sense? ...I suspect that eventually we will place far less emphasis on classification hierarchies than we have in the past."

In summary, one group urges that major attention be paid to constructing a hierarchy, because they feel the hierarchy is necessary for conceptual reasons and because of the requirements they see for the analytical use of the data. The other group feels that hierarchies have little intrinsic analytical use in themselves, and emphasizes the bottom-up approach to classifications. This latter group's position on the question of classification hierarchies is not so much opposition to classification hierarchies, as such. Rather, they question whether hierarchies should receive a high priority in designing a classification system. The hierarchy may be necessary for pragmatic or statistical reasons, on this view, but it is not the part of the system to be emphasized in developing its conceptual foundation.
Pragmatic and Statistical Considerations

Some participants stressed the practical context for forming hierarchies--the decentralized nature of the U.S. statistical system, the differences among agency missions, and the historically overriding objective to ensure comparability among the data provided/used by the several agencies. Some statistical agencies cannot collect information on 4-digit industries and accordingly use the 2-digit or 3-digit groupings for sampling, coding, and publication. Sidney Marcus (Williamsburg Conference [12], p. 153) comments that the differing reasons for collecting data among Federal statistical agencies are bound to continue to result in differing "data wish lists." Walter Neece (Williamsburg Conference [15], p. 510) points to the different levels of detail needed by various agencies as helping to explain "why we have a hierarchy." If a common hierarchical structure did not exist, each agency might group data uniquely. This could be both an advantage and a disadvantage.

From an international vantage point, William Seltzer (Williamsburg Conference [20], p. 486) explicitly recognizes the past, present, and likely continued future importance of a hierarchical system. Stressing the need to keep in mind more than one function of a given statistical classification scheme, he noted in the context of data grouping/hierarchy: "...its being first, a system for grouping units in an ordered way for the purpose of data collection and storage, and second, a hierarchical system of grouping units for the purpose of aggregation, tabulation and analysis." Recognizing the spread of microcomputers, Seltzer further noted that the groupings no longer need to be the same for both functions.

Rather more implicitly, but just as pragmatically, Shaila Nijhowne and Gérard Côté (Williamsburg Conference [16], p.
410) support the importance of a hierarchy through extensive description of the two classification systems developed in Canada; through a focus, similar to that of U.S. speakers, on the contributions of a hierarchical scheme to both data compilation and to data analyses at several levels of detail; and through frequent reference to the "customary," "traditional," and widespread acceptance of, and reliance upon, hierarchies in classification work.

In summary, this group of participants stressed the practical, statistical, and programmatic needs for a hierarchy. They do not necessarily take a position on the question of the conceptual importance of a hierarchy, or on the top-down compared with bottom-up methods for forming economic classification systems.

The Committee's Position

The Economic Classification Policy Committee anticipates that a hierarchical scheme will probably be reflected in a restructured economic classification system. The issues to be resolved seem to be ones of priority. Is constructing the hierarchy of major importance for the analytical use of classified data? Or should emphasis be placed on the first-order groupings (e.g., 4-digit industries)? Should the development of the hierarchy be a major focus of the Committee's conceptual/economic phase, or is the hierarchy a pragmatic issue relating to sample selection, when agency resources or program objectives limit sample sizes? To put it another way, should the new classification system be constructed from top-down or bottom-up principles?

Request for Comment

The Committee invites comment on the role that should be accorded to a hierarchy in classification scheme(s) of the future, whether the hierarchy is important for the analytical uses of classified data, and whether a top-down or bottom-up approach is most appropriate.
Examples where hierarchies are essential to analysis even when detail is available (for example, analytical use of 2- or 3-digit information even when 4-digit information is available) would help to establish whether hierarchies have conceptual importance, or if they should be considered primarily as pragmatic methods for the presentation of data when detailed estimates do not exist.

2.2 Are Multiple Classification Hierarchies Needed?

The discussion in 2.1, combined with the topics addressed in Issues Paper No. 1, suggests an extension: If a hierarchy is needed, and deemed important in the overhaul of the classification system, then "which one" should it be?

The uses of economically-classified data are numerous and varied and range from production and productivity analysis to market-share analysis. Each of the uses may demand a different aggregation structure, or hierarchy, if the resulting data are to be meaningful and useful. For example, a hierarchy of industries, similar to the present SIC system, could be constructed. Alternatively, one can envision a parallel hierarchy that groups products, wherever made, along use categories (for example, a sweetener aggregate that combines refined sugar and molasses from the sugar industries with corn sweeteners, artificial sweeteners and honey). In some cases, the two principles for forming hierarchies would result in quite different arrays of data.

Does the classification system provide a hierarchy to satisfy those users who need to do production-related analysis, or is it to be designed for demand or market studies? Is it possible to satisfy both groups of users, either within a single system, or by providing alternative hierarchies?

Frank Gollop (Williamsburg Conference [7], p.
497) draws a distinction between supply-side and demand-side hierarchies: "The [Census] Bureau's demand-side responsibility is to preserve and present as much product detail as possible and, in ways that do not compromise legitimate disclosure concerns, provide access to interested users. Users can create whatever demand-side aggregates are required by their research." In contrast, he (ibid.) argued that "...multiple-output production and disclosure concerns prevent the Bureau from abdicating its obligation to form supply-side aggregates... ." On this reasoning, one might conclude that where data users can construct their own aggregations, this reduces the value of traditional classification hierarchies; but traditional systems must be maintained for other situations for which user aggregation is not practical.

The Committee's Position

The ultimate answer to this question may come in part from research among data users. It must come, in large part, as a logical consequence of the answers to similar questions that have been raised in Issues Paper No. 1, that is: Should there be a conceptual framework for economic classifications, and, if so, "which one?"

In parallel with the discussion in Issues Paper No. 1, it is quite conceivable that multiple hierarchies--an industry hierarchy and a product hierarchy--are needed, to correspond to different uses of the data. It is also conceivable that a hierarchy might be appropriate for one use of the data, and therefore for one conceptual basis for economic classification, and not for another. For example, a hierarchy might be more appropriate for a demand-side classification concept than for a supply-side classification concept (see Issues Paper No. 1 for a discussion of classification concepts).

Request for Comment

In addressing the question of hierarchies, the Committee is also interested in public input on the type of hierarchy, if any, that is relevant for the uses of economically-classified data.
The Committee recognizes that views on this point are largely determined by positions taken on issues 1.3, 1.4, 1.5, and 2.1.

2.3 Should the System Have a Flexible Aggregation Structure?

Overview

Some Williamsburg Conference participants proposed going much further along the multiple classifications path, and proposed that the system should encompass completely flexible aggregation schemes. "Essentially, the task is to create a data structure that can be aggregated and disaggregated at will. Instead of trying to find the perfect hierarchy for our data, we should be creating a vast relational database to which we can add fields as needed" (William Johnston, Williamsburg Conference [9], p. 78). Completely flexible aggregation was referred to at the Williamsburg Conference as the "let a thousand flowers bloom" approach: Statistical agencies should provide detailed data and let the user aggregate any way at any time.

Flexible aggregation raises a number of issues, which were sometimes intermingled at the Williamsburg Conference. The following are the major topics and questions.

(a) Enhanced capability for flexible aggregation makes traditional classification systems obsolete

With the advent of the computer age, data users need not be bound to data that are pre-aggregated by a statistical agency. Some data users feel that if microdata sets were made available, users could manipulate them for their own purposes, and would not require traditional classification systems. Michael Gort (Williamsburg Conference [8], p. 483), for example, noted that the publication of data in computer-readable form greatly increases the potential for flexibility in aggregation; accordingly, he stated that "...data producers must focus [on] what data to collect and not how to aggregate them. Aggregation should increasingly be left to the user." While noting the limitations of confidentiality, Henry Kelly (Williamsburg Conference [10], p.
106) strongly echoes the conclusion that "...the problem of aggregation should be left largely to consumers."

A flexible aggregation system for microdata may be the most desirable way to make statistical agency data available to users (see the following section). However, for a number of reasons, traditional economic classification systems will remain relevant, even in the computer age.

First, the availability of microdata is often limited by disclosure problems, which force grouping of micro-observations into product or industry groupings. Even product groupings--the Census Bureau "7-digit" product codes and Current Industrial Reports (CIR) product codes, and Bureau of Labor Statistics Producer Price Index codes, for example--can in some circumstances create disclosure problems in cases where only a small number of producers account for the bulk of production, and, in many cases, detail that is collected must be suppressed for publication. Classification systems are therefore required to provide meaningful publishable groupings.

Second, even when micro-observations are available, some users will prefer that statistical agencies group the data into product or industry categories because of the expense of doing it for themselves, and because they may lack the expertise to know how data are best grouped for their purposes (Courtenay Slater, [22]). Having available a standardized grouping, or classification, system produced by the statistical agency is, therefore, a service to the user and will be a valuable reference point even for users who decide they wish to depart from the standard system in some way.

Third, most statistical programs are sample surveys. Samples may not be large enough to support estimates at the detailed product level; classification systems have traditionally determined how product detail will be collapsed for sampling purposes into more aggregated estimates.
The sampling process often requires stratification by relevant economic variables, among which are the variables employed in economic classification systems (4-digit industry or a higher-level aggregation, for example). Both sample frame development and estimates from sample surveys thus depend on economic classification systems.

To summarize this exchange of views, it is useful to distinguish conceptual considerations from pragmatic considerations, or what is the same thing in this case, requirements arising out of the use of economic data from requirements for collection and processing of data. No one at the Williamsburg Conference disputed the value of flexible aggregation for the use of data: when users need conceptual aggregates, those aggregates should be tailored to specific data uses. But pragmatic considerations--disclosure and sample sizes--mean that the availability of microdata may sometimes be limited, so that methods for collapsing cells and aggregating data will be required. And complete flexibility in aggregation, which is desired by some users, should not preclude "standardized" aggregation systems or hierarchies provided by statistical agencies for other classes of users who will still need them, even in a computer-literate, "thousand flowers" environment.

The Committee's Position

The Committee believes that, regardless of the capabilities of statistical agencies for flexible aggregation, standardized aggregation or classification systems will be maintained by statistical agencies, for the reasons noted above, and does not plan to recommend that they be discontinued. The Committee is accordingly not requesting comment on this matter, though if there are users who disagree with the Committee's analysis, the Committee would like to be so informed and might, depending on the nature of the comment, conduct further investigation.
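The two considerations discussed under (a), user-defined aggregation and the suppression of cells that would create disclosure problems, can be sketched together. This is a minimal illustration, not a description of any agency's procedure: the function name aggregate, the record layout, the product labels, the values, and the rule of suppressing any cell with fewer than three producers are all hypothetical assumptions made for the example.

```python
from collections import defaultdict


def aggregate(records, grouping, min_producers=3):
    """Aggregate detailed records into user-defined categories.

    records: iterable of (producer_id, product_code, value) tuples.
    grouping: dict mapping product_code to a user-chosen category.
    min_producers: hypothetical disclosure rule; a cell with fewer
        distinct producers is suppressed and reported as None.
    """
    totals = defaultdict(float)
    producers = defaultdict(set)
    for producer_id, product_code, value in records:
        category = grouping.get(product_code)
        if category is None:
            continue  # product is outside this user's aggregation
        totals[category] += value
        producers[category].add(producer_id)
    return {
        category: totals[category] if len(producers[category]) >= min_producers else None
        for category in totals
    }


# Made-up detailed records; the sweetener grouping follows the example in section 2.2.
records = [
    ("p1", "refined_sugar", 10.0),
    ("p1", "molasses", 2.0),
    ("p2", "corn_sweetener", 8.0),
    ("p3", "honey", 1.0),
    ("p4", "motor_vehicles", 100.0),
]
by_use = {
    "refined_sugar": "sweeteners",
    "molasses": "sweeteners",
    "corn_sweetener": "sweeteners",
    "honey": "sweeteners",
    "motor_vehicles": "vehicles",
}

result = aggregate(records, by_use)
```

Here the sweeteners cell is published because three distinct producers contribute to it, while the vehicles cell is suppressed because only one does. A different user could pass a different grouping over the same detailed records, which is the sense in which the aggregation is flexible.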
(b) Statistical agencies should develop flexible aggregation capabilities to supplement the "official" classification system or systems

The idea that computer capability has made it technologically feasible to provide multiple groupings of economic data was widely endorsed at the Williamsburg Conference. For statistical agencies to create the capability for groupings "on demand" from users poses issues for statistical agency procedures, for the maintenance of data bases, and for maintaining confidentiality, all of which extend beyond the topic of economic classifications.

First, as Robert McGuckin (Williamsburg Conference [13]) points out, for flexible aggregation to work, statistical agencies must collect and maintain data at the most detailed commodity level possible. There was general agreement at the Williamsburg Conference that statistical agencies need to collect as much detail as possible. As already noted, however, in some cases sampling and other concerns mean that detail must be restricted or collapsed.

Second, product detail needs to be comparable across statistical agencies. If product detail is not comparable, if product codes do not match across agencies, or do not match over time, the ability to use computing power in the aid of flexible aggregation is lost. The Committee has established a task force to work on improving statistical agency product codes. Issues concerning product classification codes are discussed in Issues Paper No. 8.

Finally, flexible aggregation may be limited by what is collectible (refer to Issues Paper No. 3 for discussion of this point). Assembling data for the "tourism" industry provides an example of collectibility limitations. In principle, it is not difficult to determine which economic activities should be included in a "tourism" category. However, many of these categories require information that is not readily collected from establishments.
Hotels cannot readily distinguish business travel from tourism; restaurants may be unable to distinguish local residents from those diners who come from out of town, let alone divide the latter into tourists and business travelers. Because the required information cannot be collected from establishments, tourism has not been accepted as an SIC industry. However, data can be collected from a variety of sources, and a classification system that can encompass alternative aggregations, such as "tourism," may be desirable. The Committee is mindful of the frequent requests for such data, and would like to consider the classification aspects of these requests in its work, though it also emphasizes that establishing such categories involves issues that extend well beyond the limits of the Committee's work on classifications.

Flexible aggregation might also be useful to manage the difficulties caused by vertical integration. In some industries in the present SIC, different stages of processing are distinguished and kept separate. For example, two meat products industries exist (SIC's 2011 and 2013) which are distinguished by the degree of vertical integration. In other cases, different stages of processing are placed together. For example, production of automobile bodies is placed with final assembly of complete cars in SIC 3711. There are some purposes for which complete vertical integration (for example, providing data on "fishing" from the fish farm or the fishing boat all the way to the retail store) would be useful. For other purposes, such an aggregation obscures too much detail. Though a consistent treatment of vertical integration in a conceptually-based classification system is desirable, any decision on the treatment of vertical integration will satisfy one need for data at the expense of the other. A flexible aggregation system might facilitate construction of alternative groupings across stage-of-processing lines.
It might be useful to form alternative hierarchies that cut across the economic system to integrate different stages of processing for the same commodity or that integrate production with different levels of distribution.

The Committee's Position

The Committee is conscious of requests for alternative aggregations that have been presented in various SIC revisions of the past. It believes that more flexible systems should be developed in an attempt to meet more of the needs that have been expressed but have not been satisfied. At the moment, this seems an issue that will require substantial work, innovation, and research. The precise methods for implementing flexible aggregation have not been worked out, nor have all of the implications that flexible aggregation raises for data-collection, confidentiality, and statistical agency operations been developed. This is a task that lies ahead.

Request for Comment

The Committee requests comments from potential users on the subject of alternative aggregations that might be valuable and the uses for which they are necessary. It would be helpful in the Committee's work if proposals for aggregations similar to those mentioned above (i.e., tourism, or fishing) were accompanied by proposals for how the requisite data should be collected and stored. The Committee is interested in innovative proposals on these lines, but feels that all aspects of the data-collection system, and not just a proposal for the classification part, must be considered before implementing a flexible system.

References

[1] Bureau of the Census, Proceedings, International Conference on Classification of Economic Activities, Williamsburg, Virginia: U.S. Department of Commerce, November 6-8, 1991. 587 pages. (Referenced in the following as: Williamsburg Conference.) Available from Bureau of the Census, Room 2069-3, Washington, D.C. 20233.
[2] Aanestad, James, "Floor Discussion," Williamsburg Conference, p. 502.
[3] Berndt, Ernst R., "Alternative Approaches to Classifying Economic Activity: Panelist Remarks," Williamsburg Conference, pp. 494-5.
[4] Cremeans, John, "Floor Discussion," Williamsburg Conference, pp. 425-6.
[5] Dulberger, Ellen, "Floor Discussion," Williamsburg Conference, p. 503.
[6] Feldman, Stanley, "Moving Towards an Improved Micro-Based Classification of Economic Activity: A Report on the Structure and Implementation of a Revised SIC and Related Economic Clusters," Williamsburg Conference, pp. 217-359.
[7] Gollop, Frank M., "Panel Discussion on Alternative Approaches to Classifying Economic Activity: Discussion," Williamsburg Conference, pp. 496-501.
[8] Gort, Michael, "Comments on Industrial Classifications Systems for the Future," Williamsburg Conference, pp. 491-3.
[9] Johnston, William B., "Four Questions Economic Data Should Answer," Williamsburg Conference, pp. 75-9.
[10] Kelly, Henry, "The Global Economy in the Year 2000," Williamsburg Conference, pp. 94-108.
[11] Manser, Marilyn E., "Alternative Approaches to Classifying Economic Activity," Williamsburg Conference, pp. 520-30.
[12] Marcus, Sidney, "Floor Discussion," Williamsburg Conference, p. 153.
[13] McGuckin, Robert H., "Multiple Classification Systems for Economic Data: Can a Thousand Flowers Bloom? and Should They?" Williamsburg Conference, pp. 384-407.
[14] ______, "Floor Discussion," Williamsburg Conference, p. 426.
[15] Neece, Walter, "Panel Discussion on Alternative Approaches to Classifying Economic Activity: A Government Perspective: Discussion," Williamsburg Conference, pp. 509-17.
[16] Nijhowne, Shaila, and Côté, Gérard, "Industrial Classifications: Widening the Framework," Williamsburg Conference, pp. 408-20.
[17] Parker, Robert P., "Floor Discussion," Williamsburg Conference, p. 502.
[18] Popkin, Joel, "Monitoring Economic Performance in the 21st Century: Measurement Needs and Issues," "Floor Discussion," Williamsburg Conference, pp. 43-72, 80.
[19] ______, "Recommendation and Description of the Principles Upon Which a Revised Industrial Classification System Should be Built," Williamsburg Conference, pp. 157-216.
[20] Seltzer, William, "Alternative Systems for the Year 2000: Discussion," Williamsburg Conference, pp. 484-7.
[21] ______, "Floor Discussion," Williamsburg Conference, p. 503.
[22] Slater, Courtenay, "Floor Discussion," Williamsburg Conference, p. 462.
[23] Triplett, Jack E., "Perspectives on the SIC: Conceptual Issues in Economic Classification," Williamsburg Conference, pp. 24-37.
[24] ______, "Floor Discussion," Williamsburg Conference, pp. 504-5.
[25] Young, Allan H., "Where Do We Go from Here?," Williamsburg Conference, pp. 573-4.
[26] Young, Paula C., "Is the Present SIC Adequate?" Williamsburg Conference, pp. 427-35.