Academic Library Statistics Longitudinal Data File

Robert E. Molyneux


1. Introduction

The Academic Library Survey (ALS) is a biennial universe survey of the libraries at the nation's colleges and universities. The summary page of publications in this series shows that the series has existed since at least Fiscal Year 1988 but that data are not available for all years. The purpose of this page and related pages is, eventually, to document and make available a longitudinal file of these data. A longitudinal file is one with data from these institutions over time so that one can examine trends. Library data, including those compiled by the U.S. National Center for Education Statistics (NCES), customarily are published annually, with infrastructures of varying quality for joining these annual compilations. This series is going to be difficult to put together because of year-to-year differences in what has been collected, what the collected variables have been called, and how they have been defined. This work is going on right now and preliminary information presented here is subject to change.

Note well, however, that the purpose of this set of pages is not to replace the NCES documentation for the Academic Library Survey but, rather, to serve as an adjunct for purposes of constructing the longitudinal file. The documentation is excellent and the data series is complex, reflecting the complexity of the institutions measured and the complexity of the changing environment in which they exist. The serious analyst will have to be familiar with this documentation and take it into account.

The ALS differs from other national academic series in several ways. The oldest series of data on libraries is that compiled by the Association of Research Libraries (ARL), which, combined with its predecessor, the Gerould Statistics, will soon reach its 100th year of published data on these unusual libraries. It is the premier set of library data. While these ARL data are the longest running, the ARL members are not typical of the universe of academic libraries, and finding out about the experience of smaller academic libraries is a difficult undertaking. But that is a story for another time.

The ALS series, however, attempts to be a census of all academic libraries and, hence, collects and reports data on more institutions than the ARL members, so when this longitudinal file is complete, we should be able to analyze trends in a broader set of academic libraries than we have been able to heretofore.

2. An Overview of the Academic Library Survey Data Compilation Today

2.1. What is an academic library?

For purposes of the ALS, NCES defines an academic library as having the following characteristics:

Documentation for the Academic Library Survey (ALS) Data File: Fiscal Year 2004 (Public Use), p. 3.

Thus, screening questions about eligibility for the institutions in this dataset are recorded in these data.

2.2. Institutions reporting by year

Here is a summary of the years currently available with a count of the number of institutions reporting each year:

Number of Institutions Reported by Year

Year                              2004   2002   2000   1998   1996   1994
Data available?                   Yes    Yes    Yes    Yes    No     Yes
Number of institutions reporting  3,889  3,964  3,683  3,816  3,792  3,708

The 1996 data and those from the years before 1994 were withdrawn some time ago and have not yet reappeared. One can often glean the number of institutions reporting, however, from other sources. The 1996 data should appear shortly, at which time I will finish this dataset.

The first, rough combined dataset has a total of 19,060 observations (without 1996), each observation being for one institution for one year. A total of 4,733 institutions (HTML, 1MB), by my current reckoning, have ever reported.
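The observation count can be checked against the table of institutions reporting by year. The following is a minimal sketch that only adds up the counts quoted above for the years whose data are available; the dictionary and variable names are illustrative and are not part of the ALS files.

```python
# Counts copied from the "Number of Institutions Reported by Year" table.
# 1996 has a count in the table, but its data file is not yet available.
reporting = {2004: 3889, 2002: 3964, 2000: 3683, 1998: 3816, 1996: 3792, 1994: 3708}
available = {2004, 2002, 2000, 1998, 1994}

# One observation is one institution in one year, so the combined file
# should hold the sum of the counts for the available years.
total_observations = sum(n for year, n in reporting.items() if year in available)
print(total_observations)
```

The sum for the five available years matches the 19,060 observations in the first combined dataset.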

How did we arrive at the 3,889 figure for FY 2004?

There were 4,113 institutions that received grants under Title IV of the Higher Education Act (HEA) of 1965 (which governs the federal student financial aid programs) plus four military academies in the universe surveyed. Apparently, receiving grants under Title IV is the criterion for inclusion in the survey. Of these 4,113, 236 are "child" institutions whose data are provided by the parent institution and are included in the parent institution's data. 224 institutions are "out-of-scope" because:

The data file includes the child institutions for research purposes, so the data file has 4,113 - 224 = 3,889 institutions. However, of those, 475 did not respond and all their data are imputed. Thus, a data file that included only those institutions actually reporting would have 3,414 institutions, but that would include the 236 child institutions whose data are included with the parent. The count of parent institutions actually responding is, then, 3,178.

Documentation for the Academic Library Survey (ALS) Data File: Fiscal Year 2004 (Public Use), p. 2.
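The chain of subtractions behind the FY 2004 figures can be laid out step by step. This is a small sketch using only the counts quoted above; the variable names are illustrative labels, not NCES variable names.

```python
# Counts quoted in the text for FY 2004; the names are illustrative labels.
universe = 4113        # universe figure used in the text's subtraction
out_of_scope = 224     # institutions judged out of scope
nonrespondents = 475   # kept in the file, but all of their data are imputed
children = 236         # "child" institutions reported by a parent

in_data_file = universe - out_of_scope
actually_reporting = in_data_file - nonrespondents
responding_parents = actually_reporting - children

print(in_data_file, actually_reporting, responding_parents)
```

The three results are 3,889, 3,414 and 3,178, the figures given in the text.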

2.3. "span"

"span" is a variable created in the process of recompiling this series into a longitudinal dataset. If an institution reports for all years of the series, span = "A" (that is, All years); if it does not report for all years, span = "S" (that is, Some years). Here is the frequency of these two values:

Frequency of Institutions' Reporting by span, FY 2004

span Frequency Percent
A 3,071 79
S 818 21

Of the 4,733 institutions ever reporting, 3,071 report in each of the five years so far analyzed. 818 of those reporting in 2004 report for only some of those years, while 1,662 of those ever reporting report for only some of the years of the dataset (4,733 - 3,071).
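One way to derive span from the combined file is sketched below. It is an illustration under stated assumptions, not the procedure actually used here: it assumes each observation can be reduced to a pair of institution identifier and survey year, and the function name `assign_span` and the sample identifiers are hypothetical.

```python
from collections import defaultdict

# Survey years currently in the combined file (1996 is not yet available).
YEARS = {1994, 1998, 2000, 2002, 2004}

def assign_span(observations):
    """Map each institution ID to "A" (all years) or "S" (some years).

    `observations` is an iterable of (institution_id, year) pairs,
    one pair for each year in which the institution reported.
    """
    years_seen = defaultdict(set)
    for inst_id, year in observations:
        years_seen[inst_id].add(year)
    # "A" only when the institution's years cover every year in YEARS.
    return {inst: ("A" if seen >= YEARS else "S")
            for inst, seen in years_seen.items()}

# Made-up identifiers, for illustration only.
sample = [(1001, y) for y in sorted(YEARS)] + [(1002, 2002), (1002, 2004)]
spans = assign_span(sample)
print(spans)
```

Because the test is against the full set of years in the file, every span value would need to be recomputed once the 1996 data are added.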

2.4. "span" by Carnegie Classification

The Carnegie Classification Code, according to the ALS FY 2004 documentation (PDF, 1.7MB), p. B-5, changed in 2000 and this change is reflected in the FY 2002 and FY 2004 data. The earlier data have different codes that might be compatible with these. For FY 2004, the 3,889 reporting institutions have the following Carnegie Classifications by these institutions' span:

Frequency of 2004 Institutions' span for Carnegie Classifications
Counts, with row percents:

Carnegie Classification Code                                span A       %  span S       %   Total
15 (Doctoral/Research Universities-Extensive)                  149   98.68       2    1.32     151
16 (Doctoral/Research Universities-Intensive)                  105     100       0       0     105
21 (Master's Colleges and Universities I)                      480   99.59       2    0.41     482
22 (Master's (Comprehensive) Colleges and Universities II)     108     100       0       0     108
31 (Baccalaureate Colleges-Liberal Arts)                       210   98.59       3    1.41     213
32 (Baccalaureate Colleges-General)                            299   99.67       1    0.33     300
33 (Baccalaureate/Associate's Colleges)                         45   93.75       3    6.25      48
40 (Associate's Colleges)                                    1,085   78.34     300   21.66   1,385
51 (Theological seminaries and other specialized
   faith-related institutions)                                 177   90.77      18    9.23     195
52 (Medical schools and medical centers)                        44   97.78       1    2.22      45
53 (Other separate health profession schools)                   56   69.14      25   30.86      81
54 (Schools of engineering and technology)                      47   88.68       6   11.32      53
55 (Schools of business and management)                         27   79.41       7   20.59      34
56 (Schools of art, music, and design)                          64   81.01      15   18.99      79
57 (Schools of law)                                             18   85.71       3   14.29      21
58 (Teachers colleges)                                           4      80       1      20       5
59 (Other specialized institutions)                             40   75.47      13   24.53      53
60 (Tribal colleges and universities)                           19   63.33      11   36.67      30
unclassed institutions                                          94   18.76     407   81.24     501
Total                                                        3,071      79     818      21   3,889

Note: Each cell gives the number of institutions that fall in that cell, followed by its row percent; the last column gives the total number of institutions with that Carnegie code. From this table we see that 149 institutions reporting in FY 2004 have a Carnegie code of 15 (Doctoral/Research Universities–Extensive) and a span of "A", and that these 149 are 98.68% of the 151 Carnegie 15 institutions. Only 2 (1.32%) of these Carnegie 15s have not reported in each of the years examined here.

Of the institutions reporting this year, 79% have reported in all other years. The first seven classifications in the list each had more than 90% of their institutions reporting in all years. Code 40 (Associate's Colleges) had only 78% reporting each year, while the unclassed institutions had a mere 19% that had reported each year, and that is the group that pulls the average down. Larger and more traditional institutions, then, seem to have reported more consistently over time.
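The row percents in the table can be recomputed from the counts alone. The sketch below does this for three rows copied from the table; the function name `row_percents` is hypothetical, and rounding to two decimal places is an assumption chosen to match the table's presentation.

```python
def row_percents(count_a, count_s):
    """Return (row total, percent with span A, percent with span S)."""
    total = count_a + count_s
    pct_a = round(100 * count_a / total, 2)
    pct_s = round(100 * count_s / total, 2)
    return total, pct_a, pct_s

# (span A count, span S count) copied from three rows of the table.
rows = {
    "15": (149, 2),
    "40": (1085, 300),
    "unclassed": (94, 407),
}

for code, (a, s) in rows.items():
    print(code, row_percents(a, s))
```

For these rows the computed values agree with the table: 98.68 and 1.32 for code 15, 78.34 and 21.66 for code 40, and 18.76 and 81.24 for the unclassed institutions.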

2.5. Variables in the Academic Library Survey

The suspected problem with the change in the coding of the Carnegie Classification Code brings out a general problem with any of these longitudinal compilations: changes in data collected and changes in what the variables were called in the documentation. If the name changes over the years, how does the compiler match them up over time? There are many changes in what is reported for variables over time. Categorical variables may have 5 choices one year and 7 the next year. How to match them? There is also a difficult problem with small year-to-year changes in definition discussed elsewhere.

This matching problem also brings up the difference between the role of the compiler and that of the analyst. The compiler's goal is to convey accurately to the ultimate analyst what the data were in the original source with the minimum of interference with the analyst's lonely task: to make sense of the data in spite of everything. The compiler's first rule is Hippocratic: first, do no harm. Nature does not give up her secrets easily and it behooves the compiler not to make the task of the analyst more difficult so great care needs to be taken in making matches over time. My current raw count of variables ever reported in the years of the ALS now available is that there are 505 separate variable names of which, again by my current count, 278 are reporting variables that later were called something else. The final matching will have to await the FY 1996 data's publication but three examples will suffice to give an idea of the scope of the problem. The first two are simple and, alas, too rare:

A Sampler of ALS Variable Names, 2004-1994

Variable measuring... 2004 2002 2000 1998 1996 1994
Name of reporting institution INSTNM INSTNM INSTNM INSTNM N/A INSTNM
Level of institution (4 or more years...etc.) ICLEVEL ICLEVEL ICLEVEL LEVEL N/A LEVEL
Expenditures for bibliographic utilities EXBIB EXBIB EXBIB LC22 N/A LC19

With INSTNM, we can see that the variable name does not change over time, so any merge of years will find the institution names under the same variable name. ICLEVEL is used beginning in 2000; before that, the documentation indicates, the same variable was called LEVEL. This change is pretty straightforward and presents only a minor trap for the compiler because reading the documentation will smoke this one out pretty quickly. It is worth noting in passing that I said the documentation indicates that LEVEL and ICLEVEL are the same variable with different names. It will take examination of the data over time to be sure that this is, in fact, true and that the data are encoded in the same fashion over time, about which there is reason to be skeptical.

Be that as it may, note the third, and more typical, example: the variable measuring the expenditures at these libraries for bibliographic utilities. This obviously is a very important number to get right, especially since this dataset will give us access to trends in this number for libraries other than ARL members. We have a pretty good idea what is going on there, but what about community colleges? Note how we have three variable names over these five years and that the names for 1998 and 1994 are different but of a similar type. In fact, in 1994, LC22 measures total operating expenditures, not expenditures for bibliographic utilities as it does in 1998. How did this odd change in names between these two years happen? These variable names are based on the structure of the survey questionnaire: Part C of the Library questionnaire for each of these years was "Library Expenditures" and these two variables were just in different places in the questionnaire and got different sequential numbers. Only later was an attempt made to match variable names over time and avoid this practice of publishing a dataset with variable names based on the questionnaire.

If the compiler misses these name changes, the analyst might mix total expenditures for 1994 with bibliographic expenditures for other years and come away with an inaccurate understanding of the underlying trends.
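The trap described above suggests keying any renaming on the year as well as the name. The following is a minimal sketch of such a crosswalk, covering only the three variables in the sampler table. Using the 2004 names as the harmonized names is an assumption made for illustration, `harmonize` is a hypothetical helper, and the record values are invented.

```python
# Crosswalk from (survey year, original name) to a harmonized name.
# Only the sampler-table variables are covered; treating the 2004 names
# as the harmonized names is an assumption for this illustration.
CROSSWALK = {
    (1994, "LEVEL"): "ICLEVEL",
    (1998, "LEVEL"): "ICLEVEL",
    (1994, "LC19"): "EXBIB",
    (1998, "LC22"): "EXBIB",
}

def harmonize(year, record):
    """Rename one year's record (a dict of name -> value).

    Names with no crosswalk entry for that year pass through unchanged.
    """
    return {CROSSWALK.get((year, name), name): value
            for name, value in record.items()}

# Invented values, for illustration only.
rec_1994 = harmonize(1994, {"INSTNM": "Example College", "LC19": 10, "LC22": 500})
rec_1998 = harmonize(1998, {"INSTNM": "Example College", "LC22": 12})
print(rec_1994)
print(rec_1998)
```

Because the lookup key includes the year, the 1998 LC22 becomes EXBIB while the 1994 LC22, which measures total operating expenditures, keeps its own name and is not merged into the bibliographic utilities series.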

The ALS contains data both from the main Integrated Postsecondary Education Data System (IPEDS) and data collected for this survey. The first two variables listed above are IPEDS numbers and the last one is not. I would say that year-to-year changes in variable names are a consistent aspect of library data and relatively rare in NCES data. Still, there are 505 places spread over 5 years to go wrong.

So many things change over the years that trying to make sure one is counting the same thing the same way from year to year is, as they say, "non-trivial." Untangling these changes is taking a great deal of time.

And for those interested, it appears to me right now that there will be somewhat fewer than 50 variables reported in each of the five years in a consistent fashion. Most of these seem to be IPEDS-generated data about the parent institution. This kind of inconsistency is actually healthy. Given the rapid changes in libraries since 1994, it is reasonable for librarians to be interested in measuring different things over time.
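A crude first pass at finding candidates for consistently reported variables is to intersect the sets of raw variable names across years. The sketch below uses only the names from the sampler table above; `names_by_year` is an illustrative structure, not the layout of the actual files.

```python
# Raw variable names per year, taken from the sampler table only.
names_by_year = {
    2004: {"INSTNM", "ICLEVEL", "EXBIB"},
    2002: {"INSTNM", "ICLEVEL", "EXBIB"},
    2000: {"INSTNM", "ICLEVEL", "EXBIB"},
    1998: {"INSTNM", "LEVEL", "LC22"},
    1994: {"INSTNM", "LEVEL", "LC19"},
}

# Every distinct name used in any year, and names present in all years.
ever_used = set.union(*names_by_year.values())
in_every_year = set.intersection(*names_by_year.values())

print(sorted(ever_used))
print(sorted(in_every_year))
```

A shared name is only a starting point. The LC22 example shows that the same name can measure different things in different years, and the LEVEL and ICLEVEL example shows that one variable can carry different names, so each candidate still has to be checked against the documentation and the data.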



October 10, 2007