The Chronicle of Higher Education: Information Technology
From the issue dated July 5, 2002


'Superarchives' Could Hold All Scholarly Output

Online collections by institutions may challenge the role of journal publishers

By JEFFREY R. YOUNG

Professors' office computers hold a wealth of original content: research articles, data sets, field notes, images, and the like. Some of the material will be published in journals months or years after it is created, but even then it will probably be available only to the journals' subscribers. The rest will never see the light of day.

Several colleges are now looking to share more of that work by building "institutional repositories" online and inviting their professors to upload copies of their research papers, data sets, and other work. The idea is to gather as much of the intellectual output of an institution as possible in an easy-to-search online collection. One college has called its proposed repository a "super digital archive."

Proponents say such superarchives could increase communication among scholars and spark greater levels of innovation, especially in the sciences. Some imagine a day when every research university gives its research away through the Web, allowing scholars and nonacademics to mine it for ideas and information.

"The whole power of science is the power of shared ideas, not the power of hidden ideas," says Paul Jones, associate professor of information and library science at the University of North Carolina at Chapel Hill. "Science advances when there's a free exchange of ideas. We move faster by being open. We know this, but we have disincentives right now to openness."

One of those disincentives, many scholars believe, is the scholarly-journal system, which, critics argue, has a monopoly on scholarly output that leads to ever-soaring subscription prices. Institutional repositories could create an alternative to journals, fans of the archives say.

Journal publishers, meanwhile, say that such repositories are unlikely to supplant their publications. Journals, they argue, are still the best means of distributing and preserving research.

And even some of those supporting the new archives recognize the difficulty of getting professors to change their habits. To make the archives work, professors would have to take the initiative to submit their materials, and, in some cases, persuade journals they work with to allow them to place their articles in a repository.

Establishing a Model

The most ambitious and most closely watched superarchive is being developed at the Massachusetts Institute of Technology. It is called DSpace, and its goal is to collect research material from nearly every professor at the institute -- though participation will be voluntary.

"We want to give faculty the infrastructure that supports alternative forms of publishing," says MacKenzie Smith, associate director of technology for MIT's libraries. Over the past two years, officials at MIT have been building a set of software tools to support the repository, and to make it easy for professors to submit material. Those tools are nearly ready, and four departments and programs at MIT will be testing them this summer.

Beginning this fall, MIT plans to open the archive to all of its professors. "We don't know how quickly it's going to catch on," says Ms. Smith, though she adds that professors have been enthusiastic about the concept.

The biggest obstacle may be inertia. Professors are busy, and they may not use the repository if they perceive it as more work, even if they like it in principle, says Ms. Smith.

"We've gone to a lot of trouble to make the submission process very simple for faculty," she adds.

Librarians don't plan to actively police what goes into the repository, though they do offer rules for what kind of work should be included. Among those rules: Work must be "scholarly or research oriented," and it must be "complete and ready for 'publication.'" Some departments might choose to have someone serve as editor of their department's DSpace contributions, to read over them before they are placed in the archive.

To keep the new repositories from causing information overload, librarians are making sure that the materials are tagged with "metadata" codes to help search engines navigate the sea of data. Such tags include, for example, keywords, publishing information about the article (if applicable), and an indication of the language in which the article is written. The DSpace software will add the tags using information supplied by users, and some departments may have graduate students or staff members handle the virtual paperwork for professors.
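To make the idea of metadata tags concrete, here is a minimal sketch of how a tagged item could be represented and filtered. It is an illustration only: the class name ItemMetadata, its fields, the matches helper, and the sample records are all hypothetical and are not taken from DSpace or any other repository software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ItemMetadata:
    """Hypothetical descriptive record for one archived item."""
    title: str
    authors: List[str]
    keywords: List[str] = field(default_factory=list)
    language: str = "en"                # language the item is written in
    publication: Optional[str] = None   # journal citation, if applicable


def matches(record: ItemMetadata, keyword: str,
            language: Optional[str] = None) -> bool:
    """Return True if the record carries the keyword (case-insensitive)
    and, when a language is given, is written in that language."""
    has_keyword = keyword.lower() in (k.lower() for k in record.keywords)
    right_language = language is None or record.language == language
    return has_keyword and right_language


# Made-up sample records, for illustration only.
records = [
    ItemMetadata("Report A", ["A. Author"],
                 ["ocean engineering", "hydrodynamics"]),
    ItemMetadata("Rapport B", ["B. Auteur"],
                 ["hydrodynamics"], language="fr"),
]

# Keep only English-language items tagged with the keyword.
hits = [r.title for r in records if matches(r, "Hydrodynamics", language="en")]
print(hits)
```

A search tool that reads such fields can narrow results by keyword or language instead of scanning the full text of every document.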

"The time involved for preparing the metadata is a small price to pay for having these documents available for the long term," says Nicholas M. Patrikalakis, a professor of ocean engineering at MIT. Mr. Patrikalakis's department is participating in the pilot project and plans to upload its technical reports.

But new search tools would need to be developed to make full use of the metadata tags. So far, traditional search engines like Google aren't equipped to do that -- though librarians say such tools would be relatively easy to create.

Professors who use the repository won't have to make all of their materials public. Researchers will be allowed to select access levels for each item they contribute. Some research may be made available only to those within MIT, while other materials may be free to anyone.

Why Share?

An incentive for making material available is that sharing research helps professors build their reputations, some experts say. Some research shows that the more professors open up their work, the more likely they are to get cited by their peers.

In computer science, for instance, articles that appear online are significantly more likely to be cited by other researchers than those that do not appear online, according to a study of computer-science research literature done by Steve Lawrence, a research scientist for NEC Research Institute Inc. "The mean number of citations to offline articles is 2.74, and the mean number of citations to online articles is 7.03, or 2.6 times greater than the number for offline articles," Mr. Lawrence wrote last year in Nature.
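The "2.6 times" figure in Mr. Lawrence's statement follows directly from the two means he reports. A short check of the arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
# Mean citation counts as quoted from the Nature article.
offline_mean = 2.74
online_mean = 7.03

# Ratio of online to offline citations, rounded to one decimal place.
ratio = online_mean / offline_mean
print(round(ratio, 1))
```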

Different disciplines have different attitudes about how much sharing is appropriate, says Ms. Smith. Scientists often seek to get their research out as soon as possible, while scholars in the humanities might worry about someone stealing their ideas, she adds.

MIT officials say they hope institutional repositories will catch on across academe, and they plan to make the DSpace software available free to other colleges that want to use it. In fact, MIT plans to lead a "federation" of libraries that want to use the software, helping them with whatever policy issues arise, says Ms. Smith.

"We've had pretty serious interest in the system from about 30 major institutions," Ms. Smith adds. The DSpace project is supported by a $1.8-million grant from the Hewlett-Packard Company. Officials aren't sure exactly how much the archive will cost to maintain, though universities already have much of the equipment in place to run digital archives. Still, Ms. Smith estimates that DSpace could cost up to $250,000 per year, if all of the costs were added up. The hope is that free software tools will allow even small colleges to run repositories using their existing resources.

Early Adopters

Meanwhile, a few other universities have begun building their own superarchives, often at the urging of provosts or other administrators who want to showcase their professors' work and increase its impact.

One example is the California Institute of Technology, which has already built an institutional repository (http://library.caltech.edu/digital) with material from several departments. Much of the drive for Caltech's repository came from its provost, Steven E. Koonin, who is also a professor of theoretical physics.

"We do outreach and public education in so many different dimensions," says Mr. Koonin. "Why aren't we doing the same with the scholarly information we produce, which is the core of what the research institution does, most of which is funded by the public?"

Setting up the framework for an archive was the easy part, however. Getting professors to contribute is proving more difficult. "It's a slow process," says Eric F. Van de Velde, director of library information technology at Caltech. "We talk to people all the time" to try to get them to include material, he adds. "This is not foremost on the mind of any faculty member, and changing the work flow kind of takes time." So far, about 600 papers are in the archive, which has been in place for the past few years.

Another superarchive was recently created to serve the University of California system. The archive, called the Scholarship Repository, is run by the system's California Digital Library (http://escholarship.cdlib.org).

Colleges setting up repositories also have to set clear guidelines for who owns the copyright to the materials. At Caltech, professors retain copyright to anything placed in the archive, but they must sign waivers allowing the university nonexclusive rights to keep copies in its collection.

But professors don't always have the right to place their published papers in archives, or even on their own Web pages. Many journals require scholars to sign over all rights to works that are accepted for publication.

Several journals have recently changed their copyright policies, however, to allow authors to place copies of their papers in personal or institutional archives. But some publishers that have made such policy changes, such as the American Physical Society, don't make it easy for professors. Scholars must make their own Web versions of their articles by revising their own drafts to reflect editors' changes.

Librarians discourage professors from ever removing work from their repositories, so that once a paper is archived, it's there for good. "We don't want this to become like a bulletin board," says Mr. Van de Velde. "We want this to be a serious form of dissemination."

The repositories also encourage dissemination of materials that once remained hidden, including photographs and other multimedia. "The more you go out and investigate what's going on in the faculty, the more you discover the rich, rich assets that are there," says Joseph J. Branin, director of libraries at Ohio State University, which is setting up the framework for an institutional repository called the OSU Knowledge Bank (http://www.lib.ohio-state.edu/Lib_Info/scholarcom/KBproposal.html).

Some professors have expressed concerns that universities might try to profit from the new repositories. Planning materials for superarchives at both MIT and OSU contain suggestions for how the university could charge a fee for access to selected materials.

But proponents of the repositories say that universities have an incentive to make the archives free to all. "The special literature that is at issue here ... is worth incomparably more to researchers and their institutions through its research impact than through any pennies that could be made from charging pay-per-view tolls," says Stevan Harnad, a professor of cognitive science at the University of Southampton, in Britain.

Changing Role for Journals?

The do-it-yourself, or self-archiving, approach by colleges establishes a new front in the struggle between colleges and journal publishers over how much research should be made available free online. College administrators have long been frustrated by the current academic-publishing system, in which colleges pay the overhead costs for the research and then must pay again to get access to the research results.

Since last year, more than 30,000 scientists have pledged to boycott journals that do not make their content free online no later than six months after initial publication. But despite the pledges, led by a group called the Public Library of Science, few scientists have actually withheld their articles, and few publishers have changed their ways.

Many of those who are active in the Public Library of Science boycott are now working to help spark alternative outlets for scientific publishing, such as institutional repositories. Even so, many who are working to build institutional repositories say they aren't trying to put publishers out of business. Instead, they say their efforts may change the role publishers play.

"Obviously, the information revolution is causing us to rethink how we do scholarly communication and dissemination," says Mr. Koonin, Caltech's provost. If colleges can handle distribution on their own, journals may focus on managing peer review and lending their seal of approval to the best scholarship, and charging authors rather than subscribers for their services.

"The print journals bundle together [several activities] -- refereeing, editorial standards, dissemination and marketing," Mr. Koonin says. "What the technology starts to let you do is to unbundle those. You could have dissemination done by one organization or mechanism, but peer review done by another one."

Nice Idea, in Theory

"That will not work," says Arie Jongejan, chief executive officer of Elsevier Science and Technology, a division of Reed Elsevier, one of the largest commercial academic publishers. "You need publishers to organize that process."

"If I was a researcher, I would be scared to death to make myself dependent on that solution [institutional repositories]," adds Mr. Jongejan. Journals, he says, "do things very efficiently and very smoothly."

Elsevier does allow its authors to publish their papers in institutional repositories or other noncommercial archives, provided that the authors ask permission first. Mr. Jongejan says that fewer than 5 percent of authors ask.

Other attempts at widespread reform of academic publishing have fallen short. For instance, physicists have built a successful online archive of pre-prints -- articles that are distributed before being reviewed by journals. That effort began more than 10 years ago, and some scholars predicted that other disciplines would soon build their own online pre-print archives. But few disciplines have even tried.

The reason is that disciplines are not the right agent for change, says Mr. Harnad, of the University of Southampton. "The right entity for all of this is the university," says Mr. Harnad, who is an outspoken proponent of nontraditional academic publishing. "There is no entity behind a discipline," he adds, but universities have an economic incentive to try to reduce the cost of scientific publishing.

"We are in this confusing stage where it's very difficult to say what it's going to be like 10 years out," says Lorcan Dempsey, vice president for research at OCLC Online Computer Library Center, a nonprofit library group. "The patterns of research and learning and communication are really shifting."

Most institutions are waiting to see how DSpace and other repositories develop before they join in, says Richard K. Johnson, enterprise director of the Scholarly Publishing and Academic Resources Coalition, an alliance of research institutions, libraries, and organizations that encourages competition in scholarly communications.

"A lot of institutions are thinking about this right now," Mr. Johnson says. "Over the course of the next year or so, we'll see quite a few of them beginning to deploy."

One way or another, colleges seem interested in collecting and showing off more of their scholars' work online. As Mr. Dempsey puts it: "I think there's greater attention being paid to the whole range of informational assets on campuses."


TOOLS TO BUILD A 'SUPERARCHIVE'

Several new free tools are available or under development to help colleges create "institutional repositories," superarchives of all research generated by the college's faculty members.


DSPACE

What: Massachusetts Institute of Technology's project to develop a superarchive, as well as software tools for creating and maintaining the repository. The tools will be offered to other colleges that want to use them.

When: DSpace has been under development for two years. The university is testing it this summer, and plans to make the software available free to anyone in the fall, when the university will invite all professors at MIT to contribute to its archive.

Where: http://web.mit.edu/dspace


EPRINTS.ORG

What: Free software developed at the University of Southampton, in Britain, to help individual scholars, departments, or universities create archives of research papers online.

When: Available since 2000. An updated version was released this year.

Where: http://www.eprints.org


OPEN ARCHIVES INITIATIVE

What: A series of "metadata" codes that librarians or others can attach to research papers to help search engines pull out desired information.

When: Available since 1999. An updated version was released last month.

Where: http://www.openarchives.org

Source: Chronicle reporting



http://chronicle.com
Section: Information Technology
Page: A29


Copyright © 2002 by The Chronicle of Higher Education