
Using Medical Subject Headings (MeSH) to examine patterns in American medicine

Preliminary Consideration of Vocabulary Change as a Metric




Jacque-Lynne Schulman
STS 5206, May, 2000
Virginia Polytechnic Institute and State University
Northern Virginia Center




The National Library of Medicine (NLM) is the largest medical library in the world. Its beginnings date to before the American Civil War. Since 1879, it has prepared an index or finding aid for articles published in biomedical journals. The subject scope of the index reflects the interdisciplinary and multidisciplinary nature of modern medicine.

Over time, in order to arrange the contents in a useful way, a controlled vocabulary has been developed for the NLM index. This vocabulary is rooted in historic precedent and follows principles established by John Shaw Billings in 1874. It also most accurately reflects modern biomedical knowledge and contemporary organization of information.

Each year, new vocabulary terms are added, some are modified or renamed, and others are deleted. These changes mirror how American and international biomedicine has evolved and how the knowledge maps have changed. They also show that innovation and invention have grown more rapidly in some areas than in others, and that areas differ in how quickly their terminology has become more specific. Beginning in 1960, the Library recast the vocabulary to function in a machine retrieval environment. NLM pioneered the use of computer-generated publication as the basis for a computer-based retrieval system. Forty years of experience are therefore available as raw data that can be analyzed. The earliest predecessor of the National Library of Medicine first produced a finding aid for medical publications in 1840. The line that began with that first manuscript catalogue continues in 2000 in the form of MEDLINE and other computer-produced and Internet-published information maps.

I wished to discover whether the subject vocabulary developed by NLM, and used in its present form for forty years, had been used as a way of finding patterns in published research and of examining how these patterns reflect changes in that area of science and technology known as medicine. (Backus, p. 227)

Work has been published using the contents of Index Medicus and MEDLINE as a surrogate for research in particular subject domains. Similar work has been done to characterize research output in particular languages or from specific countries and regions. In 1964, Orr and Leeds noted that the National Library of Medicine was one of the few libraries whose collections were so comprehensive that they could stand as surrogates for science in a broad area. It is therefore to be expected that NLM's collections should have been used to estimate growth and distribution of the medical literature and as a surrogate for other analyses. (Orr, p. 1329) "It is commonplace today that the literature of a scientific discipline is a mirror of its activities; to furnish a macroscopic and comparative picture of these activities, more and more statistic studies of the written output are presented." (Wagner-Dobler, p. 213)

I could not find a study that used the National Library of Medicine's thesaurus, Medical Subject Headings (MeSH), in this way. I made literature searches in databases and indexes in information science, the biological sciences, history, and the humanities. I also looked specifically for those works that referred to NLM and its various predecessors and their publications. I found research on related questions but none that dealt specifically with changes in the MeSH vocabulary as a mirror of changes in medical science.

One may find it helpful to consider the more recent history of the NLM's vocabularies. MeSH's immediate predecessor, the Subject Heading Authority List published in 1954, contained 3,800 headings. This list resulted from a several-year project. The project sought "to reflect current professional usage and maintain a balance between granularity and diffusion". (SHAL, 1954) MeSH combined this list of descriptors used for indexing with an unpublished card file of descriptors used for cataloging. MeSH represented a departure from previous library practice. There was now a single set of descriptors, designed to be used for both indexing and cataloging. As an aside, Billings had actually anticipated the MeSH innovation. His Index Catalogue included both journal article citations and book citations, all entered under the same subject terms.

The second edition of MeSH was published in 1963. This marked the introduction of a categorized list of subjects. These categories had general labels such as A for Anatomical Terms and A1 for Parts of the Body, but not the category numbers known today as "Tree Numbers". The Introduction to the second edition also described the changes arising from the implementation of MEDLARS and the use of computers both to support a retrieval system and to produce publications. Since the 1963 edition, MeSH has been published annually. From 3,800 descriptors in 1960, MeSH has grown to 19,270 in 2000, plus 105,000 terms used for supplemental concepts, 120 publication types, and 800 qualifiers.

MEDLARS (Medical Literature Analysis and Retrieval System) was the first large-scale non-military bibliographic database system. It is perhaps appropriate that Billings is credited with suggesting to Herman Hollerith that a counting machine could be used to speed the tallying of the US Census. Hollerith followed Billings' suggestion, and a primitive "computer" was used for processing the 1890 US Census.

There is no separate published list of descriptors for the first Index Medicus. John Shaw Billings published Index Medicus as a supplement to the Index-catalogue of the Library of the Surgeon-General's Office, United States Army. This first edition consisted of some 30 volumes. The first volume of the first edition contains an introduction in the form of the transmittal letter to the War Department. This includes a description of the philosophy followed in the establishment of subject divisions.

     I. Those titles have been selected for subjects for which it is presumed that the majority of educated English-speaking physicians would look in an alphabetical arrangement.

     II. Where there is doubt as between two or more subject-headings, cross-references are given.

     III. Where both an English and a Latin or Greek word are in common use to designate the same subject, the English word is preferred, and references are given from the others.

     IV. As a rule, substantives rather than adjectives are selected for subject headings. Exceptions occur to this in anatomical nomenclature, as "Lachrymal duct"; "Thyroid gland".

     V. In names of subjects derived from personal names, the latter precede, as "Addison's disease"; "Eustachian tube".

     VI. Local diseases or injuries are as a rule placed under the name of the organ or locality affected, as "Kidney (Abscess of)"; "Neck (Wound of)". There are exceptions to this, in accordance with Rule I; e.g. "Abscess (Perinephritic)".

     VII. Cases in which one disease is complicated with or immediately followed by another are placed under the name of the first disease with the sub-heading "Complication and sequelae".

     VIII. When the main subject of an article is the action of a given remedy in general or its action in several diseases, it is indexed under the name of the remedy; but if it relates to its action in but one disease, it is indexed under the name of the disease.

     IX. The amount of subdivision made under the principal subject-heads depends very greatly upon the number of references to be classed.

     X. As a rule, the references are given from general to more special heads, but not the reverse. It is presumed, for instance, that those who wish to consult the literature on "Aphasia" will turn to "Brain (Diseases of)" and "Nervous System (Diseases of)", as well as to "Aphasia", without being directed to do so by a cross-reference under the latter title.

     XI. Under the name of an organ will be found the books and papers relating to the anatomy and physiology of that organ. Following this usually come the abnormalities and malformations of the organ, then its diseases, then its tumors, and lastly, its wounds and injuries. (John Shaw Billings, 1880)

Those who are familiar with the Index Medicus of the last forty years will recognize most of these rules as still in use. In particular, the same rules for drugs and diseases are followed and the subheading "complications" is used today, now 120 years later.

Who was John Shaw Billings?

John Shaw Billings was a military surgeon and brevet Lt. Colonel who entered the U.S. Army in 1861 during the Civil War. He served as a battlefield surgeon, based mainly in the Maryland and Virginia area, and he saw the battle of Gettysburg firsthand. He was the creator of the guidelines and resulting thesauri currently maintained by NLM. The hierarchy of MeSH is also used by the major European medical information service. The borrowing of the MeSH structure by the Dutch Excerpta Medica Foundation speaks to the universal acceptance of what Billings began.

Before he was 15, he bought himself a Latin dictionary to teach himself Latin. He got his father to agree to pay for his college education in exchange for giving up his future inheritance. Billings would become the Librarian of the Army Surgeon General's Library. He would also be the designer and first head of the research library of the New York Public Library. Twenty-five years before the Flexner Report called for a formalization of medical education and the institution of standard curricula, Billings would establish the case method approach to medical education, first known in the United States from its initial use at Johns Hopkins School of Medicine. (Garrison, p. 115)

Billings attended the Medical College of Ohio in 1859, the tenth oldest medical school in the United States. It was the second to be established west of the Alleghenies. (Garrison, p. 8) The program of study consisted of a pair of courses taken for five months of year one and repeated in year two. Billings described his medical education in a lecture thirty-five years later:

I graduated in medicine in a two-years' course of five months lectures each, the lectures being precisely the same for each year. I had become a resident in the hospital at the end of the first year's studies. There I was a resident of the City Hospital of one hundred and fifty beds, where I was left practically alone for the next six months, the staff not troubling themselves very much to come during the summer time, when there was no teaching. The systematic teaching of those times I have had to unlearn for the most part. There is a new chemistry, a new physiology, and a new pathology. What has remained is what I got in the dissecting-room and in the clinics. (Garrison, p. 14)

The College had a thesis requirement. This requirement led Billings to search for collections or libraries that he could use in research for his thesis on surgical treatment of epilepsy. He found that resources for scholarship were non-existent. He was disappointed in his quest and, even looking to the libraries of Philadelphia and Boston, could find no adequate medical library in the United States (Garrison, p. 15). His biographers have suggested this failed quest led Billings to build his great library in Washington. Chapman quotes Billings' goals for the "National Medical Library":

While the object in view in forming this library has been to make a collection of sufficient extent and completeness to meet the wants of the physicians of the United States, an attempt is being made to prepare a catalogue and index of its contents whose practical usefulness shall not be confined to this country but shall be, so far as the materials available will permit, international and cosmopolitan.
We are endeavoring to make this Library complete in Medical Literature - in order that there may be one collection in the world to which a person seeking information can apply with a reasonable certainty that he can find in it all that has ever been published relative to any medical subject or institution. (Chapman, p. 153)

Twenty-eight years later, Billings remembered "we heard nothing of bacteria, antiseptic surgery was unknown, the clinical thermometer and the hypodermic syringe were just new fangled notions that had not come into use and that few of us had seen." (Garrison, p. 12) His statement aptly describes the state of medical knowledge and medical education in the United States as the Civil War began. Regarding the thesis, Billings wrote:

In the thesis it was desirable to give the statistics of the results obtained from certain surgical operations as applied to the treatment of epilepsy. To find these data in their original and authentic form required the consulting of many books and to get at these books I not only ransacked all the libraries, public and private, to which I could get access in Cincinnati, but for those volumes not found here (and these were the greater portion), search was made in Philadelphia, New York and elsewhere to ascertain if they were in any accessible libraries in the country.
After about six months of this sort of work and correspondence I became convinced of three things. The first was, that it involves a vast amount of time and labour to search through a thousand volumes of medical books and journals for items on a particular subject, and that the indexes for such books and journals cannot always be relied on as a guide to their contents. The second was, that there are, in existence somewhere over 10,000 volumes of such medical books and journals, not counting pamphlets and reprints. And the third was, that while there was nowhere, in the world, a library which contained all medical literature, there was not in the United States any fairly good library, one in which a student might hope to find a large part of the literature relating to any medical subject, and that if one wished to do good bibliographical work to verify the references given by European medical writers, or to make reasonably sure that one had before him all that had been seen or done by previous observers or experimenters on a given subject, he must go to Europe and visit, not merely one, but several of the great capital cities in order to accomplish his desire.

Billings served as a surgeon during some of the bloodiest battles of the Civil War. While at Gettysburg for several days, both he and his horse received minor wounds. He wrote regularly to his wife, and those letters form a diary of his experiences. These descriptions show that the level of medical care was primitive. Amputation was the treatment of choice for most rifle shot wounds of the arms or legs. Billings operated on some shoulder wounds but with a less than 50 percent success rate. The surgeon had chloroform for anesthesia, but control of infection was years in the future. The ideas of Pasteur, Lister, and Semmelweis were not yet accepted. There were no available antiseptic agents of the modern kind. The main hospital treatment was fresh air, ample diet, and a dry place to sleep. In the civilian world, only the poor and homeless would seek hospital care. Care of the sick and child-bearing were still properly done within the home of the middle and upper class until well after the end of Victoria's reign.

As the war ended and he was posted to Washington, Billings gradually assumed duties related to a central store of medical texts. He was initially in charge of receiving the texts that had been sent out to field hospitals and that were being returned as the field hospitals were closed. He received an assignment to survey the literature and produce a report on the control of cholera in response to an epidemic that spread in the 1870's. A survey of the Marine hospital service over the period 1869-1874 formed another of his assignments. Based on his recommendations, the actions that followed led to the transformation of the Marine hospital service into the US Public Health Service. During this time, he also wrote "On Barracks and Hospitals" (Circular 4) and "On the Hygiene of the United States Army" (Circular 8). These were adopted as the basis for hospital construction and hygiene throughout the Army. (Curran, p. 31)

Undoubtedly two of the great motivating experiences of Billings' life were his own woefully inadequate medical school course of two years in Ohio and his harrowing responsibility for the administration of Army field hospitals during the War. These canvas-roofed hospitals were filled with battle casualties and infectious diseases, housed in wards reeking with infections of all kinds. (Chapman, p. 31) It has been suggested that his experiences, as a poorly trained physician who found himself a Civil War surgeon, led Billings to the diverse projects he undertook over the next thirty-plus years. When Billings finished his medical training, most Americans still went to Europe to study medicine. It was not until 1870 that any graduate medical program was established in the United States. (Curran, p. 30) American medical education in the 1880's was little better than what he had experienced just before the Civil War. There was no strict selection of candidates; almost any high school graduate, or even a student with less preparation, could secure admission to one of the then existing three-year courses. Few medical students were actually able to participate in the handling of patients, either as undergraduates or in the few internships then available, except in the extensively employed extramural preceptorships. With notable exceptions, it was still an era of the commercialization of medical education by proprietary medical schools, a state of affairs which Billings on more than one occasion castigated vigorously.

Billings and the Beginnings of a National Library

Once in the Office of the Surgeon General, although he had no formal appointment or charge to do so, he began ordering new texts and writing to Army surgeons asking for donations of their personal journal collections. In most cases, these were a few years of one or at most two titles. He gladly received them all and, when he could, paid the postage. He also had use of a "slush fund". At that time, military hospitals provided medical care to patients but just as importantly fed them. A slush fund (literally a fund from the sale of slush) was created from hospitals' sale of bones, fat, stale bread, and other waste. Army rules did not specify what the money could be used for, so there was local discretion. "To assign a portion of the funds realized from the sale of the dejecta of military hospitals to so elegant a purpose as building what was to become the most remarkable medical library in the world was ingenious and was probably attributable to Barnes himself", the then-Surgeon General of the Army. Chapman observes "Joseph Barnes, Billings' superior, seems to have recognized that the Library had by 1882 become an enormous professional and cultural asset so he put his 'slush fund' to its support." "Viewed in the late twentieth century, the Library and especially the Index Catalogue, were themselves strong nineteenth century indications that American scholarship was coming of age". (Chapman, pp. 154-55)

Billings expressed the view that the Index Catalogue would "tend to elevate the standard of medical education, literature, and scholarship of the nation and thus indirectly be for the benefit of the whole country." Chapman adds that while it is not the same sort of advance as the discovery of x-rays, it did form the basis of an infrastructure of bioscience. (Chapman, p. 172)

Just how useful and usable has the "Index-Cat" been? The answer is less obvious than it may appear at first blush. Shortly after its inception circa 1880, Billings' sixty-volume work certainly came to occupy pride of place, quite unlike any other American work, on the shelves of world medical libraries. Clearly, then, as an object, and as a totem of America's emerging scientific dominance in the ensuing century, the Index-Cat began early on, and continues, to create an indelible imprint on scholarship. The size of that imprint is reflected in the work's deployment in actual practice, certainly--perhaps even beyond what its content might deserve. It is safe to say that scholars, myself among them, have (faute de mieux) come to rely on Billings' run of volumes as a sort of reliable field guide to the collective mentalities of nineteenth-century (and earlier) medical communities. (Maulitz, p. 689)

MeSH Changes as a Metric

NLM revises its MeSH vocabulary annually to reflect changes in biomedical literature and the health science community. An earlier study tested two hypotheses about NLM's MeSH vocabulary. The first was that new terms are added to MeSH when their broader terms have an increased number of postings or greater frequency of use. The study examined the number of postings for the broader terms of new and existing terms in the computer version of Index Medicus, MEDLINE file. A second hypothesis proposed there is a relationship between the patterns of MEDLINE indexing and searching and the organization of the MeSH tree structure. A comparison of the distribution of searched terms in the MeSH trees with the distribution of all terms tested this hypothesis. (Backus, p. 226).

The modern formalized approach to assuring access to the published record of scholarship is well summarized by Bachrach and Charen. While the details they proffer are specific to the National Library of Medicine, the principles are universal.

The operation of MEDLINE requires three ongoing activities by persons having subject matter knowledge. These are literature selection, thesaurus maintenance and indexing. MEDLINE is intended to give access to the most generally useful biomedical literature rather than to provide indiscriminate comprehensive coverage. Literature is selected with the guidance of a group of health-science educators, editors and librarians who review periodicals under consideration for inclusion and re-evaluate those that are already regularly indexed. The MeSH thesaurus provides the descriptors that are used for subject indexing. Its hierarchical structure facilitates both general and specific searching. The appearance of new concepts and terminology in the literature requires a dynamic MeSH, but MeSH changes may complicate the process of searching backward in time. Maintaining MeSH requires finding a balance between the need for adaptability and the need for stability. Quality indexing requires accuracy and consistency in the assignment of subject headings. To this end, indexers receive didactic training plus practice under supervision. Precedents for indexers are detailed in an extensive Indexer's Manual and in MeSH annotations. Work of all indexers is reviewed on at least a sampling basis, and special sessions are held each year to familiarize indexers with MeSH changes.

The National Library of Medicine has studied the growth of its collections. Using data from its internal journal data files and comparing these data with the MEDLINE database, the authors charted the growth of NLM's serial collection and of the journals indexed in Index Medicus from 1966 to 1985. The number of active serial titles in the subset of NLM's collection increased 30% over the twenty-year span. The average number of articles per Index Medicus journal increased 56%. The period from 1966 to 1985 saw substantial but uneven growth in the number of serial titles in the NLM collection and in the average number of articles in Index Medicus journals. The authors concluded that the pattern of growth in the number of serials held by NLM probably reflected trends in the universe of all biomedical serials. (Humphreys, p. 20)

Technical knowledge has been defined as the special knowledge that scientists, engineers, and doctors use in their work. Cravens et al. "assert that such technical knowledge has its ultimate origins in the larger culture. In short, the interpretive unity we seek in history is not to be found in detailed social matrices but in cultural constructs. These are in turn mental constructs. Such mental constructs are the basis of the social matrix of a given time and place no less than the underlying notions about what Michel Foucault dubbed 'the order of things,' those tacit agreements among contemporaries about how the world works in a given age" (Cravens, pp. 2-3). It therefore seems possible that the changes in a vocabulary do mirror the changes in the technology and in the culture within which that technology is found. An example may illustrate this. In the late 1870's, there were hundreds of reports in medical journals of puerperal insanity. There were also descriptions of variations of this disorder, which included puerperal mania, insanity of pregnancy, and insanity of lactation. Before World War I, these disorders accounted for ten percent of female asylum admissions. (Theriot, p. 75) Following World War I, these diseases were no longer reported. Had the illness been cured by a world war? Nancy Theriot suggests the diagnoses were artifacts of the power relations between male physicians and female patients. She cites Foucault as arguing that medical theories of insanity defined "reason" by medicalizing the ever-increasing category of "unreason". In other words, if we don't agree with "them", they must be crazy. One can take this path a short or a long way, but what we call something is a reflection of our culture and its structures and technologies.

In his commentary on Foucault and the Order of Things, Pratt says that classification is a central activity in the study of living things. (Pratt, p. 163) I suggest it has become a central activity in the study of most activities and human creations. In biology and related sciences, there are taxonomies of anatomic structure, of biota, of chemicals, of behavior, and so on. This is another reason to consider a descriptor list, particularly one that includes a hierarchical structure, as a tool for analysis and measurement of trends.

Changes in a Vocabulary - Implications

With an increasing number of items to be categorized, one expects a larger number of descriptors. There is an ideal point of balance between specificity and dispersion, but that is beyond the present discussion. Moreover, there has not been agreement within the community as to what the balance point is for any given audience and body of literature. All agree, however, in rejecting the extremes: 1,000 articles with 1,000 descriptors, where no two articles are grouped together, or 1,000 articles with a single descriptor, where nothing is distinguished.
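The two extremes can be made concrete with a small calculation. The following is only an illustrative sketch: the helper name articles_per_descriptor is hypothetical, and it makes the simplifying assumption that each article is filed under exactly one descriptor, which is not how MEDLINE indexing actually works.

```python
def articles_per_descriptor(n_articles: int, n_descriptors: int) -> float:
    """Average number of articles filed under each descriptor.

    Hypothetical helper; assumes one descriptor per article.
    """
    if n_articles < 0 or n_descriptors <= 0:
        raise ValueError("need a non-negative article count and at least one descriptor")
    return n_articles / n_descriptors


# The two extremes named in the text, for a collection of 1,000 articles.
one_heading_each = articles_per_descriptor(1000, 1000)  # every article stands alone
single_heading = articles_per_descriptor(1000, 1)       # every article under one heading

print(one_heading_each, single_heading)
```

A usable vocabulary falls somewhere between these two values; where exactly is the open question noted above.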

The following chart shows the growth of articles indexed in Index Medicus or its computer-based surrogate, MEDLINE. The number of the articles in English is also shown.

MEDLINE Articles by Year of Publication (1965-1995)

Chart of MEDLINE Articles by Year of Publication

Note: the chart was prepared using the number of citations for each year from 1965 through 1995. For each year, the number of citations for articles in English was then found. The resulting data was made into a chart. Although there are data labels for every third year, data for all years are included.
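The tally described in the note can be sketched in a few lines. This is a minimal illustration, not the procedure actually used: the function name tally_by_year, the (year, language) record shape, and the "eng" language code are assumptions, and the sample records are invented for the example and are not MEDLINE counts.

```python
from collections import Counter


def tally_by_year(records):
    """Return {year: (total citations, English citations, English share)}.

    `records` is an iterable of (year, language_code) pairs (assumed shape).
    """
    totals = Counter()
    english = Counter()
    for year, language in records:
        totals[year] += 1
        if language == "eng":
            english[year] += 1
    return {
        year: (totals[year], english[year], english[year] / totals[year])
        for year in sorted(totals)
    }


# Invented records for illustration only; these are NOT real MEDLINE data.
sample = [
    (1965, "eng"), (1965, "ger"),
    (1966, "eng"), (1966, "eng"), (1966, "fre"), (1966, "rus"),
]
summary = tally_by_year(sample)
for year, (total, eng, share) in summary.items():
    print(year, total, eng, round(share, 2))
```

Plotting the first two values of each tuple against the year would give the two series of the kind shown in the chart.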

The charts on the following pages compare the number of citations per year in several broad areas of medicine. These include viruses; anatomy; neurology and psychiatry; and organization of health care (i.e., structural and management aspects). Each chart was created by running queries in the PubMed system of the NLM and arranging the results as a matrix of citations by year and broad topic. These are included to show that some patterns can be seen even with a small sample of subject areas. The rate of increase is not uniform across fields. This forms what I think is the core of how the MeSH vocabulary and its use can serve as an indicator of the changing shape of medical research and publication. The following lists show the difference in the hierarchy for plant viruses, comparing 1963 and 1995. The arrays have grown in numbers of terms and in degrees of specificity.

Plant Viruses (1963)
     Tobacco Mosaic Virus
Plant Viruses (1995)
     Geminivirus
     Carlavirus
     Closterovirus
     Luteovirus
     Mosaic Viruses
          Alfalfa Mosaic Virus
          Bromovirus
          Caulimovirus
          Comovirus
          Cucumovirus
          Potyvirus
               Plum Pox Virus
          Tobamovirus
               Tobacco Mosaic Virus
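
The growth in number of terms and in specificity can be measured directly from the two lists. The sketch below is offered under stated assumptions: the terms come from the lists above, but the nesting (which terms sit under Mosaic Viruses, Potyvirus, and Tobamovirus) is inferred here rather than taken from the MeSH tree numbers, and the function names count_terms and depth are hypothetical.

```python
# Each hierarchy is a nested dict: term -> dict of narrower terms.
# The nesting is an assumption inferred from the term lists in the text.
plant_1963 = {"Plant Viruses": {"Tobacco Mosaic Virus": {}}}

plant_1995 = {
    "Plant Viruses": {
        "Geminivirus": {},
        "Carlavirus": {},
        "Closterovirus": {},
        "Luteovirus": {},
        "Mosaic Viruses": {
            "Alfalfa Mosaic Virus": {},
            "Bromovirus": {},
            "Caulimovirus": {},
            "Comovirus": {},
            "Cucumovirus": {},
            "Potyvirus": {"Plum Pox Virus": {}},
            "Tobamovirus": {"Tobacco Mosaic Virus": {}},
        },
    }
}


def count_terms(tree):
    """Total number of terms in the hierarchy, at every level."""
    return sum(1 + count_terms(children) for children in tree.values())


def depth(tree):
    """Number of levels from the broadest term down to the most specific."""
    if not tree:
        return 0
    return 1 + max(depth(children) for children in tree.values())


for label, tree in (("1963", plant_1963), ("1995", plant_1995)):
    print(label, count_terms(tree), depth(tree))
```

Term count is a rough measure of breadth and depth a rough measure of specificity; applied to every MeSH category for every annual edition, the same two numbers would give one possible form of the metric proposed here.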


References Cited

Armed Forces Medical Library. Subject Heading Authority List. Washington. 1954

Bachrach, Clifford A, and Charen, Thelma. "Selection of MEDLINE contents, the development of its thesaurus, and the indexing process." Medical Informatics (London) 1978 Sep; 3(3):237-54.

Backus, Joyce E, Sarah Davidson, and Roy Rada. "Searching for Patterns in the MeSH Vocabulary." Bulletin of the Medical Library Association, 1987 Jul; 75(3): 221-227.

Chapman, Carleton B. Order Out of Chaos: John Shaw Billings and America's Coming of Age. Boston, Boston Medical Library, 1994.

Cravens, Hamilton, Alan I. Marcus, and David M. Katzman. Technical Knowledge in American Culture: Science, Technology, and Medicine Since the Early 1800s. Tuscaloosa, University of Alabama Press, 1996.

Curran, Jean A. "John Shaw Billings' Contributions to the Advancement of Medical Education." In John Shaw Billings Centennial. National Library of Medicine. Bethesda, MD, 1965.

Garrison, Fielding H. John Shaw Billings, a Memoir. New York, Putnam. 1915.

Humphreys, Betsy L., and Dianne E. McCutcheon. "Growth patterns in the National Library of Medicine's serials collection and in Index Medicus journals, 1966-1985." Bulletin of the Medical Library Association. 1994 Jan; 82(1):18-24.

Maulitz, Russell C. "Billings in cyberspace: toward the electronic Index-Catalogue." Bulletin of the History of Medicine 71.4 (1997) 689-692.

Orr, RR and AA Leeds, "Biomedical literature: volume, growth, and other characteristics." Federation Proceedings. 1964 Nov/Dec 23(6): 1329.

Pratt, Vernon. "Foucault and the History of Classification Theory." Studies in History and Philosophy of Science, 1977 8:2, 163.

Wagner-Dobler, R. and J. Berg. "Physics 1800-1900: A Quantitative Outline". Scientometrics, 1999 46(2): 213-285.

Last updated: 20 November 2001
First published: 20 November 2001