Interface Online Center for Information Technology (CIT)

March 30, 2004 [Number 229]

NIH Changes Search Engine to Google

NIH's online searches are now powered by Google, arguably one of the most popular tools on the net. "In the early days, search products were more or less comparable to each other," explains Dennis Rodrigues, chief of the Online Information Branch in the NIH Office of Communications and Public Liaison, which has primary responsibility for the main NIH Web site. "One product produced results pretty much as well as another. As search technology became better over the years, our expectations grew and the bar became higher. Over time, Google emerged as a far superior product."

The Search for Google

With so many search products on the market, determining which one was best for the NIH community could have posed a problem. However, Rodrigues, who serves as the gatekeeper for data placed on the main site, found that there was really no contest between the best-known products. "The Google Corporation set up a test for us," he said. "I used a battery of about 25 terms, looking for the ideal result. For instance, if I typed in 'melanoma,' what pages would be listed first? What would be among the top 10 results? We also looked into Inktomi's search product, which runs on the firstgov.gov Web site. We thought we might be able to save money if we piggybacked on their use agreement. [However], we found that Google returns more relevant results for NIH's needs. It was the complete winner in every race we had."
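
The kind of comparison Rodrigues describes, running a fixed battery of query terms and checking whether the ideal page shows up among the top 10 results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_k_hit_rate`, the "asthma" query, the example.org URLs and both result lists are hypothetical and are not taken from the NIH evaluation.

```python
def top_k_hit_rate(results_by_query, ideal_by_query, k=10):
    """Return the fraction of queries whose ideal page is in the first k results."""
    if not ideal_by_query:
        return 0.0
    hits = 0
    for query, ideal_url in ideal_by_query.items():
        # A query with no recorded results counts as a miss.
        top_k = results_by_query.get(query, [])[:k]
        if ideal_url in top_k:
            hits += 1
    return hits / len(ideal_by_query)


# Hypothetical battery: each query term mapped to the page we hope to see.
ideal = {
    "melanoma": "example.org/melanoma",
    "asthma": "example.org/asthma",
}

# Hypothetical ranked results from two candidate engines.
engine_a = {
    "melanoma": ["example.org/melanoma", "example.org/skin"],
    "asthma": ["example.org/asthma"],
}
engine_b = {
    "melanoma": ["example.org/skin"],
    "asthma": ["example.org/asthma"],
}

rate_a = top_k_hit_rate(engine_a, ideal)
rate_b = top_k_hit_rate(engine_b, ideal)
```

With a real battery of about 25 terms, the engine with the higher rate would be the one that more often places the expected page on the first page of results.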

Photo of Google team  

The team that tends to the NIH Web site search engine includes (standing) Dennis Rodrigues of the Office of Communications and Public Liaison and (seated, from left) CIT's Ginny Vinton, George Cushing and Bing Chao.

According to Ginny Vinton, home page technical coordinator at NIH's Center for Information Technology and head of the team that keeps the NIH search engine in operation, there are more than 200 servers for the 242,000 documents that require indexing on the NIH site. Deciding to change the tool used to locate these items is no small undertaking. On any given day, upwards of 19,000 searches are conducted on NIH's site, Vinton reports. The days logging the most searches are Tuesdays through Thursdays. NIH can trace a significant amount of its traffic to visitors who use global search services like Google or Yahoo.

"We had been thinking about various products for quite a while," Rodrigues admits, explaining that the search engine NIH had used for several years had begun to show its age. In addition, the CIT technical team that tends to the main NIH site sought a product that would be responsive to the questions and concerns of clients. "I realized we should make the switch one day when I called the team and realized they were all already using Google to search the Web," Rodrigues recalls, explaining that the search engine is "primarily to assist those using our public sites."

Back-Ups, Graphics and Algorithms

NIH launched its Google package on Feb. 9. The use agreement includes a backup appliance for emergencies. "We want to have a product ready to take over if the first one fails for any reason," explains Vinton. Both the primary and backup appliances are indexed once a week.

Another benefit of Google is that selected pages can be elevated in relevancy with relative ease. Rodrigues, the point of contact when people are unhappy with the NIH site, says that one of the complaints heard most often came from NIH'ers who had conducted a search to see whether their own site came up on the return list. Frequently, because the word or title they were searching for was not recognized by the search engine [for example, graphics], their site would not, in fact, be listed, or would be so far down on the relevancy list that people looking for it would give up before locating the information. Because fruitless searches were beginning to occur with regularity, the troubleshooting process was becoming ever more time-consuming for team members, each of whom has other duties. "We could adjust the algorithms so that additional weight was added to a title, keywords or a body of text," Vinton says, "but we never got the relevancy we desired."
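
The idea Vinton mentions, adding extra weight to a title, keywords or body text, can be illustrated with a small Python sketch. It is not the algorithm of any engine NIH used; the function `weighted_score`, the weight values and both sample pages are hypothetical, and real engines use far richer signals.

```python
def weighted_score(query, page, weights):
    """Sum, over fields, of the field weight times the query term's word count."""
    term = query.lower()
    score = 0.0
    for field, weight in weights.items():
        # A missing field contributes nothing to the score.
        words = page.get(field, "").lower().split()
        score += weight * words.count(term)
    return score


# Hypothetical weights: a title match counts more than a body match.
weights = {"title": 3.0, "keywords": 2.0, "body": 1.0}

# Hypothetical page whose title and keywords are plain text.
text_page = {
    "title": "Melanoma Research",
    "keywords": "melanoma skin cancer",
    "body": "Information about melanoma and melanoma treatment",
}

# Hypothetical page whose title exists only as a graphic, so no title text is indexed.
graphic_page = {
    "title": "",
    "keywords": "",
    "body": "Information about melanoma",
}

text_score = weighted_score("melanoma", text_page, weights)
graphic_score = weighted_score("melanoma", graphic_page, weights)
```

Under these assumed weights the page with a text title scores well above the page whose title is a graphic, which is one way a site could end up far down the list.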

Over time, Rodrigues adds, Web authors who create pages with search engines in mind will be pleased with Google's ability to rank their pages in ways that offer the most benefit to users. "One of the things we learned with the previous engine was that it didn't always follow convention," Rodrigues says. "Often it was counterintuitive to the way people would use it. We wanted a product that uses natural language to come up with reasonable results. Another consideration we had was that the product have an effective technology so that we could create Web pages that work with it."

As Web technology continues to develop at an exponential pace—according to recent tech news Google already has a new rival in the search field, Grokker—the next dilemma for Rodrigues becomes how long NIH sticks with Google. "Our goal is to find solutions that are reliable, robust and provide the best possible results for our users," he concludes. "Our next move depends on how long Google can meet the needs of our customers."

Extract from an article by Carla Garnett in the NIH Record, March 16, 2004

 
Published by Center for Information Technology, National Institutes of Health