Automating the production of bibliographic records for MEDLINE
11 System performance evaluation
Assessing the performance of the MARS system is an important goal, not only to evaluate the efficiency of its constituent modules, but also to locate potential bottlenecks. In addition, since we seek the best way to create MEDLINE bibliographic records, it is important to compare the productivity (e.g., labor hours per unit record) of the MARS systems, both versions 1 and 2, against each other, as well as with that of the manual keyboarding operation done under contract. Key questions posed as a starting point for performance evaluation are listed as follows:
- How long does it take for a bibliographic record to be completed?
- What is the time taken by each manual and automatic process for one record?
- What is the time taken by each manual and automatic process for one day's workload?
- What is the time taken to enter each field in Edit?
- What is the error rate of the zoning, labeling and reformat modules?
- What is the utilization rate of MARS-2 server processes and workstations?
- How long does data wait to begin processing by each of the daemon processes, i.e., how long does it sit in a queue before work begins?
- How often is a citation re-processed? What are the most common reasons?
- What is the overall cost (in labor-hours) for the MARS-2 operation as compared to MARS-1 and the keyboard operation?
These questions are addressed quantitatively by instrumenting the system and analyzing the recorded data, which consist mainly of event counts and timing data. Instrumentation is implemented through two C++ classes written to record such data: ProcessTime, which records times, and PerformanceData, which records statistics generated in a MARS process.
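A minimal sketch of what such instrumentation classes might look like; only the class names ProcessTime and PerformanceData come from the text, and the member names, units, and interfaces here are our assumptions:

```cpp
#include <chrono>
#include <map>
#include <string>

// ProcessTime: records elapsed wall-clock time for one process step.
// (Illustrative interface; only the class name appears in the text.)
class ProcessTime {
public:
    void start() { begin_ = std::chrono::steady_clock::now(); }
    // Returns elapsed seconds since start().
    double stop() const {
        auto end = std::chrono::steady_clock::now();
        return std::chrono::duration<double>(end - begin_).count();
    }
private:
    std::chrono::steady_clock::time_point begin_;
};

// PerformanceData: accumulates event counts and total times per event
// name, from which per-process averages can be derived.
class PerformanceData {
public:
    void record(const std::string& event, double seconds) {
        counts_[event] += 1;
        totals_[event] += seconds;
    }
    // Average seconds per occurrence of the named event (0 if unseen).
    double averageSeconds(const std::string& event) const {
        auto c = counts_.find(event);
        if (c == counts_.end() || c->second == 0) return 0.0;
        return totals_.at(event) / c->second;
    }
private:
    std::map<std::string, long> counts_;
    std::map<std::string, double> totals_;
};
```

Pairing a simple stopwatch with a per-event accumulator like this is enough to produce the averages per record and per day's workload asked about above.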
11.1 Process performance analysis
The instrumentation data yields information on the processes, both automatic and manual, at different levels of granularity. Figure 11.1 shows the average time taken by each process to complete its task for one bibliographic record (citation) in July 2001. Predictably, the manual processes of scanning, editing and reconciling take much longer than the automated ones. A note on terminology: Edit_First and Edit_Second denote the first and second Edit operators; Prod is the in-house daemon that controls the OCR system, and hence stands for the OCR action; ZoneCzar combines the automated zoning and labeling actions.
Figure 11.1
Instrumentation data breaking down some of these processes into their constituents appear in Figures 11.2 and 11.3. Figure 11.2 shows that the actual scanning of a page ("append") takes relatively little time, but that inserting a missing page after the fact and entering a new journal issue ("New MRI") take much longer. The time consumed by this latter task was the rationale for developing the new CheckIn module, which eliminates this function from the scanning operation.
The actual workload imposed by these time-consuming processes is nonetheless low, because they occur infrequently. For example, as Figure 11.3 shows, the burden of inserting pages is very small, since this operation is rarely performed.
Figure 11.4 shows the average time the Edit operator takes to enter the fields that are not automatically extracted. Only data for the first Edit operator are shown, since data for the second operator are approximately the same. The figure includes entries even for fields that are automatically extracted from compliant journals, because non-compliant journals must be accommodated as well. Furthermore, even in compliant journals some pages (e.g., letters to the editor, editorials) are not processed by the automatic modules, requiring the Edit operator to key in the relevant data. The figure thus indicates opportunities for further automation.
Figure 11.2
Figure 11.3
Figure 11.4
Since we keep track of operator names in the database, we also offer the supervisor the option of comparing their relative effectiveness, as shown in Figure 11.5 for scanning and Figure 11.6 for editing.
Figure 11.5
Figure 11.6
11.2 Comparison of the three data entry systems
Here we compare the three approaches, MARS-1, MARS-2, and the manual keyboarding operation, on the basis of a workload of 600 completed bibliographic records per day, the average daily load for all of them. The table lists, for each system, the average number of seconds per page and the number of minutes per 600 records; these data are charted in Figure 11.7. MARS-2, by eliminating many of the manual functions in MARS-1, is a considerable improvement, and both are far more efficient than the manual keyboarding operation. To produce 600 records, MARS-2 requires 61 labor-hours per day, while keyboarding requires 246. In comparison with the keyboarding operation, MARS-2 therefore saves 185 direct labor-hours per day, or 51,800 labor-hours per year (based on a year of 280 work days).
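The savings arithmetic above can be verified directly. A minimal sketch (the function name is ours; the 61- and 246-hour figures, 600-record workload, and 280-day work year come from the text):

```cpp
// Yearly labor-hour savings of one data entry approach over another,
// given each approach's labor-hours for the standard daily workload
// (600 records) and the number of work days per year.
double yearlyLaborSavings(double baselineHoursPerDay,
                          double improvedHoursPerDay,
                          int workDaysPerYear) {
    double dailySavings = baselineHoursPerDay - improvedHoursPerDay;  // 246 - 61 = 185
    return dailySavings * workDaysPerYear;                            // 185 * 280 = 51,800
}
```

With the figures from the text, yearlyLaborSavings(246.0, 61.0, 280) reproduces the 51,800 labor-hours per year quoted above.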
Figure 11.7