A draft of Verification, Validation and Evaluation of Expert Systems, Volume I, A FHWA Handbook, has been developed. The purpose of this communication is not to present an official document, but to share a work in progress and to solicit advice. It is the intention of the Development Team to produce a quality product that will truly be of value to those developing and testing expert systems. Thus I encourage you to critically review this draft handbook and provide any suggestions for improvement. We want to hear the bad news as well as the good so that we can improve this handbook.

This draft handbook discusses how verification, validation and evaluation (VV&E) should be incorporated into the expert system life cycle, shows how to partition knowledge bases with or without expert domain knowledge, presents knowledge models, presents methods of validating domain (the experts') knowledge, and discusses management issues related to expert systems development and testing. Mathematical proofs for partitioning and consistency and visualization of concepts are also presented.

I would have considered the draft handbook to be of little use if we were unable to apply its procedures in a pen and paper analysis (with computer support for matrix manipulation, solving differential equations, etc.) of a real-world expert system of reasonable size and complexity. The expert system PAMEX: Expert System for Maintenance Management of Flexible Pavements, with 327 rules, 20 input variables and 59 qualifiers, was selected for the in-house pen and paper analysis. The results of this analysis provided insights into PAMEX and identified programming errors of which the developers and users were unaware.

This draft handbook will also be field tested in a number of States using operational expert systems as test cases. At the end of this testing (probably in about one year), the handbook will be updated based on the results of the testing.

It will be impossible to respond directly to every comment, but be assured that every comment will be reviewed and given the consideration it deserves. Thank you in advance for your assistance.

Sincerely yours,


James A. Wentworth
Chief, Advanced Research Team
Office of Safety and Traffic Operations
Research and Development
Federal Highway Administration
FAX: (703) 285-3102
E-Mail: jwentwor@tfhrc.gov


 

Verification, Validation & Evaluation of Expert Systems, Volume 1
A FHWA Handbook (1st Edition, December 1995)

EXECUTIVE SUMMARY


This handbook (MS Word 6.0 document downloadable as a pkzip file) has been prepared for the Federal Highway Administration to cover the subject of verification, validation, and evaluation of expert systems. The difficulty of performing verification, validation, and evaluation (VV&E) on expert systems is one of the major factors slowing the development and acceptance of expert systems in the transportation community. There is little agreement among experts on how to accomplish the VV&E of expert systems. The complexity and uncertainty related to these tasks have led to a situation where most expert systems are not adequately tested. In some cases testing is ignored until late in the development cycle, with predictably disastrous results.

This guide discusses how VV&E should be incorporated into the expert system lifecycle, shows how to partition knowledge bases with and without expert domain knowledge, presents knowledge models, presents methods for validating the experts' underlying knowledge, and presents management issues related to expert systems development and testing. Mathematical proofs for partitioning, consistency, and completeness are presented, along with visualizations of the concepts. The relevant information for this handbook came from the research efforts of a three-entity task force represented by James A. Wentworth of the Federal Highway Administration, Rodger Knaus of MiTech, and Hamid Aougab of RAM.

1. Introduction

Many of the new technologies for road-building engineering do not achieve the levels of reliability and standardization required by the civil engineering profession. Among them are many expert systems designed for the transportation industry that have proven to be major disappointments, due partly to the lack of verification, validation, and evaluation standards. The goals of expert systems are usually more ambitious than those of conventional or algorithmic programs. They frequently perform not only as problem solvers but also as intelligent assistants and training aids. Expert systems have great potential for capturing the knowledge and experience of current senior professionals (many of whom are approaching retirement age) and making it available to others in the form of training aids or technical support tools. Applications include design, operations, inspection, maintenance, training, and many others.

In traditional software engineering, testing [verification, validation, and evaluation (VV&E)] is claimed to be an integral part of the design and development process. However, in the field of expert systems, there is little consensus on what testing is necessary or how to perform it. Furthermore, many of the procedures that have been developed are so poorly documented that it is difficult, if not impossible, for anyone other than the originator to reproduce them. Also, many procedures used for VV&E were designed to be specific to the particular domain in which they were introduced. The complexity and uncertainty related to these tasks have led to a situation where most expert systems are not adequately tested.

Prompted by the lack of consensus among experts and by inadequate procedures and tools, the Federal Highway Administration (FHWA) developed this guideline for expert system verification, validation, and evaluation. The guideline is needed because knowledge engineers today do not often design and carry out rigorous test plans for expert systems.

2. Basic Definitions

This guide covers verification, validation, and evaluation of expert systems. An expert system is a computer program that includes a representation of the experience, knowledge, and reasoning processes of an expert.

Verification of an expert system is the task of determining that the system is built according to its specifications. Validation is the process of determining that the system actually fulfills the purpose for which it was intended. Evaluation reflects the acceptance of the system by the end users and its performance in the field. In other words, the VV&E elements of the expert system are designed to:

a) Verify to show the system is built right.
b) Validate to show the right system was built.
c) Evaluate to show the usefulness of the system.

Once the system is validated, the next step is to verify it. This involves completeness and consistency checks and examining for technical correctness using techniques such as those described in this handbook. The final step is evaluation. For the serviceability program, this means giving the system to engineers to use in computing the coefficient. Although the system is known to produce the correct result, it could fail the evaluation because it is too cumbersome to use, requires data that is not readily available, does not really save any effort, does something that can be estimated accurately enough without a computer, solves a problem rarely needed in practice, or produces a result not universally accepted because different people define the coefficient in different ways.

3. Need for V&V

It is very important to verify and validate expert systems as well as all other software. When software is part of a machine or structure that can cause death or serious injury, V&V is especially critical. In fact, there have already been failures of expert systems and other software that have resulted in disasters.

Expert systems use computational techniques that involve making guesses, just as human experts do. Like human experts, the expert system will be wrong some of the time, even if the expert system contains no errors. The knowledge on which the expert system is based, even if it's the best available, does not completely predict what will happen. For this reason alone, it is important for the human expert to validate that the advice being given by the expert system is sound. This is especially critical when the expert system will be used by persons with less expertise than the expert, who cannot themselves judge the accuracy of the advice from the expert system.

In addition to mistakes which an expert system will make because the available knowledge is not sufficient for prediction in every case, expert systems contain only a limited amount of knowledge concentrated in carefully defined knowledge areas. Today's expert systems have no common sense knowledge. They only "know" exactly what has been input into their knowledge bases. There is no underlying truth or fact structure to which they can turn in cases of ambiguity. If the expert system does not realize its mistake, and it is being used by a person with limited expertise, there is nobody to detect the error. Therefore, where the expert system is going to be used by someone without expertise, and the decisions made have the potential for harm if made badly, the very best effort at verification and validation is required.

4. Problems in Implementing Verification, Validation, and Evaluation for Expert Systems

One of the impediments to a successful V&V effort for expert systems is the nature of expert systems themselves. Expert systems are often employed for working with incomplete or uncertain information or "ill structured" situations. Since expert system specifications often do not provide precise criteria against which to test, there is a problem in verifying, validating, and evaluating expert systems according to the definitions. Some vagueness in the specifications for expert systems is unavoidable; if there are precise enough specifications for a system, it may be more effective to design the system using conventional programming languages.

Another problem in VV&E for expert systems is that expert system languages are unstructured, in order to accommodate the relatively unstructured applications. However, rigid structure in implementing code is a key technique used in writing verifiable code, as in the Cleanroom approach.

Selected for the FHWA Handbook are those techniques which seem the most straightforward, precise and powerful in practice. Included are particular variations of partitioning, incidence matrices, and the use of metaknowledge (i.e., knowledge models).

This handbook will provide guidance on planning and decision making early in an expert systems project. This concept applies not only to new developments, but to improved decision making at any stage from development through implementation; this includes planning the verification, validation, and evaluation of an already developed system. The advice given here should aid in developing clear problem definition and thorough system requirements, reflecting realism from both technical and organizational viewpoints. Risk identification information is also provided.

This handbook will discuss how VV&E should be incorporated into the expert system lifecycle. Although some ideas may be used for revising and/or reengineering existing systems, the aim is to design new systems and to ensure that enough VV&E operations are done during the lifecycle so that these systems are verifiable. This includes decisions that should be made during system specification and verification/validation during stepwise development of an expert system.

An overview of the basic method for formal proofs is provided. The method proves the correctness of small systems by non-recursive means. Larger systems are partitioned into smaller systems, and the component systems are then proved to possess the relations required by the partitioning theorems. The basic method also ensures that the components agree among themselves. In addition, this handbook will cover selected techniques for partitioning large expert systems when expert knowledge is unavailable.

Generally, it is best to partition a knowledge base using expert knowledge. This results in a knowledge base that reflects the expert's conception of the knowledge domain. This in turn facilitates communication with the expert, and later maintenance of the knowledge base. However, sometimes it is not possible to obtain expert insight into a knowledge base. In this case, functions and incidence matrices can be extracted from the knowledge base, and the information they contain can be used to partition the knowledge base.
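
As a rough illustration of partitioning without expert input, the sketch below extracts an immediate dependency relation from a toy rule base, computes its transitive closure, and groups rules that share variables. The rule names, variable names, and the use of connected components as the partitioning criterion are assumptions made for this example; the handbook's own procedures may differ.

```python
from collections import defaultdict

# Hypothetical toy rule base: (rule name, condition variables, concluded variable).
RULES = [
    ("r1", {"cracking", "rutting"}, "surface_condition"),
    ("r2", {"surface_condition", "traffic"}, "treatment"),
    ("r3", {"budget"}, "funding_class"),
    ("r4", {"funding_class"}, "schedule"),
]


def immediate_dependencies(rules):
    """Return the set of (concluded variable, condition variable) pairs."""
    return {(out, var) for _, conds, out in rules for var in conds}


def transitive_closure(pairs):
    """Extend the immediate relation until no new (a, d) pair can be derived."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for a, b in closure for c, d in closure if b == c}
        if derived <= closure:
            return closure
        closure |= derived


def partition_rules(rules):
    """Group rules into blocks that share no variables with any other block."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge each condition variable with the variable the rule concludes.
    for _, conds, out in rules:
        for var in conds:
            parent[find(var)] = find(out)

    blocks = defaultdict(list)
    for name, _, out in rules:
        blocks[find(out)].append(name)
    return sorted(blocks.values())


if __name__ == "__main__":
    print(sorted(transitive_closure(immediate_dependencies(RULES))))
    print(partition_rules(RULES))
```

In this toy case the pavement-condition rules and the funding rules fall into separate blocks, so each block could be checked on its own. Real knowledge bases rarely separate this cleanly, which is why finer-grained criteria are needed in practice.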

Knowledge models are high level templates for expert knowledge. These templates express the high level structure of the expert knowledge. Examples of knowledge models are decision trees, flowcharts and state diagrams. By organizing the knowledge, a knowledge model helps with VV&E by suggesting strategies for proofs and partitions; in addition, some knowledge models have mathematical properties that help establish completeness, consistency or specification satisfaction.
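
To suggest how a knowledge model can help establish completeness and consistency, the sketch below represents a small decision tree, checks that every node has a branch for each value of the variable it tests, and flattens the tree into rules. The tree, the variable domains, and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical domains for the variables tested in the tree.
DOMAINS = {"cracking": ["low", "high"], "traffic": ["light", "heavy"]}

# Hypothetical decision tree: an internal node is (variable, {value: subtree});
# a leaf is a string naming the recommended action.
TREE = ("cracking", {
    "low": "monitor",
    "high": ("traffic", {"light": "seal", "heavy": "overlay"}),
})


def missing_branches(tree, domains, path=()):
    """Return (path, variable, value) for every domain value that has no branch."""
    if isinstance(tree, str):
        return []
    variable, branches = tree
    gaps = [
        (path, variable, value)
        for value in domains[variable]
        if value not in branches
    ]
    for value, subtree in branches.items():
        gaps += missing_branches(subtree, domains, path + ((variable, value),))
    return gaps


def tree_to_rules(tree, path=()):
    """Flatten the tree into one (conditions, conclusion) rule per leaf."""
    if isinstance(tree, str):
        return [(dict(path), tree)]
    variable, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((variable, value),))
    return rules


if __name__ == "__main__":
    print(missing_branches(TREE, DOMAINS))
    for conditions, conclusion in tree_to_rules(TREE):
        print(conditions, "->", conclusion)
```

If no branches are missing, every input case reaches a leaf, which is a form of completeness. Because each case follows exactly one path from the root, it reaches exactly one leaf, so the rules derived from the tree cannot give conflicting conclusions.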

Small expert systems are those for which completeness, consistency and specification satisfaction can be proved directly, without partitioning the knowledge base. This handbook discusses techniques for these proofs.
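
For a sufficiently small rule base with finite input domains, one simple way to check completeness and consistency is to enumerate every input combination. The sketch below does this under working definitions assumed for the example: a rule base is complete if every case fires at least one rule, and consistent if no case fires rules with different conclusions. The rules, variables, and values are hypothetical.

```python
from itertools import product

# Hypothetical input variables and their finite domains.
DOMAINS = {
    "cracking": ["low", "high"],
    "traffic": ["light", "heavy"],
}

# Hypothetical rules: (conditions that must all hold, conclusion).
RULES = [
    ({"cracking": "low"}, "monitor"),
    ({"cracking": "high", "traffic": "light"}, "seal"),
    ({"cracking": "high", "traffic": "heavy"}, "overlay"),
]


def fired_conclusions(rules, case):
    """Return the conclusions of every rule whose conditions match the case."""
    return {
        conclusion
        for conditions, conclusion in rules
        if all(case[var] == value for var, value in conditions.items())
    }


def check_rule_base(rules, domains):
    """Enumerate every input combination; list uncovered and conflicting cases."""
    names = list(domains)
    uncovered, conflicting = [], []
    for values in product(*(domains[name] for name in names)):
        case = dict(zip(names, values))
        conclusions = fired_conclusions(rules, case)
        if not conclusions:
            uncovered.append(case)    # completeness gap: no rule applies
        elif len(conclusions) > 1:
            conflicting.append(case)  # consistency gap: rules disagree
    return uncovered, conflicting


if __name__ == "__main__":
    uncovered, conflicting = check_rule_base(RULES, DOMAINS)
    print("uncovered cases:", uncovered)
    print("conflicting cases:", conflicting)
```

Exhaustive enumeration grows with the product of the domain sizes, so it is practical only for small systems or for the individual blocks of a partitioned knowledge base.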

Finally, evaluation, which includes field testing, addresses the question "Is the system valuable?" The answer is reflected in the acceptance of the system by its end users and in the performance of the system in application. This handbook addresses this issue and offers some general guidelines for the distribution and maintenance of expert systems.

5. Intended Audiences for the Handbook

The following table describes the intended audiences for the handbook, and the parts of the handbook that will be most useful to these audiences:

Audience              Task to be Performed                                      Part of Handbook
Managers              Manage expert system project                              Introduction
Knowledge Engineers   Build new expert systems                                  Techniques; VV&E on New Systems
Knowledge Engineers   Perform VV&E on existing systems                          Techniques; VV&E on Existing Systems
Highway Engineers     Ensure that a correct new expert system is built          VV&E on New Systems
Highway Engineers     Ensure that an existing expert system has been validated  VV&E on Existing Systems
Software Researchers  Critique and extend VV&E methods                          Techniques; VV&E on Existing Systems; VV&E on New Systems



Table of Contents

1. Introduction

Basic Definitions
Need for V&V
Problems in Implementing Verification, Validation, and Evaluation for Expert Systems
Intended Audiences for the Handbook

2. Verification and Validation: Past Practices

3. Planning and Management

Introduction
Identify the Need for an Expert System
The Development Team
The T / E Team

4. Developing a Verifiable System

Introduction
Specification
The Importance of Specifications
The General Form of Specifications
Defining Specifications
Gather Informal Descriptions of Specifications
Obtain Expert Certification of the Specifications
Validating Informal Descriptions of Specifications
Validating the Translation of Informal Descriptions
Validation of Formalized Requirements
Step-Wise Refinement Development
Design
Implementation
Correctness Verification

5. The Basic Proof Method

Introduction
Overview of Proofs Using Partitions
A Simple Example
The other subsystems of KB1 can be proved consistent in the same way.

6. Finding Partitions without Expert Knowledge

Introduction
Functions
Expert Systems are Mathematical Functions
Partitioning Functions into Compositions of Simpler Functions
Cartesian Product
Function Composition
Dependency Relations
Immediate Dependency Relation
Operations on Relations
Finding Functions in a Knowledge Base
Choosing the Output and Input Variables of a Function
Finding the Knowledge Base that Computes a Function
Hoffman Regions
The Hoffman Regions of KB1
When is a Partitioning Advantageous
Hoffman Regions of Partitioned KB1

7. Knowledge Modeling

Introduction
An Example of a Knowledge Model
Using Knowledge Models in VV&E
Decision Trees
Introduction
Definition
Example
Use During Development
Use During VV&E
Ripple Down Rules
Introduction
Definition
Example
Use During Development
Changing a Ripple Down Rule System
Use During VV&E
A Ripple-Down-Rule System is Complete.
State Diagrams
Introduction
Definition
Example
Use During Development
Use During VV&E
Flowcharts
Use During Development
Use During VV&E
Functionally Modeled Expert Systems
Introduction
Use During Development

8. VV&E for Small Expert Systems

Completeness
Consistency
Specification Satisfaction
Specification Based on Domain Subsets
Effect of the Inference Engine
Inference Engines for Very High Reliability Applications

9. Validating Underlying Knowledge

Introduction
Validating Knowledge Models
Validating the Semantic Consistency of Underlying Knowledge Items
Creating a TRUE/FALSE Test
Giving the Test
Formulating the Experiment
Analyzing the Test Results
Overall Agreement Among Experts
Approaches to Disagreement Among Experts
Clues of Incompleteness
Variable Completeness
Semantic Rule Completeness and Consistency
Validating Important Rules
Validating Confidence Factors

10. Testing

Simple Experiments for the Rate of Success
Selecting a Data Sample
Estimating a Proportion (Fraction) of a Population
The Confidence Interval of a Proportion
Choosing Sample Size
Estimating Very Reliable Systems
How a Proof Increases Reliability

11. Evaluation and Other Management Issues

Evaluation
Distributing And Maintaining Expert Systems
Distribution
Maintenance
Appendix
Symbolic Evaluation of Atomic Formulas
General Regression Neural Nets
References



LIST OF FIGURES

Figure 1.1: The V&V Process
Figure 3.1: Initial Project Planning
Figure 3.1.1: KB1 Initial Project Planning
Figure 4.1: Developing a Verifiable System
Figure 4.2: Specification
Figure 4.2.1: KB1 Specification
Figure 4.2.2: KB1 Design
Figure 4.3: Correctness Verification
Figure 4.3.1: KB1 Implementation
Figure 5.1: Knowledge Base 1
Figure 5.2: An Example of Knowledge Base Partitioning
Figure 6.1: Immediate Dependency Relation as Ordered Pairs
Figure 6.2: Examples of domains
Figure 7.1: Pamex DT
Figure 7.2: Example ES
Figure 8.1: Completeness of Investment Subsystem
Figure 8.2: Consistency of I Subsystem
Figure 8.3: Example Specification for KB1
Figure 8.4: Symbolic Evaluation
Figure 8.5: Symbolic Inference Engine



LIST OF TABLES

Table 1.1: Intended Audiences for the Handbook
Table 2.1: Validation Methods
Table 2.2: Verification Methods
Table 2.3: V&V Software
Table 4.1: Level of Effort for the Correctness Verification Stage
Table 6.1: Immediate Dependency Relation for KB1
Table 6.2: Matrix Product of the DR by Itself
Table 6.3: Immediate DR of KB1
Table 6.4: Variable Clusters of the DR of KB1
Table 6.5: How Variables Influence Rules
Table 6.6: How Rules Influence Variables
Table 6.7: Immediate Dependency Matrix for KB1
Table 6.8: Hoffman Regions for KB1
Table 9.1: Confidence Level
Table 9.2: Confidence Level with One Expert Disagreeing

Last Updated: 15 April 1996
Webmaster: D. Avallone, Web Page Author: Pc Wood