Statistical Engineering Division
Seminar Series

Drowning in Data, Thirsting for Knowledge - Learning from Artificial Neural Networks Models

Zvi Boger
OPTIMAL Industrial Neural Systems Ltd.
Guest Researcher, Process Measurement Division, CSTL, NIST

Large datasets are being generated by modern instrumentation systems, but extracting the desired information and knowledge from these data is not easy.

Artificial neural network (ANN) models may sometimes outperform standard statistical methods when modeling complex, non-linear data, but they are treated with suspicion because they are seen as unreliable heuristic "black boxes". One way of convincing doubters that this view is incorrect is to extract both known and new knowledge from the trained ANN model.

The gas sensor arrays developed at the Process Measurement Division are based on MEMS technology. Tin oxide array elements, activated by different catalyst metals, change their electrical conductivities when in contact with gas mixtures (at concentrations ranging from sub-ppm to hundreds of ppm), as a function of the array temperature, which can be changed rapidly at will. Thus a data set rich in information is available for any gas mixture composition.

ANN modeling was used to analyze the resulting databases, which have up to 1260 input variables. Starting from non-random initial "connection weights", the ANN model was easily trained and validated. The more relevant inputs were automatically identified, and the ANN model was re-trained with the reduced input set. This procedure was repeated until the model accuracy degraded. The identity of the remaining inputs is considered new knowledge, as it may help in understanding the chemistry of the gases' interaction with the sensor array. "Causal indices", which relate the influence of each input on each output in both magnitude and sign, can be calculated from the trained ANN.
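The train, rank, prune, and re-train loop and the causal indices described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the speaker's actual implementation: the abstract does not say how input relevance is scored, how the non-random initial weights are chosen, or what counts as degraded accuracy. Here the names `causal_indices`, `input_relevance`, `prune_inputs`, `train_fn`, `keep_fraction`, and `tolerance` are all hypothetical, relevance is assumed to be each input's share of the absolute input-to-hidden weights, and the causal index is assumed to be the sum over hidden units of the products of input-to-hidden and hidden-to-output weights for a network with one hidden layer.

```python
import numpy as np


def causal_indices(w_in_hidden, w_hidden_out):
    """Assumed definition: CI[i, k] = sum_j w_in_hidden[i, j] * w_hidden_out[j, k].

    The sign suggests whether raising input i tends to raise or lower
    output k; the magnitude suggests how strongly. The hidden-unit
    nonlinearity is ignored, so this is only a first-order indication.
    """
    w1 = np.asarray(w_in_hidden, dtype=float)   # shape (n_inputs, n_hidden)
    w2 = np.asarray(w_hidden_out, dtype=float)  # shape (n_hidden, n_outputs)
    return w1 @ w2                              # shape (n_inputs, n_outputs)


def input_relevance(w_in_hidden):
    """Hypothetical relevance score: share of total absolute input weight."""
    mag = np.abs(np.asarray(w_in_hidden, dtype=float)).sum(axis=1)
    return mag / mag.sum()


def prune_inputs(train_fn, n_inputs, keep_fraction=0.5, tolerance=0.02):
    """Repeatedly drop the least relevant inputs and re-train.

    train_fn(active) must train a model on the input columns listed in
    `active` and return (w_in_hidden, accuracy). Pruning stops when the
    accuracy falls more than `tolerance` below the best seen so far; the
    last input set that did not degrade accuracy is returned.
    """
    active = np.arange(n_inputs)
    weights, best_acc = train_fn(active)
    history = [(active.copy(), best_acc)]
    while len(active) > 1:
        rel = input_relevance(weights)
        # Always drop at least one input, always keep at least one.
        n_keep = int(np.ceil(len(active) * keep_fraction))
        n_keep = min(len(active) - 1, max(1, n_keep))
        keep = np.sort(np.argsort(rel)[::-1][:n_keep])
        candidate = active[keep]
        new_weights, acc = train_fn(candidate)
        if acc < best_acc - tolerance:
            break  # accuracy degraded: keep the previous input set
        active, weights = candidate, new_weights
        best_acc = max(best_acc, acc)
        history.append((active.copy(), acc))
    return active, history
```

In practice `train_fn` would wrap whatever ANN training and validation routine is in use; the sketch deliberately leaves that part out, since the abstract gives no details about it.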

Recent results on identifying a small number of genes that can correctly classify cancer types from cDNA gene expression array data will be presented, along with other examples of knowledge acquisition from ANN models for text classification, instrumentation spectra analysis, and industrial and medical diagnosis.

NIST Contact: Walter Liggett, x-2851.

Date created: 2/28/2003
Last updated: 2/28/2003