Statistical Engineering Division
Seminar Series
Drowning in Data, Thirsting for Knowledge - Learning from
Artificial Neural Networks Models
Zvi Boger
OPTIMAL Industrial Neural Systems Ltd.
Guest Researcher, Process Measurement Division, CSTL, NIST
Large datasets are being generated by modern instrumentation systems, but
extracting the desired information and knowledge from these data is not
easy.
Artificial neural network (ANN) models may sometimes outperform standard
statistical methods when modeling complex, non-linear data, but are
treated with suspicion, as they are seen as unreliable heuristic "black
boxes". One way of convincing doubters that this view is incorrect is to
extract both known and new knowledge from the trained ANN model.
The gas sensor arrays developed at the Process Measurement Division are
based on MEMS technology. Tin oxide array elements, activated by
different catalyst metals, change their electrical conductivities when in
contact with gas mixtures (with concentrations ranging from sub-ppm to
hundreds of ppm), as a function of the array temperature, which can be changed
rapidly at will. Thus a data set rich in information is available for any
gas mixture composition.
ANN modeling was used to analyze the resulting databases, which have up to
1260 input variables. Starting from non-random initial "connection
weights", the ANN model was easily trained and validated. The more
relevant inputs were automatically identified, and the ANN model was
re-trained with the reduced input set. This procedure was repeated until
the model accuracy degraded. The identity of the remaining input set
is considered new knowledge, as it may help in understanding the
chemistry of the gases' interaction with the sensor array. "Causal
indices", which quantify the influence of each input on each output in both
magnitude and sign, can be calculated from the trained ANN.
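The abstract does not give a formula for the causal indices. As a rough
illustration only, the sketch below assumes a network with a single hidden
layer and uses one definition found in the ANN literature: the causal index
of input i on output k is the sum, over hidden units j, of the
input-to-hidden weight times the hidden-to-output weight. The function
names, the ranking by summed absolute index, and the toy weights are all
assumptions made for this example, not the speaker's method.

```python
def causal_indices(w_input_hidden, w_hidden_output):
    """Return a list of rows, one per input, with one causal index per output.

    Assumed definition: CI[i][k] = sum over hidden units j of
    w_input_hidden[i][j] * w_hidden_output[j][k].
    """
    n_hidden = len(w_hidden_output)
    n_outputs = len(w_hidden_output[0])
    return [
        [sum(row[j] * w_hidden_output[j][k] for j in range(n_hidden))
         for k in range(n_outputs)]
        for row in w_input_hidden
    ]


def rank_inputs(ci):
    """Order input indices from largest to smallest summed |causal index|.

    This ranking rule is a simplification chosen for the example.
    """
    strength = [sum(abs(value) for value in row) for row in ci]
    return sorted(range(len(ci)), key=lambda i: strength[i], reverse=True)


# Toy weights invented for illustration: 3 inputs, 2 hidden units, 1 output.
w_ih = [[-1.0, -1.0], [0.5, 0.5], [0.0, 0.0]]
w_ho = [[2.0], [1.0]]

ci = causal_indices(w_ih, w_ho)
print(ci)               # [[-3.0], [1.5], [0.0]]
print(rank_inputs(ci))  # [0, 1, 2]
```

In this toy case the first input has a large negative index (raising it
lowers the output), the second a smaller positive one, and the third has no
influence, so it would be the first candidate to drop before re-training.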
Recent results on identifying a small number of genes that can
correctly classify cancer types from cDNA gene expression array data, as
well as other examples of knowledge acquisition from ANN models for text
classification, instrumentation spectra analysis, and industrial and medical
diagnosis, will be presented.
NIST Contact:
Walter Liggett, x-2851.