Computer Sciences R&D

Data Pipelining for Heterogeneous Data Fusion

Recent advances in computing and data storage, together with reductions in their cost, have contributed to the growing trend of collecting and interpreting large volumes of data in areas such as genomics, proteomics, chemistry, and medicine. The goal of this LDRD-funded project is to make meaningful interpretations of data types that provide different views of the same situation, give complementary information despite appearing dissimilar, and are collected and stored in a variety of formats.

Ensemble classification is a technique for combining the predictions of multiple classifiers into a single classification. In the literature, ensembles have often been shown to be more accurate than any of their individual member classifiers. Traditionally, ensemble classification has been used to combine predictions made by different classifiers for the same data set into a single classification of that data set. We are investigating its use for disparate data sets. This approach has several advantages: the data can remain in separate databases, the data formats do not need to be translated, and computational time and resources are saved. We are examining the following ensemble classification techniques:

Our goal is to demonstrate the usefulness of this fusion technique on the problem of protein phosphorylation prediction. We are also interested in the problems of sensor fusion and medical surveillance.
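
The idea of combining classifiers that each see a different data source can be illustrated with a minimal sketch. It assumes the simplest combination rule, a plurality vote over class labels, which is not necessarily one of the techniques examined in this project. The source names, site identifiers, and labels are hypothetical and chosen only to echo the phosphorylation example. Each source keeps its own data and format; only the predicted labels are shared.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def fuse_predictions(per_source):
    """Fuse per-source predictions into one label per item.

    per_source maps a source name to a dict of item_id -> predicted label.
    A source may omit items it has no data for; only the sources that
    predicted an item take part in that item's vote.
    """
    votes = {}
    for source_predictions in per_source.values():
        for item_id, label in source_predictions.items():
            votes.setdefault(item_id, []).append(label)
    return {item_id: majority_vote(labels) for item_id, labels in votes.items()}


# Hypothetical predictions from three classifiers, each trained on a
# different kind of data about the same candidate sites.
per_source = {
    "sequence_model": {"site_1": "phospho", "site_2": "non", "site_3": "phospho"},
    "structure_model": {"site_1": "phospho", "site_2": "phospho", "site_3": "non"},
    "mass_spec_model": {"site_1": "non", "site_2": "non"},
}

fused = fuse_predictions(per_source)
print(fused)
```

Because the vote operates only on labels, no source data has to be moved or translated into a common format. A weighted vote or an average of class probabilities could be substituted for `majority_vote` without changing the surrounding structure.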