
Bayesian Classification Project

BAYESIAN CLASSIFICATION FOR CONTENT-BASED DATA MANAGEMENT

Over the past decade, the growth of NASA's remote sensing data archives has kept pace with Moore's law, doubling every 18 months. Just as importantly, remote sensing data are finding their way into more and more applications, most of which require data in real time or near real time. The combination of low latency requirements with increasing data volumes poses a major challenge for data management. In order to make the right data available at the right time, a data system must access and apply knowledge about the content of the data in its data management decisions. This particular decision support domain includes aspects such as automatic quality assessment, feature detection to support caching decisions, and content-based metadata to support efficient data selection.

Bayesian Classification of Data Content

In order to be useful for data management decisions, the content of the data must usually be assessed almost immediately after the data are created. A number of machine learning algorithms, such as neural networks and Bayesian classifiers, are extremely fast in their forward application (though they may take some time to train). In this project, we use a simple Bayesian classifier to distinguish cloudy pixels from other types of pixels (land, water, sun-glint, snow/ice, desert) in MODIS Calibrated Radiance data (also known as Level 1B).
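To illustrate why the forward application of such a classifier is fast, the following is a minimal sketch of a Gaussian naive Bayes classifier in plain Python. It is not the project's actual implementation: the two features (a visible reflectance and an infrared brightness temperature), the training values, and the names `train`, `classify`, and `model` are all assumptions made for illustration, and only two classes are shown.

```python
import math
from collections import defaultdict


def train(samples, labels):
    """Estimate a log prior and per-feature Gaussian parameters for each class."""
    grouped = defaultdict(list)
    for x, label in zip(samples, labels):
        grouped[label].append(x)

    total = len(samples)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / total), means, variances)
    return model


def classify(model, x):
    """Return the class with the highest posterior log-probability for pixel x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density; features are treated as independent.
            score += -0.5 * (math.log(2 * math.pi * var) + (v - m) ** 2 / var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training pixels: (visible reflectance, brightness temperature in K).
training_pixels = [
    (0.70, 250.0), (0.65, 245.0), (0.80, 240.0), (0.75, 255.0),
    (0.10, 290.0), (0.15, 295.0), (0.20, 285.0), (0.12, 300.0),
]
training_labels = ["cloudy"] * 4 + ["clear"] * 4

model = train(training_pixels, training_labels)
print(classify(model, (0.72, 248.0)))
print(classify(model, (0.14, 292.0)))
```

Training requires a pass over labeled pixels, but classifying a new pixel costs only a handful of arithmetic operations per class, which is what makes a "quick-look" assessment feasible as soon as the data are created.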

Data Usability and Usefulness

This "quick-look" classification of the data content is used to decide how usable and useful the data are likely to be to the user community. This in turn enables the optimization of scarce resources, such as online storage space and network throughput, using a variety of techniques such as content-based subscriptions, subsetting and cache management.
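As a rough sketch of how per-pixel classes might feed such decisions, the example below summarizes each granule by its cloudy fraction, then uses that number for a content-based subscription filter and for a cache eviction order. The assumption that cloudier granules are less useful, the threshold value, and the names `cloud_fraction`, `matches_subscription`, and `eviction_order` are hypothetical and chosen only for illustration.

```python
def cloud_fraction(pixel_labels):
    """Fraction of pixels in a granule that were classified as cloudy."""
    if not pixel_labels:
        return 0.0
    cloudy = sum(1 for label in pixel_labels if label == "cloudy")
    return cloudy / len(pixel_labels)


def matches_subscription(pixel_labels, max_cloud_fraction):
    """Content-based subscription: deliver only sufficiently clear granules."""
    return cloud_fraction(pixel_labels) <= max_cloud_fraction


def eviction_order(granules):
    """Return granule names sorted so the cloudiest are evicted from cache first."""
    return sorted(
        granules,
        key=lambda name: cloud_fraction(granules[name]),
        reverse=True,
    )


# Hypothetical granules, each reduced to a short list of per-pixel classes.
granules = {
    "A": ["cloudy"] * 9 + ["clear"] * 1,
    "B": ["cloudy"] * 2 + ["clear"] * 8,
    "C": ["cloudy"] * 5 + ["clear"] * 5,
}

print(eviction_order(granules))
print([name for name in granules if matches_subscription(granules[name], 0.25)])
```

A real system would likely weigh other factors as well, such as which pixel classes a given user cares about, but a single summary value per granule is enough to show the idea.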

For more information, see Data services using Bayesian classification for data management.

Feedback

Please let us know what you think about our exploration into content-based data management. You may email us at labs-disc@listserv.gsfc.nasa.gov. The point-of-contact for this project is Dr. Christopher Lynnes.

ACKNOWLEDGMENT

This work was funded primarily through the Computing, Information, and Communications Technology program at NASA Ames Research Center.




  • Last updated: February 20, 2007 16:50:52 GMT