NOAA KLM User's Guide

Section 8.1


8.1 Data Representation and Storage

This section describes the bit and byte numbering conventions used in this document, and the storage methods for integers and floating point numbers. This information is especially critical when transporting data from one computer architecture to another. Without special handling, data produced on one system may be unusable on another due to differences in internal data storage.

Within this section, a byte is defined as containing 8 bits (i.e., an octet), and a word can be 8, 16 or 32 bits in length. In all cases, the least significant bit (lsb) is designated as bit 0 and has a base-10 value of 2⁰ = 1. Therefore, in an 8-bit word, the most significant bit (msb) is designated as bit 7, and has a base-10 value of 2⁷ = 128. In a 16-bit word, the msb is designated as bit 15, and has a base-10 value of 2¹⁵ = 32,768. In a 32-bit word, the msb is designated as bit 31, and has a base-10 value of 2³¹ = 2,147,483,648.

For signed binary integers, the msb represents the sign of the number. The remaining bits (bits 6 through 0 for 8-bit words, 14 through 0 for 16-bit words and 30 through 0 for 32-bit words) are used to designate the magnitude of the number. Therefore, the range of signed binary integers is based on word size as follows:

1 byte     -128 to 127
2 bytes    -32,768 to 32,767
4 bytes    -2,147,483,648 to 2,147,483,647

Positive binary integers are in true binary notation with the sign bit set to zero. Negative binary integers are in two's-complement notation with the sign bit set to one. A negative binary integer is formed in two's-complement notation by inverting each bit of the corresponding positive binary integer and adding one.
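
The invert-and-add-one rule can be checked with a short Python sketch. The helper names `twos_complement` and `to_signed` are illustrative only and do not come from this guide.

```python
def twos_complement(value, bits):
    """Return the bit pattern (as a non-negative int) that encodes -value."""
    mask = (1 << bits) - 1          # all ones for the word size, e.g. 0xFF for 8 bits
    inverted = ~value & mask        # invert each bit of the positive integer
    return (inverted + 1) & mask    # add one, discarding any carry out of the msb


def to_signed(pattern, bits):
    """Interpret a bit pattern as a signed two's-complement integer."""
    sign_bit = 1 << (bits - 1)      # the msb carries the sign
    return pattern - (1 << bits) if pattern & sign_bit else pattern


pattern = twos_complement(5, 8)
print(format(pattern, "08b"))       # 11111011
print(to_signed(pattern, 8))        # -5
```

The same two helpers work for 16-bit and 32-bit words by changing the `bits` argument.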

Unsigned binary integers use all bits including the msb to represent the magnitude of the number. Therefore, their range is as follows, again, based on word size:

1 byte     0 to 255
2 bytes    0 to 65,535
4 bytes    0 to 4,294,967,295
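
Both sets of ranges follow directly from the word size. As a minimal sketch, the hypothetical helper `integer_range` below (not part of this guide) reproduces them:

```python
def integer_range(n_bytes, signed):
    """Return (minimum, maximum) for a binary integer of n_bytes 8-bit bytes."""
    bits = 8 * n_bytes
    if signed:
        # The msb is the sign bit, leaving bits - 1 bits for the magnitude.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # All bits, including the msb, contribute to the magnitude.
    return 0, (1 << bits) - 1


for n_bytes in (1, 2, 4):
    print(n_bytes, integer_range(n_bytes, True), integer_range(n_bytes, False))
```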

A field containing a binary integer is given the data type of unsigned integer if its content will never be negative or if a negative value does not make sense for that field. For example, the idea of a negative scan line number or a negative date or time is nonsensical. Therefore, fields containing scan line numbers, dates and times are labeled as unsigned integers.

Unfortunately, this data type is not supported by all computer languages (e.g., FORTRAN), so additional data manipulation may be necessary. In the case of reading a 16-bit unsigned integer (DATA), a FORTRAN user could use the following code snippet to extract the actual value (VALUE):


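	INTEGER*2 DATA
	INTEGER*4 VALUE
	...
C	READ DATA stands in for the user's own input statement
	READ DATA
C	A negative DATA means the msb was set, so add 2**16
	IF (DATA .LT. 0) THEN
		VALUE = 65536 + DATA
	ELSE
		VALUE = DATA
	ENDIF
	...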

Note, however, that nearly all unsigned integer fields can be safely read into signed integer data types of the same word size. This is because they were originally written to the Level 1b data set using signed integer data types, and thus will be within the positive range of the corresponding signed integer data type. The Level 1b format specifications will clearly indicate, by providing ranges, those unsigned integer fields that must be strictly treated as unsigned integer data types (using the data manipulation described above, if necessary) to ensure that correct values are retrieved.
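
For readers not working in FORTRAN, the 16-bit conversion described above might look like this in Python. This is a sketch under assumptions: the function name `as_unsigned16` and the two sample bytes are invented for illustration, and `struct` is Python's standard binary packing module.

```python
import struct


def as_unsigned16(signed_value):
    """Map a value read as a signed 16-bit integer onto the range 0..65535."""
    if signed_value < 0:
        return 65536 + signed_value
    return signed_value


# Sample bytes 0xFF 0xFE, read as a big-endian signed 16-bit integer, give -2.
(signed_value,) = struct.unpack(">h", b"\xff\xfe")
print(signed_value, as_unsigned16(signed_value))   # -2 65534

# Languages with an unsigned type can read the field directly instead.
(direct,) = struct.unpack(">H", b"\xff\xfe")
print(direct)                                      # 65534
```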

However, not all fields of an unsigned integer data type contain unsigned binary integers. Fields containing packed data are also identified as unsigned integers. While its msb is not a sign bit, a field containing packed data does not represent an unsigned binary integer. Such a field requires the user to apply an unpacking technique to extract the information of interest so that it can be interpreted correctly.

Packed data may be bit fields, packed integers, or both. A bit field is one or more consecutive bits used to indicate one of two or more possible conditions or states. (A bit flag is a specialized instance of a bit field. It is a single bit indicating one of only two possible conditions.) For example, a three-bit field may indicate which of seven different modes an instrument is operating in (i.e., 0 implies "power on mode", 1 implies "warm up mode", 2 implies "standby mode", etc.). A packed integer is simply a binary number that is stored in just a subset of an unsigned integer field's bits. Although similar to a bit field, a packed integer is not an indicator of a condition, but an actual numeric value having magnitude that, once unpacked, could be used in arithmetic computations.
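
Unpacking usually comes down to a shift and a mask. The Python sketch below assumes a hypothetical 16-bit field whose layout is invented for illustration and is not taken from the Level 1b format specifications. The helper name `extract_bits` is likewise an assumption.

```python
def extract_bits(word, msb, lsb):
    """Return the value held in bits msb..lsb (inclusive), with bit 0 as the lsb."""
    width = msb - lsb + 1
    return (word >> lsb) & ((1 << width) - 1)


# Hypothetical packed 16-bit field (layout invented for this example):
#   bits 15-13  three-bit mode field
#   bit  12     single-bit flag
#   bits 11-0   packed integer
word = 0b0101_0000_0110_0100

mode = extract_bits(word, 15, 13)    # bit field
flag = extract_bits(word, 12, 12)    # bit flag
count = extract_bits(word, 11, 0)    # packed integer
print(mode, flag, count)             # 2 1 100
```

Shifting right by `lsb` moves the wanted bits down to bit 0, and the mask discards everything above them.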

To provide maximum portability of the Level 1b data sets across different computer platforms, floating point data is represented by scaled integers. Scaled integers can be either signed or unsigned, and are simply floating point numbers multiplied by a fixed scaling factor so that a sufficiently precise representation of the original number can be stored in integer form. For example, the floating point value 1.2313 might be multiplied by 10² to achieve an integer value of 123. To achieve better precision, the floating point value might be multiplied by 10³ or 10⁴ to achieve an integer value of 1231 or 12313, respectively. In the Level 1b data sets, the scaling factors are powers of ten, and only the exponents (2, 3 and 4 in the previous examples) are documented within the data set. To recover an approximation of the original floating point value, divide the integer value by ten raised to the given exponent.
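
The recovery step is a single division. A minimal Python sketch, using the example values from the paragraph above (the function name `unscale` is an assumption):

```python
def unscale(stored, exponent):
    """Recover an approximate floating point value from a scaled integer."""
    return stored / 10 ** exponent


print(unscale(123, 2))     # 1.23
print(unscale(1231, 3))    # 1.231
print(unscale(12313, 4))   # 1.2313
```

A larger exponent preserves more of the original digits, at the cost of a larger integer that must still fit within the field's word size.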

A major problem impeding the free transport of binary data from one computer system to another is the "Big Endian - Little Endian" dichotomy. "Big Endian" systems (e.g., IBM 370, Macintosh, SGI, Sun SPARC) store the bytes of binary numeric data in reverse order relative to "Little Endian" systems (e.g., IBM PC, DEC Alpha). For example, a 32-bit hexadecimal value of x01020304 (decimal value 16,909,060) written to a binary file by a Big Endian system would be read from the file as x04030201 (decimal value 67,305,985) by a Little Endian system. Level 1b data sets generated and archived by NOAA are in Big Endian order; users with Little Endian systems must include an additional byte-swapping step when reading binary numeric data from Level 1b data sets produced by NOAA. Some processors support byte swapping in their instruction sets, but others must use compiler-dependent functions.
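
The x01020304 example can be reproduced in Python with the standard `struct` module, where the `>` and `<` prefixes select Big Endian and Little Endian interpretation. This is only a sketch of the idea; the variable names are illustrative.

```python
import struct

raw = bytes([0x01, 0x02, 0x03, 0x04])   # bytes as written by a Big Endian system

(big,) = struct.unpack(">I", raw)       # read in Big Endian order (correct)
(little,) = struct.unpack("<I", raw)    # read in Little Endian order (bytes reversed)
print(big, little)                      # 16909060 67305985

# An explicit byte swap turns the misread value back into the intended one.
swapped = int.from_bytes(little.to_bytes(4, "little"), "big")
print(swapped == big)                   # True
```

Stating the byte order explicitly when unpacking, as with `">I"` here, avoids a separate swapping step regardless of the host system.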

Amended March 29, 2004
