Northern Prairie Wildlife Research Center

Geographic Information Systems

GIS Data Structures


Geographic database management systems are more complex than database management systems used for banking, library searches, airline bookings, and medical records. Three general features of the data within a GIS must be maintained: (1) information on the position of the feature being stored, (2) topological information on the spatial relationships of the features (topology is the way in which geographic features are connected and provides a mechanism to identify the positional relationships among features), and (3) attributes of the feature (Burrough 1986). The spatial component of geographic data describes the location of a feature and the possible topological relationships among features. The attribute component describes various attributes of the feature.

The spatial components of geographic data can be represented by three data types: points, lines, and areas (Fig. 1). The spatial data types are referenced to a location by a standard system of coordinates, such as UTM (see Box 1), or by a local coordinate system. Local coordinate systems can be created simply by assigning the southwest corner of a map the X and Y coordinates of 0.0 and 0.0 and measuring the horizontal (X) and vertical (Y) distances from the southwest corner of the map to the feature. The coordinates for the location of a feature, such as a bird's nest, could be ascertained by marking the location of the nest on a map and assigning the X coordinate as the number of centimeters from the west edge of the map to the location marked on the map, and assigning the Y coordinate as the number of centimeters from the south edge of the map to the location marked on the map. Of course, using standard coordinates, such as UTM, state plane, or latitude and longitude, ensures that anyone using the data in the future will know the precise location of the bird's nest. In addition to the coordinates, a label describing which bird's nest is located at the given coordinates will be stored with the coordinates. The attribute record for the bird's nest will be referenced by the label and might include various attributes for the nest, including species of nesting bird, height of the nest, number of eggs laid, and number of eggs hatched. The label links the spatial data with the appropriate attribute record.
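The way a label ties a coordinate pair to its attribute record can be sketched in a few lines of Python. This is only an illustration: the labels, coordinates, and attribute values are invented, and the dictionaries stand in for whatever storage a real GIS would use.

```python
# Hypothetical data: coordinates are centimeters measured on a paper map from
# its southwest corner (0.0, 0.0). Labels and attribute values are invented.

# Spatial component: label -> (x_cm, y_cm)
nest_locations = {
    "nest_01": (12.4, 7.9),
    "nest_02": (3.1, 15.6),
}

# Attribute component: label -> attribute record
nest_attributes = {
    "nest_01": {"species": "mallard", "height_m": 0.0,
                "eggs_laid": 9, "eggs_hatched": 7},
    "nest_02": {"species": "red-tailed hawk", "height_m": 11.5,
                "eggs_laid": 3, "eggs_hatched": 2},
}


def describe_nest(label):
    """Join the spatial and attribute records that share a label."""
    x_cm, y_cm = nest_locations[label]
    record = nest_attributes[label]
    return {"label": label, "x_cm": x_cm, "y_cm": y_cm, **record}


print(describe_nest("nest_01"))
```

Because both tables are keyed by the same label, the coordinates and the attributes can be edited independently and still be rejoined on demand.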

Spatial data are represented in GISs in two very different ways: as rasters or as vectors. Figure 2 shows how a stream could be represented in each format. In raster format, a grid is used to represent the study area, and the location of a feature is depicted by the values in the cells that overlay it. Vector data represent geographic features by the coordinates of points, lines, and polygons. Points represent small features such as wells, towers, or nest locations. Linear features such as roads and streams are represented by lines. Areas such as cities, forests, wetlands, and soil units are represented by polygons, which are bounded on all sides by a series of straight-line segments.

Fig. 2. -- A stream can be represented in a GIS either by a raster (a) or vector (b) format.
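The two representations of a stream can be compared with a small sketch. The coordinates, grid dimensions, and the `rasterize` helper below are all assumptions made for illustration. The helper approximates the conversion by sampling points along each segment, which is a simplification and not how any particular GIS performs it. It also assumes every coordinate falls inside the grid.

```python
# Vector form: an ordered list of (x, y) coordinates (invented, arbitrary units).
stream_vector = [(0.5, 4.5), (1.7, 3.2), (3.4, 2.6), (4.5, 0.5)]

N_LINES, N_ELEMENTS = 5, 5  # hypothetical grid dimensions
CELL_SIZE = 1.0             # each cell covers 1 x 1 map unit


def rasterize(path, n_lines, n_elements, cell_size, samples_per_segment=50):
    """Set to 1 every cell touched by points sampled along the path."""
    grid = [[0] * n_elements for _ in range(n_lines)]
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        for i in range(samples_per_segment + 1):
            t = i / samples_per_segment
            x = x0 + t * (x1 - x0)
            y = y0 + t * (y1 - y0)
            element = min(int(x / cell_size), n_elements - 1)
            row_from_south = min(int(y / cell_size), n_lines - 1)
            # Line 0 is the northern edge of the grid, so flip the Y axis.
            grid[n_lines - 1 - row_from_south][element] = 1
    return grid


# Raster form: a grid in which cells overlying the stream hold the value 1.
stream_raster = rasterize(stream_vector, N_LINES, N_ELEMENTS, CELL_SIZE)
for row in stream_raster:
    print(" ".join(str(value) for value in row))
```

The vector form keeps four exact coordinate pairs, while the raster form records only which cells the stream passes through.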


Raster Data

Raster data are stored in the computer as a matrix. The cells are referenced by lines (rows) and elements (columns) (Fig. 3). In the simplest form, each line is a computer record that contains the values for all elements in the line. Any cell not containing a feature has the value "0". In the simplest raster system, the value stored for each cell is the attribute component of the geographic data. In Fig. 4, cells with a value of "1" are forests, cells with a value of "2" are croplands, and cells with a value of "3" are rangelands.

In more sophisticated raster systems, the cell value is a label that links to records in an attribute file. In the above example, cells labeled as "1" could have many attributes, such as species composition, age of forest stand, and estimated volume of marketable timber.

Fig. 3. -- Raster data are stored in computers as a matrix. Each cell is referenced by its line and element number. The example shown is for a small file with 10 lines and 10 elements. Cell A is located at line 3, element 2. Cell B is located at line 6, element 8.

Fig. 4. -- Land cover as represented in a simple raster system. Cells with a "1" are forests, cells with a "2" are croplands, and cells with a "3" are rangelands.
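A raster of this kind maps naturally onto a list of lists, one inner list per line. The sketch below uses an invented 4 x 5 grid with the land-cover codes from Fig. 4 and assumes that lines and elements are counted from 1. The attribute values and function names are hypothetical.

```python
# Invented raster: 0 = no feature, 1 = forest, 2 = cropland, 3 = rangeland.
land_cover = [
    [1, 1, 2, 2, 0],
    [1, 1, 2, 2, 0],
    [3, 3, 2, 0, 0],
    [3, 3, 3, 0, 0],
]

# In a more sophisticated system the cell value is a label into an
# attribute file. These records are invented for illustration.
attributes = {
    1: {"cover": "forest", "stand_age_yr": 60},
    2: {"cover": "cropland"},
    3: {"cover": "rangeland"},
}


def cell_value(raster, line, element):
    """Return the value at a 1-based (line, element) position."""
    return raster[line - 1][element - 1]


def count_cells(raster, value):
    """Count the cells that hold the given value."""
    return sum(row.count(value) for row in raster)


label = cell_value(land_cover, 3, 2)
print(label, attributes[label]["cover"])
print("forest cells:", count_cells(land_cover, 1))
```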

Because the raster system is strictly a two-dimensional matrix, various types of geographical data are stored as different layers or overlays in the GIS (Fig. 5). One layer may contain land use/land cover, another layer may contain wetland data, and another layer may contain information on the transportation system.

Fig. 5. -- Various types of geographic data may be stored as different layers or overlays in a GIS.
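Because every layer shares the same grid, layers can be combined cell by cell. The following sketch overlays two invented 3 x 3 layers to find forested wetlands. The `overlay` helper and the code values are assumptions for illustration only.

```python
# Two hypothetical layers covering the same 3 x 3 grid.
land_cover = [
    [1, 1, 2],
    [1, 2, 2],
    [3, 3, 2],
]  # 1 = forest, 2 = cropland, 3 = rangeland
wetlands = [
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
]  # 1 = wetland, 0 = not wetland


def overlay(layer_a, layer_b, combine):
    """Apply combine(a, b) to each pair of cells at the same position."""
    return [
        [combine(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(layer_a, layer_b)
    ]


# A new layer in which 1 marks cells that are both forest and wetland.
forested_wetland = overlay(
    land_cover, wetlands, lambda cover, wet: int(cover == 1 and wet == 1)
)
for row in forested_wetland:
    print(row)
```

The result is itself a layer on the same grid, so it can be stored and overlaid again with other layers.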

The user of a raster system must determine the size of the cells to be used. This size is referred to as spatial resolution (see Box 2 for various meanings for the term resolution). The cell size can vary tremendously depending upon the size of the study area and the objectives for the GIS. Cell sizes as large as 20 ha may be adequate for state or regional planning. For a wildlife management area, a cell size of 0.05 ha or smaller might be required, depending upon the application of the GIS and the size of the wildlife management area. Storage requirements increase drastically as the cell size is reduced. Halving the length of a cell's side increases the data storage requirements by a factor of four, because four of the smaller cells are needed to cover the area of one original cell. Conversely, as cell size increases, the precision of the representation of the land feature is reduced. Choosing the appropriate cell size for a particular GIS application is a compromise between the cost of data storage and computer time on the one hand and the reliability of the representation of the land feature on the other.
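The factor-of-four relationship can be checked with simple arithmetic, assuming that "reducing the cell size by one-half" refers to the length of a cell's side. The study-area dimensions and function names below are invented for illustration.

```python
import math


def cells_needed(width_m, height_m, cell_side_m):
    """Number of square cells required to cover a rectangular study area."""
    n_elements = math.ceil(width_m / cell_side_m)
    n_lines = math.ceil(height_m / cell_side_m)
    return n_lines * n_elements


def side_from_hectares(cell_area_ha):
    """Side length in meters of a square cell (1 ha = 10,000 square meters)."""
    return math.sqrt(cell_area_ha * 10_000)


# Hypothetical 10 km x 10 km study area.
coarse = cells_needed(10_000, 10_000, 100)  # 100 m cells (1 ha each)
fine = cells_needed(10_000, 10_000, 50)     # 50 m cells (0.25 ha each)

print(coarse, fine, fine / coarse)
print(round(side_from_hectares(0.05), 1), "m side for a 0.05 ha cell")
```

If each cell occupies a fixed number of bytes, the storage for one uncompressed layer grows in proportion to the cell count.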


Vector Data

Vector data provide for high precision in representing the location of features. Aronoff (1989) described how vector data can be used to define the location of a point, a line, and an area. A point is represented by a simple pair of coordinates. The line is represented by an ordered list of pairs of coordinates. The area is represented as a polygon with ordered pairs of coordinates that close the polygon (the first and last pair being the same).
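These three representations translate directly into simple Python structures. The coordinates below are invented and in arbitrary units, and the helper functions are illustrative rather than taken from any GIS package.

```python
import math

# A point: a single pair of coordinates.
well = (2.0, 5.0)

# A line: an ordered list of coordinate pairs.
road = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# An area: a polygon whose first and last pairs are the same.
field = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def is_closed(ring):
    """A polygon ring must end on the coordinate pair it started with."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def line_length(coords):
    """Sum of the straight-line distances between consecutive pairs."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area of a closed ring by the shoelace formula."""
    twice_area = sum(
        x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2


print(is_closed(field), line_length(road), polygon_area(field))
```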

The coordinates can be in any arbitrary units but usually are stored as UTM, state plane, or latitude and longitude coordinates. The first vector system used simple techniques to store the X and Y coordinates for polygons. In this simple system the coordinates for the common boundary between two areas were stored twice, once for the first area and again for the adjacent area. These duplicate storage techniques simplified computations and plotting but wasted storage space and, more importantly, provided no information as to adjacency or connectivity of geographic features (topology). Most vector systems now use topological models (Aronoff 1989) for representing the location of areas.

In topological models (Fig. 6), a polygon is defined by a series of arcs. Arcs begin and end at nodes, which occur wherever two or more arcs meet. Each arc is defined by a series of coordinates, starting with the coordinates for the beginning node and ending with the coordinates for the ending node. Topological relationships are stored in three tables. The polygon topology table describes the arcs that bound each polygon, the node topology table describes the arcs that end at each of the nodes, and the arc topology table describes which end points (nodes) occur on each arc and which polygons are to the left and right of each arc. These three topology tables provide the tools required to efficiently determine the positional relationships of one feature to other features. A coordinate table defining the coordinates for each arc also is used in topological models. In addition to these topological databases, the attributes for the features are stored in an attribute database.

Fig. 6. -- In vector systems that use topological models, polygons are represented by a list of arcs (adapted from Aronoff 1989). The arcs required to define each polygon are shown in the polygon topology table. The node topology table defines the arcs associated with each node. The arc topology table describes the starting and end nodes for each arc and defines the polygons to the left and right of each arc.
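A minimal sketch of the topology tables, assuming two unit-square polygons A and B that share one boundary. The arc, node, and polygon names, the coordinates, and the helper functions are all hypothetical and do not reproduce the contents of Fig. 6.

```python
OUTSIDE = None  # marker for the area outside every polygon

# Coordinate table: arc -> ordered coordinates from start node to end node.
arc_coordinates = {
    "a1": [(1, 0), (1, 1)],                  # shared boundary, stored once
    "a2": [(1, 1), (0, 1), (0, 0), (1, 0)],  # remainder of A's boundary
    "a3": [(1, 0), (2, 0), (2, 1), (1, 1)],  # remainder of B's boundary
}

# Arc topology table: arc -> (start node, end node, left polygon, right polygon).
arc_topology = {
    "a1": ("n1", "n2", "A", "B"),
    "a2": ("n2", "n1", "A", OUTSIDE),
    "a3": ("n1", "n2", "B", OUTSIDE),
}

# Polygon topology table: polygon -> arcs that bound it.
polygon_topology = {"A": ["a1", "a2"], "B": ["a1", "a3"]}

# Node topology table: node -> arcs that end at it.
node_topology = {"n1": ["a1", "a2", "a3"], "n2": ["a1", "a2", "a3"]}


def adjacent_polygons(polygon):
    """Polygons on the other side of any arc bounding the given polygon."""
    neighbors = set()
    for arc in polygon_topology[polygon]:
        _, _, left, right = arc_topology[arc]
        other = right if left == polygon else left
        if other is not OUTSIDE:
            neighbors.add(other)
    return neighbors


def shared_arcs(poly_a, poly_b):
    """Arcs that appear in the boundary lists of both polygons."""
    return [a for a in polygon_topology[poly_a] if a in polygon_topology[poly_b]]


print(adjacent_polygons("A"), shared_arcs("A", "B"))
```

Adjacency is answered from the tables alone, without comparing coordinates, and the coordinates of the common boundary are held only once.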


Raster Versus Vector Systems

Early GISs were either raster or vector systems. Table 1 lists various advantages and disadvantages of vector and raster data systems (Burrough 1986, Aronoff 1989). Both approaches are equally valid ways of representing spatial data. The advantages and disadvantages of raster and vector systems have been heavily debated. Most modern GISs handle both raster and vector data but usually are designed primarily for one data type. The complete integration of raster and vector data capabilities will be common in GISs of the future (Faust et al. 1991). These new GISs will quickly and efficiently convert among rasters, vectors, and other data structures, using whichever is most appropriate for the application being performed (McKeown 1987, Ripple and Wang 1989, Piwowar and LeDrew 1990). GISs must function equally well with both raster and vector data.

Table 1. Comparison of the advantages and disadvantages of vector and raster methods, as revised from Burrough (1986) and Aronoff (1989).
Raster Method

Advantages
  • Data structure is simple.
  • The method is compatible with remotely sensed or scanned data.
  • Procedures for spatial analysis are simple.

Disadvantages
  • Greater disk storage is often required.
  • Topological relationships are difficult to represent.
  • Unless extremely small cell sizes are used, the graphic output is often aesthetically less pleasing.
  • Projection transformations are more difficult.


Vector Method

Advantages
  • Compact data structure requires less disk storage.
  • Topological relationships are readily maintained.
  • Graphic output is aesthetically more pleasing and more closely approximates hand-drawn maps.

Disadvantages
  • Data structures are complex.
  • Overlaying multiple vector maps is often time consuming.
  • Output graphics may take hours to draw on plotters.
  • Some spatial analysis procedures are difficult.
  • Software and hardware for vector systems are often more expensive.
  • The method is not as compatible with remote sensing data.

U.S. Department of the Interior | U.S. Geological Survey
URL: http://www.npwrc.usgs.gov/resource/habitat/research/struct.htm
Page Contact Information: npwrc@usgs.gov
Page Last Modified: August 3, 2006