National Cancer Institute, U.S. National Institutes of Health, www.cancer.gov
GIS Geographic Information Systems
 
National Cancer Institute GIS

GIS Research & Applications | GIS Database Development | Spatial Data Analysis | Geovisualization Tools Development | Communication of Georeferenced Statistics

Spatial Data Analysis

NCI develops and extends methodology for spatial data analysis to improve the identification of patterns of cancer rates and trends and to highlight areas in need of cancer control interventions. Areas of active research include:

  1. Environmental exposure assessment
  2. Statistical modeling
  3. Outlier detection for cancer surveillance
  4. Cluster identification

Environmental Exposure Assessment

  • GIS can provide information about potential environmental exposures that cannot be obtained through traditional epidemiologic methods.
  • A study in south-central Nebraska demonstrated the use of satellite imagery to reconstruct historical crop patterns (Ward et al., Env Health Perspectives, 2000).
[Figure: Landsat imagery (color infrared display of bands 4, 2, 1) + Farm Service Agency historical aerial photos with crops noted = classified land cover map]
Example: Epidemiologic Study of Non-Hodgkin's Lymphoma (NHL)
In an epidemiologic study of non-Hodgkin's lymphoma, NCI:
  • mapped residences, then assessed the proximity of residences to specific crops;
  • assigned probabilities of exposure based on available pesticide use data for each crop; and
  • demonstrated that zones of potential exposure to agricultural pesticides and proximity measures can be determined for residences.
Example of a land cover map for an epidemiologic study of non-Hodgkin's lymphoma
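
The proximity and exposure-probability steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the NHL study: the coordinates, crop names, the 500 m buffer, the use probabilities, and the function names `nearest_crop_distance` and `exposure_probability` are all hypothetical, and the land cover map is reduced to a list of crop cell centroids in a projected coordinate system.

```python
import math

# Hypothetical inputs: residence locations and crop cell centroids from a
# classified land cover map, in a projected coordinate system (metres).
residences = {"home_A": (0.0, 0.0), "home_B": (2000.0, 0.0)}
crop_cells = [
    ("corn", (300.0, 300.0)),
    ("soybeans", (100.0, 0.0)),
    ("corn", (2600.0, 800.0)),
]
# Assumed probability that each crop was treated with the pesticide of interest.
use_probability = {"corn": 0.8, "soybeans": 0.3}


def nearest_crop_distance(home, crop, cells):
    """Distance in metres from home to the nearest cell of `crop`, or None."""
    dists = [math.dist(home, xy) for name, xy in cells if name == crop]
    return min(dists) if dists else None


def exposure_probability(home, cells, use_prob, buffer_m=500.0):
    """Highest use probability among crops with a cell within buffer_m of home."""
    best = 0.0
    for crop, p in use_prob.items():
        d = nearest_crop_distance(home, crop, cells)
        if d is not None and d <= buffer_m:
            best = max(best, p)
    return best


for name, home in residences.items():
    print(name, exposure_probability(home, crop_cells, use_probability))
```

Taking the maximum over nearby crops is one simple design choice; a study could equally weight crops by distance or by area within the buffer.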

Statistical Modeling

  • The goal of the cancer incidence prediction project is to model data from NCI cancer registries (which cover 470 counties) to predict the number of cases in all states.
  • The project uses hierarchical Poisson regression models to characterize associations between cancer incidence/mortality and sociodemographic/lifestyle factors by county.
  • These factors explain spatial variation so well that no spatial correlation term is needed in the model.
  • Extensions of original models:
    • Spatio-temporal prediction of cancer rates by state
    • Predicted incidence is used to predict prevalence
    • Predicted incidence is used to calculate % completeness of case ascertainment for each cancer registry
Map of SEER registry locations in the US
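
As a rough illustration of the modeling idea above, the sketch below fits a plain (non-hierarchical) Poisson regression with a log-population offset and a single covariate, then uses it to predict cases elsewhere and to compute a percent completeness figure. It is a deliberate simplification of the hierarchical models described here: all data values, the single smoking covariate, and the function names `fit_poisson`, `predict_cases`, and `completeness_percent` are assumptions made for the example.

```python
import math


def fit_poisson(cases, pop, x, iterations=25):
    """Fit log(E[cases]) = log(pop) + b0 + b1*x by Newton-Raphson."""
    b0 = math.log(sum(cases) / sum(pop))  # start at the overall crude rate
    b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for y, n, xi in zip(cases, pop, x):
            mu = n * math.exp(b0 + b1 * xi)  # expected cases in this county
            g0 += y - mu                     # score for the intercept
            g1 += (y - mu) * xi              # score for the slope
            h00 += mu                        # information matrix entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det    # Newton step: inverse(H) @ g
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def predict_cases(b0, b1, pop, x):
    """Expected number of cases for each county under the fitted model."""
    return [n * math.exp(b0 + b1 * xi) for n, xi in zip(pop, x)]


def completeness_percent(reported, predicted):
    """Reported cases as a percentage of model-predicted cases."""
    return 100.0 * reported / predicted


# Hypothetical registry counties: cases, population, smoking prevalence.
registry_cases = [40, 130, 105, 27, 250]
registry_pop = [50000, 120000, 80000, 30000, 200000]
registry_smoking = [0.15, 0.22, 0.30, 0.18, 0.25]
b0, b1 = fit_poisson(registry_cases, registry_pop, registry_smoking)

# Hypothetical counties outside the registries.
predicted = predict_cases(b0, b1, [60000, 90000], [0.20, 0.28])
print([round(p, 1) for p in predicted])
```

A hierarchical version would add random effects (for example by registry or state) and many more covariates; the offset-plus-covariates structure stays the same.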

Covariate data available for all counties:

  • cancer mortality rates
  • sociodemographic factors (income, schooling, etc.)
  • medical facilities
  • cancer screening utilization
  • smoking, obesity, no insurance

Output: Predicted Incidence Rates
Smoothed by county
US map of smoothed predicted incidence rates for female breast cancer
Absolute rates
Maps showing absolute rates of predicted cancer incidence in each US state in 1999, for each sex and for a variety of cancer sites
Relative rates
Maps showing relative rates of predicted cancer incidence in each US state in 1999, for each sex and for a variety of cancer sites
Pickle, Feuer, Edwards. U.S. Predicted Cancer Incidence, 1999: Complete maps by county and state from spatial projection models. NIH Pub No 03-5435, 2003.

Outlier Detection for Cancer Surveillance

Lung cancer mortality rates among white males, 1950-69
Observed rates:
Map showing non-smoothed observed rates of lung cancer among white males, by US county. County borders are sharply delineated.
Smoothed rates (expected pattern):
Map showing smoothed rates of lung cancer among white males, by US county. Regions are more broadly grouped.
  • Can we detect significant outliers (unusual occurrences) in the number of new cancer cases?
  • Applied an empirical Bayes data mining algorithm to test data (DuMouchel & Pregibon, Proc KDD, 2001; Lincoln Technologies, Inc)
  • The method assumes a Poisson distribution for the number of cases and estimates Relative Risk = observed/expected
  • Lung cancer mortality, white males, 1950-69
    • The smoothed map provided the expected number of cases per county
    • The algorithm compared the actual number of cases to this expectation
    • Found the known "hot spot" in Montana, site of a copper smelter (Lee & Fraumeni, JNCI, 1969)
US map showing smoothed lung cancer mortality rates. An arrow indicates a region in Montana with an especially high rate.
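
The observed-versus-expected comparison above can be sketched as follows. This is not the DuMouchel & Pregibon algorithm: it substitutes a single gamma prior with fixed, assumed hyperparameters (`alpha`, `beta`) for their fitted empirical Bayes prior, and the county counts, the `p_threshold` value, and all function names are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(X = 0) .. P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


def raw_rr(observed, expected):
    """Relative risk as the plain ratio observed/expected."""
    return observed / expected


def shrunken_rr(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean RR under an assumed Gamma(alpha, beta) prior (prior mean 1)."""
    return (observed + alpha) / (expected + beta)


def flag_outliers(data, p_threshold=0.001):
    """Names of counties whose observed count is improbably high."""
    return [name for name, (obs, exp) in data.items()
            if poisson_upper_tail(obs, exp) < p_threshold]


# Hypothetical counties: (observed cases, expected cases from the smoothed map).
counties = {"A": (12, 10.0), "B": (30, 10.0), "C": (3, 0.5)}
for name, (obs, exp) in counties.items():
    print(name, round(raw_rr(obs, exp), 2), round(shrunken_rr(obs, exp), 2))
print(flag_outliers(counties))
```

County C shows why shrinkage matters: its raw ratio is large, but it rests on three cases, so the shrunken estimate is pulled strongly toward 1.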

Cluster Identification

  • Are apparent map clusters real or random noise?
  • SaTScan software identifies the most likely significant cluster over space, time, or both
  • Algorithm: a spatial scan statistic for Poisson or Bernoulli event data that adjusts for population heterogeneity and covariates
  • The original version identified circular clusters; the new version scans for elliptical clusters of various shapes and angles
  • Software: www.satscan.org
  • Recently extended to clusters of survival rates

Developed by Martin Kulldorff: Stat in Med, 1995, 1996; Communications in Statistics, 1997; Am J Epidemiology, 1997; Am J Public Health, 1998.
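
To make the idea of a spatial scan statistic concrete, here is a minimal sketch of a circular Poisson scan with a Monte Carlo significance test. It is not SaTScan's code and omits covariate adjustment, elliptical windows, and the time dimension; the 3x3 grid of areas, the case counts, the 50% population cap, and all function names are assumptions for illustration.

```python
import math
import random


def poisson_llr(c, e, total_cases):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only zones with excess cases are of interest
    llr = c * math.log(c / e)
    if c < total_cases:
        rest = total_cases - c
        llr += rest * math.log(rest / (total_cases - e))
    return llr


def candidate_zones(coords, pops, max_pop_fraction=0.5):
    """Circular zones grown around each area, capped by population share."""
    total = sum(pops)
    zones = []
    for centre in coords:
        order = sorted(range(len(coords)),
                       key=lambda i: math.dist(centre, coords[i]))
        zone, zone_pop = [], 0
        for i in order:
            zone_pop += pops[i]
            if zone_pop > max_pop_fraction * total:
                break
            zone.append(i)
            zones.append(tuple(zone))
    return zones


def most_likely_cluster(cases, pops, zones):
    """Zone with the largest log likelihood ratio, and that ratio."""
    total_cases, total_pop = sum(cases), sum(pops)
    best_zone, best_llr = None, 0.0
    for zone in zones:
        c = sum(cases[i] for i in zone)
        e = total_cases * sum(pops[i] for i in zone) / total_pop
        llr = poisson_llr(c, e, total_cases)
        if llr > best_llr:
            best_zone, best_llr = zone, llr
    return best_zone, best_llr


def monte_carlo_p(cases, pops, zones, n_sim=199, seed=0):
    """Rank the observed maximum LLR among maxima simulated under the null."""
    rng = random.Random(seed)
    observed = most_likely_cluster(cases, pops, zones)[1]
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(pops)
        # Null hypothesis: each case falls in an area in proportion to population.
        for i in rng.choices(range(len(pops)), weights=pops, k=sum(cases)):
            sim[i] += 1
        if most_likely_cluster(sim, pops, zones)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical 3x3 grid of areas with equal populations and one elevated corner.
coords = [(float(x), float(y)) for y in range(3) for x in range(3)]
pops = [1000] * 9
cases = [5, 5, 5, 5, 5, 5, 5, 5, 30]
zones = candidate_zones(coords, pops)
print(most_likely_cluster(cases, pops, zones))
print(monte_carlo_p(cases, pops, zones))
```

The Monte Carlo step is what addresses the "real or random noise" question: the p-value reflects how often chance alone produces a maximum as extreme as the one observed.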

Examples of likely clusters of breast cancer mortality rates in the US
