Epidemiology and Genetics Research Branch

About the Data

The LI GIS contains more than 80 datasets, grouped into five categories:

  • geographic attribute data: locations of roads, water features, parks, and landmarks, and base maps that define the legal boundaries of geographic areas and serve as reference points for other maps of Nassau and Suffolk counties
  • demographic data: data on the age, race, sex, and income of the population
  • health outcome data: breast cancer incidence and health facilities data
  • environmental data: land use; land cover; railroads; traffic; water use; potential sources of water pollution; releases of chemicals into water, air, and soil; electrical power lines; information on toxic chemicals and hazardous and municipal waste; and radiation
  • other data: for example, weather information and satellite image maps

Sources

The datasets come from various sources, including state health departments, Nassau and Suffolk counties, and the U.S. Postal Service. Demographic data come from sources such as the U.S. Bureau of the Census. Health data include relative breast cancer incidence (observed/expected) by ZIP Code for Nassau and Suffolk counties, from the New York State Department of Health (DOH) Cancer Surveillance Improvement Project. Environmental data are provided by the New York State DOH, the U.S. Environmental Protection Agency (EPA), the U.S. Geological Survey (USGS), and the U.S. Department of Agriculture (USDA).

Long Island community members have also contributed information on possible sources of environmental pollution. Each item is documented by topic, location, source of information (e.g., letter, news clipping), and whether it is confidential. Each item is then categorized as potentially valuable and slated for further investigation as a possible dataset acquisition; potentially useful for validating current LI GIS datasets; potentially useful for researchers conducting focused studies; or too limited in detail to be useful. This database is available to researchers who are using the LI GIS. Researchers are cautioned that the information has not been verified and may not be accurate. For this reason, the information is not on the public Web site.

Metadata Browser

The Metadata Browser provides detailed information about the datasets in the LI GIS. Metadata is defined as data about data, or information about information. The Metadata Browser is a useful resource for researchers who need detailed information about the data and how they are organized in the Data Warehouse. This information helps researchers assess the usefulness and relevance of the data for their purposes.

The Metadata Browser has four areas:

  • Data Warehouse – Describes how the datasets are integrated into the LI GIS
  • Federal Geographic Data Committee (FGDC) Reports – Provides metadata on the geographic data
  • Source Datasets – Provides metadata about the source datasets, which are the raw materials used to build the Data Warehouse
  • Data Quality Summaries – Reports on the origin of the data, the method and purpose of data collection, any temporal issues, and a general assessment of data quality. Also covers considerations for choosing which types of data to use in research, as well as privacy and data ownership concerns.

In most cases, data for the LI GIS were collected for purposes other than health-related research. Users should review the metadata carefully, paying special attention to accuracy, consistency, quality, and use constraints.

Selection of Data

The LI GIS team focused on identifying, prioritizing, and acquiring a core set of existing datasets believed to be important for research on relationships between environmental exposures and breast cancer. The datasets were evaluated for completeness, geographic applicability, and ability to provide a meaningful level of detail. Strong emphasis was placed on data quality, so that researchers and the public can have confidence in the system and in the research findings produced with it. The LI GIS Oversight Committee, which existed from 2000 to 2003, reviewed and approved the datasets before they were added to the system.

How far back the data extend depends on the specific dataset. Data that date back many years are highly desirable because of the long latency, perhaps 20 years or more, between exposure to whatever factor(s) may be responsible for breast cancer and the onset of disease.

Confidentiality and Privacy of Human Subject Data, Other Dataset Restriction Policies and Protections

Maintaining patient and provider confidentiality is a primary concern given the sensitive nature of some of the data in the LI GIS. To protect the confidentiality and privacy of individuals whose data are contained in the LI GIS, personal identifiers for all patients and medical care providers are removed from health data. However, a remote risk of re-identification remains (given the large amount of data available), so researchers must pledge their commitment to confidentiality and privacy as part of the application process for access to LI GIS data. Researchers using person-level data must provide evidence of Institutional Review Board (IRB) review or exemption from review.

Certain other datasets were provided by their owners or custodians with the understanding that their use be restricted to approved investigators who provide a specific justification for using them.

Investigators must obtain NCI’s approval to use the data in the LI GIS. NCI will work with investigators requesting data files to balance their research needs with those of the individuals and institutions included in, or concerned with, the data.

Researchers are responsible for obtaining approval of their research projects from the appropriate local or private IRB responsible for assuring protection of human research subjects. IRB approval is not required when applying to use the LI GIS, but it is needed before access to the system is provided.

Access to Breast Cancer Data

Data on relative breast cancer incidence (observed/expected) by ZIP Code for Nassau and Suffolk counties for 1973-1997 are available within the LI GIS because the New York State Department of Health (DOH) makes these data publicly available in that form. Researchers who need more precise information about the locations of breast cancer patients, or other breast cancer data, must obtain it separately from the DOH Cancer Registry. Researchers are strongly encouraged to contact NCI for help tailoring the request before contacting the Cancer Registry. Once obtained, the data may be incorporated into the LI GIS, where they will be accessible only to that researcher. Requests should be directed to the New York State Cancer Registry.
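Relative incidence is reported as a ratio of observed to expected cases, where a value above 1 means more cases than expected. This page does not say how the DOH derived the expected counts. The minimal sketch below assumes indirect age standardization, a common approach in which the expected count is the sum, over age strata, of the ZIP Code's population times a reference incidence rate. The function names and all numbers are hypothetical and for illustration only.

```python
from typing import Sequence


def expected_cases(populations: Sequence[float], reference_rates: Sequence[float]) -> float:
    """Expected cases under indirect standardization (an assumption, not the DOH method):
    sum over age strata of local population x reference incidence rate."""
    if len(populations) != len(reference_rates):
        raise ValueError("populations and reference_rates must have the same length")
    return sum(p * r for p, r in zip(populations, reference_rates))


def relative_incidence(observed: int, expected: float) -> float:
    """Ratio of observed to expected cases; above 1 means more cases than expected."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


# Hypothetical ZIP Code with three age strata (illustrative numbers only).
pops = [4000, 3000, 2000]            # women in each age stratum
rates = [0.0005, 0.002, 0.004]       # hypothetical reference rates, cases per person
exp_count = expected_cases(pops, rates)
ratio = relative_incidence(18, exp_count)
print(f"expected={exp_count:.1f} observed/expected={ratio:.3f}")
```

Here 18 observed cases against about 16 expected gives a ratio of roughly 1.13. Because a ZIP Code may have few cases, such ratios can vary widely by chance, which is one reason researchers may seek the more precise registry data described above.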