U.S. National Institutes of Health National Cancer Institute

SEER*Stat Frequency Exercise 2

The site recode variables are derived from the "Primary site" and "Histologic type ICD-O-3" variables after the registries submit the data to SEER (see SEER Incidence Site Recode). These variables are added to the SEER data as a convenience and are used in most SEER publications to define the cancer of interest.

For example, each section of the SEER Cancer Statistics Review corresponds to a value of a site recode variable.

Create a table showing frequencies by primary site for cases with Site recode with Kaposi and mesothelioma = lung and bronchus. Include only malignant cases in the Limited-Use database diagnosed in the SEER 17 registries from 2000 through 2005. Cases with unknown age may be included because only frequencies are calculated.

Note: Starting with the November 2004 data submission, SEER uses the site recode variable in which mesothelioma and Kaposi sarcoma are separate categories.

Key Points and Reminders

  • This exercise introduces three variables related to cancer type in the SEER databases (primary site, site recode with Kaposi and mesothelioma, histology).
  • In this exercise, you will calculate frequencies for cases diagnosed from 2000 through 2005, the only years for which all 17 registries have cases in the database.
  • You will set primary site as a display variable in order to create a table showing frequencies for the variable's individual values.
  • The Matrix menu provides a variety of options for modifying the layout of the output table. This exercise uses the Hide Zero Count Rows option to suppress the display of primary site codes not relevant to lung and bronchus cancer.

Step 1:  Create a new Frequency Session

  • Start SEER*Stat.
  • From the File menu select New > Frequency Session or use the Frequency button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • On the Data Tab select "Incidence - SEER 17 Regs Limited-Use + Hurricane Katrina Impacted Louisiana Cases, Nov 2007 Sub (1973-2005 varying)".

Learn More...

Databases distributed with SEER*Stat use names designed to describe the data. The various parts of this exercise's database name indicate the following:
  • Incidence - The database contains cancer incidence data.
  • SEER 17 Regs - The database contains data for the "SEER 17" registries as defined in SEER Registries - Common Terms.
  • Limited-Use, Nov 2007 Submission - This is the Limited-Use version of the database. The data was submitted to the SEER program by the registries in November 2007.
  • + Hurricane Katrina Impacted Louisiana Cases - Because Hurricane Katrina had a major impact on Louisiana's population during July - December 2005, Louisiana cases diagnosed in that six-month period have been excluded from the limited-use database. These cases are provided with the data, but they are considered supplemental. For more information, see Adjustments for Areas Impacted by Hurricanes Katrina and Rita.
  • (1973-2005 varying) - These are the years of diagnosis for the cases included in the database. They are considered "varying" because the years of diagnoses for cases vary per registry, depending on which year the registry joined the SEER Program and began contributing data.

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • Move to the Statistic Tab.
  • In the Statistic box, select Frequencies.

Step 4:  Defining the Analysis Cohort (Selection Tab)

  • Move to the Selection Tab.
  • In this exercise, we want a frequency of malignant lung and bronchus cancer cases diagnosed from 2000 through 2005. We do not want to include all of the cancer sites included in the database, and we only want to include years for which all 17 registries have data. Therefore, we need a selection statement based on site, behavior, and year of diagnosis. The database selected on the Data Tab contains cases diagnosed in the SEER 17 registries; therefore selections based on registry are not necessary.
  • The Select Only box provides a shortcut for commonly-used selections. It is very common to select only malignant behavior when analyzing cancer data. Excluding cases with unknown ages is also common when calculating rates. For this exercise we want to select malignant cases in the limited-use database, so make sure that the Malignant Behavior and Cases in Limited-Use Database options are checked. Since we are just showing frequencies, we can include cases of unknown age. Turn off the Known Age selection.
  • Click Edit to open the Case Selection window.
  • Using the controls at the top of the window, you will create a selection statement. The variables are listed in categories in the Variable box on the top left of the screen.
  • In the Variable box, use the "+" to expand the "Site and Morphology" category.
  • Select "Site rec with Kaposi and mesothelioma".
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • Scroll through the items in the Values box until you find and select "Lung and Bronchus".
  • In the Variable box, use the "+" to expand the "Race, Sex, Year Dx, Registry, County" category. Select "Year of diagnosis".
  • Moving to the center of the window, check to see that "is = to" is selected as the Operator.
  • Select "2000" from the Values box, then extend the selection so that all the years from 2000 through 2005 are selected.
  • At this time, the following should appear in the Selection Statement box at the bottom of the window:
    {Site and Morphology.Site rec with Kaposi and mesothelioma} = ' Lung and Bronchus'
    AND {Race, Sex, Year Dx, Registry, County.Year of diagnosis} = '2000','2001','2002','2003','2004','2005'
  • Use the OK button to close the Case Selection window.
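
The logic of this step can be pictured as a filter over case records. The Python sketch below is illustrative only: the `Case` fields (`site_recode`, `year_dx`, `behavior`, `limited_use`) and the sample records are hypothetical and do not reflect how SEER*Stat stores or queries data. It assumes that multiple values on one line of a selection statement match any of the listed values, while separate lines are joined with AND, as the statement above shows.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical field names, chosen for illustration only.
    site_recode: str
    year_dx: int
    behavior: str
    limited_use: bool


# Line 2 of the selection statement: 2000 through 2005 inclusive.
YEARS = set(range(2000, 2006))


def in_cohort(case: Case) -> bool:
    """Mirror the Select Only options plus the two selection statement lines."""
    return (
        case.behavior == "Malignant"                  # Select Only: Malignant Behavior
        and case.limited_use                          # Select Only: Cases in Limited-Use Database
        and case.site_recode == "Lung and Bronchus"   # statement line 1
        and case.year_dx in YEARS                     # statement line 2 (any listed year)
    )


# Made-up records, not real SEER data.
cases = [
    Case("Lung and Bronchus", 2003, "Malignant", True),
    Case("Lung and Bronchus", 1999, "Malignant", True),  # year outside the range
    Case("Breast", 2003, "Malignant", True),             # different site recode
    Case("Lung and Bronchus", 2005, "In situ", True),    # not malignant
]

selected = [c for c in cases if in_cohort(c)]
print(len(selected))
```

No age condition appears in the filter because the Known Age option is turned off in this exercise.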

Learn More...

  • The problem statement asks for a table showing the frequency of primary site for malignant cases with site recode = lung and bronchus. It is possible to select lung and bronchus cases by creating a selection statement that uses primary site and histology. However, it is much easier to use one of the site recode variables if you do not want to include hematopoietic diseases (such as lymphomas and leukemias) in the table.
  • Each line of a selection statement defines selection criteria for one variable. When you have finished defining a line (by choosing a variable, an operator, and one or more values), highlighting a different variable in the Variable box will automatically begin a new line.

Step 5:  View the Primary Site Variable Groupings

  • Move to the Table Tab.
  • Use the "+" to expand the "Site and Morphology" category in the Available Variables box.
  • Select "Primary Site - labeled".
  • Open the dictionary editor to view the groupings for "Primary site". In a previous exercise we learned that the dictionary editor could be opened by:
    1. using the Dictionary button on the toolbar,
    2. selecting Dictionary from the File menu, or
    3. double-clicking on the "Primary site" variable.
  • The Dictionary window should now be open.
  • If it is not already selected, select the "Primary Site - labeled" variable from the "Site and Morphology" category.
  • The Create button will be enabled when a variable is selected. Use the Create button to open the Edit Variable window to view the groupings and values associated with this variable.
  • The values of the "Primary Site - labeled" variable correspond to ICD-O-3 codes. There is one grouping for each value of the primary site variable.
  • Click Cancel to close the Edit Variable window without making any changes.
  • Click Close to close the dictionary.

Learn More...

  • Prior to the November 2006 data submission, only one primary site variable was available, and its values were unlabeled. If you use the dictionary to view the "Primary Site" variable (not "labeled"), you will see that it has a single grouping containing all possible values. The values are listed in a 3-digit format in which the C and decimal point of the ICD-O-3 code are implied; for example, C34.0 is displayed as 340 in SEER*Stat. To use this variable as a display variable, it was necessary to create a user-defined variable with each value added as an individual grouping.
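
The relationship between the two code formats can be sketched in a few lines of Python. The function names below are hypothetical helpers written for this illustration and are not part of SEER*Stat. They assume the code has the form C, two digits, a decimal point, and one digit, as in the C34.0 example above.

```python
import re


def icdo3_to_seerstat(code: str) -> str:
    """Convert an ICD-O-3 site code such as 'C34.0' to the 3-digit form '340'."""
    match = re.fullmatch(r"C(\d{2})\.(\d)", code)
    if match is None:
        raise ValueError(f"not in the expected Cnn.n format: {code!r}")
    return match.group(1) + match.group(2)


def seerstat_to_icdo3(value: str) -> str:
    """Convert a 3-digit value such as '340' back to 'C34.0'."""
    if re.fullmatch(r"\d{3}", value) is None:
        raise ValueError(f"not a 3-digit value: {value!r}")
    return f"C{value[:2]}.{value[2]}"


print(icdo3_to_seerstat("C34.0"))  # 340
print(seerstat_to_icdo3("349"))    # C34.9
```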

Step 6:  Set the Row Variable

  • Use the "+" to expand the "Site and Morphology" category in the Available Variables box at the bottom of the Table Tab.
  • Select "Primary Site - labeled".
  • Click Row on the right-hand side of the screen.
  • At this time, "Primary Site - labeled" should be listed as a row variable in the Display Variables box at the top of the window.

Step 7:  Specify a Title (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
Primary Site Frequencies for Malignant Lung and Bronchus (Site Recode with Kaposi and mesothelioma)
SEER 17 Registries, 2000-2005
Frequency Exercise 2

Step 8:  Create Matrix and Hide Extraneous Rows

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a SEER*Stat matrix window will open containing the results.
  • The results matrix contains one line for each value of the primary site variable. The large number of rows makes it difficult to review the relevant frequencies. SEER*Stat has an option that allows you to suppress the display of the rows with a value of zero in each column.
  • Select Options from the Matrix menu.
  • In the Options box check Hide Zero Count Rows.
  • Click OK.
  • The size of the table will be reduced to six rows: one row for each of the primary site ICD-O-3 codes for lung and bronchus.
  • Compare your results to this SEER*Stat matrix file: Exercise Matrix 2 Results.
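
The effect of Hide Zero Count Rows can be illustrated with a small Python sketch. Everything in it is made up for illustration: `ALL_SITES` is a short hypothetical stand-in for the full list of primary site values, and `selected_sites` is invented data, not SEER counts. The sketch assumes a table with a single count column, so a row is hidden exactly when its count is zero.

```python
from collections import Counter

# Hypothetical stand-in for the row variable's values; the real variable
# has one row per primary site code.
ALL_SITES = ["C33.9", "C34.0", "C34.1", "C34.2", "C34.3", "C34.8", "C34.9", "C38.4"]

# Invented primary site codes for the cases that passed the selection.
selected_sites = ["C34.1", "C34.9", "C34.1", "C34.3", "C34.0", "C34.1"]

counts = Counter(selected_sites)

# Full matrix: one row per site value; Counter returns 0 for unseen sites.
full_table = [(site, counts[site]) for site in ALL_SITES]

# With zero-count rows hidden, only sites that actually occur remain.
visible = [(site, n) for site, n in full_table if n > 0]

for site, n in visible:
    print(f"{site}\t{n}")
```

Hiding rows changes only what is displayed; the underlying counts in `full_table` are unchanged.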

Learn More...

  • The Options window gives you the opportunity to correct typographical errors in the title. Corrections that you make to the title in the matrix options will appear in the matrix and printed output. However, sessions extracted from the matrix will retain the original, unedited title.
  • Use the Help button on the Options window to learn about the other matrix options.