U.S. National Institutes of Health National Cancer Institute

SEER*Stat Rate Exercise 4a

In this exercise, you will define a complex selection statement to produce statistics by expanded race using the SEER Program's guidelines. You will also create selection statements using variables with unlabeled values.

If you are just getting started with SEER*Stat, be sure to do the introductory tutorials first.

Problem Statement

Create a table showing frequencies and incidence rates (age-adjusted to the 2000 US standard population) for malignant esophageal squamous cell carcinoma. Include only microscopically confirmed cases. Calculate these statistics for persons diagnosed from 1992 through 2005 in the SEER 13 Registries. Do not show statistics based on fewer than 25 cases.

Display the statistics by race and sex. Show data for males and females separately but not combined. Use the following races: "White", "Black", "American Indian", "Asian or Pacific Islander". Include standard errors and confidence intervals in the table.

Define squamous cell carcinoma as: Histologic Type ICD-O-3 = 8070-8077

Key Points and Reminders

  • This exercise requires you to create a complex selection statement that includes the correct race and region combinations. When producing statistics for American Indians from SEER Incidence data, SEER frequently includes only cases that are in a Contract Health Service Delivery Area (CHSDA). The selection statement will use parentheses and the "OR" conjunction.
  • In this exercise, you will also make selections by specifying a range of numeric values. In previous exercises, you selected values from a list of values with labels.

Step 1:  Create a new Rate Session

  • Start SEER*Stat.
  • From the File menu select New > Rate Session or use the Rate button on the toolbar.

Step 2:  Select a Database (Data Tab)

  • On the Data Tab select "Incidence - SEER 13 Regs Limited-Use, Nov 2007 Sub (1992-2005) <Katrina/Rita Population Adjustment>".

Learn More...

  • Databases distributed with SEER*Stat use names designed to describe the data. The various parts of this exercise's database name indicate the following:
    • Incidence - The database contains cancer incidence data.
    • SEER 13 Regs - The database contains data for the "SEER 13 Registries" as defined in SEER Registries - Common Terms.
    • Limited-Use, Nov 2007 Sub - This is the Limited-Use version of the database. The data was submitted to the SEER program by the registries in November 2007.
    • (1992-2005) - These are the years of diagnosis for the cases included in the database.
  • The suggested citation for the database selected on the Data Tab is shown at the bottom of the screen. For more information, see Citations for SEER Databases and SEER*Stat Software.

Step 3:  Choose the Statistics to Display (Statistic Tab)

  • Move to the Statistic Tab.
  • In the Statistics box, select Rates (Age-Adjusted).
  • In the Parameters box:
    • Make sure that the Standard Population is set to "2000 US Std Population (19 age groups - Census P25-1130)".
    • Make sure the Age Variable is set to "Age recode with <1 year olds".
    • Check the Show Standard Errors and Confidence Intervals box.
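
To make the "age-adjusted" statistic more concrete, the following is a minimal sketch of direct standardization, the general technique behind an age-adjusted rate. It is not SEER*Stat's implementation: the function name, the three age groups, and all numbers are invented for illustration, and the standard error uses the usual Poisson approximation. SEER*Stat's own confidence interval method may differ, so treat this only as an aid to reading the output.

```python
import math

def age_adjusted_rate(cases, populations, std_weights, per=100_000):
    """Directly standardized rate and its standard error (hypothetical helper).

    cases[i], populations[i]: case count and population at risk in age group i.
    std_weights[i]: standard population count (or proportion) for age group i.
    """
    total = sum(std_weights)
    weights = [w / total for w in std_weights]  # normalize to proportions
    # Weighted sum of the age-specific rates.
    rate = sum(w * c / n for w, c, n in zip(weights, cases, populations))
    # Poisson approximation: variance of each age-specific rate is c / n**2.
    variance = sum(w * w * c / (n * n)
                   for w, c, n in zip(weights, cases, populations))
    return rate * per, math.sqrt(variance) * per

# Invented three-group example; these are NOT SEER data.
cases = [2, 10, 30]
populations = [100_000, 100_000, 100_000]
std_weights = [50, 30, 20]
rate, se = age_adjusted_rate(cases, populations, std_weights)
print(f"rate per 100,000: {rate:.2f}, standard error: {se:.2f}")
```

In the session itself, the weights come from the standard population chosen in the Parameters box, which has 19 age groups rather than the three used here.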

Step 4:  Define the Analysis Cohort (Selection Tab)

  • Move to the Selection Tab. Specific click-by-click instructions for creating individual selection statements were given in previous tutorials (see Frequency Exercise 1a).
  • Make sure that the Malignant Behavior option is checked in the Select Only box at the top of the tab.
  • For this problem you will need to select based on race, CHSDA region, behavior, cancer site, histology, and diagnostic confirmation. Use the Find button to locate a variable based on its name or a value (for example, if you search for "microscopically confirmed" you will find the "Diagnostic confirmation" variable).
  • In the "Race, Sex, Year Dx, Registry, County (Pop, Case Files)" box, use the conjunctions "AND" and "OR", and group lines using parentheses, to make the following selections:
  • {Race, Sex, Year Dx, Registry, County.Race recode (W, B, AI, API)} = 'White','Black','Asian or Pacific Islander'
    OR ({Race, Sex, Year Dx, Registry, County.Race recode (W, B, AI, API)} = 'American Indian/Alaska Native'
    AND {County attributes.CHSDA 2006} = 'CHSDA')

    Note: Parentheses around a group of lines tell SEER*Stat to evaluate those lines first when processing the selection statement. When using parentheses, you must first create the selection statement lines and then add the parentheses. To add parentheses to a selection statement, click and drag your cursor to highlight the lines you want to work with, then click Add (...) to enclose those lines in parentheses.
  • In the "Other (Case Files)" box, make the following case selections:
  • {Site and Morphology.Site rec with Kaposi and mesothelioma} = ' Esophagus'
    And {Site and Morphology.Histologic Type ICD-O-3} = 8070-8077
    And {Other.Diagnostic confirmation} = 'Microscopically confirmed'
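
The logic of the two selection statements above can be sketched as a single boolean test on one case record. This is only an illustration of how the parentheses group the American Indian/Alaska Native and CHSDA conditions. The dictionary keys (race, chsda, site, histology, confirmation) are hypothetical names, not SEER*Stat identifiers, and the sketch assumes the two selection boxes are combined with AND. The Malignant Behavior checkbox is left out for brevity.

```python
def in_cohort(record):
    """Return True if a case record passes both selection statements.

    The record keys used here are hypothetical, chosen for illustration.
    """
    # First box: race and region. The parenthesized group is evaluated
    # as a unit, so CHSDA is required only for American Indian/Alaska Native.
    race_ok = (
        record["race"] in {"White", "Black", "Asian or Pacific Islander"}
        or (
            record["race"] == "American Indian/Alaska Native"
            and record["chsda"] == "CHSDA"
        )
    )
    # Second box: site, histology range, and diagnostic confirmation.
    case_ok = (
        record["site"].strip() == "Esophagus"
        and 8070 <= record["histology"] <= 8077
        and record["confirmation"] == "Microscopically confirmed"
    )
    return race_ok and case_ok

# Invented record for illustration.
example = {
    "race": "American Indian/Alaska Native",
    "chsda": "Non-CHSDA",
    "site": " Esophagus",
    "histology": 8071,
    "confirmation": "Microscopically confirmed",
}
print(in_cohort(example))  # excluded: AI/AN case outside a CHSDA region
```

Changing the example's race to 'White' would include it regardless of the CHSDA value, which is exactly the asymmetry the parentheses are meant to express.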

Learn More...

  • By using complex selection statements, you defined an analysis cohort that includes:
    1. All records for Whites, Blacks, and Asian/Pacific Islanders for all registries and years in the selected database (SEER 13 registries, 1992-2005)
    2. All records for American Indians within the CHSDA regions.
  • When you selected the "Histologic Type" variable, the Values box in the Selection window changed format. The valid values for the Histologic Type variable are shown just above the Values box. It is not practical to list all values for variables with a large number of numeric values. If you want to specify a range of values for an unlabeled variable, use a hyphen to define the range and use commas to separate multiple values or ranges (e.g. 1-5,8-19).
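
The range notation described above (hyphens for ranges, commas between values or ranges) can be sketched as a small parser. This is an illustrative reading of the notation, not SEER*Stat code; the function names parse_ranges and matches are hypothetical.

```python
def parse_ranges(spec):
    """Turn a string such as '1-5,8-19' into a list of (low, high) pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            ranges.append((int(low), int(high)))
        else:
            # A single value is treated as a range of length one.
            ranges.append((int(part), int(part)))
    return ranges

def matches(value, spec):
    """Return True if value falls inside any range in spec (inclusive)."""
    return any(low <= value <= high for low, high in parse_ranges(spec))

print(parse_ranges("1-5,8-19"))    # [(1, 5), (8, 19)]
print(matches(8072, "8070-8077"))  # True
print(matches(8078, "8070-8077"))  # False
```

Both endpoints are inclusive in this sketch, matching the problem statement's definition of squamous cell carcinoma as 8070-8077.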

Step 5:  Create User-Defined Variables to use on the Table Tab

For this exercise, you need to define two new variables, one for race and one for sex.

Open the Data Dictionary.

  1. Select the "Race recode (W,B,AI,API)" variable from the "Race, Sex, Year Dx, Registry, County" category and use the Create button to open the Edit Variable window.
    • Change the Name of the variable to: "Race recode (W,B,AI,API) w/o unks".
    • Delete the "Other unspecified (1991+)" and "Unknown" groupings in the Groupings box.
    • When you are finished, click the OK button.

  2. Select the "Sex" variable from the "Race, Sex, Year Dx, Registry, County" category and use the Create button to open the Edit Variable window.
    • Change the Name of the variable to: "Sex (no total)".
    • Delete the "Male and Female" grouping in the Groupings box.
    • When you are finished, click the OK button.
    • Close the dictionary.

Step 6:  Set the User-Defined Variables as Row Variables (Table Tab)

  • Use the "+" symbol to expand the User-defined category in the Available Variables box at the bottom of the Table Tab.
  • Select "Race recode (W,B,AI,API) w/o unks" from the "User-Defined" category, then add it as a row variable.
  • Next, select "Sex (no total)" and add it to the row dimension as well.

Step 7:  Specify a Title and Hide Statistics (Output Tab)

  • Move to the Output Tab.
  • Enter the following title:
  • Malignant Esophageal Squamous Cell Carcinoma
    Microscopically Confirmed Cases Only, 1992-2005
    SEER 13 for White, Black, API
    SEER 13 (incl. CHSDA only) for AI/AN
    Rate Exercise 4a
  • Check the option to "Hide Statistics When Fewer Than 25 Cases"
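
As a small illustration of the threshold in this option, the sketch below hides a statistic when its case count is below 25. The function name and the use of None for a hidden cell are assumptions made for this example, not a description of how SEER*Stat marks hidden cells.

```python
def display_rate(count, rate, min_cases=25):
    """Return the rate, or None when it is based on fewer than min_cases cases."""
    if count < min_cases:
        return None  # hidden: too few cases for a stable statistic
    return rate

print(display_rate(24, 1.2))  # None
print(display_rate(25, 1.2))  # 1.2
```

Note that "fewer than 25" means a cell with exactly 25 cases is still shown.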

Step 8:  Create the Matrix and Re-order the Rows

  • Use the Execute button or select Execute from the Session menu to execute the session.
  • A dialog will display the progress of the job. When the job completes, a SEER*Stat matrix window will open containing the output table.
  • The output table contains two row variables (race and sex). The outermost row variable is the first variable listed as a row variable on the session's Table Tab. The innermost is the second row variable on the Table Tab.
  • Change the order of the row variables. From the Matrix menu, select Order and then Row.
  • Select the first variable listed and click the Move Down button to switch the order of the variables.
  • Click OK.
  • The variable you moved down is now the inner row variable in your results matrix.
  • Use the Save As command on the File menu to save the matrix for use in the next exercise. Enter "Rate Exercise 4a" as the filename. SEER*Stat will assign the "sim" extension to indicate that this is a "SEER*Stat Rate Matrix" file.
  • Compare your results to this SEER*Stat matrix file: Exercise Matrix 4a Results.

Learn More...

  • The Matrix menu gives you the opportunity to customize your results, as well as export the results for use in other applications. See Results Matrix in the SEER*Stat help system for more information.