---------------
SEER*Prep 2.4.0
---------------

The SEER*Prep software converts ASCII text data files to the SEER*Stat
database format, allowing you to analyze your cancer data using SEER*Stat.
SEER*Prep performs two main functions: it converts text data to the
specific binary format required by SEER*Stat, and it creates the SEER*Stat
data dictionary.

For more information on SEER*Stat, please see
http://seer.cancer.gov/seerstat.

------------------------------------------
FILE SUPPORT: INCIDENCE AND MORTALITY DATA
------------------------------------------

Incidence - NAACCR 1946 byte version 11.1 incidence-only record type
            (dated June 2006)
Mortality - SEER*Prep 58 byte (developed specifically for this application
            by the SEER Program)

When using the incidence data format, SEER*Prep provides a mechanism for
preparing additional variables for use in SEER*Stat. These are variables
that are not part of the documented data formats.

The NAACCR data description does not yet support every variable defined
in the NAACCR format. We tried to include the most important variables,
but some that you need may be missing. Please email the following address
stating which variables you need, and we will make every attempt to
provide a new data description (.dd file) within a few weeks of your
request.

   seerprep@imsweb.com

------------------------------------------------
INPUT FILE NOMENCLATURE, COMPRESSION, AND FORMAT
------------------------------------------------

All files used as input to SEER*Prep must be named with either a .txd or
a .txd.gz extension. These extensions signify "text data" and "compressed
text data", respectively.

The freely available gzip utility, located at http://www.gzip.org, can be
used to compress your input files. Compression greatly reduces the disk
space required to store your SEER*Prep input data and only minimally
slows down SEER*Prep.

SEER*Prep supports only fixed length text input records.
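The input requirements in this section (.txd/.txd.gz naming, gzip
compression, blank-padded fixed-length records, and numeric values
zero-filled on the left) can be sketched in Python as below. This is a
hypothetical helper, not part of SEER*Prep or the SEER*Prep utilities: the
names zero_fill, fix_length, and write_txd_gz, the 58-byte record length
(the SEER*Prep mortality format), and the CR LF line ending are all
assumptions for illustration. Take the real record length and field
positions from the .dd file for your data.

```python
import gzip

# Assumption for illustration: the 58-byte SEER*Prep mortality record.
# NAACCR 11.1 incidence-only records would use 1946 instead.
RECORD_LENGTH = 58

# Assumption: Windows-style line endings; check what your files use.
LINE_END = "\r\n"


def zero_fill(value: int, width: int) -> str:
    """Return a non-negative integer zero-filled on the left, e.g. 1 -> "01"."""
    text = str(value)
    if value < 0 or len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text.zfill(width)


def fix_length(record: str, length: int = RECORD_LENGTH) -> str:
    """Strip any line terminator, then truncate or blank-pad to `length`."""
    return record.rstrip("\r\n")[:length].ljust(length)


def write_txd_gz(path: str, records, length: int = RECORD_LENGTH) -> None:
    """Write fixed-length ASCII records to a gzip-compressed .txd.gz file."""
    if not path.endswith(".txd.gz"):
        raise ValueError("compressed SEER*Prep input files must end in .txd.gz")
    # newline="" disables newline translation, so LINE_END is written as-is;
    # "ascii" rejects any character that would not be a single byte.
    with gzip.open(path, "wt", encoding="ascii", newline="") as out:
        for record in records:
            out.write(fix_length(record, length) + LINE_END)


if __name__ == "__main__":
    # Hypothetical sample: one short record and one over-long record.
    samples = [zero_fill(1, 2) + "ABC", "9" * 80]
    write_txd_gz("sample.txd.gz", samples)
```

Writing through Python's gzip module produces the same gzip format as
compressing a .txd file with the gzip utility, so no separate compression
step is needed.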
If your records are shorter than the required fixed length, you must pad
them with blanks. If they are longer, you must truncate them to the
supported length. For help preparing your data to meet this requirement,
see the SEER*Prep utilities:
http://seer.cancer.gov/seerprep/utilities

All numeric values in the input records processed by SEER*Prep must be
zero-filled on the left. For example, for a variable of length two, the
input file must contain the value "01", not " 1".

-------------------
SYSTEM REQUIREMENTS
-------------------

- Pentium-based PC
- 32-bit Microsoft Windows (95, 98, NT, 2000, XP, etc.) with a text
  editor such as Notepad installed
- SEER*Stat 6.2.3 or later installed
- 32 MB of application RAM
- Approximately 4 MB of disk space

In addition, disk space on your PC or LAN is required for the compressed
versions of your incidence, mortality, population, and expected survival
rate (life table) data, and for the indices, all of which SEER*Prep
generates. The required space is at most half the size of the uncompressed
text version of your input data; in most cases, it is less than half.

--------------------
INSTALLING SEER*Prep
--------------------

To install SEER*Prep on your PC:

- Download the installation program named sp240.exe from the Internet and
  save it locally
- From the Start menu, choose Run... and type the path of the installation
  program; for example, if you saved it in C:\Temp, type C:\Temp\sp240.exe
- Follow the instructions on your screen

----------------
REVISION HISTORY
----------------

5/21/2007 - Version 2.4.0
   1. Added the derived variable Behavior recode for analysis.
   2. Support for the NAACCR 11.1 file format.
   3. NAACCR version 9 and 10 .dds are no longer distributed, and support
      for SEER 250 .dds has been eliminated.

5/24/2006 - Version 2.3.5
   Correction to the generation of the derived race variables. Unknown
   race, value 99, was not being handled properly.

5/19/2006 - Version 2.3.4
   1. Changed names, formats, and algorithms for derived race variables.
      For more details, see:
      http://seer.cancer.gov/seerstat/variables/seer/yr1973_2003/race_ethnicity
   2. Changed names and formats for derived Hispanic origin variables.
   3. ICCC recodes based on ICD-O-3.
   4. Slightly more stringent coding of site recodes if histology >= 9590.
   5. Ability to create population-only databases.
   6. Ability to create incidence-based mortality databases.
   7. The Add case and pop files dialog now defaults to showing both
      zipped and regular files.
   8. Several new options and defaults in the database options dialog.
   9. Minor fixes and enhancements to the .dd files.

9/15/2005 - Version 2.3.3 (only available in one training class)
   1. Added support for large fonts.
   2. Fixed some issues with the most recently used list for .dd files.
   3. Laterality variable: changed one format in NAACCR 10 that was too
      long.
   4. Added Birthplace variable to NAACCR 10.
   5. Added Census tract variables and NHIA origin variable to
      RateProblemVars in NAACCR 10.
   6. Updated State-county in mort and NAACCR 10.
   7. Added a description with a link to the SEER Web site for the
      Behavior recode for analysis variable in NAACCR 10.

2/18/2005 - Version 2.3.2
   Bug fix for the creation of the derived variable Origin recode NHIA
   when the underlying variable NHIA Derived Hisp Origin is blank.

2/4/2005 - Version 2.3.1
   1. Origin recode NHIA derived variable.
   2. Ability to choose a database's default standard population.
   3. Additional user-specified variables; all user-specified variables
      can now be population defining.
   4. Changed the population file format to match those distributed here:
      http://seer.cancer.gov/popdata
   5. Ability to specify a database informational message.

9/29/2004 - Version 2.3.0
   1. Cause of death recode derived variable was updated to support new
      ICD-10 codes. See www.seer.cancer.gov/codrecode/1969+_d09172004.
   2. Changed SEER*Prep to exclude population, standard population, and
      expected survival records with blank counts.
   3. Added two 5-digit and two 6-digit user-specified variables in the
      NAACCR 10.1 dd file.
   4. Added functionality to support a dynamic population record length.
   5. In the NAACCR 10.1 dd file, changed all user-specified variables so
      they can be population defining.
   6. Corrected column positions for a few treatment date variables in the
      NAACCR 10.1 dd file.

4/20/2004 - Version 2.2
   1. Support for SEER*Stat's MP-SIR session.
   2. Logic changes in the definition of several non-cancer CODs when
      creating the derived COD recode variables. See
      www.seer.cancer.gov/codrecode.
   3. Support for the NAACCR 10.1 file format.
   4. Addition of Behavior recode for analysis to all incidence .dds. Note
      that this is NOT a derived variable; it is just a defined column
      location.
   5. Ability to turn off SEER*Stat's select only malignant behavior
      feature (check box).
   6. Ability to specify which of three behavior variables is used with
      SEER*Stat's select only malignant behavior feature.
   7. Added a State-county variable to the NAACCR 10.1 .dd.
   8. Changed the representation of State-county in SEER*Prep's output
      files so SEER*Stat's "add all-using underlying data values" feature
      provides FIPS codes.
   9. Added additional counties to Virginia in the State-county variable.
      Also changed a handful of other counties to match the SEER provided
      U.S. populations. See www.seer.cancer.gov/popdata.
  10. Added COD recode derived variables to the incidence .dds.
  11. Added NAACCR item numbers to the description of every applicable
      variable in the NAACCR 10.1 .dd.

9/17/2003 - Version 2.1
   1. User can specify that the input case data are sorted to enable
      SEER*Stat's person selection features.
   2. Support for SEER*Stat's prevalence session.
   3. More user-specified variables.
   4. Support for user-specified variables as population variables.
   5. Switch to a single generic population file format.
   6. User can hide required variables.
   7. Age recode with single ages and 85+.
   8. User can specify which SEER*Stat sessions are applicable for the
      database being created.
   9. Ability to include totals and subtotals when inputting a
      user-specified variable's format.
  10. Users can select a category for user-specified variables.
  11. Ability to denote whether a user-specified variable should not be
      allowed for the generation of age-adjusted rates.
  12. Improved version checking between the software and .dds.
  13. Addition of options for Race recode Y and Race recode Z/Origin
      recode.

6/3/2003 - Version 2.0
   1. Interface changes allowing the user to choose race and age variables
      to link to the populations; this reduced the number of .dds.
   2. SEER's new logic for Site and COD recode. See
      www.seer.cancer.gov/siterecode and codrecode. ICD-O-3 is used for
      the creation of Site recode if present; otherwise, ICD-O-2 is used.
   3. NAACCR 9.1 .dds, which include ICD-O-3 histology and behavior.
   4. Added a version number to the .dds and checking by the system.
   5. Added Alaska "counties" to the mortality .dd.

12/4/2002 - Version 1.9.1
   Bug fix for the creation of derived variables ICCC site recode and SEER
   modified ICCC site recode. Specifically, if the input file contained all
   blanks for histology, SEER*Prep crashed when attempting to create
   either of these.

6/10/2002 - Version 1.9
   Derived variables ICCC site recode and SEER modified ICCC site recode.

2/7/2002 - Version 1.8
   Logic changes to Site recode (incidence) and Cause of death recode
   (mortality). Creation of Age recode with <1 year olds for use with
   19-age-group standard populations. Additions to SEER and NAACCR .dds to
   take advantage of SEER*Stat's cause-specific survival features.
   "Comments" in the .dds so users can take advantage of SEER*Stat's
   multiple primary (person selection) features. Expected rates can now be
   merged with incidence data by any number of user-specified variables.
   Year and age are the only two variables required in an expected rate
   table. Numerous updates to the .dds to better support SEER*Stat 4.x.

2/22/2001 - Version 1.7
   Ability to rename user-specified variables, an unknown county for each
   state in the mortality .dds, and several enhancements when utilizing a
   user-supplied .zds.

2/8/2001 - Version 1.6
   Inclusion of Alzheimer's in COD recode. Generation of "old style"
   indices, which take less disk space but consume more memory to
   generate. Bug fix to ICD-10 support where some invalid values caused an
   exception and others were misinterpreted as valid codes. These were
   codes such as all numbers or those starting with a lowercase letter.

1/10/2001 - Version 1.5
   Support for ICD-10 and the ability to generate Cause of death recode
   from ICD-10. Correction to the definition of basal and squamous cell
   skin in the generation of Site recode.

6/20/2000 - Version 1.1
   Support for standard populations and expected survival rate (life)
   tables. Output file and directory name changes so it is easier to share
   created databases with colleagues.

3/30/1999 - Version 1.0
   Minor fixes over Beta 5. Conversion sections removed from .dd files,
   which makes SEER*Stat 2.0 a requirement.

2/22/1999 - Version 1.0 (Beta 5)
   Bug fix for numeric values occurring in the Addr at DX--state variable
   of NAACCR .dds.

2/10/1999 - Version 1.0 (Beta 4)
   Bug fixes, treatment variables in the NAACCR .dds, and additions to the
   .dds for SEER*Stat 2.0. Fixed conversion in NAACCR .dd for Histologic
   type.

9/23/1998 - Version 1.0 (Beta 3)
   Added feature to update a database and improved wording on CR and LF
   exception. Fixed problem that caused creations to take longer towards
   the end of processing. Added support for gzipped (.txd.gz) input files
   and support for mortality data.

6/22/1998 - Version 1.0 (Beta 2)
   Performance enhancements of 40-70%. Derived variable Race recode A is
   no longer dependent on the Hispanic origin variable.

5/26/1998 - Version 1.0 (Beta 1)
   Initial release.

-----------------
TECHNICAL SUPPORT
-----------------

Email: seerprep@imsweb.com
Web:   seer.cancer.gov/seerprep