National Program of Cancer Registries (NPCR)
Link Plus Features and Future Plans

  • Version 2.0 — New Features

    • Improved file import process.
    • Enhanced support for deduplication linkage.
    • Ability to use nicknames in the First Name matching method.
    • New and powerful manual review and file export functions.
    • Improved context-sensitive and online Help.

    Features

    • Supports North American Association of Central Cancer Registries (NAACCR) files, fixed-width files, delimited files, and CRS Plus databases.
    • Computes probabilistic record linkage scores based on the theoretical framework developed by Fellegi and Sunter. (Fellegi IP, Sunter AB. A theory for record linkage. Journal of the American Statistical Association 1969;64:1183–1210).
    • Handles missing values of matching variables: null or empty values are automatically treated as missing data, and the user can specify additional values to be treated as missing.
    • Provides a simple and efficient blocking mechanism ("OR blocking") that indexes the blocking variables and compares only those pairs of records that have identical values on at least one blocking variable.
    • Offers a choice of two phonetic coding systems (Soundex and NYSIIS).
    • Provides the following matching methods, or comparators. In addition to the exact matching method, several approximate matching methods find partial, approximate, or fuzzy matches and are customized for the content of specific data items or types:
      • Value-specific (frequency-based): Sets weights for matching values based on the frequencies of values in the files being compared. A match on a frequent value is associated with a low weight, but a match on a rare value is associated with a high weight.
      • Last name and first name: Combines partial matching, value-specific matching, and the NYSIIS phonetic code to account for minor typographical errors, misspellings, and hyphenated names. For first names, nicknames are also matched with formal names.
      • Middle name: Accounts for a middle initial in one record versus the full middle name in the other.
      • Date: Incorporates partial matching on the separate date components and accounts for transposed components as well as missing month or day values.
      • Social Security number: Accounts for typographical errors and transposition of digits. Also matches a 9-digit number in one file with a 4-digit number in another file.
      • Generic string: Uses an edit distance function and incorporates partial matching to account for typographical errors.
      • ZIP code: Matches a 9-digit ZIP code with a 5-digit ZIP code.
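The Fellegi-Sunter score mentioned above is a sum of per-field log-likelihood ratios. The following is a minimal Python sketch, not Link Plus's own code: the m-probabilities (chance a field agrees for a true match) and u-probabilities (chance it agrees for a non-match) in `PARAMS` are hypothetical values that a real linkage would estimate, and the `"UNKNOWN"` entry stands in for a user-specified missing value. Missing data contribute a weight of zero, so they count neither for nor against a match.

```python
import math

# Hypothetical m/u probabilities per field (illustrative values only).
# m = P(field agrees | records truly match)
# u = P(field agrees | records do not match)
PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.97, 0.002),
}

# User-specified values treated as missing, in addition to null/empty (assumed).
EXTRA_MISSING = {"UNKNOWN"}


def is_missing(value):
    """Null, empty, or user-listed values count as missing data."""
    return value is None or str(value).strip() == "" or value in EXTRA_MISSING


def field_weight(a, b, m, u):
    """Agreement weight log2(m/u) or disagreement weight log2((1-m)/(1-u))."""
    if is_missing(a) or is_missing(b):
        return 0.0  # missing data: no evidence either way
    if a == b:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def linkage_score(rec1, rec2):
    """Total score = sum of field weights (exact comparison only, for brevity)."""
    return sum(
        field_weight(rec1.get(f), rec2.get(f), m, u)
        for f, (m, u) in PARAMS.items()
    )
```

For example, two records that agree on last name and birth date but have an empty first name in one file score log2(0.95/0.01) + log2(0.97/0.002), with the first name contributing nothing.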
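OR blocking, as described above, can be sketched by indexing each blocking variable separately and taking the union of the candidate pairs, so a pair is compared if it agrees on any one blocking variable. This minimal Python sketch assumes records are lists of dicts and that empty values are never used as blocking keys; the function name `or_blocking` and the variable names are hypothetical.

```python
from collections import defaultdict


def or_blocking(file1, file2, blocking_vars):
    """Return candidate pairs (i, j) that share a value on at least one
    blocking variable. Empty or missing values are never used as keys."""
    candidates = set()
    for var in blocking_vars:
        # Index file 2 on this variable: value -> positions of records.
        index = defaultdict(list)
        for j, rec in enumerate(file2):
            value = rec.get(var)
            if value:
                index[value].append(j)
        # Probe the index with every file 1 record.
        for i, rec in enumerate(file1):
            value = rec.get(var)
            if value:
                for j in index.get(value, ()):
                    candidates.add((i, j))
    return sorted(candidates)
```

Only the pairs returned here would go on to full scoring, which is what keeps blocking cheap compared with scoring every possible pair.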
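Of the two phonetic codes listed above, Soundex is the simpler. The following minimal Python sketch implements standard American Soundex only (NYSIIS has a longer rule set and is omitted). It is not Link Plus's implementation, and treating non-ASCII letters like vowels is a simplifying assumption.

```python
# Standard American Soundex digit groups; vowels, Y, H, and W get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code (letter + 3 digits) for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first
    prev = CODES.get(first, "")  # the first letter's code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

For example, `soundex("Robert")` and `soundex("Rupert")` both give `R163`, so the two spellings fall into the same phonetic group.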
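The value-specific (frequency-based) weighting above can be illustrated with one simple approximation, which is an assumption rather than Link Plus's documented formula: the chance that two unrelated records agree on a particular value is estimated by that value's relative frequency f, giving an agreement weight of log2(m / f). The function name `value_weights` and the default m = 0.95 are hypothetical.

```python
import math
from collections import Counter


def value_weights(values, m=0.95):
    """Agreement weight per value: log2(m / f), f = relative frequency.

    Assumption for illustration: u for a specific value is approximated by
    how often that value occurs. Empty values are ignored.
    """
    counts = Counter(v for v in values if v)
    total = sum(counts.values())
    return {v: math.log2(m / (c / total)) for v, c in counts.items()}
```

With a file in which 8 of 10 last names are SMITH, an agreement on SMITH earns about 0.25 bits while an agreement on a name seen once earns about 3.2 bits, which is the "frequent value, low weight; rare value, high weight" behavior described above.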
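The middle-name comparator's initial-versus-full-name rule can be sketched as follows. The function `compare_middle` and its return labels are hypothetical; how Link Plus weights each outcome is not stated here.

```python
def compare_middle(m1, m2):
    """Hypothetical middle-name comparator.

    Returns 'exact', 'initial' (an initial matches the first letter of a
    full name), 'missing', or 'disagree'.
    """
    a = m1.strip().upper().rstrip(".")
    b = m2.strip().upper().rstrip(".")
    if not a or not b:
        return "missing"
    if a == b:
        return "exact"
    # One side is a bare initial that agrees with the other side's first letter.
    if (len(a) == 1 or len(b) == 1) and a[0] == b[0]:
        return "initial"
    return "disagree"
```

For example, `compare_middle("J.", "James")` returns `'initial'`, which a scorer would typically weight below an exact agreement but above a disagreement.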
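The date comparator's handling of partial agreement, transposed month and day, and missing components can be sketched as below. The function `compare_dates`, the (year, month, day) tuple input with `None` for a missing component, and the category labels are all hypothetical; Link Plus's actual scoring of each case is not described here.

```python
def compare_dates(d1, d2):
    """Hypothetical date comparator over (year, month, day) tuples.

    None marks a missing component. Returns one of:
    'exact', 'transposed', 'partial', 'missing', 'disagree'.
    """
    known = [(a, b) for a, b in zip(d1, d2) if a is not None and b is not None]
    if not known:
        return "missing"
    if len(known) == 3 and all(a == b for a, b in known):
        return "exact"
    (y1, m1, dd1), (y2, m2, dd2) = d1, d2
    if len(known) == 3 and y1 == y2 and (m1, dd1) == (dd2, m2):
        return "transposed"  # month and day swapped, e.g. 03/12 vs 12/03
    if any(a == b for a, b in known):
        return "partial"  # some known component agrees; others differ or are missing
    return "disagree"
```

For example, `compare_dates((1960, 3, 12), (1960, 12, 3))` returns `'transposed'`, and `compare_dates((1960, None, 12), (1960, 3, 12))` returns `'partial'` because the month is missing in one record.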
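The Social Security number comparator's three cases (a single-digit typo, a transposition of adjacent digits, and a 9-digit number matched against a 4-digit one) can be sketched as follows. The function `compare_ssn` and its labels are hypothetical, and matching the 4-digit value against the last four digits of the 9-digit value is an assumption consistent with the partial-match refinement listed among the completed tasks.

```python
def compare_ssn(a, b):
    """Hypothetical SSN comparator returning an agreement level string."""
    a = "".join(c for c in a if c.isdigit())  # drop dashes and spaces
    b = "".join(c for c in b if c.isdigit())
    if not a or not b:
        return "missing"
    if a == b:
        return "exact"
    if {len(a), len(b)} == {9, 4}:
        # Compare the 4-digit value with the last four digits of the 9-digit one.
        full, last4 = (a, b) if len(a) == 9 else (b, a)
        return "last4" if full[-4:] == last4 else "disagree"
    if len(a) == len(b) == 9:
        diffs = [i for i in range(9) if a[i] != b[i]]
        if len(diffs) == 1:
            return "one_typo"  # a single mistyped digit
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
            return "transposition"  # two adjacent digits swapped
    return "disagree"
```

For example, `compare_ssn("123456789", "123457689")` returns `'transposition'`, and `compare_ssn("123456789", "6789")` returns `'last4'`.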
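The generic string comparator relies on an edit distance function. The following minimal Python sketch uses the standard Levenshtein distance (insertions, deletions, and substitutions each cost 1). The normalization to a 0 to 1 similarity in `string_similarity` is an assumed scaling for illustration, not necessarily the one Link Plus uses.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,               # delete cs
                            curr[j - 1] + 1,           # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (or keep)
        prev = curr
    return prev[-1]


def string_similarity(s, t):
    """Scale distance to [0, 1]; 1.0 means identical (assumed scaling)."""
    if not s and not t:
        return 1.0
    return 1.0 - edit_distance(s, t) / max(len(s), len(t))
```

Keeping only two rows of the dynamic-programming table makes memory proportional to the shorter comparison string rather than to the product of both lengths, which matters for the long strings this comparator targets.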
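Matching a 9-digit ZIP code with a 5-digit one comes down to comparing the first five digits. A minimal Python sketch, in which the function `compare_zip`, its labels, and the decision to treat malformed values as missing are hypothetical:

```python
def compare_zip(z1, z2):
    """Hypothetical ZIP comparator: a 9-digit ZIP+4 agrees with a 5-digit
    ZIP when their first five digits are the same."""
    d1 = "".join(c for c in z1 if c.isdigit())
    d2 = "".join(c for c in z2 if c.isdigit())
    if len(d1) not in (5, 9) or len(d2) not in (5, 9):
        return "missing"  # malformed or empty values treated as missing (assumption)
    if d1 == d2:
        return "exact"
    if d1[:5] == d2[:5]:
        return "zip5"  # agree on the 5-digit ZIP only
    return "disagree"
```

For example, `compare_zip("12345-6789", "12345")` returns `'zip5'`.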

    Future Plans

    • Perform a professional usability study to maximize the user-friendliness and effectiveness of the interface.
    • Develop a Phone Number matching method to handle a partial match on the last seven digits.
    • Develop an Address matching method.
    • Provide frequency distribution of variables to help users identify missing values.
    • Convert to .NET.
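The planned Phone Number matching method is described only as handling a partial match on the last seven digits. A minimal Python sketch of that idea, in which the function `compare_phone`, its labels, and the use of the last ten digits as the full number (dropping a leading country code) are all hypothetical:

```python
def compare_phone(p1, p2):
    """Hypothetical phone comparator: full agreement on the last ten digits,
    or partial agreement on the last seven (e.g. area code missing or changed)."""
    d1 = "".join(c for c in p1 if c.isdigit())[-10:]
    d2 = "".join(c for c in p2 if c.isdigit())[-10:]
    if len(d1) < 7 or len(d2) < 7:
        return "missing"
    if d1 == d2:
        return "exact"
    if d1[-7:] == d2[-7:]:
        return "last7"
    return "disagree"
```

For example, `compare_phone("(404) 555-0123", "555-0123")` returns `'last7'`.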

    The Link Plus Development Priority List is a list of development tasks prioritized by the NPCR Registry Plus development team. The tasks come directly from meetings with the Registry Plus User Group (RPUG) and from requests by individual cancer registries and leaders in the cancer registry field.

    Link Plus Development Priority List (updated October 24, 2008)
    Completed Tasks
    Completed beta testing for version 2.0
    Updated the help file for version 2.0
    Released version 2.0
    Added the feature that allows the keyboard to be used for manual reviews
    Added a feature that allows double review of the clerical pairs from a given linkage so that the two reviews can be compared and resolved into a final review file
    Added the option of specifying the path and file name for the linkage report
    Added the menu item Summary to the Tools menu to summarize the current view
    Allowed users to view and work on the uncertain matches only (sorting, finding, and other features continue to work on the unhidden records)
    Allowed users to add variables that are not on the linkage report for clerical review (redesigned the interface that enables users to select additional fields for clerical review and to specify whether two variables, one from File 1 and one from File 2, occupy the same column or separate columns)
    Added the toolbar to the configuration and manual review forms; added Help, Save, Close, New View, Restore View, Pair View, Find, Display All, Display Only Certain Matches, and Summary icons
    Added a contextual (right-click) menu that enables users to change a column's format and to hide or unhide columns
    Loaded the clerical review form in the default column order (SSN, birthday, first name, middle name, last name)
    Added exception handling for manual review when the configuration file is missing
    Provided the New York State Identification and Intelligence System (NYSIIS) phonetic code
    Allowed multiple records of a single individual to be grouped into a single "set" in a deduplication procedure, instead of allowing only pairs
    Provided String Edit Distance comparator for long strings
    Added a feature that allows users to run Link Plus in batch mode
    Refined the date comparator
    Refined name comparators to incorporate the NYSIIS code
    Used a picklist instead of radio buttons to select matching methods; dropped the requirement of field type Date from the import data forms for File 1 and allowed users to import field names from the first row of File 1 so the user interface for data importing is the same for File 1 and File 2
    Added support for NAACCR 9, 10, and 11
    Allowed users to specify the range of linkage IDs for manual review when the linkage report includes more than 30,000 records
    Allowed users to copy and paste the Social Security Number on both the list and pair views
    Fixed the problem that caused Link Plus to crash when a screen saver is activated
    Released the latest beta version on our FTP site
    Refined the Social Security Number matching method to handle a partial match on the last four digits
    Developed and implemented algorithms with efficient memory allocation to perform record linkage for large data files
    Refined name-matching methods to handle names with embedded spaces
    Refined name-matching methods to apply partial matching algorithms to swapped names
    Implemented the function that creates a new view in C++
    Implemented the function that exports the results of manual review in C++
    Allowed users to supply their own name frequency files for computing linkage scores
    For external linkages, allowed users to choose whether to write all comparison pairs (many-to-many linkage) or only the comparison pairs with the highest score to the linkage report
    Included record numbers in exported files
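Grouping the records of one individual into a single set, rather than reporting only pairs, amounts to finding connected components among the linked pairs. The following minimal Python sketch uses union-find, one standard way to do this that is not necessarily how Link Plus groups records. The input format (pairs of record IDs already judged to match) and the function name `group_into_sets` are assumptions.

```python
def group_into_sets(pairs):
    """Merge linked pairs (id_a, id_b) into sets of records that belong to
    the same individual, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # union the two groups

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=min)
```

For example, the pairs (1, 2), (2, 3), and (4, 5) yield the sets {1, 2, 3} and {4, 5}: records 1 and 3 end up together even though they were never directly paired.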
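The two report options for external linkages can be illustrated with a small filter: writing all comparison pairs keeps every scored pair, while the highest-score option keeps only each File 1 record's best-scoring pair or pairs. The (file1_id, file2_id, score) tuple layout, the function name `best_pairs`, and keeping all tied pairs are assumptions for illustration.

```python
def best_pairs(scored_pairs):
    """Keep, for each File 1 record, only the comparison pair(s) with the
    highest score. Ties are all kept (an assumption)."""
    best = {}  # file1_id -> (best score, list of pairs with that score)
    for id1, id2, score in scored_pairs:
        current = best.get(id1)
        if current is None or score > current[0]:
            best[id1] = (score, [(id1, id2, score)])
        elif score == current[0]:
            current[1].append((id1, id2, score))
    return [p for _, plist in best.values() for p in plist]
```

Given the pairs (1, "a", 10.0), (1, "b", 12.5), (2, "c", 3.0), and (2, "d", 3.0), the filter drops only (1, "a", 10.0); record 2 keeps both tied pairs.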
    Remaining Tasks (with priority and percent complete)
    Priority 1 (0% complete): Allow users to assign match status by scores without overwriting the existing match status
    Priority 2 (0% complete): Export NAACCR format files
    Priority 3 (0% complete): Provide a nonmatch report after manual review
    Priority 4 (0% complete): Allow CRS Plus users to select additional variables for manual review
    Priority 5 (0% complete): Add the ability to transport view files so that multiple users can do manual reviews on different computers
    Priority 6 (50% complete): Implement the address comparison function in C++
    Priority 7 (5% complete): Write a paper about Link Plus
        1. Literature review (80% complete)
        2. Description of Link Plus (features, functionalities, and GUI) (5% complete)
        3. The theory behind Link Plus (5% complete)
        4. Evaluation of Link Plus using realistic data (0% complete)
        5. Applications of Link Plus (0% complete)
    Priority 8 (10% complete): Implement the ZIP Code comparison function in C++
    Priority 9 (90% complete): Refine name comparators using the frequencies of names by sex
    Priority 10 (20% complete): Professional usability evaluation for interface standardization
    Priority 11 (0% complete): Provide a phone number comparator

