Understanding NEIBank Data

  • EST clones are identified and clustered using a process called GRIST (GRouping and Identification of Sequence Tags).

  • Where possible, each cluster is named for both its GenBank and its Unigene entry. Discrepancies between the two names often reflect errors in Unigene clustering. Both names are shown, color-coded, in the Descriptions column, and links to both databases are provided. The GC button runs a search of the GeneCards database for each entry.

  • If GenBank and Unigene names are not available, clones are identified by matches to individual ESTs in the dbEST section of GenBank and by similarity in predicted protein sequence.

  • Unidentified clones may be novel or may have failed identification criteria, perhaps because of sequence quality or length.

  • Chromosome location data, shown in the Chromosome column, are currently extracted from Unigene. Direct extraction from genome sequence builds is being developed.

  • The number of clones in each cluster is shown in the # Clones column. Each entry in that column links to the clone names and from there to their sequences.

  • Keyword annotation is in progress. At present, clones can be listed by chromosome location or found by a simple text word search through the Search function.

  • Full descriptions of the clustering, library synthesis, etc. are being prepared for publication.
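
The order of precedence used for naming in the notes above (GenBank and Unigene names first, then dbEST matches and predicted protein similarity, otherwise unidentified) can be illustrated with a short Python sketch. This is not the GRIST implementation: the Cluster record, its field names, and the describe function are hypothetical and are meant only to show how the Descriptions column could be read.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical record for one cluster of EST clones (not the NEIBank schema)."""
    genbank_name: Optional[str] = None   # name taken from a GenBank entry, if any
    unigene_name: Optional[str] = None   # name taken from a Unigene entry, if any
    dbest_match: Optional[str] = None    # best match to an individual EST in dbEST
    protein_match: Optional[str] = None  # best predicted-protein similarity hit


def describe(cluster: Cluster) -> str:
    """Return a description following the precedence outlined in the notes."""
    # 1. Prefer GenBank and Unigene names; show both when both exist.
    named = []
    if cluster.genbank_name:
        named.append(f"GenBank: {cluster.genbank_name}")
    if cluster.unigene_name:
        named.append(f"Unigene: {cluster.unigene_name}")
    if named:
        return "; ".join(named)

    # 2. Otherwise fall back to dbEST matches and protein similarity.
    evidence = []
    if cluster.dbest_match:
        evidence.append(f"dbEST match: {cluster.dbest_match}")
    if cluster.protein_match:
        evidence.append(f"protein similarity: {cluster.protein_match}")
    if evidence:
        return "; ".join(evidence)

    # 3. Nothing matched: the clone may be novel or may have failed the criteria.
    return "unidentified"


def names_disagree(cluster: Cluster) -> bool:
    """True when both names exist and differ (a possible Unigene clustering error)."""
    return (
        cluster.genbank_name is not None
        and cluster.unigene_name is not None
        and cluster.genbank_name != cluster.unigene_name
    )
```

For example, describe(Cluster(dbest_match="example EST")) falls through to the second step, while describe(Cluster()) returns "unidentified". The names used here are placeholders, not real database entries.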