Understanding NEIBank Data

EST (expressed sequence tag) clones are identified and clustered using a process called GRIST (GRouping and Identification of Sequence Tags).

Where possible, each cluster is named after its matching GenBank and UniGene entries. Discrepancies between the two names often reflect errors in UniGene clustering. Both names are shown, color-coded, in the Descriptions column, with links to both databases. The GC button runs a search of the GeneCards database for each entry.

If GenBank and UniGene names are not available, clones are identified by matches against individual ESTs in dbEST, the EST division of GenBank, and by similarity of their predicted protein sequences.

Unidentified clones may be novel, or they may have failed the identification criteria, perhaps because of poor sequence quality or short sequence length.
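The order of evidence described above (GenBank and UniGene names first, then dbEST and predicted-protein matches, otherwise unidentified) can be illustrated with a small sketch. This is not GRIST or NEIBank code: the `Cluster` record, its field names and the `identify` function are hypothetical, and trying dbEST matches before protein similarity is an assumption, since the text does not say how the two are combined.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Cluster:
    """Hypothetical cluster record; field names are illustrative, not NEIBank's schema."""
    cluster_id: str
    genbank_name: Optional[str] = None
    unigene_name: Optional[str] = None
    dbest_match: Optional[str] = None    # best match among individual dbEST entries
    protein_match: Optional[str] = None  # best match by predicted protein similarity


def identify(cluster: Cluster) -> Tuple[str, str]:
    """Return (evidence, label) following the order of evidence in the text.

    Trying dbEST before protein similarity is an assumption of this sketch.
    """
    names = [n for n in (cluster.genbank_name, cluster.unigene_name) if n]
    if names:
        # Keep both names (deduplicated, GenBank first); differing names may
        # point to a UniGene clustering error, so neither is discarded.
        return ("named", " / ".join(dict.fromkeys(names)))
    if cluster.dbest_match:
        return ("dbEST", cluster.dbest_match)
    if cluster.protein_match:
        return ("protein similarity", cluster.protein_match)
    # Nothing matched: the clone may be novel or may have failed the criteria.
    return ("unidentified", cluster.cluster_id)


if __name__ == "__main__":
    print(identify(Cluster("c1", genbank_name="GENE_X", unigene_name="GENE_Y")))
    print(identify(Cluster("c2", dbest_match="EST_123")))
    print(identify(Cluster("c3")))
```

Keeping both names side by side, rather than picking one, mirrors the page's choice to show GenBank and UniGene names together so that discrepancies stay visible.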

Chromosome location data, shown in the Chromosome column, are currently extracted from UniGene. Direct extraction from genome sequence builds is being developed.

The # Clones column shows the number of clones in each cluster. Each entry links to the list of clone names and from there to their sequences.

Keyword annotation is in progress. At present, clones can be listed by chromosome location or found by a simple text word search using the Search function.
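The two ways of listing clones mentioned above, by chromosome location and by text word search, amount to simple filters over the results table. The sketch below is only an illustration: the `ClusterRow` record, the cytogenetic location format (for example "17q21") and the whole-word, case-insensitive matching rule are assumptions, not a description of how the Search function is implemented.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterRow:
    """Hypothetical results-table row; fields mirror the columns named in the text."""
    description: str
    chromosome: str  # cytogenetic location such as "17q21" (format assumed)
    n_clones: int


def chromosome_of(location: str) -> str:
    """Extract the chromosome (1-22, X or Y) from a location like "17q21" or "Xq28"."""
    match = re.match(r"(\d{1,2}|X|Y)", location)
    return match.group(1) if match else ""


def by_chromosome(rows: List[ClusterRow], chromosome: str) -> List[ClusterRow]:
    """Keep rows on one chromosome; "1" must not match "17q21", so compare parsed values."""
    return [r for r in rows if chromosome_of(r.chromosome) == chromosome]


def text_search(rows: List[ClusterRow], word: str) -> List[ClusterRow]:
    """Case-insensitive whole-word match against the description text (assumed rule)."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return [r for r in rows if pattern.search(r.description)]


if __name__ == "__main__":
    # Made-up example rows, not NEIBank data.
    rows = [
        ClusterRow("example opsin-like protein", "Xq28", 3),
        ClusterRow("example kinase alpha", "17q21", 12),
        ClusterRow("example kinase beta", "1p36", 5),
    ]
    print([r.description for r in by_chromosome(rows, "1")])
    print([r.description for r in text_search(rows, "kinase")])
```

Parsing the chromosome out of the location string, rather than testing a string prefix, avoids listing chromosome 17 entries when chromosome 1 is requested.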

Full descriptions of the clustering, library synthesis, and other methods are being prepared for publication.