GenPept - Format Enhancement

With the next full release of GenPept (Release 141), which will coincide with GenBank Release 141 on or about April 15, 2004, several new record types will be added to enhance the data content of GenPept.

New Types:

Version A compound identifier consisting of the GenPept locus name and a numeric version number for the current version of the sequence data in the record, separated by a period (e.g. X76706_1.1). It is followed by an integer key (a "GI") assigned to the peptide sequence. Mandatory keyword/exactly one record.

Keywords Short phrases describing gene products and other information, taken directly from the corresponding GenBank entry. Mandatory keyword in all annotated entries/one or more records.

Source Common name of the organism or the name most frequently used in the literature. Mandatory keyword in all annotated entries/one or more records/includes one subkeyword (ORGANISM).

PI Isoelectric point. Mandatory keyword/exactly one record.

Comment/NucGI GI of the corresponding nucleotide entry, given as a /NucGI qualifier within the COMMENT record.
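
As an illustration of the VERSION record, the following Python sketch splits a VERSION line into its locus name, version number and GI. The layout it expects (the keyword, whitespace, LOCUS.VERSION, whitespace, then "GI:" and digits) is an assumption inferred from the new-format example entry in this note, not a formal specification, and the names `Version` and `parse_version` are hypothetical.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    locus: str    # GenPept locus name, e.g. "X76706_1"
    version: int  # numeric version of the sequence data
    gi: int       # integer key ("GI") assigned to the peptide sequence


# Assumed layout, inferred from the NEW example entry (hypothetical pattern):
# 'VERSION', whitespace, LOCUS '.' VERSION, whitespace, 'GI:' followed by digits.
_VERSION_RE = re.compile(r"^VERSION\s+(\S+)\.(\d+)\s+GI:(\d+)\s*$")


def parse_version(line: str) -> Version:
    """Split a GenPept VERSION record into locus, version and GI."""
    m = _VERSION_RE.match(line)
    if m is None:
        raise ValueError(f"not a VERSION record: {line!r}")
    locus, version, gi = m.groups()
    return Version(locus, int(version), int(gi))


print(parse_version("VERSION     X76706_1.1  GI:436055"))
# Version(locus='X76706_1', version=1, gi=436055)
```

The regex splits on the last period before the whitespace, so a locus name containing an underscore and digits (as in X76706_1) is kept intact.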

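The LOCUS line will also carry additional information: the length of the peptide in amino acids, the GenBank division code, and the date. The GenBank locus name, previously given in square brackets, moves to a fixed column.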

Detailed format for the LOCUS line:

Positions    Contents
---------    --------
01-05        'LOCUS'
06-12        spaces
13-25        GenPept Locus name
26-26        space
27-35        GenBank Locus name
36-40        Length of peptide sequence
41-41        space
42-43        'aa'
44-47        spaces
48-50        'PEP'
51-55        spaces
56-61        'linear'
62-64        spaces
65-67        GenBank division code
68-68        space
69-79        Date, in the format dd-MMM-yyyy (e.g. 29-JAN-1996)
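
The column table can be applied mechanically. The Python sketch below, using the hypothetical names `Locus`, `format_locus` and `parse_locus`, writes and reads a LOCUS line with exactly the positions listed above (1-based and inclusive in the table, 0-based and end-exclusive in Python slices). It assumes every field already fits its column width and performs no further validation.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    genpept_locus: str  # columns 13-25
    genbank_locus: str  # columns 27-35
    length: int         # columns 36-40, number of amino acids
    division: str       # columns 65-67, GenBank division code
    date: str           # columns 69-79, dd-MMM-yyyy


def format_locus(rec: Locus) -> str:
    """Build a LOCUS line from the column table (assumes fields fit their widths)."""
    return (
        "LOCUS".ljust(12)                  # 01-05 'LOCUS', 06-12 spaces
        + rec.genpept_locus.ljust(13)      # 13-25 GenPept locus name
        + " "                              # 26 space
        + rec.genbank_locus.ljust(9)       # 27-35 GenBank locus name
        + str(rec.length).rjust(5)         # 36-40 length, right-justified
        + " aa"                            # 41 space, 42-43 'aa'
        + " " * 4 + "PEP"                  # 44-47 spaces, 48-50 'PEP'
        + " " * 5 + "linear"               # 51-55 spaces, 56-61 'linear'
        + " " * 3 + rec.division.ljust(3)  # 62-64 spaces, 65-67 division
        + " " + rec.date                   # 68 space, 69-79 date
    )


def parse_locus(line: str) -> Locus:
    """Slice a LOCUS line back into its fields (0-based, end-exclusive slices)."""
    if not line.startswith("LOCUS"):
        raise ValueError(f"not a LOCUS line: {line!r}")
    return Locus(
        genpept_locus=line[12:25].strip(),
        genbank_locus=line[26:35].strip(),
        length=int(line[35:40]),  # int() tolerates the leading spaces
        division=line[64:67].strip(),
        date=line[68:79].strip(),
    )


example = Locus("X76706_1", "A15H9FIB", 362, "VRL", "29-JAN-1996")
print(format_locus(example))  # same text as the LOCUS line of the NEW example
```

Building the line from fixed-width pieces keeps each table row visible as one comment, which makes it easy to check the code against the specification.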


Below is an example of the annotation section of an entry (everything before the sequence data), first in the old format and then in the new format:


OLD:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS       X76706_1 [A15H9FIB]
DEFINITION  Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE        29-JAN-1996
ACCESSION   X76706
ORGANISM    Human adenovirus type 15
            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT     CDS  50..1138
            /gene="fiber gene"
            /product="fiber protein"
            /protein_id="CAA54127.1"
            /db_xref="GI:436055"
            /db_xref="GOA:P36846"
            /db_xref="Swiss-Prot:P36846"
WEIGHT      39420
LENGTH      362
ORIGIN      Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78


NEW:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS       X76706_1      A15H9FIB   362 aa    PEP     linear   VRL 29-JAN-1996
DEFINITION  Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE        29-JAN-1996
ACCESSION   X76706
VERSION     X76706_1.1  GI:436055
KEYWORDS    fiber gene; fiber protein.
SOURCE      Human adenovirus type 15
ORGANISM    Human adenovirus type 15
            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT     CDS 50..1138
            /gene="fiber gene"
            /product="fiber protein"
            /protein_id="CAA54127.1"
            /db_xref="GI:436055"
            /db_xref="GOA:P36846"
            /db_xref="Swiss-Prot:P36846"
            /NucGI="436054"
WEIGHT      39419.48
PI          6.03
LENGTH      362
ORIGIN      Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78
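
To show how the new records fit the existing keyword layout, here is a minimal Python sketch that groups an entry's lines into keyword records and extracts the KEYWORDS list, the PI value and the /NucGI qualifier. It assumes, from the example above, that a keyword occupies columns 1-12, that a line whose first 12 columns are blank continues the previous record, and that the ruler lines are not part of the entry; `split_records` and `summarize` are hypothetical names.

```python
import re


def split_records(entry: str) -> list[tuple[str, str]]:
    """Group lines into (keyword, text) records.

    Assumption: a keyword sits in columns 1-12; a line whose first 12 columns
    are blank continues the previous record and is appended after a space.
    """
    records: list[tuple[str, str]] = []
    for line in entry.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, text = line[:12].strip(), line[12:].strip()
        if key:
            records.append((key, text))
        elif records:
            prev_key, prev_text = records[-1]
            records[-1] = (prev_key, f"{prev_text} {text}")
    return records


def summarize(entry: str) -> dict[str, object]:
    """Extract the KEYWORDS list, PI and NucGI from a new-format entry."""
    summary: dict[str, object] = {}
    for key, text in split_records(entry):
        if key == "KEYWORDS":
            # Keywords are separated by ';' and the list ends with '.'.
            summary["keywords"] = [k.strip() for k in text.rstrip(".").split(";")]
        elif key == "PI":
            summary["pi"] = float(text)
        elif key == "COMMENT":
            m = re.search(r'/NucGI="(\d+)"', text)
            if m:
                summary["nuc_gi"] = int(m.group(1))
    return summary


sample = "\n".join([
    "KEYWORDS    fiber gene; fiber protein.",
    "COMMENT     CDS 50..1138",
    '            /product="fiber protein"',
    '            /NucGI="436054"',
    "PI          6.03",
])
print(summarize(sample))
# {'keywords': ['fiber gene', 'fiber protein'], 'nuc_gi': 436054, 'pi': 6.03}
```

Records that the sketch does not recognize (LOCUS, DEFINITION, WEIGHT and so on) are still collected by `split_records`; only the fields introduced in this release are pulled out by `summarize`.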

ABCC GenPept is available from ftp://ftp.ncifcrf.gov/pub/genpept.

If you have questions or comments, please contact Gary Smythers.

GenPept(R) and GenBank(R) are registered trademarks of the U.S. Department of Health and Human Services for the GenBank Gene Products and the GenBank Genetic Sequence Data Banks.