GenPept - Format Enhancement
With the next full release of GenPept (141), which coincides with the next full release of GenBank (141) on or about April 15, 2004, a number of new record types will be added to enhance the data content of GenPept.

New Types:

Version A compound identifier consisting of the GenPept Locus and a numeric version number associated with the current version of the sequence data in the record. This is followed by an integer key (a "GI") assigned to the peptide sequence. Mandatory keyword/exactly one record.

Keywords Short phrases describing gene products and other information, taken directly from the corresponding GenBank entry. Mandatory keyword in all annotated entries/one or more records.
Source Common name of the organism or the name most frequently used in the literature. Mandatory keyword in all annotated entries/one or more records/includes one subkeyword.
PI Isoelectric point. Mandatory keyword/exactly one record.
Comment/NucGI GI of the corresponding nucleotide entry.
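
As an illustration of the Version record described above, here is a minimal Python sketch that splits a VERSION line into the GenPept Locus, the version number, and the GI. The function name `parse_version`, the `Version` tuple, and the tolerance for any amount of whitespace between fields are assumptions made for this example; they are not part of the GenPept specification.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    locus: str    # GenPept Locus, e.g. "X76706_1"
    version: int  # numeric version of the sequence data
    gi: int       # integer key assigned to the peptide sequence


# Assumed layout: keyword, whitespace, "<locus>.<version>", whitespace, "GI:<integer>".
_VERSION_RE = re.compile(r"^VERSION\s+(\S+)\.(\d+)\s+GI:(\d+)\s*$")


def parse_version(line: str) -> Version:
    """Parse one VERSION record; raise ValueError if the line does not match."""
    match = _VERSION_RE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION record: {line!r}")
    return Version(match.group(1), int(match.group(2)), int(match.group(3)))


if __name__ == "__main__":
    print(parse_version("VERSION     X76706_1.1  GI:436055"))
```

The version number is taken as the digits after the last period, so a Locus name that itself contains a period would still be split at the final one.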
The LOCUS line will contain additional information: number of amino acids, GenBank division, and date.
Detailed format for the LOCUS line:
Positions  Contents
---------  --------
01-05      'LOCUS'
06-12      spaces
13-25      GenPept Locus name
26-26      space
27-35      GenBank Locus name
36-40      Length of peptide sequence
41-41      space
42-43      'aa'
44-47      spaces
48-50      'PEP'
51-55      spaces
56-61      'linear'
62-64      spaces
65-67      GenBank division code
68-68      space
69-79      Date, in format dd-mmm-yyyy

Below is an example of the old format followed by the new format of the reference section of an entry:

OLD:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS       X76706_1 [A15H9FIB]
DEFINITION  Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE        29-JAN-1996
ACCESSION   X76706
ORGANISM    Human adenovirus type 15
            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT     CDS 50..1138
            /gene="fiber gene"
            /product="fiber protein"
            /protein_id="CAA54127.1"
            /db_xref="GI:436055"
            /db_xref="GOA:P36846"
            /db_xref="Swiss-Prot:P36846"
WEIGHT      39420
LENGTH      362
ORIGIN      Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78

NEW:
1-------10--------20--------30--------40--------50--------60--------70------78
LOCUS       X76706_1      A15H9FIB   362 aa    PEP     linear   VRL 29-JAN-1996
DEFINITION  Adenovirus type 15H9 (Morrison) fibre gene, nonenveloped DNA.
DATE        29-JAN-1996
ACCESSION   X76706
VERSION     X76706_1.1 GI:436055
KEYWORDS    fiber gene; fiber protein.
SOURCE      Human adenovirus type 15
ORGANISM    Human adenovirus type 15
            Viruses; dsDNA viruses, no RNA stage; Adenoviridae; Mastadenovirus.
COMMENT     CDS 50..1138
            /gene="fiber gene"
            /product="fiber protein"
            /protein_id="CAA54127.1"
            /db_xref="GI:436055"
            /db_xref="GOA:P36846"
            /db_xref="Swiss-Prot:P36846"
            /NucGI="436054"
WEIGHT      39419.48
PI          6.03
LENGTH      362
ORIGIN      Translated using phase 1
1-------10--------20--------30--------40--------50--------60--------70------78
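
The column positions given for the new LOCUS line lend themselves to fixed-width slicing. The following Python sketch shows one way to build and read such a line. The names `Locus`, `format_locus`, and `parse_locus` are hypothetical, and the choice to left-justify the Locus names and right-justify the length within their fields is an assumption; the position table only states which columns each field occupies.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    genpept_locus: str  # columns 13-25
    genbank_locus: str  # columns 27-35
    length: int         # columns 36-40, number of amino acids
    division: str       # columns 65-67, GenBank division code
    date: str           # columns 69-79, dd-mmm-yyyy


def format_locus(rec: Locus) -> str:
    """Build a 79-column LOCUS line (field justification is an assumption)."""
    return (
        "LOCUS" + " " * 7                 # 01-05 keyword, 06-12 spaces
        + rec.genpept_locus.ljust(13)     # 13-25
        + " "                             # 26
        + rec.genbank_locus.ljust(9)      # 27-35
        + str(rec.length).rjust(5)        # 36-40
        + " " + "aa" + " " * 4            # 41, 42-43, 44-47
        + "PEP" + " " * 5                 # 48-50, 51-55
        + "linear" + " " * 3              # 56-61, 62-64
        + rec.division.ljust(3)           # 65-67
        + " "                             # 68
        + rec.date.ljust(11)              # 69-79
    )


def parse_locus(line: str) -> Locus:
    """Slice a LOCUS line by column; positions are 1-based in the table, 0-based here."""
    line = line.rstrip("\n").ljust(79)    # pad short lines so every slice exists
    if line[0:5] != "LOCUS" or line[41:43] != "aa":
        raise ValueError(f"not a new-format LOCUS line: {line!r}")
    return Locus(
        genpept_locus=line[12:25].strip(),
        genbank_locus=line[26:35].strip(),
        length=int(line[35:40]),
        division=line[64:67].strip(),
        date=line[68:79].strip(),
    )


if __name__ == "__main__":
    rec = Locus("X76706_1", "A15H9FIB", 362, "VRL", "29-JAN-1996")
    text = format_locus(rec)
    print(text)
    print(parse_locus(text))
```

Slicing by column rather than splitting on whitespace keeps the parser correct even if a Locus name fills its whole field and leaves only the single separator space before the next one.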
ABCC GenPept is available from ftp://ftp.ncifcrf.gov/pub/genpept.
If you have questions or comments please contact: Gary Smythers.
GenPept(R) and GenBank(R) are registered trademarks of the U.S. Department of Health and Human Services for the GenBank Gene Products and the GenBank Genetic Sequence Data Banks.