Annotation Examples
mRNA sequence
Relevant feature information for an mRNA (cDNA) sequence encoding a protein:
- coding region intervals, including start and stop codons
- protein name
- gene name, if available
- amino acid sequence, if available
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Homo sapiens prolidase (PEPD) mRNA, complete cds.
source 1..1888
/organism="Homo sapiens"
/chromosome="19"
/map="19q12-q13.2"
/cell_type="fibroblasts"
gene 1..1888
/gene="PEPD"
CDS 17..1498
/gene="PEPD"
/EC_number="3.4.13.9"
/note="imidodipeptidase"
/product="prolidase"
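As a quick check on a coding region interval before submission, the following Python sketch computes the length of a simple start..stop interval and confirms that it is a whole number of codons. It is an illustration only, not part of any GenBank tool: the function name cds_length is hypothetical, and the check assumes a complete CDS with 1-based, inclusive coordinates as in the example above.

```python
def cds_length(interval: str) -> int:
    """Return the length of a simple 'start..stop' interval (1-based, inclusive)."""
    start, stop = (int(x) for x in interval.split(".."))
    if stop < start:
        raise ValueError(f"stop precedes start in {interval!r}")
    return stop - start + 1


length = cds_length("17..1498")      # CDS interval from the PEPD example
whole_codons = length % 3 == 0       # a complete CDS should divide evenly by 3
residues = length // 3 - 1           # codons minus the stop codon
print(length, whole_codons, residues)
```

If the interval is not a multiple of three, the start or stop position is probably off by one or two bases, or the CDS is partial.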
Prokaryotic gene
Relevant feature information for a prokaryotic genomic sequence encoding a protein:
- coding region intervals, including start and stop codons, if present
- protein name
- gene name, if known
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Escherichia coli RecA protein (recA) gene, complete cds.
source 1..3300
/organism="Escherichia coli"
/strain="K-12"
gene 783..1961
/gene="recA"
CDS 783..1961
/gene="recA"
/function="DNA repair protein"
/product="RecA protein"
Eukaryotic gene
Relevant feature information for a eukaryotic genomic sequence encoding a protein:
- coding region intervals, including start and stop codons, if
present, and all exon intervals
- protein name
- gene name, if known
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Caenorhabditis elegans tyrosine kinase PTK-2 (ptk-2) gene, complete cds.
source 1..3180
/organism="Caenorhabditis elegans"
gene 211..3011
/gene="ptk-2"
mRNA join(211..288,533..703,763..890,940..1024,
1084..1380,1838..1962,2018..2099,2301..3011)
/gene="ptk-2"
/product="protein kinase PTK-2"
CDS join(250..288,533..703,763..890,940..1024,
1084..1380,1838..1962,2018..2099,2301..2456)
/gene="ptk-2"
/product="protein kinase PTK-2"
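The relationship between the mRNA and CDS intervals above can be checked with a short Python sketch. It parses a join() location into exon intervals, sums the spliced length, and confirms that each CDS segment lies inside the corresponding exon. The helper names parse_join and spliced_length are hypothetical, and the parser is deliberately minimal: it ignores the partial markers and does not handle complement().

```python
import re


def parse_join(location: str) -> list[tuple[int, int]]:
    """Parse 'join(a..b,c..d,...)' or 'a..b' into (start, stop) pairs.

    The partial markers < and > are ignored; complement() is not handled.
    """
    return [(int(a), int(b)) for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]


def spliced_length(location: str) -> int:
    """Total number of bases covered by all intervals in the location."""
    return sum(stop - start + 1 for start, stop in parse_join(location))


mrna = ("join(211..288,533..703,763..890,940..1024,"
        "1084..1380,1838..1962,2018..2099,2301..3011)")
cds = ("join(250..288,533..703,763..890,940..1024,"
       "1084..1380,1838..1962,2018..2099,2301..2456)")

# Each CDS segment should fall inside the exon at the same position in the join.
contained = all(
    e_start <= c_start and c_stop <= e_stop
    for (e_start, e_stop), (c_start, c_stop) in zip(parse_join(mrna), parse_join(cds))
)
print(spliced_length(cds), spliced_length(cds) % 3 == 0, contained)
```

Pairing segments by position works here because the mRNA and CDS have the same number of intervals; a CDS that starts or ends in a later exon would need an overlap search instead.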
rRNA and/or ITS
Relevant feature information for a genomic sequence containing structural RNAs and/or spacers:
- names of any structural RNAs (eg, tRNA-Ile, 16S ribosomal
RNA)
- names of any spacer regions (eg, internal transcribed spacer 1,
16S/23S intergenic spacer)
- nucleotide spans of each of the above features, if known
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Saccharomyces cerevisiae 18S ribosomal RNA gene, partial sequence; internal transcribed spacer 1, 5.8S ribosomal RNA gene and internal transcribed spacer 2, complete sequence; and 28S ribosomal RNA gene, partial sequence.
source 1..540
/organism="Saccharomyces cerevisiae"
/strain="UMD 334"
rRNA <1..5
/product="18S ribosomal RNA"
misc_RNA 6..178
/product="internal transcribed spacer 1"
rRNA 179..377
/product="5.8S ribosomal RNA"
misc_RNA 378..519
/product="internal transcribed spacer 2"
rRNA 520..>540
/product="28S ribosomal RNA"
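In an rRNA/ITS region such as the one above, each feature normally begins one base after the previous feature ends. The Python sketch below checks a list of spans for gaps or overlaps. The function name find_breaks and the tuple layout are assumptions made for illustration, and the partial markers have been dropped from the coordinates.

```python
# (feature key, start, stop) taken from the example above, partial markers removed
features = [
    ("rRNA", 1, 5),
    ("misc_RNA", 6, 178),
    ("rRNA", 179, 377),
    ("misc_RNA", 378, 519),
    ("rRNA", 520, 540),
]


def find_breaks(spans):
    """Return (previous stop, next start) for every adjacent pair that does not abut."""
    breaks = []
    for (_, _, prev_stop), (_, start, _) in zip(spans, spans[1:]):
        if start != prev_stop + 1:
            breaks.append((prev_stop, start))
    return breaks


print(find_breaks(features))  # an empty list means the features tile the sequence
```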
Promoter region
Relevant feature information for promoter, genomic 5' flanking sequence, or genomic 3' flanking sequence:
- protein or gene name for the sequence to which the promoter or
flanking region belongs
- intervals of any transcribed regions or coding regions, if present
on the sequence
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Homo sapiens enhancer-binding protein 2 (EBP2) gene, promoter region and partial cds.
source 1..3061
/organism="Homo sapiens"
/chromosome="15"
/map="15q13"
/cell_line="H441"
/tissue_type="lung"
gene 1..>3061
/gene="EBP2"
promoter 1..2947
/gene="EBP2"
TATA_signal 2918..2923
/gene="EBP2"
mRNA 2948..>3061
/gene="EBP2"
/product="enhancer-binding protein 2"
5'UTR 2948..3010
/gene="EBP2"
CDS 3011..>3061
/gene="EBP2"
/product="enhancer-binding protein 2"
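The < and > symbols in the intervals above mark features that extend beyond the submitted sequence. As a minimal Python sketch of how such an interval might be read, the code below splits a simple location into its coordinates and two partial flags. The names Span and parse_span are hypothetical, and only plain start..stop intervals are handled (no join() or complement()).

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int
    stop: int
    partial_start: bool   # True when the location begins with '<'
    partial_stop: bool    # True when the stop coordinate is preceded by '>'


def parse_span(location: str) -> Span:
    """Parse a simple interval such as '2948..>3061' or '<1..5'."""
    match = re.fullmatch(r"(<?)(\d+)\.\.(>?)(\d+)", location.strip())
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    lt, start, gt, stop = match.groups()
    return Span(int(start), int(stop), lt == "<", gt == ">")


print(parse_span("1..2947"))       # promoter: complete on both ends
print(parse_span("3011..>3061"))   # CDS: continues past the end of the sequence
```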
Viral sequence
Relevant feature information for a viral sequence:
- strain, serotype, host, country, and collection_date, when known
- coding region intervals, including start and stop codons, if present
- protein name
- gene name, if known
- amino acid sequence, if known
- if no coding region is present, other description of the sequence
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Human adenovirus 3 strain RKI-4263/07 hexon (H) gene, partial cds.
source 1..1520
/organism="Human adenovirus 3"
/mol_type="genomic DNA"
/strain="RKI-4263/07"
/serotype="3"
/host="Homo sapiens"
/db_xref="taxon:45659"
/country="Germany"
/collection_date="Apr-2007"
gene <1..>1520
/gene="H"
CDS <1..>1520
/note="major capsid protein"
/codon_start=1
/product="hexon"
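The collection_date values used in these examples appear either as month and year (Apr-2007) or as day, month, and year (01-Dec-2011). The Python sketch below reads those two forms plus a bare year. It is an assumption-laden illustration: the function name parse_collection_date is hypothetical, other accepted date forms are not covered, and the day is not checked against the month length.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Optional 'DD-' and optional 'Mon-' in front of a four-digit year.
_DATE = re.compile(r"(?:(?:(\d{2})-)?([A-Z][a-z]{2})-)?(\d{4})")


def parse_collection_date(value: str):
    """Return (year, month, day); month and day are None when not given."""
    match = _DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"unrecognized collection_date: {value!r}")
    day, month, year = match.groups()
    if month is not None and month not in MONTHS:
        raise ValueError(f"unknown month abbreviation: {month!r}")
    return (
        int(year),
        MONTHS.index(month) + 1 if month else None,
        int(day) if day else None,
    )


print(parse_collection_date("Apr-2007"))
print(parse_collection_date("01-Dec-2011"))
```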
HIV-1
Relevant feature information for an HIV-1 sequence:
- name of the country from which the virus was isolated
- clone and isolate information
AND
- coding region intervals, including start and stop codons, if
present
- protein names
- gene names, if known
- amino acid sequences, if known
OR
- if no coding region is present, other description of the sequence
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
HIV-1 isolate X clone 5601 from USA, complete genome.
source 1..9720
/organism="Human immunodeficiency virus type 1"
/clone="5601"
/isolate="X"
/country="USA"
LTR 1..634
gene 789..2291
/gene="gag"
CDS 789..2291
/gene="gag"
/product="gag protein"
gene 2084..5095
/gene="pol"
CDS 2084..5095
/gene="pol"
/product="pol protein"
gene 5040..5618
/gene="vif"
CDS 5040..5618
/gene="vif"
/product="vif protein"
gene 5558..5848
/gene="vpr"
CDS 5558..5848
/gene="vpr"
/product="vpr protein"
gene 5829..8476
/gene="tat"
CDS join(5829..6043,8386..8476)
/gene="tat"
/product="tat protein"
gene 5968..8660
/gene="rev"
CDS join(5968..6043,8386..8660)
/gene="rev"
/product="rev protein"
gene 6060..6305
/gene="vpu"
CDS 6060..6305
/gene="vpu"
/product="vpu protein"
gene 6223..8802
/gene="env"
/pseudo
gene 8804..9070
/gene="nef"
CDS 8804..9070
/gene="nef"
/product="nef protein"
LTR 9086..9719
polyA_signal 9612..9617
Influenza viruses
Relevant feature information for Influenza sequences:
- properly formatted strain identifier. Example: A/chicken/India/1234/2010
- name of the country from which the virus was isolated
- collection date, including month and day if known
- serotype for Influenza A viruses
- host
AND
- coding region intervals, including start and stop codons and exons, if present
- protein names
- gene names
For Influenza A and B submissions, use the Influenza Virus Resource
Annotation webtool to create a feature table:
http://www.ncbi.nlm.nih.gov/genomes/FLU/Database/annotation.cgi
Example:
Influenza A virus (A/Wisconsin/28/2011(H1N1)) segment 8 nuclear export protein (NEP) and nonstructural protein 1 (NS1) genes, complete cds.
source 1..864
/organism="Influenza A virus (A/Wisconsin/28/2011(H1N1))"
/mol_type="viral cRNA"
/strain="A/Wisconsin/28/2011"
/serotype="H1N1"
/host="Homo sapiens"
/segment="8"
/country="USA"
/collection_date="01-Dec-2011"
/note="C1 passage(s)"
gene 1..838
/gene="NEP"
/gene_synonym="NS2"
CDS join(1..30,503..838)
/gene="NEP"
/note="nonstructural protein 2"
/product="nuclear export protein"
gene 1..660
/gene="NS1"
CDS 1..660
/gene="NS1"
/product="nonstructural protein 1"
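The two strain identifiers shown in this section, A/chicken/India/1234/2010 and A/Wisconsin/28/2011, can be split into their parts with a short Python sketch. It assumes the layout type/host/location/isolate number/year, with the host field left out for human isolates; that layout and the function name parse_flu_strain are assumptions for illustration, not a validation rule taken from this page.

```python
def parse_flu_strain(strain: str) -> dict:
    """Split an influenza strain identifier into named fields.

    Assumes four fields (host omitted) or five fields (host present).
    """
    fields = strain.split("/")
    if len(fields) == 4:
        virus_type, location, isolate, year = fields
        host = None
    elif len(fields) == 5:
        virus_type, host, location, isolate, year = fields
    else:
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {strain!r}")
    if not year.isdigit():
        raise ValueError(f"year is not numeric in {strain!r}")
    return {
        "type": virus_type,
        "host": host,
        "location": location,
        "isolate": isolate,
        "year": int(year),
    }


print(parse_flu_strain("A/chicken/India/1234/2010"))
print(parse_flu_strain("A/Wisconsin/28/2011"))
```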
Transposon or insertion sequence
Relevant feature information for transposons or insertion sequences:
- specific name of the transposon or IS, if available
- nucleotide spans corresponding to the transposon/IS
Optional:
- name and nucleotide intervals of any host gene/product disrupted
by the transposon/IS
- name and nucleotide intervals of any gene/product in the
transposon/IS (eg, transposase)
- nucleotide spans of any other features (LTRs, repeat regions)
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Bacillus subtilis strain RS2 transposon BLT transposase (tnpA) gene, complete cds.
source 1..1221
/organism="Bacillus subtilis"
/strain="RS2"
repeat_region 21..1127
/rpt_type=dispersed
/mobile_element="transposon: BLT"
repeat_region 21..61
/rpt_type=inverted
gene 128..1034
/gene="tnpA"
CDS 128..1034
/gene="tnpA"
/product="transposase"
repeat_region 1085..1127
/rpt_type=inverted
Microsatellite sequence
Relevant feature information for a microsatellite sequence:
- unique microsatellite/clone name for each sequence
- interval of any repeat region(s) within the microsatellite sequence,
if known
- whether the sequences are considered STS sequences
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example #1:
Chorthippus parallelus clone IIB-G5 microsatellite sequence.
source 1..288
/organism="Chorthippus parallelus"
/mol_type="genomic DNA"
/db_xref="taxon:37639"
/clone="IIB-G5"
repeat_region 1..288
/rpt_type=tandem
/satellite="microsatellite"
Example #2:
Noturus exilis voucher KU 40271 microsatellite Noex254 sequence.
source 1..556
/organism="Noturus exilis"
/mol_type="genomic DNA"
/specimen_voucher="KU 40271"
/db_xref="taxon:61323"
/clone="Noex_02_03_H06"
/PCR_primers="fwd_seq: catgtttgcacaaagggaaa, rev_seq:
atgtggatgcagattgtgga"
repeat_region 77..100
/rpt_type=tandem
/rpt_unit_range=77..100
/rpt_unit_seq="ca"
/satellite="microsatellite:Noex254"
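To locate the interval of a repeat region such as the "ca" run above, a simple pattern search is often enough. The Python sketch below reports each run of a repeat unit as a 1-based, inclusive interval together with its copy number. The function name find_tandem_runs, the min_copies threshold, and the toy sequence are all made up for illustration; the sequence is not from the Noturus exilis record.

```python
import re


def find_tandem_runs(seq: str, unit: str, min_copies: int = 5):
    """Return (start, stop, copies) for each uninterrupted run of `unit`.

    Coordinates are 1-based and inclusive, as in a feature table.
    """
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}", re.IGNORECASE)
    return [
        (m.start() + 1, m.end(), (m.end() - m.start()) // len(unit))
        for m in pattern.finditer(seq)
    ]


toy_sequence = "ttg" + "ca" * 12 + "gg"   # hypothetical sequence with a 24-base repeat
print(find_tandem_runs(toy_sequence, "ca"))
```

An interval of 24 bases with a two-base unit, like 77..100 in Example #2, corresponds to 12 copies of the unit.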
Repeat regions
Relevant feature information for sequences containing repeat regions:
- repeat region intervals
- repeat family, if known (eg, Alu, Mer)
- repeat type (tandem, inverted, flanking, terminal, direct, dispersed,
or other)
- repeat unit description/intervals, if region contains more than one
repeat
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Homo sapiens repeat regions
source 1..2050
/organism="Homo sapiens"
/chromosome="6"
/map="6q25"
repeat_region 8..126
/rpt_type=dispersed
/rpt_family="B2"
repeat_region 197..344
/rpt_type=direct
/rpt_unit_range=197..220
repeat_region 389..673
/rpt_family="AluSx"
/rpt_type=dispersed
repeat_region 847..876
/rpt_type=tandem
/rpt_unit_seq="ca"
/satellite="microsatellite:BT21"
Pseudogene
Relevant feature information for a pseudogene sequence:
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Mus musculus DNA methyltransferase (Dmt1) pseudogene, complete sequence.
source 1..2131
/organism="Mus musculus"
/strain="SvJ/129"
gene 123..1444
/gene="Dmt1"
/note="DNA methyltransferase 1"
/pseudo
Translocation and/or fusion protein
Relevant feature information for a sequence resulting from a chromosomal translocation:
- nucleotide location of the translocation breakpoint, if known
- map information for the translocation breakpoint (e.g.,
t(18;X)(q11.2;p11.2))
If the translocation results in a fusion protein, please include:
- coding region intervals, including start and stop codons, if
present
- protein name
- amino acid sequence, if known
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Homo sapiens SYT/SSX4 fusion protein mRNA, complete cds.
source 1..2935
/organism="Homo sapiens"
/tissue_type="sarcoma"
/map="t(18;X)(q11.2;p11.2)"
source 1..1242
/organism="Homo sapiens"
/chromosome="18"
/map="18q11.2"
CDS 1..1479
/product="SYT/SSX4 fusion protein"
source 1243..2935
/organism="Homo sapiens"
/chromosome="X"
/map="Xp11.2"
3'UTR 1480..2935
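In the example above, the two partial source features meet at the breakpoint, and the CDS runs across it. The Python sketch below works out how many bases of a feature fall on each side. The names SOURCES and bases_per_source are hypothetical, and the coordinates are copied from the example.

```python
# (map location, start, stop) for the two chromosome-specific source features
SOURCES = [
    ("18q11.2", 1, 1242),
    ("Xp11.2", 1243, 2935),
]


def bases_per_source(start: int, stop: int) -> dict:
    """Return the number of bases of start..stop that lie in each source segment."""
    counts = {}
    for name, s_start, s_stop in SOURCES:
        overlap = min(stop, s_stop) - max(start, s_start) + 1
        if overlap > 0:
            counts[name] = overlap
    return counts


print(bases_per_source(1, 1479))      # CDS: spans the breakpoint
print(bases_per_source(1480, 2935))   # 3'UTR: entirely on the X-derived segment
```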
Cloning vector
Relevant feature information for a cloning vector:
- unique name for the cloning vector
Optional:
- coding region intervals, including start and stop codons
- protein names, gene names
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Cloning vector pRB223, complete sequence.
source 1..4361
/organism="Cloning vector pRB223"
gene 86..1276
/gene="tet"
CDS 86..1276
/gene="tet"
/product="tetracycline resistance protein"
RBS 1905..1909
/note="Shine-Dalgarno sequence"
rep_origin 2535
gene complement(3293..4194)
/gene="bla"
CDS complement(3293..4153)
/gene="bla"
/product="beta-lactamase"
misc_feature 4069..4125
/note="multiple cloning site"
RBS complement(4161..4165)
/gene="bla"
/note="Shine-Dalgarno sequence"
promoter complement(4188..4194)
/gene="bla"
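Features written as complement(start..stop), such as the bla gene above, lie on the opposite strand, so their sequence is the reverse complement of the listed interval. The Python sketch below shows the extraction on a short made-up sequence; the function name extract and the toy sequence are hypothetical, and ambiguity codes other than n are not handled.

```python
# Translation table for complementing unambiguous bases and n
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def extract(seq: str, start: int, stop: int, complement: bool = False) -> str:
    """Return bases start..stop (1-based, inclusive), reverse-complemented if requested."""
    sub = seq[start - 1:stop]
    return sub.translate(COMPLEMENT)[::-1] if complement else sub


toy = "aaatgcccgggtt"                        # hypothetical 13-base sequence
print(extract(toy, 3, 8))                    # forward strand
print(extract(toy, 3, 8, complement=True))   # as for complement(3..8)

bla_cds_length = 4153 - 3293 + 1             # interval from the pRB223 example
print(bla_cds_length, bla_cds_length % 3 == 0)
```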
Gapped sequence
A gapped sequence includes both known, directly sequenced data and
unknown data. The unknown sections of sequence are represented by strings of
'nnn' between the known, directly sequenced, contiguous data. All pieces of
a gapped sequence must be from the same source and be in the same
orientation and in the correct order.
Relevant feature information for a gapped sequence:
- if a gap length is estimated, insert the equivalent number of nnns between
the directly determined, contiguous sections of sequence
- if the gap length is unknown, insert a string of 100 nnns to represent the
gap between the sections of sequence
- add a misc_feature for each gap with a /note qualifier to describe it
as either 'gap of unknown length' or 'gap of estimated length, # nts'
- add all other appropriate features (exons, introns, CDS, gene, etc)
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
Example:
Homo sapiens MHC class I antigen (HLA-B) gene, HLA-B_458_01445 allele, exons 2, 3 and partial cds.
source 1..788
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
gene <1..>788
/gene="HLA-B"
/allele="HLA-B_458_01445"
mRNA join(<1..270,513..>788)
/gene="HLA-B"
/allele="HLA-B_458_01445"
/product="MHC class I antigen"
CDS join(<1..270,513..>788)
/gene="HLA-B"
/allele="HLA-B_458_01445"
/codon_start=3
/product="MHC class I antigen"
/protein_id="ACR38915.1"
/db_xref="GI:238055051"
/translation="SHSMRYFDTAMSRPGRGEPRFISVGYVDDTQFVRFDSDAASPRE
EPRAPWIEQEGPEYWDRNTQIFKTNTQTDRESLRNLRGYYNQSEAGSHTLQSMYGCDV
GPDGRLLRGHDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAARVAEQDRAYLE
GTCVEWLRRYLENGKDTLERA"
exon 1..270
/gene="HLA-B"
/allele="HLA-B_458_01445"
/number=2
gap 271..512
/estimated_length=242
exon 513..788
/gene="HLA-B"
/allele="HLA-B_458_01445"
/number=3
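The gap rules described above (an estimated gap becomes that many n's, an unknown gap becomes 100 n's) can be sketched in Python as follows. The function name assemble_gapped and the placeholder contigs are hypothetical; the lengths are chosen to reproduce the coordinates of the HLA-B example.

```python
UNKNOWN_GAP_LENGTH = 100  # run of n's used when the gap size is not known


def assemble_gapped(contigs, gap_lengths):
    """Join contigs with runs of 'n' and report each gap as (start, stop, note).

    gap_lengths has one entry per gap: an int for an estimated length,
    or None when the length is unknown. Coordinates are 1-based, inclusive.
    """
    if len(gap_lengths) != len(contigs) - 1:
        raise ValueError("need exactly one gap length between each pair of contigs")
    parts = [contigs[0]]
    gaps = []
    position = len(contigs[0])            # last base written so far
    for contig, gap in zip(contigs[1:], gap_lengths):
        n = UNKNOWN_GAP_LENGTH if gap is None else gap
        note = ("gap of unknown length" if gap is None
                else f"gap of estimated length, {gap} nts")
        gaps.append((position + 1, position + n, note))
        parts.extend(["n" * n, contig])
        position += n + len(contig)
    return "".join(parts), gaps


# Placeholder contigs with the exon lengths from the example (270 and 276 bases)
sequence, gaps = assemble_gapped(["a" * 270, "c" * 276], [242])
print(len(sequence), gaps)
```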
Phylogenetic or population set
Relevant feature information for population or phylogenetic studies:
A set comprises a group of sequences that represent the same gene or locus
in different organisms or in different isolates, strains, or clones of the
same organism. A set can be, for example, phylogenetic (different organisms), population (same organism), or environmental (unclassified or unknown organisms).
- unique descriptive information for each sequence (eg, clone, strain,
isolate, or organism names)
- creating a set allows the sequences to be retrieved as a group
from Entrez PopSet.
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
EST submissions
Please submit directly to
dbEST: the EST division of GenBank.
GSS submissions
Please submit directly to
dbGSS: the GSS division of GenBank.
STS submissions
Relevant feature information for STS submissions:
- submit directly to dbSTS:
the STS division of GenBank
OR
- submit using BankIt and provide:
- chromosome and/or specific map locations
- clone name
- clone library [catalog number, reference, lab source, and/or
specific (in-house) name or number]
- PCR conditions and primer binding sites
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.
HTGS submissions
Requirements for HTGs submissions:
FLICs submissions
Relevant feature information for FLIC submissions:
- explicit labeling as FLICs
Optional:
- protein name
- gene name
- CDS intervals, including start/stop codons
We strongly suggest that you provide as much of the above information
as possible to ensure the most complete annotation of your sequence.
If any of this information is not known, please inform us.