BankIt Help - Index
Getting Started with BankIt
A BankIt submission involves seven easy steps:
- Confirm your sequence is not an update or a duplicate of
a previous submission by you.
- Enter sequence length and press the 'New' button.
- Complete an initial form with general information about the
submitters and sequence, and enter the DNA sequence data.
- Press the 'Validate and Continue' button.
- View the draft of the GenBank record.
- If any errors appear in RED, correct the errors. Press the 'Validate and Continue' button.
- If any errors appear in BLUE, attempt to correct the errors. Press the 'Validate and Continue' button. If you think the submission is correct, but you still get BLUE errors, press the 'Review and Submit' button.
- Specify the number and types of biological features you want
to annotate on the record (e.g., add 1 CDS feature and 1 tRNA feature),
and press the 'Modify Submission' button to return to the form.
There, you can enter the details about each feature,
including protein translations.
- Repeat steps 3 and 4 until the record is complete.
- View the final draft by pressing the 'Review and Submit' button.
- As the final step, press the 'Submit to GenBank' button.
General Submission Information
Multiple Submissions Information
If you are submitting more than one sequence at this time,
number each sequence and indicate the total number of sequences
to be submitted so that we can correctly assign consecutive accession numbers to your set.
Contact Page
Enter the name, telephone and fax numbers, and e-mail address of
the person who is submitting the sequence. This is the person who will
be contacted regarding the sequence submission. This person does not
have to be on the list of authors involved in the sequencing (see
below). The phone, fax, and email address will not be visible in the
database record.
Release Information
May we release this record before publication?
Select one of the two radio buttons. If you select "Yes," the
entry will be released to the public after the database staff has
finished processing and added it to the database. If you select "No,"
then select the fields to indicate the date on which the
sequences should be released to the public. The submission will then be
held back by the database staff until formal publication of the
sequence or GenBank accession number, or until the Release Date,
whichever comes first. Sequences must be released when the accession
number or any portion of the sequence is published.
Primary Sequence Data
Are you submitting sequence data that you have determined? If not, and
you are submitting a sequence derived or assembled from data that is
already available to the public (for example, it is downloaded from a
public site and annotated by you or otherwise computationally predicted),
please check the 'No' box in this section.
If your sequence is not primary data, we can accept it for our Third
Party Annotated (TPA) database, which will be part of the International
Nucleotide Sequence Database Collaboration.
In order to submit your sequence, you must provide the accession numbers
of the primary sequence(s) from which you derived your sequence data.
Furthermore, in order to release your sequence publicly in TPA, you must
provide us with the bibliographic information of your published reference
from a peer-reviewed biological journal. The complete reference information
is not required for submission, only for the sequence's public release.
Reference Information
Sequence Authors
For each author to be credited for the sequence submission,
list one author per line. Provide the Middle Initial, not the full Middle Name.
The Suffix field has a pull-down menu that lists the acceptable options. This field
is not intended for titles or degrees, such as M.S. or Ph.D.
An example of a correctly entered name would be:
Robert W. Plant, Jr.
In the final GenBank flat file, the Sequence Authors information
will appear in the Direct Submission REFERENCE block.
First Citation Associated with Sequence
If a citation is associated with the sequence, select one of
the three options under Publication Status:
- Unpublished (Submitted) - an article "submitted" for
publication or "in preparation" and not yet officially accepted
by a journal should be entered as Unpublished
- In-Press
- Published
Provide a Citation Title (longer than 1 word) even if the
publication status is Unpublished.
When entering the Citation Authors, follow the instructions
provided for the Sequence Authors in the previous section. The Citation
Authors should contain at least one of the submitters, unless the
reference is to be cited on a feature (e.g., misc_difference, allele,
exon) in the record.
The Journal Title field is to be filled out if the Publication
Status is either In-Press or Published.
The fields for Volume and Pages are to be filled out only if the
Publication Status is Published.
The Year field is to be filled out if the Publication Status is
either In-Press or Published.
In the final GenBank flat file, the Citation information will
appear as a REFERENCE block directly above the Direct Submission
REFERENCE block.
Second citation (if any) associated with sequence
This section is to be filled out only if another citation refers to
this sequence. The instructions for completing this section are
identical to those above for the First citation associated with
sequence.
Source Information
Source Organism Name
An appropriate scientific name for the source organism is crucial for
proper taxonomic assignment and genetic code use for conceptual translation
of coding regions. A binomial or trinomial genus species designation is
most useful. If the scientific name of the source organism is not available
in the provided list or is not otherwise known, provide at least a
general description of the source organism (eg, uncultured bacterium) to
assist the taxonomic staff in classifying the source organism. Providing
as much information as known about the source organism will improve its
taxonomic classification and provide database users with important information
about the sequence record. Vectors should be designated as "vector" and
fusion proteins should be designated as "synthetic construct".
Source modifiers
Source modifiers provide valuable additional information about the source
organism or biomolecule, helping to define the biological context of the
sequence submission. In addition to the legal qualifiers available in the
pop-up list of source modifiers, any other information about the source
organism or biomolecule that the submitter feels is important to include in
the record can be entered in the "Other Source Information" box. If more
source modifiers need to be added than were initially specified, press
"Add More Modifiers".
Other Source Description
Information about the source genome is very helpful to database users. If the
source genome for the submitted biomolecule was mitochondrial or chloroplast,
check the appropriate box. Otherwise, check the "neither" box (this also
includes products encoded, for example, in the nuclear genome which are
subsequently imported into mitochondria or chloroplasts). If the sequenced
biomolecule is a transposon or insertion sequence indicate this in the
"other" box. The name of the transposon or insertion sequence can be entered
in the "Value" field associated with the "Source Modifier" list (select
transposon or insertion sequence). If the source genome of the biomolecule
was other than nuclear, mitochondrial, or chloroplast, specify in the "other"
box. Genomically integrated viral sequences should be designated as proviral.
Non-integrated viral sequences should be designated as virion.
Input DNA Sequence
Molecule Type
This should indicate the type of molecule present
in the cell. If you sequenced a cDNA corresponding to a mRNA, designate
mRNA. If you sequenced an RNA virus (no DNA stage), designate RNA. Most
other sequenced molecules exist as DNA (eg, ribosomal RNA gene, tRNA gene,
protein coding gene, etc.).
Sequence Summary/Definition Information
Enter a descriptive title for your sequence entry. This information
will appear in the DEFINITION line of your preliminary GenBank
flatfile, but may be modified by the GenBank submissions staff to
conform to current GenBank guidelines. Although the general format of
the definition line varies depending on several factors, below are some
examples for several different situations:
- For an mRNA having a complete CDS:
- Genus species product name (optional gene symbol) mRNA, complete cds.
- For an mRNA having a partial CDS:
-
Genus species product name (optional gene symbol) mRNA, partial cds.
- For a genomic record having a complete CDS:
-
Genus species product name (optional gene symbol) gene, complete cds.
- For a genomic record having only one exon and a partial CDS:
-
Genus species product name (optional gene symbol) gene, exon 2 and partial cds.
- For mitochondrial or chloroplast-localized proteins and RNA sequences:
-
Genus species product name (optional gene symbol) gene, complete cds;
[one choice from below].
-
Genus species XXS ribosomal RNA gene, partial sequence;
[one choice from below].
- nuclear gene(s) for mitochondrial product(s)
- nuclear gene(s) for chloroplast product(s)
- mitochondrial gene(s) for mitochondrial product(s)
- chloroplast gene(s) for chloroplast product(s)
- Non-gene (intergenic) chloroplast or mitochondrial sequences:
- Genus species xxx region, chloroplast sequence.
- Genus species xxx region, mitochondrial sequence.
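The patterns for coding sequences above can be sketched as a small helper. This is only an illustration of the templates listed here, not a BankIt tool: the function name, its parameters, and the gene symbol in the usage note are hypothetical, and GenBank staff may still revise any definition line you enter.

```python
def definition_line(organism, product, molecule, completeness,
                    gene_symbol=None, location=None):
    """Build a DEFINITION-style title from the templates listed above.

    molecule:     "mRNA" or "gene"
    completeness: "complete" or "partial"
    gene_symbol:  optional gene symbol, shown in parentheses
    location:     optional organelle phrase, e.g.
                  "nuclear gene for mitochondrial product"
    """
    name = product
    if gene_symbol:
        name += f" ({gene_symbol})"
    line = f"{organism} {name} {molecule}, {completeness} cds"
    if location:
        # Organelle-related records append one phrase after a semicolon.
        line += f"; {location}"
    return line + "."


if __name__ == "__main__":
    # "abcD" is a placeholder gene symbol, not a real example.
    print(definition_line("Genus species", "product name", "gene",
                          "complete", gene_symbol="abcD",
                          location="nuclear gene for mitochondrial product"))
```

Called with only the four required arguments, the helper reproduces the first template: "Genus species product name mRNA, complete cds."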
Enter DNA Sequence
- Sequence must have only single letter IUPAC code, raw sequence only.
- Sequence must be at least 50 bp in length (shorter sequences will not
be processed).
- Sequence must be biologically contiguous and not contain any internal
unsequenced spacers.
- Sequence must not be EST. Use dbEST for submitting EST data.
- Avoid submitting sequences with strings of Ns.
- Make sure that your sequence is linker/vector-free, including
the removal of linker sequences beyond the polyA tail of mRNAs.
Failure to adhere to these general requirements could result in a GenBank Accession
Number not being issued, or may cause a delay in the processing of your entry.
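Some of these requirements can be checked locally before pasting a sequence into the form. The sketch below is a rough pre-check under stated assumptions, not BankIt's own validation: the function name is hypothetical, U is treated as invalid because the form asks for DNA, and the threshold for a "string of Ns" (more than two in a row) is an arbitrary choice. Vector or linker contamination and EST status cannot be detected this way.

```python
import re

# IUPAC single-letter nucleotide codes for DNA (bases plus ambiguity codes).
IUPAC_NUCLEOTIDES = set("ACGTRYSWKMBDHVN")
MIN_LENGTH = 50   # minimum length stated in the requirements above
MAX_N_RUN = 2     # assumption: report any run of N longer than this


def check_sequence(raw):
    """Return a list of problems found in a raw nucleotide sequence."""
    # Drop whitespace and line breaks, then normalise case.
    seq = "".join(raw.split()).upper()
    problems = []

    bad = sorted(set(seq) - IUPAC_NUCLEOTIDES)
    if bad:
        problems.append("non-IUPAC characters: " + "".join(bad))

    if len(seq) < MIN_LENGTH:
        problems.append(f"sequence is {len(seq)} bp; minimum is {MIN_LENGTH} bp")

    if re.search("N{%d,}" % (MAX_N_RUN + 1), seq):
        problems.append("contains a long run of N")

    return problems


if __name__ == "__main__":
    for problem in check_sequence("ACGTNNNNNACGT"):
        print(problem)
```

An empty list means only that these three checks passed; the sequence must still be biologically contiguous and free of vector and linker sequence.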
Additional Information
Use this field to enter:
-
any biological information for which you found no appropriate
place on the BankIt form
-
any special instructions that will help GenBank annotators
process your submission
Sequence features such as coding regions and structural RNAs can be
added after you press "Validate and Continue" (below this field).
On the following page, enter the number of feature types (Coding
Regions, RNA Features, or Other Features) you wish to add and press
"Modify Submission".
Save, Validate and Continue, and Review and Submit Buttons
On the BankIt entry page:
"New"
- create a new BankIt submission
- you must first indicate the sequence length in nucleotides,
then press New
"Update"
- modify a previous BankIt submission
- you can also request your changes or corrections in the
text of an email to GenBank (gb-admin@ncbi.nlm.nih.gov)
On the bottom of the BankIt submission form:
"Save This Form"
- save the current BankIt form's information to your
local hard drive to use for additional BankIt
submissions
- this can reduce time used to enter common
information (names, address, phone and fax numbers,
etc) in subsequent forms
- to re-load see the Saving Common Data Help below.
"Validate and Continue"
- press this to validate the information
currently entered in the BankIt form and
check for errors or warnings
- errors and warnings are reported at the top
of the BankIt page and are links to the
appropriate sections of the BankIt form
"Review and Submit"
- if all reported errors and warnings have been
corrected or if only warnings remain, press this
button to review a final version of the record
- this button will not appear if any errors are
reported
On the Review Submission Page:
"Modify Submission"
- use this to go back to the BankIt form and modify
the current submission
- enter the number of desired features (Coding
Regions, RNA Features, or Other Features) in the
boxes provided; blank fields for these features
will then be provided on the original BankIt form
"Submit to GenBank"
- final step in the creation and submission of a
BankIt entry
Note: Errors appear at the top of the BankIt Pages in RED, while warnings are
in BLUE. All errors must be corrected before the BankIt submission will be
electronically accepted. Warnings should be addressed, but a BankIt submission
can still be completed if warnings remain.
Coding Regions
Coding Regions
Annotating an open reading frame on a sequence with a coding
region interval and gene and/or protein names makes a
submitted sequence more informative. BankIt allows you to
enter one or more coding regions (CDS) on the sequence
you are submitting.
The CDS is based on
- the nucleotide sequence you submit
- the nucleotide intervals of the CDS, including the start
and stop codons, if present
- the amino acid translation
Along with the nucleotide sequence, you must provide either
the CDS nucleotide intervals or the amino acid translation of
the CDS. BankIt will then either translate the amino acid
sequence from the intervals or attempt to predict the
nucleotide intervals corresponding to the translation you
provide.
Organisms utilizing non-standard genetic codes may not
translate properly. The Source Information section allows you
to select alternative genetic codes, which may be appropriate
for the source organism of your sequence.
If BankIt detects a discrepancy in the conceptual translation
of the CDS intervals, you will receive an error message, which
will describe the problem and give you the option to make the
necessary corrections. If you are unable to correct the error
you may continue with the submission process, and a member of
the GenBank Annotation Staff may contact you regarding your
submission when it is processed.
To add a CDS to a BankIt submission:
- press the Validate and Continue button on the bottom of the
BankIt form
- on the next page, enter the number of CDS features you wish
to add and press the Modify Submission button; BankIt will
bring you back to the original form
- click on the CDS Feature in the Contents list or scroll
down to the CDS Feature box near the bottom of the form
- complete the CDS Feature subsections
Nucleotide Intervals
- mRNAs and intronless genes: enter the first nucleotide
of the start codon and the last nucleotide of the stop codon
in the fields provided.
- Eukaryotic genomic sequences containing exons and introns:
enter the nucleotide spans of each of the exons, including the
start codon, stop codon, and any upstream or downstream exonic
sequence. If an amino acid sequence is entered without
accompanying nucleotide intervals, BankIt will attempt to
determine the exon spans using the universal splice
acceptor/donor consensus sequences, AG and GT, respectively.
You will be given an opportunity to view this coding region
after you press the Validate and Continue button at the bottom
of the form.
- Single genomic sequences with multiple coding regions:
after pressing the Validate and Continue button the first
time, enter the total number of CDSs to be added to the record
and press the Modify Submission button. BankIt will create
the indicated number of CDS features on the original BankIt
form; complete these and press Validate and Continue to view
the results
- If the sequence does not include either or both the
start/stop codons, be sure to check the appropriate 5' and/or
3' partial flag(s).
- If the 5' end of a CDS is in the middle of a codon enter the
start nucleotide at position 1, indicate the correct reading
frame: 1, 2, or 3, and mark the 5' partial flag
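Before entering exon spans for a genomic sequence, it can help to confirm that each intron implied by those spans begins with GT and ends with AG, the donor and acceptor consensus mentioned above. The sketch below does only that simple check on spans you supply; it does not reproduce how BankIt predicts exon spans from a translation. The function name is hypothetical, and spans are assumed to be 1-based, inclusive, on the (+) strand, and in ascending order.

```python
def intron_boundaries_ok(sequence, exon_spans):
    """Check GT...AG at the ends of each intron between consecutive exons.

    exon_spans: list of (start, end) pairs, 1-based and inclusive.
    Returns one boolean per intron, in order.
    """
    seq = sequence.upper()
    results = []
    for (_, exon_end), (next_start, _) in zip(exon_spans, exon_spans[1:]):
        # The intron runs from exon_end + 1 to next_start - 1 (1-based),
        # which is the 0-based slice [exon_end : next_start - 1].
        intron = seq[exon_end:next_start - 1]
        results.append(intron.startswith("GT") and intron.endswith("AG"))
    return results


if __name__ == "__main__":
    # Made-up sequence: exon 1 (1..6), intron (7..18), exon 2 (19..24).
    toy = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
    print(intron_boundaries_ok(toy, [(1, 6), (19, 24)]))
```

A False entry does not prove the span is wrong, since some introns use non-consensus splice sites, but it is a reasonable prompt to recheck the coordinates.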
Amino Acid Sequence (Optional)
- enter the amino acid sequence, if known, in this field using
single letter amino acid abbreviations
- if no amino acid sequence is entered, BankIt will
conceptually translate the given CDS intervals
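To preview roughly what a conceptual translation of your CDS intervals might look like, the intervals can be joined and translated locally. This is a minimal sketch, not BankIt's implementation: it uses only the standard genetic code, treats intervals as 1-based inclusive spans on the (+) strand, shows stop codons as "*", and writes "X" for any codon containing an ambiguity code. The function name and the interpretation of the reading frame (1, 2, or 3 meaning that translation starts at the first, second, or third base of the joined intervals) are assumptions.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(sequence, intervals, frame=1):
    """Join the given intervals and translate them with the standard code.

    intervals: list of (start, end) pairs, 1-based and inclusive.
    frame:     1, 2, or 3; the base of the joined CDS where the first
               complete codon starts (use 2 or 3 for 5' partial CDSs).
    """
    seq = sequence.upper()
    cds = "".join(seq[start - 1:end] for start, end in intervals)
    cds = cds[frame - 1:]
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(cds) - 2, 3):
        protein.append(CODON_TABLE.get(cds[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Made-up two-exon sequence with a 12-base intron.
    toy = "ATGGCC" + "GTAAGTTTTCAG" + "GAATAA"
    print(translate_cds(toy, [(1, 6), (19, 24)]))
```

For organisms that use a non-standard genetic code, the table above would need to be replaced; the sketch makes no attempt to do so.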
Protein Information
- the name or description of the protein encoded by the CDS
is required information
- any further description (eg, EC number, function) can be
entered in the Additional Information field
Gene Information
- enter the gene name, allele and/or description in the
available fields
- do not repeat information already given in the Protein
Information fields
- both short and long forms of a gene name should be given
for Drosophila
mRNA Intervals (Optional)
- if the sequence is genomic, enter the intervals of the
corresponding mRNA, extending the 5' and 3' ends beyond the
start and stop codons, respectively, if known
- do not enter intervals if the Molecule Type of the sequence
on which the CDS is found is already noted as mRNA (see
'Molecule Type' under 'Input DNA Sequence,' above)
RNA Features
If you have already annotated an mRNA feature as part of the coding region
annotation, there is no need to enter the same information here.
- Click on the RNA type from the list.
Add additional information about the RNA in the box.
- If the feature sequence is on the (-) strand, check the (-) box.
The default is the (+) strand.
- Check the 5' Partial or 3' Partial box if the RNA in the nucleic acid
sequence is missing residues at the 5' or 3' ends, respectively. If a
complete feature sequence is entered, do not check either box.
- Enter the sequence range of the feature. The numbers should correspond to
the nucleotide sequence interval. If the feature spans multiple,
non-continuous intervals on the sequence, indicate the beginning and end
points of each interval. If you require more boxes to enter each interval,
click on "Add More RNA Intervals" and enter the sequence range in
the new boxes.
If the feature is on the (-) strand, enter the interval base
numbers of the (+) strand and be sure to check (-) strand above.
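The (-) strand convention above can be illustrated with a short sketch: intervals are always given as base numbers on the submitted (+) strand, and the feature's own sequence is the reverse complement of the joined intervals. The function name is hypothetical, and intervals are assumed to be 1-based, inclusive, and listed in ascending order.

```python
# Complement table covering the IUPAC nucleotide codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def feature_sequence(sequence, intervals, minus_strand=False):
    """Return the sequence of a feature from (+) strand intervals.

    intervals:    list of (start, end) pairs, 1-based and inclusive,
                  numbered on the submitted (+) strand.
    minus_strand: True if the (-) box would be checked for this feature.
    """
    seq = sequence.upper()
    joined = "".join(seq[start - 1:end] for start, end in intervals)
    if minus_strand:
        # Reverse-complement the whole joined span, so the last
        # (+) strand interval becomes the 5' end of the feature.
        joined = joined.translate(COMPLEMENT)[::-1]
    return joined


if __name__ == "__main__":
    toy = "AACCGGTTAC"  # made-up sequence
    print(feature_sequence(toy, [(1, 4)]))
    print(feature_sequence(toy, [(1, 4)], minus_strand=True))
```

The same intervals are entered in both cases; only the strand setting changes which sequence the feature represents.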
List of RNA Types:
-
premessage RNA
-
Any RNA species that is not yet the mature RNA product; may include 5' clipped
region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon),
intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped
region (3'clip)
-
mRNA
-
messenger RNA; includes 5'untranslated region (5'UTR), coding sequences (CDS,
exon) and 3'untranslated region (3'UTR)
-
tRNA
-
mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the
translation of a nucleic acid sequence into an amino acid sequence
Enter the amino acid transferred by the tRNA in the information box.
-
rRNA
-
mature ribosomal RNA; the RNA component of the ribonucleoprotein particle
(ribosome) which assembles amino acids into proteins
-
snRNA
-
small nuclear RNA; any one of many small RNA species confined to the nucleus;
several of the snRNAs are involved in splicing or other RNA processing
reactions
-
scRNA
-
small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules
present in the cytoplasm and (sometimes) nucleus of a eukaryote
-
misc_RNA
-
internal transcribed spacer (ITS), any transcript or RNA product that cannot
be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip,
3'clip, 5'UTR, 3'UTR, exon, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)
Other Features
If you have annotation(s) which apply to one or more intervals on a sequence
which are more specific annotation(s) of your protein, gene, or RNA feature(s)
or are NOT protein, gene, or RNA feature(s), add them here. Most features
are indicated on the nucleotide sequence even if they refer to amino acid
sequence motifs. See additional features.
Add the feature by clicking the feature in the "Choose Feature type" menu.
If you have additional information about the feature, enter it in the
"Feature Selected" box beneath the menu. If the feature is not present in the
menu, enter it in the "Feature Selected" box.
After you have selected a feature, provide the following information:
(1) If the feature sequence is on the (-) strand, check the (-) box.
The default is the (+) strand.
(2) Check the 5' Partial or 3' Partial box if the feature in your nucleic acid
sequence is missing residues at the 5' or 3' ends, respectively. If a
complete feature sequence is entered, do not check either box.
(3) Enter the sequence range of the feature. The numbers should correspond to
the nucleotide sequence interval. If the feature spans multiple,
non-continuous intervals on the sequence, indicate the beginning and end
points of each interval. If you require more boxes to enter each interval,
click on "Add More Other Feature Intervals" and enter the
sequence range in the new boxes.
If the feature is on the (-) strand, enter the interval base
numbers of the (+) strand and be sure to check (-) strand above.
List of Additional Features:
-
attenuator
-
1) region of DNA at which regulation of termination of transcription occurs,
which controls the expression of some bacterial operons;
2) sequence segment located between the promoter and the first structural gene
that causes partial termination of transcription
-
C_region
-
Constant region of immunoglobulin light and heavy chains, and T-cell receptor
alpha, beta, and gamma chains. Includes one or more exons depending on the
particular chain
-
CAAT_signal
-
CAAT box; part of a conserved sequence located about 75 bp upstream of the
start point of eukaryotic transcription units which may be involved in RNA
polymerase binding; consensus=GG(C or T)CAATCT
-
conflict
-
independent determinations of the "same" sequence differ at this site or region
-
D-loop
-
displacement loop; a region within mitochondrial DNA in which a short stretch
of RNA is paired with one strand of DNA, displacing the original partner DNA
strand in this region; also used to describe the displacement of a region of
one strand of duplex DNA by a single stranded invader in the reaction catalyzed
by RecA protein
-
D_segment
-
Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain
-
enhancer
-
a cis-acting sequence that increases the utilization of (some) eukaryotic
promoters, and can function in either orientation and in any location (upstream
or downstream) relative to the promoter
-
exon
-
region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all
CDSs, and 3' UTR
*** For exon number, enter the NUMBER ONLY in the comment section. ***
-
GC_signal
-
GC box; a conserved GC-rich region located upstream of the start point of
eukaryotic transcription units which may occur in multiple copies or in either
orientation; consensus=GGGCGG
-
iDNA
-
intervening DNA; DNA which is eliminated through any of several kinds of
recombination
-
intron
-
a segment of DNA that is transcribed, but removed from within the transcript
by splicing together the sequences (exons) on either side of it
For intron number, enter the NUMBER ONLY in the comment section.
-
J_segment
-
Joining segment of immunoglobulin light and heavy chains, and T-cell receptor
alpha, beta, and gamma chains
-
LTR
-
long terminal repeat, a sequence directly repeated at both ends of a defined
sequence, of the sort typically found in retroviruses
-
mat_peptide
-
mature peptide or protein coding sequence; coding sequence for the mature or
final peptide or protein product following post-translational modification.
The location does not include the stop codon (unlike the corresponding CDS).
-
misc_binding
-
site in nucleic acid which covalently or non-covalently binds another moiety
that cannot be described by any other Binding key (primer_bind or protein_bind)
-
misc_difference
-
feature sequence is different from that presented in the entry and cannot be
described by any other Difference key (conflict, unsure, old_sequence, mutation,
variation, allele, or modified_base)
-
misc_feature
-
region of biological interest which cannot be described by any other feature
key
-
misc_recomb
-
site of any generalized, site-specific or replicative recombination event where
there is a breakage and reunion of duplex DNA that cannot be described by other
recombination keys (iDNA and virion) or qualifiers of source key
(/insertion_seq, /transposon, /proviral)
-
misc_signal
-
any region containing a signal controlling or altering gene function or
expression that cannot be described by other Signal keys (promoter, CAAT_signal,
TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer,
attenuator, terminator, and rep_origin)
-
misc_structure
-
any secondary or tertiary structure or conformation that cannot be described
by other Structure keys (stem_loop and D-loop)
-
modified_base
-
the indicated nucleotide is a modified nucleotide and should be substituted
for by the indicated molecule (given in the mod_base qualifier value)
-
N_region
-
Extra nucleotides inserted between rearranged immunoglobulin segments
-
old_sequence
-
the presented sequence revises a previous version of the sequence at this
location
-
polyA_signal
-
recognition region necessary for endonuclease cleavage of an RNA transcript
that is followed by polyadenylation; consensus=AATAAA
-
polyA_site
-
site on an RNA transcript to which will be added adenine residues by post-
transcriptional polyadenylation
*** Remove any linker sequence after the polyA tail. ***
-
prim_transcript
-
primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip),
5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening
sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'
clip)
-
primer_bind
-
Non-covalent primer binding site for initiation of replication, transcription,
or reverse transcription. Includes site(s) for synthetic (e.g., PCR primer)
elements
-
promoter
-
region on a DNA molecule involved in RNA polymerase binding to initiate
transcription
-
protein_bind
-
non-covalent protein binding site on nucleic acid
-
RBS
-
ribosome binding site
-
repeat_region
-
region of genome containing repeating units; microsatellites are annotated
using this feature.
Try to provide these 3 items:
- repeat type: tandem, dispersed, direct, or inverted
- repeat family: the name of a family, e.g., Alu, LTR, MIR, B1, MER, etc.
- repeat unit: nucleotide sequence repeated
-
repeat_unit
-
Use the repeat_region feature instead and note the specific unit as a modifier.
single repeat element
-
rep_origin
-
origin of replication; starting site for duplication of nucleic acid to give
two identical copies
-
S_region
-
Switch region of immunoglobulin heavy chains. Involved in the rearrangement of
heavy chain DNA leading to the expression of a different immunoglobulin class
from the same B-cell
-
satellite
-
Use the repeat_region feature.
many tandem repeats (identical or related) of a short basic repeating unit;
many have a base composition or other property different from the genome
average that allows them to be separated from the bulk (main band) genomic DNA
-
sig_peptide
-
signal peptide coding sequence; coding sequence for an N-terminal domain of a
secreted protein; this domain is involved in attaching nascent polypeptide to
the membrane; leader sequence
-
stem_loop
-
hairpin; a double-helical region formed by base-pairing between adjacent
(inverted) complementary sequences in a single strand of RNA or DNA
-
TATA_signal
-
TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp
before the start point of each eukaryotic RNA polymerase II transcript unit
which may be involved in positioning the enzyme for correct initiation;
consensus=TATA(A or T)A(A or T)
-
terminator
-
sequence of DNA located either at the end of the transcript or adjacent to a
promoter region that causes RNA polymerase to terminate transcription; may also
be site of binding of repressor protein
-
transit_peptide
-
transit peptide coding sequence; coding sequence for an N-terminal domain of a
nuclear-encoded organellar protein; this domain is involved in post-
translational import of the protein into the organelle
-
unsure
-
author is unsure of exact sequence in this region
-
V_region
-
Variable region of immunoglobulin light and heavy chains, and T-cell receptor
alpha, beta, and gamma chains. Codes for the variable amino terminal portion.
Can be made up from V_segments, D_segments, N_regions, and J_segments
-
V_segment
-
Variable segment of immunoglobulin light and heavy chains, and T-cell receptor
alpha, beta, and gamma chains. Codes for most of the variable region (V_region)
and the last few amino acids of the leader peptide
-
variation
-
a related strain contains stable mutations from the same gene (e.g., RFLPs,
polymorphisms, etc.) which differ from the presented sequence at this location
(and possibly others)
-
virion
-
viral genomic sequence as it is encapsidated, as distinguished from its
proviral form (integrated in a host cell's chromosome)
-
3'clip
-
3'-most region of a precursor transcript that is clipped off during processing
-
3'UTR
-
region near or at the 3' end of a mature transcript (usually following the stop
codon) that is not translated into a protein; trailer
-
5'clip
-
5'-most region of a precursor transcript that is clipped off during processing
-
5'UTR
-
region near or at the 5' end of a mature transcript (usually preceding the
initiation codon) that is not translated into a protein; leader
-
-10_signal
-
Pribnow box; a conserved region about 10 bp upstream of the start point of
bacterial transcription units which may be involved in binding RNA polymerase;
consensus=TAtAaT
-
-35_signal
-
a conserved hexamer about 35 bp upstream of the start point of bacterial
transcription units; consensus = TTGACa or TGTTGACA
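Several entries in the list above quote a consensus sequence. As a rough aid for locating candidate intervals before annotating them, the sketch below scans the (+) strand for exact matches to four of those consensus strings, written as regular expressions. It is an illustration only: the function name is hypothetical, real signals often deviate from the consensus, and a match is no evidence that the site is functional.

```python
import re

# Consensus strings quoted in the feature list, as regular expressions.
# "(C or T)" becomes [CT] and "(A or T)" becomes [AT].
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
}


def find_consensus(sequence, feature_key):
    """Return 1-based inclusive (start, end) spans of exact matches.

    Only the (+) strand is searched. A lookahead is used so that
    overlapping matches are all reported.
    """
    pattern = re.compile("(?=(%s))" % CONSENSUS[feature_key])
    seq = sequence.upper()
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in pattern.finditer(seq)
    ]


if __name__ == "__main__":
    # Made-up sequence containing one AATAAA.
    print(find_consensus("CCAATAAAGG", "polyA_signal"))
```

The returned spans use the same numbering as the interval boxes on the form, so a confirmed site can be entered directly.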
VecScreen
The submitted sequence has been screened using VecScreen to identify segments of nucleic acid sequence which may be of vector, adapter, or linker origin.
All sequences are screened using VecScreen to combat the problem
of vector contamination in GenBank.
Failure to recognize foreign segments in a sequence can:
- Lead to erroneous conclusions about the biological significance of the sequence.
- Waste time and effort in analysis of contaminated sequence.
- Delay the release of the sequence in a public database.
- Pollute public databases with contaminated sequence.
VecScreen searches the submitted sequence for segments that match any sequence
in a specialized non-redundant vector database, UniVec. UniVec contains the
unique segments and only a single copy of each of the shared segments from all
the vector, adapter, linker and primer sequences that were used to build the
database. Searches using VecScreen do not necessarily indicate the identity of
the vector having the strongest match to the submitted sequence because many
redundant sequences were eliminated in the construction of the UniVec database.
The full extent of the match to any individual vector will also not be apparent
because the sequence for most vectors in UniVec is not present as one contiguous
piece.
The most likely sources of vector contamination can be deduced from the
cloning history of the sequenced DNA. If it is necessary to identify the
vector that has the best match to the query sequence, a search should be
made using a database that contains a contiguous sequence for each vector,
such as NCBI's vector database. You can perform this type of search by using
blastn and
entering your original nucleotide sequence. (Select 'blastn' search against
the 'vector' database once at that page).
If VecScreen has detected foreign sequence, use whichever of the following
procedures applies:
- Sequence does contain foreign sequence:
- Return to 'Enter DNA sequence'
- Edit sequence in box or replace with corrected sequence
- Press 'Validate and Continue'
- Sequence is a cloning vector:
- Return to BankIt: VecScreen Information
- Toggle the box marked 'Cloning Vector'
- Press 'Validate and Continue'
- Sequence has a VecScreen hit but is not a cloning vector. In this
case an explanation must be supplied:
- Return to BankIt: VecScreen Information
- Provide an explanation for the presence of the
detected foreign segment in your sequence.
- Press 'Validate and Continue'
One of the above procedures must be followed through in order to complete your BankIt submission.
Saving Common Data
To save data common to a set of records to be submitted with BankIt:
- On the first input page, complete the information that will be common for all submissions.
- In Netscape or Internet Explorer: Press the 'Save This Form' button at the bottom of the page and then name this file on your local computer.
In MacWeb: Press the 'Save This Form' button; when naming the file add the extension .html (e.g., localfile.html).
- For your first submission, complete the current BankIt form with sequence data and all other relevant descriptive information. When finished, submit the data to GenBank by selecting "Submit to GenBank" at the top of the BankIt flatfile review page. You will receive a confirmation saying "Thank you for using BankIt. Your submission has been sent to GenBank..." Only now is your BankIt submission complete.
- For each remaining submission, select File/Open File (or File/Open Local, depending upon the client) to open your previously saved common information file in BankIt.
- Immediately after loading the file, scroll to the bottom of the page and press the "Validate and Continue" button. This will resync the form with the correct URL links and assign a new BankIt submission ID number.
- Enter the new sequence data and descriptive information, and submit to GenBank.
- Repeat steps 4, 5, and 6 as many times as needed. Use the numbering boxes
(under Multiple Submissions Information, at the top of the input page) to
indicate the number of the submission and the total number of submissions (eg,
'This submission is number 4 of a total of 15 submission(s).')
Revised June 25, 2003