BankIt: GenBank Submissions by WWW

*BankIt Help - Index

Getting Started with BankIt    (Back/Index)

A BankIt submission involves eight steps:
  1. Confirm your sequence is not an update or a duplicate of a previous submission by you.
  2. Enter sequence length and press the 'New' button.
  3. Complete an initial form with general information about the submitters and sequence, and enter the DNA sequence data.
    • Press the 'Validate and Continue' button.
  4. View the draft of the GenBank record.
    • If any errors appear in RED, correct the errors. Press the 'Validate and Continue' button.
    • If any errors appear in BLUE, attempt to correct the errors. Press the 'Validate and Continue' button. If you think the submission is correct, but you still get BLUE errors, press the 'Review and Submit' button.
  5. Specify the number and types of biological features you want to annotate on the record (e.g., add 1 CDS feature and 1 tRNA feature), and press the 'Modify Submission' button to return to the form. There, you can enter the details about each feature, including protein translations.
  6. Repeat steps 3 and 4 until the record is complete.
  7. View the final draft by pressing the 'Review and Submit' button.
  8. As the final step, press the 'Submit to GenBank' button.

General Submission Information     (Back/Index)

Multiple Submissions Information

If you are submitting more than one sequence at this time, number each sequence and indicate the total number of sequences to be submitted so that we can correctly assign consecutive accession numbers to your set.

Contact Page

Enter the name, telephone and fax numbers, and e-mail address of the person who is submitting the sequence. This is the person who will be contacted regarding the sequence submission. This person does not have to be on the list of authors involved in the sequencing (see below). The phone, fax, and email address will not be visible in the database record.

Release Information

May we release this record before publication?

Select one of the two radio buttons. If you select "Yes," the entry will be released to the public after the database staff has finished processing and added it to the database. If you select "No," then select the fields to indicate the date on which the sequences should be released to the public. The submission will then be held back by the database staff until formal publication of the sequence or GenBank accession number, or until the Release Date, whichever comes first. Sequences must be released when the accession number or any portion of the sequence is published.
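
The release rule above can be read as a small date calculation. The following is a minimal sketch under stated assumptions: the function name and its parameters are hypothetical, and it is not how the database staff actually schedule releases.

```python
from datetime import date

def public_release_date(release_before_publication, processed_on,
                        hold_until=None, published_on=None):
    """Earliest public release date implied by the Release Information answers.

    All parameter names are hypothetical. Returns None when the record is
    held and neither a Release Date nor a publication date is known yet.
    """
    if release_before_publication:
        # "Yes": released once processing is finished.
        return processed_on
    # "No": held until publication or the Release Date, whichever comes first.
    candidates = [d for d in (hold_until, published_on) if d is not None]
    if not candidates:
        return None
    # Assumption: a record cannot be released before it has been processed.
    return max(processed_on, min(candidates))
```

For example, a record processed on 2003-06-25 with a Release Date of 2004-01-01 whose accession number is published on 2003-10-01 would become public on 2003-10-01.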

Primary Sequence Data

Are you submitting sequence data that you have determined? If not, and you are submitting a sequence derived or assembled from data that is already available to the public (for example, it is downloaded from a public site and annotated by you or otherwise computationally predicted), please check the 'No' box in this section.

If your sequence is not primary data, we can accept it for our Third Party Annotated (TPA) database, which will be part of the International Nucleotide Sequence Database Collaboration.

In order to submit your sequence, you must provide the accession numbers of the primary sequence(s) from which you derived your sequence data. Furthermore, in order to release your sequence publicly in TPA, you must provide us with the bibliographic information of your published reference from a peer-reviewed biological journal. The complete reference information is not required for submission, only for the sequence's public release.

Reference Information     (Back/Index)

Sequence Authors

For each author to be credited for the sequence submission, list one author per line. Provide the Middle Initial, not the full Middle Name. The Suffix field has a pull down menu that lists the types of acceptable options. This field is not intended for titles or degrees, such as M.S. or Ph.D.

An example of a correctly entered name would be:

	Robert W. Plant, Jr.

In the final GenBank flat file, the Sequence Authors information will appear in the Direct Submission REFERENCE block.

First Citation Associated with Sequence

If a citation is associated with the sequence, select one of the three options under Publication Status:
  1. Unpublished (Submitted) - an article "submitted" for publication or "in preparation" and not yet officially accepted by a journal should be entered as Unpublished
  2. In-Press
  3. Published

Provide a Citation Title (longer than 1 word) even if the publication status is Unpublished.

When entering the Citation Authors, follow the instructions provided for the Sequence Authors in the previous section. The Citation Authors should contain at least one of the submitters, unless the reference is to be cited on a feature (e.g., misc_difference, allele, exon) in the record.

The Journal Title field is to be filled out if the Publication Status is either In-Press or Published.

The fields for Volume and Pages are to be filled out only if the Publication Status is Published.

The Year field is to be filled out if the Publication Status is either In-Press or Published.

In the final GenBank flat file, the Citation information will appear as a REFERENCE block directly above the Direct Submission REFERENCE block.
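
The field rules above depend only on the Publication Status, so they can be summarized as a lookup table. This is an illustrative sketch, not BankIt's own validation: the dictionary keys and function name are hypothetical, and treating Citation Authors as always required is an assumption.

```python
# Fields to fill in for each Publication Status, as described above.
# Key names are hypothetical; "authors" as always required is an assumption.
REQUIRED_FIELDS = {
    "Unpublished": {"title", "authors"},
    "In-Press": {"title", "authors", "journal", "year"},
    "Published": {"title", "authors", "journal", "year", "volume", "pages"},
}

def missing_citation_fields(status, citation):
    """Return a sorted list of missing fields, plus a note on short titles."""
    problems = sorted(f for f in REQUIRED_FIELDS[status] if not citation.get(f))
    title = citation.get("title", "")
    if title and len(title.split()) < 2:
        problems.append("title must be longer than one word")
    return problems
```

Calling `missing_citation_fields("In-Press", {"title": "A new gene", "authors": ["Plant"]})` returns `["journal", "year"]`.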

Second citation (if any) associated with sequence

This section is to be filled out only if another citation refers to this sequence. The instructions for completing this section are identical to those above for the First citation associated with sequence.

Source Information     (Back/Index)

Source Organism Name

An appropriate scientific name for the source organism is crucial for proper taxonomic assignment and genetic code use for conceptual translation of coding regions. A binomial or trinomial genus species designation is most useful. If the scientific name of the source organism is not available in the provided list or is not otherwise known, provide at least a general description of the source organism (eg, uncultured bacterium) to assist the taxonomic staff in classifying the source organism. Providing as much information as known about the source organism will improve its taxonomic classification and provide database users with important information about the sequence record. Vectors should be designated as "vector" and fusion proteins should be designated as "synthetic construct".

Source modifiers

Source modifiers provide valuable additional information about the source organism or biomolecule and help define the biological context of the sequence submission. In addition to the legal qualifiers available in the pop-up list of source modifiers, any other information about the source organism or biomolecule that the submitter considers important can be entered in the "Other Source Information" box. If more source modifiers are needed than were initially specified, press "Add More Modifiers".

Other Source Description

Information about the source genome is very helpful to database users. If the source genome for the submitted biomolecule was mitochondrial or chloroplast, check the appropriate box. Otherwise, check the "neither" box (this also applies to products that are encoded in the nuclear genome, for example, and subsequently imported into mitochondria or chloroplasts). If the sequenced biomolecule is a transposon or insertion sequence, indicate this in the "other" box. The name of the transposon or insertion sequence can be entered in the "Value" field associated with the "Source Modifier" list (select transposon or insertion sequence). If the source genome of the biomolecule was other than nuclear, mitochondrial, or chloroplast, specify it in the "other" box. Genomically integrated viral sequences should be designated as proviral. Non-integrated viral sequences should be designated as virion.

Input DNA Sequence     (Back/Index)

Molecule Type

This should indicate the type of molecule present in the cell. If you sequenced a cDNA corresponding to a mRNA, designate mRNA. If you sequenced an RNA virus (no DNA stage), designate RNA. Most other sequenced molecules exist as DNA (eg, ribosomal RNA gene, tRNA gene, protein coding gene, etc.).

Sequence Summary/Definition Information

Enter a descriptive title for your sequence entry. This information will appear in the DEFINITION line of your preliminary GenBank flatfile, but may be modified by the GenBank submissions staff to conform to current GenBank guidelines. Although the general format of the definition line varies depending on several factors, below are some examples for several different situations:
  1. For an mRNA having a complete CDS:
    • Genus species product name (optional gene symbol) mRNA, complete cds.

  2. For an mRNA having a partial CDS:
    • Genus species product name (optional gene symbol) mRNA, partial cds.

  3. For a genomic record having a complete CDS:
    • Genus species product name (optional gene symbol) gene, complete cds.

  4. For a genomic record having only one exon and a partial CDS:
    • Genus species product name (optional gene symbol) gene, exon 2 and partial cds.

  5. For mitochondrial or chloroplast-localized proteins and RNA sequences:
    • Genus species product name (optional gene symbol) gene, complete cds;
    • [one choice from below].
    • Genus species XXS ribosomal RNA gene, partial sequence;
    • [one choice from below].

      • nuclear gene(s) for mitochondrial product(s)
      • nuclear gene(s) for chloroplast product(s)
      • mitochondrial gene(s) for mitochondrial product(s)
      • chloroplast gene(s) for chloroplast product(s)

  6. Non-gene (intergenic) chloroplast or mitochondrial sequences:
    • Genus species xxx region, chloroplast sequence.
    • Genus species xxx region, mitochondrial sequence.
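
The first three patterns above share one template, which can be sketched as a small helper. The function and its parameter names are hypothetical, and the values in the usage line are placeholders. The GenBank staff may still revise any definition line.

```python
def definition_line(organism, product, molecule, completeness, gene_symbol=None):
    """Build a definition line in the 'complete/partial cds' pattern.

    Covers only patterns 1-3 above (mRNA or genomic record with a complete
    or partial CDS); all names here are illustrative.
    """
    if molecule not in ("mRNA", "gene"):
        raise ValueError("molecule must be 'mRNA' or 'gene'")
    if completeness not in ("complete", "partial"):
        raise ValueError("completeness must be 'complete' or 'partial'")
    symbol = f" ({gene_symbol})" if gene_symbol else ""
    return f"{organism} {product}{symbol} {molecule}, {completeness} cds."

print(definition_line("Genus species", "example protein", "mRNA", "complete", "exp1"))
# prints: Genus species example protein (exp1) mRNA, complete cds.
```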

Enter DNA Sequence

  • Sequence must have only single letter IUPAC code, raw sequence only.
  • Sequence must be at least 50 bp in length (shorter sequences will not be processed).
  • Sequence must be biologically contiguous and not contain any internal unsequenced spacers.
  • Sequence must not be EST. Use dbEST for submitting EST data.
  • Avoid submitting sequences with strings of Ns.
  • Make sure that your sequence is linker/vector-free, including the removal of linker sequences beyond the polyA tail of mRNAs.

Failure to adhere to these general requirements could result in a GenBank Accession Number not being issued, or may cause a delay in the processing of your entry.
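
Some of these requirements can be checked locally before pasting a sequence into the form. The sketch below is illustrative only and is not BankIt's validator: the accepted alphabet is assumed to be the standard IUPAC nucleotide codes, and `max_n_run` is a hypothetical threshold, since the text above gives no number for "strings of Ns".

```python
import re

# Single-letter IUPAC nucleotide codes (assumed to be the accepted alphabet).
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")
MIN_LENGTH = 50  # minimum length stated in the list above

def check_sequence(raw, max_n_run=10):
    """Return a list of problems found in a raw nucleotide sequence."""
    seq = "".join(raw.split()).upper()  # drop whitespace and line breaks
    problems = []
    bad = sorted(set(seq) - IUPAC_NUCLEOTIDES)
    if bad:
        problems.append("non-IUPAC characters: " + "".join(bad))
    if len(seq) < MIN_LENGTH:
        problems.append(f"sequence is {len(seq)} bp; at least {MIN_LENGTH} bp required")
    # max_n_run is a hypothetical cutoff chosen for illustration.
    if re.search("N{%d,}" % max_n_run, seq):
        problems.append(f"contains a run of {max_n_run} or more Ns")
    return problems
```

An empty list means none of these particular checks failed. Contiguity, EST status, and linker/vector contamination cannot be judged from the characters alone.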

Additional Information     (Back/Index)

Use this field to enter:
  • any biological information for which you found no appropriate place on the BankIt form
  • any special instructions that will help GenBank annotators process your submission
Sequence features such as coding regions and structural RNAs can be added after you press "Validate and Continue" (below this field). On the following page, enter the number of feature types (Coding Regions, RNA Features, or Other Features) you wish to add and press "Modify Submission".

Save, Validate and Continue, and Review and Submit Buttons     (Back/Index)

On the BankIt entry page:

"New"
  • create a new BankIt submission
  • you must first indicate the sequence length in nucleotides, then press New
"Update"
  • modify a previous BankIt submission
  • you can also request your changes or corrections in the text of an email to GenBank (gb-admin@ncbi.nlm.nih.gov)

On the bottom of the BankIt submission form:

"Save This Form"
  • save the current BankIt form's information to your local hard drive to use for additional BankIt submissions
  • this can reduce time used to enter common information (names, address, phone and fax numbers, etc) in subsequent forms
  • to re-load see the Saving Common Data Help below.
"Validate and Continue"
  • press this to validate the information currently entered in the BankIt form and check for errors or warnings
  • errors and warnings are reported at the top of the BankIt page and are links to the appropriate sections of the BankIt form
"Review and Submit"
  • if all reported errors and warnings have been corrected or if only warnings remain, press this button to review a final version of the record
  • this button will not appear if any errors are reported

On the Review Submission Page:

"Modify Submission"
  • use this to go back to the BankIt form and modify the current submission
  • enter the number of desired features (Coding Regions, RNA Features, or Other Features) in the boxes provided; blank fields for these features will then be provided on the original BankIt form
"Submit to GenBank"
  • final step in the creation and submission of a BankIt entry
Note: Errors appear at the top of the BankIt Pages in RED, while warnings are in BLUE. All errors must be corrected before the BankIt submission will be electronically accepted. Warnings should be addressed, but a BankIt submission can still be completed if warnings remain.

Coding Regions     (Back/Index)

Annotating an open reading frame on a sequence with a coding region interval and gene and/or protein names makes a submitted sequence more informative. BankIt allows you to enter one or more coding regions (CDS) on the sequence you are submitting. The CDS is based on

  • the nucleotide sequence you submit
  • the nucleotide intervals of the CDS, including the start and stop codons, if present
  • the amino acid translation
Along with the nucleotide sequence, you must provide either the CDS nucleotide intervals or the amino acid translation of the CDS. BankIt will then either translate the amino acid sequence from the intervals or attempt to predict the nucleotide intervals corresponding to the translation you provide.

Organisms utilizing non-standard genetic codes may not translate properly. The Source Information section allows you to select alternative genetic codes, which may be appropriate for the source organism of your sequence.

If BankIt detects a discrepancy in the conceptual translation of the CDS intervals, you will receive an error message, which will describe the problem and give you the option to make the necessary corrections. If you are unable to correct the error, you may continue with the submission process, and a member of the GenBank Annotation Staff may contact you regarding your submission when it is processed.

To add a CDS to a BankIt submission:
  1. press the Validate and Continue button on the bottom of the BankIt form
  2. on the next page, enter the number of CDS features you wish to add and press the Modify Submission button; BankIt will bring you back to the original form
  3. click on the CDS Feature in the Contents list or scroll down to the CDS Feature box near the bottom of the form
  4. complete the CDS Feature subsections

Nucleotide Intervals

  • mRNAs and intronless genes: enter the first nucleotide of the start codon and the last nucleotide of the stop codon in the fields provided.
  • Eukaryotic genomic sequences containing exons and introns: enter the nucleotide spans of each of the exons, including the start codon, stop codon, and any upstream or downstream exonic sequence. If an amino acid sequence is entered without accompanying nucleotide intervals, BankIt will attempt to determine the exon spans using the universal splice acceptor/donor consensus sequences, AG and GT, respectively. You will be given an opportunity to view this coding region after you press the Validate and Continue button at the bottom of the form.
  • Single genomic sequences with multiple coding regions: after pressing the Validate and Continue button the first time, enter the total number of CDSs to be added to the record and press the Modify Submission button. BankIt will create the indicated number of CDS features on the original BankIt form; complete these and press Validate and Continue to view the results.
  • If the sequence does not include either or both the start/stop codons, be sure to check the appropriate 5' and/or 3' partial flag(s).
  • If the 5' end of a CDS is in the middle of a codon, enter the start nucleotide at position 1, indicate the correct reading frame (1, 2, or 3), and mark the 5' partial flag.
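
Before entering exon spans for a genomic sequence, it can help to confirm that the introns they imply begin with GT (donor) and end with AG (acceptor). The sketch below is a simple plus-strand check with hypothetical function names. It is not the algorithm BankIt uses to predict exon spans.

```python
def intron_spans(exons):
    """Introns implied by sorted, 1-based inclusive (start, end) exon spans."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

def non_consensus_introns(seq, exons):
    """Return the intron spans that do not start with GT and end with AG."""
    seq = seq.upper()
    flagged = []
    for start, end in intron_spans(exons):
        intron = seq[start - 1:end]  # convert 1-based inclusive to a slice
        if not (intron.startswith("GT") and intron.endswith("AG")):
            flagged.append((start, end))
    return flagged
```

For a made-up sequence such as `"ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"` with exons `[(1, 6), (19, 24)]`, the single implied intron spans positions 7 to 18 and nothing is flagged.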

Amino Acid Sequence (Optional)

  • enter the amino acid sequence, if known, in this field using single letter amino acid abbreviations
  • if no amino acid sequence is entered, BankIt will conceptually translate the given CDS intervals

Protein Information

  • the name or description of the protein encoded by the CDS is required information
  • any further description (eg, EC number, function) can be entered in the Additional Information field

Gene Information

  • enter the gene name, allele and/or description in the available fields
  • do not repeat information already given in the Protein Information fields
  • both short and long forms of a gene name should be given for Drosophila

mRNA Intervals (Optional)

  • if the sequence is genomic, enter the intervals of the corresponding mRNA, extending the 5' and 3' ends beyond the start and stop codons, respectively, if known
  • do not enter intervals if the Molecule Type of the sequence on which the CDS is found is already noted as mRNA (see 'Molecule Type' under 'Input DNA Sequence,' above)

RNA Features     (Back/Index)

If you have already annotated an mRNA feature as part of the coding region annotation, there is no need to enter the same information here.
  1. Click on the RNA type from the list. Add additional information about the RNA in the box.
  2. If the feature sequence is on the (-) strand, check the (-) box. The default is the (+) strand.
  3. Check the 5' Partial or 3' Partial box if the RNA in the nucleic acid sequence is missing residues at the 5' or 3' ends, respectively. If a complete feature sequence is entered, do not check either box.
  4. Enter the sequence range of the feature. The numbers should correspond to the nucleotide sequence interval. If the feature spans multiple, non-continuous intervals on the sequence, indicate the beginning and end points of each interval. If you require more boxes to enter each interval, click on "Add More RNA Intervals" and enter the sequence range in the new boxes. If the feature is on the (-) strand, enter the interval base numbers of the (+) strand and be sure to check (-) strand above.
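
Because (-) strand features are still entered with (+) strand coordinates, the actual feature sequence is the reverse complement of the joined intervals. The sketch below illustrates that convention. The function name is hypothetical, and only unambiguous bases are complemented.

```python
# Complement table for unambiguous bases only (ambiguity codes are left as-is).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def feature_sequence(seq, intervals, minus_strand=False):
    """Return the feature sequence for 1-based inclusive (+) strand intervals.

    For a (-) strand feature, the intervals are joined in (+) strand order
    and the result is reverse-complemented as a whole.
    """
    joined = "".join(seq[s - 1:e] for s, e in sorted(intervals))
    if minus_strand:
        return joined.translate(COMPLEMENT)[::-1]
    return joined

print(feature_sequence("ATGCATGC", [(1, 3)]))                     # prints: ATG
print(feature_sequence("ATGCATGC", [(1, 3)], minus_strand=True))  # prints: CAT
```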

List of RNA Types:

premessage RNA
Any RNA species that is not yet the mature RNA product; may include 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3'clip)
mRNA
messenger RNA; includes 5'untranslated region (5'UTR), coding sequences (CDS, exon) and 3'untranslated region (3'UTR)
tRNA
mature transfer RNA, a small RNA molecule (75-85 bases long) that mediates the translation of a nucleic acid sequence into an amino acid sequence. Enter the amino acid transferred by the tRNA in the information box.
rRNA
mature ribosomal RNA; the RNA component of the ribonucleoprotein particle (ribosome) which assembles amino acids into proteins
snRNA
small nuclear RNA; any one of many small RNA species confined to the nucleus; several of the snRNAs are involved in splicing or other RNA processing reactions
scRNA
small cytoplasmic RNA; any one of several small cytoplasmic RNA molecules present in the cytoplasm and (sometimes) nucleus of a eukaryote
misc_RNA
internal transcribed spacer (ITS), any transcript or RNA product that cannot be defined by other RNA keys (prim_transcript, precursor_RNA, mRNA, 5'clip, 3'clip, 5'UTR, 3'UTR, exon, intron, polyA_site, rRNA, tRNA, scRNA, and snRNA)

Other Features     (Back/Index)

If you have annotation(s) that apply to one or more intervals on a sequence
and that are either more specific annotation(s) of your protein, gene, or RNA
feature(s) or are NOT protein, gene, or RNA feature(s), add them here. Most
features are indicated on the nucleotide sequence even if they refer to amino
acid sequence motifs. See the List of Additional Features below.

Add the feature by clicking the feature in the "Choose Feature type" menu.
If you have additional information about the feature, enter it in the
"Feature Selected" box beneath the menu. If the feature is not present in the
menu, enter it in the "Feature Selected" box.

After you have selected a feature, provide the following information:

(1) If the feature sequence is on the (-) strand, check the (-) box.
    The default is the (+) strand.  
     
(2) Check the 5' Partial or 3' Partial box if the feature in your nucleic acid 
    sequence is missing residues at the 5' or 3' ends, respectively. If a 
    complete feature sequence is entered, do not check either box. 

(3) Enter the sequence range of the feature. The numbers should correspond to 
    the nucleotide sequence interval. If the feature spans multiple, 
    non-continuous intervals on the sequence, indicate the beginning and end 
    points of each interval. If you require more boxes to enter each interval,
    click on "Add More Other Feature Intervals" and enter the 
    sequence range in the new boxes.
    If the feature is on the (-) strand, enter the interval base 
    numbers of the (+) strand and be sure to check (-) strand above.

List of Additional Features:

attenuator
1) region of DNA at which regulation of termination of transcription occurs, which controls the expression of some bacterial operons; 2) sequence segment located between the promoter and the first structural gene that causes partial termination of transcription

C_region
Constant region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Includes one or more exons depending on the particular chain

CAAT_signal
CAAT box; part of a conserved sequence located about 75 bp upstream of the start point of eukaryotic transcription units which may be involved in RNA polymerase binding; consensus=GG(C or T)CAATCT

conflict
independent determinations of the "same" sequence differ at this site or region

D-loop
displacement loop; a region within mitochondrial DNA in which a short stretch of RNA is paired with one strand of DNA, displacing the original partner DNA strand in this region; also used to describe the displacement of a region of one strand of duplex DNA by a single stranded invader in the reaction catalyzed by RecA protein

D_segment
Diversity segment of immunoglobulin heavy chain, and T-cell receptor beta chain

enhancer
a cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter

exon
region of genome that codes for portion of spliced mRNA; may contain 5'UTR, all CDSs, and 3' UTR *** For exon number, enter the NUMBER ONLY in the comment section. ***

GC_signal
GC box; a conserved GC-rich region located upstream of the start point of eukaryotic transcription units which may occur in multiple copies or in either orientation; consensus=GGGCGG

iDNA
intervening DNA; DNA which is eliminated through any of several kinds of recombination

intron
a segment of DNA that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. For intron number, enter the NUMBER ONLY in the comment section.

J_segment
Joining segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains

LTR
long terminal repeat, a sequence directly repeated at both ends of a defined sequence, of the sort typically found in retroviruses

mat_peptide
mature peptide or protein coding sequence; coding sequence for the mature or final peptide or protein product following post-translational modification. The location does not include the stop codon (unlike the corresponding CDS).

misc_binding
site in nucleic acid which covalently or non-covalently binds another moiety that cannot be described by any other Binding key (primer_bind or protein_bind)

misc_difference
feature sequence is different from that presented in the entry and cannot be described by any other Difference key (conflict, unsure, old_sequence, mutation, variation, allele, or modified_base)

misc_feature
region of biological interest which cannot be described by any other feature key

misc_recomb
site of any generalized, site-specific or replicative recombination event where there is a breakage and reunion of duplex DNA that cannot be described by other recombination keys (iDNA and virion) or qualifiers of source key (/insertion_seq, /transposon, /proviral)

misc_signal
any region containing a signal controlling or altering gene function or expression that cannot be described by other Signal keys (promoter, CAAT_signal, TATA_signal, -35_signal, -10_signal, GC_signal, RBS, polyA_signal, enhancer, attenuator, terminator, and rep_origin)

misc_structure
any secondary or tertiary structure or conformation that cannot be described by other Structure keys (stem_loop and D-loop)

modified_base
the indicated nucleotide is a modified nucleotide and should be substituted for by the indicated molecule (given in the mod_base qualifier value)

N_region
Extra nucleotides inserted between rearranged immunoglobulin segments

old_sequence
the presented sequence revises a previous version of the sequence at this location

polyA_signal
recognition region necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA

polyA_site
site on an RNA transcript to which adenine residues will be added by post-transcriptional polyadenylation *** Remove any linker sequence after the polyA tail. ***

prim_transcript
primary (initial, unprocessed) transcript; includes 5' clipped region (5'clip), 5' untranslated region (5'UTR), coding sequences (CDS, exon), intervening sequences (intron), 3' untranslated region (3'UTR), and 3' clipped region (3' clip)

primer_bind
Non-covalent primer binding site for initiation of replication, transcription, or reverse transcription. Includes site(s) for synthetic (e.g., PCR primer) elements

promoter
region on a DNA molecule involved in RNA polymerase binding to initiate transcription

protein_bind
non-covalent protein binding site on nucleic acid

RBS
ribosome binding site

repeat_region
region of genome containing repeating units; microsatellites are annotated using this feature. Try to provide these 3 items:
  • repeat type: tandem, dispersed, direct, or inverted
  • repeat family: the name of a family, i.e. Alu, LTR, MIR, B1, MER, etc.
  • repeat unit: nucleotide sequence repeated

repeat_unit
single repeat element. Use the repeat_region feature and note the specific unit as a modifier instead.

rep_origin
origin of replication; starting site for duplication of nucleic acid to give two identical copies

S_region
Switch region of immunoglobulin heavy chains. Involved in the rearrangement of heavy chain DNA leading to the expression of a different immunoglobulin class from the same B-cell

satellite
many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. Use the repeat_region feature.

sig_peptide
signal peptide coding sequence; coding sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane; leader sequence

stem_loop
hairpin; a double-helical region formed by base-pairing between adjacent (inverted) complementary sequences in a single strand of RNA or DNA

TATA_signal
TATA box; Goldberg-Hogness box; a conserved AT-rich septamer found about 25 bp before the start point of each eukaryotic RNA polymerase II transcript unit which may be involved in positioning the enzyme for correct initiation; consensus=TATA(A or T)A(A or T)

terminator
sequence of DNA located either at the end of the transcript or adjacent to a promoter region that causes RNA polymerase to terminate transcription; may also be site of binding of repressor protein

transit_peptide
transit peptide coding sequence; coding sequence for an N-terminal domain of a nuclear-encoded organellar protein; this domain is involved in post-translational import of the protein into the organelle

unsure
author is unsure of exact sequence in this region

V_region
Variable region of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for the variable amino terminal portion. Can be made up from V_segments, D_segments, N_regions, and J_segments

V_segment
Variable segment of immunoglobulin light and heavy chains, and T-cell receptor alpha, beta, and gamma chains. Codes for most of the variable region (V_region) and the last few amino acids of the leader peptide

variation
a related strain contains stable mutations from the same gene (e.g., RFLPs, polymorphisms, etc.) which differ from the presented sequence at this location (and possibly others)

virion
viral genomic sequence as it is encapsidated, as distinguished from its proviral form (integrated in a host cell's chromosome)

3'clip
3'-most region of a precursor transcript that is clipped off during processing

3'UTR
region near or at the 3' end of a mature transcript (usually following the stop codon) that is not translated into a protein; trailer

5'clip
5'-most region of a precursor transcript that is clipped off during processing

5'UTR
region near or at the 5' end of a mature transcript (usually preceding the initiation codon) that is not translated into a protein; leader

-10_signal
Pribnow box; a conserved region about 10 bp upstream of the start point of bacterial transcription units which may be involved in binding RNA polymerase; consensus=TAtAaT

-35_signal
a conserved hexamer about 35 bp upstream of the start point of bacterial transcription units; consensus = TTGACa or TGTTGACA
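
Several entries in the list above give a consensus sequence. As a rough aid for locating candidate intervals to annotate, those consensus strings can be written as regular expressions and searched on the (+) strand. This is only a sketch with hypothetical names: a consensus match does not establish a functional signal, and the (-) strand is not searched.

```python
import re

# Consensus sequences quoted in the feature list above, as regular expressions.
CONSENSUS = {
    "CAAT_signal": "GG[CT]CAATCT",
    "GC_signal": "GGGCGG",
    "polyA_signal": "AATAAA",
    "TATA_signal": "TATA[AT]A[AT]",
}

def find_signals(seq):
    """Return (feature key, start, end) hits as 1-based inclusive positions."""
    seq = seq.upper()
    hits = []
    for key, pattern in CONSENSUS.items():
        # A lookahead is used so that overlapping matches are also reported.
        for m in re.finditer(f"(?=({pattern}))", seq):
            hits.append((key, m.start() + 1, m.start() + len(m.group(1))))
    return sorted(hits, key=lambda h: h[1])

print(find_signals("CCTATAAAAGGGCGGTTAATAAACC"))
# prints: [('TATA_signal', 3, 9), ('GC_signal', 10, 15), ('polyA_signal', 18, 23)]
```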

VecScreen     (Back/Index)

The submitted sequence has been screened using VecScreen to identify segments of nucleic acid sequence which may be of vector, adapter, or linker origin. All sequences are screened using VecScreen to combat the problem of vector contamination in GenBank. Failure to recognize foreign segments in a sequence can:
  • Lead to erroneous conclusions about the biological significance of the sequence.
  • Waste time and effort in analysis of contaminated sequence.
  • Delay the release of the sequence in a public database.
  • Pollute public databases with contaminated sequence.

VecScreen searches the submitted sequence for segments that match any sequence in a specialized non-redundant vector database, UniVec. UniVec contains the unique segments and only a single copy of each of the shared segments from all the vector, adapter, linker and primer sequences that were used to build the database. Searches using VecScreen do not necessarily indicate the identity of the vector having the strongest match to the submitted sequence because many redundant sequences were eliminated in the construction of the UniVec database. The full extent of the match to any individual vector will also not be apparent because the sequence for most vectors in UniVec is not present as one contiguous piece.

The most likely sources of vector contamination can be deduced from the cloning history of the sequenced DNA. If it is necessary to identify the vector that has the best match to the query sequence, a search should be made using a database that contains a contiguous sequence for each vector, such as NCBI's vector database. You can perform this type of search by using blastn and entering your original nucleotide sequence. (Select 'blastn' search against the 'vector' database once at that page).

If VecScreen has detected foreign sequence, follow one of the following procedures:

  1. Sequence does contain foreign sequence:
    • Return to 'Enter DNA sequence'
    • Edit sequence in box or replace with corrected sequence
    • Press 'Validate and Continue'
  2. Sequence is a cloning vector:
    • Return to BankIt: VecScreen Information
    • Toggle the box marked 'Cloning Vector'
    • Press 'Validate and Continue'
  3. Sequence has a VecScreen hit but is not a cloning vector. In this case an explanation must be supplied:
    • Return to BankIt: VecScreen Information
    • Provide an explanation for the presence of the detected foreign segment in your sequence.
    • Press 'Validate and Continue'

One of the above procedures must be followed through in order to complete your BankIt submission.


Saving Common Data     (Back/Index)

To save data common to a set of records to be submitted with BankIt:
  1. On the first input page, complete the information that will be common for all submissions.
  2. In Netscape or Internet Explorer: Press the 'Save This Form' button at the bottom of the page and then name this file on your local computer.
    In MacWeb: Press the 'Save This Form' button; when naming the file add the extension .html (i.e., localfile.html).
  3. For your first submission, complete the current BankIt form with sequence data and all other relevant descriptive information. When finished, submit the data to GenBank by selecting "Submit to GenBank" at the top of the BankIt flatfile review page. You will receive a confirmation saying "Thank you for using BankIt. Your submission has been sent to GenBank..." Only now is your BankIt submission complete.
  4. For each remaining submission, select File/Open File (or File/Open Local, depending upon the client) to open your previously saved common information file in BankIt.
  5. Immediately after loading the file, scroll to the bottom of the page and press the "Validate and Continue" button. This will resync the form with the correct URL links and assign a new BankIt submission ID number.
  6. Enter the new sequence data and descriptive information, and submit to GenBank.
  7. Follow steps 4, 5, and 6, as many times as needed. Use the numbering boxes (under Multiple Submissions Information, at the top of the input page) to indicate the number of the submission and the total number of submissions (eg, 'This submission is number 4 of a total of 15 submission(s).')

Revised June 25, 2003