dbGSS: database of "Genome Survey Sequences"
The GSS division of GenBank is similar in nature to the EST division, except that its sequences are genomic in origin, rather than cDNA (mRNA). The GSS division contains (but is not limited to) the following types of data:
GSSs by nature are usually submitted to GenBank and dbGSS as batches of dozens to thousands of entries, with a great deal of redundancy in the citation, submitter, and library information. To improve the efficiency of the submission process for this type of data, we have designed a separate streamlined submission process and data format. NOTE: Beginning in 2009, sequences derived from "next generation" sequencing platforms, including Roche 454, Illumina, Applied Biosystems SOLiD, and Helicos Biosciences HeliScope, should be submitted to the Short Read Archive (SRA). For information, contact sra@ncbi.nlm.nih.gov.
There are two parts to the submission instructions, one for the sequence data, and one for any mapping data. (NOTE: starting in 2009 map data will no longer be entered for dbGSS submissions.)
The batch submission process for GSS sequence data involves the completion of four file types:
1. Publication
2. Library
3. Contact
4. GSS
The format for each file is described below. If all the GSSs share the same Publication, Library, and Contact information, you only need to prepare one of each of those files. Then complete a separate GSS file (file type 4) for each sequence. If any of the GSS files have different Publication, Library, or Contact information, you must complete a new set of file types 1-3. Once we have entered particular Publication, Library, or Contact information into the database, you do not need to resend the data input files.
The batch submission process for GSS map data involves the completion of four file types:
1. Publication
2. Contact
3. Method
4. Map data
The publication and contact files use the same file format as the publication and contact files described under the submission format for GSS sequence data. If the map data share the same Publication and Contact files as the sequence data, there is no need to resubmit the Publication and Contact files. Rather, the CITATION and contact name (CONT_NAME) fields of the Map Data files will serve as a cross reference to the appropriate Publication and Contact files.
Send the completed files to batch-sub@ncbi.nlm.nih.gov. You can attach all the files to a single email message, or you can include them in the body of the email message. Please be sure that they are in plain text (ASCII) format. We prefer to have the individual GSS and Map data files batched together as much as possible: for example, all GSS entries in one file and all Map entries in another file. You can submit Publication, Library, and Contact data together in one file. You can also send them in the same file as the GSS entries; the TYPE field will differentiate them for the parsing software.
You will receive a list of dbGSS IDs and GenBank Accession numbers from a dbGSS curator via email. If you would like your sequences held confidential until publication, you can indicate that by putting the release date in the PUBLIC field of the GSS files. Your sequences will be released on that date, or when the Accession numbers or sequence data are published, whichever comes first. Once your sequences are released into the public database, they will be available from the GSS division of GenBank (accessible through the Entrez Nucleotide division).
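The PUBLIC field convention described above (blank for immediate release, otherwise a date in MM/DD/YYYY format, as given in the GSS file specification) can be checked before submission. The following is a minimal sketch; the function name `parse_public_field` is hypothetical and is not part of any NCBI tool.

```python
from datetime import datetime, date

def parse_public_field(value):
    """Return None for a blank PUBLIC field (immediate release),
    or a date object for a value in MM/DD/YYYY format.
    Raises ValueError for anything else.
    Hypothetical helper, not an NCBI tool."""
    value = value.strip()
    if not value:
        return None  # blank means release immediately
    # strptime raises ValueError if the text does not match the format
    return datetime.strptime(value, "%m/%d/%Y").date()
```

A submitter could call this on each GSS entry and stop with an error before emailing the batch if any PUBLIC value is not a valid date.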
Updates to GSS entries are made in essentially the same way as new entries. To change any item in the GSS input file (other than GSS# or CONT_NAME), complete an input file with new data in the fields that need to be changed. For the STATUS field, enter "Update" instead of "New". In addition to the fields to be changed, updates must include the TYPE, STATUS, GSS#, and CONT_NAME fields. For changes in Publication, Contact, or Source data, or for changes in GSS#s or CONT_NAME, send an email message describing the change that is needed. Send the update files to batch-sub@ncbi.nlm.nih.gov.
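As an illustration of the update rules above, the following sketch builds a single update entry containing the four required fields plus only the fields being changed. The function name `format_gss_update`, the single space after each colon, and the field ordering are assumptions made for this example; they are not prescribed by the specification.

```python
# Tags whose value starts on the line below the tag, per the GSS file format.
BELOW_TAG = {"CITATION", "COMMENT", "SEQUENCE"}
# Identifying fields: always written by the helper, never taken from `changes`.
FIXED = {"TYPE", "STATUS", "GSS#", "CONT_NAME"}

def format_gss_update(gss_id, cont_name, changes):
    """Build one GSS update entry as text (hypothetical helper).

    gss_id and cont_name identify the existing entry; `changes` maps
    tag names to their new values."""
    clash = FIXED & set(changes)
    if clash:
        # GSS# and CONT_NAME changes must be requested by email instead.
        raise ValueError(f"cannot change via update file: {sorted(clash)}")
    lines = [
        "TYPE: GSS",
        "STATUS: Update",
        f"CONT_NAME: {cont_name}",
        f"GSS#: {gss_id}",
    ]
    for tag, value in changes.items():
        if tag in BELOW_TAG:
            lines += [f"{tag}:", str(value)]
        else:
            lines.append(f"{tag}: {value}")
    lines.append("||")  # entry separator
    return "\n".join(lines) + "\n"
```

For example, calling `format_gss_update("Ayh00001", "Sikela JM", {"PUT_ID": "Actin, gamma, skeletal"})` yields an entry with STATUS set to "Update" and a single changed field.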
If you have questions about the GSS submission format, please contact info@ncbi.nlm.nih.gov.
Submission Format for GSS Sequence Data
The following is a specification for flat file formats for delivering GSS and related data to the NCBI GSS database.
File Types
There are four types of deliverable files:
1. Publication
2. Library
3. Contact
4. GSS sequence file
Each GSS file needs to reference the Publication, Library, and Contact data. Therefore the Publication, Library, and Contact files must be in the database when the GSS file is entered. Once these files have been submitted and entered, they do not need to be re-submitted for additional GSS files that have the same Publication, Library, or Contact.
1. Publication Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "Pub" for publication entries. **Obligatory field**.
MEDUID: Medline unique identifier. Not obligatory, include if you know it.
TITLE: Title of article. **Obligatory field**. Begin on line below tag, use multiple lines if needed.
AUTHORS: Author name, format: Name,I.I.; Name2,I.I.; Name3,I.I. **Obligatory field**. Begin on line below tag, use multiple lines if needed.
JOURNAL: Journal name
VOLUME: Volume number
SUPPL: Supplement number
ISSUE: Issue number
I_SUPPL: Issue supplement number
PAGES: Page, format: 123-9
YEAR: Year of publication. **Obligatory field**.
STATUS: Status field. 1=unpublished, 2=submitted, 3=in press, 4=published. **Obligatory field**.
||
Examples:
TYPE: Pub
MEDUID: 92347897
TITLE:
Genomic sequences from a subtracted retinal pigment epithelium library
AUTHORS:
Gieser,L.; Swaroop,A.
JOURNAL: Genomics
VOLUME: 13
ISSUE: 2
PAGES: 873-6
YEAR: 1992
STATUS: 4
||
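The publication layout above lends itself to being generated from structured data rather than typed by hand. The sketch below is one possible way to do that, assuming one tag per line, a single space after the colon, and TITLE and AUTHORS values placed on the line below their tags. The function name `format_pub_entry` and the tag ordering are illustrative choices, not part of the specification.

```python
PUB_ORDER = ("TYPE", "MEDUID", "TITLE", "AUTHORS", "JOURNAL", "VOLUME",
             "SUPPL", "ISSUE", "I_SUPPL", "PAGES", "YEAR", "STATUS")
PUB_REQUIRED = ("TYPE", "TITLE", "AUTHORS", "YEAR", "STATUS")
PUB_BELOW_TAG = {"TITLE", "AUTHORS"}  # value begins on the line below the tag

def format_pub_entry(fields):
    """Serialize one publication record (hypothetical helper)."""
    missing = [tag for tag in PUB_REQUIRED if not fields.get(tag)]
    if missing:
        raise ValueError(f"missing obligatory fields: {missing}")
    if fields["TYPE"] != "Pub":
        raise ValueError('TYPE must be "Pub" for publication entries')
    if str(fields["STATUS"]) not in {"1", "2", "3", "4"}:
        raise ValueError("STATUS must be 1, 2, 3, or 4")
    lines = []
    for tag in PUB_ORDER:
        if tag not in fields:
            continue  # optional tags are simply omitted
        if tag in PUB_BELOW_TAG:
            lines += [f"{tag}:", str(fields[tag])]
        else:
            lines.append(f"{tag}: {fields[tag]}")
    lines.append("||")  # entry separator
    return "\n".join(lines) + "\n"
```

Library and Contact entries could be produced the same way by swapping in their own tag lists and obligatory fields.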
2. Library Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "Lib" for library entries. **Obligatory field**.
NAME: Name of library. **Obligatory field**.
ORGANISM: Organism from which library prepared.
STRAIN: Organism strain
CULTIVAR: Plant cultivar
ISOLATE: Individual isolate from which the sequence was obtained
SEX: Sex of organism (female, male, hermaphrodite)
ORGAN: Organ name
TISSUE: Tissue type
CELL_TYPE: Cell type
CELL_LINE: Name of cell line
STAGE: Developmental stage
HOST: Laboratory host
VECTOR: Name of vector
V_TYPE: Type of vector (Cosmid, Phage, Plasmid, YAC, other)
RE_1: Restriction enzyme at site1 of vector
RE_2: Restriction enzyme at site2 of vector
DESCR: Description of library preparation methods, vector, etc. This field starts on the line below the DESCR: tag.
||
Examples:
TYPE: Lib
NAME: Rat Lambda Zap Express Library
ORGANISM: Rattus norvegicus
STRAIN: Sprague-Dawley
SEX: male
STAGE: embryonic day 17 post-fertilization
TISSUE: aorta
CELL_TYPE: vascular smooth muscle
DESCR:
Put description here.
||
3. Contact Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "Cont" for contact entries. **Obligatory field**.
NAME: Name of person providing the GSS sequence. **Obligatory field**.
FAX: Fax number as string of digits.
TEL: Telephone number as string of digits.
EMAIL: E-mail address
LAB: Laboratory
INST: Institution name
ADDR: Address string
||
Examples:
TYPE: Cont
NAME: Sikela JM
FAX: 303 270 7097
TEL: 303 270
EMAIL: tjs@tally.hsc.colorado.edu
LAB: Department of Pharmacology
INST: University of Colorado Health Sciences Center
ADDR: Box C236, 4200 E. 9th Ave., Denver, CO 80262-0236, USA
||
4. GSS Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "GSS" for GSS entries. **Obligatory field**
STATUS: Status of GSS entry - "New" or "Update". **Obligatory field**
CONT_NAME: Name of contact. Must be identical string to the contact entry. **Obligatory field**
CITATION: Journal citation. Must be identical string to the publication title. Begins on line below tag. Use continuation lines if needed. **Obligatory field**
LIBRARY: Library name. Must be identical string to library name entry. **Obligatory field**
GSS#: GSS name or number assigned by contact lab. For GSS entry updates, this is the string we match on. **Obligatory field**
GDB#: Genome Database accession number
GDB_DSEG: Genome Database Dsegment number
CLONE: Clone number/name
SOURCE: Source providing clone, e.g., ATCC
SOURCE_DNA: Source identity number for the clone as pure DNA
SOURCE_INHOST: Source identity number for the clone stored in the host
OTHER_GSS: Other GSSs on this clone.
DBNAME: Database name for cross-reference to another database
DBXREF: Database cross-reference accession
PCR_F: Forward PCR primer sequence
PCR_B: Backward PCR primer sequence
INSERT: Insert length (in bases)
ERROR: Estimated error in insert length (bases)
PLATE: Plate number or code
ROW: Row number or letter
COLUMN: Column number or letter
SEQ_PRIMER: Sequencing primer description or sequence
P_END: Which end sequenced, e.g., 5'
HIQUAL_START: Base position of start of high-quality sequence (default = 1)
HIQUAL_STOP: Base position of last base of high-quality sequence
DNA_TYPE: Genomic (default), cDNA, Viral, Synthetic, Other
CLASS: Class of sequencing method, e.g., BAC ends, YAC ends, exon-trapped. **Obligatory field**
PUBLIC: Date of public release. Leave blank for immediate release. **Obligatory field** Format: MM/DD/YYYY
PUT_ID: Putative identification of sequence by submitter
COMMENT: Comments about GSS. Text starts on line below COMMENT: tag.
SEQUENCE: Sequence string. Text starts on line below SEQUENCE: tag. **Obligatory field**
||
Examples:
TYPE: GSS
STATUS: New
CONT_NAME: Sikela JM
GSS#: Ayh00001
CLONE: HHC189
SOURCE: ATCC
SOURCE_INHOST: 65128
OTHER_GSS: GSS00093, GSS000101
CITATION:
Genomic sequences from Human brain tissue
SEQ_PRIMER: M13 Forward
P_END: 5'
HIQUAL_START: 1
HIQUAL_STOP: 285
DNA_TYPE: Genomic
CLASS: shotgun
LIBRARY: Hippocampus, Stratagene (cat. #936205)
PUBLIC:
PUT_ID: Actin, gamma, skeletal
COMMENT:
This is a comment about the sequence. It may contain features. It may span several lines.
SEQUENCE:
AATCAGCCTGCAAGCAAAAGATAGGAATATTCACCTACAGTGGGCACCTCCTTAAGAAGCTG
ATAGCTTGTTACACAGTAATTAGATTGAAGATAATGGACACGAAACATATTCCGGGATTAAA
CATTCTTGTCAAGAAAGGGGGAGAGAAGTCTGTTGTGCAAGTTTCAAAGAAAAAGGGTACCA
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
GCAAAAGTGATAATGATTTGAGGATTTCTGTCTCTAATTGGAGGATGATTCTCATGTAAGGT
TGTTAGGAAATGGCAAAGTATTGATGATTGTGTGCTATGTGATTGGTGCTAGATACTTTAAC
TGAGTATACGAGTGAAATACTTGAGACTCGTGTCACTT
||
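Before sending a batch, it can help to read the file back the way a parser might. The sketch below is a rough reader, assuming one tag per line, "||" alone on a line as the entry separator, and continuation lines for multi-line fields such as CITATION, COMMENT, and SEQUENCE. It is not NCBI's parsing software; `parse_entries` and the tag pattern are assumptions made for illustration.

```python
import re

# Assumed tag shape: upper-case letters, digits, underscores, optional '#', then ':'.
TAG_RE = re.compile(r"^([A-Z][A-Z0-9_]*#?):\s*(.*)$")

def parse_entries(text):
    """Split flat-file text into a list of {tag: value} dicts (hypothetical helper)."""
    entries, current, tag = [], {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "||":                 # entry separator
            if current:
                entries.append(current)
            current, tag = {}, None
            continue
        match = TAG_RE.match(line)
        if match:                        # a new tag starts here
            tag = match.group(1)
            current[tag] = match.group(2)
        elif tag is not None and line:   # continuation of the previous tag
            # Sequence lines are joined without spaces, free text with one space.
            sep = "" if tag == "SEQUENCE" else " "
            current[tag] = (current[tag] + sep + line).strip()
    if current:                          # tolerate a missing final separator
        entries.append(current)
    return entries
```

One limitation of this approach: a free-text continuation line that happens to begin with an upper-case word followed by a colon would be mistaken for a new tag, so comments are best written to avoid that pattern.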
AFLP fragment Alu-PCR B1-PCR BAC ends BAC sequence gap BAC subclone BAC subclone end BAC/YAC ends CAPS Concatamer T-DNA junction cosmid ends cosmid sequence CoT 5E-3 hydroxyapatite-fractioned DNA DArT clone deletion endpoint Ds tagged Ds/TDNA launch pad EcoRI fragments enhancer trap ERIC-PCR exon-trapped fosmid ends Gene Trap Genomic PCR High-Cot HindIII fragments HpaII fragments HpaII/MspI fragment Hydroxyapatite-fractionated DNA internal BAC sequence Intron Spanning ISSR Low-Cot MboI fragments methylation filtered microarray microsatellite MuTAIL-PCR NdeI/DraI fragments NotI site P1 ends PAC end PAC nested deletions PAC subclone paralogous sequence variant partial digestion PCR fragment PCR from cDNA PCR product PCR product with degenerate primers PCR with nonspecific primers PCR with specific primers PCR-based subtractive hybridization plasmid plasmid ends plasmid insert plasmid insertion site primer walking PSTI fragment Random amplified microsatellites random plasmid subclone Random sheared small inserts RAPD REP-PCR repeat-enriched representational difference analysis RFLP clone RFLP probe RLGS SCAR sheared ends shotgun SRAP SSR-containing BAC subclone SSR-containing genome clone Subtraction library subtractive hybridization TAC ends TAIL-PCR Targeting vectors TDNA tagged Telomere Associated Sequences transposon insertion site transposon-tagged U3NeoSV1-trapped U3NeoSV2-trapped viral insertion site viral tagged virtual transcript YAC ends
If you have a new category of genome survey sequence to enter, simply email us this information, and we will add it to the accepted list of classes.
The following pairs of fields must contain identical strings:
CONT_NAME of GSS file and NAME field of the Contact file
LIBRARY field of GSS file and NAME field of the Library file
CITATION field of GSS file and TITLE field of the Publication file
These fields from the GSS file are scanned and matched automatically to fields in the Library, Contact, and Publication tables. Differences in content, spelling, letter case, and spacing will result in no match or in an incorrect match.
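Because the matching is exact, a quick local check can catch mismatched strings before submission. The sketch below compares the three cross-reference fields of each GSS entry against the Contact, Library, and Publication entries in the same batch. It assumes entries are already available as dictionaries of tag to value; the function name `unmatched_references` is hypothetical.

```python
# (GSS field, TYPE of the referenced entry, field it must equal exactly)
XREFS = (
    ("CONT_NAME", "Cont", "NAME"),
    ("LIBRARY", "Lib", "NAME"),
    ("CITATION", "Pub", "TITLE"),
)

def unmatched_references(entries):
    """Return (GSS#, field, value) for every GSS cross-reference that has
    no exact, case-sensitive match in the batch (hypothetical helper)."""
    # Collect the exact strings offered by each referenced entry type.
    available = {}
    for entry in entries:
        for _, ref_type, ref_field in XREFS:
            if entry.get("TYPE") == ref_type and ref_field in entry:
                available.setdefault(ref_type, set()).add(entry[ref_field])
    problems = []
    for entry in entries:
        if entry.get("TYPE") != "GSS":
            continue
        for gss_field, ref_type, _ in XREFS:
            value = entry.get(gss_field, "")
            if value not in available.get(ref_type, set()):
                problems.append((entry.get("GSS#"), gss_field, value))
    return problems
```

This only sees entries present in the batch being checked. Publication, Library, or Contact data already entered in the database from an earlier submission would be reported as unmatched, so such results need to be reviewed by hand.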
Submission Format for GSS Map Data
The following is a specification for flat file formats for delivering GSS mapping and related data to the NCBI GSS database.
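File Types
There are four types of deliverable files:
1. Publication
2. Contact
3. Method
4. Map data
1. Publication Files
Same format as the Publication files described under the submission format for GSS sequence data.
2. Contact Files
Same format as the Contact files described under the submission format for GSS sequence data.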
3. Method Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "Meth" for method entries. **Obligatory field**.
NAME: Name of method. **Obligatory field**.
ORGANISM: Organism from which library prepared. **Obligatory field**.
ABSOLUTE: Method gives absolute or relative address? Y or N. **Obligatory field**.
L1: Interpretation of line 1
L2: Interpretation of line 2
L3: Interpretation of line 3
L4: Interpretation of line 4
L5: Interpretation of line 5
L6: Interpretation of line 6
L7: Interpretation of line 7
L8: Interpretation of line 8
L9: Interpretation of line 9
L10: Interpretation of line 10
DESCR: Description of method. Description starts on line after DESCR tag. May be multi-line, free format text.
|| Entry separator
Examples:
TYPE: Meth
NAME: YAC/CEPH JMS
ORGANISM: Homo sapiens
ABSOLUTE: n
L1: plate
L2: row
L3: column
L4: comment
L5: comment
L6: comment
L7: comment
DESCR:
PCR-based mapping of 3'UT-derived primers to CEPH YAC DNA pools. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991). To date, MIT puts out YAC pools A and B; if both pools were used for the mapping data given, then 'C' is designated.
||
TYPE: Meth
NAME: Radiation Hybrid JMS
ORGANISM: Homo sapiens
ABSOLUTE: y
L1: chromosome
L2: bin
L3: comment
L4: comment
L5: comment
DESCR:
Radiation hybrid panels with binning. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991).
||
TYPE: Meth
NAME: Somatic Hybrid JMS
ORGANISM: Homo sapiens
ABSOLUTE: y
L1: chromosome
L2: arm
L3: band
L4: band range
L5: comment
L6: comment
DESCR:
Somatic cell hybrid mapping. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991).
||
4. Map Data Files
The following is an example of the valid tags and some illustrative data:
TYPE: Entry type - must be "Map" for map data entries. **Obligatory field**
STATUS: Status of GSS entry - "New", "Replace" or "Update". **Obligatory field**
CONT_NAME: Name of contact (must be identical string to the contact name.)
METHOD: Method name (Must be identical string to the method entry name.)
CITATION: Citation title (Must be identical string to the publication entry title.)
NCBI#: NCBI Id of GSS (File must have either NCBI#, GSS#, or GB#)
GSS#: Name of GSS (File must have either NCBI#, GSS#, or GB#)
GB#: GenBank accession number of GSS
PUBLIC: blank = for release to public; date (MM/DD/YYYY) = confidential. **Obligatory field**
MAPSTRING: Full mapping information. Unparsed. For output only. **Obligatory field**
CHROM: Chromosome name or number
L1: Line 1 of parsed mapping information.
L2: Line 2 of parsed mapping information.
L3: Line 3
L4: Line 4
L5: Line 5
L6: Line 6
L7: Line 7
L8: Line 8
L9: Line 9
L10: Line 10 of parsed mapping information.
|| Entry separator
Examples:
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: YAC/CEPH JMS
CITATION: Nature Genetics, 2:180-185 (1992)
NCBI#: 51839
PUBLIC:
MAPSTRING: 956H08
CHROM:
L1: 959
L2: H
L3: 08
L4: Pool B
L5: Forward Primer: CCCCAGAGTTCCAAGTTAATT
L6: Reverse Primer: GTCGCATTGCTCAACATTCGTTT
L7: Product Length: 162
||
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: Radiation hybrid JMS
CITATION: Nature Genetics, 2:180-185 (1992)
GSS#: GSST001a
PUBLIC:
MAPSTRING: 4, bin 2
CHROM: 4
L1: 4
L2: 2
L3: Forward Primer: TTDDGTAGAGGGTGCTAAGAAGG
L4: Reverse Primer: GAAATGGACCTATTAAAACCAGCT
L5: Product Length: 119
||
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: Somatic hybrid JMS
CITATION: Nature Genetics, 2:180-185 (1992)
GB#: T12813
PUBLIC:
MAPSTRING: 20
CHROM: 20
L1: 20
L2:
L3:
L4:
L5: Forward Primer: CGTAATGTCCCTGTGTCTGAG
L6: Reverse Primer: CACCTCACCCATAGCCTTAGCTA
||
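The L1 to L10 lines of a Map entry only make sense alongside the Method entry that says how to interpret each line. The sketch below pairs the two and also checks the rule that a Map entry must carry an NCBI#, GSS#, or GB# identifier. It assumes both entries are already available as dictionaries of tag to value; the function names are hypothetical.

```python
def has_gss_identifier(map_entry):
    """True if the Map entry names its GSS by NCBI#, GSS#, or GB#."""
    return any(map_entry.get(tag) for tag in ("NCBI#", "GSS#", "GB#"))

def label_map_lines(method, map_entry):
    """Pair each non-empty L1..L10 value in a Map entry with the
    interpretation given for that line in its Method entry.
    Returns a list of (interpretation, value) tuples, in line order."""
    pairs = []
    for i in range(1, 11):
        key = f"L{i}"
        value = map_entry.get(key)
        # Skip lines the method does not define or the map entry leaves blank.
        if key in method and value:
            pairs.append((method[key], value))
    return pairs
```

A list of pairs is used rather than a dictionary because several lines of one method can share the same interpretation, such as "comment".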
Revised 02/18/2009.