dbSTS: database of "Sequence Tagged Sites"
Sequence Tagged Sites (STSs) are short (about 200-500 bp) sequences that are operationally unique in a genome (i.e., they can be specifically detected by PCR in the presence of all other genomic sequences) and that define a specific position on the physical map. STSs can therefore be used to generate mapping reagents that map to single positions within the genome. STSs are usually submitted to GenBank and dbSTS in batches of dozens to thousands of entries, with a great deal of redundancy in the citation, submitter, and library information. To improve the efficiency of submission for this type of data, we have designed a streamlined submission process and data format (see below).
In dbSTS and GenBank, an STS record includes a SEQUENCE, which is usually the sequenced product (amplicon) of a Polymerase Chain Reaction (PCR) using specific PRIMERS. In some cases a researcher may have primer sequences but will not have determined the sequence that they amplify. Often, knowing the primer sequences is all that is needed for mapping or genotyping experiments. If you have only primer sequences, consider submitting your data to the NCBI Probe Database, a public-access database that NCBI has established for archiving primers and other nucleic acid reagents designed for use in a wide variety of biomedical and research applications.
To deposit primers and any other sequences or data that were used or obtained in your experiments and that are not a sequenced amplicon, please contact the Probe Database administrator (probe-admin@ncbi.nlm.nih.gov). Submitters who have prepared their files in dbSTS submission format (described below) can continue to use that format for Probe submissions. New users, and users who did not prepare their files in dbSTS submission format, should contact the Probe Database administrator at the same address to ask about the Probe Database submission format.
There are two sets of file types for STS submissions: one for the sequence data and one for any mapping data.
The batch submission process for STS sequence data involves the completion of six file types:
a. Publication
b. Source
c. Contact
d. Protocol
e. Buffer
f. STS
Typically a batch of STSs shares the same publication, source, contact, protocol, and buffer information, so you only need to prepare one of each of those files. If any of the STS entries have different publication, source, contact, protocol, or buffer information, you must complete a new file for that data.
The batch submission process for STS map data involves the completion of four file types:
a. Publication
b. Contact
c. Method
d. Map Data
The publication and contact files use the same file format as the publication and contact files for STS sequence data. If the map data share the same Publication and Contact files as the sequence data, there is no need to resubmit the Publication and Contact files.
Send the completed files to:
batch-sub@ncbi.nlm.nih.gov
You can attach all the files to a single email message, or you can include them in the body of the email message. Please be sure that they are in plain text (ASCII) format. We prefer to have the individual STS and Map data files batched together as much as possible: for example, all STS entries in one file and all Map entries in another file. You can submit sources, publications, contacts, protocols, and buffers together in one file. You can also send them in the same file as the STS entries - the TYPE field will differentiate them for the parsing software.
When STS data is loaded into the database, checks are run to determine if the given primer sequences are found in the STS sequence and if the given length of the STS is accurate. If an entry does not pass these checks, it sometimes indicates that there was an error in the sequences in the input file. Entries that do not pass this validation check will be returned to the submitter so that they can be re-checked and corrected, if necessary, before entry.
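You can run a similar check yourself before submitting. The following Python sketch is not NCBI's validation code. It assumes that the forward primer appears verbatim in the sequence, that the backward primer appears as its reverse complement, and that SIZE is the span from the start of the forward primer to the end of the backward primer site (an assumption that is consistent with the sWSS282 example in this document). The function names are hypothetical.

```python
# Hypothetical pre-submission check; not NCBI's validation code.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_sts(sequence, f_primer, b_primer, size):
    """Return a list of problems found; an empty list means the checks passed."""
    seq = "".join(sequence.split()).upper()  # drop line breaks and blanks
    problems = []

    # The forward primer is expected to occur as given.
    start = seq.find(f_primer.upper())
    if start < 0:
        problems.append("forward primer not found in sequence")

    # Assumption: the backward primer anneals to the opposite strand, so its
    # reverse complement is what appears in the submitted sequence.
    b_site = reverse_complement(b_primer.upper())
    b_start = seq.find(b_site, max(start, 0))
    if b_start < 0:
        problems.append("backward primer site not found in sequence")

    # Assumption: SIZE is the amplicon length, primer sites included.
    if start >= 0 and b_start >= 0:
        span = b_start + len(b_site) - start
        if span != size:
            problems.append(f"SIZE is {size} but primers span {span} bp")
    return problems
```

The sketch uses exact string matching, so a primer site that contains an N in the submitted sequence will be reported as not found even if the underlying data are correct.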
You will receive a list of dbSTS IDs and GenBank accession numbers from a dbSTS curator via email. Once your sequences are released into the public database, they will be available from the STS division of GenBank and from the separate dbSTS site (How to Access STS Entries). The sequences and accession numbers in both sources are the same, but there is additional annotation in the dbSTS records such as references to the top nucleotide and protein matches. If you would like your sequences held confidential until publication, you can indicate that by putting the release date in the PUBLIC field of the STS files. Your sequences will be released on that date, or when the accession numbers or sequence data are published, whichever comes first.
Updates to STS entries are submitted in essentially the same way as new entries. To change any item in the STS input file (other than STS# or CONT_NAME), complete an input file with new data in the fields that need to be changed. In the STATUS field, enter "Update" instead of "New". In addition to the fields to be changed, an update must include the TYPE, STATUS, STS#, and CONT_NAME fields. For changes to Publication, Contact, or Source data, or for changes to an STS# or CONT_NAME, send an email message describing the change that is needed.
Send the update files to:
batch-sub@ncbi.nlm.nih.gov
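As an illustration of the update rules, the following Python sketch assembles the text of an update entry from the two identifying fields and the fields being changed. It is a minimal sketch rather than an official tool: the function name `build_update_entry` is hypothetical, and the set of tags written on two lines is taken from the tags that the STS file description says start on the line below the tag.

```python
# Tags whose value starts on the line below the tag, per the STS tag list.
MULTILINE_TAGS = {"CITATION", "PCR_PROFILE", "COMMENT", "SEQUENCE"}
# Fields that identify the entry; changes to STS# or CONT_NAME go by email.
FIXED_TAGS = {"TYPE", "STATUS", "STS#", "CONT_NAME"}


def build_update_entry(sts_id, cont_name, changes):
    """Return the text of an STS "Update" entry.

    changes maps tag names (for example "SIZE") to their new values.
    """
    fixed = sorted(FIXED_TAGS.intersection(changes))
    if fixed:
        raise ValueError(f"cannot be changed through an update file: {fixed}")
    lines = [
        "TYPE: STS",
        "STATUS: Update",
        f"CONT_NAME: {cont_name}",
        f"STS#: {sts_id}",
    ]
    for tag, value in changes.items():
        if tag in MULTILINE_TAGS:
            lines.append(f"{tag}:")  # value goes on the following line(s)
            lines.append(value)
        else:
            lines.append(f"{tag}: {value}")
    lines.append("||")  # entry separator
    return "\n".join(lines) + "\n"
```

For example, `build_update_entry("DXYS112", "Thomas Hudson", {"SIZE": "231"})` produces a five-line entry that carries only the identifying fields and the new SIZE, followed by the "||" separator.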
If you have questions about the STS submission format, please contact
info@ncbi.nlm.nih.gov
1. Submission Format for STS Sequence Data
The following is a specification for flat file formats for delivering STS and related data to the NCBI STS database.
File Types
There are six types of deliverable files (in addition to Map data files, which are covered separately below):
a. Publication
b. Source
c. Contact
d. Protocol
e. Buffer
f. STS
Each STS file needs to reference the Publication, Source, and Contact data. Therefore the Publication, Source, and Contact files must be in the database when the STS file is entered. Once these files have been submitted and entered, they do not need to be re-submitted for additional STS files that have the same Publication, Source, or Contact.
a. Publication Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Pub" for publication entries. **Obligatory field**
MEDUID: Medline unique identifier. Not obligatory, include if you know it.
TITLE: Title of article. (Begin on line below tag, use multiple lines if necessary) **Obligatory field**
AUTHORS: Author name, format: Name,I.I.; Name2,I.I.; Name3,I.I. (Begin on line below field tag, use multiple lines if necessary) **Obligatory field**
JOURNAL: Journal name
VOLUME: Volume number
SUPPL: Supplement number
ISSUE: Issue number
I_SUPPL: Issue supplement number
PAGES: Page, format: 123-9
YEAR: Year of publication. **Obligatory field**
STATUS: Status field. 1=unpublished, 2=submitted, 3=in press, 4=published. **Obligatory field**
||
Examples:
TYPE: Pub
MEDUID:
TITLE:
Human chromosome 7 STS
AUTHORS:
Green,E.
YEAR: 1996
STATUS: 1
||
TYPE: Pub
MEDUID: 96172835
TITLE:
CpG islands of chicken are concentrated on microchromosomes
AUTHORS:
McQueen,H.A.; Fantes,J.; Cross,S.H.; Clark,V.H.; Archibald,A.L.; Bird,A.P.
JOURNAL: Nat. Genet.
VOLUME: 12
PAGES: 321-4
YEAR: 1996
STATUS: 4
||
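The obligatory fields and value formats of a Publication entry can be checked before submission. The following Python sketch assumes the entry has already been read into a dictionary of tag to value. The function name `check_pub` is hypothetical, and the author pattern is a deliberately strict reading of the "Name,I.I." format, so treat its warnings as prompts to look again rather than as definitive errors.

```python
import re

PUB_REQUIRED = ("TYPE", "TITLE", "AUTHORS", "YEAR", "STATUS")
PUB_STATUS = {"1": "unpublished", "2": "submitted", "3": "in press", "4": "published"}
# Assumption: one author is a surname, a comma, then initials that are each
# followed by a period, for example "McQueen,H.A.".
AUTHOR = re.compile(r"^[^,;]+,(?:[A-Z]\.)+$")


def check_pub(entry):
    """Return a list of problems found in a Publication entry (a dict)."""
    problems = [
        f"missing obligatory field {tag}"
        for tag in PUB_REQUIRED
        if not entry.get(tag, "").strip()
    ]
    if entry.get("TYPE") not in (None, "", "Pub"):
        problems.append('TYPE must be "Pub"')
    status = entry.get("STATUS", "").strip()
    if status and status not in PUB_STATUS:
        problems.append("STATUS must be 1, 2, 3 or 4")
    # AUTHORS may span several lines; authors are separated by semicolons.
    authors = " ".join(entry.get("AUTHORS", "").split())
    for name in filter(None, (a.strip() for a in authors.split(";"))):
        if not AUTHOR.match(name):
            problems.append(f"author not in Name,I.I. form: {name}")
    return problems
```

An empty list means that no problem was detected by these particular checks; it does not guarantee that the entry will be accepted.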
b. Source Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Source" for source entries. **Obligatory field**
NAME: Name of source. **Obligatory field**
ORGANISM: Organism from which source prepared: Scientific name. **Obligatory field**
STRAIN: Organism strain
CULTIVAR: Plant cultivar
SEX: Sex of organism (female, male, hermaphrodite)
ORGAN: Organ name
TISSUE: Tissue type
CELL_TYPE: Cell type
CELL_LINE: Name of cell line
STAGE: Developmental stage
VECTOR: Name of vector.
V_TYPE: Type of vector (Cosmid, Phage, Plasmid, YAC, Other)
HOST: Laboratory host name
DESCR: Description of source preparation methods, vector, etc. This field starts on the line below the DESCR: tag.
||
Examples:
TYPE: Source
NAME: cSRL flow sorted Human Chromosome 11 specific cosmid
ORGANISM: Homo sapiens
VECTOR: sCos-1
V_TYPE: Cosmid
DESCR:
Human Chromosome 11 specific cosmid library prepared from flow sorted human Chromosome 11 derived from Chinese Hamster Ovary (CHO) monochromosomal somatic cell hybrid, J1
||
TYPE: Source
NAME: Bovine sperm
ORGANISM: Bos taurus
STRAIN: Holstein
SEX: male
TISSUE: seminal vesicle
CELL_TYPE: sperm
STAGE: adult
VECTOR: pBluescript
V_TYPE: Plasmid
DESCR:
Genomic PstI fragments cloned into pBluescript
||
c. Contact Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Cont" for contact entries. **Obligatory field**
NAME: Name of person who provided the STS.
FAX: Fax number as string of digits.
TEL: Telephone number as string of digits.
EMAIL: E-mail address
LAB: Laboratory providing STS.
INST: Institution name
ADDR: Address string, comma delineation.
||
Examples:
TYPE: Cont
NAME: Eric Green
FAX:
TEL:
EMAIL: egreen@wugenmail.wustl.edu
LAB: Center for Genetics in Medicine
INST: Washington University School of Medicine
ADDR: Box 8232, 4566 Scott Avenue, St. Louis, MO 63110, USA
||
d. Protocol Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Protocol" for protocol entries. **Obligatory field**
NAME: Name of protocol. **Obligatory field**
PROTOCOL: Description of protocol used. Starts on the line below the PROTOCOL tag. Lay out this description as you want it to appear in GenBank, using blanks, not tabs, to line up columns.
||
Examples:
TYPE: Protocol
NAME: STS-A (E.Green)
PROTOCOL:
Template:        30-100 ng
Primer:          each 1 uM
dNTPs:           each 200 uM
Taq Polymerase:  0.05 units/ul
Total Vol:       5 ul
||
TYPE: Protocol
NAME: STS-B (E.Green)
PROTOCOL:
Template:        30-100 ng
Primer:          each 1 uM
dNTPs:           each 200 uM
Taq Polymerase:  0.05 units/ul
Total Vol:       10 ul
||
e. Buffer Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Buffer" for buffer entries. **Obligatory field**
NAME: Name of buffer. **Obligatory field**
BUFFER: Description of buffer used. Starts on the line below the BUFFER tag. Lay out this description as you want it to appear in GenBank, using blanks, not tabs, to line up columns.
||
Examples:
TYPE: Buffer
NAME: STS-1 (E.Green)
BUFFER:
MgCl2:     1.5 mM
KCl:       50 mM
Tris-HCl:  10 mM
pH:        8.3
||
TYPE: Buffer
NAME: STS-2 (E.Green)
BUFFER:
MgCl2:     2.5 mM
KCl:       50 mM
Tris-HCl:  10 mM
pH:        8.3
||
f. STS Files
These are the valid tags and a short description:
TYPE: Entry type - must be "STS" for STS entries. **Obligatory field**
STATUS: Status of STS entry - "New" or "Update". **Obligatory field**
CONT_NAME: Name of contact (Must be identical string to the NAME field of the Contact file.) **Obligatory field**
PROTOCOL: Protocol name. (Must be identical string to the NAME field of the Protocol file.) **Obligatory field for New entries**
BUFFER: Buffer name. (Must be identical string to the NAME field of the Buffer file.) **Obligatory field for New entries**
SOURCE: Source name. (Must be identical string to the NAME field of the Source file.) **Obligatory field for New entries**
CITATION: Journal citation. (Must be identical string to the TITLE field of the Publication file.) Starts on line below CITATION: tag - use continuation lines if necessary. **Obligatory field for New entries**
STS#: STS id assigned by contact lab. **Obligatory field** For STS entry updates, this is the string we match on.
SYNONYMS: Synonyms list, separated by commas.
PRIMER_DB: Database which contains the sequence used as the source of the primer sequences, if relevant.
PRIMER_ACC: Accession number of the sequence from which primer sequences were derived.
GB#: GenBank accession number.
GDB#: Human genome database accession number.
GDB_DSEG: Human genome database Dsegment number.
CLONE: Clone id.
P_END: Which end sequenced, e.g. 5'
DNA_TYPE: Genomic (default), cDNA, Viral, Synthetic, Other.
SIZE: Size of STS (in nucleotides); includes primer sites.
F_PRIMER: Sequence of forward primer.
B_PRIMER: Sequence of backward primer.
PCR_PROFILE: Description of PCR profile. Starts on line below the PCR_PROFILE: tag. Line up data as you wish it to appear in GenBank. Use blanks, not tabs to format this data.
PUBLIC: Date for public release. **Obligatory field** Leave blank for immediate release. Use the date format mm/dd/yyyy (e.g., 12/31/1999).
GENE_SYMBOL: Putative gene symbol.
GENE_NAME: Full name of putative gene.
PRODUCT: Putative product identification.
COMMENT: Comments about STS. Starts on line below COMMENT: tag.
SEQUENCE: Sequence string. Starts on line below SEQUENCE: tag. **Obligatory field for New entries**
||
Examples:
TYPE: STS
STATUS: New
CONT_NAME: Eric Green
PROTOCOL: STS-A (E.Green)
BUFFER: STS-1 (E.Green)
CITATION:
Human chromosome 7 STS
SOURCE: Human EGreen
STS#: sWSS282
SYNONYMS:
F_PRIMER: AAGCACAGGAGAAGATGG
B_PRIMER: GAATTGACAGACAGTAAGGAAG
DNA_TYPE: Genomic
P_END:
PUBLIC:
PRODUCT:
GENE_SYMBOL:
GENE_NAME:
SIZE: 143
PCR_PROFILE:
Presoak:         0 degrees C for 0.00 minute(s)
Denaturation:    92 degrees C for 1.00 minute(s)
Annealing:       60 degrees C for 2.00 minute(s)
Polymerization:  72 degrees C for 2.00 minute(s)
PCR Cycles:      35
Thermal Cycler:  Perkin Elmer TC
SEQUENCE:
ATTCTATCCAAGTCTCAAGGCCCCACAACCTGGAGCTCTGATGCTCAAGCACAGGAGAAG
ATGGGTGTCCAGCTCAAACACAGAGAACACATTCACCCTTCCCTGCCTTTTTGTTCTGTT
CAGACCCTCAGCAGATAGGATGCCTGCCCACAGCGGTAAGGGCACATCTTCCTTACTGTC
TGTCAATTCAGATGCTGATCACTCTGGT
||
Example of a sequence update:
TYPE: STS
STATUS: Update
CONT_NAME: Thomas Hudson
STS#: DXYS112
F_PRIMER: CTTCAGATCAGATTAAGGTGCTCT
B_PRIMER: GGGAAGCATTGACTGCATTA
PUBLIC:
SIZE: 231
SEQUENCE:
CTNTACAGCAAGCTTAGTATCATCCTCTTCAGATCAGATTAAGGTGCTCTTGAAAGCTCA
GANNNTTGTATTTGTTTAAATGCACAGTAATTAAAAGTNTTTTTTTTAATCAGCAAAAGC
AGTTAAAGTAAANCAANATATTNANGCCNAAANTNTATTTATNTCACATATCCTGANGTG
GCNCTNNCANGNTGTTNTNCATGGGGNAAATNTGCATCTGTAGATCTGTTGNTTCANTAA
TGCAGTCAATGCTTCCCTTTGNNCAGNTCTAGGGTAGNTTAAATNAGANTCTTNCANCTT
TNNNGGNCTGAAAAGAANNATTTAACCNCCTTGTNNANNCTGGAAACCNNGCTACCTNTG
NAGGTNNTCGTNCTNCCNTNNCANCGTTTTGCTGTTTGCTANGTCAAGCCTCTTGCCTTC
NTCCGNCCCAAGTANCCNGTNCTNGGGCACTNAAAACCCNNNTTTTNGGACCANGCNNGN
ANGCCCCANATT
||
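Several STS fields must be identical strings to a field in another file, so a mismatch in spelling or spacing is an easy mistake to make. The following Python sketch lists the STS fields that do not match any entry among the files you have at hand. It assumes the entries have been read into dictionaries of tag to value, and the function name `unresolved_references` is hypothetical. A reported field is not necessarily an error, because the referenced Publication, Source, or Contact may already be in the database from an earlier submission.

```python
# Which STS field must equal which field of which entry type,
# taken from the STS tag descriptions above.
REFERENCES = {
    "CONT_NAME": ("Cont", "NAME"),
    "PROTOCOL": ("Protocol", "NAME"),
    "BUFFER": ("Buffer", "NAME"),
    "SOURCE": ("Source", "NAME"),
    "CITATION": ("Pub", "TITLE"),
}


def unresolved_references(sts_entry, other_entries):
    """Return the STS tags whose value matches no entry of the referenced type."""
    missing = []
    for tag, (entry_type, key) in REFERENCES.items():
        value = sts_entry.get(tag, "")
        if not value:
            continue  # empty fields belong to a separate obligatory-field check
        # Exact comparison on purpose: the strings must be identical.
        known = {e.get(key) for e in other_entries if e.get("TYPE") == entry_type}
        if value not in known:
            missing.append(tag)
    return missing
```

The comparison is case-sensitive and does not trim or normalize blanks, which mirrors the "identical string" wording of the tag descriptions.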
2. Submission Format for STS Map Data
The following is a specification for flat file formats for delivering STS mapping and related data to the NCBI STS database.
File Types
There are four types of deliverable files:
a. Publication
b. Contact
c. Method
d. Map Data
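a. Publication Files
Same format as the Publication files for STS sequence data (section 1).
b. Contact Files
Same format as the Contact files for STS sequence data (section 1).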
c. Method
These are the valid tags and a short description:
TYPE: Entry type - must be "Meth" for method entries. **Obligatory field**
NAME: Name of method. **Obligatory field**
ORGANISM: Organism from which library prepared. **Obligatory field**
ABSOLUTE: Y or N. (Enter Y if method gives absolute address; enter N if method gives relative address.) **Obligatory field**
L1: Interpretation of line 1 of Map Data files.
L2: Interpretation of line 2 of Map Data files.
L3: Interpretation of line 3 of Map Data files.
L4: Interpretation of line 4 of Map Data files.
L5: Interpretation of line 5 of Map Data files.
L6: Interpretation of line 6 of Map Data files.
L7: Interpretation of line 7 of Map Data files.
L8: Interpretation of line 8 of Map Data files.
L9: Interpretation of line 9 of Map Data files.
L10: Interpretation of line 10 of Map Data files.
DESCR: Description of method. Description starts on line after DESCR tag. May be multi-line free format text.
|| Entry separator
Examples:
TYPE: Meth
NAME: YAC/CEPH JMS
ORGANISM: Homo sapiens
ABSOLUTE: n
L1: plate
L2: row
L3: column
L4: comment
L5: comment
L6: comment
L7: comment
DESCR:
PCR-based mapping of 3'UT-derived primers to CEPH YAC DNA pools. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991). To date, MIT puts out YAC pools A and B; if both pools were used for the mapping data given, then 'C' is designated.
||
TYPE: Meth
NAME: Radiation Hybrid JMS
ORGANISM: Homo sapiens
ABSOLUTE: y
L1: chromosome
L2: bin
L3: comment
L4: comment
L5: comment
DESCR:
Radiation hybrid panels with binning. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991).
||
TYPE: Meth
NAME: Somatic Hybrid JMS
ORGANISM: Homo sapiens
ABSOLUTE: y
L1: chromosome
L2: arm
L3: band
L4: band range
L5: comment
L6: comment
DESCR:
Somatic cell hybrid mapping. Primers are chosen using the PRIMER program by Lincoln et al., ver 0.5 (1991).
||
d. Map Data Files
These are the valid tags and a short description:
TYPE: Entry type - must be "Map" for map data entries. **Obligatory field**
STATUS: Status of STS entry - "New", "Replace" or "Update". **Obligatory field**
CONT_NAME: Name of contact (Must be identical string to the NAME field of the Contact file.) **Obligatory field**
CONT_LAB: Contact laboratory. (Must be identical string to the LAB field of the Contact file.)
METHOD: Method name. (Must be identical string to the NAME field of the Method file.) **Obligatory field**
CITATION: Journal citation. (Must be identical string to the TITLE field of the Publication file.) Begins on line below CITATION: tag - use continuation lines if necessary.
NCBI#: NCBI Id of STS. (Must have either NCBI#, STS# or GB#)
STS#: Name of STS. (Must have STS#, NCBI# or GB#)
GB#: GenBank accession number of STS.
PUBLIC: Date of public release. Leave blank for immediate release. Use the date format mm/dd/yyyy (e.g., 12/31/1999). **Obligatory field**
MAPSTRING: Full mapping information. Unparsed. **Obligatory field**
CHROM: Chromosome name or number
L1: Line 1 of parsed mapping information.
L2: Line 2 of parsed mapping information.
L3: Line 3.
L4: Line 4.
L5: Line 5.
L6: Line 6.
L7: Line 7.
L8: Line 8.
L9: Line 9.
L10: Line 10 of parsed mapping information.
|| Entry separator
Examples:
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: YAC/CEPH JMS
CITATION:
Nature Genetics, 2:180-185 (1992)
NCBI#: 51839
PUBLIC:
MAPSTRING: 956H08
CHROM:
L1: 959
L2: H
L3: 08
L4: Pool B
L5: Forward Primer: CCCCAGAGTTCCAAGTTAATT
L6: Reverse Primer: GTCGCATTGCTCAACATTCGTTT
L7: Product Length: 162
||
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: Radiation Hybrid JMS
CITATION:
Nature Genetics, 2:180-185 (1992)
STS#: STST001a
PUBLIC:
MAPSTRING: 4, bin 2
CHROM: 4
L1: 4
L2: 2
L3: Forward Primer: TTDDGTAGAGGGTGCTAAGAAGG
L4: Reverse Primer: GAAATGGACCTATTAAAACCAGCT
L5: Product Length: 119
||
TYPE: Map
STATUS: New
CONT_NAME: Sikela JM
METHOD: Somatic Hybrid JMS
CITATION:
Nature Genetics, 2:180-185 (1992)
GB#: T12813
PUBLIC:
MAPSTRING: 20
CHROM: 20
L1: 20
L2:
L3:
L4:
L5: Forward Primer: CGTAATGTCCCTGTGTCTGAG
L6: Reverse Primer: CACCTCACCCATAGCCTTAGCTA
||
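The L1 to L10 lines of a Map entry only have meaning together with the L1 to L10 interpretations of the Method entry that it names. The following Python sketch pairs the two, assuming both entries have been read into dictionaries of tag to value. The function name `label_map_lines` is hypothetical, and raising an error when the METHOD string differs from the Method NAME is one possible way to enforce the "identical string" rule, not a description of how the database behaves.

```python
def label_map_lines(method, map_entry):
    """Pair each L1..L10 value of a Map entry with the Method's interpretation."""
    if map_entry.get("METHOD") != method.get("NAME"):
        # The METHOD field must be the identical string to the Method NAME.
        raise ValueError("METHOD does not match the Method entry's NAME")
    labelled = []
    for n in range(1, 11):
        tag = f"L{n}"
        label = method.get(tag, "").strip()
        value = map_entry.get(tag, "").strip()
        if label or value:
            # Fall back to the bare tag when the Method gives no interpretation.
            labelled.append((label or tag, value))
    return labelled
```

For a radiation hybrid entry, the result reads as a list such as ("chromosome", "4"), ("bin", "2"), followed by the comment lines, which is a convenient form for proofreading the parsed mapping information against MAPSTRING.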
On-Line STS Database, Data Input Format Specification
This draft document is being made available solely for review purposes and should not be quoted, circulated, reproduced or represented as an official NCBI document. The draft is undergoing revisions and should not be considered or represented as reflecting the views, positions or intentions of the NCBI or the National Library of Medicine. Rev. 04/14/99