
   

GEO Overview

  1. General
  2. Query and Analysis
  3. Data Download
  4. Deposit and Update

1. General Overview

GEO serves as a public repository for a wide range of high-throughput experimental data. These data include single- and dual-channel microarray-based experiments measuring mRNA, miRNA, genomic DNA (arrayCGH, ChIP-chip, and SNP), and protein abundance, as well as non-array techniques such as serial analysis of gene expression (SAGE), mass spectrometry peptide profiling, and various types of quantitative sequence data.


The basic record types within the primary database are as follows:

  Schematic overview of GEO data submission.


Platform
Platform records are supplied by submitters.
A Platform record defines the list of elements that may be detected and quantified in an experiment (e.g., cDNAs, oligonucleotide probesets). Each Platform record is assigned a unique and stable GEO accession number (GPLxxx). A Platform may reference many Samples that have been submitted by multiple submitters.
Example Platform record
Text description of the array
Text tab-delimited table of the array template
Sample
Sample records are supplied by submitters.
A Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it. Each Sample record is assigned a unique and stable GEO accession number (GSMxxx). A Sample entity must reference only one Platform and may be included in multiple Series.
Example Sample record
Text description of a biological sample
Text tab-delimited table of processed hybridization result
(may optionally include raw data columns)
Original raw data file
Series
Series records are supplied by submitters.
A Series record links together a group of related Samples and provides a focal point and description of the whole study. Series records may also contain tables describing extracted data, summary conclusions, or analyses. Each Series record is assigned a unique and stable GEO accession number (GSExxx).
Example Series record
Text description of the overall experiment
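
The relationships among the three primary record types can be sketched as a small data model. This is only an illustration and not GEO's internal schema: the class and field names are assumptions made for the example, and every accession other than GPL96 is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Platform:
    """Defines the elements that may be detected (accession GPLxxx)."""
    accession: str
    elements: List[str] = field(default_factory=list)


@dataclass
class Sample:
    """One Sample (GSMxxx); it references exactly one Platform."""
    accession: str
    platform: Platform
    values: Dict[str, float] = field(default_factory=dict)  # element -> abundance


@dataclass
class Series:
    """Groups related Samples into one study (GSExxx)."""
    accession: str
    description: str
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Accessions of every Platform used by the Samples in this Series."""
        return {s.platform.accession for s in self.samples}


# Hypothetical records: "GSM_A", "GSM_B", "GSE_X" and "GSE_Y" are placeholders.
array = Platform("GPL96", elements=["probe_1", "probe_2"])
sample_a = Sample("GSM_A", array, {"probe_1": 7.2, "probe_2": 3.1})
sample_b = Sample("GSM_B", array, {"probe_1": 6.8, "probe_2": 3.4})

# A Sample may be included in more than one Series.
study_x = Series("GSE_X", "hypothetical study X", [sample_a, sample_b])
study_y = Series("GSE_Y", "hypothetical study Y", [sample_a])
```

Holding a single Platform object on each Sample mirrors the rule that a Sample references only one Platform, while the list on Series lets the same Sample appear in several studies.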



Selected primary records are further processed into higher-level DataSet and gene Profile records:

DataSet
DataSet records are assembled by GEO curators.
As explained above, a GEO Series record is an original submitter-supplied record that summarizes an experiment. These data are reassembled by GEO staff into GEO DataSet records (GDSxxx). A DataSet represents a curated collection of biologically and statistically comparable GEO Samples and forms the basis of GEO's suite of data display and analysis tools. Samples within a DataSet refer to the same Platform, that is, they share a common set of array elements. Value measurements for each Sample within a DataSet are assumed to be calculated in an equivalent manner, that is, considerations such as background processing and normalization are consistent across the DataSet. Information reflecting experimental factors is provided through DataSet subsets. Both Series and DataSets are searchable using the Entrez GEO DataSets interface, but only DataSets form the basis of GEO's advanced data display and analysis tools, including gene expression profile charts and DataSet clusters. Not all submitted data are suitable for DataSet assembly, and we are experiencing a backlog in DataSet creation, so not all Series have corresponding DataSet record(s).
Example DataSet record.
Profile
Profiles are derived from DataSets.
A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet. Profiles can be searched using Entrez GEO Profiles.
Example Profile records.
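
The idea that a Profile is one gene's measurements across every Sample in a DataSet can be sketched as a lookup over a table of values. The nested-dictionary layout, the function name and all identifiers below are assumptions made for illustration and do not reflect how GEO stores DataSets.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical DataSet: Sample accession -> {element identifier -> value}.
dataset: Dict[str, Dict[str, float]] = {
    "GSM_A": {"gene_x": 2.0, "gene_y": 5.5},
    "GSM_B": {"gene_x": 4.0, "gene_y": 5.1},
    "GSM_C": {"gene_x": 8.0, "gene_y": 4.9},
}


def gene_profile(
    data: Dict[str, Dict[str, float]], element: str
) -> List[Tuple[str, Optional[float]]]:
    """Return (sample, value) pairs for one element across all Samples.

    A Sample with no measurement for the element yields None.
    """
    return [(sample, values.get(element)) for sample, values in data.items()]


profile = gene_profile(dataset, "gene_x")
```

Because all Samples in a DataSet share one Platform, every Sample is expected to carry a value for the same set of elements, so the profile has one entry per Sample.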


For more information, please see these publications.



2. Query and Analysis

GEO data can be retrieved and analyzed in several ways:

  • To look at a particular GEO record for which you have the accession number, use the GEO accession box on the GEO homepage. Also, the Accession Display bar (found at the foot of the GEO homepage and at the top of each GEO record) has several options for selecting the format and amount of data to view (see the Data Download section below).

  • The simplest first step to find data relevant to your interests is to search Entrez GEO DataSets or Entrez GEO Profiles with keywords:

    Entrez GEO DataSets queries all experiment descriptions, allowing identification of studies of interest.
    Entrez GEO Profiles queries gene expression profiles, allowing identification of genes of interest.

    As with any other Entrez database, keywords or a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling effective query and mining of GEO data. Tools available under the ‘Preview/Index’ tab can help you construct complex, fielded queries.

    Once you have identified a DataSet of interest, there are several features on the DataSet record that help visualize or identify interesting gene expression profiles within that experiment:

  • Query subset A vs B tool - finds genes differentially expressed between experimental subgroups
  • Clusters - visualize cluster heat map images and select regions of interest for further study
  • Value distribution - a box and whiskers plot displaying the distribution of expression values of each Sample within a DataSet
  • ‘Find gene in this DataSet’ box

    Once you have identified gene expression profiles of interest, there are several tools on the Profile records that help identify additional genes of interest:

  • Profile neighbors - retrieves other genes with similar expression patterns in that DataSet
  • Chromosome neighbors - retrieves the 20 chromosomally closest genes
  • Links - to related NCBI databases including Gene, UniGene, OMIM and PubMed
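
To give a feel for what "similar expression patterns" could mean for Profile neighbors, the sketch below ranks genes by Pearson correlation with a query gene. This page does not say how GEO measures similarity, so the choice of Pearson correlation, the function names and the data are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length value lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def profile_neighbors(
    profiles: Dict[str, List[float]], query: str, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Rank the other genes by correlation with the query gene, highest first."""
    target = profiles[query]
    scores = [
        (gene, pearson(target, values))
        for gene, values in profiles.items()
        if gene != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical profiles: one value per Sample, in the same Sample order.
profiles = {
    "gene_a": [1.0, 2.0, 3.0],
    "gene_b": [2.0, 4.0, 6.0],
    "gene_c": [3.0, 2.0, 1.0],
}
neighbors = profile_neighbors(profiles, "gene_a", top_n=1)
```

Here gene_b rises and falls with gene_a across the Samples and is ranked first, while gene_c moves in the opposite direction and is ranked last.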


3. Data Download

GEO data can be viewed and downloaded in several ways:

GEO records
  • FTP download
    All GEO records and raw data files are freely available for bulk download from our FTP site. Data are structured and formatted in a variety of ways; see our README for details.

  • Links at the foot of Series records
    Links to experiment family downloads in various formats and supplementary files are provided at the foot of each GEO Series record.

  • Accession Display Bar
    The Accession Display bar is found at the foot of the GEO homepage and at the top of each GEO record and can be used to download or view complete or partial records, or related Platform, Sample and Series records. The Scope feature allows display of a single accession number (Self) or any (Platform, Sample, or Series) or all (Family) records related to that accession. Amount dictates the quantity of data displayed, with choices including metadata only (Brief), metadata and the first 20 rows of the data table (Quick), data table only (Data), or full metadata/data table records (Full). Format controls whether records are displayed in HTML, SOFT (plain text) or MINiML (XML) format.

  • Construct a URL
    An alternative to using the Accession Display Bar described above is to construct a URL to retrieve data. URLs are formatted as follows:
    Example: http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=gpl96&targ=self&view=brief&form=text
    - this URL will retrieve a text file containing the 'brief' view of accession GPL96.
    The possible values for each component are:
    acc = a valid GEO accession i.e., gplxxx, gsmxxx or gsexxx
    targ = self, gsm, gpl, gse or all
    view = brief, quick, data or full
    form = text, html or xml
    Note that your browser may time out when html format is selected for particularly large retrievals.

  • Programmatic access
    GEO record metadata can be accessed and retrieved programmatically using a suite of programs called the Entrez Programming Utilities (E-Utils).

  • Entrez GEO DataSets and Entrez GEO Profiles query downloads
    It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
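
The URL construction described under 'Construct a URL' can be sketched as a small helper that checks each component against the values listed there. The base address and the permitted values are taken from the example above; the function and constant names are assumptions, and the helper only builds the URL without retrieving anything.

```python
from urllib.parse import urlencode

# Base address and permitted values as given in the 'Construct a URL' item.
BASE_URL = "http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi"
TARGETS = {"self", "gsm", "gpl", "gse", "all"}
VIEWS = {"brief", "quick", "data", "full"}
FORMS = {"text", "html", "xml"}


def geo_accession_url(acc: str, targ: str = "self",
                      view: str = "brief", form: str = "text") -> str:
    """Build a GEO accession URL, rejecting values not listed in the text."""
    if targ not in TARGETS:
        raise ValueError(f"targ must be one of {sorted(TARGETS)}")
    if view not in VIEWS:
        raise ValueError(f"view must be one of {sorted(VIEWS)}")
    if form not in FORMS:
        raise ValueError(f"form must be one of {sorted(FORMS)}")
    query = urlencode({"acc": acc, "targ": targ, "view": view, "form": form})
    return f"{BASE_URL}?{query}"


# Reproduces the example: the 'brief' text view of accession GPL96.
url = geo_accession_url("gpl96")
```

Defaulting form to text rather than html follows the note that browsers may time out on large html retrievals.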


DataSet records and Profiles
  • FTP download
    All GEO DataSet records are freely available for bulk download from our FTP site.

  • Links on DataSet records
    Links to DataSet SOFT files are available under the 'download' button on each DataSet record.

  • Programmatic access
    GEO DataSets metadata can be accessed and retrieved programmatically using a suite of programs called the Entrez Programming Utilities (E-Utils).

  • Profile values downloads
    Use the 'Download profile data' button at the head of Entrez GEO Profiles retrievals to download the expression values of genes found in your query.

  • Entrez GEO DataSets and Entrez GEO Profiles query downloads
    It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
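
A keyword query against Entrez GEO DataSets or Entrez GEO Profiles can also be expressed as an E-Utils ESearch URL. The sketch below only builds the URL and makes no network request. The endpoint address and the database names gds and geoprofiles come from the general E-Utils documentation rather than from this page, so check them against the current E-Utils help before relying on them; the function name and the search term are hypothetical.

```python
from urllib.parse import urlencode

# Assumed ESearch endpoint and Entrez database names (not stated on this page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
GEO_DATABASES = {"gds", "geoprofiles"}


def geo_esearch_url(term: str, db: str = "gds", retmax: int = 20) -> str:
    """Build an ESearch URL for a keyword or Boolean query against a GEO database."""
    if db not in GEO_DATABASES:
        raise ValueError(f"db must be one of {sorted(GEO_DATABASES)}")
    if retmax < 0:
        raise ValueError("retmax must not be negative")
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{ESEARCH_URL}?{query}"


# Hypothetical query: search GEO DataSets for two keywords.
search_url = geo_esearch_url("breast cancer")
```

ESearch returns a list of record identifiers rather than the records themselves, so a complete retrieval would need a follow-up E-Utils call that is outside this sketch.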


4. Deposit and Update

There are several ways in which data may be submitted to GEO. Please refer to the Submitting data guidelines for a complete overview of the options available.

After we receive your final Series submission, we will begin processing your records. Once your records pass review, you will receive an e-mail confirming your GEO accession numbers and their release dates. Processing normally takes approximately 2-5 business days after completion of Series submission. If you need approval of your GEO accession numbers to be expedited, please e-mail us at geo@ncbi.nlm.nih.gov.

Each record you submit will receive a unique and stable GEO accession number that you may quote in manuscripts. Do not quote GEO accession numbers in manuscripts until you have received an approval notice e-mail from GEO staff. Records may remain private for several months until the data are published.

After your records have been approved, you can create an access link to your private submissions using the 'Click here to create a reviewer access link' option near the top of your Series (GSExxx) record. The generated link can be sent to the journal editor, who will circulate it to reviewers requiring access to your private data.

Edits and updates to individual records may be performed at any time by submitters by selecting the 'UPDATE' section on the Web deposit/update page. Submitters may also perform batch updates using SOFT format. Alternatively, e-mail batch edit details to GEO staff at geo@ncbi.nlm.nih.gov and we will process a batch edit on your behalf.



