GEO Overview
- General
- Query and Analysis
- Data Download
- Deposit and Update
1. General Overview
GEO serves as a public
repository for a wide range of high-throughput experimental data. These
data include single and dual channel microarray-based experiments
measuring mRNA, miRNA, genomic DNA (arrayCGH, ChIP-chip, and SNP), and protein abundance, as well as
non-array techniques such as serial analysis of gene expression (SAGE),
mass spectrometry peptide profiling, and various types of quantitative sequence data.
The basic record types within the primary database are as follows:
Platform
Platform records are supplied by submitters. A Platform record
defines the list of elements that may be detected and quantified in an experiment (e.g., cDNAs,
oligonucleotide probesets). Each Platform record is assigned a unique and stable
GEO accession number (GPLxxx). A Platform may reference many Samples
that have been submitted by multiple submitters.
Example Platform record:
- Text description of the array
- Tab-delimited text table of the array template
Sample
Sample records are supplied by submitters. A Sample record
describes the conditions under which an individual Sample was handled,
the manipulations it underwent, and the abundance measurement of each
element derived from it. Each Sample record is assigned a unique and
stable GEO accession number (GSMxxx). A Sample entity must reference
only one Platform and may be included in multiple Series.
Example Sample record:
- Text description of a biological sample
- Tab-delimited text table of processed hybridization results (may optionally include raw data columns)
- Original raw data file
Series
Series records are supplied by submitters.
A Series record links together a group of related Samples and provides a focal point and description of the whole study.
Series records may also contain tables describing extracted data,
summary conclusions, or analyses. Each Series record is assigned a
unique and stable GEO accession number (GSExxx).
Example Series record:
- Text description of the overall experiment
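The relationships among the three primary record types can be pictured with a small data-structure sketch. This is only an illustration of the rules stated above (each Sample references exactly one Platform, and a Sample may appear in several Series); the class names, field names, and the GSM/GSE accessions are hypothetical and do not reflect GEO's internal storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Platform:
    accession: str          # e.g. "GPL96"
    elements: List[str]     # identifiers of the elements that can be measured


@dataclass
class Sample:
    accession: str              # hypothetical, e.g. "GSM0001"
    platform: Platform          # a Sample references exactly one Platform
    values: Dict[str, float]    # element identifier -> abundance measurement


@dataclass
class Series:
    accession: str                                   # hypothetical, e.g. "GSE0001"
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> Set[str]:
        """Accessions of every Platform used by the Samples in this Series."""
        return {sample.platform.accession for sample in self.samples}


# One Platform, one Sample measured on it, and two Series sharing that Sample.
gpl = Platform("GPL96", ["probe_a", "probe_b"])
gsm = Sample("GSM0001", gpl, {"probe_a": 1.5, "probe_b": 0.2})
gse1 = Series("GSE0001", [gsm])
gse2 = Series("GSE0002", [gsm])
```

Because the Sample holds a single `platform` field while each Series holds a list of Samples, the one-Platform-per-Sample and many-Series-per-Sample rules fall out of the structure itself.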
Selected primary records are further processed into higher-level DataSet and gene Profile records:
DataSet
DataSet records are assembled by GEO curators. As explained above, a GEO Series record is an original
submitter-supplied record that summarizes an experiment.
These data are reassembled by GEO staff into GEO DataSet records (GDSxxx).
A DataSet represents a curated collection of biologically
and statistically comparable GEO Samples and forms the basis of GEO's
suite of data display and analysis tools.
Samples within a DataSet refer to the same Platform, that is, they share a
common set of array elements. Value measurements for each Sample within
a DataSet are assumed to be calculated in an equivalent manner, that is,
considerations such as background processing and normalization are
consistent across the DataSet. Information reflecting experimental
factors is provided through DataSet subsets. Both Series and DataSets
are searchable using the Entrez GEO DataSets
interface, but only DataSets form the basis of GEO's advanced data display and analysis tools
including gene expression profile charts and DataSet clusters.
Not all submitted data are suitable for DataSet assembly and we are experiencing a backlog in DataSet creation,
so not all Series have corresponding DataSet record(s).
Example DataSet record.
Profile
Profiles are derived from DataSets.
A Profile consists of the expression measurements for an individual gene across all Samples in a DataSet.
Profiles can be searched using Entrez GEO Profiles.
Example Profile records.
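As a rough picture of how a Profile relates to a DataSet, the sketch below treats a DataSet as a table of values keyed by Sample and pulls out one gene's measurements across every Sample. The Sample names, gene identifiers, and values are invented for illustration, and the `profile` helper is hypothetical rather than part of any GEO tool.

```python
# Hypothetical DataSet values: Sample accession -> {gene identifier -> value}.
dataset = {
    "GSM_A": {"GENE1": 10.0, "GENE2": 3.0},
    "GSM_B": {"GENE1": 12.0, "GENE2": 2.5},
    "GSM_C": {"GENE1": 30.0, "GENE2": 2.8},
}


def profile(dataset, gene):
    """Return (Sample, value) pairs for one gene across all Samples."""
    return [(sample, values[gene]) for sample, values in dataset.items()]


gene1_profile = profile(dataset, "GENE1")
```

Here `gene1_profile` holds one measurement per Sample for `GENE1`, which is the shape of data a Profile chart displays.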
For more information, please see these publications.
2. Query and Analysis
GEO data can be retrieved and analyzed in several ways:
- To view a particular GEO record for which you have the accession number, use the GEO accession box on the GEO homepage. The Accession Display bar (found at the foot of the GEO homepage and at the top of each GEO record) also has several options for selecting the format and amount of data to view (see the Data Download section below).
- The simplest first step to find data relevant to your interests is to search
Entrez GEO DataSets or Entrez GEO Profiles with keywords.
Entrez GEO DataSets queries all experiment descriptions, allowing identification of studies of interest.
Entrez GEO Profiles queries gene expression profiles, allowing identification of genes of interest.
As with any other Entrez database, keywords or a simple Boolean phrase may be entered and restricted to any number of supported attribute fields, enabling
effective query and mining of GEO data.
Tools available under the ‘Preview/Index’ tab can help you construct complex, fielded queries.
Once you have identified a DataSet of interest, there are several features on the DataSet record that help visualize or identify interesting gene expression profiles within that experiment:
- Query subset A vs B tool - finds genes differentially expressed between experimental subgroups
- Clusters - visualize cluster heat map images and select regions of interest for further study
- Value distribution - a box and whiskers plot displaying the distribution of expression values of each Sample within a DataSet
- ‘Find gene in this DataSet’ box
Once you have identified gene expression profiles of interest, there are several tools on the Profile records that help identify additional genes of interest:
- Profile neighbors - retrieves other genes with similar expression patterns in that DataSet
- Chromosome neighbors - retrieves the 20 chromosomally closest genes
- Links - to related NCBI databases including Gene, UniGene, OMIM and PubMed
3. Data Download
GEO data can be viewed and downloaded in several ways:
GEO records
- FTP download
All GEO records and raw data files are freely available for bulk download from our FTP site.
Data are structured and formatted in a variety of ways; see our README for details.
- Links at the foot of Series records
Links to experiment family downloads in various formats and supplementary files are provided at the foot of each GEO Series record.
- Accession Display Bar
The Accession Display bar is found at
the foot of the GEO homepage and at the top of each GEO record and can be used to download or view complete or
partial records, or related Platform, Sample and Series records.
The Scope
feature allows display of a single accession number (Self) or any
(Platform, Sample, or Series) or all (Family) records related to that
accession. Amount dictates the quantity of data displayed, with
choices including metadata only (Brief), metadata and the first 20 rows of the
data table (Quick), data table only (Data), or full metadata/data table records (Full). Format
controls whether records are displayed in HTML, SOFT (plain text) or MINiML (XML) format.
- Construct a URL
An alternative to using the Accession Display Bar described above is to construct a URL to retrieve data. URLs are formatted as follows:
Example: http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=gpl96&targ=self&view=brief&form=text
This URL retrieves a text file containing the 'brief' view of accession GPL96. The possible values for each component are:
acc = a valid GEO accession, i.e., gplxxx, gsmxxx or gsexxx
targ = self, gsm, gpl, gse or all
view = brief, quick, data or full
form = text, html or xml
Note that your browser may time-out when html format is selected for particularly large retrievals.
- Programmatic access
GEO record metadata can be programmatically accessed and retrieved using a suite of programs called the Entrez Programming Utilities (E-Utils).
- Entrez GEO DataSets and Entrez GEO Profiles query downloads
It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
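The URL scheme described under 'Construct a URL' above can be assembled programmatically. The following is a minimal sketch that builds such a URL from the four components and rejects values outside the listed options; the function name `geo_accession_url` and the validation table are illustrative assumptions, and no request is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi"

# Values listed in the text for each component (acc is left unchecked).
ALLOWED = {
    "targ": {"self", "gsm", "gpl", "gse", "all"},
    "view": {"brief", "quick", "data", "full"},
    "form": {"text", "html", "xml"},
}


def geo_accession_url(acc, targ="self", view="brief", form="text"):
    """Build an accession-display URL, validating targ, view and form."""
    params = {"acc": acc, "targ": targ, "view": view, "form": form}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return BASE_URL + "?" + urlencode(params)


# Reproduces the example given in the text for accession GPL96.
url = geo_accession_url("gpl96")
```

Choosing `form="text"` rather than `form="html"` for large retrievals avoids the browser time-out mentioned above.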
DataSet records and Profiles
- FTP download
All GEO DataSet records are freely available for bulk download from our FTP site.
- Links on DataSet records
Links to DataSet SOFT files are available under the 'download' button on each DataSet record.
- Programmatic access
GEO DataSets metadata can be programmatically accessed and retrieved using a suite of programs called the Entrez Programming Utilities (E-Utils).
- Profile values downloads
Use the 'Download profile data' button at the head of Entrez GEO Profiles retrievals to download the expression values of genes found in your query.
- Entrez GEO DataSets and Entrez GEO Profiles query downloads
It is possible to export Entrez GEO DataSets and Entrez GEO Profiles document summaries by setting the tool bar at the head of the page to 'Send to: File'.
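For the programmatic access route mentioned above, E-Utils requests are themselves ordinary URLs. The sketch below builds an ESearch request against Entrez GEO DataSets. Several details are assumptions not stated in this document: the base address `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/`, the database name `gds` (with `geoprofiles` for Entrez GEO Profiles), the `retmax` parameter, and the sample query text. The helper name `esearch_url` is hypothetical, and the code only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumed E-Utils base address; check the E-Utils documentation before use.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="gds", retmax=20):
    """Build an ESearch URL for a keyword or fielded Boolean query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


# A Boolean phrase with one term restricted to the Organism field.
search_url = esearch_url("breast cancer AND Homo sapiens[Organism]")
```

The same fielded query syntax used in the Entrez web interface is passed in `term`; `urlencode` takes care of escaping spaces and brackets.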
4. Deposit and Update
There are several ways in which data may be submitted to GEO.
Please refer to the Submitting data guidelines for a complete overview of the options available.
After we receive your final Series submission, we will begin processing your records. Once your records pass review, you will receive an e-mail
confirming your GEO accession numbers and their release dates.
Processing normally takes approximately 2-5 business days after completion of Series
submission. If you need approval of your GEO accession numbers to be
expedited, please e-mail us at geo@ncbi.nlm.nih.gov.
Each record you submit will receive a unique and stable GEO accession
number that you may quote in manuscripts. Do
not quote GEO accession numbers in manuscripts until you have received
an approval notice e-mail from GEO staff.
Records may remain private
for several months until the data are published.
After your records have been approved, you can create an access link to your private submissions
using the 'Click here to create a reviewer access link' option near the top of your Series (GSExxx) record.
The link that is generated can be sent to the journal editor who will circulate it to reviewers requiring access to your private data.
Edits and updates to individual records may be performed at any time by
submitters by selecting the 'UPDATE'
section on the Web
deposit/update page. Submitters may also perform batch updates using SOFT format.
Alternatively, e-mail batch edit details to GEO staff at geo@ncbi.nlm.nih.gov and we will
process a batch edit on your behalf.