Application Build: 246 Database Build: 2008-04 |
Question:Can I use GoMiner if I do not have a total-genes file?
Answer:Microarray users will have a natural total-genes file – the list of all genes represented on the
microarray.
Other users (e.g. proteomics) may not have a natural total-genes file. The Auto-generate function fills this need by
computing an artificial total-genes list.
The computation is performed by issuing an SQL query to the GO database. This query is constrained by the parameter choices (such as organism) selected in the user interface (UI). Although the artificial list is not as precise as a user-supplied file, it can provide an approximate background reference against which to estimate the statistical values for enrichment of categories by the changed genes.
Question:If I am providing a changed file, how does that relate to the total file?
Answer:We strongly encourage you to make your changed file a subset of the total file. In fact, when you are using
High-Throughput GoMiner, you will receive an error if this is not the case. If you are using the suffix notation, then
each suffixed variant should be in the total file. For example, if you have BAD~1, BAD~2
and BAD~3 in your changed file, then all three should be in the total file. You must
follow these guidelines when submitting documents to High-Throughput GoMiner. If you are
using GUI GoMiner, we recommend that you follow these guidelines, but GUI GoMiner will
tolerate some variation. If you have some extra identifiers in the changed file that are
not in the total file, GUI GoMiner will not throw an error, but will ignore the unmatched
entries.
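Before submitting files, it can help to check the subset relationship yourself. The following is a minimal sketch, not part of GoMiner: the function name unmatched_changed is hypothetical, and it assumes one identifier per list entry and exact, case-sensitive string matching (including any ~ suffix).

```python
def unmatched_changed(changed, total):
    """Return changed-file identifiers that are missing from the total file.

    Identifiers are compared as exact strings, so 'BAD~3' is only matched
    by 'BAD~3' in the total list (an assumption of this sketch).
    """
    total_set = {t.strip() for t in total if t.strip()}
    missing = []
    for identifier in changed:
        identifier = identifier.strip()
        if identifier and identifier not in total_set and identifier not in missing:
            missing.append(identifier)
    return missing


total = ["BAD~1", "BAD~2", "BAD~3", "TP53"]
changed = ["BAD~1", "BAD~2", "BAD~3", "MYC"]
print(unmatched_changed(changed, total))  # ['MYC']
```

An empty result means the changed list is a subset of the total list. Any identifiers that are reported would need to be added to the total file (or removed from the changed file) before using High-Throughput GoMiner.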
Question:I have a microarray whose probes were specifically chosen to study a
particular biomedical problem (e.g. an oncochip). When I used the list of all of the probe targets as the
basis of the total-genes file in GoMiner, I had a hard time identifying
significant categories because the list of total genes is already very focused on a handful of GO categories.
Is there a way to address this issue in GoMiner?
Answer:You can use the Auto-generate feature instead of the traditional total-genes file to partially
compensate for the bias in this scenario. This lets you see significant categories that would otherwise be
obscured by the bias in the selection of the genes that make up the chip. The disadvantage of this approach is
that the statistical results are only approximately correct. We thank Martin Cadeiras for contributing to the
idea that led to this particular application of the Auto-generate feature.
Question:Can GoMiner take as input identifiers besides gene names?
Answer:Yes it can. GoMiner works with all of the types of identifiers
that are included in the GO Consortium database. We have
several different lookup options for users
to tune how these identifiers are queried. If you need to do additional
identifier processing, we also provide the tool
MatchMiner.
Question:How can I convert a list of GenBank gi numbers to an identifier
type that is recognized by GoMiner?
Answer:For these identifiers, the best bet is to use the Biothesaurus
search engine developed by Hongfang Liu of the PIR team. The specific
example below works for the challenging case of a proteomics user
converting D. rerio gi numbers to gene names, to match the ZFIN
entries in the GO database. The basic approach can be used for other
organisms as well.
Question:I changed my database settings, but I want to reset them to the original values. What
are those?
Answer: For the MySQL adapter the settings are the following:
Question:Should I use GoMiner or High-Throughput GoMiner?
Answer:It depends on your situation. We have added some guidance on the Getting Started
page to help determine which version is most suitable.
Question:I am on a Mac, and I would like to use Firefox as my web browser for viewing SVG's,
but GoMiner launches Safari instead.
Answer:GoMiner will launch whichever browser is set as the default browser for your operating system. To
change this to Firefox, go to the Firefox Preferences, and select the option to check for Firefox as the
default browser. Then, launch Firefox, and when prompted, select Firefox as the default web browser.
Question:On the Mac, which is the best browser for viewing SVG's?
Answer:On the Mac, if you are interested in panning and zooming use Safari with the Adobe SVG viewer.
If you want the search by gene or category features, use Firefox.
Question:I am on an Intel Mac, and although I have installed the plug-in, the SVG's don't appear. What do I do?
Answer:Select your web browser application icon, and choose Get Info from the File menu in the Finder. Check the
box that says Open With Rosetta. Then, restart your browser. The plugin should now work.
Question:There is an icon beside some of the genes. What
does this mean?
Answer:It indicates that there is data available for that gene in the NCBI ENTREZ STRUCTURE database. If
data is available, then the NCBI_STRUCTURE menu option will be active on the menu that appears when you
right-click (control-click on the Mac) on the gene.
Question:How can I access VennMaster Diagrams directly from within GoMiner?
Answer:Select the desired root category from the 'Genes Mapped on GO' tab on the right GoMiner panel, and
right-click to invoke the context menu. Select 'VennMaster view of changed category (SVG)' from the menu.
Question:I am trying to generate a VennMaster image, but I am just getting a blank panel.
Answer:You may be getting an OutOfMemory error. Restart GoMiner with more memory (see below)
and try again.
Question: Can GoMiner analyze splice variants?
Answer:GoMiner and High-Throughput GoMiner (referred to in the following collectively as 'GoMiner')
traditionally dereplicate total and changed gene input files so that only one instance of a gene name is processed.
When multiple alternatively spliced forms are to be analyzed, however, dereplication would result in a loss of
relevant information. Consequently, we have added a new feature to GoMiner to retain full information about the
alternative splice variants by replicating the input of each gene by the number of alternative exons per gene in
total and changed gene input files.
As a specific example, suppose that a microarray platform contained probes that were unique for two different splice variants of BRCA1. Then the two splice variants would be designated in the input files as 'BRCA1~1' and 'BRCA1~2'. The '~' tells GoMiner to treat these as different entries, rather than to de-replicate them, but to ignore the suffix when querying the GO database. By this mechanism, all splice variants are counted when computing the Fisher exact p-value.
The total-genes file must still be a superset of the changed-genes file, so in this previous example, you would need both BRCA1~1 and BRCA1~2 in the total file as well.
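The effect of the suffix notation can be sketched in a few lines. This is an illustration of the behavior described above, not GoMiner's actual code; the helper names lookup_name and dereplicate are hypothetical.

```python
def lookup_name(entry):
    """Strip a '~suffix' tag so that 'BRCA1~1' is looked up as 'BRCA1'."""
    return entry.split("~", 1)[0]


def dereplicate(entries):
    """Drop exact duplicates, keeping suffixed variants as separate entries."""
    seen = set()
    kept = []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return kept


entries = ["BRCA1~1", "BRCA1~2", "BRCA1~1", "TP53", "TP53"]
kept = dereplicate(entries)
print(kept)                            # ['BRCA1~1', 'BRCA1~2', 'TP53']
print([lookup_name(e) for e in kept])  # ['BRCA1', 'BRCA1', 'TP53']
```

Both BRCA1 variants survive de-replication and are therefore counted separately, while the name used for the database lookup is the same for each.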
In general, the GO database does not seem to differentiate between different biological functional properties of splice variants, and the GoMiner strategy reflects this by using the suffix as an arbitrary tag. We are developing an extension of this approach in which the suffix will literally identify a particular splice variant. We hope that the GO database will gradually become enriched in differential biological functional annotations for splice variants.
An example of the use of this new feature can be found in a paper entitled "Nova regulates brain-specific splicing to shape the synapse" by Jernej Ule, Alja Ule, Joanna Spencer, Alan Williams, Jing-Shan Hu, Melissa Cline, Hui Wang, Tyson Clark, Claire Fraser, Matteo Ruggiu, Barry R Zeeberg, David Kane, John N Weinstein, John Blume, and Robert B Darnell (Nature Genetics 37, 844 - 852 (2005))
Question:What is the difference between the Smallest Category Size for Category Statistics and the
Largest Category Size to Include in CIM?
Answer:The Smallest Category Size for Category Statistics parameter is broader. It is used in
all variants of GoMiner. Categories whose size is less than this threshold will be omitted from category
statistic calculations.
Many reports and displays will still include these categories, but they won't have p-values,
enrichment ratios, or FDR's. This threshold is also used to filter smaller randomized categories when
determining the FDR. The Largest Category Size to Include in CIM has a more limited scope. It only
affects the CIM's, a report-type in HTGM. It eliminates the categories above the threshold from the
category gene matrices.
Question:How do I set the Smallest Category Size in GUI GoMiner?
Answer:It is an option when you choose to "Select Root and Recalculate Statistics." This option
is available when you right-click a category (control-click on a Mac) in the navigation tree.
Question:How do I choose which Lookup Settings options to select?
Answer:There are four fields in the enhanced GO database
that can be used for matching your input identifiers.
The primary field is 'symbol,' but certain types of identifiers require searching 'synonym,' 'dbxref,' or
'officialname' fields. The default setting in the user interface will result in searching all four fields. However,
using the default setting may be too 'promiscuous' for a user who has a very specific type of search in mind, and it
can be more expensive computationally than a properly restricted search. In the next several FAQs we describe some
special instances that use fields other than 'symbol.'
Question:What are the proper Lookup Settings options for input containing HGNC symbols?
Answer:You should choose 'UniProt' as your Datasource and 'Enhanced Names (UniProt Only)' as your Lookup
Settings. De-select the Lookup Settings 'Cross Reference' and 'Synonym' options.
Question:How can I query with FlyBase (FB) identifiers that contain prefixes like '&' or
the names of Greek letters?
Solution:For broadest coverage of these identifier types, select the 'synonym' option in the
'Lookup Settings' menu. This should cover both the SGML-style (e.g. &) and the text-style
(e.g. alpha) prefixes. GoMiner will then automatically recognize and process both prefix formats.
For more information on these identifiers refer to
the FlyBase Reference
Manual and their page
on encodings.
Question:Can I query GoMiner using ORF format identifiers for yeast?
Solution:A number of researchers in the yeast community use the ORF format identifier.
GoMiner supports this format if you select the 'synonym' option of Lookup Settings. We
have some sample files available that use these identifiers.
Question:Can you explain the p-Values and FDR's?
Answer:The current implementation of GoMiner uses a one-sided p-value
calculated from a Fisher's exact test. You can think of the p-value as
computed from the enrichment AND from the size (i.e., the total number of
genes in that category) of the category. To get a low p-value, you need
both a fairly good enrichment AND a fairly large size of category. It
would be unusual for a small category like N=2 to have a low p-value.
The formula for computing p is such that a low N tends to make the p-value
fairly high. Even if the p-value were low, statisticians usually would not take seriously
anything with N<5. You can read more about the p-value calculation in
our original GoMiner
paper.
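The dependence of the p-value on both enrichment and category size can be illustrated with a textbook one-sided Fisher's exact test, computed as a hypergeometric tail. This is a sketch, not GoMiner's code: the function name one_sided_fisher and the example counts (1000 total genes, 100 of them changed) are assumptions chosen for illustration.

```python
from math import comb


def one_sided_fisher(total, changed, category, changed_in_category):
    """P(X >= changed_in_category) for a hypergeometric draw.

    total: genes in the total file
    changed: genes in the changed file
    category: genes from the total file annotated to the category (N)
    changed_in_category: changed genes annotated to the category
    """
    upper = min(changed, category)
    tail = sum(
        comb(category, k) * comb(total - category, changed - k)
        for k in range(changed_in_category, upper + 1)
    )
    return tail / comb(total, changed)


# Same enrichment (every gene in the category is changed), different sizes.
p_small = one_sided_fisher(1000, 100, 2, 2)    # category with N = 2
p_large = one_sided_fisher(1000, 100, 20, 20)  # category with N = 20
print(round(p_small, 4))  # 0.0099
print(p_large < p_small)  # True
```

With these assumed counts, a category of size 2 cannot score better than about 0.01 even when both of its genes are changed, whereas a larger category with the same enrichment reaches a far smaller p-value.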
The FDR is actually a comparison of the number of times that the real data had
a certain p-value versus the number of times that randomized data has the
same or better p-value. The point is that random data will give you a few
categories that look like they have low p-values. The FDR procedure more
or less subtracts this background of p-values that are good by random chance.
So your data needs to have a distribution of p-values that is better than
random for us to report a positive result. The FDR addresses the multiple
comparison problem that occurs when calculating the p-values for hundreds
or thousands of categories, and protects against over-interpreting p-values
that do not have a biological meaning. The FDR calculation is described
in more detail in
our High-Throughput
GoMiner paper.
Question:I noticed that in High-Throughput GoMiner the root node
that is used for the FDR computation is 'biological_process,' but the root node
used for the FDR computation in GUI GoMiner is 'all.' Can I change the root node
used for the FDR computation in GUI GoMiner to 'biological_process?'
Answer:Select a category, then right click, and choose
'More Robust FDR.' from the pop-up menu. This will recalculate the FDR's from that node down to the bottom of the
ontology.
Categories that are not in the selected path will have their FDR's cleared when you recalculate.
Question:I get a warning message -- "Are you sure you want to generate the report from a different root than was used to generate the statistics" -- from GUI GoMiner when I try to download a results file. What should I do?
Answer:The message means that the category you selected for generating your report does not
match the one that was used to generate the FDR, p-value, and enrichment ratio statistics.
By default, these statistics are calculated using the top node, Gene Ontology, as the base.
You can do one of three things:
Question:I need greater accuracy in the FDR value than that afforded
by the default number of 5 randomizations. Can I compute the FDR value using more randomizations
than the default number?
Answer:The number of randomizations can be increased by first selecting a desired root
category for the FDR computation as described in the previous question. Select a category, then right click, and
choose
'More Robust FDR.' from the pop-up menu. A dialogue box will appear that allows you to
select a larger number of randomizations. This will recalculate the FDR's from that
node down to the bottom of the ontology. Categories that are not in the selected path
will have their FDR's cleared when you recalculate. Because the computation may take a few minutes,
we present a progress bar so that you will know when it is completed. You can interact with the system while the
FDR's are recalculated, but the FDR values aren't stable until the process is complete.
Question:I've noticed that the FDR values reported by GoMiner
can be greater than 1. In published literature I've seen FDR described in such a
way that leads me to believe that it should be between 0 and 1. Can you explain
why your FDR can sometimes be greater than 1?
Answer:FDR values can be greater than 1. The easiest way to see this is to imagine
that the real data came about by random sampling (in fact, if you took enough random
samples, eventually you would generate an instance of your real data). Now consider
the best p-value in this real data set - it will not be very good, since it came
from randoms. In fact, maybe several of the randoms that are used for computing
FDR generate p-values as good as or better than this one. In that case, the
number of randoms generating that p-value would exceed the number of reals
generating that p-value, so the FDR is > 1.
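The argument can be made concrete with a deliberately simplified estimate: the average number of randomized categories at or below a p-value threshold, divided by the number of real categories at or below it. This sketch is only meant to show how such a ratio can exceed 1; it is not GoMiner's exact FDR procedure, and the function name simple_fdr and all p-values below are made up for illustration.

```python
def simple_fdr(real_pvalues, randomized_runs, threshold):
    """Mean count of randomized p-values <= threshold, divided by the real count.

    Returns None when no real category reaches the threshold, because the
    ratio is undefined in that case.
    """
    real_hits = sum(p <= threshold for p in real_pvalues)
    if real_hits == 0:
        return None
    random_hits = [sum(p <= threshold for p in run) for run in randomized_runs]
    expected_random = sum(random_hits) / len(random_hits)
    return expected_random / real_hits


# Real data that behaves like random data: its best p-value is only 0.04.
real = [0.04, 0.2, 0.5, 0.9]
randoms = [
    [0.01, 0.03, 0.6, 0.8],
    [0.02, 0.3, 0.7, 0.9],
    [0.04, 0.035, 0.5, 0.6],
]
print(simple_fdr(real, randoms, threshold=0.04) > 1)  # True
```

Here one real category reaches the threshold, but the randomized runs reach it 2, 1, and 2 times, so the ratio is 5/3 divided by 1, which is greater than 1.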
Symptom:Your total-gene file is very large (typically >10,000 entries), and the program hangs
while reading the total-gene file.
Solution:Start GoMiner from the command line as follows:
java -Xms256M -Xmx256M -jar gominer.jar
This will allocate more memory to run the application. Note that you need either to be in the same directory as
gominer.jar or to specify the path to gominer.jar. You can get more detail on working with the command interface
in Windows and Mac OS X.
Symptom:GoMiner freezes whenever I put "A" in my total gene list.
Solution:This is a side effect of a problem in the GO database.
The latest version of the GO database has corrected the problem.
The current version of GoMiner has a filter to address this problem if
you are using an older build of the database.
Download the latest copy of GoMiner and try your data set again.
We have also written a more detailed description
of the problem and our solution.
Symptom:You can't find gominer.jar or can't find something once gominer.jar has been unzipped.
Solution:The gominer.jar file should not be unzipped. If you are prompted by your web browser, just save the
file to your disk as is, and follow the rest of the instructions. If your browser automatically attempts to
unzip the file, try right-clicking (control-click on a Mac) from your browser when you download the file
to explicitly select the save file option.
Symptom:You downloaded the GoMiner application a couple weeks ago, and something is not
working.
Solution:We are regularly updating the GoMiner tool. If you encounter a problem, we may have already fixed it.
It may be helpful to download and use the latest version of the application to see if your problem persists.
Symptom:Application becomes sluggish and unresponsive.
Solution:Your machine may be running out of memory. Shut down GoMiner and other applications that you are not
currently using, and restart GoMiner.
Symptom:It takes a very long time to load the GO Terms and the gene input files.
Solution:You may be having problems connecting to our database server. You may get better performance by
establishing your own copy of the database. Steps for installing a local copy of the database are found on the installation page.
Symptom:You are trying to install a local copy of the database using MySQL. You receive the error message,
"Error: The used command is not allowed with this MySQL version, when using table: tablename.txt".
Solution:Add -L to the mysqlimport command. E.g. "mysqlimport -L -uUSERNAME -pPASSWORD DATABASE
\go_200301-assocdb-tables\tablename.txt"
Symptom:You are trying to Load GO terms from the remote server. You receive the error message,
"General error: Table 'GEEVS.edit_session' doesn't exist".
Solution:You are using an old version of GoMiner that was designed to access the GO database prior to the GO
Consortium modification of the database schema. You need to download the most recent version of gominer.jar.
Symptom:You are trying to Load GO terms from the remote server. You receive the error message,
"cannot establish
connection to database".
Solution:This is likely a local firewall issue. If port 3306 is blocked by your site's firewall, then the
problem described above will surface. Your system administrator should be able to change the firewall parameter
settings to permit you to connect with our server.
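Before contacting your administrator, you can check from your own machine whether the port is reachable. The following is a minimal sketch using Python's standard socket module; the function name port_reachable is hypothetical, and you must supply the database server's host name yourself, since it is not given here.

```python
import socket


def port_reachable(host, port=3306, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and host name lookup failures.
        return False
```

Calling port_reachable with the server's host name returns False when the connection is refused, times out, or the name cannot be resolved. A False result is consistent with a firewall block but does not prove it, since the server itself may also be unavailable.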
Symptom:When using the GoMiner database install wizard, or during the mysqlimport step when
installing manually, you get an error like the
following, ERROR 1064 at line 18: You have an error in your SQL syntax. Check the manual that corresponds to
your MySQL server version for the right syntax to use near 'DEFAULT CHARSET=latin1' at line 10.
Solution: There are several possible solutions. If you are using the database install wizard, update to the latest version, in which we have provided a fix for this error. If
you are installing the database manually, you can edit the .sql files to remove the expression DEFAULT
CHARSET=latin1
from the files. Lastly, upgrading to a newer version of mysqlimport should also address the
problem.
Symptom:You cannot view downloaded 3D molecular structures from the NCBI ENTREZ STRUCTURE
database.
Solution:You may need to download and install the Cn3D viewer. Steps for installing the viewer are found on
the installation page.
Symptom:You are entering Arabidopsis gene names into GoMiner, but few or none of the
genes appear to have annotations.
Solution: Arabidopsis gene names in the GO database are given a suffix such as ".1",".2", etc. Since
GoMiner performs exact matches on the input against the database entry, the input gene list needs to have these
suffixes. According to TAIR, the ".1" and ".2" suffixes are used to distinguish splice variants. Most
Arabidopsis genes do not have splice variants and therefore have only a ".1" suffix. Add the appropriate
suffix to your input files and retry GoMiner.
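Adding the suffix by hand is tedious for long lists, so a short script can help. This is a sketch under stated assumptions: the function name add_default_suffix is hypothetical, identifiers that already end in a dot followed by digits are left unchanged, and every other identifier receives ".1". Genes with several splice variants still need their other suffixes added explicitly.

```python
import re

# Matches identifiers that already end in a version-style suffix such as ".1".
_HAS_SUFFIX = re.compile(r"\.\d+$")


def add_default_suffix(gene_ids, suffix=".1"):
    """Append the default suffix to identifiers that do not have one yet."""
    result = []
    for gene_id in gene_ids:
        gene_id = gene_id.strip()
        if not gene_id:
            continue  # skip blank lines
        if _HAS_SUFFIX.search(gene_id):
            result.append(gene_id)
        else:
            result.append(gene_id + suffix)
    return result


print(add_default_suffix(["AT1G01010", "AT1G01020.2", " AT5G67640 "]))
# ['AT1G01010.1', 'AT1G01020.2', 'AT5G67640.1']
```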
Symptom:You are entering TIGR identifiers into GoMiner, but none of the identifiers
appear to have annotations.
Solution:
We would like to hear from you. You can reach the team via email.
GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.