GoMiner Help

Question:Can I use GoMiner™ if I do not have a total-genes file?
Answer:Microarray users will have a natural total-genes file: the list of all genes represented on the microarray. Other users (e.g., proteomics) may not have a natural total-genes file. The Auto-generate function fills this need by computing an artificial total-genes list.

The computation is performed by issuing an SQL query to the GO database. This query is delimited by the parameter choices (such as organism) selected in the user interface (UI). Although the artificial list is not as precise as a user-supplied file, it can provide an approximate background reference against which to estimate the statistical values for enrichment of categories by the changed genes.

Question:If I am providing a changed file, how does that relate to the total file?
Answer:We strongly encourage you to make your changed file a subset of the total file. In fact, when you are using High-Throughput GoMiner, you will receive an error if this is not the case. If you are using the suffix notation, then each suffixed variant should be in the total file. For example, if you have BAD~1, BAD~2 and BAD~3 in your changed file, then all three should be in the total file. You must follow these guidelines when submitting documents to High-Throughput GoMiner. If you are using GUI GoMiner, we recommend that you follow these guidelines, but GUI GoMiner will tolerate some variation. If you have some extra identifiers in the changed file that are not in the total file, GUI GoMiner will not throw an error, but will ignore the unmatched entries.
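
The subset rule can be checked before submitting files. The following is a minimal Python sketch and is not part of GoMiner. It assumes one entry per line with the identifier in the first tab-separated field, and that matching is exact, suffix included; the function names are hypothetical.

```python
def read_identifiers(lines):
    """Collect the first tab-separated field of every non-blank line."""
    identifiers = set()
    for line in lines:
        field = line.strip().split("\t")[0].strip()
        if field:
            identifiers.add(field)
    return identifiers


def unmatched_changed(changed_lines, total_lines):
    """Return the changed identifiers that are missing from the total list."""
    changed = read_identifiers(changed_lines)
    total = read_identifiers(total_lines)
    return sorted(changed - total)


# Illustrative data only; with real files, pass open(path) for each argument.
total = ["BAD~1", "BAD~2", "BAD~3", "TP53"]
changed = ["BAD~1", "BAD~2", "BAD~3", "MYC"]
print(unmatched_changed(changed, total))  # ['MYC']
```

An empty result means the changed file is a subset of the total file.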

Question:I have a microarray whose probes were specifically chosen to study a particular biomedical problem (e.g. an oncochip). When I used the list of all of the probe targets as the basis of the total-genes file in GoMiner, I had a hard time identifying significant categories because the list of total genes is already very focused on a handful of GO categories. Is there a way to address this issue in GoMiner?
Answer:You can use the autogenerate feature instead of the traditional total-genes file, to partially compensate for the bias in this scenario. You can see the significant categories that would otherwise be obscured because of the bias in selection of the genes that make up the chip. The disadvantage of the approach is that the statistical results are only approximately correct. We thank Martin Cadeiras for contributing to the idea that led to this particular application of the autogenerate feature.

Question:Can GoMiner take as input identifiers besides gene names?
Answer:Yes it can. GoMiner works with all of the types of identifiers that are included in the GO Consortium database. We have several different lookup options for users to tune how these identifiers are queried. If you need to do additional identifier processing, we also provide the tool MatchMiner.

Question:How can I convert a list of GenBank gi numbers to an identifier type that is recognized by GoMiner?
Answer:For these identifiers, the best bet is to use the Biothesaurus search engine developed by Hongfang Liu of the PIR team. The specific example below works for the challenging case of a proteomics user converting D. rerio gi numbers to gene names, to match the ZFIN entries in the GO database. The basic approach can be used for other organisms as well.

Go to Biothesaurus and select GI Number in the ID Type menu.
Copy and paste the list of gi numbers, one per line, into the Query IDs text entry box.
Click on Retrieve.
Click on Display Options.
Select Gene Name from Fields Not in Display menu.
Click on the > to the right of the Fields Not in Display menu to move Gene Name into the Fields in Display menu.
Select each item in the Fields in Display menu except for Gene Names and Matched Fields.
Click on the < to the right of the Fields Not in Display menu to move the selected items into the Fields Not in Display menu.
Click in the Protein AC/ID checkbox.
Click on Save Results As: Table.
You will need to do a little bit of parsing of the output file to retrieve the gene names.
The list of gene names can be submitted to GoMiner.

Question:I changed my database settings, but I want to reset them to the original values. What are those?
Answer: For the MySQL adapter the settings are the following:

Database Driver: com.mysql.jdbc.Driver
Database URL: jdbc:mysql://discover.nci.nih.gov:1521/GEEVS?autoReconnect=true
Username: deploy
Password: selectonly

For the Derby adapter the settings are the following:

Database Driver: org.apache.derby.jdbc.EmbeddedDriver
Database URL: jdbc:derby:goMinerDerbyDB
Username:
Password:

Question:Should I use GoMiner or High-Throughput GoMiner?
Answer:It depends on your situation. We have added some guidance on the Getting Started page to help determine which version is most suitable.

Question:I am on a Mac, and I would like to use Firefox as my web browser for viewing SVG's, but GoMiner launches Safari instead.
Answer:GoMiner will launch whatever browser is set as the default browser for your operating system. To change this to Firefox, go to the Firefox Preferences, and select the option to check for Firefox as the default browser. Then, launch Firefox, and when prompted, select Firefox as the default web browser.

Question:On the Mac, which is the best browser for viewing SVG's?
Answer:On the Mac, if you are interested in panning and zooming, use Safari with the Adobe SVG viewer. If you want the search by gene or category features, use Firefox.

Question:I am on an Intel Mac, and although I have installed the plug-in, the SVG's don't appear. What do I do?
Answer:Select your web browser application icon, and choose Get Info from the File menu in the Finder. Check the box that says Open With Rosetta. Then, restart your browser. The plugin should now work.

Question:There is an icon beside some of the genes. What does this mean?
Answer:It indicates that there is data available for this gene in the NCBI ENTREZ STRUCTURE database. If data is available, then the NCBI_STRUCTURE menu option will be active on the menu that appears when you right-click (control-click on the Mac) on the gene.

Question:How can I access VennMaster Diagrams directly from within GoMiner?
Answer:Select the desired root category from the 'Genes Mapped on GO' tab on the right GoMiner panel, and right-click to invoke the context menu. Select 'VennMaster view of changed category (SVG)' from the menu.

Question:I am trying to generate a VennMaster image, but I am just getting a blank panel.
Answer:You may be getting an OutOfMemory error. Restart GoMiner with more memory (see below) and try again.

Question: Can GoMiner analyze splice variants?
Answer:GoMiner and High-Throughput GoMiner (referred to in the following collectively as 'GoMiner') traditionally dereplicate total and changed gene input files so that only one instance of a gene name is processed. When multiple alternatively spliced forms are to be analyzed, however, dereplication would result in a loss of relevant information. Consequently, we have added a new feature to GoMiner to retain full information about the alternative splice variants by replicating the input of each gene by the number of alternative exons per gene in total and changed gene input files.

As a specific example, suppose that a microarray platform contained probes that were unique for two different splice variants of BRCA1. Then the two splice variants would be designated in the input files as 'BRCA1~1' and 'BRCA1~2'. The '~' tells GoMiner to treat these as different entries, rather than to de-replicate them, but to ignore the suffix when querying the GO database. By this mechanism, all splice variants are counted when computing the Fisher exact p-value.

The total-genes file must still be a superset of the changed-genes file, so in this previous example, you would need both BRCA1~1 and BRCA1~2 in the total file as well.
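
The suffix convention described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than GoMiner's implementation: it assumes that everything from the first '~' onward is the arbitrary tag, and the helper names are hypothetical.

```python
from collections import Counter


def base_name(identifier):
    """Drop the '~suffix' tag, leaving the name used for the GO lookup."""
    name, _, _ = identifier.partition("~")
    return name


def count_variants(identifiers):
    """Count the distinct entries kept for each base gene name.

    Exact duplicates are de-replicated; entries that differ only in
    their suffix are kept as separate entries.
    """
    return Counter(base_name(entry) for entry in set(identifiers))


entries = ["BRCA1~1", "BRCA1~2", "BRCA1~2", "TP53"]
counts = count_variants(entries)
print(counts["BRCA1"], counts["TP53"])  # 2 1
```

Here the repeated 'BRCA1~2' collapses to one entry, while 'BRCA1~1' and 'BRCA1~2' are both counted under BRCA1.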

In general, the GO database does not seem to differentiate between different biological functional properties of splice variants, and the GoMiner strategy reflects this by using the suffix as an arbitrary tag. We are developing an extension of this approach in which the suffix will literally identify a particular splice variant. We hope that the GO database will gradually become enriched in differential biological functional annotations for splice variants.

An example of the use of this new feature can be found in a paper entitled "Nova regulates brain-specific splicing to shape the synapse" by Jernej Ule, Alja Ule, Joanna Spencer, Alan Williams, Jing-Shan Hu, Melissa Cline, Hui Wang, Tyson Clark, Claire Fraser, Matteo Ruggiu, Barry R Zeeberg, David Kane, John N Weinstein, John Blume, and Robert B Darnell (Nature Genetics 37, 844 - 852 (2005))

Question:What is the difference between the Smallest Category Size for Category Statistics and the Largest Category Size to Include in CIM?
Answer:The Smallest Category Size for Category Statistics parameter is broader. It is used in all variants of GoMiner. Categories whose size is less than this threshold will be omitted from category statistic calculations. Many reports and displays will still include these categories, but they won't have p-values, enrichment ratios, or FDR's. This threshold is also used to filter smaller randomized categories when determining the FDR. The Largest Category Size to Include in CIM has a more limited scope. It only affects the CIM's, a report-type in HTGM. It eliminates the categories above the threshold from the category gene matrixes.
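
The two thresholds act in opposite directions, which a small Python sketch may make clearer. This is only an illustration of the filtering rules as described in the answer above; the category names, sizes, and function names are hypothetical.

```python
def categories_for_statistics(category_sizes, smallest_size):
    """Keep categories whose size is at least the smallest-size threshold."""
    return {name for name, size in category_sizes.items() if size >= smallest_size}


def categories_for_cim(category_sizes, largest_size):
    """Keep categories whose size does not exceed the largest-size threshold."""
    return {name for name, size in category_sizes.items() if size <= largest_size}


# Hypothetical category sizes, for illustration only.
sizes = {"category_a": 3, "category_b": 40, "category_c": 900}
print(sorted(categories_for_statistics(sizes, smallest_size=5)))
print(sorted(categories_for_cim(sizes, largest_size=500)))
```

With these values, category_a is too small to receive statistics and category_c is too large to appear in a CIM.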

Question:How do I set the Smallest Category Size in GUI GoMiner?
Answer:It is an option when you choose to "Select Root and Recalculate Statistics." This option is available when you right-click a category (control-click on a Mac) in the navigation tree.

Lookup Settings Questions

Question:How do I choose which Lookup Settings options to select?
Answer:There are four fields in the enhanced GO database that can be used for matching your input identifiers. The primary field is 'symbol,' but certain types of identifiers require searching 'synonym,' 'dbxref,' or 'officialname' fields. The default setting in the user interface will result in searching all four fields. However, using the default setting may be too 'promiscuous' for a user who has a very specific type of search in mind, and it can be more expensive computationally than a properly restricted search. In the next several FAQs we describe some special instances that use fields other than 'symbol.'

Question:What are the proper Lookup Settings options for input containing HGNC symbols?
Answer:You should choose 'UniProt' as your Datasource and 'Enhanced Names (UniProt Only)' as your Lookup Settings. De-select the Lookup Settings 'Cross Reference' and 'Synonym' options.

Question:How can I query with FlyBase (FB) identifiers that contain prefixes like '&' or the names of Greek letters?
Solution:For broadest coverage of these identifier types, select the 'synonym' option in the 'Lookupsettings' Menu. This should cover both the SGML-style (e.g. &) and the text style (e.g. alpha) prefixes. GoMiner will then automatically recognize and process both prefix formats. For more information on these identifiers refer to the FlyBase Reference Manual and their page on encodings.

Question:Can I query GoMiner using ORF format identifiers for yeast?
Solution:A number of researchers in the yeast community use the ORF format identifier. GoMiner supports this format if you select the 'synonyms' option of Lookup settings. We have some sample files that use these identifiers available.

False Discovery Rate (FDR) Questions

Question:Can you explain the p-Values and FDR's?
Answer:The current implementation of GoMiner uses a one-sided p-value calculated from a Fisher's exact test. You can think of the p-value as computed from the enrichment AND from the size (i.e., the total number of genes in that category) of the category. To get a low p-value, you need both a fairly good enrichment AND a fairly large size of category. It would be unusual for a small category like N=2 to have a low p-value, because the formula for computing p yields fairly high p-values when N is low. Even if the p-value were low, statisticians usually would not take seriously anything with N<5. You can read more about the p-value calculation in our original GoMiner paper.
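
The dependence of the p-value on both enrichment and category size can be seen in a small Python sketch. It assumes that the one-sided test is the upper tail of the hypergeometric distribution (the probability of seeing at least the observed number of changed genes in the category); the original GoMiner paper remains the authoritative description, and the function name and numbers below are illustrative only.

```python
from math import comb  # requires Python 3.8 or later


def fisher_one_sided(total, in_category, changed, changed_in_category):
    """Upper-tail hypergeometric probability.

    total: number of genes in the total file
    in_category: number of total genes in the category (N)
    changed: number of genes in the changed file
    changed_in_category: number of changed genes in the category
    """
    upper = min(in_category, changed)
    tail = sum(
        comb(in_category, i) * comb(total - in_category, changed - i)
        for i in range(changed_in_category, upper + 1)
    )
    return tail / comb(total, changed)


# Same enrichment (half of the category is changed, against a 10% background),
# but very different category sizes.
small = fisher_one_sided(total=1000, in_category=2, changed=100, changed_in_category=1)
large = fisher_one_sided(total=1000, in_category=20, changed=100, changed_in_category=10)
print(small, large)
```

In this example the small category yields a much higher p-value than the large one, even though the enrichment ratio is identical.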

The FDR is actually a comparison of the number of times that the real data had a certain p-value versus the number of times that randomized data has the same or better p-value. The point is that random data will give you a few categories that look like they have low p-values. The FDR procedure more or less subtracts this background of p-values that are good by random chance. So your data needs to have a distribution of p-values that is better than random for us to report a positive result. The FDR addresses the multiple comparison problem that occurs when calculating the p-values for hundreds or thousands of categories, and protects against over-interpreting p-values that do not have a biological meaning. The FDR calculation is described in more detail in our High-Throughput GoMiner paper.

Question:I noticed that in High-Throughput GoMiner the root node that is used for the FDR computation is 'biological_process,' but the root node used for the FDR computation in GUI GoMiner is 'all.' Can I change the root node used for the FDR computation in GUI GoMiner to 'biological_process?'
Answer:Select a category, then right click, and choose 'More Robust FDR.' from the pop-up menu. This will recalculate the FDR's from that node down to the bottom of the ontology. Categories that are not in the selected path will have their FDR's cleared when you recalculate.

Question:I get a warning message -- "Are you sure you want to generate the report from a different root than was used to generate the statistics" -- from GUI GoMiner when I try to download a results file. What should I do?
Answer:The message means that the category you selected for generating your report does not match the one that was used to generate the FDR, p-value, and enrichment ratios. By default, these statistics are calculated using the top node, Gene Ontology, as the base. You can do one of three things:

Change what category you are using to generate your report, so that it is the same one that was used to generate the statistics. If you have not tried to update the statistics, this means generate the report from the Gene Ontology Category.
Recalculate the statistics from the category that you want to use as the base of your report, and then generate your report from the same category.
Ignore the message and proceed, in which case the statistics will still be correct for the context from which they were generated, but the report may not completely cover that context.

Question:I need greater accuracy in the FDR value than that afforded by the default number of 5 randomizations. Can I compute the FDR value using more randomizations than the default number?
Answer:The number of randomizations can be increased by first selecting a desired root category for the FDR computation as described in the previous question. Select a category, then right click, and choose 'More Robust FDR.' from the pop-up menu. A dialogue box will appear that allows you to select a larger number of randomizations. This will recalculate the FDR's from that node down to the bottom of the ontology. Categories that are not in the selected path will have their FDR's cleared when you recalculate. Because the computation may take a few minutes, we present a progress bar so that you will know when it is completed. You can interact with the system while the FDR's are recalculated, but the FDR values aren't stable until the process is complete.

Question:I've noticed that the FDR values reported by GoMiner can be greater than 1. In published literature I've seen FDR described in such a way that leads me to believe that it should be between 0 and 1. Can you explain why your FDR can sometimes be greater than 1?
Answer:FDR values can be greater than 1. The easiest way to see this is to imagine that the real data came about by random sampling (in fact, if you took enough random samples, eventually you would generate an instance of your real data). Now consider the best p-value in this real data set: it will not be very good, since it came from randoms. In fact, maybe several of the randoms that are used for computing FDR generate p-values as good as or better than this one. In that case, the number of randoms generating that p-value would exceed the number of reals generating that p-value, so the FDR is > 1.
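
The argument above can be made concrete with a simplified Python sketch. It is not the GoMiner implementation (the High-Throughput GoMiner paper describes the actual procedure); it simply assumes that the FDR at a given p-value is the mean number of randomized categories reaching that p-value or better, divided by the number of real categories doing so. The function name and all p-values below are made up for illustration.

```python
def estimate_fdr(real_pvalues, random_runs, threshold):
    """Mean count of random p-values <= threshold per randomization,
    divided by the count of real p-values <= threshold."""
    real_hits = sum(1 for p in real_pvalues if p <= threshold)
    if real_hits == 0:
        raise ValueError("no real category reaches the threshold")
    random_hits = [sum(1 for p in run if p <= threshold) for run in random_runs]
    mean_random_hits = sum(random_hits) / len(random_runs)
    return mean_random_hits / real_hits


# Real data that looks no better than random: the best real p-value is 0.04.
real = [0.04, 0.2, 0.5, 0.9]
randoms = [
    [0.03, 0.01, 0.6, 0.8],   # two categories at 0.04 or better
    [0.02, 0.3, 0.7, 0.9],    # one
    [0.04, 0.035, 0.5, 0.6],  # two
]
fdr = estimate_fdr(real, randoms, threshold=0.04)
print(fdr > 1)  # True
```

One real category reaches p = 0.04, but the randomizations average more than one such category each, so the ratio exceeds 1.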

General Problems

Symptom:Your total-gene file is very large (typically >10,000 entries), and the program hangs while reading the total-gene file.
Solution:Start GoMiner from the command line as follows:
java -Xms256M -Xmx256M -jar gominer.jar
This will allocate more memory to run the application. Note you either need to be in the same directory as gominer.jar, or you need to specify the path to gominer.jar. You can get more detail on working with the command interface in Windows and Mac OS X.

Symptom:GoMiner freezes whenever I put "A" in my total gene list
Solution:This is a side effect of a problem in the GO database. The latest version of the GO database has corrected the problem. The current version of GoMiner has a filter to address this problem if you are using an older build of the database. Download the latest copy of GoMiner and try your data set again. We have also written a more detailed description of the problem and our solution.

Symptom:You can't find gominer.jar or can't find something once gominer.jar has been unzipped.
Solution:The gominer.jar file should not be unzipped. If you are prompted by your web browser, just save the file to your disk as is, and follow the rest of the instructions. If your browser automatically attempts to unzip the file, try right-clicking (control-click on a Mac) from your browser when you download the file to explicitly select the save file option.

Symptom:You downloaded the GoMiner application a couple weeks ago, and something is not working.
Solution:We are regularly updating the GoMiner tool. If you encounter problems, we may have already fixed them. It may be helpful to download and use the latest version of the application to see if your problem persists.

Symptom:Application becomes sluggish and unresponsive.
Solution:Your machine may be running out of memory. Shut down GoMiner and other applications that you are not currently using, and restart GoMiner.

Symptom:It takes a very long time to load the GO Terms and the gene input files.
Solution:You may be having problems connecting to our database server. You may get better performance by establishing your own copy of the database. Steps for installing a local copy of the database are found on the installation page.

Symptom:You are trying to install a local copy of the database using MySQL. You receive the error message, "Error: The used command is not allowed with this MySQL version, when using table: tablename.txt".
Solution:Add -L to the mysqlimport command. E.g. "mysqlimport -L -uUSERNAME -pPASSWORD DATABASE \go_200301-assocdb-tables\tablename.txt"

Symptom:You are trying to Load GO terms from the remote server. You receive the error message, "General error: Table 'GEEVS.edit_session' doesn't exist".
Solution:You are using an old version of GoMiner that was designed to access the GO database prior to the GO Consortium modification of the database schema. You need to download the most recent version of gominer.jar

Symptom:You are trying to Load GO terms from the remote server. You receive the error message, "cannot establish connection to database".
Solution:This is likely a local firewall issue. If port 3306 is blocked by your site's firewall, then the problem described above will surface. Your system administrator should be able to change the firewall parameter settings to permit you to connect with our server.
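
Before contacting your system administrator, you can test whether the port is reachable from your machine. The following Python sketch is a generic TCP check, not a GoMiner feature; the function name is hypothetical, and the host and port must be supplied by you.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, can_connect("discover.nci.nih.gov", 3306) uses the host from the default database URL and the port named above. A False result is consistent with a firewall block, although it can also mean that the server is down or the host name cannot be resolved.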

Specific Problems

Symptom:When using the GoMiner database install wizard, or during the mysqlimport step when installing manually, you get an error like the following: ERROR 1064 at line 18: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'DEFAULT CHARSET=latin1' at line 10.
Solution: There are several possible solutions. If you are using the database install wizard, update to the latest version; we have provided a fix for this error. If you are installing the database manually, you can edit the .sql files to remove the expression DEFAULT CHARSET=latin1 from the files. Lastly, upgrading to a newer version of mysqlimport should also address the problem.

Symptom:You cannot view downloaded 3D molecular structures from the NCBI ENTREZ STRUCTURE database.
Solution:You may need to download and install the Cn3D viewer. Steps for installing the viewer are found on the installation page

Symptom:You are entering Arabidopsis gene names into GoMiner, but few or none of the genes appear to have annotations.
Solution: Arabidopsis gene names in the GO database are given a suffix such as ".1", ".2", etc. Since GoMiner performs exact matches on the input against the database entry, the input gene list needs to have these suffixes. According to TAIR, the ".1" and ".2" suffixes are used to distinguish splice variants. Most Arabidopsis genes do not have splice variants and therefore have only a ".1" suffix. Add the appropriate suffix to your input files and retry GoMiner.
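
Adding the suffix by hand is tedious for long lists, so a small script can help. The Python sketch below is an illustration only and is not part of GoMiner: it assumes that appending ".1" is appropriate for every identifier that lacks a numeric suffix, which holds only for genes without additional splice variants. The function name and example identifiers are illustrative.

```python
import re

# Matches a trailing ".<digits>" suffix such as ".1" or ".2".
_HAS_VARIANT_SUFFIX = re.compile(r"\.\d+$")


def add_default_suffix(identifier, default=".1"):
    """Append the default suffix unless the identifier already has one."""
    identifier = identifier.strip()
    if _HAS_VARIANT_SUFFIX.search(identifier):
        return identifier
    return identifier + default


genes = ["AT1G01010", "AT1G01040.2"]
print([add_default_suffix(g) for g in genes])  # ['AT1G01010.1', 'AT1G01040.2']
```

Genes with more than one splice variant still need each variant listed explicitly with its own suffix.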

Symptom:You are entering TIGR identifiers into GoMiner, but none of the identifiers appear to have annotations.
Solution:

TIGR

GO annotation

TIGR

TIGR_TGI:Human_THC2011962

TIGR_TGI:Mouse_TC1242654

We would like to hear from you. You can reach the team via email.

GoMiner was originally developed jointly by the Genomics and Bioinformatics Group (GBG) of LMP, NCI, NIH and the Medical Informatics and Bioimaging group of BME, Georgia Tech/Emory University. It is now maintained and under continuing development by GBG.
