Application Build: 246, Database Build: 2008-04
Local Database and Gene Enhancement Installation
Thanks to Dr. Wilhelm Schneiderhan, Mary Ann Wilson, Kevin Kransler and Edward Lobenhofer for providing these installation instructions.
Note: These instructions follow the convention that terms in UPPER CASE AND UNDERLINED should be replaced with
the specifics of your local installation.
There are two ways to install a local MySQL database.
- For users running under a Unix-based operating system, there is an
automated script. Download
and uncompress the script. The instructions for running and configuring
the script are in the enclosed README file.
- The rest of this page documents the manual procedure for
installing a local database.
You can also read a description of the GO database and the algorithm for
these enhancements. We also have an alternative local database implementation using Derby that is simpler, but less flexible.
Install GO Database Locally:
When you download GoMiner, it is configured to query a database hosted on the GoMiner site.
You may wish to run a copy of the database for performance or for data customization.
- Install a copy of the MySQL database server if it is not already installed.
We have tested GoMiner against MySQL version 5.0. Earlier versions of GoMiner were tested with MySQL versions 3.23 and 4.0, and should still work. An example of this installation can be found
here. Make a note of the root password that you assign. During the MySQL installation process,
you must assign a USERNAME and a PASSWORD if one is not already assigned. These are used in the steps below.
- Log on to the MySQL command-line client as the root user, with the password defined during installation:
mysql -uroot -pPASSWORD
Replace PASSWORD
with the root password defined during the MySQL installation.
- At the mysql prompt, run the following to create the database:
mysql> create database IF NOT EXISTS DATABASE;
Replace DATABASE
with the name you would like this new database instance to have.
- To create a user and grant that user full access, type the following at the mysql prompt:
mysql> grant ALL on *.* to 'USERNAME'@'SERVERNAME' identified by 'PASSWORD';
Replace USERNAME
with the new username that will be used to access the database, and PASSWORD with that user's password. If you are going to run and access MySQL on the same machine, you can use localhost
as the SERVERNAME. If you are going
to use MySQL from another machine, you can either specify the machine name or '%' (single quotes included). The MySQL documentation has more information about managing
privileges. When you first install MySQL, the root user is not assigned a password. It is also a good idea to give your root user
a password as described in the MySQL installation guide.
- Exit mysql at this point by typing the following at the mysql prompt:
mysql> exit
- The current version of GoMiner has been tested with the
Oct 2007 build of the database.
You can also find other builds of the database,
or you can try the latest version. Using one of the links provided above, download
the file whose name has the format go_YYYYMM-assocdb-tables.tar.gz.
It is a large file, so be patient. Extracting the compressed file should create a
directory (we refer to this directory as GO_DATA_DIR)
containing GO data files with .txt and .sql extensions.
The file is updated monthly; you can use the instructions in this section again
if you want to download the latest GO annotations at a later date and update
your database with the new file.
- From the command line, run the following commands to load the
database. USERNAME and PASSWORD must be the same as the ones assigned during the MySQL installation;
DATABASE should be the instance name established above with the create command;
GO_DATA_DIR should be the directory where the GO data files were extracted in the previous step.
Depending on the speed of your hardware, the mysqlimport step can take quite a
while to run. Also note that you may encounter a
MySQL version problem. The syntax differs based on the OS:
On Unix (inc. Mac OS X):
cd GO_DATA_DIR
cat *.sql | mysql -uUSERNAME -pPASSWORD -DDATABASE
mysqlimport -L -uUSERNAME -pPASSWORD DATABASE *.txt
On Windows:
Download the loadDB.bat script (Credits: Dr. Wilhelm Schneiderhan and Mary Ann Wilson):
- Right-click on the 'loadDB.bat' link
- Select the 'Save As' menu. Make sure to retain the '.bat' extension
- Choose the GO_DATA_DIR directory
Then, execute the script
cd GO_DATA_DIR
loadDB.bat . USERNAME PASSWORD DATABASE
Wait for all database files to load. There will be periods when it seems like everything has frozen, but the script is still working.
This will take an extended period of time. Instead of making yourself a cup of coffee, splurge a bit, go out and buy one.
By the time you get back, it might be done. Everything will be complete when the command prompt returns.
- When it is done, leave the DOS window open if you are proceeding with Step 3; otherwise, close the command prompt.
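On Unix, the database setup and the two load commands above can be collected into a small script. The following is a minimal sketch, not part of the GoMiner distribution: the function names `print_setup_sql` and `load_go_data` are hypothetical, and it assumes a MySQL version that supports `CREATE USER` (5.0.2 or later) and a password without single quotes. Unlike the grant statement above, it grants privileges only on DATABASE rather than on `*.*`, and it creates the account with a separate `CREATE USER`, because MySQL 8.0 no longer accepts `IDENTIFIED BY` inside `GRANT`.

```shell
#!/bin/sh
# Hypothetical helpers sketching the manual Unix install steps.

# Print SQL that creates the database and an account with full access to it.
# Arguments: DATABASE USERNAME SERVERNAME PASSWORD
# Assumes none of the values contain single quotes.
print_setup_sql() {
    db=$1 user=$2 host=$3 pass=$4
    printf "CREATE DATABASE IF NOT EXISTS %s;\n" "$db"
    printf "CREATE USER '%s'@'%s' IDENTIFIED BY '%s';\n" "$user" "$host" "$pass"
    printf "GRANT ALL ON %s.* TO '%s'@'%s';\n" "$db" "$user" "$host"
}

# Load an extracted GO dump: run every *.sql file, then import every *.txt file.
# Arguments: GO_DATA_DIR USERNAME PASSWORD DATABASE
load_go_data() {
    dir=$1 user=$2 pass=$3 db=$4
    # Refuse to run if the directory does not look like an extracted GO dump.
    set -- "$dir"/*.sql
    if [ ! -f "$1" ]; then
        echo "no .sql files found in $dir" >&2
        return 1
    fi
    set -- "$dir"/*.txt
    if [ ! -f "$1" ]; then
        echo "no .txt files found in $dir" >&2
        return 1
    fi
    # Same two commands as the Unix instructions, run from inside GO_DATA_DIR
    # (mysqlimport derives each table name from its file name).
    (
        cd "$dir" || exit 1
        cat *.sql | mysql -u"$user" -p"$pass" -D"$db" || exit 1
        mysqlimport -L -u"$user" -p"$pass" "$db" *.txt
    )
}

# Usage (ROOT_PASSWORD and the other upper-case names are placeholders):
#   print_setup_sql DATABASE USERNAME localhost PASSWORD | mysql -uroot -pROOT_PASSWORD
#   load_go_data GO_DATA_DIR USERNAME PASSWORD DATABASE
```

Checking for both file types first means a wrong GO_DATA_DIR fails immediately instead of after a long, partial import.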
Run GoMiner:
- When you run GoMiner, change the JDBC URL in the Load Go Terms dialog box to reference your machine and database
instance name
(e.g. jdbc:mysql://YOURDATABASEMACHINEDOMAINNAME/DATABASE
or
jdbc:mysql://YOURDATABASEMACHINEIPNUMBER/DATABASE).
- Change Username and Password in the Load Go Terms dialog box to your USERNAME and PASSWORD
- GoMiner "remembers" the most recently used JDBC URL, USERNAME, and PASSWORD, so you will only need to type
these in again if you wish to access a different database than the one you used the last time you ran GoMiner.
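Before typing the connection details into the dialog, it can help to build the URL in one place and confirm the same credentials work from the command line. This is a minimal sketch with hypothetical function names; the optional PORT argument is an assumption not mentioned above (MySQL Connector/J uses port 3306 when none is given).

```shell
#!/bin/sh
# Hypothetical helper: print the JDBC URL for the Load Go Terms dialog.
# Arguments: HOST (domain name or IP address) DATABASE [PORT]
gominer_jdbc_url() {
    host=$1 db=$2 port=${3:-}
    if [ -n "$port" ]; then
        printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$db"
    else
        printf 'jdbc:mysql://%s/%s\n' "$host" "$db"
    fi
}

# Hypothetical helper: check that USERNAME/PASSWORD can reach DATABASE on HOST
# with the mysql client, using the same values GoMiner will use.
# Arguments: HOST USERNAME PASSWORD DATABASE
check_db_login() {
    mysql -h "$1" -u"$2" -p"$3" -e 'SELECT 1' "$4" >/dev/null
}

# Usage (placeholders):
#   check_db_login YOURDATABASEMACHINEDOMAINNAME USERNAME PASSWORD DATABASE &&
#       gominer_jdbc_url YOURDATABASEMACHINEDOMAINNAME DATABASE
```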
Enhanced Annotations Mapping (Required In Order To Be Able To Use HUGO Names):
This enhancement is a translation mechanism that allows GoMiner to search for
GO associations from the UniProt Data Source by gene name. This enhancement may be useful for
expanding the query space for all organisms, but it is essential for using HUGO names in queries.
The enhancement does not
add any genes that are not already present in UniProt. Note that these
scripts only work with GO database builds from December 2003 or later. Earlier GO database
builds will not work correctly with these scripts. A description
of the algorithms used for this enhancement is available. The steps for applying the enhancement to a local
instance of the database are listed here:
STEP 0: Prerequisite
- You need to have completed installing a local copy of the GO database to get started.
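One way to confirm this prerequisite is to list the tables in DATABASE and check that the GO tables are there. This is a sketch under assumptions: the helper name `require_tables` is hypothetical, and the table names `term` and `gene_product` are assumed from the GO database schema rather than stated in these instructions.

```shell
#!/bin/sh
# Hypothetical helper: read table names (one per line) on stdin and
# succeed only if every table named as an argument is present.
require_tables() {
    tables=$(cat)
    for t in "$@"; do
        if ! printf '%s\n' "$tables" | grep -qx "$t"; then
            echo "missing table: $t" >&2
            return 1
        fi
    done
}

# Usage (placeholders; -N suppresses the column-name header line):
#   mysql -uUSERNAME -pPASSWORD -N -e 'SHOW TABLES' DATABASE | require_tables term gene_product
```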
STEP 1: Download UniProt Mapping Files:
- Download and decompress the UniProt Mapping file
(Zip or gzip format). The mapping file is generated from
UniProt (Swiss-Prot, TrEMBL and PIR) data. For the human entries, these names are further
filtered against those recognized by HUGO.
- Optional: You can also generate your own UniProt mapping file.
(These scripts can only be run on a Unix-based platform)
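The mapping file is offered in more than one compressed format, so a small helper can pick the matching tool. This is a minimal sketch with a hypothetical function name; it assumes the archive names end in `.zip`, `.tar.gz`/`.tgz`, or `.gz`, and extracts next to the archive in every case.

```shell
#!/bin/sh
# Hypothetical helper: decompress the downloaded UniProt mapping archive
# into the directory that contains it, whichever format was chosen.
unpack_mapping() {
    archive=$1
    dir=$(dirname "$archive")
    case "$archive" in
        *.zip)         unzip -o "$archive" -d "$dir" ;;
        *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dir" ;;
        *.gz)          gunzip -f "$archive" ;;   # writes the file beside the .gz
        *)             echo "unrecognised archive: $archive" >&2; return 1 ;;
    esac
}

# Usage (hypothetical file name):
#   unpack_mapping ./uniprotmapping.tar.gz
```

The `.tar.gz` pattern is listed before `.gz` because `case` uses the first pattern that matches.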
STEP 2: Load UniProt Enhancement Mapping:
- From the system command prompt, run the following commands to load the mapping into the database (with USERNAME, PASSWORD and DATABASE replaced
with their appropriate values):
mysql -uUSERNAME -pPASSWORD -DDATABASE < uniprotmapping.sql
mysqlimport -L -uUSERNAME -pPASSWORD DATABASE uniprotmapping.txt
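The two commands above can be wrapped so that a missing file is reported before anything is sent to MySQL. This is a minimal sketch; the function name and the optional directory argument are hypothetical, and the file names are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: load uniprotmapping.sql and uniprotmapping.txt.
# Arguments: USERNAME PASSWORD DATABASE [DIRECTORY_WITH_FILES]
load_uniprot_mapping() {
    user=$1 pass=$2 db=$3 dir=${4:-.}
    for f in uniprotmapping.sql uniprotmapping.txt; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f" >&2
            return 1
        fi
    done
    # Create the mapping table, then bulk-load its rows.
    mysql -u"$user" -p"$pass" -D"$db" < "$dir/uniprotmapping.sql" || return 1
    # Run mysqlimport from the file's directory so the table name is
    # derived from the plain file name "uniprotmapping.txt".
    (cd "$dir" && mysqlimport -L -u"$user" -p"$pass" "$db" uniprotmapping.txt)
}

# Usage (placeholders):
#   load_uniprot_mapping USERNAME PASSWORD DATABASE .
```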
STEP 3: Enhance GO Database with UniProt Mapping data:
- Note: To complete this step, Java runtime version 1.3 or higher is needed.
- Download the Enhancer Script file.
- From the system command prompt, run the following command to enhance the GO database (with USERNAME, PASSWORD, DATABASE and TAXA_ID replaced with their appropriate values):
Note:
- To process a specific set of organisms, replace TAXA_ID with semicolon-separated NCBI Taxonomy IDs. To process every organism, replace TAXA_ID with all.
- This process could take anywhere from several minutes to 24 hours to complete, depending on the number of species selected and the available hardware.
java -jar UpdateGeneProduct.jar jdbc:mysql://localhost/DATABASE com.mysql.jdbc.Driver USERNAME PASSWORD DATABASE TAXA_ID
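On Unix shells a semicolon ends a command, so a TAXA_ID such as two IDs joined by ";" must be quoted or the shell will split the java command. The sketch below builds the list and passes it quoted; both function names are hypothetical, and the wrapper simply repeats the command above with localhost as the database host.

```shell
#!/bin/sh
# Hypothetical helper: join NCBI taxonomy IDs with semicolons to form TAXA_ID.
# The body runs in a subshell so changing IFS does not leak to the caller.
join_taxa() (
    IFS=';'
    printf '%s\n' "$*"
)

# Hypothetical wrapper around the enhancer command shown above.
# Arguments: DATABASE USERNAME PASSWORD TAXA_ID
run_enhancer() {
    # TAXA_ID is quoted: an unquoted ';' would end the java command early.
    java -jar UpdateGeneProduct.jar "jdbc:mysql://localhost/$1" \
        com.mysql.jdbc.Driver "$2" "$3" "$1" "$4"
}

# Usage (placeholders; 9606 is human):
#   run_enhancer DATABASE USERNAME PASSWORD "$(join_taxa 9606 10090)"
```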
You may wish to create your own version of the UniProt Mapping file, either to customize it or to build it from a more recent version of UniProt than the one in our current version of the database:
Prerequisites
- Can execute only on Unix-based platform (including Mac OS X)
Scripts and Instructions:
- Download and decompress UniProt Mapping Scripts
- From the command-line, run the following script to parse the UniProt files to generate "uniprotmapping.txt"
./uniprotmapping.sh SPECIES_FILE MATCHMINER_FLAG SCRIPTS_FULLPATHNAME GZIP_FULLPATHNAME
- SPECIES_FILE is the name of a file containing NCBI taxonomy IDs to be used for enhancement.
This file can contain:
- all = All species (must be on first line and only entry in file)
- human = Human only (must be on first line and only entry in file)
- List of NCBI Taxonomy IDs (one per line)
Some Popular Species
Species | NCBI Taxonomy ID
Human | 9606
Mouse | 10090
Drosophila | 7227
C. elegans | 6239
S. cerevisiae | 4932
A. thaliana | 3702
- MATCHMINER_FLAG
- If it is equal to "n" then an actual HUGO download is used to constrain HUGO names. This is the preferred and recommended option.
- If it is equal to "M" or "m", then MatchMiner rather than an actual HUGO download is used to constrain HUGO names.
- SCRIPTS_FULLPATHNAME is the pathname of the directory containing the scripts
- GZIP_FULLPATHNAME is the pathname of the directory containing a version of gzip that can decompress files whose size exceeds 2 GB after decompression. This is required because some decompressed UniProt files now exceed 2 GB.
- The output of uniprotmapping.sh is the UniProt-Enhancement Mapping file "uniprotmapping.txt".
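Because uniprotmapping.sh expects SPECIES_FILE to follow the rules listed above, a quick check before a long run can catch mistakes such as "all" sharing the file with other entries. This is a minimal sketch; the function name, the file name species.txt, and the directories in the usage line are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check a SPECIES_FILE against the rules above.
# "all" or "human" must be the only entry; otherwise every non-empty
# line must be a numeric NCBI taxonomy ID.
check_species_file() {
    file=$1
    first=$(head -n 1 "$file")
    lines=$(grep -c . "$file")
    if [ "$lines" -eq 0 ]; then
        echo "$file is empty" >&2
        return 1
    fi
    case "$first" in
        all|human)
            if [ "$lines" -ne 1 ]; then
                echo "$first must be the only entry in $file" >&2
                return 1
            fi
            return 0 ;;
    esac
    # Any non-empty line that is not all digits is rejected.
    if grep -v '^[0-9][0-9]*$' "$file" | grep -q .; then
        echo "non-numeric taxonomy ID in $file" >&2
        return 1
    fi
}

# Usage (hypothetical paths; "n" selects the HUGO download option):
#   printf '9606\n10090\n' > species.txt
#   check_species_file species.txt &&
#       ./uniprotmapping.sh species.txt n "$PWD" /usr/local/bin
```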