ORIOGEN (Order Restricted Inference for Ordered Gene ExpressioN)

Inputs Screen:

Input file name

Output file name

Ontology file name

Total number of dose groups or time points

Vector of sample sizes per dose group or time point

Number of bootstrap samples

P-value for performing the test for profiles

Bootstrap random seed

Take log of signal

Advanced inputs

 

Input file name:

This field specifies the path and name of the file to be analyzed.

Users can directly enter the file path and name, or use the browse button for file selection.

All fields in the input file must be tab-delimited.

ORIOGEN does not normalize the input data. It is recommended that the user pre-process the data by applying a suitable normalization method before submitting the data to ORIOGEN. ORIOGEN selects and clusters genes on the basis of the mean of the expression values provided to ORIOGEN.

The format of the input file should be as follows:

Header row (optional): Row 1 can be either a header row (ignored if present) or a data row.

Column 1: Contains the gene ID. The gene ID is an alpha-numeric character string used to identify the gene and can also be used as the key when performing a gene look-up on selected genes. The format of the gene ID used for gene look-up can be any of the following:

Probe ID (example: A_42_P453131)

Systematic name (example: 216994_Rn)

GenBank Accession Number (example: BE109018)

UniGene ID (example: Rn.19577)

Column 2 (optional): May contain a gene description string. If present, this string is saved to the output file when the gene is selected, and it appears in the popup window when the user clicks on the gene in the Results graph.

All Remaining Columns: Contain tab-delimited numeric gene expression data. Missing values in the input data should be represented by a single period (i.e., ".").
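
The input layout described above can be illustrated with a short Python sketch. This is not part of ORIOGEN; the function name parse_expression_rows and the has_header and has_description flags are hypothetical, and the caller is assumed to know whether the optional header row and description column are present.

```python
import math


def parse_expression_rows(text, has_header=False, has_description=False):
    """Parse tab-delimited text into (gene_id, description, values) tuples.

    Hypothetical helper for illustration only. A single period (".")
    is read as a missing value and stored as NaN.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # the header row carries no data and is skipped
    first_data_col = 2 if has_description else 1
    records = []
    for line in lines:
        cells = line.split("\t")
        gene_id = cells[0]
        description = cells[1] if has_description else ""
        values = [
            math.nan if cell.strip() == "." else float(cell)
            for cell in cells[first_data_col:]
        ]
        records.append((gene_id, description, values))
    return records


# Made-up example row using a GenBank-style gene ID.
sample = "ID\tDescription\tS1\tS2\tS3\nBE109018\tExample gene\t7.1\t.\t6.8\n"
records = parse_expression_rows(sample, has_header=True, has_description=True)
```

A row that contains a non-numeric cell other than "." raises ValueError here, which is one simple way to catch a file that is not in the expected format.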

Output file name (Fitted means/Raw means):

This field specifies the path and name of the file containing the genes selected by the ORIOGEN software.

Users can directly enter the file path and name, or use the browse button for file selection.

The fields in the output file are tab-delimited.

The format of the file is as follows:

Column 1: Contains the counter number of the gene selected, starting at number one.

Column 2: Contains the row number from the input file of the selected gene.

Column 3: Contains the gene ID.

Column 4: Contains the user-provided gene description from the input file if present; blank otherwise.

Column 5: Contains the profile number of the selected gene.

Column 6: Contains the computed P-value.

Column 7: Contains the computed Q-value.

Columns 8 and Higher: Contains the following:

Last 3 Columns: The last three columns of the output file may contain the following fields depending on the results of the ontology look-up procedure. If the ontology file is not specified, or the look-up procedure finds no data, these fields will not be present.

NOTE: In addition to the output file specified above, ORIOGEN will also create a raw output file that contains the input data for the genes that were selected. This file will have the same name as the primary output file, with "(Raw)" appended to the end of the filename.

Ontology file name:

This field specifies the path and name of the file used for the ontology look-up for a selected gene. Ontology files are available for download from ftp://ftp.tigr.org/pub/data/tgi/Resourcerer/.

ORIOGEN uses the following procedure to find the ontology data for a particular gene:

Total number of dose groups or time points:

This field specifies the total number of dose groups or time points present in the input file. The maximum number of dose groups or time points supported in this release is 30.

Vector of sample sizes per dose/time point:

This field specifies the sample sizes that are associated with each dose/time point.

Sample sizes can be individually entered using the "Enter Sample Sizes" button, or as a string with values separated by commas.

The ORIOGEN software checks that the input data are consistent with the specified dose/time points. For example, if a user specifies 4 dose/time points, each with a sample size of 4, then the input file must contain 16 data values for each gene (counting a period "." for each missing value). If the sum of the sample sizes over all dose/time points does not equal the number of data values, an error message is displayed.
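
The consistency check described above can be sketched in Python as follows. This is an illustration of the rule, not ORIOGEN's own code; the function names parse_sample_sizes and check_value_count and the wording of the error message are hypothetical.

```python
def parse_sample_sizes(text):
    """Turn a comma-separated string such as "4,4,4,4" into a list of ints."""
    return [int(part) for part in text.split(",")]


def check_value_count(values, sample_sizes):
    """Raise ValueError if the row length does not match the sample sizes.

    Missing values (entered as "." in the input file) still occupy a
    position, so they count toward the total.
    """
    expected = sum(sample_sizes)
    if len(values) != expected:
        raise ValueError(
            f"expected {expected} data values per gene, found {len(values)}"
        )


sizes = parse_sample_sizes("4,4,4,4")   # 4 dose/time points, 4 samples each
row = ["1.0"] * 15 + ["."]              # 16 entries, one of them missing
check_value_count(row, sizes)           # passes: 4 + 4 + 4 + 4 == 16
```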

Number of bootstrap samples:

This field specifies the number of bootstrap samples to be used in the analysis. It is suggested that this field contain a large value (example: for a P-value of 0.001 the suggested number of bootstraps is 100,000).
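
A rough way to see why a large value is suggested, assuming the P-value is estimated as the fraction of bootstrap samples whose statistic is at least as extreme as the observed one (an assumption; the estimator is not described here): with B samples the estimate moves in steps of 1/B, and only about B times p samples are expected to fall in the tail that decides significance at level p. The function names below are hypothetical.

```python
def p_value_step(num_bootstraps):
    """Smallest increment of a proportion estimated from num_bootstraps samples."""
    return 1.0 / num_bootstraps


def expected_tail_count(num_bootstraps, p_value):
    """Expected number of bootstrap samples in a tail of probability p_value."""
    return num_bootstraps * p_value


# The suggested pairing from the text: P-value 0.001 with 100,000 bootstraps.
step = p_value_step(100_000)
tail = expected_tail_count(100_000, 0.001)

# A much smaller run for comparison: the tail is expected to hold
# about one sample, so the estimate near 0.001 would be very coarse.
small_tail = expected_tail_count(1_000, 0.001)
```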

P-value for performing the test for profiles:

This field specifies the level of significance to be used for selecting significant genes. Genes with computed P-values less than this value are selected and written to the output files specified above.
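
The selection rule above is a strict "less than" comparison, which the following Python sketch makes explicit. The function name select_significant is hypothetical and the P-values are made up for illustration.

```python
def select_significant(p_values, threshold):
    """Return the gene IDs whose P-value is strictly less than threshold."""
    return [gene for gene, p in p_values.items() if p < threshold]


# Made-up P-values keyed by gene ID.
computed = {
    "A_42_P453131": 0.0004,
    "BE109018": 0.001,   # equal to the threshold, so not selected
    "Rn.19577": 0.2,
}
selected = select_significant(computed, 0.001)
```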

Bootstrap random seed:

The automatic random seed option uses a constantly changing seed value for the random number generator. The manual random seed option uses a user-provided seed value for the random number generator.
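
The practical difference between the two options is reproducibility: a fixed seed yields the same sequence of random draws on every run. The Python sketch below illustrates this with the standard random module; it does not reflect ORIOGEN's actual random number generator, and the names make_rng and resample_indices are hypothetical.

```python
import random


def make_rng(seed=None):
    """Return a generator; seed=None lets Python pick a changing seed."""
    return random.Random(seed)


def resample_indices(rng, n):
    """Draw n indices with replacement from range(n), as in one bootstrap sample."""
    return [rng.randrange(n) for _ in range(n)]


# Manual seed: two runs with the same seed give identical resamples.
first = resample_indices(make_rng(12345), 8)
second = resample_indices(make_rng(12345), 8)

# Automatic seed: the draws generally differ from run to run.
automatic = resample_indices(make_rng(), 8)
```

A manual seed is the natural choice when an analysis needs to be repeated exactly, for example to regenerate a published gene list.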

Take log of signal:

If this option is checked, ORIOGEN will take the log of all signal values in the Input file before calculating means or performing any processing.
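
The order of operations matters here: the mean of the logged values is not the same as the log of the mean. The Python sketch below shows the difference on made-up numbers. The base of the logarithm is not stated in this help text, so the base argument is an assumption, as is the choice to leave missing values (NaN) out of the mean; the function names are hypothetical.

```python
import math


def log_transform(values, base):
    """Take the log of each value; missing values (NaN) stay missing."""
    return [v if math.isnan(v) else math.log(v, base) for v in values]


def mean_of_present(values):
    """Mean of the non-missing values, or NaN if all are missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan


signal = [1.0, 4.0, math.nan, 16.0]   # made-up signal values

# Log first, then average (the order described above).
mean_of_logs = mean_of_present(log_transform(signal, 2))

# Average first, then log: a different and generally larger number.
log_of_mean = math.log(mean_of_present(signal), 2)
```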

Advanced Inputs:

Click on the Advanced button (next to the Help button) to view or change the Advanced Inputs used by the program.
