Genomics and Bioinformatics Group CIMminer

Basic Steps

Prepare Dataset

The input file must be in tab-delimited ASCII text format. (Mac users should save the file with a .txt extension.)

The file includes the data, the row names, and the column names. The first row should contain the row names, and the first (leftmost) column should contain the column names. See the example below:

File Name: mydata.txt

            row 1    row 2    row 3
column 1    -2       4        5
column 2    1        2        1
column 3    4        6        2

Use a period (full stop) as the decimal point. Using a comma will result in errors, because commas are treated as list separators. For the same reason, do not use commas to group digits in large numbers. For example, the number one hundred twenty-three thousand four hundred fifty-six and seventy-eight hundredths (that is, 100000 + 20000 + 3000 + 400 + 50 + 6 + 0.7 + 0.08) should be written as "123456.78", not "123456,78", "123,456.78", or "123,456,78".
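As an illustration of the layout and number format described above, the following minimal Python sketch writes a small matrix as tab-delimited text. The function name `write_cim_table` and its arguments are hypothetical and not part of CIMminer; the sketch only assumes the layout of the mydata.txt example.

```python
import csv
import io

def write_cim_table(handle, row_names, column_names, values):
    """Write a matrix as tab-delimited text in the mydata.txt layout.

    The first line holds an empty corner cell followed by row_names;
    each later line starts with one entry of column_names.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(row_names))
    for name, line in zip(column_names, values):
        # str() of a Python number uses "." as the decimal point and
        # never inserts thousands separators.
        writer.writerow([name] + [str(v) for v in line])

# Usage: write to an in-memory buffer instead of a real file.
buffer = io.StringIO()
write_cim_table(
    buffer,
    ["row 1", "row 2", "row 3"],
    ["column 1", "column 2", "column 3"],
    [[-2, 4, 5], [1, 2, 1], [4, 6, 123456.78]],
)
table_text = buffer.getvalue()
```

Note that `str()` switches to scientific notation for very large or very small floats; whether the server accepts that notation is not stated here, so plain decimals are the safer choice.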

In addition, do not upload a data file with fewer than 3 rows or fewer than 3 columns for clustering. The clustered image map is intended for presenting clustered data, and with fewer than 3 items the clustering algorithm cannot provide any useful information.
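The rules in this section can be checked before uploading. The sketch below is one possible local check, not CIMminer's own validation; the function name `check_cim_text` is hypothetical, and the sketch assumes that every data cell must be a plain decimal number (it flags empty cells and scientific notation, which the server may or may not accept).

```python
import re

# Plain decimal number: "." as the decimal point, no digit grouping.
NUMBER = re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$")

def check_cim_text(text):
    """Return a list of problems found in tab-delimited input text."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    header = lines[0].split("\t")
    n_cols = len(header) - 1  # the first cell is the empty corner
    body = lines[1:]
    if len(body) < 3 or n_cols < 3:
        problems.append("need at least 3 rows and 3 columns of data")
    for line_no, ln in enumerate(body, start=2):
        cells = ln.split("\t")
        if len(cells) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, "
                f"found {len(cells)}"
            )
        for cell in cells[1:]:
            if not NUMBER.match(cell.strip()):
                problems.append(f"line {line_no}: bad number {cell!r}")
    return problems

good = "\tr1\tr2\tr3\nc1\t-2\t4\t5\nc2\t1\t2\t1\nc3\t4\t6\t2\n"
bad = "\tr1\tr2\tr3\nc1\t-2\t4\t5\nc2\t1\t2\t1\nc3\t4\t6\t123,456.78\n"
good_problems = check_cim_text(good)
bad_problems = check_cim_text(bad)
```

Here `good_problems` is empty, while `bad_problems` reports the cell that uses a comma as a digit separator.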

Select Order Algorithm

The order algorithm determines the order in which the output appears. If you want similar data to be grouped together, choose "Cluster". To have the computer order your data randomly, choose "Randomize". To have your results appear in the order specified in your original file, select "No cluster". You must specify the order for each axis.
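The three choices can be pictured as three ways of producing a permutation of the items on one axis. The sketch below is only an illustration of that idea; the function `axis_order` and its parameters (`cluster_fn`, `seed`) are hypothetical and do not describe how CIMminer is implemented.

```python
import random

def axis_order(n_items, mode, cluster_fn=None, seed=None):
    """Return a permutation of range(n_items) for one axis."""
    indices = list(range(n_items))
    if mode == "No cluster":
        # Keep the order of the original file.
        return indices
    if mode == "Randomize":
        # Shuffle with a private generator so a seed gives repeatable output.
        random.Random(seed).shuffle(indices)
        return indices
    if mode == "Cluster":
        if cluster_fn is None:
            raise ValueError("Cluster mode needs a clustering function")
        # cluster_fn is assumed to return the indices in clustered order.
        return list(cluster_fn(indices))
    raise ValueError(f"unknown order algorithm: {mode!r}")

# Each axis gets its own choice.
x_order = axis_order(4, "No cluster")
y_order = axis_order(4, "Randomize", seed=1)
```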

If you select "Cluster" as the order algorithm, you must also select a cluster algorithm. Otherwise, skip this section.

The cluster algorithm specifies the linkage used by the hierarchical clustering algorithm to determine the distance between cluster groups, together with the distance used to compare individual elements.

  • Average linkage defines the distance as the average distance over all pairs of elements, one from each cluster group.
  • Connected linkage defines the distance as the smallest distance between a pair of elements, one from each cluster group.
  • Complete linkage defines the distance as the largest distance between a pair of elements, one from each cluster group.
  • Correlation distance uses 1-ρ as the distance where ρ is the correlation of two vectors.
  • Euclidean distance uses the Euclidean distance, the square root of the sum of squared differences of the coordinate values. Each coordinate is first normalized (to have standard deviation 1).
  • Visual distance uses the Euclidean distance between vectors. The coordinates are not normalized in this definition.
  • Manhattan distance uses the sum of absolute differences of the coordinate values. Each coordinate is first normalized (to have standard deviation 1).
  • Maximum distance uses the maximum absolute difference in all coordinates. Each coordinate is first normalized (to have standard deviation 1).
  • Absolute correlation distance uses the absolute value of the correlation between the items.
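To make the distance definitions above concrete, here is a minimal Python sketch of the correlation, Euclidean, visual, Manhattan, and maximum distances. All function names are hypothetical. The normalization step assumes the population standard deviation (dividing by n); the text does not say whether n or n − 1 is used.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def scale_coordinates(vectors):
    """Scale each coordinate to standard deviation 1 across all vectors.

    Assumption: population standard deviation (divide by n).
    """
    sds = []
    for col in zip(*vectors):
        m = _mean(col)
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / len(col))
        sds.append(sd if sd > 0 else 1.0)  # leave constant coordinates as-is
    return [[x / sd for x, sd in zip(v, sds)] for v in vectors]

def correlation(a, b):
    """Pearson correlation; undefined (division by zero) for constant vectors."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_distance(a, b):
    return 1 - correlation(a, b)

def visual_distance(a, b):
    """Euclidean distance on the raw, unnormalized coordinates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def maximum_distance(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

# The normalized variants apply the same formulas after scaling.
raw = [[0, 0], [2, 4]]
scaled = scale_coordinates(raw)
euclidean = visual_distance(scaled[0], scaled[1])
```

In this sketch the Euclidean, Manhattan, and maximum distances from the list are obtained by calling `scale_coordinates` first and then the corresponding function on the scaled vectors.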

The cluster distance and cluster method can be chosen separately for each axis.
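The following sketch shows how the three linkages could drive a naive agglomerative clustering, applied separately to each axis with a different linkage. It is an illustration under stated assumptions, not CIMminer's implementation: the names `cluster_order` and `LINKAGES` are hypothetical, the element distance is the unnormalized Euclidean ("visual") distance, and the order in which merged groups are concatenated is arbitrary.

```python
import math

def euclid(a, b):
    """Unnormalized Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each linkage reduces the list of pairwise distances between two groups.
LINKAGES = {
    "average": lambda ds: sum(ds) / len(ds),
    "connected": min,
    "complete": max,
}

def cluster_order(vectors, linkage="average", distance=euclid):
    """Return item indices in the order produced by repeated merging."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ds = [distance(vectors[p], vectors[q])
                      for p in clusters[i] for q in clusters[j]]
                d = combine(ds)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0] if clusters else []

# Cluster the two axes of the example matrix with different linkages.
matrix = [[-2, 4, 5], [1, 2, 1], [4, 6, 2]]
order_of_lines = cluster_order(matrix, linkage="average")
order_of_fields = cluster_order(list(zip(*matrix)), linkage="complete")
```

This naive version recomputes every pairwise distance at each step, which is fine for a small example but slow for large matrices.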

Result

The result has four frames. The top frame contains the CIMminer menu and your input file name. The left frame contains a list of the X axis elements, in the order that they appear on the X axis of the image. The right frame contains a list of the Y axis elements, in the order that they appear on the Y axis of the image. These two frames also contain links to display a separate image of the respective cluster trees and a merge height plot. The main frame, in the middle, contains the image itself. The image is a GIF file.

You may reformat the image by clicking the "Color", "Binning", "Zoom", "Axes", and "Page Layout" buttons.

You can also download the image as a PostScript file.