Scientific Supercomputing at the NIH

MAT on Helix

MAT is an algorithm for reliably detecting regions enriched by transcription factor chromatin immunoprecipitation (ChIP) on Affymetrix tiling arrays (ChIP-chip). MAT models the baseline probe behavior by considering probe sequence and copy number on each array. The correlation between the baseline probe model estimates and the observed measurements can be as high as 0.72. MAT standardizes each probe value via the probe model, eliminating the need for sample normalization. A novel scoring function is then applied to the standardized data to identify the ChIP-enriched regions, which allows robust p-value and false discovery rate (FDR) calculations.

MAT can detect ChIP regions from a single ChIP sample, from multiple ChIP samples, or from multiple ChIP samples with controls, with accuracy increasing in that order. On the mock ChIP samples provided by the ENCODE consortium, MAT achieved 100% accuracy (0 false positives and 0 false negatives) in detecting the spike-in plasmids, which are enriched 2-fold, 4-fold, 8-fold, and so on up to 256-fold relative to the genomic background. Quantitatively, MAT yielded a correlation coefficient of 0.95 between the spike-in DNA concentration and the predicted score. Upon further analysis, MAT identified more than 70% of the true targets at a 5% FDR cutoff from a single ChIP sample.

This single-sample capability is valuable for quickly testing ChIP-chip protocols and antibodies, and for easily identifying ChIP-chip samples of questionable quality.
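
The standardize-then-score idea described above can be illustrated with a small Python sketch. This is not MAT's code. The function names (`standardize`, `trimmed_mean`, `window_scores`), the parameters (`trim`, `half_width`, `min_probes`), the use of a single global spread for standardization, and the scoring rule (a trimmed mean scaled by the square root of the probe count) are all assumptions chosen for illustration. The baseline predictions are taken as given rather than fitted from probe sequence and copy number.

```python
import math
from statistics import mean, stdev


def standardize(observed, predicted):
    """Turn raw probe values into t-like values.

    Each value is the residual from the baseline prediction divided by the
    standard deviation of all residuals (a simplifying assumption).
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    spread = stdev(residuals)
    return [r / spread for r in residuals]


def trimmed_mean(values, trim=0.1):
    """Mean of the values after dropping the `trim` fraction from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)


def window_scores(positions, t_values, half_width=300, min_probes=3):
    """Score a window centered on each probe position.

    Windows with fewer than `min_probes` probes get None. The score rule is
    an illustrative assumption, not MAT's published scoring function.
    """
    scores = []
    for center in positions:
        window = [t for p, t in zip(positions, t_values)
                  if abs(p - center) <= half_width]
        if len(window) < min_probes:
            scores.append(None)
        else:
            scores.append(math.sqrt(len(window)) * trimmed_mean(window))
    return scores
```

Trimming before averaging keeps a single outlier probe from dominating a window, and scaling by the square root of the probe count rewards windows where many probes agree.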

MAT was developed by the Liu Lab at Harvard; see the MAT website for details.

A sample session running MAT on the sample ER data follows. User input is the text after each shell prompt.


[susanc@helix SampleData]$ ls -l
total 441520
-rw-r--r-- 1 susanc staff     2458 Nov  6 09:26 ER.tag
-rw-r--r-- 1 susanc staff 70891463 Dec  8  2005 Humanhg17Rep.lib
-rwxr--r-- 1 susanc staff 19559305 Feb 10  2004 MCF_ER_A1.CEL
-rwxr--r-- 1 susanc staff 19071557 Jun 16  2005 MCF_ER_A3.CEL
-rwxr--r-- 1 susanc staff 19146257 Jun 16  2005 MCF_ER_A4.CEL
-rwxr--r-- 1 susanc staff 19459591 Feb 10  2004 MCF_ER_B1.CEL
-rwxr--r-- 1 susanc staff 19143665 Jun 16  2005 MCF_ER_B3.CEL
-rwxr--r-- 1 susanc staff 19198441 Jun 16  2005 MCF_ER_B4.CEL
-rwxr--r-- 1 susanc staff 19403556 Feb 10  2004 MCF_ER_C1.CEL
-rwxr--r-- 1 susanc staff 19279905 Jun 16  2005 MCF_ER_C3.CEL
-rwxr--r-- 1 susanc staff 19357596 Jun 16  2005 MCF_ER_C4.CEL
-rwxr--r-- 1 susanc staff 20068676 Jan 29  2004 MCF_INP_A1.CEL
-rwxr--r-- 1 susanc staff 19028992 Jun 16  2005 MCF_INP_A3.CEL
-rwxr--r-- 1 susanc staff 18780486 Jun 16  2005 MCF_INP_A4.CEL
-rwxr--r-- 1 susanc staff 19571266 Jan 29  2004 MCF_INP_B1.CEL
-rwxr--r-- 1 susanc staff 19145572 Jun 16  2005 MCF_INP_B3.CEL
-rwxr--r-- 1 susanc staff 18885018 Jun 16  2005 MCF_INP_B4.CEL
-rwxr--r-- 1 susanc staff 19573266 Jan 29  2004 MCF_INP_C1.CEL
-rwxr--r-- 1 susanc staff 19202290 Jun 16  2005 MCF_INP_C3.CEL
-rwxr--r-- 1 susanc staff 18741518 Jun 16  2005 MCF_INP_C4.CEL
-rw-r--r-- 1 susanc staff 10933504 Nov  4  2005 P1_CHIP_A.Anti-Sense.hs.NCBIv35.NR.bpmap
-rw-r--r-- 1 susanc staff 11710221 Nov  4  2005 P1_CHIP_B.Anti-Sense.hs.NCBIv35.NR.bpmap
-rw-r--r-- 1 susanc staff 10885258 Nov  4  2005 P1_CHIP_C.Anti-Sense.hs.NCBIv35.NR.bpmap

[susanc@helix SampleData]$ MAT ./ER.tag
[ Thu Nov  6 09:28:17 2008 ]

P1_CHIP_A.Anti-Sense.hs.NCBIv35.NR.bpmap
Treat:  MCF_ER_A1.CEL MCF_ER_A3.CEL MCF_ER_A4.CEL
Control:  MCF_INP_A1.CEL MCF_INP_A3.CEL MCF_INP_A4.CEL

P1_CHIP_B.Anti-Sense.hs.NCBIv35.NR.bpmap
Treat:  MCF_ER_B1.CEL MCF_ER_B3.CEL MCF_ER_B4.CEL
Control:  MCF_INP_B1.CEL MCF_INP_B3.CEL MCF_INP_B4.CEL

P1_CHIP_C.Anti-Sense.hs.NCBIv35.NR.bpmap
Treat:  MCF_ER_C1.CEL MCF_ER_C3.CEL MCF_ER_C4.CEL
Control:  MCF_INP_C1.CEL MCF_INP_C3.CEL MCF_INP_C4.CEL
Reading  P1_CHIP_A.Anti-Sense.hs.NCBIv35.NR.bpmap Thu Nov  6 09:28:18 2008
PMX  PMY  MatchScore Thu Nov  6 09:28:18 2008
All probes
reading  chr21
Making Uniq Index  Thu Nov  6 09:28:23 2008
Maximum copy number:  2204  duplicate probe measurements:  17679
PMProbe Thu Nov  6 09:28:27 2008
Partial probes 329395
reading  chr21
Getting cel intensities:  Thu Nov  6 09:28:33 2008
reading  MCF_ER_A1.CEL Thu Nov  6 09:28:33 2008
reading  MCF_ER_A3.CEL Thu Nov  6 09:28:35 2008
reading  MCF_ER_A4.CEL Thu Nov  6 09:28:36 2008
reading  MCF_INP_A1.CEL Thu Nov  6 09:28:38 2008
reading  MCF_INP_A3.CEL Thu Nov  6 09:28:39 2008
reading  MCF_INP_A4.CEL Thu Nov  6 09:28:41 2008
Making design matrix Thu Nov  6 09:28:50 2008
Fitting model ...  Thu Nov  6 09:28:52 2008
Model fitting on all unique probes Thu Nov  6 09:29:25 2008
Chr  Position Thu Nov  6 09:29:26 2008
All probes
reading  chr21
Standardizing Sample: MCF_ER_A1.CEL Thu Nov  6 09:29:29 2008
Standardizing Sample: MCF_ER_A3.CEL Thu Nov  6 09:29:29 2008
Standardizing Sample: MCF_ER_A4.CEL Thu Nov  6 09:29:30 2008
Standardizing Sample: MCF_INP_A1.CEL Thu Nov  6 09:29:30 2008
Standardizing Sample: MCF_INP_A3.CEL Thu Nov  6 09:29:31 2008
Standardizing Sample: MCF_INP_A4.CEL Thu Nov  6 09:29:31 2008
Making MAT score Thu Nov  6 09:29:32 2008
Control Input Variance :  0
100000 chr21 3.98
200000 chr21 7.67
300000 chr21 11.34
Making FDR table Thu Nov  6 09:29:46 2008
Saving bar files Thu Nov  6 09:29:48 2008
Region calling with cutoff 4.39599681562 Thu Nov  6 09:29:52 2008
Repeat Masking ...  Thu Nov  6 09:29:53 2008
Repeat Masking ...  Thu Nov  6 09:29:53 2008
Reading  P1_CHIP_B.Anti-Sense.hs.NCBIv35.NR.bpmap Thu Nov  6 09:29:54 2008
PMX  PMY  MatchScore Thu Nov  6 09:29:54 2008
All probes
reading  chr21
reading  chr22
Making Uniq Index  Thu Nov  6 09:29:59 2008
Maximum copy number:  2327  duplicate probe measurements:  58502
PMProbe Thu Nov  6 09:30:06 2008
Partial probes 327725
reading  chr21
reading  chr22
Getting cel intensities:  Thu Nov  6 09:30:13 2008
reading  MCF_ER_B1.CEL Thu Nov  6 09:30:13 2008
reading  MCF_ER_B3.CEL Thu Nov  6 09:30:15 2008
reading  MCF_ER_B4.CEL Thu Nov  6 09:30:17 2008
reading  MCF_INP_B1.CEL Thu Nov  6 09:30:18 2008
reading  MCF_INP_B3.CEL Thu Nov  6 09:30:20 2008
reading  MCF_INP_B4.CEL Thu Nov  6 09:30:22 2008
Making design matrix Thu Nov  6 09:30:31 2008
Fitting model ...  Thu Nov  6 09:30:32 2008
Model fitting on all unique probes Thu Nov  6 09:30:56 2008
Chr  Position Thu Nov  6 09:30:57 2008
All probes
reading  chr21
reading  chr22
Standardizing Sample: MCF_ER_B1.CEL Thu Nov  6 09:31:00 2008
Standardizing Sample: MCF_ER_B3.CEL Thu Nov  6 09:31:00 2008
Standardizing Sample: MCF_ER_B4.CEL Thu Nov  6 09:31:01 2008
Standardizing Sample: MCF_INP_B1.CEL Thu Nov  6 09:31:01 2008
Standardizing Sample: MCF_INP_B3.CEL Thu Nov  6 09:31:02 2008
Standardizing Sample: MCF_INP_B4.CEL Thu Nov  6 09:31:02 2008
Making MAT score Thu Nov  6 09:31:03 2008
Control Input Variance :  0
100000 chr21 3.74
200000 chr21 6.97
0 chr22 7.14
100000 chr22 11.26
Making FDR table Thu Nov  6 09:31:18 2008
Saving bar files Thu Nov  6 09:31:19 2008
Region calling with cutoff 4.35025299639 Thu Nov  6 09:31:24 2008
Repeat Masking ...  Thu Nov  6 09:31:25 2008
Repeat Masking ...  Thu Nov  6 09:31:26 2008
Reading  P1_CHIP_C.Anti-Sense.hs.NCBIv35.NR.bpmap Thu Nov  6 09:31:27 2008
PMX  PMY  MatchScore Thu Nov  6 09:31:27 2008
All probes
reading  chr22
Making Uniq Index  Thu Nov  6 09:31:31 2008
Maximum copy number:  4487  duplicate probe measurements:  9581
PMProbe Thu Nov  6 09:31:34 2008
Partial probes 327001
reading  chr22
Getting cel intensities:  Thu Nov  6 09:31:40 2008
reading  MCF_ER_C1.CEL Thu Nov  6 09:31:40 2008
reading  MCF_ER_C3.CEL Thu Nov  6 09:31:42 2008
reading  MCF_ER_C4.CEL Thu Nov  6 09:31:44 2008
reading  MCF_INP_C1.CEL Thu Nov  6 09:31:45 2008
reading  MCF_INP_C3.CEL Thu Nov  6 09:31:47 2008
reading  MCF_INP_C4.CEL Thu Nov  6 09:31:49 2008
Making design matrix Thu Nov  6 09:31:58 2008
Fitting model ...  Thu Nov  6 09:31:59 2008
Model fitting on all unique probes Thu Nov  6 09:32:22 2008
Chr  Position Thu Nov  6 09:32:23 2008
All probes
reading  chr22
Standardizing Sample: MCF_ER_C1.CEL Thu Nov  6 09:32:26 2008
Standardizing Sample: MCF_ER_C3.CEL Thu Nov  6 09:32:27 2008
Standardizing Sample: MCF_ER_C4.CEL Thu Nov  6 09:32:27 2008
Standardizing Sample: MCF_INP_C1.CEL Thu Nov  6 09:32:27 2008
Standardizing Sample: MCF_INP_C3.CEL Thu Nov  6 09:32:28 2008
Standardizing Sample: MCF_INP_C4.CEL Thu Nov  6 09:32:28 2008
Making MAT score Thu Nov  6 09:32:30 2008
Control Input Variance :  0
100000 chr22 3.37
200000 chr22 6.35
300000 chr22 9.35
Making FDR table Thu Nov  6 09:32:41 2008
Saving bar files Thu Nov  6 09:32:43 2008
Region calling with cutoff 4.27327837638 Thu Nov  6 09:32:47 2008
Repeat Masking ...  Thu Nov  6 09:32:48 2008
Repeat Masking ...  Thu Nov  6 09:32:48 2008

[susanc@helix SampleData]$
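
For longer runs it can help to condense a saved copy of this output into one record per array. The Python sketch below does that. It assumes the log looks like the session above: each array starts with a capitalized `Reading ` line naming the bpmap file, timestamps appear in the `Thu Nov  6 09:28:18 2008` style, and the cutoff is the fifth word of the `Region calling with cutoff` line. The function names `parse_stamp` and `summarize` are hypothetical helpers, not part of MAT.

```python
import re
from datetime import datetime

# Matches timestamps such as "Thu Nov  6 09:28:18 2008".
TIMESTAMP = re.compile(
    r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \w{3} +\d+ \d{2}:\d{2}:\d{2} \d{4}"
)
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(text):
    """Parse a timestamp, collapsing the double space before 1-digit days."""
    return datetime.strptime(" ".join(text.split()), STAMP_FORMAT)


def summarize(log_lines):
    """Return one dict per bpmap with its start, end, and region cutoff."""
    runs = []
    for line in log_lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # lines such as "reading  chr21" carry no timestamp
        stamp = parse_stamp(match.group(0))
        if line.startswith("Reading "):
            # A capitalized "Reading" line opens the block for a new array.
            runs.append({"bpmap": line.split()[1], "start": stamp,
                         "end": stamp, "cutoff": None})
        elif runs:
            runs[-1]["end"] = stamp
            if line.startswith("Region calling with cutoff"):
                runs[-1]["cutoff"] = float(line.split()[4])
    return runs


if __name__ == "__main__":
    import sys
    for run in summarize(sys.stdin):
        elapsed = (run["end"] - run["start"]).total_seconds()
        print(run["bpmap"], run["cutoff"], elapsed)
```

Fed the session above, the first record is for P1_CHIP_A.Anti-Sense.hs.NCBIv35.NR.bpmap, with a cutoff of 4.39599681562 and 95 seconds between its first and last timestamps.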