Research Project: DEVELOPMENT OF BIOINFORMATICS TOOLS FOR LIVESTOCK
Location: Bovine Functional Genomics
Title: Identification of conserved regulatory elements in upstream promoter regions of mammals at relaxed thresholds by comparative genomics - a case study using PEPCK
Authors
Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: June 25, 2007
Publication Date: N/A
Interpretive Summary: Comparative genomics, which identifies conserved genetic sequences by cross-species genome comparison, is the primary method for discovering regulatory elements. Except for the most conserved and prominent transcription factor binding sites (TFBS), there is generally little agreement between in silico predictions and experimental results for most TFBS, particularly for less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcriptional regulation. Detailed quality control and benchmarking of in silico predictions are currently missing. We designed a systematic approach, combining position weight matrices (PWMs from JASPAR) with a phylogenetic footprinting algorithm (TFLOC), to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse and rat promoter sequence alignments as input, we applied this approach to the upstream 1 kb promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (Phosphoenolpyruvate Carboxykinase, Cytosolic isoform, pck1). The approach achieved a sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites, which were further confirmed experimentally by gel shift and in vitro reporter assays. This approach provides an accessible resource for developing hypotheses in transcription research, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
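The general idea of combining a PWM scan with a cross-species conservation requirement can be sketched in a few lines of Python. This is a simplified stand-in, not TFLOC itself: it converts a JASPAR-style count matrix into a log-odds PWM, rescales each window score to a relative score between 0 and 1, and keeps only alignment positions where the human, mouse and rat windows all pass a user-set threshold (lowering the threshold loosely mirrors the "relaxed thresholds" in the title). All function names, the 0.25 pseudocount, the uniform background, the forward-strand-only scan, and the toy motif and sequences are hypothetical choices for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.25, background=None):
    """Turn a count matrix (base -> per-position counts) into a log-odds PWM.

    Hypothetical helper: uses a uniform background unless one is supplied.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in BASES) + 4 * pseudocount
        column = {}
        for b in BASES:
            freq = (counts[b][i] + pseudocount) / total
            column[b] = math.log2(freq / background[b])
        pwm.append(column)
    return pwm


def relative_score(pwm, site):
    """Score a site, rescaled to [0, 1] between the minimum and maximum PWM scores."""
    raw = sum(col[base] for col, base in zip(pwm, site))
    lo = sum(min(col.values()) for col in pwm)
    hi = sum(max(col.values()) for col in pwm)
    return (raw - lo) / (hi - lo)


def conserved_hits(pwm, aligned, threshold=0.8):
    """Return (start, scores) for alignment positions where every species passes.

    `aligned` maps species name -> equal-length aligned sequence ('-' for gaps).
    Windows containing gaps or ambiguous bases in any species are skipped.
    Only the forward strand is scanned in this sketch.
    """
    width = len(pwm)
    length = len(next(iter(aligned.values())))
    hits = []
    for start in range(length - width + 1):
        scores = {}
        for species, seq in aligned.items():
            window = seq[start:start + width].upper()
            if any(base not in BASES for base in window):
                break  # gap or ambiguous base: this position is not scored
            scores[species] = relative_score(pwm, window)
        else:
            # Reached only if no species hit a gap (no break above).
            if min(scores.values()) >= threshold:
                hits.append((start, scores))
    return hits


# Toy 4-bp motif "TGAC" and a toy three-species alignment (made-up data).
counts = {
    "A": [0, 0, 10, 0],
    "C": [0, 0, 0, 10],
    "G": [0, 10, 0, 0],
    "T": [10, 0, 0, 0],
}
pwm = log_odds_pwm(counts)
aligned = {
    "human": "AATGACGG",
    "mouse": "ACTGACGG",
    "rat": "AGTGACTG",
}
for start, scores in conserved_hits(pwm, aligned, threshold=0.8):
    print(start, scores)  # 2 {'human': 1.0, 'mouse': 1.0, 'rat': 1.0}
```

Requiring every species to pass is the simplest possible conservation rule; real phylogenetic footprinting tools weight species by evolutionary distance and tolerate alignment gaps, so treat this only as an illustration of where an adjustable threshold enters the pipeline.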
Technical Abstract: Background
Comparative genomics is the primary method for discovering regulatory elements: cross-species genome comparison identifies sequences that are conserved because of evolutionary constraint. Except for the most conserved and prominent transcription factor binding sites (TFBS), in silico predictions for most TFBS are generally not cross-referenced against experimental results. In particular, detailed quality control and benchmarking of in silico predictions are currently missing for less conserved but biologically active elements, which may be relevant to tissue- and temporal-specific transcriptional regulation.
Results
A systematic approach, combining position weight matrices (PWMs from JASPAR) with a phylogenetic footprinting algorithm (TFLOC), was implemented to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse and rat promoter sequence alignments as input, this approach was applied to the upstream 1 kb promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (Phosphoenolpyruvate Carboxykinase, Cytosolic isoform, pck1). The approach produced a reasonable sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites, which were further confirmed experimentally by gel shift and in vitro reporter assays.
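The two benchmark figures can be illustrated with a small Python sketch that compares predicted sites against known sites. The abstract does not state its matching criteria, so this sketch assumes (1) a prediction counts as correct if it overlaps a known site by at least one base, (2) sensitivity is the fraction of known sites recovered, and (3) the reported "true-positive rate" is the fraction of predicted sites that coincide with a known site. The function names and the coordinates are hypothetical and are not data from the study.

```python
def overlaps(a, b):
    """True if half-open intervals (start, end) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(predicted, known):
    """Return (sensitivity, fraction of predictions that are true positives).

    Assumed definitions (hypothetical, not taken from the paper):
      sensitivity = known sites overlapped by any prediction / all known sites
      true-positive fraction = predictions overlapping any known site / all predictions
    """
    matched_known = sum(1 for k in known if any(overlaps(k, p) for p in predicted))
    true_predictions = sum(1 for p in predicted if any(overlaps(p, k) for k in known))
    sensitivity = matched_known / len(known) if known else 0.0
    tp_fraction = true_predictions / len(predicted) if predicted else 0.0
    return sensitivity, tp_fraction


# Made-up promoter coordinates, for illustration only.
known_sites = [(10, 20), (40, 50), (70, 80), (100, 110)]
predicted_sites = [(12, 18), (45, 55), (72, 78), (200, 210), (300, 310), (400, 410)]
print(evaluate(predicted_sites, known_sites))  # (0.75, 0.5)
```

The two numbers pull in opposite directions: relaxing the PWM threshold tends to recover more known sites (higher sensitivity) while also admitting more unconfirmed predictions (lower true-positive fraction), which is why both are reported together.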
Conclusions
This approach features an expandable TFBS matrix set and an adjustable threshold, and it is compatible with whole-genome analysis. It provides an accessible resource for developing hypotheses in transcription research, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
Last Modified: 05/12/2009