Agricultural Research Service United States Department of Agriculture
Research Project: DEVELOPMENT OF BIOINFORMATICS TOOLS FOR LIVESTOCK

Location: Bovine Functional Genomics

Title: Identification of conserved regulatory elements in upstream promoter regions of mammals at relaxed thresholds by comparative genomics - a case study using PEPCK

Authors
Liu, Ge
Weirauch, Matthew - UNIV OF CA SANTA CRUZ
Van Tassell, Curtis
Li, Robert
Sonstegard, Tad
Matukumalli, Lakshmi - GEORGE MASON UNIVERSITY
Connor, Erin
Hanson, Richard - CASE WESTERN UNIVERSITY
Yang, Jianqi - CASE WESTERN UNIVERSITY

Submitted to: Genome Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: June 25, 2007
Publication Date: N/A

Interpretive Summary: Comparative genomics is the primary method for discovering regulatory elements: conserved genetic sequences are identified by comparing genomes across species. Except for the most conserved and prominent transcription factor binding sites (TFBS), in silico predictions and experimental results generally disagree for most TFBS, particularly for less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcription regulation. Detailed quality control and benchmarking of in silico predictions are currently missing. We designed a systematic approach, combining position weight matrices (PWM, from JASPAR) and a phylogenetic footprinting algorithm (TFLOC), to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse and rat promoter sequence alignments as input, we applied this approach to the 1 kb upstream promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (phosphoenolpyruvate carboxykinase, cytosolic isoform, pck1). This approach produced a sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites. The newly discovered sites were further confirmed experimentally by gel shift and in vitro reporter assays. This approach provides an accessible resource for developing transcription research hypotheses, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
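The benchmarking step described above, comparing predicted sites against previously known sites, can be illustrated with a minimal Python sketch. The summary does not define its "true-positive rate"; the sketch assumes it means the fraction of predicted sites that overlap a known site, which is an interpretation and not something the summary confirms. The function names and all coordinates are hypothetical and are not PEPCK data.

```python
def overlaps(a, b):
    """True when half-open intervals a=(start, end) and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def benchmark(predicted, known):
    """Return (sensitivity, supported_fraction) for two lists of intervals.

    sensitivity        = known sites overlapped by any prediction / all known sites
    supported_fraction = predictions overlapping any known site / all predictions
                         (assumed reading of the summary's "true-positive rate")
    """
    recovered = sum(any(overlaps(k, p) for p in predicted) for k in known)
    supported = sum(any(overlaps(p, k) for k in known) for p in predicted)
    sensitivity = recovered / len(known) if known else 0.0
    supported_fraction = supported / len(predicted) if predicted else 0.0
    return sensitivity, supported_fraction


# Made-up promoter coordinates, for illustration only.
KNOWN = [(10, 20), (40, 50), (70, 80), (100, 110)]
PREDICTED = [(12, 22), (45, 52), (72, 78), (150, 160), (200, 210), (300, 310)]

sensitivity, supported_fraction = benchmark(PREDICTED, KNOWN)
print(f"sensitivity={sensitivity:.2f} supported_fraction={supported_fraction:.2f}")
```

Predictions that match no known site are not necessarily wrong; as the summary notes, some of them may be novel candidate sites that need experimental follow-up.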

Technical Abstract:
Background: Comparative genomics is the primary method for discovering regulatory elements: cross-species genome comparison identifies sequences that are conserved because of evolutionary constraints. Except for the most conserved and prominent transcription factor binding sites (TFBS), in silico predictions and experimental results are rarely cross-referenced for most TFBS. In particular, detailed quality control and benchmarking of in silico predictions are currently missing for less conserved but biologically active elements that may be relevant to tissue- and temporal-specific transcription regulation.
Results: A systematic approach, combining position weight matrices (PWM, from JASPAR) and a phylogenetic footprinting algorithm (TFLOC), was implemented to identify less conserved but biologically active TFBS in mammalian promoter regions. Using human, mouse and rat promoter sequence alignments as input, this approach was applied to the 1 kb upstream promoter regions of all available RefSeq genes. Computational predictions were compared with previously known sites of PEPCK (phosphoenolpyruvate carboxykinase, cytosolic isoform, pck1). This approach produced a reasonable sensitivity of over 75% and a true-positive rate of about 32%. In addition to correctly predicting previously known TFBS, it revealed some novel candidate sites. The newly discovered sites were further confirmed experimentally by gel shift and in vitro reporter assays.
Conclusions: This approach features an expandable TFBS matrix set and an adjustable threshold, and it is compatible with whole-genome analysis. It provides an accessible resource for developing transcription research hypotheses, and the TFBS dataset for all available RefSeq genes is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.
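The general idea of combining a position weight matrix with cross-species conservation and an adjustable threshold can be sketched as follows. This is a minimal illustration in Python and not the TFLOC implementation: the count matrix is a hypothetical 4-column motif (not a real JASPAR profile), the uniform background and pseudocount are assumptions, and the three short aligned sequences are invented.

```python
import math

# Hypothetical count matrix (one list per base, one entry per motif column).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # assumed smoothing to avoid log(0)


def log_odds_matrix(counts):
    """Convert base counts into per-column log2-odds scores."""
    width = len(counts["A"])
    pwm = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + 4 * PSEUDOCOUNT
        pwm.append({b: math.log2((counts[b][i] + PSEUDOCOUNT) / total / BACKGROUND)
                    for b in "ACGT"})
    return pwm


def window_score(pwm, window):
    """Sum the log-odds over a window; gaps or ambiguous bases give None."""
    if any(ch not in "ACGT" for ch in window):
        return None
    return sum(col[ch] for col, ch in zip(pwm, window))


def conserved_hits(pwm, alignment, threshold):
    """Return (alignment_column, scores) where every species scores >= threshold.

    Assumes all aligned sequences have the same length.
    """
    width = len(pwm)
    length = len(next(iter(alignment.values())))
    hits = []
    for start in range(length - width + 1):
        scores = {sp: window_score(pwm, seq[start:start + width].upper())
                  for sp, seq in alignment.items()}
        if all(s is not None and s >= threshold for s in scores.values()):
            hits.append((start, scores))
    return hits


# Invented three-species alignment ('-' marks an alignment gap).
ALIGNMENT = {
    "human": "TTACGTAA",
    "mouse": "TTACGTGA",
    "rat":   "TAACGT-A",
}

for column, scores in conserved_hits(log_odds_matrix(COUNTS), ALIGNMENT, threshold=5.0):
    print(column, {sp: round(s, 2) for sp, s in scores.items()})
```

Lowering `threshold` admits weaker matches and so trades specificity for sensitivity, which is the kind of relaxed-threshold setting the abstract describes for less conserved sites.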

   

 
Project Team
Van Tassell, Curtis - Curt
Liu, Ge
 
Related National Programs
  Food Animal Production (101)
 
 
Last Modified: 11/07/2008