Science. 2010 Dec 24;330(6012):1787-97. Epub 2010 Dec 22.
Identification of functional elements and regulatory circuits by Drosophila modENCODE.
modENCODE Consortium, Roy S, Ernst J, Kharchenko PV, Kheradpour P, Negre N, Eaton ML, Landolin JM, Bristow CA, Ma L, Lin MF, Washietl S, Arshinoff BI, Ay F, Meyer PE, Robine N, Washington NL, Di Stefano L, Berezikov E, Brown CD, Candeias R, Carlson JW, Carr A, Jungreis I, Marbach D, Sealfon R, Tolstorukov MY, Will S, Alekseyenko AA, Artieri C, Booth BW, Brooks AN, Dai Q, Davis CA, Duff MO, Feng X, Gorchakov AA, Gu T, Henikoff JG, Kapranov P, Li R, MacAlpine HK, Malone J, Minoda A, Nordman J, Okamura K, Perry M, Powell SK, Riddle NC, Sakai A, Samsonova A, Sandler JE, Schwartz YB, Sher N, Spokony R, Sturgill D, van Baren M, Wan KH, Yang L, Yu C, Feingold E, Good P, Guyer M, Lowdon R, Ahmad K, Andrews J, Berger B, Brenner SE, Brent MR, Cherbas L, Elgin SC, Gingeras TR, Grossman R, Hoskins RA, Kaufman TC, Kent W, Kuroda MI, Orr-Weaver T, Perrimon N, Pirrotta V, Posakony JW, Ren B, Russell S, Cherbas P, Graveley BR, Lewis S, Micklem G, Oliver B, Park PJ, Celniker SE, Henikoff S, Karpen GH, Lai EC, MacAlpine DM, Stein LD, White KP, Kellis M.
Roy S, Ernst J, Kheradpour P, Bristow CA, Lin MF, Washietl S, Ay F, Meyer PE, Di Stefano L, Candeias R, Jungreis I, Marbach D, Sealfon R, Kellis M, Landolin JM, Carlson JW, Booth B, Brooks AN, Davis CA, Duff MO, Kapranov P, Samsonova AA, Sandler JE, van Baren MJ, Wan KH, Yang L, Yu C, Andrews J, Brenner SE, Brent MR, Cherbas L, Gingeras TR, Hoskins RA, Kaufman TC, Perrimon N, Cherbas P, Graveley BR, Celniker SE, Comstock CL, Dobin A, Drenkow J, Dudoit S, Dumais J, Fagegaltier D, Ghosh S, Hansen KD, Jha S, Langton L, Lin W, Miller D, Tenney AE, Wang H, Willingham AT, Zaleski C, Zhang D, Kharchenko PV, Tolstorukov MY, Alekseyenko AA, Gorchakov AA, Gu T, Minoda A, Riddle NC, Schwartz YB, Elgin SC, Kuroda MI, Pirrotta V, Park PJ, Karpen GH, Acevedo D, Bishop EP, Gadel SE, Jung YL, Kennedy CD, Lee OK, Linder-Basso D, Marchetti SE, Shanower G, Nègre N, Ma L, Brown CD, Spokony R, Grossman RL, Posakony JW, Ren B, Russell S, White KP, Auburn R, Bellen HJ, Chen J, Domanus MH, Hanley D, Heinz E, Li Z, Meyer F, Miller SW, Morrison CA, Scheftner DA, Senderowicz L, Shah PK, Suchy S, Tian F, Venken KJ, White R, Wilkening J, Zieba J, Eaton ML, MacAlpine HK, Nordman JT, Powell SK, Sher N, Orr-Weaver TL, MacAlpine DM, DeNapoli LC, Ding Q, Eng T, Kashevsky H, Li S, Prinz JA, Robine N, Berezikov E, Dai Q, Okamura K, Lai EC, Dai Q, Hannon GJ, Hirst M, Marra M, Rooks M, Zhao Y, Henikoff JG, Sakai A, Ahmad K, Henikoff S, Bryson TD, Arshinoff BI, Washington NL, Carr A, Feng X, Perry MD, Kent WJ, Lewis SE, Micklem G, Stein LD, Barber G, Chateigner A, Clawson H, Contrino S, Guillier F, Hinrichs AS, Kephart ET, Lloyd P, Lyne R, McKay S, Moore RA, Mungall C, Rutherford KM, Ruzanov P, Smith R, Stinson EO, Zha Z, Artieri CG, Li R, Malone JH, Sturgill D, Oliver B, Jiang L, Mattiuzzo N, Will S, Berger B, Feingold EA, Good PJ, Guyer MS, Lowdon RF.
Source
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA 02139, USA.
Abstract
To gain insight into how genomic information is translated into cellular and developmental programs, the Drosophila model organism Encyclopedia of DNA Elements (modENCODE) project is comprehensively mapping transcripts, histone modifications, chromosomal proteins, transcription factors, replication proteins and intermediates, and nucleosome properties across a developmental time course and in multiple cell lines. We have generated more than 700 data sets and discovered protein-coding, noncoding, RNA regulatory, replication, and chromatin elements, more than tripling the annotated portion of the Drosophila genome. Correlated activity patterns of these elements reveal a functional regulatory network, which predicts putative new functions for genes, reveals stage- and tissue-specific regulators, and enables gene-expression prediction. Our results provide a foundation for directed experimental and computational studies in Drosophila and related species and also a model for systematic data integration toward comprehensive genomic and functional annotation.
- PMID: 21177974 [PubMed - indexed for MEDLINE]
- PMCID: PMC3192495
Fig. 1
Overview of Drosophila modENCODE data sets. Range of genomic elements and trans factors studied, with relevant techniques and resulting genome annotations. hnRNA, heterogeneous nuclear RNA.
Science. 2010 December 24;330(6012):1787-1797.
Fig. 2
Coding and noncoding genes and structures. (A) Extended region of male-specific expression in chromosome 2R including new protein-coding and noncoding transcripts. MIP03715 contains two short ORFs of 23 and 21 codons, respectively. ORF multispecies alignments (color coded) show abundant synonymous (bright green) and conservative (dark green) substitutions and a depletion of nonsynonymous substitutions (red), indicative of protein-coding selection [ratio of nonsynonymous to synonymous substitutions (dN/dS) < 1 for both, P < 10^-7 and P < 10^-11, respectively, likelihood ratio test]. Surrounding regions show abundant stop codons (blue, magenta, yellow) and frame-shifted positions (orange). (B) A transcribed region in chromosome 3R (26,572,290 to 26,573,456), identified by RNA-seq and supported by promoter-specific and transcription-associated chromatin marks, shows RNA secondary-structure conservation in eight Drosophila species. (C) Example of a new miRNA derived from a protein-coding exon of CG6700, with 21- to 23-nt RNAs indicative of Drosha/Dicer-1 processing and also recovered in AGO1-immunoprecipitate libraries from S2 cells and adult heads indicative of Argonaute loading. Evolutionary evidence suggests protein-coding constraint, no conservation for the mature arm, and conservation of the star arm. Red boxes indicate 8-mer “seed” sequence potentially mediating 3′ UTR targeting.
Fig. 3
Chromatin-based annotation of functional elements. (A) Average enrichment profiles of histone marks, chromosomal proteins, and physical chromatin properties at genes, origins of replication, insulator proteins, and TF binding positions. Each panel shows 4 kb centered at a specified location, either proximal to TSS (prox.) or distal (dist.). (B) Example of a transcript predicted by chromatin signatures associated with promoter (red trace) and gene bodies (blue box) and supported by cDNA evidence. Strong RNA Pol II and H3K4me3 peaks in the promoter region and strong H2B ubiquitination extending toward the previously annotated luna gene are confirmed by RNA-seq junction reads that were not used in the prediction. (C) Intergenic H3K36me1 chromatin signatures predict replication activity. Enrichment of multiple chromatin marks was used to identify putative large (>10 kbp) intergenic H3K36me1/H3K18ac domains located outside of annotated genes. Although these marks generally correspond to long introns within transcripts, their intergenic domains were enriched for replication activity (fig. S5). In this example from BG3 cells, such a domain was found upstream of the bi locus and is associated with early replication, contains an early origin, is enriched for ORC binding, and is further supported by NippedB binding.
Fig. 4
Discovery and characterization of chromatin states and their functional enrichments. Combinatorial patterns of chromatin marks in S2 and BG3 cells reveal chromatin states associated with different classes of functional elements. A discrete model (states d1 to d30) captures the presence/absence information, and a continuous model (states c1 to c9) also incorporates mark intensity information (22). States were learned solely from mapped locations of marks (left) and were associated with modENCODE-defined elements (right) with most pronounced patterns in euchromatin (green) and heterochromatin (blue) shown here (additional variations shown in fig. S6).
Fig. 5
High-occupancy TF binding regions and their relation to motifs, ORC, and chromatin. (A) Enrichment of known motifs for regions bound by corresponding TF, sorted by average complexity, denoting the number of distinct TFs bound in the same region. For eight TFs, motifs are depleted (blue) for higher-complexity regions, suggesting non–sequence-specific recruitment. In seven of eight cases, known motifs were enriched in bound regions (Enrich), suggesting sequence-specific recruitment in lower-complexity regions. For each factor, binding sites were highly reproducible between replicates (Reprod). (B) ORC versus TF complexity. The relation between HOT spot complexity (x axis) and enrichment in ORC binding (y axis). (C) Discovered motifs in high- or low-complexity regions (boxed range) and their enrichment in regions of higher (red) or lower (blue) complexity. M1 to M5 are candidate “drivers” of HOT region establishment.
Fig. 6
Genome coverage by modENCODE data sets. (A) Unique (bars) and cumulative (lines) coverage of nonrepetitive (blue line) and conserved (red line) genomes. (B) Multiple coverage for data sets grouped into transcribed elements (red), bound regulators (blue), and chromatin domains (green) (17). Across all three classes (black), 10.8% of the genome is covered 15 or more times, and 69.5% is covered at least twice. (C) Increased coverage in a Chr2R region with no prior annotation (left half), now showing multiple overlapping data sets. Coverage by different tracks is highly clustered (fig. S11), with some regions showing little coverage and others densely covered by many types of data.
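The multiple-coverage figures in panel B (for example, the share of the genome covered at least twice) can be illustrated with a small sweep over interval endpoints. This is a minimal sketch, not the consortium's pipeline: the function name `coverage_fractions`, the half-open coordinate convention, and the toy intervals are all assumptions made for illustration.

```python
from collections import defaultdict


def coverage_fractions(intervals, genome_length, thresholds):
    """Return {k: fraction of positions covered by at least k intervals}.

    intervals: iterable of (start, end) pairs, half-open, with
               0 <= start < end <= genome_length (assumed convention).
    thresholds: iterable of positive integers k.
    """
    # Record +1 where an interval opens and -1 where it closes.
    delta = defaultdict(int)
    for start, end in intervals:
        delta[start] += 1
        delta[end] -= 1

    covered = {k: 0 for k in thresholds}
    depth = 0  # number of intervals overlapping the current stretch
    prev = 0   # left edge of the current stretch
    for pos in sorted(delta):
        span = pos - prev
        for k in covered:
            if depth >= k:
                covered[k] += span
        depth += delta[pos]
        prev = pos

    return {k: covered[k] / genome_length for k in covered}


# Toy example: three overlapping "data set" intervals on a 20-bp genome.
toy = [(0, 10), (5, 15), (8, 12)]
fractions = coverage_fractions(toy, genome_length=20, thresholds=[1, 2, 3])
```

In a real analysis each data set would contribute many intervals per chromosome; the same sweep applies once intervals from all tracks are pooled.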
Fig. 7
Properties of the physical regulatory network. (A) Hierarchical view of mixed ChIP-based/miRNA physical regulatory network that combines transcriptional regulation by 76 TFs (green) from ChIP experiments and posttranscriptional regulation by 52 miRNAs (red). TFs are organized in a five-level hierarchy on the basis of their relative proportion of TF targets versus TF regulators. miRNAs are separated into two groups: the ones that are regulated by TFs (left) and the ones that only regulate TFs (right). The horizontal position of the TFs in each level shows whether they regulate miRNAs (left), have no regulation to or from miRNAs (middle), or do not regulate but are targeted by miRNAs (right). Different shades of green and red represent the total number of target genes for TFs and miRNAs, respectively (darker nodes indicate more targets). Ninety-two percent of TF regulatory connections are downstream connections from higher levels to lower levels (green), and only 8% are upstream (blue). miRNA regulatory connections are red. (B) Highly enriched network motifs in a mixed physical regulatory network including TFs (green), miRNAs (red), and target genes (black). For each motif, five examples are shown. Known activators, blue; known repressors, red; other TFs, black.
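The caption states that TFs are placed in levels according to their relative proportion of targets versus regulators, and that 92% of TF-to-TF connections run from higher to lower levels. The sketch below shows one way such quantities could be computed. The score formula `(out - in) / (out + in)`, the function names, and the choice to ignore same-level edges are assumptions for illustration; the caption does not give the exact definition used.

```python
def hierarchy_score(out_degree, in_degree):
    """Assumed score in [-1, 1]: +1 for a pure regulator, -1 for a pure target."""
    total = out_degree + in_degree
    if total == 0:
        return 0.0
    return (out_degree - in_degree) / total


def downstream_fraction(edges, level):
    """Fraction of cross-level edges that point from a higher to a lower level.

    edges: iterable of (regulator, target) TF pairs.
    level: dict mapping TF name to an integer level (larger means higher).
    Edges within a single level are ignored (an assumption).
    """
    down = sum(1 for reg, tgt in edges if level[reg] > level[tgt])
    up = sum(1 for reg, tgt in edges if level[reg] < level[tgt])
    if down + up == 0:
        return 0.0
    return down / (down + up)


# Toy network with hypothetical TF names.
toy_edges = [("tfA", "tfB"), ("tfA", "tfC"), ("tfB", "tfC"), ("tfC", "tfA")]
toy_level = {"tfA": 3, "tfB": 2, "tfC": 1}
frac_down = downstream_fraction(toy_edges, toy_level)
```

With these definitions, a strongly hierarchical network yields a downstream fraction close to 1, while feedback from lower to higher levels pulls it down.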
Fig. 8
Gene function prediction from coexpression and co-regulation patterns. Receiver operating characteristic curves for GO terms with predicted new members and area-under-the-curve statistics. False negatives for each GO term are predictions for genes previously annotated for “incompatible” GO terms, defined as pairs of GO terms that share fewer than 10% of genes relative to the union of their gene sets.
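The "incompatible" criterion in this caption amounts to a threshold on the overlap of two gene sets relative to their union (a Jaccard index below 0.10). A minimal sketch follows, assuming each GO term is represented as a collection of gene identifiers; the function name, the handling of two empty sets, and the toy identifiers are assumptions.

```python
def are_incompatible(genes_a, genes_b, threshold=0.10):
    """Return True if two GO terms share less than `threshold` of their union.

    genes_a, genes_b: iterables of gene identifiers annotated to each term.
    Two empty gene sets are treated as not incompatible (an assumption).
    """
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return False
    return len(a & b) / len(union) < threshold


# Toy gene sets with made-up identifiers.
term_x = {f"g{i}" for i in range(1, 11)}   # g1..g10
term_y = {f"g{i}" for i in range(10, 21)}  # g10..g20, shares only g10
term_z = {f"g{i}" for i in range(1, 9)}    # g1..g8, mostly overlaps term_x
```

Here `term_x` and `term_y` share 1 of 20 genes in their union (5%), so they count as incompatible, whereas `term_x` and `term_z` share 8 of 10 (80%) and do not.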
Fig. 9
Predictive models of regulator, region, and gene activity. (A) Dynamic regulatory map produced by DREM predicts stage-specific regulators associated with expression changes (y axis, log space relative to first time point) across developmental stages (x axis) (17). Each path (colored lines) indicates the average expression of a group of genes (solid circles) and its standard deviation (size of circle). Predicted bifurcation events, or splits, (open circles) are numbered 1 through 19. The colored insets show the expression level of each individual gene going through the split and ranked regulators from the physical (black) or functional (blue) regulatory network associated with the higher (H), lower (L), or middle (M) path. The uncolored inset shows the expression of repressor SU(HW), whose expression decrease coincides with an expression increase of its targets (red asterisk). (B) Predicted S2 activators (top group) or repressors (bottom group), based on the coherence between relative expression of the TF in S2 (yellow) versus BG3 (green) and the relative motif enrichment (red) or depletion (blue) in S2 versus BG3 for activating (left columns) or repressive marks (right columns). (C) True (top of shaded area) and predicted (dotted blue line) expression levels for target genes, from the expression levels of inferred activators (red) and repressors (green). Only the top five positive and negative regulators are shown, ranked by their contribution to the expression prediction (weight of linear-regression model). Examples are shown from 8 of 1487 predictable genes, ranked by prediction quality scores (rank in upper right corner), evaluated as the averaged squared error between predicted and true expression levels across the time course. An expanded set of examples is shown in fig. S23.
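Panel C describes predicting a target gene's expression as a weighted combination of regulator expression levels, ranking regulators by regression weight, and scoring the prediction by the averaged squared error across the time course. The sketch below illustrates only that evaluation step, given weights that have already been fitted. All function names, the intercept term, and the toy values are assumptions; the fitting procedure itself is not shown and is not described in the caption.

```python
def predict_expression(weights, regulator_expr, intercept=0.0):
    """Linear prediction: intercept + sum over regulators of weight * expression.

    weights: dict regulator -> regression weight (positive suggests an
             activator, negative a repressor).
    regulator_expr: dict regulator -> list of expression values, one per
                    time point; all lists must have the same length.
    """
    n_points = len(next(iter(regulator_expr.values())))
    return [
        intercept + sum(w * regulator_expr[reg][t] for reg, w in weights.items())
        for t in range(n_points)
    ]


def mean_squared_error(true_values, predicted_values):
    """Averaged squared error between true and predicted expression."""
    pairs = list(zip(true_values, predicted_values))
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def top_regulators(weights, n=5):
    """Return (top n positive, top n negative) regulators by weight magnitude."""
    positive = sorted((r for r, w in weights.items() if w > 0),
                      key=lambda r: -weights[r])[:n]
    negative = sorted((r for r, w in weights.items() if w < 0),
                      key=lambda r: weights[r])[:n]
    return positive, negative


# Toy example with hypothetical regulators and three time points.
toy_weights = {"tfA": 2.0, "tfB": -1.0}
toy_expr = {"tfA": [1.0, 2.0, 3.0], "tfB": [0.0, 1.0, 1.0]}
predicted = predict_expression(toy_weights, toy_expr)
error = mean_squared_error([2.0, 3.0, 6.0], predicted)
activators, repressors = top_regulators(toy_weights)
```

A lower error corresponds to a better prediction quality rank in the sense used by the caption.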