Resources

Burkitt Lymphoma Genome Sequencing Project (BLGSP) Standard Operating Procedures (SOP) Manual

Epstein-Barr Virus (EBV) Sequences from Burkitt Lymphoma Cases Published in Grande, Gerhard et al., 2019

The EBV sequences are available for download as BAM alignments from the Public directory at the DCC: https://cgci-data.nci.nih.gov/Public/BLGSP/WGS/L2/.  

The 106 BAM files made available by open access are the Epstein-Barr virus (EBV) sequences that were extracted from the BLGSP patient cohort genomes included in the following publication:

Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)

The following intentionally stringent criteria were used to ensure that no human reads were included in the BAMs.

  • Only reads aligned to the EBV genome (chrEBV) in the reference (GenBank accession AJ507799.2) were included. 
  • Unmapped reads were excluded. 
  • Reads whose mate did not align to the same chromosome (i.e. chrEBV) were excluded. 
  • Reads with more than 5 clipped bases (soft- or hard-clipped), as can occur in a split read (e.g. due to an EBV genome integration event), were excluded.
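
The four criteria above can be sketched as a per-read filter over SAM-format alignment lines. This is only an illustration, not the pipeline that produced the BAMs: the function names are hypothetical, the reads are assumed to be paired-end (so single-end reads are dropped by the mate check), and the clipping limit is assumed to apply to the total number of clipped bases across both ends of a read.

```python
import re

EBV_CONTIG = "chrEBV"  # contig name given in the criteria above
MAX_CLIPPED = 5        # reads with more than 5 clipped bases are excluded

# Matches soft-clip (S) and hard-clip (H) operations in a CIGAR string.
CLIP_RE = re.compile(r"(\d+)([SH])")


def clipped_bases(cigar):
    """Return the total number of soft- and hard-clipped bases in a CIGAR."""
    return sum(int(length) for length, _ in CLIP_RE.findall(cigar))


def keep_read(sam_line):
    """Return True if a SAM alignment line passes all four criteria."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    rname, cigar, rnext = fields[2], fields[5], fields[6]

    # Unmapped reads are excluded (SAM FLAG bit 0x4).
    if flag & 0x4:
        return False
    # Only reads aligned to the EBV contig are included.
    if rname != EBV_CONTIG:
        return False
    # The mate must be mapped (FLAG bit 0x8 unset) to the same contig.
    # In SAM, RNEXT "=" means "same reference as RNAME".
    if flag & 0x8 or rnext not in ("=", EBV_CONTIG):
        return False
    # Assumption: the limit applies to clipped bases summed over both ends.
    if clipped_bases(cigar) > MAX_CLIPPED:
        return False
    return True
```

A filter like this would be applied to each alignment record, for example to the text output of a SAM/BAM reader, keeping only the lines for which keep_read returns True.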

As an additional check, the number of reads in EBV-negative tumors was counted, with the expectation of finding virtually none if human reads were not contaminating the files. Out of 35 EBV-negative genomes, 25 (71%) had exactly zero reads. Of the remaining 10 genomes, one had 90 reads and the others had at most 19 reads (range: 1-19). When a few randomly selected reads were aligned to the human genome, only short matches (20-30 bp) were found, which are expected to be spurious. These reads are therefore believed to be real EBV reads.

Given that EBV is ubiquitous (e.g. over 90% of adults globally and most African children are infected), it is possible that EBV-infected normal B cells were included at very low levels in otherwise EBV-negative tumor biopsies. This would explain the few EBV reads found in EBV-negative BL samples. More generally, EBV reads are often found in DNA sequencing data; for more information, see http://www.cureffi.org/2013/02/01/the-decoy-genome/. We are therefore confident that there are virtually no human reads in these EBV BAM files, consistent with the strict criteria that were used.

Experimental Methods for Burkitt Lymphoma Genome Sequencing Project

On this page, researchers can find data generation and data analysis protocols from the following manuscript:

Grande BM, Gerhard DS, Jiang A, et al. Genome-wide discovery of somatic coding and non-coding mutations in pediatric endemic and sporadic Burkitt lymphoma. Blood. 2019 Mar 21;133(12):1313-1324. (PMID: 30617194)

Experimental Methods for HIV+ Tumor Molecular Characterization Project

On this page, researchers can find data generation and data analysis protocols from the following manuscript:

Gagliardi A, Porter VL, Zong Z, et al. Analysis of Ugandan cervical carcinomas identifies human papillomavirus clade-specific epigenome and transcriptome landscapes. Nat Genet. 2020;52(8):800-810. (PMID: 32747824)

HIV+ Tumor Molecular Characterization Project (HTMCP) Standard Operating Procedures (SOP) Manual

HTMCP-Cervical Cancer Human Papillomavirus (HPV) Transcript References

Supplemental data from the manuscript "Analysis of Ugandan Cervical Carcinomas Identifies Human Papillomavirus (HPV) Clade-specific Epigenome and Transcriptome Landscapes" (PMID: 32747824):

To probe the impact of viral gene expression on tumor gene expression, the HTMCP project team performed unsupervised clustering of viral E1, E2, E6, and E7 transcripts that had annotations associated with them in GenBank (download date: December 2019). A list of these references can be accessed via the “CGCI HTMCP-CC HPV Transcript References (December 2019)