Genome Sequencing Section 

DOE Human Genome Program Contractor-Grantee Workshop VII 
January 12-16, 1999  Oakland, CA


13. Concatenation cDNA Sequencing and Analysis of 500 Human Brain cDNA Clones 

Wei Yu, John Bouck, James H. Gorrell, Donna M. Muzny, and Richard A. Gibbs 
Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030 
agibbs@bcm.tmc.edu 

Using a shotgun-based strategy called Concatenation cDNA Sequencing (CCS), we have completed the sequencing of 503 randomly selected cDNA clones, with a total length of 807 kb, from a Homo sapiens brain cDNA library (1NIB). All sequence data have been annotated and submitted to GenBank. Statistics from the completed projects show that CCS is as efficient as sequencing a single large DNA fragment: the number of reads/kb ranges from 13 to 21, with an average of 16.8, and the number of primers/kb ranges from 0.62 to 1.8, with an average of 1.02. Computer analysis was performed to search for similarity against the public database. Of the 471 clone sequences used for DNA similarity searches, 255 (54%) did not match any sequence in the non-redundant database. The remaining 216 matched previously defined sequences or known genes from human or other organisms. Of the 471 clone sequences, 230 (48.9%) possess putative complete or incomplete open reading frames with a minimum length of 100 amino acids. When all 471 cDNA sequences were compared to the protein sequences in the database, 255 could not be assigned definitively to any known protein. Of the remaining 216 clones, 145 displayed similarities to previously deposited protein sequences, giving consistent search results between the nucleic acid and amino acid data for each clone. The other 71 clones failed to reveal any protein match despite their corresponding DNA similarity matches with database entries. 

To determine the amount of unique information that our cDNA clone sequences were adding to the database, we examined the distribution of 243 clones that have been incorporated into the UniGene database maintained by the NCBI. When the 243 cDNA sequences were compared to the representative sequences from the UniGene database, we found that 10 cDNA sequences had weak matches to a representative clone but were not included in UniGene clusters. Of the 233 clusters that were matched, nearly all contained multiple sequences. However, when the same 233 clone sequences were compared to the mRNA/gene sequences in each cluster, 143 (61%) clusters contained only a single mRNA/gene sequence, which was our cDNA sequence. The majority of the cDNA clones were found in small clusters with only a few other mRNAs or ESTs. 
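
The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch only: every number is copied from the abstract, while the variable and function names are hypothetical and chosen for illustration. It checks that the reported subgroups add up to their totals and recomputes the quoted percentages, plus a mean clone length derived from the two totals (a derived figure, not one stated by the authors).

```python
# Counts quoted in the abstract; no new data is introduced here.
# All names below are hypothetical, chosen for illustration.
CLONES_SEQUENCED = 503        # clones completed with CCS
TOTAL_LENGTH_KB = 807         # combined length of those clones, in kb
SEARCHED = 471                # clone sequences used in similarity searches
DNA_NO_MATCH = 255            # no match in the non-redundant database
DNA_MATCH = 216               # matched a known sequence or gene
ORF_CLONES = 230              # putative ORF of at least 100 amino acids
PROTEIN_MATCH = 145           # DNA match and protein match
DNA_ONLY_MATCH = 71           # DNA match but no protein match
UNIGENE_COMPARED = 243        # clones incorporated into UniGene
UNIGENE_WEAK_ONLY = 10        # weak match only, not in a cluster
UNIGENE_CLUSTERS = 233        # clusters that were matched
SINGLE_MRNA_CLUSTERS = 143    # clusters with a single mRNA/gene sequence


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def summarize():
    """Check that subgroups sum to their totals, then recompute the ratios."""
    assert DNA_NO_MATCH + DNA_MATCH == SEARCHED
    assert PROTEIN_MATCH + DNA_ONLY_MATCH == DNA_MATCH
    assert UNIGENE_CLUSTERS + UNIGENE_WEAK_ONLY == UNIGENE_COMPARED
    return {
        "dna_no_match_pct": percent(DNA_NO_MATCH, SEARCHED),
        "orf_pct": percent(ORF_CLONES, SEARCHED),
        "single_mrna_cluster_pct": percent(SINGLE_MRNA_CLUSTERS, UNIGENE_CLUSTERS),
        # Derived here from the two totals; not a figure given in the abstract.
        "mean_clone_length_kb": round(TOTAL_LENGTH_KB / CLONES_SEQUENCED, 2),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Recomputed this way, 255/471 and 143/233 agree with the quoted 54% and 61%. The ratio 230/471 comes out at 48.8%, slightly below the 48.9% quoted; the abstract does not say how that figure was rounded.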


 