Getting to Know Craig Venter One Base at a Time
A review of The Diploid Genome Sequence of an Individual Human
Note: This is a review of the published article listed below. All information, quotes, figures, methods, and findings mentioned in this review are from that article, and are the property of its authors and/or the publication in which the article originally appeared.
As molecular cytogenetic techniques have matured and offered researchers new insights into the intricacies of chromosomal balance, a wealth of new information has emerged. Despite the rapid pace of data delivery afforded by global interrogation of the genome, researchers have only begun to scratch the surface of genetic variation and its relationship to human development and disease. Groups like the Human Genome Sequencing Consortium and Celera Genomics have dedicated themselves to solving the mysteries of the human genome. To date, researchers have focused on defining human genome variation by assaying and studying single nucleotide polymorphisms (SNPs). More recently, small-scale (<100 bp) insertion/deletion sequences (indels) and large-scale structural variants have been shown to be important components of human biology and disease. To further complicate matters, the generation of a complete diploid genome structure has been hampered by the limitations of haplotype phasing, which leaves gaps and inconsistencies in the available biological information. Reconciling these issues and integrating whole-genome sequencing into studies of large disease populations would enable researchers to build unbiased individual haplotypes, addressing the needs of the rapidly growing field of personalized medicine.
To begin to characterize the true nature of the diploid human genome, a consortium of researchers led by Samuel Levy produced a full genome sequence of an individual human, Dr. J. Craig Venter. It was generated from “~32 million random DNA fragments, sequenced by Sanger dideoxy technology, and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. [The team developed] a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome.” To detect copy number variants (CNVs) and compare them against those annotated computationally, the research team used several different microarray platforms, including the Agilent Human Genome CGH array, which contains 244,000 60-mer probes on a single slide. Across all microarray platforms, 62 CNVs (32 losses and 30 gains) were identified. “Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb.” These variants, many of which were novel, included SNPs, block substitutions (2–206 bp), heterozygous insertion/deletion events (indels; 1–571 bp), homozygous indels (1–82,711 bp), and inversions, as well as numerous segmental duplications and copy number variation regions.
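As a rough sanity check on the sequencing figures quoted above, the standard relation coverage = (number of reads × mean read length) / genome length can be rearranged to give the mean read length those figures imply. The sketch below is only a back-of-the-envelope estimate. It assumes that each of the ~32 million fragments contributes one read of uniform length and that the 7.5-fold coverage is measured against the 2,810 Mb of assembled sequence; neither assumption is stated in the article, and the function name is purely illustrative.

```python
def implied_mean_read_length(coverage, genome_bases, n_reads):
    """Rearrange coverage = n_reads * read_length / genome_bases
    to solve for the mean read length (in bases)."""
    return coverage * genome_bases / n_reads


# Figures quoted in the review; treating them as exact is an assumption.
COVERAGE = 7.5              # "approximately 7.5-fold coverage"
ASSEMBLED_BASES = 2_810e6   # "2,810 million bases (Mb)"
N_FRAGMENTS = 32e6          # "~32 million random DNA fragments"

mean_len = implied_mean_read_length(COVERAGE, ASSEMBLED_BASES, N_FRAGMENTS)
print(f"Implied mean read length: {mean_len:.0f} bp")
```

Under these assumptions the implied figure is several hundred bases per read, which is the order of magnitude expected for Sanger sequencing.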
While SNP analysis has been an important foundation for defining diploid genome structure, this work shows that non-SNP genetic alterations play a significant role as well. The group found that while “[n]on-SNP DNA variation accounted for 22% of all events identified in the donor, they involved 74% of all variant bases…and 44% of genes were heterozygous for one or more variants.” These data add to the body of information being used to build a complete molecular portrait of the diploid human genome, providing valuable insights into chromosomal structure and inherent variation. These insights, and the work of future researchers, will help the burgeoning fields of pharmacogenomics and personalized medicine grow and address medical intricacies from entirely new perspectives.
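The contrast between 22% of events and 74% of variant bases is easier to appreciate as an average size per event. The sketch below combines the quoted percentages with the quoted totals (more than 4.1 million variants, encompassing 12.3 Mb). It treats those rounded totals as exact, which is an assumption, so the results are approximate; the function and constant names are illustrative only.

```python
# Totals quoted in the review; treated as exact for this estimate (an assumption).
TOTAL_EVENTS = 4.1e6          # "more than 4.1 million DNA variants"
TOTAL_VARIANT_BASES = 12.3e6  # "encompassing 12.3 Mb"

NON_SNP_EVENT_SHARE = 0.22    # share of all variant events that are non-SNP
NON_SNP_BASE_SHARE = 0.74     # share of all variant bases in non-SNP events


def mean_bases_per_event(event_share, base_share):
    """Average number of variant bases per event for one class of variant."""
    events = event_share * TOTAL_EVENTS
    bases = base_share * TOTAL_VARIANT_BASES
    return bases / events


non_snp_mean = mean_bases_per_event(NON_SNP_EVENT_SHARE, NON_SNP_BASE_SHARE)
snp_mean = mean_bases_per_event(1 - NON_SNP_EVENT_SHARE, 1 - NON_SNP_BASE_SHARE)

print(f"Non-SNP events: about {non_snp_mean:.1f} bases per event")
print(f"SNP events:     about {snp_mean:.1f} bases per event")
```

With these inputs the SNP class works out to roughly one base per event, as expected for single-nucleotide changes, while the average non-SNP event spans roughly ten bases. That difference is why a minority of events accounts for most of the variant sequence.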

Table 1. Summary of Variant Types Identified in the HuRef Genome Assembly

Figure 1. Number and Length Distribution of Apparent Homozygous Insertion and Deletion Sequences Greater than 100 bp
Note that the numbers of insertion and deletion events are similar, but there are more long insertions than long deletions.

Table 2. Copy Number Variants Identified on the HuRef Sample.
Table 3. CNV Identified in the Donor DNA Overlapping Genes
Title: The Diploid Genome Sequence of an Individual Human.
Authors: Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, Macdonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC
Journal: PLoS Biol. 2007 Sep 4;5(10):e254