Human genome sequenced using next-gen sequencing technology and aCGH

A review of "A highly annotated whole-genome sequence of a Korean individual"
Highly annotated whole human genome sequences have the potential to provide insight into genetic variation and ancestry, and to help predict phenotypes and outcomes. To date, eight human genome sequences, representing individuals from various geographical regions, have been published. An international research team led by Seoul National University has recently obtained a highly annotated whole-genome sequence of a Korean individual, AK1, using an approach that combines whole-genome shotgun sequencing, targeted bacterial artificial chromosome (BAC) sequencing, and high-resolution microarray-based comparative genomic hybridization (CGH).

Massively parallel sequencing using Illumina Genome Analyzers was performed on (1) selected genomic regions, including chromosome 20 and 390 regions commonly affected by copy number variants (CNVs), at very high depth (151x average coverage) using overlapping AK1 BAC clones, and (2) the whole genome, using libraries of AK1 genomic DNA, at 27.8x average coverage (Table 1). The researchers found that 74.4% of the sequences aligned to the NCBI reference genome and that the aligned sequences represented 99.8% of the reference genome. After applying filtering criteria, the team identified approximately 3.45 million single-nucleotide polymorphisms (SNPs), of which 17.1% were novel and 10,162 were non-synonymous, and validated these results with SNP genotyping arrays, deep sequencing of chromosome 20 BAC clones, and Sanger resequencing. The researchers also identified 170,202 deletion or insertion polymorphisms (indels) and found a strong correlation between SNP and indel densities genome-wide (Figure 1). Sequence comparisons with four other published human genomes (J. Watson, C. Venter, Han Chinese, and Yoruban African) indicated that 21% of AK1’s SNPs were unique, and that 8% of the approximately 9.5 million overlapping SNPs were common to all five genomes (Figure 1). Approximately 2.1 million AK1 SNPs were found to be heterozygous, giving AK1 higher SNP diversity than the Watson, Venter, or Han Chinese genomes, but lower diversity than the Yoruban African genome.
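The SNP-indel density correlation mentioned above can be illustrated with a minimal sketch. It assumes variant positions on a single chromosome are binned into fixed 10-kb windows (the window size given in the Figure 1 caption) and compared with a Pearson correlation; the paper's actual statistical procedure is not described in this review, and the function names and toy positions below are hypothetical, not AK1 data.

```python
from collections import Counter
from math import sqrt

WINDOW = 10_000  # 10-kb windows, the size quoted in the Figure 1 caption


def window_counts(positions, n_windows, window=WINDOW):
    """Count variants falling in each fixed-size window along one chromosome."""
    counts = Counter(pos // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy positions in base pairs (hypothetical, for illustration only).
snp_positions = [1_200, 3_400, 8_900, 12_000, 25_500, 27_000, 28_100, 29_900]
indel_positions = [2_000, 26_000, 29_000]

snp_density = window_counts(snp_positions, n_windows=3)
indel_density = window_counts(indel_positions, n_windows=3)
r = pearson(snp_density, indel_density)
print(snp_density, indel_density, round(r, 3))
```

In this toy input the windows with more SNPs also carry more indels, so the coefficient is close to 1; real data would span many thousands of windows per chromosome.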

The research team implemented complementary array CGH and sequencing approaches to detect CNVs. Using 24 Agilent 1x1 million custom-designed CGH arrays containing a total of 24 million oligonucleotide probes, the team initially detected 1,237 CNV regions. The researchers then applied very conservative criteria to retain only highly reliable CNVs, identifying 238 deletions, ranging from 277 bases to 196,900 bases and totaling 2.4 Mb, and 77 copy number gains, totaling 7.0 Mb. Of these CNVs, 148 of the deletions and 33 of the gains had not been previously described in the Database of Genomic Variants (DGV) and were therefore considered novel. Figure 2 shows an example of a copy number gain identified by the Agilent CGH microarray and confirmed by diploid sequencing.
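As a rough check on the CNV figures above, the short sketch below derives the novel fraction and mean size for each class. The counts come from the text; the totals of 2.4 Mb and 7.0 Mb are rounded, so the mean sizes are approximate, and the dictionary layout and function name are illustrative assumptions.

```python
# CNV figures quoted in the review; span values are rounded totals in base pairs.
INITIAL_CNV_REGIONS = 1_237
deletions = {"total": 238, "novel": 148, "span_bp": 2_400_000}
gains = {"total": 77, "novel": 33, "span_bp": 7_000_000}


def summarize(cnv):
    """Return the novel fraction and approximate mean size for one CNV class."""
    return {
        "novel_fraction": cnv["novel"] / cnv["total"],
        "mean_size_bp": cnv["span_bp"] / cnv["total"],
    }


deletion_summary = summarize(deletions)
gain_summary = summarize(gains)

# Share of the initially detected CNV regions that passed the conservative filter.
retained_fraction = (deletions["total"] + gains["total"]) / INITIAL_CNV_REGIONS

print(f"deletions: {deletion_summary['novel_fraction']:.1%} novel, "
      f"mean ~{deletion_summary['mean_size_bp']:,.0f} bp")
print(f"gains: {gain_summary['novel_fraction']:.1%} novel, "
      f"mean ~{gain_summary['mean_size_bp']:,.0f} bp")
print(f"retained after filtering: {retained_fraction:.1%}")
```

On these numbers, roughly 62% of the deletions and 43% of the gains were novel, and about a quarter of the initially detected regions were retained after filtering. The gains were fewer but, on average, considerably larger than the deletions.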

A comparison of non-synonymous SNPs (nsSNPs) in the AK1, Han Chinese, and Yoruban genomes showed that the three genomes share a common subset of genes enriched for nsSNPs (Figure 3a). An analysis of the potential implications of AK1 variants revealed 773 SNPs potentially associated with different phenotypes (Figure 3b) and 106 genes affected by CNV losses (Figure 3c).

In conclusion, this study demonstrates that using a combination of next-generation sequencing technology and high-resolution array CGH enables highly accurate annotation of individual whole genome sequences and provides more comprehensive and reliable data on structural variations.


Table 1. Overview of libraries and sequence data






Figure 1. Geographic map and Venn diagram of five sequenced genomes, indel distribution, and SNP-indel densities correlation.
(a) Geographic map showing the regions of ancestry of five sequenced genomes. MT type, mitochondrial haplogroup. (b) The number of SNPs overlapping between five genomes. (c) Correlation between SNP–indel densities on chromosome 6 (per 10-kb window). From top: SNP density, indel density, SNP–indel density (moving average of ten 10-kb windows), SNP density in a portion of chromosome 6, and indel density along the same portion of chromosome 6. The x axis represents the nucleotide position in Mb.







Figure 2. Representative examples of genomic variations in AK1.
(a) Homozygous deletion identified by targeted haploid sequencing (top) and diploid sequencing (bottom). Stretched sequencing pairs and a drop in sequencing coverage define the deletion in both panels. Chr: chromosome. (b) Heterozygous deletion identified by targeted haploid sequencing (top) and confirmed by diploid sequencing (bottom). Stretched pairs confirm the deletion in the diploid sequence but complete coverage drop is not detected. (c) Copy number gain is identified by CGH microarray (top) and confirmed by increased coverage for the corresponding genomic region by diploid sequencing (bottom). For all panels: blue, fold coverage; horizontal red lines, stretched sequence pairs; green, CNV region in the DGV; grey, gene; vertical red bars, homozygous SNPs; vertical black bars, heterozygous SNPs; and broken vertical grey lines define the boundaries of the structural variants.





Figure 3. Potential implications of AK1 variants and comparisons of non-synonymous SNPs among three sequenced genomes.
(a) Top, the numbers of non-synonymous SNPs (nsSNPs) and genes containing non-synonymous SNPs are compared between the Korean (AK1), Han Chinese (YH) and Yoruban (NA18507) genomes. Bottom, comparison of non-synonymous SNPs and genes containing non-synonymous SNPs in AK1 with those in the YH and Yoruban genomes. Common denotes items shared by all three genomes. Left axis: number of nsSNPs (blue) or genes containing nsSNPs (red); right axis: ratio (%) of the number of nsSNP genes to the number of nsSNPs (green). (b) Seven-hundred-and-seventy-three SNPs potentially associated with phenotypes, derived from the Human Gene Mutation Database (HGMD), OMIM, SNPedia and other hypotheses. DM, diabetes mellitus; NIDDM, non-insulin-dependent diabetes mellitus; TB, tuberculosis. (c) Genes affected by large homozygous and heterozygous chromosomal deletions.






Title: A highly annotated whole-genome sequence of a Korean individual

Journal: Nature. 2009 Aug 20;460(7258):1011-5.

Authors: Kim JI, Ju YS, Park H, Kim S, Lee S, Yi JH, Mudge J, Miller NA, Hong D, Bell CJ, Kim HS, Chung IS, Lee WC, Lee JS, Seo SH, Yun JY, Woo HN, Lee H, Suh D, Lee S, Kim HJ, Yavartanoo M, Kwak M, Zheng Y, Lee MK, Park H, Kim JY, Gokcumen O, Mills RE, Zaranek AW, Thakuria J, Wu X, Kim RW, Huntley JJ, Luo S, Schroth GP, Wu TD, Kim H, Yang KS, Park WY, Kim H, Church GM, Lee C, Kingsmore SF, Seo JS
