Discovering New Complexity in Human Copy-Number Variants
A review of The Fine-Scale and Complex Architecture of Human Copy-Number Variation
Note: This is a review of the published article listed below. All information, quotes, figures, methods, and findings mentioned in this review are from that article, and are the property of its authors and/or the publication in which the article originally appeared.
Copy-number-variable regions are prevalent in the human genome, and there is great interest in understanding their functional significance. Precise copy-number variant (CNV) breakpoints (the physical boundaries of a deletion or duplication) are important for understanding the mechanisms of CNV formation and are critical for relating genotype to phenotype. Knowledge of the fine-scale architecture of CNV regions is likewise vital to understanding how these variants may be associated with disease or other adverse phenotypes. Toward this end, George Perry and colleagues (2008) constructed a high-resolution Agilent comparative genomic hybridization (aCGH) microarray set comprising approximately 470,000 oligonucleotide probes, spaced at close to 1 kb intervals across 2191 previously reported CNV regions and their flanking sequences. The study interrogated DNA samples from individuals in four populations of the International HapMap Project. This high-resolution analysis identified CNVs in 1153 (53%) of the 2191 regions. The results were also compared with those of the Redon et al. study, which used two different genome-wide platforms (WGTP and 500K EA) to identify CNVs in the same individuals sampled here. High-confidence CNVs were defined as calls made by both the WGTP and 500K EA platforms in the same direction (i.e., gain or loss) for the same individual; for these calls, the comparison showed 97% overlap with the Agilent results, based on breakpoints from the WGTP calls. This indicates that the Agilent platform has a low false-negative rate for CNVs that were consistently identified across multiple platforms. Perry and colleagues also present evidence that the total genomic content of many common CNVs is smaller than previously thought. The CNV-enriched Agilent arrays allowed the researchers to estimate CNV breakpoints to approximately 1 kb resolution, and their data show a reduction of more than 50% in the total amount of CNV sequence for 876 regions from the Database of Genomic Variants (Figure 1).
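The high-confidence comparison described in the preceding paragraph is, at its core, an interval-overlap rule. The Python sketch below shows one way such a comparison could be expressed; it is an illustration, not the authors' pipeline. The Call record, the half-open coordinates, and the helper names (overlaps, high_confidence, fraction_detected) are hypothetical, and "overlap" is assumed to mean at least one shared base on the same chromosome, for the same individual, in the same direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical CNV call record: one individual, one interval, one direction."""
    individual: str
    chrom: str
    start: int  # assumed 0-based, half-open [start, end)
    end: int
    direction: str  # "gain" or "loss"


def overlaps(a: Call, b: Call) -> bool:
    """True if the calls share individual, chromosome and direction and their intervals intersect."""
    return (
        a.individual == b.individual
        and a.chrom == b.chrom
        and a.direction == b.direction
        and a.start < b.end
        and b.start < a.end
    )


def high_confidence(wgtp: list[Call], ea500k: list[Call]) -> list[Call]:
    """WGTP calls supported by a concordant 500K EA call (same individual and direction)."""
    return [w for w in wgtp if any(overlaps(w, e) for e in ea500k)]


def fraction_detected(reference: list[Call], new_calls: list[Call]) -> float:
    """Fraction of reference calls overlapped by at least one concordant new call."""
    if not reference:
        return 0.0
    hits = sum(1 for r in reference if any(overlaps(r, n) for n in new_calls))
    return hits / len(reference)
```

Requiring the same direction means a gain on one platform never confirms a loss on the other, which matches the "same direction (i.e., gain or loss)" condition in the review; the half-open convention simply keeps abutting intervals from counting as overlapping.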
They applied PCR amplification and sequencing across several CNV breakpoints to evaluate their estimates and to better understand the mechanisms of CNV formation. When the breakpoint-region sequences were compared with a random set of similar-sized genomic sequences, the researchers unexpectedly found that simple tandem repeats were considerably enriched in the CNV breakpoint-region sequences (Figure 2). The group also discovered previously uncharacterized architectural complexity, including smaller CNVs located within larger ones and breakpoints that vary between individuals (Figure 3). These findings provide important new insights into genomic diversity and highlight the need to assess CNV complexity in disease association studies.

Figure 1. Size Distribution of CNVs from the Database of Genomic Variants, with Corresponding CNVs from This Study
We identified CNVs in at least one individual for 1153 of 2191 putative CNV regions annotated in the Database of Genomic Variants (DGV) as of 30 November 2006. Size distributions for these regions are shown in log scale, with 10-fold multiples of 1 and >10, based on the size of each region from DGV and the estimates from our study of the total amount of copy-number-variable sequence within and overlapping the DGV-defined region. Our estimates were smaller than the corresponding DGV region for 1020 of the 1153 loci (88%) and smaller by more than 50% for 876 regions (76%).
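The two summary figures in this caption reduce to simple per-region arithmetic. The sketch below assumes each region is represented as a hypothetical (DGV size, estimated size) pair in base pairs; the function names are illustrative only, and the last two lines just recompute the caption's percentages from its own counts.

```python
def size_reduction(dgv_size: int, estimated_size: int) -> float:
    """Fractional reduction of the estimated CNV sequence relative to the DGV region size."""
    if dgv_size <= 0:
        raise ValueError("DGV region size must be positive")
    return (dgv_size - estimated_size) / dgv_size


def summarize(regions: list[tuple[int, int]]) -> tuple[int, int]:
    """Count regions whose estimate is smaller than DGV, and those smaller by more than 50%."""
    smaller = sum(1 for dgv, est in regions if est < dgv)
    halved = sum(1 for dgv, est in regions if size_reduction(dgv, est) > 0.5)
    return smaller, halved


# Percentages reported in the caption, recomputed from its counts.
print(round(100 * 1020 / 1153))  # 88
print(round(100 * 876 / 1153))   # 76
```

A negative reduction (an estimate larger than the DGV region) is possible because the estimate also counts variable sequence overlapping the region's edges; such regions are counted in neither total.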

Figure 2. Enrichment for Tandem Repeats within Individual CNV Breakpoint-Region Sequences
This figure depicts the empirical cumulative distribution of the longest observed repeated subsequence, ki (k × i), where k = the length of the repeated subsequence and i = the number of recurrences within the sequence, for three sets: the sequences between the copy-number-variable probes at CNV boundaries and the adjacent non-copy-number-variable probes estimated to harbor breakpoints in our study (CNV breakpoint sequences; approximately 1 kb each); sequences from between random pairs of adjacent non-CNV probes on the array (random interprobe sequences); and a random set of genome-wide sequences. The random sequences were selected so as not to alter the characteristics of the observed set of CNV calls in terms of the lengths and proximity of the end sequences. The graph shows only the significant end of the distribution: the top 100 sequences as ranked by ki. A larger proportion of CNV breakpoint-region sequences than of random sequences contain long tandem repeats.
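The ki score described in this caption can be illustrated with a brute-force scan. This sketch assumes that i counts consecutive (tandem) copies of a unit of length k, that at least two copies are required, and that unit lengths are capped at a hypothetical max_unit of 50 bp; the paper's exact definition and search procedure may differ, and the function name is illustrative.

```python
def longest_tandem_repeat(seq: str, max_unit: int = 50) -> tuple[int, int, int]:
    """Return (k * i, k, i) for the tandem repeat with the largest k * i, requiring i >= 2.

    k is the repeat-unit length and i the number of consecutive copies.
    Returns (0, 0, 0) if no unit occurs at least twice in tandem.
    Ties keep the first repeat found (smallest k, then leftmost start).
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, 0, 0)
    for k in range(1, max_unit + 1):
        # A tandem pair of length-k units needs 2k bases starting at `start`.
        for start in range(0, n - 2 * k + 1):
            unit = seq[start:start + k]
            copies = 1
            pos = start + k
            # Extend while the next k bases repeat the unit exactly.
            while seq[pos:pos + k] == unit:
                copies += 1
                pos += k
            if copies >= 2 and k * copies > best[0]:
                best = (k * copies, k, copies)
    return best


print(longest_tandem_repeat("GGACACACACTT"))  # (8, 2, 4)
```

Scoring both the CNV breakpoint-region sequences and the random sequences this way, then comparing the upper tails of the two score distributions, would mirror the kind of comparison shown in the figure. The brute-force search is quadratic in sequence length, which is acceptable for windows of roughly 1 kb.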

Figure 3. Validation of Architecturally Complex CNV Regions by qPCR
We used a series of quantitative PCR (qPCR) probes positioned across CNV regions to validate the patterns of architectural complexity observed with our CNV-enriched array. The probe-by-probe log2 ratios depicted in the heatmaps (see scale bars) illustrate examples of a smaller CNV inside a larger one on chromosome 4 at 162.2 Mb (A) and a CNV with immediately adjacent and variably present CNVs (i.e., juxtaposed gain and loss CNV calls in the same individual) on chromosome 6 at 0.2 Mb (B). The relative genomic positions of the probes are depicted with black lines, with midpoint positions (hg17) provided for selected probes (thicker lines). For each CNV, qPCR primers were designed at intervals throughout and flanking the CNV region and tested on all individuals depicted in the heatmaps. For each interval, the qPCR results (i.e., copy number relative to the reference individual NA10851) are consistent with the aCGH results, which are provided as log ratios (i.e., so as to be on a consistent scale with the qPCR results). Error bars represent the SD. See Table S11 for qPCR primers and results.
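Because aCGH reports log2 ratios while qPCR reports relative copy number, comparing the two requires putting them on a common scale. The sketch below assumes a diploid reference individual (reference_copies = 2 is an assumption, not stated in the caption) and replicate qPCR measurements summarized by mean and sample SD; all function names are hypothetical.

```python
from statistics import mean, stdev


def log2_to_ratio(log2_ratio: float) -> float:
    """Convert an aCGH log2 ratio (test/reference) to a linear ratio."""
    return 2.0 ** log2_ratio


def ratio_to_copies(ratio: float, reference_copies: int = 2) -> float:
    """Estimated absolute copy number, ASSUMING the reference carries reference_copies copies."""
    return ratio * reference_copies


def summarize_qpcr(replicates: list[float]) -> tuple[float, float]:
    """Mean and sample SD of replicate qPCR relative copy-number measurements."""
    return mean(replicates), stdev(replicates)


# A log2 ratio of -1 corresponds to half the reference copy number,
# i.e. one copy in the test sample against an assumed diploid reference.
print(log2_to_ratio(-1.0))                   # 0.5
print(ratio_to_copies(log2_to_ratio(-1.0)))  # 1.0
```

Converting the array values to the linear scale, rather than taking logs of the qPCR values, avoids undefined logarithms for qPCR measurements at or near zero (homozygous deletions).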
Title: The Fine-Scale and Complex Architecture of Human Copy-Number Variation
Authors: George H. Perry, Amir Ben-Dor, Anya Tsalenko, Nick Sampas, Laia Rodriguez-Revenga, Charles W. Tran, Alicia Scheffer, Israel Steinfeld, Peter Tsang, N. Alice Yamada, Han Soo Park, Jong-Il Kim, Jeong-Sun Seo, Zohar Yakhini, Stephen Laderman, Laurakay Bruhn, and Charles Lee
Journal: American Journal of Human Genetics 82, 1-11, March 2008