Hybrid Selection of Targets for Next-Generation Sequencing

A review of Solution Hybrid Selection with Oligonucleotide Probes for Massively Parallel Targeted Sequencing

Cost-effective massively parallel sequencing requires efficient and robust methods for enriching high-value targets. A team of researchers has recently reported a novel capture method for preparing sequencing templates that are enriched for targeted regions of the genome (Figure 1). Biotinylated 170mer RNA capture probes ("bait") were transcribed from PCR-amplified 200mer oligodeoxynucleotides synthesized in parallel on an Agilent wafer by SurePrint technology. A complex mixture of 22,000 non-overlapping probes designed for exon capture and 10,000 probes designed for the capture of long contiguous genome regions was used to hybridize to selected targets in a sample of genomic DNA fragments derived from a human cell line. After hybridization in solution and capture on streptavidin-coated magnetic beads, the hybrid-selected DNA targets ("catch") were PCR-amplified and sequenced to generate 36-base Illumina reads.

The high concentration of ultra-long custom-made Agilent oligonucleotide probes enabled specific and efficient capture. For 15,565 targeted exons (2.5 Mb), approximately 90% of the bases that aligned uniquely to the human reference genome were on or within 500 bp of a probe sequence. Shotgun sequencing gave a higher proportion of exon versus near-exon bases than end sequencing (Figure 2). For longer contiguous targets (1.7 Mb), 95% of the bases aligned uniquely to the targeted genome segments. Coverage profiles exhibited peaks for unique segments that were targeted and valleys for non-targeted repeats (Figure 3). Over 60% of the targeted exon bases received at least half the mean coverage (Figure 4a), as did 80% of the bases in targeted regional sequencing (Figure 4b). Experiment-to-experiment and sample-to-sample reproducibility was excellent (Figure 5). Furthermore, concordance between SNP genotype calls and known HapMap genotypes was greater than 99%. The authors conclude that solution hybrid selection is a flexible, robust, scalable and economical method for enriching candidate subsets of the human genome, thereby enabling efficient deep targeted next-generation sequencing of thousands of exons as well as megabase regions.
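
To make the specificity figure concrete, the following is a minimal sketch of how a "bases on or within 500 bp of a probe" fraction could be computed. It is not the authors' pipeline: the function name `near_bait_fraction`, the half-open interval convention, and the single-chromosome simplification are all assumptions made for illustration.

```python
def near_bait_fraction(read_intervals, bait_intervals, flank=500):
    """Fraction of aligned bases lying on or within `flank` bp of any bait.

    Hypothetical helper: intervals are half-open (start, end) pairs on one
    chromosome, and each read interval is a uniquely aligned read.
    """
    # Pad every bait by the flank, then merge overlapping padded intervals
    # so that no aligned base is counted twice.
    padded = sorted((max(0, s - flank), e + flank) for s, e in bait_intervals)
    merged = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))

    total_bases = 0
    near_bases = 0
    for rs, re_ in read_intervals:
        total_bases += re_ - rs
        for ms, me in merged:
            overlap = min(re_, me) - max(rs, ms)
            if overlap > 0:
                near_bases += overlap
    return near_bases / total_bases if total_bases else 0.0


# Toy example: one 170-base bait and three 36-base reads.
baits = [(1000, 1170)]
reads = [(600, 636), (1650, 1686), (5000, 5036)]
fraction = near_bait_fraction(reads, baits)
```

In this toy case the first read lies entirely inside the padded bait window, the second overlaps it by 20 bases, and the third is off target.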

 

Figure 1. Overview of solution hybrid selection.
Illustrated are steps involved in the preparation of a complex pool of biotinylated RNA capture probes ("bait"; top left), whole-genome fragment input library ("pond"; top right) and hybrid-selected enriched output library ("catch"; bottom). Two sequencing targets and their respective capture probes are indicated in red and blue. Thin and thick lines represent single and double strands, respectively.

 

Figure 2. Coverage profiles of exon targets by end sequencing and shotgun sequencing.
Shown are cumulative coverage profiles that sum the per-base sequencing coverage along 7,052 single-bait target exons. Only free-standing baits that were not within 500 bases of another one were included in this analysis. (a) End sequencing with 36-base reads produced a bimodal profile with high sequence coverage near and slightly beyond the ends of the 170-base baits (indicated by the horizontal bar). (b) Shotgun sequencing of a capture from a different pond library (containing fragments with generic rather than Illumina-specific adapters) with 36-base reads after concatenating and reshearing gave more coverage on bait (shaded area) than near bait. (c) Resequencing of the first capture with 76-base end reads had a similar effect, although the peak was slightly wider and the on-bait fraction of the peak area slightly smaller. Note that the scale on the y-axis, and hence the absolute peak height, is different in each case. The different scales reflect the different numbers of sequenced bases, which are much lower for the GA-I lanes (a,b) than for the GA-II lane (c).
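
The cumulative profile described in this caption can be sketched as a position-by-position sum over bait-centered windows. This is only an illustration under stated assumptions: the name `cumulative_profile` is hypothetical, and each exon is assumed to have already been reduced to a per-base coverage list of identical length, aligned on its bait.

```python
def cumulative_profile(per_exon_coverage):
    """Sum per-base coverage at each window position across all exons.

    Hypothetical helper: every inner list is the coverage along one
    bait-centered window, and all windows must have the same length.
    """
    if not per_exon_coverage:
        return []
    width = len(per_exon_coverage[0])
    profile = [0] * width
    for coverage in per_exon_coverage:
        if len(coverage) != width:
            raise ValueError("all coverage windows must have the same length")
        for position, depth in enumerate(coverage):
            profile[position] += depth
    return profile


# Toy example with three very short windows.
profile = cumulative_profile([[1, 2, 3], [0, 1, 0], [4, 0, 1]])
```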

 

Figure 3. Coverage profile along a contiguous sequencing target.
Shown is a typical 11-kb segment (chr4:118635000-118646000) out of 1.7 Mb. Sequence targeted by capture probes is marked in blue. Repetitive sequence without probe coverage is shown in red.

 

Figure 4. Normalized coverage-distribution plots.
Shown is the fraction of bait-covered bases in the genome achieving coverage with uniquely aligned sequence equal to or greater than the normalized coverage indicated on the x-axis. (a,b) The absolute per-base coverage was divided by the mean coverage of all bait positions (18 in a; 221 in b). The curve for the shotgun-sequenced exon capture (a) is steeper than the curve for the regional capture (b), indicating a less uniform representation of sequencing targets in the exon catch. Dashed lines point to the fraction of bases achieving at least half or one-fifth the mean coverage.
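
The normalization behind these plots can be illustrated with a short sketch: divide each per-base coverage by the mean coverage, then report the fraction of bases at or above a chosen normalized level. The function name `fraction_at_or_above` and the toy coverage values are assumptions for illustration, not data from the paper.

```python
def fraction_at_or_above(coverage, normalized_level):
    """Fraction of bases whose coverage is >= normalized_level * mean coverage.

    Hypothetical helper: `coverage` is a list of per-base depths over all
    bait-covered positions.
    """
    if not coverage:
        raise ValueError("coverage must not be empty")
    mean_coverage = sum(coverage) / len(coverage)
    threshold = normalized_level * mean_coverage
    return sum(1 for depth in coverage if depth >= threshold) / len(coverage)


# Toy per-base coverage with a mean of 20.
toy_coverage = [0, 5, 10, 20, 25, 30, 50]
at_half_mean = fraction_at_or_above(toy_coverage, 0.5)   # bases with depth >= 10
at_fifth_mean = fraction_at_or_above(toy_coverage, 0.2)  # bases with depth >= 4
```

Evaluating the function over a range of normalized levels yields one point per level on a curve of the kind shown in the figure.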
 

Figure 5. Reproducibility of targeted sequencing.
(a) For each exon (n = 15,565), the ratio of the mean coverage in two independent hybrid-selection experiments performed on the same source DNA (NA15510) was plotted against its mean coverage in one experiment. Coverage was normalized to adjust for the different numbers of sequencing reads. The average ratio (black line) is close to 1. S.d. is indicated by purple lines. (b) Base-by-base sequence coverage along one target in three independent hybrid selections, two of them performed on NA15510 (purple and teal lines) and one on NA11994 source DNA (black). Note the similarities at this fine resolution of the three profiles, which were normalized to the same height. The positions of the target exon (ENSE00000968562) and bait are indicated by red and blue bars, respectively.
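
A minimal sketch of the per-exon comparison in panel (a) follows, assuming that "normalized" means rescaling one experiment to the other's total read count. That scaling rule, the function name `normalized_coverage_ratios`, and the toy numbers are assumptions; the review does not spell out the exact normalization.

```python
from statistics import mean, stdev


def normalized_coverage_ratios(cov_a, cov_b, reads_a, reads_b):
    """Per-exon ratio of mean coverage (A over B) after depth normalization.

    Hypothetical helper: cov_a and cov_b hold the mean coverage of each exon
    in experiments A and B; reads_a and reads_b are total read counts.
    Exons with zero coverage in B are skipped to avoid division by zero.
    """
    scale = reads_a / reads_b  # rescale experiment B to A's sequencing depth
    return [a / (b * scale) for a, b in zip(cov_a, cov_b) if b > 0]


# Toy example: experiment B has twice the reads and twice the raw coverage.
ratios = normalized_coverage_ratios(
    cov_a=[10, 20, 30],
    cov_b=[20, 40, 60],
    reads_a=1_000_000,
    reads_b=2_000_000,
)
average_ratio = mean(ratios)
ratio_sd = stdev(ratios)
```

An average ratio near 1 with a small standard deviation would correspond to the kind of reproducibility the figure reports.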

 

 

Title: Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing.

Journal: Nat Biotechnol. 2009 Feb;27(2):182-9. Epub 2009 Feb 1

Authors: Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, Brockman W, Fennell T, Giannoukos G, Fisher S, Russ C, Gabriel S, Jaffe DB, Lander ES, Nusbaum C
