ChIP-on-chip
Revealing characteristics of mammalian transcription factors using cross-platform, multi-factor analysis of genome-wide ChIP data
A review of A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors.
Note: This is a review of the published article listed below. All information, quotes, figures, methods, and findings mentioned in this review are from that article, and are the property of its authors and/or the publication in which the article originally appeared.
A collaborative team of researchers from Harvard University and Stanford University (2006) conducted a comparative study of eight independent chromatin immunoprecipitation (ChIP) experiments involving six different transcription factors in human and mouse. The aim was to gain a basic understanding of the location data generated for mammalian transcription factors and of potential issues in their analysis. Five of the eight datasets were ChIP-chip datasets generated on three different platforms, two of which were Agilent ChIP-on-chip arrays. The Gli ChIP-chip data were generated using a custom array design with 50–150 kb regions surrounding promoters and 3'-untranslated regions (3'-UTRs) of selected gene sets surveyed by 60-mer oligo probes at a density of one probe per 125 bp. The Oct4, Sox2 and Nanog ChIP-chip data were based on Agilent promoter arrays, which surveyed the -8 to +2 kb promoter regions of all human genes using 60-mer probes with an estimated probe spacing of 280 bp.
Although the data analyzed by this group were generated on three different technological platforms, common characteristics of mammalian location analysis emerged from the cross-study comparisons, and these commonalities have implications for the analysis of future genome-wide location data. These cross-study comparisons are the first to analyze multiple datasets, and they reveal the importance of carefully chosen genomic controls in the de novo identification of key transcription factor binding motifs. In addition, this research raises issues about the interpretation of ubiquitously occurring sequence motifs and demonstrates the clustering tendency of protein-binding regions for certain transcription factors.

Figure 1. Comparisons of de novo motif discovery from multiple ChIP studies.
Eight experiments were examined here, including (A) Gli-chip, (B) ER-chip, (C) p53-PET, (D) Oct4-chip, (E) Sox2-chip, (F) Nanog-chip, (G) Oct4-PET and (H) Nanog-PET. For each factor, representative motifs recovered by de novo discovery are shown. The motif score reported by the Gibbs motif sampler for each motif is shown in parentheses. The complete lists of motifs reported by the Gibbs motif sampler are presented in Supplementary Figures S1–S8. For each reported motif, the relative enrichment levels r1, r2 and r3 were computed by comparing TFBS occurrence rates in ChIP regions to their counterparts in matched genomic controls (labeled ‘Matched’) or random genomic controls (labeled ‘Random’). The enrichment levels of all discovered motifs are compared here, and the data used to generate the figures are listed in Supplementary Tables S1–S8. The motifs responsible for the sequence-specific protein binding are underlined or indicated by arrows. For Nanog-chip and Nanog-PET, the motif responsible for the binding was unknown. In these two cases, the Oct-Sox composite motif is highlighted, and the relative enrichment levels of the previously proposed Nanog motif (labeled ‘Nanog’) are also shown.
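The caption above describes relative enrichment as a comparison of TFBS occurrence rates in ChIP regions with those in control regions, but it does not define r1, r2 and r3 individually. The Python sketch below illustrates only the general idea, a single ratio of occurrence rates. It is not the authors' procedure: exact matching to a consensus string on the forward strand stands in for proper motif scanning, and the function names and toy sequences are hypothetical.

```python
def count_sites(sequences, consensus):
    """Count exact, possibly overlapping, forward-strand matches of consensus."""
    k = len(consensus)
    total = 0
    for seq in sequences:
        seq = seq.upper()
        total += sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == consensus)
    return total


def occurrence_rate(sequences, consensus):
    """Sites per base pair across a set of regions."""
    total_bp = sum(len(seq) for seq in sequences)
    if total_bp == 0:
        raise ValueError("no sequence supplied")
    return count_sites(sequences, consensus) / total_bp


def relative_enrichment(chip_seqs, control_seqs, consensus):
    """Ratio of the occurrence rate in ChIP regions to the rate in controls."""
    control_rate = occurrence_rate(control_seqs, consensus)
    if control_rate == 0:
        raise ValueError("motif absent from controls; ratio undefined")
    return occurrence_rate(chip_seqs, consensus) / control_rate


# Toy data (made up for illustration, not taken from the article).
chip = ["AAGACCACCCAAA", "GACCACCCATTTT"]
control = ["GACCACCCA" + "T" * 43]
ratio = relative_enrichment(chip, control, "GACCACCCA")
```

Under this simplified definition, the choice of control set changes the denominator directly, which is one way to see why matched and random genomic controls can give different enrichment values for the same motif.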

Figure 2. GC-content and cross-species conservation of ChIP-binding regions.
(A) GC-content of ChIP-binding regions. Blue bar, genome-wide GC-content; cyan bar, GC-content for ChIP-binding regions; yellow bar, GC-content for genomic regions surveyed by the ChIP experiments; red bar, GC-content for matched genomic controls. The error bar shows three times the standard error of the GC-content estimate. For p53-PET, Oct4-PET and Nanog-PET, the regions surveyed by the ChIP experiments are the whole genome; for ER-chip, they are human chr21 and chr22; for Oct4-chip, Sox2-chip and Nanog-chip, they are promoter regions that span from -8 to +2 kb of TSS; for Gli-chip, they are regions tiled in the custom array. Matched genomic controls used here are the same control regions used for relative enrichment computation. (B and C) Cumulative probability functions of conservation scores for ChIP regions and genomes. Conservation scores are defined for each base pair and were linearly scaled to the interval [0, 255]. A larger score corresponds to a more conserved status. Human Genome, genome-wide conservation for human; Human Promoter, conservation for human promoter regions spanning from -8 to +2 kb of TSS; Human chr21 and 22, conservation for human chr21 and chr22; Mouse Genome, genome-wide conservation for mouse; Mouse tiled in Gli, conservation for regions surveyed in the Gli study, i.e. regions tiled in the custom array.
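As a rough illustration of the GC-content bars and their error bars, the Python sketch below computes a GC fraction for a set of regions together with a standard error. The caption does not say how the standard error was obtained, so the binomial formula used here (each unambiguous base treated as an independent draw) is an assumption, and the function name is hypothetical.

```python
import math


def gc_content_with_se(sequences):
    """Return (GC fraction, standard error) over unambiguous bases A, C, G, T.

    Assumption: the standard error is the binomial one, sqrt(p * (1 - p) / n),
    which treats every base as an independent draw.
    """
    gc = 0
    n = 0
    for seq in sequences:
        for base in seq.upper():
            if base in "ACGT":  # skip N and other ambiguity codes
                n += 1
                if base in "GC":
                    gc += 1
    if n == 0:
        raise ValueError("no unambiguous bases supplied")
    p = gc / n
    return p, math.sqrt(p * (1 - p) / n)


# Toy regions (made up for illustration).
p, se = gc_content_with_se(["GGCC", "ATNN", "gcat"])
error_bar = 3 * se  # the figure shows three times the standard error
```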

Figure 3. Clustering tendency of binding regions.
(A–C) Three examples of binding region clusters. The transcription factor that binds to DNA is shown at the top of each panel. The gene that is bound by the transcription factor is shown at the bottom of the panel. For the two ChIP-chip examples (A and B), binding regions are indicated by high fold enrichment of IP samples versus control samples (i.e. peaks in the figure). For the ChIP-PET example (C), binding regions are indicated by the blocks in the ‘Oct4-PET’ and ‘Nanog-PET’ tracks in the UCSC genome browser. (D–G) Cumulative probability functions (CDFs) of the observed and simulated peak-to-peak distances. The simulated distances were taken to represent the random distribution. The observed and simulated distributions were fitted with Gamma and Exponential densities, respectively; the fitted CDFs are also shown.
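The peak-to-peak comparison in panels D–G can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' method: the caption does not describe how the simulated distances were produced, so uniform random placement of peaks over a single region is assumed here. The Gamma fit uses simple moment estimates, the Exponential fit uses the usual rate of one over the mean, and all function names are hypothetical.

```python
import math
import random
import statistics


def adjacent_distances(positions):
    """Distances between neighbouring peaks on one chromosome or region."""
    pos = sorted(positions)
    return [b - a for a, b in zip(pos, pos[1:])]


def simulate_distances(n_peaks, region_length, rng):
    """Assumed null model: peaks placed uniformly at random in the region."""
    return adjacent_distances(rng.uniform(0, region_length) for _ in range(n_peaks))


def fit_exponential(distances):
    """Maximum-likelihood rate of an Exponential density (1 / mean)."""
    return 1.0 / statistics.fmean(distances)


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def fit_gamma_moments(distances):
    """Moment estimates (shape, scale) of a Gamma density; not a likelihood fit."""
    mean = statistics.fmean(distances)
    var = statistics.variance(distances)
    return mean * mean / var, var / mean


def empirical_cdf(distances, x):
    """Fraction of distances that are at most x."""
    return sum(d <= x for d in distances) / len(distances)


# Toy usage with made-up peak positions in a hypothetical 1 Mb region.
observed = adjacent_distances([1200, 1900, 2500, 400000, 401000, 900000])
simulated = simulate_distances(6, 1_000_000, random.Random(0))
shape, scale = fit_gamma_moments(observed)
rate = fit_exponential(simulated)
```

In a comparison of this kind, a fitted Gamma shape below 1 for the observed distances would indicate more short peak-to-peak distances than Exponential spacing predicts, which is what a clustering tendency would look like.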
Title: A comparative analysis of genome-wide chromatin immunoprecipitation data for mammalian transcription factors.
Authors: Ji H, Vokes SA, Wong WH.
Journal: Nucleic Acids Res. 2006 Dec; 34(21): e146.