ChIP-chip sequence bias not removed
Edwin Groot ▴ 230
@edwin-groot-3606
Last seen 10.3 years ago
Hello all,

I am analyzing Affymetrix AtTile1F ChIP-chip data from GEO to compare the localization of different histone modifications in Arabidopsis. The goal is to query a genomic region for relative enrichment of the different histone modifications.

After trying several normalization methods in Starr, I get good MA plots, densities and histograms, but neither the GC bias nor the base-position bias is changed by any normalization method. The vignette data, in contrast, shows great improvement in the bias problems. Have I missed something? Should I worry about this?

I have so far tried loess, vsn, quantile and rankpercentile through Starr.

Thanks,
Edwin
--
Here is sample code for one of the normalization methods:

> library(Starr)
> library(geneplotter)
> library(vsn)
> AtTile1F <- readBpmap("GPL1979.bpmap")
#Only the + strand is represented for all chromosomes
> summary(AtTile1F$"At:TIGRv5;chr4"$strand)
> cels <- c("h3k27me301.CEL", "h3k27me303.CEL", "h3k27me302.CEL",
            "h3k27me304.CEL", "input01.CEL", "input03.CEL",
            "input02.CEL", "input04.CEL")
> names <- c("k27me301", "k27me302", "k27me303", "k27me304",
             "input01", "input02", "input03", "input04")
> type <- c("IP", "IP", "IP", "IP", "INPUT", "INPUT", "INPUT", "INPUT")
> k27me3 <- readCelFile(AtTile1F, cels, names, type, featureData=TRUE,
                        log.it=TRUE)
#Normalize
> k27me3_loess <- normalize.Probes(k27me3, method = "loess")
#QC
#Try only one pair of IP and control.
> ips <- c(TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE)
> controls <- c(FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE,FALSE)
> plotMA(k27me3, ip = ips, control = controls)
#There is a negative deviation down to -1.5 LFC
> plotMA(k27me3_loess, ip = ips, control = controls)
#The MA is straight, except for a slight negative bias at highest intensity.
> plotGCbias(exprs(k27me3)[, 1], featureData(k27me3)$seq,
             main=paste(sampleNames(k27me3)[1],"GC Bias Before Normalization"))
#The GC bias increases linearly with base position.
> plotGCbias(exprs(k27me3_loess)[, 1], featureData(k27me3_loess)$seq,
             main=paste(sampleNames(k27me3_loess)[1],"GC Bias After Loess Normalization"))
#Same rise (-2 to +2) with base position as Before Normalization.
--
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
 [1] vsn_3.16.0           geneplotter_1.26.0   annotate_1.26.0
 [4] AnnotationDbi_1.10.1 Starr_1.4.4          affxparser_1.20.0
 [7] affy_1.26.1          Ringo_1.12.0         Matrix_0.999375-40
[10] lattice_0.18-8       limma_3.4.3          RColorBrewer_1.0-2
[13] Biobase_2.8.0

loaded via a namespace (and not attached):
 [1] affyio_1.16.0         DBI_0.2-5             genefilter_1.30.0
 [4] MASS_7.3-6            preprocessCore_1.10.0 pspline_1.0-14
 [7] RSQLite_0.9-1         splines_2.11.1        survival_2.35-8
[10] xtable_1.5-6

Dr. Edwin Groot, postdoctoral associate
AG Laux
Institut fuer Biologie III
Schaenzlestr. 1
79104 Freiburg, Deutschland
+49 761-2032945
Normalization vsn Starr • 1.7k views
@zacherlmbuni-muenchende-3726
Last seen 10.3 years ago
Dear Edwin,

As far as I can tell from your code chunk, you did not subtract a reference experiment from the IP. To eliminate the sequence-dependent bias, you need to either subtract a reference experiment (such as a Mock-IP or genomic input) or apply a specific normalization method such as MAT, which is designed for this purpose. If you have a reference experiment, I strongly recommend using it instead of MAT, as it performs a lot better in my experience.

Best regards,
Benedikt
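The point about subtracting a reference can be illustrated with a small simulation. This is only a sketch in base R under a stated assumption: the sequence-dependent probe effect is the same in the IP and the input hybridization. The data, the variable names and the size of the effect are all made up for illustration; none of it is the AtTile1F data or Starr's internal code.

```r
# Simulated illustration only: not the AtTile1F data, not Starr internals.
set.seed(1)
n_probes <- 5000

# Hypothetical number of G/C bases in each 25-mer probe.
gc_count <- sample(5:20, n_probes, replace = TRUE)

# Assumption: a probe effect that rises with GC content and is shared
# by the IP and the input hybridization.
probe_effect <- 0.25 * (gc_count - mean(gc_count))

# True enrichment in about 5% of probes, independent of GC content.
enriched <- runif(n_probes) < 0.05
signal   <- ifelse(enriched, 2, 0)

log_ip    <- 8 + probe_effect + signal + rnorm(n_probes, sd = 0.3)
log_input <- 8 + probe_effect          + rnorm(n_probes, sd = 0.3)

# Subtracting the reference on the log2 scale gives the log2 ratio IP/input.
log_ratio <- log_ip - log_input

cor_ip    <- cor(gc_count, log_ip)     # GC bias is visible in the IP alone
cor_ratio <- cor(gc_count, log_ratio)  # the shared probe effect cancels
```

Under this assumption the GC trend is present in each single channel but absent from the ratio, which would be consistent with plotGCbias() on a single array looking the same before and after a between-array normalization such as loess.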
Benedikt is right. However, even in the presence of a control (input DNA or Mock-IP), a sequence-based normalization will help you. Please have a look at the supplementary material of the rMAT paper, where we show this on two datasets:
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/26/5/678

You might also want to try the rMAT package available from BioC.

Raphael
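To give a rough idea of what a sequence-based normalization does, here is a deliberately simplified sketch in base R: regress the log2 intensity of each probe on its nucleotide counts and keep the residuals. This is not the MAT or rMAT model itself (MAT, as published, also uses position-specific nucleotide terms, squared counts and probe copy number), and it does not call the rMAT package. The sequences, coefficients and helper function count_base() are hypothetical.

```r
# Simplified illustration of sequence-based adjustment; NOT the MAT/rMAT model.
set.seed(2)
n_probes <- 2000
bases <- c("A", "C", "G", "T")

# Hypothetical 25-mer probe sequences.
seqs <- replicate(n_probes,
                  paste(sample(bases, 25, replace = TRUE), collapse = ""))

# Hypothetical helper: count occurrences of one base in each sequence.
count_base <- function(s, b) {
  vapply(strsplit(s, ""), function(x) sum(x == b), numeric(1))
}

# nT is left out because the four counts always sum to 25.
counts <- data.frame(nA = count_base(seqs, "A"),
                     nC = count_base(seqs, "C"),
                     nG = count_base(seqs, "G"))

# Simulated log2 intensities with a made-up sequence-dependent component.
log_int <- 8 + 0.15 * counts$nG + 0.10 * counts$nC - 0.05 * counts$nA +
  rnorm(n_probes, sd = 0.3)

# Fit the sequence model and keep what it does not explain.
fit      <- lm(log_int ~ nA + nC + nG, data = counts)
adjusted <- residuals(fit)

gc_count   <- counts$nC + counts$nG
cor_before <- cor(gc_count, log_int)   # clear GC trend
cor_after  <- cor(gc_count, adjusted)  # trend removed by construction
```

Because the residuals are orthogonal to the fitted covariates, the linear GC trend vanishes after adjustment. On real arrays the model would be fitted per array, and the adjusted IP and input values would still be compared afterwards.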
On Mon, 26 Jul 2010 20:43:01 +0200, zacher at lmb.uni-muenchen.de wrote:
> Dear Edwin,
>
> as I guess from inspecting your code chunk, you did not subtract a
> reference experiment from the IP. To eliminate the sequence-dependent
> bias you either need to subtract a reference experiment (like
> Mock-IP or genomic input) or apply a specific normalization method
> like MAT, which is designed for this purpose. If you have a reference
> experiment I absolutely recommend to use this instead of MAT, as it
> performs a lot better in my experience.
> Best regards,
>
> Benedikt

Hello Benedikt,

Thanks for your reply, but I am a bit confused about the ChIP-chip data analysis procedure. My experience with Affymetrix gene expression arrays is to run RMA on the PM probe sets with the quantile normalization option, then use limma to analyze ratios among experiments. In the Starr package there is no GCRMA (which would help correct the base-position bias). There is not even an RMA procedure!

My goal is to properly background-subtract and normalize the ChIP-chip data so that I can obtain the log2 enrichment ratio (IP / input). I am feeling somewhat embarrassed at being stuck in the preprocessing stage.

What do you mean by subtracting the input samples? Is it a form of background subtraction? I plan to use these same input samples to make my log2 enrichment ratios!

Regards,
Edwin
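On the log scale, "subtracting the input" and "taking the IP / input ratio" are the same operation, since log2(IP / input) = log2(IP) - log2(input). The toy example below, in base R with invented numbers, shows this with replicate arrays selected by logical vectors in the same style as the ips and controls vectors in the original post. Summarizing replicates by the median is an assumption made here for illustration; Starr's vignette appears to provide getRatio() for this step, so check its help page for the actual arguments.

```r
# Toy log2 intensities (invented): 3 probes x 8 arrays (4 IP, then 4 input).
log_int <- matrix(
  c(10, 10.2, 9.8, 10.0,   8, 8.1,  7.9,  8.0,   # probe 1: enriched
     7,  7.1, 6.9,  7.0,   7, 7.0,  7.1,  6.9,   # probe 2: not enriched
     9,  9.0, 9.2,  8.8,  10, 9.9, 10.1, 10.0),  # probe 3: depleted
  nrow = 3, byrow = TRUE
)

# Logical selectors for the replicate groups.
ips      <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
controls <- !ips

# Summarize replicates per probe, then subtract on the log2 scale.
ip_median       <- apply(log_int[, ips, drop = FALSE], 1, median)
input_median    <- apply(log_int[, controls, drop = FALSE], 1, median)
log2_enrichment <- ip_median - input_median

# Back on the linear scale this is the fold enrichment IP / input.
fold_enrichment <- 2^log2_enrichment
```

So the same input arrays serve both purposes at once: the subtraction is the log2 enrichment ratio, and it is also what removes a probe effect shared by the two channels.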