Question

GC-content sensitive normalization of Affymetrix tiling arrays for ChIP-chip


Christian Feller ▴ 50

@christian-feller-2898

Last seen 11.4 years ago

Dear Richard Bourgon and list,

I am new to analyzing ChIP-chip Affymetrix tiling arrays (GeneChip Drosophila Tiling 1.0R Array). My question is: how can I take into account the GC effect of single probes if I do not have expression sets (due to the nature of a tiling array)? We had the idea of taking a fixed window size, defining the probes within each window as a "probeset", and using GCRMA for background correction/normalization. In addition, can we use this configuration (normalization via GCRMA) for profiles with broad ChIP-enriched regions (as is the case for many histone modifications)?

If there is any additional advice, especially for the pre-processing steps, I would be very happy! Until now, we have done the normalization using vsn2.

Thank you in advance!

Best wishes,
Christian Feller

Normalization gcrma • 1.9k views

ADD COMMENT • link updated 17.5 years ago by Sean Davis 21k • written 17.5 years ago by Christian Feller ▴ 50

score 0 · Answer 1 · 2008-07-08


Sean Davis 21k

@sean-davis-490

Last seen 7 days ago

United States

In reply to Christian Feller (Tue, Jul 8, 2008, 6:58 PM):

Hi, Christian. Do you have the input DNA from which you are going to form a ratio, or are you attempting a single-channel analysis? If the latter, then you might look at MAT from Shirley Liu's group. I don't think it is available for R, but the algorithm could probably be coded in R relatively easily. There are likely other solutions.

Sean
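The ratio against input DNA mentioned in this answer can be illustrated with a small sketch. It is not taken from the thread or from MAT: it assumes, hypothetically, that every array has already been background-corrected and scaled, and that each array is a list of positive intensities with probes in the same order. The function name `log2_ratio` and the toy numbers are illustrative only.

```python
import math
from statistics import mean


def log2_ratio(ip_arrays, input_arrays):
    """Per-probe log2(IP / input), averaging log2 intensities over replicates.

    ip_arrays, input_arrays: lists of arrays; each array is a list of
    positive intensities with probes in the same order on every array
    (a hypothetical layout chosen for this illustration).
    """
    n_probes = len(ip_arrays[0])
    ratios = []
    for i in range(n_probes):
        # Average on the log scale so a single bright replicate does not dominate.
        ip_level = mean(math.log2(array[i]) for array in ip_arrays)
        input_level = mean(math.log2(array[i]) for array in input_arrays)
        ratios.append(ip_level - input_level)
    return ratios


# Toy data: two IP replicates and two input replicates, three probes each.
ip = [[400.0, 100.0, 50.0], [400.0, 100.0, 200.0]]
inp = [[100.0, 100.0, 100.0], [100.0, 100.0, 100.0]]
print(log2_ratio(ip, inp))
```

For the toy data the first probe comes out near 2 (a four-fold enrichment) and the other two near 0. In a single-channel analysis there is no input array to subtract, which is why a probe-sequence model becomes more important in that case.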

ADD COMMENT • link 17.5 years ago Sean Davis 21k


In reply to Sean Davis (Wednesday, July 09, 2008, 2:04 AM):

GC-sensitive normalization: well, MAT runs under Python... And yesterday's data are abysmal.

c

ADD REPLY • link 17.5 years ago Christian Feller ▴ 50


In reply to Sean Davis (Wednesday, July 09, 2008, 2:04 AM):

Hi Sean,

Thank you for your quick response! We successfully used MAT under Python for a dataset with 3 control arrays (hybridized with input) and 3 IP arrays (all biological replicates). Compared with vsn2, probe standardization via MAT significantly increased the signal-to-noise ratio. However, we still have some doubts about the reliability of those results, since the raw data seem to be very noisy and the correlation between the biological replicates is not very strong.

Thanks again!

Best,
Christian
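One simple way to put a number on the replicate agreement mentioned here is a pairwise Pearson correlation of per-probe log signals. The following is a minimal sketch under that assumption; the function names and the data layout (one list of log signals per replicate array, probes in the same order) are hypothetical and not something described in the thread.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_correlations(arrays):
    """Correlation for every pair of replicate arrays, keyed by index pair."""
    result = {}
    for i in range(len(arrays)):
        for j in range(i + 1, len(arrays)):
            result[(i, j)] = pearson(arrays[i], arrays[j])
    return result


# Toy log signals for three replicates (made-up numbers).
replicates = [
    [7.1, 8.3, 6.9, 9.2, 7.5],
    [7.0, 8.1, 7.2, 9.0, 7.7],
    [6.8, 8.6, 6.7, 9.4, 7.4],
]
for pair, r in replicate_correlations(replicates).items():
    print(pair, round(r, 3))
```

Comparing these correlations before and after a normalization step gives a rough check that complements visual inspection, although a high correlation can also reflect shared probe effects rather than shared biology.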

ADD REPLY • link 17.5 years ago Christian Feller ▴ 50


Dear Christian,

a few points:

- As far as I understand, the background correction method of GC-RMA does not make use of probe sets; it works on individual probes. Probe sets only come into play later, for the expression estimate. But getting it to work for your use case may be a hard problem (has anyone on the list managed?).
- vsn2 does not do probe-sequence-specific adjustments, so I am not sure why it was mentioned in this context.
- The choice of language should be secondary to these criteria: quality of the underlying science and of the implementation.
- You ask "how can I take into account the GC-effect of single probes", but would it make sense to take a step back and tell us why you want to do that and what you want to achieve? Perhaps your answer is somewhere else.
- The normalizeByReference function in the tilingArray package offers a method to do probe (sequence)-specific background correction for Affymetrix tiling array data, and is described in a paper [1], but I have only used it on RNA expression data, not on ChIP, so porting it to that application would need some care.

[1] http://bioinformatics.oxfordjournals.org/cgi/reprint/22/16/1963.pdf

Best wishes,
Wolfgang

ADD REPLY • link 17.5 years ago Wolfgang Huber ★ 13k


In reply to Wolfgang Huber (Friday, July 11, 2008, 12:55 AM):

Dear Wolfgang,

thank you for your reply! My goal is to compare my own ChIP-chip data (Nimblegen tiling) with some other ChIP-chip data (created on Affymetrix tiling). I normalized my data with vsn and got good signal-to-noise ratios (by visual inspection; replicates show the same trend). When I normalize with other algorithms (loess, quantile, Tukey-biweight), I get similar output (based on visual inspection and the correlation among them).

I then normalized the Affymetrix data with vsn and got terrible signal-to-noise ratios. One possible explanation might be the shorter probe sequence of the Affy probes compared to the Nimblegen probes. Fluorescence signals of shorter probes are more sensitive to the underlying sequence (in particular GC content). Because vsn does not account for GC content, I reasoned that I should try to adjust for it (hence the idea of using GCRMA).

I will try the normalizeByReference function and report back when it works.

Thanks again!

Best wishes,
Christian

ADD REPLY • link 17.5 years ago Christian Feller ▴ 50


In reply to Christian Feller (Fri, Jul 18, 2008, 5:32 AM):

I assume that the Nimblegen data are two-color? If so, that accounts for the vast majority of the differences you observe, I would imagine. If not, then for single-color Nimblegen arrays I would expect that GC correction would be useful as well. However, such a correction probably does not need to account for the base positions, only the GC count.

Sean
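A correction that uses only the GC count, as suggested here, could look roughly like the sketch below. It is an illustration rather than an established method from any package: probes are grouped by their number of G and C bases, and the median log signal of each group is subtracted from its members. The function names, the data layout, and the toy sequences are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in a probe sequence (case-insensitive)."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def center_by_gc(sequences, log_signal):
    """Subtract from each probe the median log signal of its GC-count group.

    sequences:  probe sequences, one per probe.
    log_signal: log-scale intensities in the same order as `sequences`.
    """
    groups = defaultdict(list)
    for seq, value in zip(sequences, log_signal):
        groups[gc_count(seq)].append(value)
    # One baseline per GC count; base positions are deliberately ignored.
    baseline = {gc: median(values) for gc, values in groups.items()}
    return [value - baseline[gc_count(seq)]
            for seq, value in zip(sequences, log_signal)]


# Toy example with short made-up probes.
probes = ["ATATAT", "GCGCGC", "ATGCAT", "GGCCGG", "TTAATT", "ACGTTA"]
signal = [5.0, 9.0, 6.5, 9.4, 5.2, 6.7]
print(center_by_gc(probes, signal))
```

The median is used so that a minority of truly enriched probes in a group does not shift the baseline much. With real arrays, sparsely populated GC-count groups at the extremes would probably need pooling, and broad enriched regions could still bias the group medians.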

ADD REPLY • link 17.5 years ago Sean Davis 21k


In reply to Sean Davis (Friday, July 18, 2008, 2:09 PM):

Yes, we use Nimblegen two-color. Thanks.

ADD REPLY • link 17.5 years ago Christian Feller ▴ 50