Dear Richard Bourgon and list,

I am a newbie in analyzing ChIP-chip Affymetrix tiling arrays
(GeneChip Drosophila Tiling 1.0R Array).
My question is: how can I take into accound the GC-effect of single
probes if I do not have expression sets (due to the nature of a tiling
array)? We had the idea of taking a fixed window size, defining the
probes within each window as a "probeset", and using GCRMA for
background correction/normalization. In addition, can we use this
configuration (normalization via GCRMA) for profiles with broad
ChIP-enriched regions (as is the case for many histone modifications)?

If there is any additional advice, especially for the pre-processing
steps, I would be very happy!
Until now, we have done the normalization using vsn2.

Thank you in advance!
Best wishes,
Christian Feller
On Tue, Jul 8, 2008 at 6:58 PM, Christian Feller
<feller.christian at gmail.com> wrote:
> [...]
Hi, Christian. Do you have the input DNA from which you are going to
form a ratio, or are you attempting to do a single-channel analysis?
If the latter, then you might look at MAT from Shirley Liu's group. I
don't think it is available for R, but the algorithm could probably be
coded in R relatively easily. There are likely other solutions.
Sean
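For readers wondering what a hand-coded GC adjustment could look like, here is a minimal sketch in Python. It is not MAT: as far as I understand, the published MAT model uses richer sequence covariates than a plain GC count. The sketch only fits a straight line of log intensity against GC count and keeps the residuals. The function names (`gc_count`, `fit_line`, `gc_residuals`) and the toy probes are made up for illustration.

```python
from statistics import mean


def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all probes have the same GC count")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def gc_residuals(seqs, log_intensity):
    """Log intensity minus the value predicted from GC count alone."""
    gc = [gc_count(s) for s in seqs]
    a, b = fit_line(gc, log_intensity)
    return [y - (a + b * g) for g, y in zip(gc, log_intensity)]


# Toy data (hypothetical): signal rises by 0.5 log units per G/C base.
probes = ["ATATATAT", "ATGCATAT", "GCGCATAT", "GCGCGCAT"]
log_signal = [8.0, 9.0, 10.0, 11.0]
resid = gc_residuals(probes, log_signal)
print(resid)
```

In this toy case the signal is explained entirely by GC count, so the residuals are zero up to rounding. On real arrays the residuals would carry whatever signal is left after the sequence effect is removed.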
GC-sensitive normalization: well, MAT runs under Python...
And yesterday's data are abysmal.
c
-----Original Message-----
From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean Davis
Sent: Wednesday, July 09, 2008 2:04 AM
To: Christian Feller
Cc: bioconductor at stat.math.ethz.ch; bourgon at ebi.ac.uk
Subject: Re: [BioC] GC-content sensitive normalization of Affymetrix
tiling arrays for ChIP-chip
[...]
Hi Sean,

Thank you for your quick response! We successfully used MAT under
Python for a dataset with 3 control arrays (hybridized with input) and
3 IP arrays (all biological replicates). In comparison with vsn2,
probe standardization via MAT significantly increased the
signal-to-noise ratio. However, we still have some doubts about the
reliability of those results, since the raw data seem to be very noisy
and the correlation between the biological replicates is not very
strong.

Thanks again!
Best
Christian
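One way to put a number on "the correlation of the biological replicates is not very strong" is a table of pairwise Pearson correlations between arrays. The sketch below is only an illustration: the array names and intensity values are invented, and `pearson` and `pairwise_correlations` are hypothetical helper names, not functions from MAT or any Bioconductor package.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(arrays):
    """Map each pair of array names to the correlation of their values."""
    return {
        (i, j): pearson(arrays[i], arrays[j])
        for i, j in combinations(sorted(arrays), 2)
    }


# Toy log intensities for three hypothetical IP replicates.
replicates = {
    "IP1": [1.0, 2.0, 3.0, 4.0],
    "IP2": [2.0, 4.0, 6.0, 8.0],
    "IP3": [4.0, 3.0, 2.0, 1.0],
}
for pair, r in pairwise_correlations(replicates).items():
    print(pair, round(r, 3))
```

Running the same comparison before and after normalization, and separately for the input and IP arrays, would show whether the normalization step or the raw hybridizations account for the weak agreement.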
Dear Christian,

few points:

- As far as I understand, the background correction method of GC-RMA
  does not make use of probe sets; it works on individual probes.
  Probe sets only come into play later, for the expression estimate.
  But getting it to work for your use case may be a hard problem (has
  anyone on the list managed?).
- vsn2 does not do probe-sequence-specific adjustments, so I am not
  sure why it was mentioned in this context.
- The choice of language should be secondary to these criteria:
  quality of the underlying science and of the implementation.
- You say "how can I take into accound (sic) the GC-effect of single
  probes", but would it make sense to take a step back and tell us why
  you want to do that and what you want to achieve? Perhaps your
  answer is somewhere else.
- The normalizeByReference function in the tilingArray package offers
  a method to do probe(sequence)-specific background correction for
  Affymetrix tiling array data, and is described in a paper [1], but I
  have only used it on RNA expression data, not on ChIP, so porting it
  to that application would need some care.

[1] http://bioinformatics.oxfordjournals.org/cgi/reprint/22/16/1963.pdf

Best wishes
Wolfgang
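The general idea of correcting each probe against a reference hybridization can be sketched for the ChIP setting as a per-probe log ratio of IP over input. This is an assumption-laden illustration in Python and is not the normalizeByReference implementation from tilingArray, which does more than take a ratio. The function name `log_ratio_vs_reference` and the toy intensities are hypothetical.

```python
from math import log2
from statistics import mean


def log_ratio_vs_reference(ip_arrays, input_arrays):
    """Per-probe mean log2(IP) minus mean log2(input).

    Each argument is a list of arrays; each array is a list of positive
    raw intensities, one value per probe, in the same probe order.
    """
    n_probes = len(ip_arrays[0])
    if any(len(a) != n_probes for a in ip_arrays + input_arrays):
        raise ValueError("all arrays must cover the same probes")
    ratios = []
    for p in range(n_probes):
        ip = mean(log2(a[p]) for a in ip_arrays)
        ref = mean(log2(a[p]) for a in input_arrays)
        # A probe-specific response shared by IP and input cancels here.
        ratios.append(ip - ref)
    return ratios


# Toy data (hypothetical): probe 0 is enriched fourfold, probe 1 is not.
ip = [[400.0, 100.0], [400.0, 100.0]]
inp = [[100.0, 100.0], [100.0, 100.0]]
print(log_ratio_vs_reference(ip, inp))
```

The design choice is that a sequence-dependent probe effect that multiplies both the IP and the input signal drops out of the log ratio. Additive background does not cancel this way, which is one reason a dedicated method needs more care.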
Dear Wolfgang,

thank you for your reply!
My goal is to compare my own ChIP-chip data (Nimblegen tiling) with
some other ChIP-chip data (created on Affymetrix tiling). I normalized
my data with vsn and got some nice signal-to-noise ratios (visual
inspection, replicates show the same trend). When I normalize with
other algorithms (loess, quantile, Tukey-biweight) I get a similar
output (based on visual inspection and correlation among them).

Now, I normalized the Affymetrix data with vsn and got some terrible
signal-to-noise ratios. One possible explanation might be the shorter
probe sequence of the Affy probes compared to the Nimblegen probes.
Fluorescence signals of shorter probes are more sensitive to the
underlying sequence (in particular GC-content). Because vsn does not
account for the GC-content, I reasoned that I should try to adjust for
it (therefore, I thought about using GCRMA).

I will try to use the normalizeByReference function and report back
when it works.

Thanks again!

Best wishes,
Christian
-----Original Message-----
From: Wolfgang Huber [mailto:huber@ebi.ac.uk]
Sent: Friday, July 11, 2008 12:55 AM
To: Christian Feller
Cc: 'Sean Davis'; bourgon at ebi.ac.uk; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GC-content sensitive normalization of Affymetrix
tiling arrays for ChIP-chip
[...]
On Fri, Jul 18, 2008 at 5:32 AM, Christian Feller
<feller.christian at gmail.com> wrote:
> [...]
I assume that the Nimblegen data are two-color? If so, that accounts
for the vast majority of the differences you observe, I would imagine.
If not, then for single-color Nimblegen arrays, I would expect that
GC correction would be useful as well. However, such a correction
probably does not need to account for the base positions, but only the
GC count.
Sean
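A correction that uses only the GC count, as suggested above, could be as simple as centering each probe on the median of all probes with the same number of G/C bases. The Python sketch below assumes log-scale intensities and uses invented probe sequences. `gc_count` and `center_by_gc_count` are hypothetical names, and median centering is just one possible choice of per-bin adjustment.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def center_by_gc_count(seqs, log_intensity):
    """Subtract from each probe the median of its GC-count bin."""
    bins = defaultdict(list)
    for s, y in zip(seqs, log_intensity):
        bins[gc_count(s)].append(y)
    centers = {g: median(values) for g, values in bins.items()}
    return [y - centers[gc_count(s)] for s, y in zip(seqs, log_intensity)]


# Toy data (hypothetical): GC-rich probes are systematically brighter.
probes = ["ATAT", "ATAT", "GCGC", "GCGC", "GCGC"]
log_signal = [7.0, 9.0, 11.0, 12.0, 13.0]
print(center_by_gc_count(probes, log_signal))
# -> [-1.0, 1.0, -1.0, 0.0, 1.0]
```

With real arrays, sparsely populated GC bins at the extremes would probably need to be pooled with their neighbours before taking the median.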
Yes, we use Nimblegen two-color.
thx
-----Original Message-----
From: seandavi@gmail.com [mailto:seandavi@gmail.com] On Behalf Of Sean Davis
Sent: Friday, July 18, 2008 2:09 PM
To: Christian Feller
Cc: Wolfgang Huber; bourgon at ebi.ac.uk; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] GC-content sensitive normalization of Affymetrix
tiling arrays for ChIP-chip
[...]