Dear List,

I am reprocessing a previously processed dataset from NCBI GEO. This is an
Illumina microarray chip. The data provided at GEO are either normalized with
negative probe values or unnormalized without control spot information.

If I discard the probes with negative values (they can't be log transformed),
that leaves only 9500 out of 22000 probes.

Can anybody please suggest how to approach this problem?

Appreciate your help,
Prasad

Hi Prasad,

If you only have processed Illumina GEO data and the maximum expression
value is larger than 100, then I guess the negative values were caused by
background correction. Most negative values should also be close to zero;
if they are not, the data may have a problem. If you don't want to throw away
those negative values, you can compute log2(x + offset) to force the negative
values to be positive. This may affect the genes with low expression values.
If you have the BeadStudio output file, then you can use the vst
transformation in the lumi package instead of the log transform. The vst
transformation can handle negative values.

Pan
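
The log2(x + offset) suggestion can be sketched as follows. This is only an illustration in plain Python, not code from lumi or limma: the function name, the `floor` parameter, and the intensity values are all made up for the example, and the choice of offset (shift the smallest value up to `floor`) is one assumption among several possible ones.

```python
import math

def log2_with_offset(values, floor=1.0):
    """Shift all values by one constant so the smallest becomes `floor`,
    then take log2. `floor` is a hypothetical tuning parameter."""
    # No shift is applied if the data are already at or above `floor`.
    offset = max(0.0, floor - min(values))
    return offset, [math.log2(v + offset) for v in values]

# Made-up background-corrected intensities, including small negatives.
intensities = [-12.5, -3.0, 0.0, 40.0, 1500.0]
offset, logged = log2_with_offset(intensities)
print(offset)
print([round(x, 2) for x in logged])
```

Because the same constant is added to every probe, the shift is large relative to the low intensities and small relative to the high ones, which is why the low-expression genes are the ones most affected.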
Date: Thu, 24 Mar 2011 09:21:56 +1100
From: Wei Shi <shi@wehi.edu.au>
To: Prasad Siddavatam <siddavatam@gmail.com>
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] dealing with negative values in illumina
Dear Prasad:

I am not quite sure what your question is. But if you want to normalize the
raw data yourself and you want to use the control probes for the
normalization, then you might try the limma neqc function, which can infer
the intensities of negative control probes using regular probe intensities
and their detection p-values. The neqc function will then perform a normexp
background correction aided by negative controls, followed by quantile
normalization and log2 transformation.

Hope this helps.

Cheers,
Wei
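
To make the last two stages of that pipeline concrete, here is a toy sketch of quantile normalization followed by a log2 transform. It is not neqc and does not reproduce it: the normexp background correction and the inference of negative control intensities are omitted entirely, the function name is hypothetical, and the intensities are made up and assumed to be already positive.

```python
import math

def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list per array/sample).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position here; a real implementation
    would treat them more carefully.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns)
                 for k in range(n)]
    normalized = []
    for col in columns:
        # Probe indices ordered from smallest to largest intensity.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

# Made-up positive intensities: two arrays, four probes each.
arrays = [
    [120.0, 35.0, 900.0, 60.0],
    [200.0, 50.0, 1500.0, 80.0],
]
normalized = quantile_normalize(arrays)
log_expr = [[math.log2(v) for v in col] for col in normalized]
print(normalized)
```

After this step every array has the same set of values, only assigned to different probes according to each probe's rank within its array.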
If BeadStudio output is available, there won't be a need to process
negative values.

It does not make sense to me to log transform a data set which has
already been normalized.

For a comparison between different BeadChip preprocessing algorithms,
please see http://www.ncbi.nlm.nih.gov/pubmed/20929874
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:6}}
Hi,

I have used both lumi and limma to normalize the data, using vst followed by
rsn (lumi) and neqc (limma). Is there a function in either package
that allows me to output the processed data at the "gene" level as opposed
to the "probe" level? I wrote my own script to find the probes that annotate
to the same gene and average them, but I was wondering if there is already a
built-in function in either of the packages.

Thanks,
Mete Civelek
Hi Mete:

Limma does not provide such a function. Part of the reason for this is that
different people summarize probe-level intensities to gene-level intensities
in different ways. Our approach is to select, among all probes which
correspond to the same gene, the probe with the largest mean expression
intensity across all arrays. This is a less biased selection method than
selecting probes by fold changes or other methods.

Cheers,
Wei
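
The selection rule described here (per gene, keep the probe with the largest mean intensity across arrays) can be sketched in a few lines. This is an illustrative plain-Python version, not a limma function; the function name, probe IDs, gene symbols, and values are all hypothetical.

```python
from statistics import mean

def pick_top_probe_per_gene(expr, probe_to_gene):
    """For each gene, keep the probe with the largest mean intensity.

    expr: dict mapping probe ID -> list of intensities (one per array)
    probe_to_gene: dict mapping probe ID -> gene symbol
    """
    best = {}  # gene -> (mean intensity, probe ID)
    for probe, values in expr.items():
        gene = probe_to_gene[probe]
        candidate = (mean(values), probe)
        # On a tie, the probe seen first is kept.
        if gene not in best or candidate[0] > best[gene][0]:
            best[gene] = candidate
    return {gene: expr[probe] for gene, (_, probe) in best.items()}

# Made-up log2 expression values on three arrays.
expr = {
    "probe_a": [7.1, 7.3, 6.9],
    "probe_b": [9.0, 9.2, 8.8],
    "probe_c": [5.5, 5.7, 5.6],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
gene_level = pick_top_probe_per_gene(expr, probe_to_gene)
print(gene_level)
```

The choice depends only on overall intensity, not on how a probe differs between groups, so it does not look at the quantity that will later be tested.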
Hi Mete:

Gordon just pointed out to me that there is a function in limma which can
summarize probe-level intensities to the gene level. This function is called
avereps. It replaces the intensities of replicate probes (probes
corresponding to the same gene in this case) with their average. It works
seamlessly with neqc output because it supports EList objects.

Sorry about this.

Cheers,
Wei
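
The averaging idea itself is easy to illustrate outside R. The sketch below is plain Python, not limma's avereps and not its interface; the function name, probe IDs, gene symbols, and values are hypothetical. It only shows what "replace replicate probes with their average" means for a small table.

```python
from collections import defaultdict

def average_probes_by_gene(expr, probe_to_gene):
    """Average, array by array, all probes that map to the same gene.

    expr: dict mapping probe ID -> list of intensities (one per array)
    probe_to_gene: dict mapping probe ID -> gene symbol
    """
    grouped = defaultdict(list)
    for probe, values in expr.items():
        grouped[probe_to_gene[probe]].append(values)
    # zip(*rows) walks the arrays; each `col` holds one array's values
    # for every probe of the gene.
    return {
        gene: [sum(col) / len(col) for col in zip(*rows)]
        for gene, rows in grouped.items()
    }

# Made-up log2 expression values on two arrays.
expr = {
    "probe_a": [6.0, 8.0],
    "probe_b": [8.0, 10.0],
    "probe_c": [5.0, 5.5],
}
probe_to_gene = {"probe_a": "GENE1", "probe_b": "GENE1", "probe_c": "GENE2"}
gene_level = average_probes_by_gene(expr, probe_to_gene)
print(gene_level)
```

Unlike selecting a single probe per gene, averaging lets a weak or non-responsive probe pull the gene-level value toward itself, which is one reason people summarize in different ways.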
Hi Prasad
On 03/23/2011 06:28 AM, Prasad Siddavatam wrote:
> I am reprocessing a previously processed dataset from NCBI GEO. This is a
> Illumina microarray chip. The data provided at GEO is either normalized with
> negative probe values or unnormalized data without control spot information.
>
> If I avoid the probes with the negative values (can't transfer to logs) that
> leaves only 9500 out of 22000 probes?
>
> Can anybody please suggest how to approach this problem?

You should use the log transform only on fluorescence intensities or on
ratios of these, anyway -- for other kinds of data, it makes no sense.

Raw fluorescence intensities typically have values between 0 and
2^16-1 = 65535. If there are negative numbers among your data, you are
looking at something else, and you shouldn't proceed before understanding
that.

Maybe your data are already log transformed? If they are log-transformed
ratios, you will get negative values whenever the ratio is smaller than
1. If you log transform this a second time, you won't get anywhere.

Simon
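
The arithmetic behind this point can be checked directly. The snippet below is a small plain-Python illustration with a made-up ratio; it shows that a ratio below 1 has a negative log2, and that taking the log a second time fails because the logarithm of a negative number is not defined over the reals.

```python
import math

# Upper bound of a 16-bit raw intensity, as mentioned in the message.
max_raw = 2**16 - 1

# A made-up expression ratio smaller than 1.
ratio = 0.5
log_ratio = math.log2(ratio)  # negative, since ratio < 1

try:
    math.log2(log_ratio)      # attempt to log-transform a second time
    second_log_ok = True
except ValueError:
    # math.log2 rejects negative input with a "math domain error".
    second_log_ok = False

print(max_raw, log_ratio, second_log_ok)
```

So negative values in a GEO table are a hint to check the series and sample descriptions for how the values were produced before applying any further transform.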