Dealing with negative values in Illumina data


Prasad Siddavatam

United States

Dear List, I am reprocessing a previously processed dataset from NCBI GEO that was generated on an Illumina microarray chip. The data provided at GEO are either normalized data containing negative probe values, or unnormalized data without control-spot information. If I drop the probes with negative values (which cannot be log-transformed), only 9,500 of the 22,000 probes remain. Can anybody suggest how to approach this problem? Appreciate your help, Prasad

Pan Du

United States

Hi Prasad, If you only have the processed Illumina data from GEO and the maximum expression value is larger than 100, then I guess the negative values were caused by background correction. In that case most negative values should be close to zero; if they are not, the data may have some problem. If you don't want to throw those negative values away, you can compute log2(x + offset) to force them to become positive, although this may affect the genes with low expression values. If you have the BeadStudio output file, you can use the vst transformation in the lumi package instead of a log transform; vst can handle negative values. Pan
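The log2(x + offset) suggestion above can be sketched in Python with NumPy. The function names, the default offset rule (shift the data so the smallest value becomes 1) and the `near_zero` tolerance of 50 are assumptions for illustration, not part of lumi or limma; only the "maximum above 100" check comes from the post. The `asinh_like` helper is not lumi's vst either; it only shows why an arcsinh-type transform, unlike log2, is defined for negative inputs.

```python
import numpy as np


def log2_with_offset(x, offset=None):
    """Return (log2(x + offset), offset).

    Assumption: if no offset is given, shift the data so that the smallest
    value becomes 1 (which maps to 0 on the log2 scale). This is one
    illustrative choice, not a rule from lumi or limma.
    """
    x = np.asarray(x, dtype=float)
    if offset is None:
        offset = max(0.0, 1.0 - np.nanmin(x))
    return np.log2(x + offset), offset


def negatives_look_benign(x, min_max=100.0, near_zero=50.0):
    """Heuristic from the post: a linear-scale maximum above `min_max`, with
    negative values clustered near zero, is consistent with background
    correction. `near_zero` is a hypothetical tolerance."""
    x = np.asarray(x, dtype=float)
    if np.nanmax(x) <= min_max:
        return False  # small maximum: data may already be on a log scale
    negatives = x[x < 0]
    if negatives.size == 0:
        return True  # no negative values to explain
    return bool(np.median(negatives) > -near_zero)


def asinh_like(x, scale=1.0):
    """Illustration only (not lumi's vst): arcsinh is defined for all real
    numbers, is roughly linear near 0 and behaves like ln(2x / scale) for
    large x. `scale` is a hypothetical parameter."""
    return np.arcsinh(np.asarray(x, dtype=float) / scale)
```

Because a single offset is added to every value, it is large relative to the intensities of weakly expressed probes and compresses the differences between them, which is presumably the effect on low-expression genes that the post warns about.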


If BeadStudio output is available, there is no need to deal with negative values. It does not make sense to me to log-transform a data set that has already been normalized. For a comparison of different BeadChip preprocessing algorithms, please see http://www.ncbi.nlm.nih.gov/pubmed/20929874

Reply by Wei Shi

Hi, I have normalized the data with both lumi (vst followed by rsn) and limma (neqc). Is there a function in either package that outputs the processed data at the gene level rather than the probe level? I wrote my own script that finds the probes annotated to the same gene and averages them, but I was wondering whether either package already has a built-in function. Thanks, Mete Civelek

Reply by Mete Civelek

Hi Mete: Limma does not provide such a function. Part of the reason is that people summarize probe-level intensities to gene-level intensities in different ways. Our approach is to select, among all probes corresponding to the same gene, the probe with the largest mean expression intensity across all arrays. This is a less biased selection method than selecting probes by fold change or by other criteria. Cheers, Wei
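The selection rule described above (for each gene, keep the probe with the largest mean intensity across all arrays) can be sketched in Python with NumPy. The function name and inputs are hypothetical: `expr` is assumed to be a probes-by-arrays matrix of normalized intensities, and `genes` a probe-to-gene annotation you would take from the platform annotation. Probes without a gene symbol are skipped, and ties keep the first probe seen.

```python
import numpy as np


def select_max_mean_probe(expr, probe_ids, genes):
    """Return {gene: (probe_id, row)} keeping, per gene, the probe whose
    mean across arrays (columns) is largest."""
    expr = np.asarray(expr, dtype=float)
    means = expr.mean(axis=1)  # one mean per probe (row)
    best = {}  # gene -> row index of the best probe so far
    for i, gene in enumerate(genes):
        if not gene:
            continue  # unannotated probe: cannot be assigned to a gene
        if gene not in best or means[i] > means[best[gene]]:
            best[gene] = i  # strict '>' keeps the first probe on ties
    return {g: (probe_ids[i], expr[i]) for g, i in best.items()}
```

Selecting on the mean intensity, rather than on a statistic from the comparison of interest, keeps the choice of probe independent of the differential-expression result.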

Reply by Wei Shi

Hi Mete: Gordon just pointed out to me that there is a function in limma that can summarize probe-level intensities to the gene level. The function is called avereps. It replaces the intensities of replicate probes (in this case, probes corresponding to the same gene) with their average. It works seamlessly with neqc output because it supports EList objects. Sorry about the confusion. Cheers, Wei
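Conceptually, averaging replicate probes as described for avereps amounts to grouping rows by an identifier and taking the column-wise mean. The Python/NumPy sketch below illustrates that idea only; it is not limma's implementation, and the function name and the choice to return groups in order of first appearance are assumptions. In R you would simply call limma's avereps on the EList.

```python
import numpy as np


def average_replicate_probes(expr, ids):
    """Average the rows of a probes-by-arrays matrix that share an ID.

    Returns (unique_ids, averaged_matrix), with groups ordered by the
    first appearance of each ID.
    """
    expr = np.asarray(expr, dtype=float)
    ids = np.asarray(ids)
    uniq, first_idx, inverse = np.unique(ids, return_index=True, return_inverse=True)
    sums = np.zeros((len(uniq), expr.shape[1]))
    np.add.at(sums, inverse, expr)        # accumulate each row into its group
    counts = np.bincount(inverse).astype(float)
    means = sums / counts[:, None]        # column-wise mean per group
    order = np.argsort(first_idx)         # np.unique sorts; restore first-seen order
    return uniq[order], means[order]
```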

Reply by Wei Shi

Simon Anders

Zentrum für Molekularbiologie, Universi…

Hi Prasad, You should use the log transform only on fluorescence intensities or on ratios of these; for other kinds of data it makes no sense. Raw fluorescence intensities typically have values between 0 and 2^16 - 1 = 65535. If there are negative numbers among your data, you are looking at something else, and you shouldn't proceed before understanding what that is. Maybe your data are already log-transformed? If they are log-transformed ratios, you will get negative values whenever the ratio is smaller than 1. If you log-transform them a second time, you won't get anywhere. Simon
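The checks described above can be turned into a rough heuristic. The Python/NumPy sketch below classifies a set of values by their range: non-negative values up to 65535 look like raw intensities, a small maximum suggests data that are already log-transformed, and negative values on a log scale suggest log ratios. The cutoff of 20 for "log scale" (log2 of 65535 is just under 16), the function name and the labels are hypothetical choices, so treat the result as a prompt to inspect the data, not a verdict.

```python
import numpy as np

RAW_MAX = 2**16 - 1   # 65535, the largest 16-bit scanner intensity
LOG_MAX = 20.0        # hypothetical cutoff; log2(65535) is just under 16


def describe_scale(x):
    """Guess what kind of values `x` holds, following the reasoning above."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    if hi <= LOG_MAX:
        # Small maximum: probably already on a log scale. Negative values are
        # then typical of log ratios (a ratio below 1 has a negative log).
        return "log-ratio" if lo < 0 else "log-intensity"
    if lo < 0:
        # Linear scale but negative: not raw intensities (e.g. background-corrected).
        return "linear-with-negatives"
    if hi <= RAW_MAX:
        return "raw-intensity"
    return "unclear"
```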

Wei Shi

Australia/Melbourne/Olivia Newton-John …

Dear Prasad: I am not quite sure what your question is. But if you want to normalize the raw data yourself and use the control probes for the normalization, you might try the limma neqc function, which can infer the intensities of the negative control probes from the regular probe intensities and their detection p-values. neqc then performs a normexp background correction aided by the negative controls, followed by quantile normalization and log2 transformation. Hope this helps. Cheers, Wei
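The last two steps named above, quantile normalization followed by log2 transformation, can be sketched in Python with NumPy. This illustrates only those steps and is not neqc: the normexp background correction with negative controls is omitted, ties are broken by position rather than averaged, and the function name is hypothetical.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize the columns (arrays) of a probes-by-arrays matrix.

    Each column's values are replaced by the mean of the sorted columns,
    placed back according to that column's ranks, so every column ends up
    with the same distribution.
    """
    expr = np.asarray(expr, dtype=float)
    order = np.argsort(expr, axis=0, kind="stable")        # ranks per column
    sorted_cols = np.take_along_axis(expr, order, axis=0)  # each column sorted
    reference = sorted_cols.mean(axis=1)                   # mean quantiles
    out = np.empty_like(expr)
    for j in range(expr.shape[1]):
        out[order[:, j], j] = reference  # smallest value gets smallest quantile, etc.
    return out
```

A log2 step, such as `np.log2(quantile_normalize(expr))`, only works if every value is positive; in neqc, the normexp background correction that this sketch leaves out is what is meant to produce positive corrected intensities.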
