Use probesets with highest baseline expression for differential gene expression in LIMMA


Guest User ★ 13k

@guest-user-4897

Last seen 11.4 years ago

Hello All,

I am relatively new to R and Bioconductor. I would like to know whether there is a way to alter the LIMMA default options so that, instead of averaging the signal intensities of probesets, the package selects the probesets with the highest baseline expression/signal intensity.

Any help would be greatly appreciated.

-- output of sessionInfo():

> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] limma_2.18.3

--
Sent via the guest posting facility at bioconductor.org.


ADD COMMENT • link updated 14.0 years ago by James W. MacDonald 68k • written 14.0 years ago by Guest User ★ 13k


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 days ago

United States

Hi Ekta,

On 2/21/2012 10:57 PM, Ekta [guest] wrote:
> I would like to know if there is a way to alter LIMMA default options such that the package instead of averaging signal intensities of probesets selects the probesets with highest baseline expression/signal intensity?

You will have to be more precise than that. What exactly do you mean by 'selects the probesets with highest baseline expression'? Do you just want any probesets where one or more samples has high expression? That doesn't require limma. Or do you want probesets where some of the samples have much higher expression than others?

Best,

Jim
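The two readings of the question can be sketched in base R, without limma. This is only an illustration: the matrix `x`, the probeset names, and the cutoff of 10 on the log2 scale are all made up, and the row-wise range is just one crude way to express "much higher in some samples than others".

```r
# Toy log2 expression matrix: 4 probesets (rows) x 4 samples (columns).
# All values and the cutoff are invented for illustration.
x <- rbind(
  ps1 = c(13.8, 14.1, 14.0, 13.9),
  ps2 = c( 2.1,  2.3,  8.0,  8.2),
  ps3 = c( 4.0,  4.2,  4.1,  3.9),
  ps4 = c( 6.0, 11.5,  6.2,  6.1)
)
colnames(x) <- paste0("sample", 1:4)

cutoff <- 10

# First reading: keep probesets where one or more samples exceed the cutoff.
row_max <- apply(x, 1, max)
x_high <- x[row_max > cutoff, , drop = FALSE]

# Second reading: probesets where some samples are much higher than others,
# measured here as the within-row range (max minus min).
row_range <- apply(x, 1, function(v) diff(range(v)))
most_variable <- names(which.max(row_range))
```

With these made-up numbers the two criteria disagree: the first keeps `ps1` and `ps4`, while the largest range belongs to `ps2`, which never reaches the cutoff.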

ADD COMMENT • link 14.0 years ago James W. MacDonald 68k


Hi Jim,

I am using Affymetrix chip data and need to analyse my dataset for differential gene expression with LIMMA. Each gene can be referenced by multiple probesets, and while performing LIMMA the expression values of these multiple probesets get averaged and the averaged value is assigned to that gene. I need to be able to simply select the probeset with the highest expression value to represent a gene.

LIMMA by default averages the probeset values. I am not sure if I need to modify any default settings in LIMMA or use another package.

Thanks

Regards,
Ekta

ADD REPLY • link 14.0 years ago Ekta Jain ▴ 370


Hi Ekta,

On 2/22/2012 10:06 PM, Ekta Jain wrote:
> LIMMA by default averages the probeset values.

This is not true. The limma package doesn't know or care that two probesets are intended to interrogate the same gene, and doesn't do the averaging that you think it does. You can't even fit a mixed model using the 'duplicate' probesets, because they aren't duplicates and you don't have the same number of probesets per gene. What limma does is make univariate comparisons by probeset, so if you have four probesets that interrogate the same gene transcript, then you will do four tests.

Now you could make the assumption (unfounded, IMO) that all the probesets that are intended to measure a particular transcript are really measuring the same thing, and then choose to use just one of them based on some metric. As an example, you could use 'highest expression value', which doesn't make any sense to me.

To expound on that last statement, let's say you have two probesets that are purported to measure the same gene. Now let's further stipulate that one of these probesets has really high expression (somewhere around 2^14), but the expression isn't materially different between any of your samples. In addition, the other probeset has almost undetectable expression in one set of samples, but some middling expression (say 2^8) in another set. Do you really want to throw out the latter probeset in favor of the former?

Now back to your question. If you want to pre-filter the data (again, not recommended with the limma package, due to the empirical Bayes estimator), you can use the findLargest() function in the genefilter package. You have to supply a test statistic to this function, for which you could use either rowMeans(exprs(eset)), which will select on the highest average expression, or something like apply(exprs(eset), 1, max) to select on the maximum expression value.

Best,

Jim
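The two-probeset scenario above can be simulated in a few lines of base R. This is a rough sketch under stated assumptions: the group sizes, the standard deviation of 0.2, and the probeset names `flat` and `changed` are invented, and an ordinary Welch t-test stands in for limma's moderated test so that the example has no package dependencies.

```r
set.seed(1)
group <- rep(c("A", "B"), each = 4)

# Two hypothetical probesets annotated to the same gene (log2 scale).
# 'flat' is highly expressed (around 14) but unchanged between groups;
# 'changed' is near background in group A and around 8 in group B.
x <- rbind(
  flat    = rnorm(8, mean = 14, sd = 0.2),
  changed = c(rnorm(4, mean = 2, sd = 0.2), rnorm(4, mean = 8, sd = 0.2))
)

# Selecting by highest average expression picks the flat probeset.
avg <- rowMeans(x)
picked <- names(which.max(avg))

# Testing each probeset separately (as limma does, one test per probeset)
# shows which probeset carries the group difference.
pvals <- apply(x, 1, function(v) {
  t.test(v[group == "A"], v[group == "B"])$p.value
})
```

Here `picked` is the flat probeset, while the strong group effect sits in the probeset that a highest-expression filter would discard.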

ADD REPLY • link 14.0 years ago James W. MacDonald 68k


Hello Jim,

Thank you very much for your detailed reply. I did indeed have some misconceptions about LIMMA. Unfortunately I am not in charge of the methodology in this case, and the requirement is to ignore the other probesets and keep only the probeset with the maximum expression value for each gene symbol.

I am afraid I am unable to use the findLargest() function from the genefilter package, since it needs the Entrez ID annotation and I am using annotation from a tab-delimited text file. I am working on the Human Gene 1.0 ST Array, and the relevant packages do not exist for the latest version of R. I will try and tweak it in my favour.

Alternatively, I also tried the solution provided by Gordon but encountered memory errors. I will have to try the same on a machine with more RAM.

Thanks and Regards,
Ekta
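When the annotation lives in a tab-delimited text file rather than an annotation package, the same kind of selection can be sketched in base R. Everything here is hypothetical: the column names `ProbesetID` and `Symbol`, the probeset and gene names, and the expression values. The annotation is read from an inline string so that the example is self-contained; with a real file, the file path would take the place of `text = ...`.

```r
# Hypothetical annotation table, as it might come from a tab-delimited file.
# An empty Symbol field is read as NA.
annot <- read.delim(text = paste(
  "ProbesetID\tSymbol",
  "ps1\tGENE1",
  "ps2\tGENE1",
  "ps3\tGENE2",
  "ps4\t",
  "ps5\tGENE2",
  sep = "\n"
), stringsAsFactors = FALSE, na.strings = "")

# Toy log2 expression matrix with rows in the same order as the annotation.
x <- matrix(
  c(9, 9, 9,
    5, 6, 7,
    3, 3, 3,
    8, 8, 8,
    4, 5, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(annot$ProbesetID, paste0("sample", 1:3))
)

avg <- rowMeans(x)

# Ignore probesets without a symbol, then within each symbol keep the row
# index of the probeset with the largest average expression.
idx <- which(!is.na(annot$Symbol))
best <- vapply(
  split(idx, annot$Symbol[idx]),
  function(i) i[which.max(avg[i])],
  integer(1)
)
x_best <- x[best, , drop = FALSE]
```

In this toy data `x_best` contains `ps1` for GENE1 and `ps5` for GENE2; the unannotated `ps4` is dropped, which is a design choice rather than a requirement.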

ADD REPLY • link 14.0 years ago Ekta Jain ▴ 370


Hi Ekta,

The relevant annotation packages do indeed exist for the Human Gene ST arrays on the latest version of R. Try

    source("http://www.bioconductor.org/biocLite.R")
    biocLite("hugene10sttranscriptcluster.db")

You may also need to change the annotation of your ExpressionSet, so you can do something like:

    annotation(eset) <- "hugene10sttranscriptcluster.db"

Best,

Jim

ADD REPLY • link 14.0 years ago James W. MacDonald 68k


Dear Jim, Yes i figured that out soon enough for me to proceed with the analysis. I annotated my eset as follows: Library(annoatate) ID <- featureNames(eset) Symbol <- getSYMBOL(ID, "mogene10sttranscriptcluster.db") tmp <- data.frame(ID=ID, Symbol=Symbol) tmp[tmp=="NA"] <- NA fData(eset) <- tmp and used the following to filter on highest expression fit <- lmFit(eset, design) o <- order(fit$Amean, decreasing=TRUE) dup <- duplicated(fit$genes$Symbol[o]) fit.unique <- fit[o,][!dup,] Thanks very much for your help. Best, Ekta -----Original Message----- From: James W. MacDonald [mailto:jmacdon@uw.edu] Sent: 28 February 2012 03:22 To: Ekta Jain Cc: bioconductor at r-project.org Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA Hi Ekta, The relevant annotation packages do indeed exist for the Human Gene ST arrays on the latest version of R. Try source("http://www.bioconductor.org/biocLite.R") biocLite("hugene10sttranscriptcluster.db") You may also need to change the annotation of your ExpressionSet, so if you can do something like: annotation(eset) <- "hugene10sttranscriptcluster.db" Best, Jim On 2/27/12 3:10 AM, Ekta Jain wrote: > Hello Jim, > Thank you very much for your detailed reply. I did have some misconceptions about LIMMA indeed. I am not much in charge of the methodology in this case unfortunately and the requirement is to ignore the other expression values for probesets and only keep the probeset with maximum expression value for that gene symbol. > I am afraid i am unable to use the findLargest() function from the gene filter since it needs the ENTREZ ID annotation and i am using annotation from a tab delimited text file. Working on the Human Gene 1.0 Gene ST Array and the relevant packages do not exist for the latest version of R. I will try and tweak it in my favour. > > Alternatively I also tried the solution provided by Gordon but encounter memory errors. 
Will have to try the same on a higher RAM Machine. > > Thanks and Regards, > Ekta > > > -----Original Message----- > From: James W. MacDonald [mailto:jmacdon at uw.edu] > Sent: 23 February 2012 19:55 > To: Ekta Jain > Cc: bioconductor at r-project.org > Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA > > Hi Ekta, > > On 2/22/2012 10:06 PM, Ekta Jain wrote: >> Hi Jim, >> I am using an affymetrix chip data. I need to analyse my dataset for differential gene expression (LIMMA). Each gene can be referenced by multiple probesets and while performing LIMMA the expression values of these multiple probesets gets averaged and this averaged value is assigned to that gene. I need to be able to simply select the probeset with the highest expression value to represent a gene. >> >> LIMMA by default averages the probeset values. > This is not true. The limma package doesn't know or care that two > probesets are intended to interrogate the same gene, and doesn't do the > averaging that you think it does. You can't even do a mixed model, using > the 'duplicate' probesets because they aren't duplicates, and you don't > have the same number of probesets per gene. What limma does is make > univariate comparisons by-probeset, so if you have four probesets that > interrogate the same gene transcript, then you will do four tests. > > Now you could make the assumption (unfounded, IMO) that all the > probesets that are intended to measure a particular transcript are > really measuring the same thing, and then choose to use just one of them > based on some metric. As an example, you could use 'highest expression > value', which doesn't make any sense to me. > > To expound on that last statement, let's say you have two transcripts > that are purported to measure the same gene. 
Now let's further stipulate > that one of these probesets has really high expression (somewhere around > 2^14), but the expression isn't materially different between any of your > samples. In addition, the other probeset has almost undetectable > expression in one set of samples, but some middling expression (say > 2^8) in another set. Do you really want to throw out the latter probeset > in favor of the former? > > Now back to your question. If you want to pre-filter the data (again, > not recommended with the limma package, due to the empirical Bayes > estimator), you can use the findLargest() function in the genefilter > package. You have to supply a test statistic to this function, for which > you could use either the rowMean(), which will give you the highest > average expression, or you could do something like apply(exprs(eset),1 , > max) to get the maximum expression value. > > Best, > > Jim > > >> I am not sure if i need to modify any default settings in LIMMA or use another package. >> >> Thanks >> >> Regards, >> Ekta >> >> -----Original Message----- >> From: James W. MacDonald [mailto:jmacdon at uw.edu] >> Sent: 22 February 2012 19:26 >> To: Ekta [guest] >> Cc: bioconductor at r-project.org; Ekta Jain >> Subject: Re: [BioC] Use probesets with highest baseline expression for differntial gene expression in LIMMA >> >> Hi Ekta, >> >> On 2/21/2012 10:57 PM, Ekta [guest] wrote: >>> Hello All, >>> I am relatively new to R and bioconductor. I would like to know if there is a way to alter LIMMA defualt options such that the package instead of averaging signal intensities of probesets selects the probesets with highest baseline >>> expression/signal intensity? >> You will have to be more precise than that. What exactly do you mean by >> 'selects the probesets with highest baseline expression'? Do you just >> want any probesets where one or more samples has high expression? That >> doesn't require limma. 
>> Or do you want probesets where some of the samples have much higher expression than others?
>>
>> Best,
>>
>> Jim
>>
>>> Any help would be greatly appreciated.
>>>
>>> -- output of sessionInfo():
>>>
>>>> sessionInfo()
>>> R version 2.9.1 (2009-06-26)
>>> i386-pc-mingw32
>>>
>>> locale:
>>> LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] limma_2.18.3
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
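A minimal sketch of the "one probeset per gene" filter that Jim describes in the quoted reply, written in base R so that it runs without an annotation package. It is not the genefilter findLargest() implementation. The matrix `expr`, the probeset IDs, and the `gene` mapping are all made up for illustration; in a real analysis the mapping would come from the chip's annotation data.

```r
# Toy log2 expression matrix: 5 probesets (rows) x 4 samples (columns).
# Probeset IDs, sample names and gene labels are hypothetical.
expr <- matrix(
  c(14.0, 14.1, 13.9, 14.0,   # ps1: high but flat across samples
     3.0,  3.2,  8.0,  8.1,   # ps2: low in two samples, moderate in two
     7.0,  7.2,  7.1,  6.9,   # ps3
     9.0,  9.1,  9.2,  8.9,   # ps4
     5.0,  5.1,  4.9,  5.0),  # ps5
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("ps", 1:5), paste0("s", 1:4))
)

# Hypothetical probeset-to-gene mapping.
gene <- c(ps1 = "GENE_A", ps2 = "GENE_A",
          ps3 = "GENE_B", ps4 = "GENE_B",
          ps5 = "GENE_C")

# Per-probeset summary statistic: mean expression across samples.
# Use apply(expr, 1, max) instead to rank by maximum expression.
stat <- rowMeans(expr)

# For each gene, keep the ID of the probeset with the largest statistic.
ids_by_gene <- split(names(stat), gene[names(stat)])
keep <- vapply(ids_by_gene,
               function(ids) ids[which.max(stat[ids])],
               character(1))

# Expression matrix restricted to one probeset per gene.
expr_filtered <- expr[keep, , drop = FALSE]
print(keep)
```

With these toy values GENE_A ends up represented by ps1, the high but flat probeset, while ps2, the one that actually changes between samples, is discarded. That is the risk Jim points out about filtering on highest expression.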

ADD REPLY • link 14.0 years ago Ekta Jain ▴ 370
