Use probesets with highest baseline expression for differential gene expression in LIMMA
@gordon-smyth
WEHI, Melbourne, Australia

Dear Ekta,

Jim has already pointed out that you have some incorrect perceptions about what limma does by default.

If you need to keep one probe for each gene symbol after a limma lmFit, and you want to choose the probe with highest average expression, it is easy to do like this.  I will assume that your linear model fit object is called 'fit', and your annotation includes a column called "Symbol" containing the gene symbol.

o <- order(fit$Amean, decreasing=TRUE)
dup <- duplicated(fit$genes$Symbol[o])
fit.unique <- fit[o,][!dup,]

Now your fit object fit.unique has only one row for each symbol.
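
To see why ordering first and then calling duplicated() keeps the highest-expressed probe per symbol, here is a minimal base-R sketch on made-up data. The vectors amean, symbol and probe are hypothetical stand-ins for fit$Amean, fit$genes$Symbol and the probe IDs; no limma object is involved.

```r
# Hypothetical toy data: five probes mapping to three gene symbols.
amean  <- c(7.2, 9.1, 5.5, 8.0, 6.3)                   # stand-in for fit$Amean
symbol <- c("TP53", "TP53", "BRCA1", "MYC", "BRCA1")   # stand-in for fit$genes$Symbol
probe  <- c("p1", "p2", "p3", "p4", "p5")              # made-up probe IDs

# Sort probes from highest to lowest average expression.
o <- order(amean, decreasing = TRUE)

# In the sorted order, the first occurrence of each symbol is its
# highest-expressed probe; every later occurrence is flagged as a duplicate.
dup <- duplicated(symbol[o])

# Keep only the first (highest-expressed) probe for each symbol.
kept <- probe[o][!dup]
print(data.frame(probe = kept, symbol = symbol[o][!dup], amean = amean[o][!dup]))
```

One thing to watch in real annotation: duplicated() treats missing values as duplicates of each other, so if some probes have NA in the Symbol column only one of them would survive this filter. Removing the unannotated probes beforehand may be what you want.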

This sort of filtering has been done in many papers, typically when one wishes to match symbols across platforms or to do gene set testing.

Best wishes
Gordon


------------------ original message ----------------
[BioC]  Use probesets with highest baseline expression for differential
gene expression in LIMMA

Ekta Jain Ekta_Jain at jubilantbiosys.com
Thu Feb 23 04:06:09 CET 2012

Hi Jim,
I am using Affymetrix chip data. I need to analyse my dataset for
differential gene expression (LIMMA). Each gene can be referenced by
multiple probesets, and while performing LIMMA the expression values of
these multiple probesets get averaged and this averaged value is assigned
to that gene. I need to be able to simply select the probeset with the
highest expression value to represent a gene.

LIMMA by default averages the probeset values.

I am not sure if I need to modify any default settings in LIMMA or use
another package.

Thanks

Regards,
Ekta

 

Annotation probe limma • 4.1k views
ying chen ▴ 340
@ying-chen-5085
Hi guys,

When I ran arrayQualityMetrics, I got a strange error message and no result was generated. I did a similar run with another dataset last night and had no problem. I cannot tell what went wrong today. Any suggestion?

Thanks a lot for the help!

Ying

> library("affy")
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'browseVignettes()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation("pkgname")'.

> library("hgu133plus2hsentrezgcdf")
> mydata <- ReadAffy(cdfname="hgu133plus2hsentrezgcdf")
> mydata
AffyBatch object
size of arrays=1164x1164 features (150 kb)
cdf=hgu133plus2hsentrezgcdf (18185 affyids)
number of samples=353
number of genes=18185
annotation=hgu133plus2hsentrezgcdf
notes=
> hist(mydata)
> boxplot(mydata,col="red")
> library("arrayQualityMetrics")
> arrayQualityMetrics(expressionset=mydata,do.logtransform=TRUE)
The directory 'arrayQualityMetrics report for mydata' has been created.
Error in cpSubs(src, dest) :
  'dest' does not exist, and it cannot be created: arrayQualityMetrics report for mydata
In addition: Warning message:
In dir.create(dest) :
  cannot create dir 'arrayQualityMetrics report for mydata', reason 'No such file or directory'
>
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrayQualityMetrics_3.10.0     hgu133plus2hsentrezgcdf_13.0.0
[3] affy_1.32.1                    Biobase_2.14.0

loaded via a namespace (and not attached):
 [1] affyio_1.22.0         affyPLM_1.30.0        annotate_1.32.1
 [4] AnnotationDbi_1.16.17 beadarray_2.4.1       BiocInstaller_1.2.1
 [7] Biostrings_2.22.0     Cairo_1.5-1           cluster_1.14.2
[10] DBI_0.2-5             genefilter_1.36.0     grid_2.14.1
[13] Hmisc_3.9-2           hwriter_1.3           IRanges_1.12.6
[16] lattice_0.20-0        latticeExtra_0.6-19   limma_3.10.2
[19] preprocessCore_1.16.0 RColorBrewer_1.0-5    RSQLite_0.11.1
[22] setRNG_2009.11-1      splines_2.14.1        survival_2.36-12
[25] SVGAnnotation_0.9-0   tcltk_2.14.1          tools_2.14.1
[28] vsn_3.22.0            XML_3.9-4             xtable_1.7-0
[31] zlibbioc_1.0.0
>
Dear Ying

Something funny is going on with your filesystem. What is the output of 'getwd()' and 'dir()' after the error is thrown?

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber
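
The check suggested above can be sketched in base R as follows. This is only a rough diagnostic, not part of arrayQualityMetrics; the directory name "aqm_write_test" is a made-up placeholder, and the snippet assumes it does not already exist in the working directory.

```r
# Where is R currently working, and what does it see there?
wd <- getwd()
cat("Working directory:", wd, "\n")
print(dir())

# file.access() returns 0 when the requested permission (mode = 2: write) is granted.
writable <- unname(file.access(wd, mode = 2) == 0)
cat("Working directory writable:", writable, "\n")

# Try to create, then remove, a scratch directory (hypothetical name) in the
# working directory. dir.create() returns FALSE with a warning if it fails.
probe_dir <- file.path(wd, "aqm_write_test")
created <- dir.create(probe_dir)
cat("Scratch directory created:", created, "\n")
if (created) unlink(probe_dir, recursive = TRUE)
```

If the working directory turns out to be missing or not writable, one option is to setwd() to a location that is, or to pass a writable path through the outdir argument of arrayQualityMetrics().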