Question

limma topTable annotation

0

Entering edit mode

Gordon Smyth 50k

@gordon-smyth

Last seen 2 minutes ago

WEHI, Melbourne, Australia

Dear David, You have two choices of how to do this in limma. You can either add the gene symbols (and GO category or anything else) into the fit object, or alternatively you can add the symbols into the eset data object right at the beginning. In either case, the gene symbols will appear automatically in any topTable output. First option: fit<-lmFit(meset,design) library(annotate) fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"hgu133plus2.db") Second option: meset<-rma(mdat) library(annotate) ID <- featureNames(meset) Symbol <- getSYMBOL(ID,"hgu133plus2.db") fData(meset) <- data.frame(ID=ID,Symbol=Symbol) Note that you don't need to to both. Best wishes Gordon --------------------------------------------- Professor Gordon K Smyth, NHMRC Senior Research Fellow, Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3052, Australia. smyth at wehi.edu.au http://www.wehi.edu.au http://www.statsci.org/smyth > Date: Tue, 15 Feb 2011 21:18:13 +0000 > From: David Iles <d.e.iles at="" leeds.ac.uk=""> > To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> > Subject: [BioC] limma topTable annotation > > Hi Folks, > > As a relative newcomer to BioC, having spent most of the last 28 years > at the bench, I am finally getting my head round R programming. I've > been using limma and affy to analyse a fairly chunky (and expensive!) > Affymetrix hgu133plus2 data set and have been successful in generating > topTable results that actually make sense. Always good when an > experiment works (or seems to!) > > One thing that has completely stumped me so far though, despite > extensive vignette and email string searches, and attempts to adapt code > written for Agilent single channel data, is how to 'automatically' > include gene names, symbols, GO IDs etc in the topTable output. While it > may be easy enough to use mget to retrieve the necessary info for small > numbers of probesets, it gets tedious when one needs either to > cut-and-paste long lists of affy IDs into DAVID, or convert them to long > lines, with each probeset ID flanked by " ", which is what I have done > so far. > > Since there is such a rich repository of data in hgu133plus2.db, there > must be a way to tap into this without going 'outside' limma. Can anyone > suggest how to do this? I'd be most grateful. > > Code (comments on errors/shortcuts etc appreciated) and session info > below. > > Thanks. > > Dr David Iles > Institute for Integrative and Comparative Biology > University of Leeds > Leeds LS2 9JT > > d.e.iles at leeds.ac.uk > > The experiment is designed to detect differences in muscle gene > expression between patients with a myopathy (S) and controls (N), and > also how gender affects these differences. > >> library(affy) > Loading required package: Biobase > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > >> library(limma) >> library(hgu133plus2.db) > Loading required package: AnnotationDbi > Loading required package: org.Hs.eg.db > Loading required package: DBI >> library(hgu133plus2cdf) >> mtargets<-readTargets("/Users/daveiles/Documents/R/muscle_data/musc ledat/mustargets.txt") >> mtargets > filename phen gen > 1 DF1U133plus25222M.CEL S M > 2 DF1U133plus25526M.CEL S F > 3 DF2U133plus22264M.CEL S M > 4 DF2U133plus22341M.CEL N M > 5 DF2U133plus22469M.CEL S M > 6 DF2U133plus22539M.CEL S M > 7 DF2U133plus22632M.CEL N F > 8 DF2U133plus23490M.CEL N F > 9 DF2U133plus23690M.CEL S M > 10 DF2U133plus24018M.CEL S M > > # plus 49 others, deleted for brevity > >> mdat<-ReadAffy(widget=T) > Loading required package: tkWidgets > Loading required package: widgetTools > Loading required package: tcltk > Loading Tcl/Tk interface ... done > Loading required package: DynDoc > Loading required package: tools >> meset<-rma(mdat) > Background correcting > Normalizing > Calculating Expression >> mphengen<-paste(mtargets$phen,mtargets$gen,sep=".") >> mphengen > [1] "S.M" "S.F" "S.M" "N.M" "S.M" "S.M" "N.F" "N.F" "S.M" "S.M" "N.F" "S.M" > > # etc - deleted for brevity > >> mphengen<-factor(mphengen,levels=c("S.M","S.F","N.M","N.F")) >> design<-model.matrix(~0+mphengen) >> colnames(design)<-levels(mphengen) >> fit<-lmFit(meset,design) >> fit<-eBayes(fit) > > # 1. influence of genotype within each phenotype >> cont.matrix <- makeContrasts(S.MvsF=S.M-S.F, N.MvsF=N.M-N.F, Diff=(N.M-N.F)-(S.M-S.F), levels=design) >> fit2<-contrasts.fit(fit,cont.matrix) >> fit2<-eBayes(fit2) >> msMvrfes<-topTable(fit2,coef="S.MvsF",n=100,adjust="BH") >> mnMvrfes<-topTable(fit2,coef="N.MvsF",n=100,adjust="BH") >> write.table(msMvrfes,file="mS_MvFres.txt") >> write.table(mnMvrfes,file="mN_MvFres.txt") > > # 2. influence of phenotype within each genotype >> cont.matrix1 <- makeContrasts(M.SvsN=S.M-N.M, F.SvsN=S.F-N.F, Diff=(S.M-S.F)-(N.M-N.F), levels=design) >> fit3<-contrasts.fit(fit,cont.matrix1) >> fit3<-eBayes(fit3) >> mM_SvsNres<-topTable(fit3,coef="M.SvsN",n=100,adjust="BH") >> mF_SvsNres<-topTable(fit3,coef="F.SvsN",n=100,adjust="BH") >> write.table(mM_SvsNres,file="mM_SvsNres.txt") >> write.table(mF_SvsNres,file="mF_SvsNres.txt") > >> sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.6.9 affy_1.28.0 hgu133plus2cdf_2.7.0 > [4] hgu133plus2.db_2.4.5 org.Hs.eg.db_2.4.6 RSQLite_0.9-4 > [7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0 > > loaded via a namespace (and not attached): > [1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.0 ______________________________________________________________________ The information in this email is confidential and intend...{{dropped:4}}

GO hgu133plus2 affy limma convert Category GO hgu133plus2 affy limma convert Category • 2.7k views

ADD COMMENT • link updated 13.2 years ago by David Iles ▴ 130 • written 13.2 years ago by Gordon Smyth 50k

score 0 · Answer 1 · 2011-02-17

Hi Gordon, That's exactly what I hoped one cold do. Thanks for the tip. Regards Dave Dr David Iles Institute for Integrative and Comparative Biology University of Leeds Leeds LS2 9JT d.e.iles at leeds.ac.uk On 16 Feb 2011, at 22:38, Gordon K Smyth wrote: > Dear David, > > You have two choices of how to do this in limma. You can either add the > gene symbols (and GO category or anything else) into the fit object, or > alternatively you can add the symbols into the eset data object right at > the beginning. In either case, the gene symbols will appear automatically > in any topTable output. > > First option: > > fit<-lmFit(meset,design) > library(annotate) > fit$genes$Symbol <- getSYMBOL(fit$genes$ID,"hgu133plus2.db") > > Second option: > > meset<-rma(mdat) > library(annotate) > ID <- featureNames(meset) > Symbol <- getSYMBOL(ID,"hgu133plus2.db") > fData(meset) <- data.frame(ID=ID,Symbol=Symbol) > > Note that you don't need to to both. > > Best wishes > Gordon > > --------------------------------------------- > Professor Gordon K Smyth, > NHMRC Senior Research Fellow, > Bioinformatics Division, > Walter and Eliza Hall Institute of Medical Research, > 1G Royal Parade, Parkville, Vic 3052, Australia. > smyth at wehi.edu.au > http://www.wehi.edu.au > http://www.statsci.org/smyth > > >> Date: Tue, 15 Feb 2011 21:18:13 +0000 >> From: David Iles <d.e.iles at="" leeds.ac.uk=""> >> To: "bioconductor at r-project.org" <bioconductor at="" r-project.org=""> >> Subject: [BioC] limma topTable annotation >> >> Hi Folks, >> >> As a relative newcomer to BioC, having spent most of the last 28 years >> at the bench, I am finally getting my head round R programming. I've >> been using limma and affy to analyse a fairly chunky (and expensive!) >> Affymetrix hgu133plus2 data set and have been successful in generating >> topTable results that actually make sense. Always good when an >> experiment works (or seems to!) >> >> One thing that has completely stumped me so far though, despite >> extensive vignette and email string searches, and attempts to adapt code >> written for Agilent single channel data, is how to 'automatically' >> include gene names, symbols, GO IDs etc in the topTable output. While it >> may be easy enough to use mget to retrieve the necessary info for small >> numbers of probesets, it gets tedious when one needs either to >> cut-and-paste long lists of affy IDs into DAVID, or convert them to long >> lines, with each probeset ID flanked by " ", which is what I have done >> so far. >> >> Since there is such a rich repository of data in hgu133plus2.db, there >> must be a way to tap into this without going 'outside' limma. Can anyone >> suggest how to do this? I'd be most grateful. >> >> Code (comments on errors/shortcuts etc appreciated) and session info >> below. >> >> Thanks. >> >> Dr David Iles >> Institute for Integrative and Comparative Biology >> University of Leeds >> Leeds LS2 9JT >> >> d.e.iles at leeds.ac.uk >> >> The experiment is designed to detect differences in muscle gene >> expression between patients with a myopathy (S) and controls (N), and >> also how gender affects these differences. >> >>> library(affy) >> Loading required package: Biobase >> >> Welcome to Bioconductor >> >> Vignettes contain introductory material. To view, type >> 'openVignette()'. To cite Bioconductor, see >> 'citation("Biobase")' and for packages 'citation(pkgname)'. >> >>> library(limma) >>> library(hgu133plus2.db) >> Loading required package: AnnotationDbi >> Loading required package: org.Hs.eg.db >> Loading required package: DBI >>> library(hgu133plus2cdf) >>> mtargets<-readTargets("/Users/daveiles/Documents/R/muscle_data/mus cledat/mustargets.txt") >>> mtargets >> filename phen gen >> 1 DF1U133plus25222M.CEL S M >> 2 DF1U133plus25526M.CEL S F >> 3 DF2U133plus22264M.CEL S M >> 4 DF2U133plus22341M.CEL N M >> 5 DF2U133plus22469M.CEL S M >> 6 DF2U133plus22539M.CEL S M >> 7 DF2U133plus22632M.CEL N F >> 8 DF2U133plus23490M.CEL N F >> 9 DF2U133plus23690M.CEL S M >> 10 DF2U133plus24018M.CEL S M >> >> # plus 49 others, deleted for brevity >> >>> mdat<-ReadAffy(widget=T) >> Loading required package: tkWidgets >> Loading required package: widgetTools >> Loading required package: tcltk >> Loading Tcl/Tk interface ... done >> Loading required package: DynDoc >> Loading required package: tools >>> meset<-rma(mdat) >> Background correcting >> Normalizing >> Calculating Expression >>> mphengen<-paste(mtargets$phen,mtargets$gen,sep=".") >>> mphengen >> [1] "S.M" "S.F" "S.M" "N.M" "S.M" "S.M" "N.F" "N.F" "S.M" "S.M" "N.F" "S.M" >> >> # etc - deleted for brevity >> >>> mphengen<-factor(mphengen,levels=c("S.M","S.F","N.M","N.F")) >>> design<-model.matrix(~0+mphengen) >>> colnames(design)<-levels(mphengen) >>> fit<-lmFit(meset,design) >>> fit<-eBayes(fit) >> >> # 1. influence of genotype within each phenotype >>> cont.matrix <- makeContrasts(S.MvsF=S.M-S.F, N.MvsF=N.M-N.F, Diff=(N.M-N.F)-(S.M-S.F), levels=design) >>> fit2<-contrasts.fit(fit,cont.matrix) >>> fit2<-eBayes(fit2) >>> msMvrfes<-topTable(fit2,coef="S.MvsF",n=100,adjust="BH") >>> mnMvrfes<-topTable(fit2,coef="N.MvsF",n=100,adjust="BH") >>> write.table(msMvrfes,file="mS_MvFres.txt") >>> write.table(mnMvrfes,file="mN_MvFres.txt") >> >> # 2. influence of phenotype within each genotype >>> cont.matrix1 <- makeContrasts(M.SvsN=S.M-N.M, F.SvsN=S.F-N.F, Diff=(S.M-S.F)-(N.M-N.F), levels=design) >>> fit3<-contrasts.fit(fit,cont.matrix1) >>> fit3<-eBayes(fit3) >>> mM_SvsNres<-topTable(fit3,coef="M.SvsN",n=100,adjust="BH") >>> mF_SvsNres<-topTable(fit3,coef="F.SvsN",n=100,adjust="BH") >>> write.table(mM_SvsNres,file="mM_SvsNres.txt") >>> write.table(mF_SvsNres,file="mF_SvsNres.txt") >> >>> sessionInfo() >> R version 2.12.0 (2010-10-15) >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) >> >> locale: >> [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] limma_3.6.9 affy_1.28.0 hgu133plus2cdf_2.7.0 >> [4] hgu133plus2.db_2.4.5 org.Hs.eg.db_2.4.6 RSQLite_0.9-4 >> [7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0 >> >> loaded via a namespace (and not attached): >> [1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.0 > > ______________________________________________________________________ > The information in this email is confidential and inte...{{dropped:6}}