limma topTable annotation

0

Entering edit mode

David Iles ▴ 130

@david-iles-4487

Last seen 9.2 years ago

Hi Folks, As a relative newcomer to BioC, having spent most of the last 28 years at the bench, I am finally getting my head round R programming. I've been using limma and affy to analyse a fairly chunky (and expensive!) Affymetrix hgu133plus2 data set and have been successful in generating topTable results that actually make sense. Always good when an experiment works (or seems to!) One thing that has completely stumped me so far though, despite extensive vignette and email string searches, and attempts to adapt code written for Agilent single channel data, is how to 'automatically' include gene names, symbols, GO IDs etc in the topTable output. While it may be easy enough to use mget to retrieve the necessary info for small numbers of probesets, it gets tedious when one needs either to cut-and-paste long lists of affy IDs into DAVID, or convert them to long lines, with each probeset ID flanked by " ", which is what I have done so far. Since there is such a rich repository of data in hgu133plus2.db, there must be a way to tap into this without going 'outside' limma. Can anyone suggest how to do this? I'd be most grateful. Code (comments on errors/shortcuts etc appreciated) and session info below. Thanks. Dr David Iles Institute for Integrative and Comparative Biology University of Leeds Leeds LS2 9JT d.e.iles at leeds.ac.uk The experiment is designed to detect differences in muscle gene expression between patients with a myopathy (S) and controls (N), and also how gender affects these differences. > library(affy) Loading required package: Biobase Welcome to Bioconductor Vignettes contain introductory material. To view, type 'openVignette()'. To cite Bioconductor, see 'citation("Biobase")' and for packages 'citation(pkgname)'. > library(limma) > library(hgu133plus2.db) Loading required package: AnnotationDbi Loading required package: org.Hs.eg.db Loading required package: DBI > library(hgu133plus2cdf) > mtargets<-readTargets("/Users/daveiles/Documents/R/muscle_data/muscl edat/mustargets.txt") > mtargets filename phen gen 1 DF1U133plus25222M.CEL S M 2 DF1U133plus25526M.CEL S F 3 DF2U133plus22264M.CEL S M 4 DF2U133plus22341M.CEL N M 5 DF2U133plus22469M.CEL S M 6 DF2U133plus22539M.CEL S M 7 DF2U133plus22632M.CEL N F 8 DF2U133plus23490M.CEL N F 9 DF2U133plus23690M.CEL S M 10 DF2U133plus24018M.CEL S M # plus 49 others, deleted for brevity > mdat<-ReadAffy(widget=T) Loading required package: tkWidgets Loading required package: widgetTools Loading required package: tcltk Loading Tcl/Tk interface ... done Loading required package: DynDoc Loading required package: tools > meset<-rma(mdat) Background correcting Normalizing Calculating Expression > mphengen<-paste(mtargets$phen,mtargets$gen,sep=".") > mphengen [1] "S.M" "S.F" "S.M" "N.M" "S.M" "S.M" "N.F" "N.F" "S.M" "S.M" "N.F" "S.M" # etc - deleted for brevity > mphengen<-factor(mphengen,levels=c("S.M","S.F","N.M","N.F")) > design<-model.matrix(~0+mphengen) > colnames(design)<-levels(mphengen) > fit<-lmFit(meset,design) > fit<-eBayes(fit) # 1. influence of genotype within each phenotype > cont.matrix <- makeContrasts(S.MvsF=S.M-S.F, N.MvsF=N.M-N.F, Diff=(N.M-N.F)-(S.M-S.F), levels=design) > fit2<-contrasts.fit(fit,cont.matrix) > fit2<-eBayes(fit2) > msMvrfes<-topTable(fit2,coef="S.MvsF",n=100,adjust="BH") > mnMvrfes<-topTable(fit2,coef="N.MvsF",n=100,adjust="BH") > write.table(msMvrfes,file="mS_MvFres.txt") > write.table(mnMvrfes,file="mN_MvFres.txt") # 2. influence of phenotype within each genotype > cont.matrix1 <- makeContrasts(M.SvsN=S.M-N.M, F.SvsN=S.F-N.F, Diff=(S.M-S.F)-(N.M-N.F), levels=design) > fit3<-contrasts.fit(fit,cont.matrix1) > fit3<-eBayes(fit3) > mM_SvsNres<-topTable(fit3,coef="M.SvsN",n=100,adjust="BH") > mF_SvsNres<-topTable(fit3,coef="F.SvsN",n=100,adjust="BH") > write.table(mM_SvsNres,file="mM_SvsNres.txt") > write.table(mF_SvsNres,file="mF_SvsNres.txt") > sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] limma_3.6.9 affy_1.28.0 hgu133plus2cdf_2.7.0 [4] hgu133plus2.db_2.4.5 org.Hs.eg.db_2.4.6 RSQLite_0.9-4 [7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0 loaded via a namespace (and not attached): [1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.0 >

GO hgu133plus2 affy limma convert GO hgu133plus2 affy limma convert • 806 views

ADD COMMENT • link updated 13.2 years ago by James W. MacDonald 65k • written 13.2 years ago by David Iles ▴ 130

0

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 4 hours ago

United States

Hi David, On 2/15/2011 4:18 PM, David Iles wrote: > Hi Folks, > > As a relative newcomer to BioC, having spent most of the last 28 years at the bench, I am finally getting my head round R programming. I've been using limma and affy to analyse a fairly chunky (and expensive!) Affymetrix hgu133plus2 data set and have been successful in generating topTable results that actually make sense. Always good when an experiment works (or seems to!) > > One thing that has completely stumped me so far though, despite extensive vignette and email string searches, and attempts to adapt code written for Agilent single channel data, is how to 'automatically' include gene names, symbols, GO IDs etc in the topTable output. While it may be easy enough to use mget to retrieve the necessary info for small numbers of probesets, it gets tedious when one needs either to cut-and-paste long lists of affy IDs into DAVID, or convert them to long lines, with each probeset ID flanked by " ", which is what I have done so far. > > Since there is such a rich repository of data in hgu133plus2.db, there must be a way to tap into this without going 'outside' limma. Can anyone suggest how to do this? I'd be most grateful. > Well, there isn't code in limma to do this, so by default you will have to go outside of limma to do this. That said, the limma2annaffy() function (or probes2table()) in affycoretools may be to your liking. You can output both html and text files, so cutting and pasting into DAVID should be fairly simple. Best, Jim > Code (comments on errors/shortcuts etc appreciated) and session info below. > > Thanks. > > Dr David Iles > Institute for Integrative and Comparative Biology > University of Leeds > Leeds LS2 9JT > > d.e.iles at leeds.ac.uk > > The experiment is designed to detect differences in muscle gene expression between patients with a myopathy (S) and controls (N), and also how gender affects these differences. > >> library(affy) > Loading required package: Biobase > > Welcome to Bioconductor > > Vignettes contain introductory material. To view, type > 'openVignette()'. To cite Bioconductor, see > 'citation("Biobase")' and for packages 'citation(pkgname)'. > >> library(limma) >> library(hgu133plus2.db) > Loading required package: AnnotationDbi > Loading required package: org.Hs.eg.db > Loading required package: DBI >> library(hgu133plus2cdf) >> mtargets<-readTargets("/Users/daveiles/Documents/R/muscle_data/musc ledat/mustargets.txt") >> mtargets > filename phen gen > 1 DF1U133plus25222M.CEL S M > 2 DF1U133plus25526M.CEL S F > 3 DF2U133plus22264M.CEL S M > 4 DF2U133plus22341M.CEL N M > 5 DF2U133plus22469M.CEL S M > 6 DF2U133plus22539M.CEL S M > 7 DF2U133plus22632M.CEL N F > 8 DF2U133plus23490M.CEL N F > 9 DF2U133plus23690M.CEL S M > 10 DF2U133plus24018M.CEL S M > > # plus 49 others, deleted for brevity > >> mdat<-ReadAffy(widget=T) > Loading required package: tkWidgets > Loading required package: widgetTools > Loading required package: tcltk > Loading Tcl/Tk interface ... done > Loading required package: DynDoc > Loading required package: tools >> meset<-rma(mdat) > Background correcting > Normalizing > Calculating Expression >> mphengen<-paste(mtargets$phen,mtargets$gen,sep=".") >> mphengen > [1] "S.M" "S.F" "S.M" "N.M" "S.M" "S.M" "N.F" "N.F" "S.M" "S.M" "N.F" "S.M" > > # etc - deleted for brevity > >> mphengen<-factor(mphengen,levels=c("S.M","S.F","N.M","N.F")) >> design<-model.matrix(~0+mphengen) >> colnames(design)<-levels(mphengen) >> fit<-lmFit(meset,design) >> fit<-eBayes(fit) > > # 1. influence of genotype within each phenotype >> cont.matrix<- makeContrasts(S.MvsF=S.M-S.F, N.MvsF=N.M-N.F, Diff=(N.M-N.F)-(S.M-S.F), levels=design) >> fit2<-contrasts.fit(fit,cont.matrix) >> fit2<-eBayes(fit2) >> msMvrfes<-topTable(fit2,coef="S.MvsF",n=100,adjust="BH") >> mnMvrfes<-topTable(fit2,coef="N.MvsF",n=100,adjust="BH") >> write.table(msMvrfes,file="mS_MvFres.txt") >> write.table(mnMvrfes,file="mN_MvFres.txt") > > # 2. influence of phenotype within each genotype >> cont.matrix1<- makeContrasts(M.SvsN=S.M-N.M, F.SvsN=S.F-N.F, Diff=(S.M-S.F)-(N.M-N.F), levels=design) >> fit3<-contrasts.fit(fit,cont.matrix1) >> fit3<-eBayes(fit3) >> mM_SvsNres<-topTable(fit3,coef="M.SvsN",n=100,adjust="BH") >> mF_SvsNres<-topTable(fit3,coef="F.SvsN",n=100,adjust="BH") >> write.table(mM_SvsNres,file="mM_SvsNres.txt") >> write.table(mF_SvsNres,file="mF_SvsNres.txt") > >> sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] limma_3.6.9 affy_1.28.0 hgu133plus2cdf_2.7.0 > [4] hgu133plus2.db_2.4.5 org.Hs.eg.db_2.4.6 RSQLite_0.9-4 > [7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0 > > loaded via a namespace (and not attached): > [1] affyio_1.18.0 preprocessCore_1.12.0 tools_2.12.0 >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues

ADD COMMENT • link 13.2 years ago James W. MacDonald 65k

Login before adding your answer.