Question

Romer and symbols2indices query

Gordon Smyth

WEHI, Melbourne, Australia

Dear Loren,

I don't understand why you would want to read in a gmt file from the Broad Institute rather than use the curated rdata files that we provide for use with romer. The raw gmt files contain a mix of gene symbols of different species and a mix of official and non-official symbols, so one can't expect to match the symbols you get from a raw gmt file to your own data with any reliability. To construct the rdata files, we have carefully converted all gene aliases to official symbols and have mapped mouse to human and human to mouse orthologs.

This is the reason why we don't provide a read.gmt() function in limma, or a pre-made pipeline from the GSEABase read functions. We don't want you to get unreliable results simply because the gene symbols haven't been curated.

Best wishes
Gordon

> Date: Tue, 04 May 2010 07:43:46 -0700
> From: Loren Engrav <engrav at u.washington.edu>
> To: rbioc <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] Romer and symbols2indices query
>
> Thank you, got it.
>
> Downloading rdata objects saves reading them into an rdata object, cool.
>
> But for interest, in R/GSA there is
> GSA.read.gmt(filename.gmt) to read in a .gmt file.
>
> Does limma or romer have an equivalent function?
>
>> From: Matthew Ritchie <mritchie at wehi.edu.au>
>> Date: Tue, 4 May 2010 14:44:23 +1000 (EST)
>> To: Loren Engrav <engrav at u.washington.edu>
>> Cc: rbioc <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] Romer and symbols2indices query
>>
>> Dear Loren,
>>
>> You can find rdata objects of the Broad's MSigDB gene sets at
>>
>> http://bioinf.wehi.edu.au/software/MSigDB/index.html
>>
>> You are right, the 'symbols' argument in the function symbols2indices()
>> is the vector of gene symbols corresponding to the probes from your
>> microarray data.
>>
>> For example, to use the human C2 collection, download the rdata file, then
>> run the following:
>>
>>   load("human_c2.rdata")
>>   c2 = symbols2indices(Hs.gmtl.c2, symbols)
>>
>> (this assumes 'symbols' is a vector containing the gene symbols from your
>> array data)
>>
>> Best wishes,
>>
>> Matt
>>
>>> Have done GSEA and GSA for set enrichment and am setting out to try romer,
>>> and have a probably "simple" question.
>>>
>>> To get the Broad set into a list of indices there is
>>> symbols2indices(gmtl.official, symbols), but
>>>
>>> 1) How do I get the Broad set into gmtl.official? And
>>> 2) Is symbols a vector of MY probe sets of interest?
>>>
>>> I checked gmane and found only one comment about romer.
>>> Also checked the limma reference pdf.
>>>
>>> Thank you
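Loren's question about an equivalent of GSA.read.gmt(), and Matt's description of the 'symbols' argument, can be made concrete with a small base-R sketch. It assumes the standard GMT layout (one gene set per line, tab-separated: set name, description, then member symbols). The helpers `parse_gmt_lines()` and `sets_to_indices()` and the toy data are hypothetical, for illustration only; they are not limma or GSEABase functions. Note that the matching is exact string matching, which is exactly why uncurated or non-official symbols silently drop out, as Gordon warns.

```r
# Hypothetical helper: parse GMT-format lines into a named list of symbol vectors.
# GMT layout: set name <TAB> description <TAB> gene1 <TAB> gene2 ...
parse_gmt_lines <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  sets <- lapply(fields, function(f) unique(f[-(1:2)]))  # drop name and description
  names(sets) <- vapply(fields, `[`, character(1), 1)    # first field is the set name
  sets
}

# Hypothetical helper: for each gene set, return the row indices of the
# expression data whose symbol is a member of the set (exact match only).
sets_to_indices <- function(sets, symbols) {
  lapply(sets, function(s) which(symbols %in% s))
}

# Toy example: two gene sets and the symbols for five array probes.
gmt <- c("SET_A\tna\tTP53\tMDM2\tCDKN1A",
         "SET_B\tna\tMYC\tp21")
sets <- parse_gmt_lines(gmt)
probe_symbols <- c("TP53", "MYC", "CDKN1A", "GAPDH", "MDM2")
idx <- sets_to_indices(sets, probe_symbols)
idx$SET_A  # 1 3 5
idx$SET_B  # 2  ("p21" is an alias of CDKN1A, so it finds no match)
```

The alias "p21" in SET_B illustrates the reliability problem: the probe for CDKN1A is present, but the set lists it under a non-official name.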

probe limma GSEABase


Answer 1 · 2010-05-05

Thank you for the response.

Well, maybe I don't, and maybe I shouldn't. My thought was that tomorrow, or the day after, or ??? there will be a new version of the .gmt file, and it would be useful to be able to rerun things quickly. But maybe that is faulty logic; maybe the .gmt files do not change that often. And I thought the c2 set was curated. And then there is simple curiosity. I am also running GSEA, GSA, and GSEA in MeV, so it seemed best to keep the files similar.

However, I have now run romer and romer2 at weeks 1, 2, 3, 12 and 20 as below, but have not yet perused the results. I have only ~2000 genes of interest, so romer does not take very long and I can easily run it again with the files you mention. But I have searched for those files in help(romer) and the 3 pdfs and have missed them. Where may I find them? Or are they the files Matthew mentioned, at http://bioinf.wehi.edu.au/software/MSigDB/index.html?

I also have the other two questions in the "Romer warning serious? and nrot=9999?" thread.
Thank you again for the response

=====================

  library(limma)
  library(GSEABase)

  # 10 groups (DW/YW at weeks 1, 2, 3, 12, 20), three arrays per group
  romerDesign <- model.matrix(~ 0 + factor(rep(1:10, each = 3)))
  colnames(romerDesign) <- c("DW1", "YW1", "DW2", "YW2", "DW3", "YW3",
                             "DW12", "YW12", "DW20", "YW20")
  # romer tests one contrast at a time: week 1 shown, repeated for weeks 2, 3, 12 and 20
  romerWkxcontrast.matrix <- makeContrasts(YW1 - DW1, levels = romerDesign)

  romerLR <- read.delim(file = "romer1842LR.txt", header = FALSE, sep = "\t")
  romerLRmatrix <- as.matrix(romerLR)
  romerSymbols <- GSA1842symbolsvector  # gene symbols for the rows of romerLRmatrix

  Broad_c2.all.v2.5.symbols.gmt <- getGmt("c2.all.v2.5.symbols.gmt",
      collectionType = BroadCollection(category = "c2"),
      geneIdType = SymbolIdentifier())
  # convert the GeneSetCollection to a named list of symbol vectors
  Broad_c2.all.v2.5.symbols.gmtList <- geneIds(Broad_c2.all.v2.5.symbols.gmt)
  names(Broad_c2.all.v2.5.symbols.gmtList) <- names(Broad_c2.all.v2.5.symbols.gmt)
  Broad_c2.all.v2.5.symbols.gmtIndices <- symbols2indices(Broad_c2.all.v2.5.symbols.gmtList, romerSymbols)

  # 'correlation' is only needed when 'block' is not NULL, so it is omitted here
  romerResultWkx <- romer(Broad_c2.all.v2.5.symbols.gmtIndices, romerLRmatrix, romerDesign,
      contrast = romerWkxcontrast.matrix, array.weights = NULL, block = NULL,
      floor = FALSE, nrot = 1000)
  romer2ResultWkx <- romer2(Broad_c2.all.v2.5.symbols.gmtIndices, romerLRmatrix, romerDesign,
      contrast = romerWkxcontrast.matrix, array.weights = NULL, block = NULL,
      nrot = 1000)

> From: Gordon K Smyth <smyth at wehi.edu.au>
> Date: Thu, 6 May 2010 09:26:50 +1000 (AUS Eastern Standard Time)
> To: Loren Engrav <engrav at u.washington.edu>
> Cc: Yifang Hu <hu at wehi.edu.au>, rbioc <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] Romer and symbols2indices query
>
> Dear Loren,
>
> I don't understand why you would want to read in a gmt file from the Broad
> Institute rather than use the curated rdata files that we provide for use
> with romer. [...]
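Because romer() is run with one contrast at a time, it can help to check what the design matrix and each YW-minus-DW contrast look like before looping over weeks. The base-R sketch below assumes the same layout as the code above (10 groups, 3 arrays each, columns in the order DW1, YW1, ..., DW20, YW20). It builds the contrast vectors by hand rather than with makeContrasts(); the helper `yw_vs_dw()` and the matrix name `cmat` are hypothetical, for illustration only.

```r
# Ten groups (DW/YW at weeks 1, 2, 3, 12, 20), three arrays each: 30 arrays.
group <- factor(rep(1:10, each = 3))
design <- model.matrix(~ 0 + group)
colnames(design) <- c("DW1", "YW1", "DW2", "YW2", "DW3", "YW3",
                      "DW12", "YW12", "DW20", "YW20")

# Hypothetical helper: contrast vector for YW - DW at a given week
# (+1 on the YW column, -1 on the DW column, 0 elsewhere).
yw_vs_dw <- function(week, design) {
  cont <- setNames(numeric(ncol(design)), colnames(design))
  cont[paste0("YW", week)] <- 1
  cont[paste0("DW", week)] <- -1
  cont
}

# One column per week; each column would be passed to romer() in turn.
weeks <- c(1, 2, 3, 12, 20)
cmat <- sapply(weeks, yw_vs_dw, design = design)
colnames(cmat) <- paste0("YW", weeks, "-DW", weeks)
dim(design)  # 30 10
cmat[, "YW12-DW12"]
```

Building all five columns at once makes it easy to loop `romer(..., contrast = cmat[, k], ...)` over weeks instead of editing the contrast by hand each time.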

Answer 2 · 2010-05-07

Dear Loren,

Don't forget to run

  MYsymbolsOfficial <- alias2SymbolTable(MYsymbols, species = "Hs")

(choose the appropriate species) before running

  X <- symbols2indices(ListOfGeneSets, MYsymbolsOfficial)

otherwise you may miss many matches. Matching by gene symbol is unreliable unless you convert everything to current official symbols.

Best wishes
Gordon

> Date: Wed, 05 May 2010 08:15:04 -0700
> From: Loren Engrav <engrav at u.washington.edu>
> To: rbioc <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] Romer and symbols2indices query
>
> Bingo, thank you,
> and romer ran.
>
> These missing little tidbits can be brutal.
>
>> From: Martin Morgan <mtmorgan at fhcrc.org>
>> Date: Wed, 05 May 2010 05:11:02 -0700
>> To: Loren Engrav <engrav at u.washington.edu>
>> Cc: rbioc <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] Romer and symbols2indices query
>>
>> On 05/04/2010 07:57 PM, Loren Engrav wrote:
>>> Am back.
>>>
>>> So I have romer and GSEABase running via previous help, thank you, but while
>>> running I am exploring GSEABase.
>>>
>>> And I have a lesser question, for interest.
>>>
>>> In GSEABase I do
>>>   gmtObject <- getGmt("c2.all.v2.5.symbols.gmt",
>>>       collectionType=BroadCollection(category="c2"), geneIdType=SymbolIdentifier())
>>
>> or maybe getBroadSets ?
>>
>>> which finishes without error.
>>> Then
>>> class(gmtObject) is GeneSetCollection
>>>
>>> How do I convert gmtObject to a list of gene sets, as required in romer, when
>>> using
>>>   X <- symbols2indices(ListOfGeneSets, MYsymbols)
>>
>>   gmtl <- geneIds(gmtObject)
>>   names(gmtl) <- names(gmtObject)
>>
>> ?
>>
>> Martin
>>
>>> From: Vincent Carey <stvjc at channing.harvard.edu>
>>> Date: Tue, 4 May 2010 11:40:39 -0400
>>> To: Loren Engrav <engrav at u.washington.edu>
>>> Cc: rbioc <bioconductor at stat.math.ethz.ch>
>>> Subject: Re: [BioC] Romer and symbols2indices query
>>>
>>> Very briefly, the GSEABase package has relevant utilities for gmt file
>>> import/export and may be worth considering for these tasks.
>>>
>>> On Tue, May 4, 2010 at 10:43 AM, Loren Engrav <engrav at u.washington.edu>
>>> wrote:
>>>> Thank you, got it
>>>> [...]
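Gordon's advice to convert aliases to official symbols before calling symbols2indices() can be illustrated with a toy lookup. In limma, alias2SymbolTable() draws on the organism annotation packages; this base-R sketch instead uses a tiny hand-made alias table that is hypothetical and for illustration only, as is the helper `to_official()`. It simply shows how many exact matches are gained once aliases are resolved.

```r
# Hypothetical alias table: alias -> official symbol (illustration only,
# not a real annotation resource).
alias_table <- c(p21 = "CDKN1A", p53 = "TP53", "c-Myc" = "MYC")

# Hypothetical helper: replace any known alias by its official symbol and
# leave every other symbol unchanged.
to_official <- function(symbols, table) {
  hit <- symbols %in% names(table)
  symbols[hit] <- table[symbols[hit]]
  unname(symbols)
}

gene_set <- c("TP53", "CDKN1A", "MYC")       # set defined with official symbols
my_symbols <- c("p53", "GAPDH", "p21", "c-Myc")  # array annotation using aliases

# Exact matching before and after conversion.
sum(my_symbols %in% gene_set)                            # 0
sum(to_official(my_symbols, alias_table) %in% gene_set)  # 3
```

Without conversion none of the three genes is found, even though all of them are on the array; this is the "miss many matches" failure Gordon describes.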