GSEABase and Broad Inst Sets

Iain Gallagher

United Kingdom

Hi List,

I'm trying to carry out a GSEA analysis on an ExpressionSet object using GSEABase and the Broad Institute gene sets (well, the C2 subset, specifically).

    library(GSEABase)
    # file downloaded from the Broad site
    broadSets <- getBroadSets("/home/iain/Desktop/prostateProjectJN_GS/CEL/msigdb_v2.5.xml")
    # keep only the C2 collection
    isC2 <- sapply(broadSets, function(x) bcCategory(collectionType(x))) == "c2"
    broadSetsC2 <- broadSets[isC2]
    # pick the Hypo.No.None and Norm.No.None arrays by their labels in TS
    relevantArrays <- grep('Hypo.No.None|Norm.No.None', TS)
    relevantArrays <- rmaDataFiltered[, relevantArrays]

So this gets me to the point where I have my expression data and the gene sets I want. This is where I'm having trouble. Following the GSEABase tutorials with KEGG annotation I have no problems, but I can't calculate an incidence matrix from my expression data using the Broad gene sets I have downloaded, i.e.

    testGSC <- GeneSetCollection(relevantArrays, setType=BroadCollection())
    Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
      object 'hgu133plus2BROAD' not found
    Error in revmap(getAnnMap(toupper(collectionType(setType)), annotation(idType))) :
      error in evaluating the argument 'x' in selecting a method for function 'revmap'

I know this is a mapping issue, but I'm having a conceptual block getting past it. If anyone could offer any help I'd be grateful.
iain

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    x86_64-pc-linux-gnu

    locale:
     [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
     [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
     [7] LC_PAPER=en_GB.utf8       LC_NAME=C
     [9] LC_ADDRESS=C              LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
     [1] affyQCReport_1.24.0  affyPLM_1.22.0       preprocessCore_1.8.0
     [4] xtable_1.5-6         simpleaffy_2.22.0    gcrma_2.18.1
     [7] latticeExtra_0.6-11  lattice_0.18-3       RColorBrewer_1.0-2
    [10] hgu133plus2.db_2.3.5 hgu133plus2cdf_2.5.0 affy_1.24.2
    [13] limma_3.2.3          GSEABase_1.8.0       graph_1.26.0
    [16] annotate_1.24.1      hgu95av2.db_2.3.5    org.Hs.eg.db_2.3.6
    [19] RSQLite_0.9-0        DBI_0.2-5            AnnotationDbi_1.8.2
    [22] genefilter_1.28.2    ALL_1.4.7            Biobase_2.6.1

    loaded via a namespace (and not attached):
    [1] affyio_1.14.0      Biostrings_2.14.12 grid_2.10.1        IRanges_1.4.16
    [5] splines_2.10.1     survival_2.35-8    tools_2.10.1       XML_3.1-0
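The error trace appears to show GeneSetCollection asking getAnnMap for a map named from the chip annotation plus the upper-cased collection type, i.e. 'hgu133plus2BROAD', and the hgu133plus2 annotation package has no such object. The incidence matrix itself needs something simpler: the Broad (MSigDB) sets are lists of gene symbols, while the ExpressionSet rows are probe IDs, so each set has to be translated to probes before a sets-by-probes 0/1 matrix can be built. The base R sketch below illustrates only that idea. The probe IDs, the `probe2sym` vector (a stand-in for a chip SYMBOL map) and the helper `incidence_by_probe` are all hypothetical; this is not how GSEABase's `incidence()` is implemented.

```r
# Made-up probe IDs mapped to gene symbols; a stand-in for a chip SYMBOL map
probe2sym <- c(probe_1 = "DDR1", probe_2 = "RFC2",
               probe_3 = "HSPA6", probe_4 = "PAX8")

# Toy gene sets of symbols, as Broad/MSigDB sets provide
# (TP53 has no probe on this toy "array")
sets <- list(setA = c("DDR1", "PAX8", "TP53"),
             setB = c("RFC2"))

# Hypothetical helper: rows are sets, columns are probes,
# 1 means the probe's gene belongs to the set
incidence_by_probe <- function(sets, probe2sym) {
  m <- vapply(sets,
              function(s) as.integer(probe2sym %in% s),
              integer(length(probe2sym)))
  m <- t(m)                        # vapply gives probes x sets; flip to sets x probes
  colnames(m) <- names(probe2sym)  # label columns with the probe IDs
  m
}

inc <- incidence_by_probe(sets, probe2sym)
print(inc)
```

Genes that have no probe on the array, such as TP53 here, simply contribute nothing to the matrix, which is why the symbol-to-probe translation has to come first.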

Annotation hgu133plus2 hgu95av2 GSEABase

Written 13.8 years ago by Iain Gallagher

Martin Morgan

United States

On 07/06/2010 04:32 AM, Iain Gallagher wrote:
> testGSC <- GeneSetCollection(relevantArrays, setType=BroadCollection())
> Error in get(mapName, envir = pkgEnv, inherits = FALSE) :
>   object 'hgu133plus2BROAD' not found
> [...]
> This is a mapping issue I know but I'm having a conceptual block getting over it.

For a reproducible example, after

    library(GSEABase)
    example(getBroadSets)
    data(sample.ExpressionSet)
    eset = sample.ExpressionSet # less typing!
If you're interested in creating a GeneSetCollection that contains just those symbols that are relevant to your ExpressionSet 'eset', then

    # map the set identifiers to the probe IDs of eset's chip
    gss1 = mapIdentifiers(gss, AnnotationIdentifier(annotation(eset)))

Subsetting eset might look like

    # keep only probes that appear in at least one gene set
    idx = featureNames(eset) %in% unlist(geneIds(gss1), use.names=FALSE)
    eset[idx,]

In answering this question, I realized that getBroadSets does not correctly interpret the identifiers as 'Symbols'; until this is fixed in GSEABase, you should

    library(limma)
    # convert aliases in each set to official symbols
    sids <- lapply(geneIds(gss), alias2Symbol, "Hs", TRUE)
    gss = GeneSetCollection(mapply("geneIds<-", gss, sids))

Martin

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
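Martin's three steps can be mimicked on plain R objects to see what each one does, in the order one would probably apply them: replace aliases with official symbols, translate each set from symbols to the array's probe IDs (the job `mapIdentifiers` does with an `AnnotationIdentifier`), then keep only the rows that fall in at least one set. This is a base R sketch, not GSEABase code. The expression matrix, probe IDs, `alias_table` and the `to_symbol` helper are hypothetical, treating RFC40 as an alias of RFC2 is for illustration only, and `to_symbol` assumes each alias maps to exactly one symbol and leaves IDs missing from the table unchanged.

```r
# Toy expression matrix: rows are made-up probe IDs, columns are samples
expr <- matrix(c(5.1, 7.2, 3.3, 8.0,
                 6.4, 2.9, 4.5, 7.7),
               nrow = 4,
               dimnames = list(c("probe_1", "probe_2", "probe_3", "probe_4"),
                               c("hypoxia_1", "normoxia_1")))

# Made-up probe -> symbol map (stand-in for the chip's SYMBOL map)
probe2sym <- c(probe_1 = "DDR1", probe_2 = "RFC2",
               probe_3 = "HSPA6", probe_4 = "PAX8")

# Toy gene sets; "RFC40" is used here as an alias of RFC2
sets <- list(setA = c("DDR1", "PAX8", "TP53"),
             setB = c("RFC40"))

# Hypothetical alias -> official symbol table
alias_table <- c(RFC40 = "RFC2")

# Hypothetical helper: swap known aliases for symbols, leave other IDs alone
to_symbol <- function(ids, table) {
  hit <- ids %in% names(table)
  ids[hit] <- table[ids[hit]]
  unname(ids)
}

# Step 1: fix identifiers set by set (cf. lapply(geneIds(gss), alias2Symbol, ...))
sets <- lapply(sets, to_symbol, table = alias_table)

# Step 2: translate each set from symbols to this array's probe IDs
sets_by_probe <- lapply(sets, function(s) names(probe2sym)[probe2sym %in% s])

# Step 3: keep rows whose probe is in at least one set (cf. eset[idx, ])
idx <- rownames(expr) %in% unlist(sets_by_probe, use.names = FALSE)
expr_sub <- expr[idx, , drop = FALSE]
print(expr_sub)
```

TP53 survives step 1 but drops out in step 2 because no probe on the toy array maps to it, and probe_3 (HSPA6) belongs to no set, so step 3 removes it. Without step 1, setB would map to no probes at all, which is the kind of silent loss the alias fix guards against.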

Written 13.8 years ago by Martin Morgan
