getGmt error
Guest User ★ 13k
@guest-user-4897
When I read a GMT file into R, like this:

> C2allBroadSets <- getGmt("c2.all.v4.0.orig.gmt")
Error in GeneSetCollection(lapply(lines, function(line) { :
  error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in validObject(.Object) :
  invalid class "GeneSet" object: gene symbols must be unique

How can I fix this?

-- output of sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] splines grid parallel stats graphics grDevices utils
[8] datasets methods base

other attached packages:
 [1] GSVA_1.8.0 GSVAdata_0.99.10
 [3] hgu95a.db_2.9.0 hgu133plus2hsentrezgprobe_17.1.0
 [5] hgu133plus2hsentrezgcdf_17.1.0 hgu133plus2hsentrezg.db_17.1.0
 [7] hgu95av2.db_2.9.0 a4Classif_1.8.0
 [9] varSelRF_0.7-3 randomForest_4.6-7
[11] pamr_1.54.1 survival_2.37-4
[13] ROCR_1.0-5 gplots_2.11.3
[15] KernSmooth_2.23-10 caTools_1.14
[17] gdata_2.13.2 gtools_3.0.0
[19] MLInterfaces_1.40.0 sfsmisc_1.0-24
[21] cluster_1.14.4 rda_1.0.2-2
[23] rpart_4.1-3 MASS_7.3-29
[25] a4Preproc_1.8.0 a4Core_1.8.0
[27] glmnet_1.9-5 Matrix_1.0-12
[29] lattice_0.20-23 GSEABase_1.22.0
[31] affy_1.38.1 GOstats_2.26.0
[33] graph_1.38.3 Category_2.26.0
[35] VennDiagram_1.6.5 pheatmap_0.7.6
[37] statmod_1.4.17 limma_3.16.7
[39] biomaRt_2.16.0 annotate_1.38.0
[41] genefilter_1.42.0 primeviewhsentrezgprobe_17.1.0
[43] primeviewhsentrezg.db_17.1.0 org.Hs.eg.db_2.9.0
[45] RSQLite_0.11.4 DBI_0.2-7
[47] primeviewhsentrezgcdf_17.1.0 AnnotationDbi_1.22.6
[49] Biobase_2.20.1 BiocGenerics_0.6.0
[51] rj_1.1.3-1

loaded via a namespace (and not attached):
 [1] affyio_1.28.0 AnnotationForge_1.2.2 BiocInstaller_1.10.3
 [4] bitops_1.0-6 GO.db_2.9.0 IRanges_1.18.3
 [7] mboost_2.2-2 preprocessCore_1.22.0 RBGL_1.36.2
[10] RCurl_1.95-4.1 rj.gd_1.1.3-1 stats4_3.0.1
[13] tools_3.0.1 XML_3.98-1.1 xtable_1.7-1
[16] zlibbioc_1.6.0

-- Sent via the guest posting facility at bioconductor.org.
@martin-morgan-1513
United States
On 09/05/2013 10:52 PM, Maintainer wrote:
> When I write GMT file into R,like
>> C2allBroadSets <- getGmt("c2.all.v4.0.orig.gmt")
> Error in GeneSetCollection(lapply(lines, function(line) { :
>   error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in validObject(.Object) :
>   invalid class "GeneSet" object: gene symbols must be unique

The problem is that c2.all.v4.0.orig.gmt (from http://www.broadinstitute.org/gsea/msigdb/collections.jsp) is poorly formed. I did (the output is edited)

> options(error=recover)
> xx = getGmt("c2.all.v4.0.orig.gmt")

Enter a frame number, or 0 to exit

 1: getGmt("c2.all.v4.0.orig.gmt")
 2: GeneSetCollection(lapply(lines, function(line) { GeneSet(unlist(line[-(1
 3: lapply(lines, function(line) { GeneSet(unlist(line[-(1:2)]), geneIdType
 4: FUN(X[[4694]], ...)
 5: GeneSet(unlist(line[-(1:2)]), geneIdType = geneIdType, collectionType = col
 6: GeneSet(unlist(line[-(1:2)]), geneIdType = geneIdType, collectionType = col
 7: do.call(new, c("GeneSet", list(geneIds = type), list(... = ..., setIdentifi
 8: (function (Class, ...) { ClassDef <- getClass(Class, where = topenv(pare
 9: initialize(value, ...)
10: initialize(value, ...)
11: .local(.Object, ...)
12: callNextMethod(.Object, .Template, ..., setIdentifier = mkScalar(setIdentif
13: eval(call, callEnv)
14: eval(expr, envir, enclos)
15: .nextMethod(.Object, .Template, ..., setIdentifier = mkScalar(setIdentifier
16: validObject(.Object)

Selection:

Frame 4, FUN(X[[4694]], ...), gives a hint that the problem is in line ~4694 of the file. I then responded with

Selection: 16
Called from: top level
Browse[1]> getValidity(getClass("GeneSet"))
function (object)
{
    if (any(duplicated(geneIds(object))))
        "gene symbols must be unique"
    else TRUE
}
<environment: namespace:GSEABase>
Browse[1]> geneIds(object)[which(duplicated(geneIds(object)))]
[1] "NM_009369"

and then verified that in the original file this is indeed the only line with a duplicated identifier

> txt = readLines("c2.all.v4.0.orig.gmt")
> fld = strsplit(txt, "\t")
> dups = sapply(fld, function(x) any(table(x) != 1))
> which(dups)
[1] 4694

The short-term solution is to edit c2.all.v4.0.orig.gmt to remove the duplicate entry

txt[4694] = sub("NM_009369\t", "", txt[4694])
writeLines(txt, "c2.all.v4.0.orig_MODIFIED_.gmt")

The longer-term solution is to report the problem to the MSigDB maintainers.

Martin

--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
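To see which identifiers are duplicated in every gene set at once, rather than only the one that triggered the error, the check in this reply could be extended roughly as follows. This is only a sketch: `findGmtDuplicates` is a hypothetical helper, not part of GSEABase, the toy lines stand in for the real file, and it assumes the usual GMT layout of set name, description, then identifiers, all tab-separated.

```r
## Hypothetical helper (not part of GSEABase): for each GMT line, report
## the gene identifiers that occur more than once in that gene set.
findGmtDuplicates <- function(lines) {
    fields <- strsplit(lines, "\t", fixed = TRUE)
    dups <- lapply(fields, function(x) {
        ids <- x[-(1:2)]                 # drop set name and description
        unique(ids[duplicated(ids)])     # identifiers seen more than once
    })
    names(dups) <- vapply(fields, `[`, character(1), 1L)
    dups[lengths(dups) > 0L]             # keep only sets with duplicates
}

## Toy input; in practice use readLines("c2.all.v4.0.orig.gmt")
toy <- c("SET_A\tdesc\tg1\tg2\tg3",
         "SET_B\tdesc\tg4\tg5\tg4")
found <- findGmtDuplicates(toy)
print(found)
```

Unlike the `table(x)` check, this ignores the first two fields, so a description that happens to equal a gene identifier is not flagged.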
On 09/06/2013 06:17 AM, Martin Morgan wrote:
> txt[4694] = sub("NM_009369\t", "", txt[4694])
> writeLines(txt, "c2.all.v4.0.orig_MODIFIED_.gmt")

Maybe a better way is to avoid writing the intermediate file at all: read the original file, fix it in memory, and pass it to getGmt through a text connection

txt = readLines("c2.all.v4.0.orig.gmt")
## remove duplicate entry in gene set
txt[4694] = sub("NM_009369\t", "", txt[4694])
getGmt(textConnection(txt))

and in this way couple the transformation more tightly with your analysis.

Martin
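If hard-coding line 4694 and NM_009369 feels fragile, the same in-memory idea could be generalized along these lines. This is a sketch under assumptions, not the method from the thread: `dedupGmtLines` is a hypothetical helper, the toy lines stand in for the real file, and silently dropping repeated identifiers is only reasonable if the repeats are plain redundancy, as they were here.

```r
## Hypothetical helper (not part of GSEABase): drop repeated gene
## identifiers within each GMT line, keeping the first occurrence.
dedupGmtLines <- function(lines) {
    vapply(strsplit(lines, "\t", fixed = TRUE), function(x) {
        meta <- x[seq_len(min(2L, length(x)))]  # set name and description
        ids <- unique(x[-(1:2)])                # identifiers, repeats removed
        paste(c(meta, ids), collapse = "\t")
    }, character(1))
}

## Toy input; in practice use readLines("c2.all.v4.0.orig.gmt")
toy <- c("SET_A\tdesc\tg1\tg2\tg3",
         "SET_B\tdesc\tg4\tg5\tg4")
clean <- dedupGmtLines(toy)

## A text connection serves the cleaned lines as if they were a file.
con <- textConnection(clean)
roundTrip <- readLines(con)
close(con)

## With GSEABase loaded, the cleaned lines would be passed on the same
## way as in the reply above: getGmt(textConnection(clean))
```

The round trip through `readLines` is only there to show that a text connection hands back the cleaned lines unchanged.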