On 09/05/2013 10:52 PM, Maintainer wrote:
>
> When I read a GMT file into R, like
>> C2allBroadSets <- getGmt("c2.all.v4.0.orig.gmt")
> Error in GeneSetCollection(lapply(lines, function(line) { :
> error in evaluating the argument 'object' in selecting a method for function 'GeneSetCollection': Error in validObject(.Object) :
> invalid class "GeneSet" object: gene symbols must be unique
The problem is that c2.all.v4.0.orig.gmt (from
http://www.broadinstitute.org/gsea/msigdb/collections.jsp) is poorly formed. I
did the following (the output is edited):
> options(error=recover)
> xx = getGmt("c2.all.v4.0.orig.gmt")
Enter a frame number, or 0 to exit
1: getGmt("c2.all.v4.0.orig.gmt")
2: GeneSetCollection(lapply(lines, function(line) {
GeneSet(unlist(line[-(1
3: lapply(lines, function(line) {
GeneSet(unlist(line[-(1:2)]), geneIdType
4: FUN(X[[4694]], ...)
5: GeneSet(unlist(line[-(1:2)]), geneIdType = geneIdType,
collectionType = col
6: GeneSet(unlist(line[-(1:2)]), geneIdType = geneIdType,
collectionType = col
7: do.call(new, c("GeneSet", list(geneIds = type), list(... = ...,
setIdentifi
8: (function (Class, ...)
{
ClassDef <- getClass(Class, where = topenv(pare
9: initialize(value, ...)
10: initialize(value, ...)
11: .local(.Object, ...)
12: callNextMethod(.Object, .Template, ..., setIdentifier =
mkScalar(setIdentif
13: eval(call, callEnv)
14: eval(expr, envir, enclos)
15: .nextMethod(.Object, .Template, ..., setIdentifier =
mkScalar(setIdentifier
16: validObject(.Object)
Selection:
Frame 4 gives a hint that the problem is in line ~4694 of the file. I then
responded with
Selection: 16
Called from: top level
Browse[1]> getValidity(getClass("GeneSet"))
function (object)
{
if (any(duplicated(geneIds(object))))
"gene symbols must be unique"
else TRUE
}
<environment: namespace:GSEABase>
Browse[1]> geneIds(object)[which(duplicated(geneIds(object)))]
[1] "NM_009369"
and then verified that in the original file this is indeed the only line with a
duplicated identifier:
> txt = readLines("c2.all.v4.0.orig.gmt")
> fld = strsplit(txt, "\t")
> dups = sapply(fld, function(x) any(table(x) != 1))
> which(dups)
[1] 4694
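Note that the check above also tabulates the set name and description (the first two tab-separated fields of each GMT line), whereas getGmt builds each gene set from line[-(1:2)], as frame 5 shows. A minimal sketch that restricts the check to the gene identifiers and reports which ones repeat; the helper name find_duplicate_ids and the toy lines are made up for illustration and are not part of GSEABase:

```r
## Hypothetical helper: for each GMT line, return the gene IDs that occur
## more than once, ignoring the set name (field 1) and description (field 2).
find_duplicate_ids <- function(gmt_lines) {
    fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
    dup_ids <- lapply(fields, function(x) {
        ids <- x[-(1:2)]
        unique(ids[duplicated(ids)])
    })
    ## label each element with its line number in the input
    names(dup_ids) <- seq_along(gmt_lines)
    ## keep only the lines that actually contain a duplicate
    dup_ids[lengths(dup_ids) > 0]
}

## Toy input (not real MSigDB content): the second set repeats one ID
toy <- c("SET_A\tdesc\tG1\tG2\tG3",
         "SET_B\tdesc\tG4\tG5\tG4",
         "SET_C\tdesc\tG6")
dups <- find_duplicate_ids(toy)
print(dups)
```

On the real file one would pass readLines("c2.all.v4.0.orig.gmt") instead of the toy vector; the names of the result are the offending line numbers.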
The short-term solution is to edit c2.all.v4.0.orig.gmt to remove the duplicate
entry:
txt[4694] = sub("NM_009369\t", "", txt[4694])
writeLines(txt, "c2.all.v4.0.orig_MODIFIED_.gmt")
The longer-term solution is to report the problem to the MSigDB maintainers.
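If more than one set turned out to be affected, editing lines one at a time would get tedious. A possible generalization, sketched here with a made-up helper name dedupe_gmt_lines, drops repeated identifiers from every line while leaving the set name and description untouched. It assumes a plain tab-separated GMT layout and has only been reasoned through on toy input:

```r
## Hypothetical helper: remove repeated gene IDs within each GMT line,
## keeping the first occurrence and preserving fields 1 (name) and 2 (description).
dedupe_gmt_lines <- function(gmt_lines) {
    fields <- strsplit(gmt_lines, "\t", fixed = TRUE)
    vapply(fields, function(x) {
        meta <- x[seq_len(min(2L, length(x)))]  # set name and description
        ids <- unique(x[-(1:2)])                # gene IDs, duplicates dropped
        paste(c(meta, ids), collapse = "\t")
    }, character(1))
}

## Toy input (not real MSigDB content)
toy <- c("SET_A\tdesc\tG1\tG2\tG3",
         "SET_B\tdesc\tG4\tG5\tG4")
cleaned <- dedupe_gmt_lines(toy)
print(cleaned)
```

With the real file, the cleaned lines could be written back out with writeLines(), in the same way as the single-line edit above.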
Martin
>
> How can I fix this?
>
> -- output of sessionInfo():
>
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
> [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
>
> attached base packages:
> [1] splines grid parallel stats graphics grDevices utils
> [8] datasets methods base
>
> other attached packages:
> [1] GSVA_1.8.0 GSVAdata_0.99.10
> [3] hgu95a.db_2.9.0 hgu133plus2hsentrezgprobe_17.1.0
> [5] hgu133plus2hsentrezgcdf_17.1.0 hgu133plus2hsentrezg.db_17.1.0
> [7] hgu95av2.db_2.9.0 a4Classif_1.8.0
> [9] varSelRF_0.7-3 randomForest_4.6-7
> [11] pamr_1.54.1 survival_2.37-4
> [13] ROCR_1.0-5 gplots_2.11.3
> [15] KernSmooth_2.23-10 caTools_1.14
> [17] gdata_2.13.2 gtools_3.0.0
> [19] MLInterfaces_1.40.0 sfsmisc_1.0-24
> [21] cluster_1.14.4 rda_1.0.2-2
> [23] rpart_4.1-3 MASS_7.3-29
> [25] a4Preproc_1.8.0 a4Core_1.8.0
> [27] glmnet_1.9-5 Matrix_1.0-12
> [29] lattice_0.20-23 GSEABase_1.22.0
> [31] affy_1.38.1 GOstats_2.26.0
> [33] graph_1.38.3 Category_2.26.0
> [35] VennDiagram_1.6.5 pheatmap_0.7.6
> [37] statmod_1.4.17 limma_3.16.7
> [39] biomaRt_2.16.0 annotate_1.38.0
> [41] genefilter_1.42.0 primeviewhsentrezgprobe_17.1.0
> [43] primeviewhsentrezg.db_17.1.0 org.Hs.eg.db_2.9.0
> [45] RSQLite_0.11.4 DBI_0.2-7
> [47] primeviewhsentrezgcdf_17.1.0 AnnotationDbi_1.22.6
> [49] Biobase_2.20.1 BiocGenerics_0.6.0
> [51] rj_1.1.3-1
>
> loaded via a namespace (and not attached):
> [1] affyio_1.28.0 AnnotationForge_1.2.2 BiocInstaller_1.10.3
> [4] bitops_1.0-6 GO.db_2.9.0 IRanges_1.18.3
> [7] mboost_2.2-2 preprocessCore_1.22.0 RBGL_1.36.2
> [10] RCurl_1.95-4.1 rj.gd_1.1.3-1 stats4_3.0.1
> [13] tools_3.0.1 XML_3.98-1.1 xtable_1.7-1
> [16] zlibbioc_1.6.0
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
On 09/06/2013 06:17 AM, Martin Morgan wrote:
>
> txt[4694] = sub("NM_009369\t", "", txt[4694])
> writeLines(txt, "c2.all.v4.0.orig_MODIFIED_.gmt")
Maybe a better way is to avoid writing the intermediate file:
txt = readLines("c2.all.v4.0.orig.gmt")
## remove duplicate entry in gene set
txt[4694] = sub("NM_009369\t", "", txt[4694])
getGmt(textConnection(txt))
This couples the transformation more tightly with your analysis.
Martin