topGO and Arabidopsis data
1
0
Entering edit mode
@johannes-hanson-3621
Last seen 10.2 years ago
Dear All, I am analyzing affymetrix expression data using the topGO package. I basically follow the script in the topGO vingette. It works fine for the CC and BP ontologies but for MF i get the following error: > sampleGOdata <- new("topGOdata", description = "Simple session", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.db,affyLib = affyLib) Building most specific GOs ..... ( 1348 GO terms found. ) Build GO DAG topology .......... There are no adj nodes for node: GO:0010241 Error in switch(type, isa = 0, partof = 1, -1) : EXPR must be a length 1 vector I guess that there is something wrong in the annotation packages but honestly, I might well have misunderstood the error message. Using the examples in the topGO vingette and human annotation no errors are given. Thanks in advance, Johannes > sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] topGO_2.2.0 SparseM_0.86 GO.db_2.4.5 graph_1.28.0 ath1121501.db_2.4.5 org.At.tair.db_2.4.6 RSQLite_0.9-3 [8] DBI_0.2-5 AnnotationDbi_1.12.0 ath1121501cdf_2.7.0 affy_1.28.0 Biobase_2.10.0 loaded via a namespace (and not attached): [1] affyio_1.18.0 grid_2.12.0 lattice_0.19-13 preprocessCore_1.12.0 tools_2.12.0
Annotation GO ath1121501 topGO Annotation GO ath1121501 topGO • 3.0k views
ADD COMMENT
0
Entering edit mode
Adrian Alexa ▴ 400
@adrian-alexa-936
Last seen 10.2 years ago
Hi Johannes, I apologise for the late reply. I had a chance to look into the issue you reported and apparently the problem is with the org.At.tair.db_2.4.6. Namely the GO id reported in your error belongs to the BPontology and to the MF ontology, at least based on GO.db_2.4.5 and some other only resources. For example, if you run: library(ath1121501.db) goID <- "GO:0010241" .sql <- "SELECT gene_id, go_id FROM genes INNER JOIN go_mf USING('_id')" retVal <- dbGetQuery(org.At.tair_dbconn(), .sql) go2prob <- split(retVal[["gene_id"]], retVal[["go_id"]]) go2prob[goID] you will get: $`GO:0010241` [1] "AT5G25900" But if you search this GO in the GO.db database you will find that it belongs to the BP ontology. GOTERM[[goID] I hope this bug will be solved in the org.At.tair.db_2.4.6 package. Until then I did a quick fix such that you can further use the topGO package. The annFUN.db2 will keep only the GO terms available in the GO.db specific for the chosen ontology: annFUN.db2 <- function(whichOnto, feasibleGenes = NULL, affyLib) { ## we add the .db ending if needed affyLib <- paste(sub(".db$", "", affyLib), ".db", sep = "") require(affyLib, character.only = TRUE) || stop(paste("package", affyLib, "is required", sep = " ")) affyLib <- sub(".db$", "", affyLib) orgFile <- get(paste(get(paste(affyLib, "ORGPKG", sep = "")), "_dbfile", sep = "")) try(dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(), paste("ATTACH '", orgFile(), "' as org;", sep ="")), silent = TRUE) .sql <- paste("SELECT DISTINCT probe_id, go_id FROM probes INNER JOIN ", "(SELECT * FROM org.genes INNER JOIN org.go_", tolower(whichOnto)," USING('_id')) USING('gene_id');", sep = "") retVal <- dbGetQuery(get(paste(affyLib, "dbconn", sep = "_"))(), .sql) ## restric to the set of feasibleGenes if(!is.null(feasibleGenes)) retVal <- retVal[retVal[["probe_id"]] %in% feasibleGenes, ] ## split the table into a named list of GOs retVal <- split(retVal[["probe_id"]], retVal[["go_id"]]) ## return only the GOs mapped in GO.db return(retVal[names(retVal) %in% ls(get(paste("GO", toupper(whichOnto), "Term", sep = "")))]) } Using this function you can build a topGOdata object as before (just replace annFUN.db with annFUN.db2): library(topGO) allProbes <- ls(ath1121501GO) ## generate a random set of interesting pro myInterestingGenes <- sample(allProbes, 100) geneList <- factor(as.integer(allProbes %in% myInterestingGenes)) names(geneList) <- allProbes ## build a topGOdata object sampleGOdata <- new("topGOdata", description = "Simple session", ontology = "MF", allGenes = geneList, nodeSize = 10, annot = annFUN.db2, affyLib = "ath1121501.db") sampleGOdata This works without a problem on the latest stable R/Bioconductor version. Let me know if you have further questions. Best regards, Adrian > sessionInfo() R version 2.12.1 beta (2010-12-06 r53802) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US LC_NUMERIC=C LC_TIME=en_US [4] LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=en_US [7] LC_PAPER=en_US LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ath1121501.db_2.4.5 org.At.tair.db_2.4.6 topGO_2.2.0 [4] SparseM_0.86 GO.db_2.4.5 RSQLite_0.9-4 [7] DBI_0.2-5 AnnotationDbi_1.12.0 Biobase_2.10.0 [10] graph_1.28.0 loaded via a namespace (and not attached): [1] grid_2.12.0 lattice_0.19-13 tools_2.12.0 On Thu, Dec 2, 2010 at 5:44 PM, Johannes Hanson <s.j.hanson at="" uu.nl=""> wrote: > Dear All, > > I am analyzing affymetrix expression data using the topGO package. > I basically follow the script in the topGO vingette. It works fine for the CC and BP ontologies but for MF i get the following error: > >> sampleGOdata <- new("topGOdata", description = "Simple session", ontology = "MF", allGenes = geneList, geneSel = topDiffGenes, nodeSize = 10, annot = annFUN.db,affyLib = affyLib) > > Building most specific GOs ..... ? ? ? ?( 1348 GO terms found. ) > > Build GO DAG topology .......... > ?There are no adj nodes for node: ?GO:0010241 > Error in switch(type, isa = 0, partof = 1, -1) : > ?EXPR must be a length 1 vector > > I guess that there is something wrong in the annotation packages but honestly, I might well have misunderstood the error message. Using the examples in the topGO vingette and human annotation no errors are given. > > Thanks in advance, > Johannes > > >> sessionInfo() > R version 2.12.0 (2010-10-15) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > > other attached packages: > ?[1] topGO_2.2.0 ? ? ? ? ?SparseM_0.86 ? ? ? ? GO.db_2.4.5 ? ? ? ? ?graph_1.28.0 ? ? ? ? ath1121501.db_2.4.5 ?org.At.tair.db_2.4.6 RSQLite_0.9-3 > ?[8] DBI_0.2-5 ? ? ? ? ? ?AnnotationDbi_1.12.0 ath1121501cdf_2.7.0 ?affy_1.28.0 ? ? ? ? ?Biobase_2.10.0 > > loaded via a namespace (and not attached): > [1] affyio_1.18.0 ? ? ? ? grid_2.12.0 ? ? ? ? ? lattice_0.19-13 ? ? ? preprocessCore_1.12.0 tools_2.12.0 > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD COMMENT

Login before adding your answer.

Traffic: 817 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6