I use topGO a lot for GO analysis, and I'm worried about the accuracy of its gene-to-GO-term mapping.
As an example, more than 2600 genes are associated with a given GO term (e.g. "GO:1903561", extracellular vesicle) when using the topGO annFUN.org mapping with the org.Mm.eg.db database.
When I search the org.Mm.eg.db database manually, I get only 49 genes, which is far fewer.
What could explain this difference?
Thanks in advance.
set.seed(1234)
require(org.Mm.eg.db)
require(DBI)
require(topGO)

# select a random list of gene symbols
x <- unique(unlist(as.list(org.Mm.egSYMBOL)))
names(x) <- x
genesOfInterest <- sample(x, 2000, replace = FALSE)

# format this list for topGO
geneList <- x
geneList[!geneList %in% genesOfInterest] <- 0
geneList[geneList %in% genesOfInterest] <- 1
geneList <- factor(geneList)
table(geneList)

# create the topGO object
GOdata_CC <- NULL
GOdata_CC <- new(
  "topGOdata",
  ontology = "CC",
  allGenes = geneList,
  description = "Test",
  annot = annFUN.org,
  mapping = "org.Mm.eg.db",
  ID = "SYMBOL"
)

# number of genes for the "extracellular vesicle" GO term, GO:1903561
length(genesInTerm(GOdata_CC, "GO:1903561")[[1]])

# comparison with a manual search in the org.Mm.eg.db package
anno <- AnnotationDbi::select(
  org.Mm.eg.db,
  keys = "GO:1903561",
  columns = c("SYMBOL", "GO"),
  keytype = "GO"
)
unique(anno$GO)
dim(anno)

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] topGO_2.28.0 SparseM_1.77 GO.db_3.4.1 graph_1.54.0
[5] DBI_0.7 org.Mm.eg.db_3.4.1 AnnotationDbi_1.38.2 IRanges_2.10.5
[9] S4Vectors_0.14.7 Biobase_2.36.2 BiocGenerics_0.22.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 bit_1.1-12 lattice_0.20-35 rlang_0.1.6 blob_1.1.0
[6] tools_3.4.1 grid_3.4.1 matrixStats_0.53.0 bit64_0.9-7 digest_0.6.15
[11] tibble_1.4.2 memoise_1.1.0 RSQLite_2.0 compiler_3.4.1 pillar_1.1.0
[16] pkgconfig_2.0.1