I use topGO a lot for GO analysis, and I am worried about the accuracy of the gene-to-GO-term mapping.
As an example, more than 2,600 genes are associated with a given GO term (e.g. "GO:1903561", extracellular vesicle) when using the topGO annFUN.org mapping with the org.Mm.eg.db database.
When I search the org.Mm.eg.db database manually, I get only 49 genes, which is far fewer.
What could explain this difference?
Thanks in advance.
set.seed(1234)
require(org.Mm.eg.db)
require(DBI)
require(topGO)
# Select a random list of gene symbols
x <- unique(unlist(as.list(org.Mm.egSYMBOL)))
names(x) <- x
genesOfInterest <- sample(x, 2000, replace = FALSE)
# Format this list for topGO: a named factor (1 = gene of interest, 0 = background)
geneList <- factor(as.integer(x %in% genesOfInterest))
names(geneList) <- x
table(geneList)
# Create the topGO object
GOdata_CC <- new(
  "topGOdata",
  ontology = "CC",
  allGenes = geneList,
  description = "Test",
  annot = annFUN.org,
  mapping = "org.Mm.eg.db",
  ID = "SYMBOL"
)
# Number of genes for the "extracellular vesicle" GO term, GO:1903561
length(genesInTerm(GOdata_CC, "GO:1903561")[[1]])
# Comparison with a manual search in the org.Mm.eg.db package
anno <- AnnotationDbi::select(org.Mm.eg.db,
                              keys = "GO:1903561",
                              columns = c("SYMBOL", "GO"),
                              keytype = "GO")
unique(anno$GO)
dim(anno)
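One further check that may help narrow this down. It rests on an assumption not confirmed here: that topGO also counts genes annotated to child terms of GO:1903561, while keytype = "GO" returns only direct annotations. org.Mm.eg.db also offers a "GOALL" keytype, which includes annotations inherited from descendant terms. The names go_id, direct and inherited are only illustrative.

```r
# Sketch only: assumes the gap is direct vs. inherited annotations.
library(org.Mm.eg.db)
library(AnnotationDbi)

go_id <- "GO:1903561"  # extracellular vesicle

# Direct annotations only (keytype = "GO")
direct <- AnnotationDbi::select(org.Mm.eg.db, keys = go_id,
                                columns = "SYMBOL", keytype = "GO")

# Annotations to the term or any of its descendant terms (keytype = "GOALL")
inherited <- AnnotationDbi::select(org.Mm.eg.db, keys = go_id,
                                   columns = "SYMBOL", keytype = "GOALL")

# Count distinct gene symbols in each result
n_direct <- length(unique(direct$SYMBOL))
n_inherited <- length(unique(inherited$SYMBOL))
c(direct = n_direct, inherited = n_inherited)
```

If the "GOALL" count is close to the topGO number, the two results would differ in what they count rather than in the mapping itself.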
sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] topGO_2.28.0 SparseM_1.77 GO.db_3.4.1 graph_1.54.0
[5] DBI_0.7 org.Mm.eg.db_3.4.1 AnnotationDbi_1.38.2 IRanges_2.10.5
[9] S4Vectors_0.14.7 Biobase_2.36.2 BiocGenerics_0.22.1
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 bit_1.1-12 lattice_0.20-35 rlang_0.1.6 blob_1.1.0
[6] tools_3.4.1 grid_3.4.1 matrixStats_0.53.0 bit64_0.9-7 digest_0.6.15
[11] tibble_1.4.2 memoise_1.1.0 RSQLite_2.0 compiler_3.4.1 pillar_1.1.0
[16] pkgconfig_2.0.1
