Question: TopGO - Incorrect mapping genes to GO terms
lehallib wrote (15 months ago):

I use topGO a lot for GO analysis, and I'm worried about the accuracy of its gene-to-GO-term mapping.

For example, more than 2,600 genes are associated with a given GO term ("GO:1903561", extracellular vesicle) when I use topGO's annFUN.org mapping with the org.Mm.eg.db database.

When I search the org.Mm.eg.db database manually, I get only 49 genes, which is far fewer.

What could explain these differences?

Thanks in advance.

set.seed(1234)
library(org.Mm.eg.db)
library(DBI)
library(topGO)

# select a random list of gene symbols
x <- unique(unlist(as.list(org.Mm.egSYMBOL)))
names(x) <- x
genesOfInterest <- sample(x, 2000, replace = FALSE)
# format this list for topGO: a named factor, 1 = gene of interest, 0 = background
geneList <- factor(as.integer(x %in% genesOfInterest))
names(geneList) <- x
table(geneList)

# Create topGO object
GOdata_CC = NULL
GOdata_CC <-
  new(
    "topGOdata",
    ontology = "CC",
    allGenes = geneList,
    description = "Test",
    annot = annFUN.org,
    mapping = "org.Mm.eg.db",
    ID = "SYMBOL"
  )

# number of genes for the "extracellular vesicle" GO term, GO:1903561
length(genesInTerm(GOdata_CC,"GO:1903561")[[1]])

# Comparison with manual searching in the org.Mm.eg.db package
anno <- AnnotationDbi::select(org.Mm.eg.db, 
                              keys="GO:1903561",
                              columns=c("SYMBOL","GO"),
                              keytype="GO")
unique(anno$GO)
dim(anno)

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] topGO_2.28.0         SparseM_1.77         GO.db_3.4.1          graph_1.54.0        
 [5] DBI_0.7              org.Mm.eg.db_3.4.1   AnnotationDbi_1.38.2 IRanges_2.10.5      
 [9] S4Vectors_0.14.7     Biobase_2.36.2       BiocGenerics_0.22.1 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.15       bit_1.1-12         lattice_0.20-35    rlang_0.1.6        blob_1.1.0        
 [6] tools_3.4.1        grid_3.4.1         matrixStats_0.53.0 bit64_0.9-7        digest_0.6.15     
[11] tibble_1.4.2       memoise_1.1.0      RSQLite_2.0        compiler_3.4.1     pillar_1.1.0      
[16] pkgconfig_2.0.1   

Tags: go, topgo, org.mm.eg.db
Answer: TopGO - Incorrect mapping genes to GO terms
James W. MacDonald (United States) wrote (15 months ago):

You are looking only at the genes that map directly to that GO term, whereas topGO uses all genes that map either directly to that term or to any of its progeny (descendant terms).

Put another way, topGO uses GOALL, whereas you are using GO:

> nrow(select(org.Mm.eg.db, "GO:1903561", "ENTREZID", "GOALL"))
'select()' returned 1:many mapping between keys and columns
[1] 2636
> nrow(select(org.Mm.eg.db, "GO:1903561", "ENTREZID", "GO"))
'select()' returned 1:many mapping between keys and columns
[1] 53
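The difference between direct and propagated counts can be illustrated without any Bioconductor packages. The sketch below uses a made-up four-term graph and made-up genes (the terms T1 to T4, the genes g1 to g5, and the helper functions `offspring()` and `propagated()` are all hypothetical, not real GO data or any package's API). Counting only the genes annotated directly to the root term T1 gives one gene, while also collecting the genes annotated to its offspring terms, which is what the GOALL column reports and what topGO counts, gives five.

```r
# Toy illustration (hypothetical terms and genes, not real GO data) of
# direct ("GO") versus propagated ("GOALL") annotation counts.

# Hypothetical DAG: each child term lists its parent term(s)
parents <- list(
  T2 = "T1",
  T3 = "T1",
  T4 = c("T2", "T3")
)

# Hypothetical direct annotations: term -> genes annotated to it directly
direct <- list(
  T1 = c("g1"),
  T2 = c("g2", "g3"),
  T3 = c("g4"),
  T4 = c("g5", "g2")
)

# Return every descendant (offspring) of a term by recursively following child links
offspring <- function(term, parents) {
  children <- names(parents)[vapply(parents, function(p) term %in% p, logical(1))]
  unique(c(children, unlist(lapply(children, offspring, parents = parents))))
}

# Propagated annotation: genes on the term itself plus genes on all of its offspring
propagated <- function(term, direct, parents) {
  terms <- c(term, offspring(term, parents))
  sort(unique(unlist(direct[terms], use.names = FALSE)))
}

direct_T1 <- sort(unique(direct[["T1"]]))
all_T1 <- propagated("T1", direct, parents)
length(direct_T1)  # 1 gene directly annotated to T1 (like the "GO" column)
length(all_T1)     # 5 genes once offspring terms are included (like "GOALL")
```

With the real data, GO.db's offspring maps (for example GOCCOFFSPRING for the CC ontology) would play the role of the toy `offspring()` function here; small differences between such a reconstruction and GOALL may still appear if the GO.db and org.Mm.eg.db versions do not match.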