
Question: topGO - genesInTerm returns genes not in the annotation
3.9 years ago, samuel collombet (France) wrote:

Hi,

I am using the topGO package and am getting very strange results.

I built a GO-to-genes list myself by downloading the GO annotation for Ensembl gene IDs with biomaRt:

> mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="mmusculus_gene_ensembl")

> ensemblGene_go <- getBM(attributes=c("ensembl_gene_id","go_id","external_gene_id"),filters="ensembl_gene_id", values=ensembl$ensembl_geneID,mart=mart)
> head(ensemblGene_go )

     ensembl_gene_id      go_id external_gene_id
1 ENSMUSG00000013653 GO:0008150    1810065E05Rik
2 ENSMUSG00000013653 GO:0005575    1810065E05Rik
3 ENSMUSG00000013653 GO:0003674    1810065E05Rik
4 ENSMUSG00000058287 GO:0008150          Gm12253
5 ENSMUSG00000058287 GO:0046849          Gm12253
6 ENSMUSG00000058287 GO:0005575          Gm12253

> go2ensemblGene <- split(ensemblGene_go$ensembl_gene_id,ensemblGene_go$go_id)
> go2ensemblGene[1:2]

$`GO:0000002`
[1] "ENSMUSG00000022889" "ENSMUSG00000033845" "ENSMUSG00000030879"
[4] "ENSMUSG00000090262" "ENSMUSG00000019699" "ENSMUSG00000030557"
[7] "ENSMUSG00000027424"

$`GO:0000003`
[1] "ENSMUSG00000029061"
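Before splitting, it may be worth checking the table for rows without a GO ID and for duplicated gene/term pairs. The sketch below uses a small made-up data frame in place of the real getBM() result (the gene IDs are hypothetical), and it assumes that missing GO IDs show up as empty strings or NA:

```r
# Toy stand-in for the data frame returned by getBM(); the gene IDs are made up.
ensemblGene_go <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneB", "geneC"),
  go_id           = c("GO:0000002", "GO:0000002", "GO:0000002", "GO:0000003", ""),
  stringsAsFactors = FALSE
)

# Drop rows without a GO ID (assumed to appear as "" or NA)
# and collapse duplicated gene/term pairs.
has_go <- !is.na(ensemblGene_go$go_id) & ensemblGene_go$go_id != ""
annot  <- unique(ensemblGene_go[has_go, c("ensembl_gene_id", "go_id")])

# Same split() call as in the post, applied to the cleaned table.
go2ensemblGene <- split(annot$ensembl_gene_id, annot$go_id)
str(go2ensemblGene)
```

Without the filter, split() would also create a list element with an empty name for the unannotated genes.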

I then make my topGO object:

> GOdata <- new("topGOdata", ontology="BP", annot=annFUN.GO2genes, GO2genes=go2ensemblGene, allGenes=GeneList,nodeSize=5,geneSel=topClusterGenes)
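GeneList and topClusterGenes are not shown in the post. As a rough sketch of what they might look like (all names and values below are invented), allGenes is a named vector whose names define the gene universe, and geneSel is a function that flags the interesting genes from their scores. Genes that appear in the annotation but not in the universe are not used by topGO, so comparing the two sets can be informative:

```r
# Hypothetical gene universe: a named numeric vector of scores,
# with gene IDs as names. The values are invented for illustration.
GeneList <- c(geneA = 0.001, geneB = 0.20, geneD = 0.03)

# Hypothetical selection function of the kind passed as geneSel:
# it receives the score vector and returns TRUE for "interesting" genes.
topClusterGenes <- function(scores) scores < 0.05

# Toy GO-to-genes annotation (made-up gene IDs).
go2ensemblGene <- list("GO:0000002" = c("geneA", "geneB", "geneC"),
                       "GO:0000003" = "geneB")

# Compare the universe with the set of annotated genes.
annotated          <- unique(unlist(go2ensemblGene, use.names = FALSE))
in_universe_only   <- setdiff(names(GeneList), annotated)  # no GO annotation
in_annotation_only <- setdiff(annotated, names(GeneList))  # outside the universe

in_universe_only
in_annotation_only
```

Running the same two setdiff() calls on the real objects would show how many annotated genes fall outside names(GeneList).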

Then, when I call genesInTerm() for a GO term, the genes returned do not match my annotation at all!

> genesInTerm(GOdata,"GO:0051053")
$`GO:0051053`
[1] "ENSMUSG00000022878" "ENSMUSG00000032633" "ENSMUSG00000036086"
[4] "ENSMUSG00000036986" "ENSMUSG00000045658" "ENSMUSG00000046323"
[7] "ENSMUSG00000046697" "ENSMUSG00000054272" "ENSMUSG00000056758"

> go2ensemblGene["GO:0051053"]
$`GO:0051053`
[1] "ENSMUSG00000026241" "ENSMUSG00000053647"

Another example:

> genesInTerm(GOdata,"GO:0051055")
$`GO:0051055`
[1] "ENSMUSG00000025856" "ENSMUSG00000032715" "ENSMUSG00000033161"
[4] "ENSMUSG00000036856" "ENSMUSG00000047638"

> go2ensemblGene["GO:0051055"]
$`GO:0051055`
[1] "ENSMUSG00000041333" "ENSMUSG00000078686" "ENSMUSG00000094793"
[4] "ENSMUSG00000078675" "ENSMUSG00000078673" "ENSMUSG00000078672"

I guess I am doing something wrong when I create the topGO object, but I followed the vignette and my annotation looks fine...
Any ideas?
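One possible explanation, offered as an assumption and not checked against this data: topGO propagates annotations up the GO graph, so a term also receives the genes annotated to its descendant terms, and only genes present in the universe (the names of allGenes) are kept. Under that reading, genesInTerm() can return genes that are not directly annotated to the term, while directly annotated genes that are missing from the universe disappear. The toy example below illustrates the idea in base R with a made-up ontology and made-up gene IDs; it is a sketch of the concept, not topGO's implementation:

```r
# Toy ontology: each term lists its direct children (hypothetical IDs).
children <- list(
  "GO:parent"     = c("GO:child1", "GO:child2"),
  "GO:child1"     = "GO:grandchild",
  "GO:child2"     = character(0),
  "GO:grandchild" = character(0)
)

# Direct annotations, analogous to go2ensemblGene (made-up genes).
direct <- list(
  "GO:parent"     = c("geneX", "geneY"),
  "GO:child1"     = "geneA",
  "GO:child2"     = c("geneB", "geneX"),
  "GO:grandchild" = "geneC"
)

# Gene universe, analogous to names(GeneList); geneX and geneY are absent.
universe <- c("geneA", "geneB", "geneC", "geneD")

# All descendants of a term, found by walking the child links recursively.
descendants <- function(term) {
  kids <- children[[term]]
  unique(c(kids, unlist(lapply(kids, descendants))))
}

# Genes of a term after propagation, restricted to the universe.
genes_in_term <- function(term) {
  terms <- c(term, descendants(term))
  genes <- unique(unlist(direct[terms], use.names = FALSE))
  sort(intersect(genes, universe))
}

genes_in_term("GO:parent")   # propagated and restricted to the universe
direct[["GO:parent"]]        # direct annotation only
```

Here the propagated set and the direct set do not overlap at all, which is the same pattern as in the two examples above.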


If needed, here is my sessionInfo:

> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] tcltk     grid      parallel  stats4    stats     graphics  grDevices
 [8] utils     datasets  methods   base     

other attached packages:
 [1] topGO_2.18.0              SparseM_1.6              
 [3] GO.db_3.0.0               RSQLite_1.0.0            
 [5] DBI_0.3.1                 AnnotationDbi_1.28.1     
 [7] graph_1.44.1              biomaRt_2.22.0           
 [9] Mfuzz_2.26.0              DynDoc_1.44.0            
[11] widgetTools_1.44.0        e1071_1.6-4              
[13] Biobase_2.26.0            wq_0.4-1                 
[15] zoo_1.7-12                reshape2_1.4.1           
[17] ggplot2_1.0.1             RColorBrewer_1.1-2       
[19] DESeq2_1.6.3              RcppArmadillo_0.4.650.1.1
[21] Rcpp_0.11.5               GenomicRanges_1.18.4     
[23] GenomeInfoDb_1.2.4        IRanges_2.0.1            
[25] S4Vectors_0.4.0           BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     annotate_1.44.0     base64enc_0.1-2    
 [4] BatchJobs_1.5       BBmisc_1.9          BiocParallel_1.0.3 
 [7] bitops_1.0-6        brew_1.0-6          checkmate_1.5.1    
[10] class_7.3-12        cluster_2.0.1       codetools_0.2-11   
[13] colorspace_1.2-6    digest_0.6.8        fail_1.2           
[16] foreach_1.4.2       foreign_0.8-63      Formula_1.2-0      
[19] genefilter_1.48.1   geneplotter_1.44.0  gtable_0.1.2       
[22] Hmisc_3.15-0        iterators_1.0.7     lattice_0.20-30    
[25] latticeExtra_0.6-26 locfit_1.5-9.1      MASS_7.3-39        
[28] munsell_0.4.2       nnet_7.3-9          plyr_1.8.1         
[31] proto_0.3-10        RCurl_1.95-4.5      rpart_4.1-9        
[34] scales_0.2.4        sendmailR_1.2-1     splines_3.1.0      
[37] stringr_0.6.2       survival_2.38-1     tkWidgets_1.44.0   
[40] tools_3.1.0         XML_3.98-1.1        xtable_1.7-4       
[43] XVector_0.6.0      
>