Question: topGO - genesInTerm returns genes not in the annotation
Asked 3.6 years ago by samuel collombet (France):

Hi,

I am using the topGO package and I am getting very strange results.

I built a GO-to-genes list myself by downloading the GO annotation for my Ensembl gene IDs with biomaRt:

> mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL",host="feb2014.archive.ensembl.org", path="/biomart/martservice", dataset="mmusculus_gene_ensembl")

> ensemblGene_go <- getBM(attributes=c("ensembl_gene_id","go_id","external_gene_id"),filters="ensembl_gene_id", values=ensembl$ensembl_geneID,mart=mart)
> head(ensemblGene_go)

     ensembl_gene_id      go_id external_gene_id
1 ENSMUSG00000013653 GO:0008150    1810065E05Rik
2 ENSMUSG00000013653 GO:0005575    1810065E05Rik
3 ENSMUSG00000013653 GO:0003674    1810065E05Rik
4 ENSMUSG00000058287 GO:0008150          Gm12253
5 ENSMUSG00000058287 GO:0046849          Gm12253
6 ENSMUSG00000058287 GO:0005575          Gm12253

> go2ensemblGene <- split(ensemblGene_go$ensembl_gene_id,ensemblGene_go$go_id)
> go2ensemblGene[1:2]

$`GO:0000002`
[1] "ENSMUSG00000022889" "ENSMUSG00000033845" "ENSMUSG00000030879"
[4] "ENSMUSG00000090262" "ENSMUSG00000019699" "ENSMUSG00000030557"
[7] "ENSMUSG00000027424"

$`GO:0000003`
[1] "ENSMUSG00000029061"
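
The split() step can be sketched on toy data as follows. The gene IDs and the data frame `toy` are hypothetical stand-ins for the getBM() result, and the two clean-up steps (dropping rows with an empty go_id and removing duplicated gene/term pairs) are assumptions about what the real table may need, not something shown in the output above.

```r
# Toy stand-in for the getBM() result (hypothetical IDs, not real annotations)
toy <- data.frame(
  ensembl_gene_id = c("geneA", "geneA", "geneB", "geneB", "geneC", "geneB"),
  go_id           = c("GO:0000002", "GO:0000003", "GO:0000002", "",
                      "GO:0000002", "GO:0000002"),
  stringsAsFactors = FALSE
)

# Drop rows without a GO ID, then remove duplicated gene/term pairs
toy <- toy[toy$go_id != "", ]
toy <- unique(toy)

# One character vector of gene IDs per GO term
go2genes <- split(toy$ensembl_gene_id, toy$go_id)
print(go2genes)
```

Without the first filter, split() would also create a list element named "" for the unannotated rows.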

I then create my topGOdata object:

> GOdata <- new("topGOdata", ontology="BP", annot=annFUN.GO2genes, GO2genes=go2ensemblGene, allGenes=GeneList,nodeSize=5,geneSel=topClusterGenes)
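
GeneList and topClusterGenes are not shown in this post. As a rough sketch of the shapes topGO expects for these two arguments (allGenes is a named numeric or factor vector whose names define the gene universe, and geneSel is a function that flags the interesting genes), here is a toy version. The names `gene_scores` and `select_top`, the scores, and the 0.05 cutoff are all hypothetical.

```r
# Hypothetical gene scores (e.g. p-values), named by gene ID.
# The names of this vector are what would define the gene universe.
gene_scores <- c(geneA = 0.001, geneB = 0.20, geneC = 0.03, geneD = 0.50)

# Hypothetical selection function: returns TRUE for "interesting" genes
select_top <- function(scores) scores < 0.05

selected <- select_top(gene_scores)
print(names(gene_scores)[selected])
```

Only genes whose IDs appear in the names of the allGenes vector can show up in the resulting object, so the ID type used there has to match the one used in the GO-to-genes list.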

Then, when I call genesInTerm() for a GO term, the genes returned do not match my annotation at all:

> genesInTerm(GOdata,"GO:0051053")
$`GO:0051053`
[1] "ENSMUSG00000022878" "ENSMUSG00000032633" "ENSMUSG00000036086"
[4] "ENSMUSG00000036986" "ENSMUSG00000045658" "ENSMUSG00000046323"
[7] "ENSMUSG00000046697" "ENSMUSG00000054272" "ENSMUSG00000056758"

> go2ensemblGene["GO:0051053"]
$`GO:0051053`
[1] "ENSMUSG00000026241" "ENSMUSG00000053647"

Another example:

> genesInTerm(GOdata,"GO:0051055")
$`GO:0051055`
[1] "ENSMUSG00000025856" "ENSMUSG00000032715" "ENSMUSG00000033161"
[4] "ENSMUSG00000036856" "ENSMUSG00000047638"

> go2ensemblGene["GO:0051055"]
$`GO:0051055`
[1] "ENSMUSG00000041333" "ENSMUSG00000078686" "ENSMUSG00000094793"
[4] "ENSMUSG00000078675" "ENSMUSG00000078673" "ENSMUSG00000078672"

I guess I am doing something wrong when I create the topGO object, but I followed the vignette and my annotation seems fine.
Any ideas?
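
One possible way to narrow this down is to compare three sets per term: the genes annotated directly in the list, the genes reported by genesInTerm(), and the gene universe. The sketch below uses toy vectors and a hypothetical helper `compare_term_genes`. It rests on two assumptions about topGO's behaviour: that genes outside the allGenes universe are discarded, and that genes annotated to a more specific term are also counted for its ancestor terms.

```r
# Hypothetical helper: compare direct annotation with what was reported
compare_term_genes <- function(direct, reported, universe) {
  list(
    # directly annotated genes that are part of the universe
    direct_in_universe  = intersect(direct, universe),
    # directly annotated genes that are absent from the universe
    direct_missing      = setdiff(direct, universe),
    # reported genes that are not directly annotated to the term
    reported_not_direct = setdiff(reported, direct)
  )
}

# Toy inputs (hypothetical IDs)
direct   <- c("geneA", "geneB")
reported <- c("geneC", "geneD")
universe <- c("geneC", "geneD", "geneE")

res <- compare_term_genes(direct, reported, universe)
print(res)
```

With the real objects, and assuming GeneList is a named vector, the inputs would be go2ensemblGene[["GO:0051053"]], genesInTerm(GOdata, "GO:0051053")[[1]] and names(GeneList). If direct_in_universe comes back empty, the directly annotated genes were never in the universe, and any reported genes would have to come from other terms.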


If needed, here is my sessionInfo:

> sessionInfo()
R version 3.1.0 RC (2014-04-05 r65382)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] tcltk     grid      parallel  stats4    stats     graphics  grDevices
 [8] utils     datasets  methods   base     

other attached packages:
 [1] topGO_2.18.0              SparseM_1.6              
 [3] GO.db_3.0.0               RSQLite_1.0.0            
 [5] DBI_0.3.1                 AnnotationDbi_1.28.1     
 [7] graph_1.44.1              biomaRt_2.22.0           
 [9] Mfuzz_2.26.0              DynDoc_1.44.0            
[11] widgetTools_1.44.0        e1071_1.6-4              
[13] Biobase_2.26.0            wq_0.4-1                 
[15] zoo_1.7-12                reshape2_1.4.1           
[17] ggplot2_1.0.1             RColorBrewer_1.1-2       
[19] DESeq2_1.6.3              RcppArmadillo_0.4.650.1.1
[21] Rcpp_0.11.5               GenomicRanges_1.18.4     
[23] GenomeInfoDb_1.2.4        IRanges_2.0.1            
[25] S4Vectors_0.4.0           BiocGenerics_0.12.1      

loaded via a namespace (and not attached):
 [1] acepack_1.3-3.3     annotate_1.44.0     base64enc_0.1-2    
 [4] BatchJobs_1.5       BBmisc_1.9          BiocParallel_1.0.3 
 [7] bitops_1.0-6        brew_1.0-6          checkmate_1.5.1    
[10] class_7.3-12        cluster_2.0.1       codetools_0.2-11   
[13] colorspace_1.2-6    digest_0.6.8        fail_1.2           
[16] foreach_1.4.2       foreign_0.8-63      Formula_1.2-0      
[19] genefilter_1.48.1   geneplotter_1.44.0  gtable_0.1.2       
[22] Hmisc_3.15-0        iterators_1.0.7     lattice_0.20-30    
[25] latticeExtra_0.6-26 locfit_1.5-9.1      MASS_7.3-39        
[28] munsell_0.4.2       nnet_7.3-9          plyr_1.8.1         
[31] proto_0.3-10        RCurl_1.95-4.5      rpart_4.1-9        
[34] scales_0.2.4        sendmailR_1.2-1     splines_3.1.0      
[37] stringr_0.6.2       survival_2.38-1     tkWidgets_1.44.0   
[40] tools_3.1.0         XML_3.98-1.1        xtable_1.7-4       
[43] XVector_0.6.0      