Question: Difference between biomart R and biomart website GO query
5 months ago by
daniel.gaffney20 wrote:

Hi,

I'm searching for all Ensembl gene IDs associated with a particular GO ID (GO:0065005). Using biomaRt, the query below returns a single gene:

library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
out <- getBM(attributes=c('ensembl_gene_id', 'go_id'),filters = 'go', values = 'GO:0065005', mart = ensembl)
unique(out$ensembl_gene_id)
[1] "ENSG00000110244"

However, when I use the BioMart website to run the same query, I get results for 36 genes. Here is a link to an image of the query I used:

https://drive.google.com/open?id=17GSR8JdZuOzcc8ScxMTjFTexQgeE_Xi-

The results in the browser:

https://drive.google.com/open?id=1j1CRz5AEDB7_SICob1qlM5SAkUYsEneO

And the results file itself:

https://drive.google.com/open?id=1ppbOnNilrTGiZY6o_q-X_ue-MAXMrTOA

Can anyone suggest a reason for this?

Dan

###

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.34.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16         AnnotationDbi_1.40.0 magrittr_1.5        
 [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
 [7] bit_1.1-12           R6_2.2.2             httr_1.3.1          
[10] stringr_1.3.0        blob_1.1.1           tools_3.4.3         
[13] parallel_3.4.3       Biobase_2.38.0       DBI_0.8             
[16] bit64_0.9-7          digest_0.6.15        assertthat_0.2.0    
[19] S4Vectors_0.16.0     bitops_1.0-6         curl_3.2            
[22] RCurl_1.95-4.10      memoise_1.1.0        RSQLite_2.1.0       
[25] stringi_1.1.7        compiler_3.4.3       prettyunits_1.0.2   
[28] stats4_3.4.3         XML_3.98-1.10       

 

 

 

5 months ago by
Mike Smith2.9k
EMBL Heidelberg / de.NBI
Mike Smith2.9k wrote:

This is because when you supply a 'GO Term Accession' in the web interface, the filter it actually applies is go_parent_term rather than go. This means you also get back genes that are annotated with a child term of the GO term you specified, whereas your biomaRt query returns only genes that are directly annotated with the search term.
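
To make the distinction concrete, here is a small self-contained sketch in plain R. It does not call biomaRt; the ontology, gene names, and the helper term_and_descendants() are all made up for illustration, and the real GO graph is much larger and allows a term to have several parents. It only shows the general idea of exact-term matching versus matching a term plus its descendants.

```r
# Toy illustration only: terms, genes, and helper names are hypothetical.
# Child -> parent edges of a tiny ontology (names are children, values are parents).
parent_of <- c("GO:B" = "GO:A", "GO:C" = "GO:A", "GO:D" = "GO:B")

# Direct gene -> term annotations.
annotations <- data.frame(
  gene  = c("gene1", "gene2", "gene3", "gene4"),
  go_id = c("GO:A",  "GO:B",  "GO:D",  "GO:X"),
  stringsAsFactors = FALSE
)

# Collect a term plus all of its descendants by repeatedly following edges.
term_and_descendants <- function(term, parent_of) {
  found <- term
  repeat {
    children <- names(parent_of)[parent_of %in% found]
    unseen <- setdiff(children, found)
    if (length(unseen) == 0) break
    found <- c(found, unseen)
  }
  found
}

# Analogous to filtering on 'go': only genes annotated with the exact term.
direct_hits <- annotations$gene[annotations$go_id == "GO:A"]

# Analogous to filtering on 'go_parent_term': the term or any descendant.
expanded_hits <- annotations$gene[
  annotations$go_id %in% term_and_descendants("GO:A", parent_of)
]

print(direct_hits)
print(expanded_hits)
```

In this toy data only gene1 carries GO:A directly, while gene2 and gene3 are picked up through the child terms GO:B and GO:D.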

You can get the same list of 36 genes as the web interface with:

out2 <- getBM(attributes=c('ensembl_gene_id', 'go_id'), 
              filters = 'go_parent_term', 
              values = 'GO:0065005', 
              mart = ensembl)
length(unique(out2$ensembl_gene_id))
[1] 36
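
If it helps to see which genes come in only through child terms, something like the following could work. The helper genes_via_child_terms() is hypothetical (it is not part of biomaRt), and the data frames here are mock stand-ins with placeholder IDs so the snippet runs without a network connection; they only mimic the column layout of out and out2.

```r
# Hypothetical helper (not part of biomaRt): genes present only in the
# parent-term result, i.e. not annotated directly with the query term.
genes_via_child_terms <- function(direct, with_children) {
  setdiff(unique(with_children$ensembl_gene_id),
          unique(direct$ensembl_gene_id))
}

# Mock data standing in for 'out' and 'out2'; IDs are placeholders, not real results.
mock_out <- data.frame(
  ensembl_gene_id = "ENSG_A",
  go_id           = "GO:0065005",
  stringsAsFactors = FALSE
)
mock_out2 <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_B", "ENSG_C"),
  go_id           = c("GO:0065005", "GO:child1", "GO:other", "GO:child2"),
  stringsAsFactors = FALSE
)

extra <- genes_via_child_terms(mock_out, mock_out2)
print(extra)
```

With the real objects, genes_via_child_terms(out, out2) would be expected to list the genes that the go filter misses, assuming the single directly annotated gene is also among the 36.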



Thanks very much Mike, this solved my problem!

Dan

written 4 months ago by daniel.gaffney20