Difference between biomart R and biomart website GO query
@danielgaffney-14911

Hi,

I'm searching for all ENSEMBL gene IDs associated with a particular GO ID (GO:0065005), which - using biomaRt - returns a single gene. I do:

library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
out <- getBM(attributes=c('ensembl_gene_id', 'go_id'),filters = 'go', values = 'GO:0065005', mart = ensembl)
unique(out$ensembl_gene_id)
[1] "ENSG00000110244"

However, when I use the biomart website to run the same query, I get results for 36 genes. Here are links to images of the query I used:

https://drive.google.com/open?id=17GSR8JdZuOzcc8ScxMTjFTexQgeE_Xi-

The results in the browser:

https://drive.google.com/open?id=1j1CRz5AEDB7_SICob1qlM5SAkUYsEneO

And the results file itself:

https://drive.google.com/open?id=1ppbOnNilrTGiZY6o_q-X_ue-MAXMrTOA

Can anyone suggest a reason for this?

Dan

###

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.34.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.16         AnnotationDbi_1.40.0 magrittr_1.5        
 [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
 [7] bit_1.1-12           R6_2.2.2             httr_1.3.1          
[10] stringr_1.3.0        blob_1.1.1           tools_3.4.3         
[13] parallel_3.4.3       Biobase_2.38.0       DBI_0.8             
[16] bit64_0.9-7          digest_0.6.15        assertthat_0.2.0    
[19] S4Vectors_0.16.0     bitops_1.0-6         curl_3.2            
[22] RCurl_1.95-4.10      memoise_1.1.0        RSQLite_2.1.0       
[25] stringi_1.1.7        compiler_3.4.3       prettyunits_1.0.2   
[28] stats4_3.4.3         XML_3.98-1.10       

 

 

 

Mike Smith ★ 6.5k
@mike-smith
EMBL Heidelberg

This is because when you supply a 'GO Term Accession' in the web interface, the filter it actually applies is go_parent_term rather than just go. This means you also get back genes that are annotated with a child term of the GO term you've specified, whereas with your biomaRt query you get back only those that are directly annotated with the search term.

You can get the full list of 36 genes like in the web interface with:

out2 <- getBM(attributes=c('ensembl_gene_id', 'go_id'), 
              filters = 'go_parent_term', 
              values = 'GO:0065005', 
              mart = ensembl)
length(unique(out2$ensembl_gene_id))
[1] 36
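
If you want to see which genes are picked up only through child terms, something like the following rough sketch may help. The two helper names (fetch_go_genes and genes_only_via_children) are made up for illustration and are not part of biomaRt; the filter and attribute names are the ones used in the queries above. As far as I understand, the go_id column in the result lists every GO annotation of each returned gene, not just the descendants of the term you searched for, so treat it as context rather than as the matching term.

```r
# Sketch only: both helper names are hypothetical, not biomaRt functions.

# Run the same GO query with a chosen filter ('go' or 'go_parent_term').
fetch_go_genes <- function(go_id, filter_name, mart) {
  biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"),
                 filters = filter_name,
                 values = go_id,
                 mart = mart)
}

# Return the (de-duplicated) rows for genes that appear only in the
# parent-term result, i.e. genes not directly annotated with the term.
genes_only_via_children <- function(direct, with_children) {
  extra <- setdiff(with_children$ensembl_gene_id, direct$ensembl_gene_id)
  unique(with_children[with_children$ensembl_gene_id %in% extra, , drop = FALSE])
}

# Usage (needs a network connection to Ensembl, so left commented out):
# ensembl <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# direct  <- fetch_go_genes("GO:0065005", "go", ensembl)
# nested  <- fetch_go_genes("GO:0065005", "go_parent_term", ensembl)
# extra   <- genes_only_via_children(direct, nested)
# length(unique(extra$ensembl_gene_id))
```

The comparison step is plain base R (setdiff plus subsetting), so it works on any two data frames that have an ensembl_gene_id column, including results you exported from the web interface.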


Thanks very much Mike, this solved my problem!

Dan
