Question: biomaRt: filtering on attributes that aren't in listFilters
so wrote, 15 days ago:

Hi, I have a “strategy” question.

I searched the documentation and forums, and I think it's not possible to filter by attributes that don’t come up using the listFilters function (e.g. GO description). (If it’s not clear what I want to do, I essentially want to follow this example, but filter on the GO description using the value “MAP kinase activity”, rather than on GO IDs using the value “GO:0004707”.)

My current solution is to download all the GO IDs and GO descriptions in a mart, search that table to get unique GO IDs, then query biomaRt with those IDs. Is this the recommended way to do it? I think I would really only need the unique GO IDs and descriptions (vs. downloading everything from each mart), but I'm not confident the data in e.g. GO.db would match the data in biomaRt.

I would appreciate any advice/comments. Thank you in advance for your help!

Answer: biomaRt: filtering on attributes that aren't in listFilters
Mike Smith (EMBL Heidelberg / de.NBI) wrote, 15 days ago:

You can use the function searchFilters() to try and find a filter you're interested in. Since the filter ids can sometimes be a bit cryptic, it looks in both the id and the more verbose description to try and find a match, and hopefully returns a list that's a bit easier to look through than getting everything back via listFilters(). Here's an example with the Human Genes mart:

mart <- useEnsembl('ensembl', dataset = 'hsapiens_gene_ensembl')
searchFilters(mart, 'go_')
                   name             description
188      go_parent_term   Parent term accession
189      go_parent_name        Parent term name
190    go_evidence_code        GO Evidence code
230 with_cdingo_homolog Orthologous Dingo Genes

Here the filter that corresponds to the GO description is probably 'Parent term name' (go_parent_name). It's still not a perfect match to how things are named on the Ensembl website, but it should be easier to check the few options here if it's not immediately clear.
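To illustrate the idea of searching both the filter id and its description, here is a minimal sketch in base R. The helper search_filter_table() is hypothetical (it is not part of biomaRt), and the table is a hand-built stand-in using the rows shown above plus one invented row, so no connection to Ensembl is needed. Ignoring case is a choice made in this sketch, not a statement about how searchFilters() behaves.

```r
# Hypothetical helper (not a biomaRt function): keep the rows of a filter
# table whose id OR description matches the pattern, ignoring case.
search_filter_table <- function(filters, pattern) {
  hit <- grepl(pattern, filters$name, ignore.case = TRUE) |
    grepl(pattern, filters$description, ignore.case = TRUE)
  filters[hit, , drop = FALSE]
}

# Stand-in table with the same two columns as listFilters() output.
# The last row is invented purely so that something gets filtered out.
filters <- data.frame(
  name = c("go_parent_term", "go_parent_name", "go_evidence_code",
           "with_cdingo_homolog", "example_other_filter"),
  description = c("Parent term accession", "Parent term name",
                  "GO Evidence code", "Orthologous Dingo Genes",
                  "Unrelated example filter"),
  stringsAsFactors = FALSE
)

# "parent term" only appears in the descriptions, not in the ids
res <- search_filter_table(filters, "parent term")
print(res)
```

With a live connection, the same helper could be given listFilters(mart) directly. For this question, the relevant row is go_parent_name.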

You can then use that as a filter on the mart e.g.

getBM(mart = mart,
      filters = "go_parent_name",
      values = "MAP kinase activity",
      attributes = c("ensembl_gene_id"))
1  ENSG00000188130
2  ENSG00000166484
3  ENSG00000185386
4  ENSG00000181085
5  ENSG00000141639

One thing to bear in mind is that the search here is case sensitive, so it's very easy to get zero results for an otherwise fine-looking query, e.g.

getBM(mart = mart,
      filters = "go_parent_name",
      values = "MAP Kinase Activity",
      attributes = c("ensembl_gene_id"))
[1] ensembl_gene_id
<0 rows> (or 0-length row.names)

It might be preferable to stick with using GO IDs unless you're confident that your list of description terms matches the form used internally by Ensembl.
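If you do want to start from descriptions, one way to guard against the case problem is to resolve each term against a reference table of GO IDs and names before querying. The sketch below is a rough illustration in base R: match_go_terms() is a hypothetical helper, and go_table is a tiny hand-made stand-in for whatever table you actually download (for example the GO ID / description table described in the question). It assumes the spelling in that table is the one Ensembl uses internally, which you would need to check.

```r
# Hypothetical helper: map user-supplied GO term names onto the exact
# spelling in a reference table, ignoring case and surrounding whitespace.
# Terms with no match come back as NA instead of being dropped.
match_go_terms <- function(terms, go_table) {
  key <- tolower(trimws(terms))
  idx <- match(key, tolower(go_table$go_name))
  data.frame(
    query   = terms,
    go_id   = go_table$go_id[idx],
    go_name = go_table$go_name[idx],
    stringsAsFactors = FALSE
  )
}

# Illustrative stand-in for a downloaded table of GO IDs and names
go_table <- data.frame(
  go_id   = c("GO:0004707", "GO:0004672"),
  go_name = c("MAP kinase activity", "protein kinase activity"),
  stringsAsFactors = FALSE
)

resolved <- match_go_terms(c("MAP Kinase Activity", "not a real term"),
                           go_table)
print(resolved)
```

The non-missing go_name values carry the table's own capitalisation, so they could be passed as values with the go_parent_name filter, or the matching go_id values could be used with go_parent_term instead. Unmatched terms show up as NA up front rather than silently producing zero rows.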
