8 days ago by
EMBL Heidelberg / de.NBI
Any BioMart query you can do in the browser you can also do via the biomaRt package, although I'll agree that it's not always obvious what the correct options are. Here's a little example that gets mouse orthologs for three fly genes we're interested in.
First, let's load biomaRt and specify that we want to use the fruit fly dataset:
library(biomaRt)
ensembl <- useEnsembl(biomart = "ensembl",
                      dataset = "dmelanogaster_gene_ensembl")
Now let's define the three genes we're interested in. I'm using what Ensembl refers to as the 'Gene Stable ID' here, which I guess is also the FlyBase ID. You'll have to adapt the filters argument in the final query if your IDs are of a different type.
genes_of_interest <- c("FBgn0036531","FBgn0037375","FBgn0035252")
You can now query BioMart using something like:
getBM(mart = ensembl,
      filters = "ensembl_gene_id",
      values = genes_of_interest,
      attributes = c("ensembl_gene_id",
                     "external_gene_name",
                     "mmusculus_homolog_ensembl_gene",
                     "mmusculus_homolog_associated_gene_name"))
  ensembl_gene_id external_gene_name mmusculus_homolog_ensembl_gene mmusculus_homolog_associated_gene_name
1     FBgn0035252             CG7970             ENSMUSG00000029499                                  Pxmp2
2     FBgn0036531             CG6244             ENSMUSG00000013822                                  Elof1
3     FBgn0037375           kat-60L1
What information you retrieve is determined by the attributes argument. Here we're getting back the same ID we used to search and the fly gene name, plus the analogous values for the orthologs found in mouse. If you want different information, you can use the function searchAttributes() to find what else is available (or you can look on the web interface). Missing values presumably indicate that Ensembl does not have an ortholog mapping between these two species for that gene.
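As a rough sketch of how you might separate the genes with a mouse ortholog from those without, the snippet below retypes the result table from this post by hand (so it runs without querying Ensembl) and splits it on the mouse ID column. It assumes missing values come back as empty strings or NA, which I haven't checked for every attribute; the names res, has_ortholog, with_ortholog and without_ortholog are just ones I've picked for illustration.

```r
# Result table from this post, typed in by hand so the example runs
# offline. Missing mouse values are assumed to be empty strings.
res <- data.frame(
  ensembl_gene_id = c("FBgn0035252", "FBgn0036531", "FBgn0037375"),
  external_gene_name = c("CG7970", "CG6244", "kat-60L1"),
  mmusculus_homolog_ensembl_gene = c("ENSMUSG00000029499",
                                     "ENSMUSG00000013822",
                                     ""),
  mmusculus_homolog_associated_gene_name = c("Pxmp2", "Elof1", ""),
  stringsAsFactors = FALSE
)

# TRUE where a mouse ortholog ID is present (neither NA nor empty)
mouse_id <- res$mmusculus_homolog_ensembl_gene
has_ortholog <- !is.na(mouse_id) & nzchar(mouse_id)

# Split the table into genes with and without a reported ortholog
with_ortholog <- res[has_ortholog, ]
without_ortholog <- res[!has_ortholog, ]

print(without_ortholog$ensembl_gene_id)
```

With a real query you would replace the hand-built res with the data frame returned by getBM().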
You should also note that I think this may return duplicate matches, because paralogs within one genome may be matched to paralogs in the second genome. For example, if FlyGeneA and FlyGeneB are paralogs, and MouseGeneA and MouseGeneB are also paralogs, then you'll end up with four reported pairings.
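To make that concrete, here is a small sketch of how you could spot such many-to-many pairings in a result table. The data are made up to mirror the FlyGeneA/FlyGeneB scenario and the column names fly_gene and mouse_gene are hypothetical; with real getBM() output you would use the ensembl_gene_id and mmusculus_homolog_ensembl_gene columns instead.

```r
# Hypothetical many-to-many result: two fly paralogs, each paired with
# both mouse paralogs. All gene names here are made up.
pairs <- data.frame(
  fly_gene = c("FlyGeneA", "FlyGeneA", "FlyGeneB", "FlyGeneB"),
  mouse_gene = c("MouseGeneA", "MouseGeneB", "MouseGeneA", "MouseGeneB"),
  stringsAsFactors = FALSE
)

# Drop any rows that are exact repeats before counting
pairs <- unique(pairs)

# Number of mouse orthologs reported for each fly gene
orthologs_per_fly_gene <- table(pairs$fly_gene)

# Fly genes that appear in more than one pairing
multi_mapped <- names(orthologs_per_fly_gene)[
  as.vector(orthologs_per_fly_gene) > 1
]

print(orthologs_per_fly_gene)
print(multi_mapped)
```

Whether you then keep all pairings or pick one per gene depends on what you need downstream; this only flags where the choice arises.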