I am trying to retrieve all genes that match a particular GO-ID using biomaRt:
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
goGenes <- getBM(attributes = c("mgi_symbol", "go_id"),
                 filters = "go_id",
                 values = "GO:0098793",
                 mart = ensembl)
nrow(goGenes)
This returns a value of 53. However, if you look at the AmiGO page for this GO term and filter for M. musculus, you see that there are actually 779 genes (384 when you remove duplicated MGI symbols).
For this GO term, the page shows 591 genes after duplicates are removed. But running the function above with this GO term returns 0 genes.
What am I doing wrong here? Why don't the numbers match up?
Sorry for the late response! This worked for me. The only thing to note is that if you pass multiple GO IDs to the values parameter, it will not work the way I was intending in the question. Thanks!
What output are you hoping for when you supply multiple GO terms?
I just realized I never really specified in my question what output I wanted. If you run the code you have supplied above, you get the genes for all child terms. If you supply multiple GO terms, you get the combined genes for all child terms of all the parent terms, which makes sense. But then you have no way of knowing which child term belongs to which parent term. It's all lumped together. Ideally, I could search for a bunch of parent terms and get another column indicating which parent term a given child term is associated with. I got around this by writing a function that accepts GO terms, queries each one separately, and uses rbind() to combine the outputs, with an extra column identifying the original parent GO term. There could be a better way though.
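For anyone landing here later, that workaround could look roughly like the sketch below. This is not the exact function from the comment above. The names genes_by_parent, fetch_from_ensembl, stub_fetch and the parent_go_id column are hypothetical. The "go_parent_term" filter is an assumption about which filter returns child-term annotations, so check listFilters(ensembl) for your dataset. The stub data is made up so the combining step can be tried without querying Ensembl.

```r
# Query each parent GO term separately and tag every row with the parent
# term it came from, then stack the results with rbind().
# `fetch` is any function that takes one GO ID and returns a data frame.
genes_by_parent <- function(parent_terms, fetch) {
  per_term <- lapply(parent_terms, function(term) {
    res <- fetch(term)
    # rep() keeps this safe when a term returns zero rows
    res$parent_go_id <- rep(term, nrow(res))
    res
  })
  do.call(rbind, per_term)
}

# Live fetcher (ASSUMPTION: "go_parent_term" is the filter that also
# matches child terms; verify with listFilters(mart)).
fetch_from_ensembl <- function(mart) {
  function(term) {
    biomaRt::getBM(attributes = c("mgi_symbol", "go_id"),
                   filters = "go_parent_term",
                   values = term,
                   mart = mart)
  }
}

# Offline stub with invented IDs and symbols, only to show the output shape.
stub_fetch <- function(term) {
  fake <- list(
    "GO:A" = data.frame(mgi_symbol = c("GeneA", "GeneB"),
                        go_id = c("GO:A1", "GO:A2")),
    "GO:B" = data.frame(mgi_symbol = "GeneC", go_id = "GO:B1")
  )
  fake[[term]]
}

demo <- genes_by_parent(c("GO:A", "GO:B"), stub_fetch)
print(demo)

# Real usage would be along these lines (needs network access):
# ensembl <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# goGenes <- genes_by_parent("GO:0098793", fetch_from_ensembl(ensembl))
```

Passing the fetcher in as an argument keeps the rbind() logic separate from the Ensembl query, so the filter is easy to swap if "go_parent_term" turns out not to be the right one for a given mart.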