It seems you've stumbled across a combination of attributes that results in an invalid query. If you try running your same query in the Ensembl BioMart web interface you get back the following:
Validation Error: Too many attributes selected for External References
I don't know of any way for biomaRt to check for this, but I suspect whatever the issue is server-side is why you're seeing the complete set of
As for why it's happening, one possible reason is that the attribute name is really misleading. There's very little documentation, but I think this field is only populated for poorly annotated genes that don't have an MGI symbol but have been assigned some speculative HGNC ortholog, e.g. SPATA24. If that's the case, then your query, which explicitly selects genes with MGI symbols, would only ever return results with no value assigned to this field - hence the
I assume you actually want to find the set of orthologous human genes for your starting set of MGI symbols. If that's the case, here's one approach to finding the HGNC symbols for orthologous genes. First we'll load the library, initialise the mart, and list some example MGI gene symbols:
library(biomaRt)
symbols <- c("0610005C13Rik", "Cdc6", "Gfap")
bm <- useMart(biomart = "ensembl", dataset = "mmusculus_gene_ensembl")
Next we get the table of mappings between MGI and Ensembl IDs:
mgi2ensembl <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"),
                     filters = "mgi_symbol",
                     mart = bm,
                     values = symbols)
We then ask for all human orthologs for those Ensembl IDs. As this is an Ensembl dataset you have to use Ensembl IDs as the primary key here.
ensembl2hgnc <- getBM(attributes = c("hsapiens_homolog_associated_gene_name", "ensembl_gene_id"),
                      filters = "ensembl_gene_id",
                      mart = bm,
                      values = mgi2ensembl$ensembl_gene_id)
Finally we merge our two results into a single table to get the final mapping. A blank value indicates no ortholog was reported in Ensembl.
> merge(mgi2ensembl, ensembl2hgnc)
     ensembl_gene_id    mgi_symbol hsapiens_homolog_associated_gene_name
1 ENSMUSG00000017499          Cdc6                                  CDC6
2 ENSMUSG00000020932          Gfap                                  GFAP
3 ENSMUSG00000109644 0610005C13Rik
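One thing to watch with a plain merge() is that it does an inner join, so a gene would silently disappear if the second query returned no row for it at all. Here's a minimal sketch of a more defensive merge. It uses small hand-typed copies of the two tables so it runs without contacting Ensembl, and the names mapping and no_ortholog are just ones I've made up for illustration.

```r
# Hand-typed copies of the two query results shown above (no network needed)
mgi2ensembl <- data.frame(
  mgi_symbol      = c("0610005C13Rik", "Cdc6", "Gfap"),
  ensembl_gene_id = c("ENSMUSG00000109644", "ENSMUSG00000017499",
                      "ENSMUSG00000020932"),
  stringsAsFactors = FALSE
)
ensembl2hgnc <- data.frame(
  hsapiens_homolog_associated_gene_name = c("CDC6", "GFAP", ""),
  ensembl_gene_id = c("ENSMUSG00000017499", "ENSMUSG00000020932",
                      "ENSMUSG00000109644"),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of mgi2ensembl, even one with no match
# in ensembl2hgnc (it would get NA in the ortholog column)
mapping <- merge(mgi2ensembl, ensembl2hgnc,
                 by = "ensembl_gene_id", all.x = TRUE)

# Treat blank strings as missing so both "no ortholog" cases look the same
hgnc <- mapping$hsapiens_homolog_associated_gene_name
mapping$hsapiens_homolog_associated_gene_name[!is.na(hgnc) & hgnc == ""] <- NA

# MGI symbols for which no human ortholog was reported
no_ortholog <- mapping$mgi_symbol[
  is.na(mapping$hsapiens_homolog_associated_gene_name)]
print(no_ortholog)
```

With the real query results you would skip the two data.frame() calls and start from the merge() line.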
There are several ways you can do this with biomaRt, and I wouldn't be surprised if they came up with slightly different results. Mapping between gene symbols and annotation within an organism is fraught with oddities, as is defining orthologs, but the results should be broadly similar.
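As one example of an alternative route, biomaRt's getLDS() links two datasets in a single call. The sketch below wraps it in a helper I've named mgi_to_hgnc_lds purely for illustration (it is not part of biomaRt). It assumes the human dataset is "hsapiens_gene_ensembl" and that "hgnc_symbol" is the attribute you want on the human side. I haven't checked that it returns the same table as the two-step approach, and it needs a live connection to Ensembl when called.

```r
# Hypothetical helper (not part of biomaRt): map MGI symbols to HGNC symbols
# by linking the mouse and human Ensembl datasets with getLDS().
# Calling it requires the biomaRt package and network access.
mgi_to_hgnc_lds <- function(symbols) {
  mouse <- biomaRt::useMart(biomart = "ensembl",
                            dataset = "mmusculus_gene_ensembl")
  human <- biomaRt::useMart(biomart = "ensembl",
                            dataset = "hsapiens_gene_ensembl")
  biomaRt::getLDS(attributes  = "mgi_symbol",   # returned from the mouse mart
                  filters     = "mgi_symbol",
                  values      = symbols,
                  mart        = mouse,
                  attributesL = "hgnc_symbol",  # returned from the human mart
                  martL       = human)
}

# Usage (contacts the Ensembl servers):
# mgi_to_hgnc_lds(c("0610005C13Rik", "Cdc6", "Gfap"))
```

Genes without a linked human gene may simply be absent from the result rather than shown with a blank, so compare against your input symbols if you need to know which ones dropped out.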