This may be related to an earlier question I posted, "getBM biomaRt returns different results for the same attribute, depending on which attributes I request", but if so, I'm still in need of guidance:
I made a Mart object as follows:
Mmmart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")
Now I will use "getBM" to ask for six different sets of attributes for 1 non-coding RNA (Xist) and three protein-coding genes, and I will get what seem to me to be rather unpredictable responses depending on which set of attributes I request:
(1)
> getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide"),
+ filters = "mgi_symbol",
+ values = c("Xist", "Yy1", "Hist1h1c", "Hist1h3c"),
+ mart = Mmmart)
refseq_mrna refseq_ncrna refseq_peptide
1 NM_015786 NA NP_056601
2 NM_175653 NA NP_783584
3 NM_009537 NA NP_033563
Why did I get nothing for my non-coding RNAs? There is a non-coding RNA in there, and Biomart returns it to me if I use the same command except changing the attributes that I request to only the non-coding RNA:
(2)
attributes = c("refseq_ncrna")
refseq_ncrna
1 NR_001463
So why didn't my request in (1) give something like this:
refseq_mrna refseq_ncrna refseq_peptide
1 NM_015786 NA NP_056601
2 NM_175653 NA NP_783584
3 NM_009537 NA NP_033563
4 NA NR_001463 NA
?
(3)
Now I try to add names to my request in (1) so that I know which pair of "NM_" and "NP_" identifiers go with which gene symbol, but it complains with an error:
attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide", "mgi_symbol")
Error in getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide", :
Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References
Is 4 requested attributes really too many, or is it instead that the attributes I've requested are somehow incompatible?
(5)
If I ask for the same thing except dropping the request for the non-coding RNA attribute, it works fine, but doesn't acknowledge anything for Xist, which is still in the values:
attributes = c("refseq_mrna", "refseq_peptide", "mgi_symbol")
refseq_mrna refseq_peptide mgi_symbol
1 NM_015786 NP_056601 Hist1h1c
2 NM_175653 NP_783584 Hist1h3c
3 NM_009537 NP_033563 Yy1
(6)
But then when I do the converse and ask for the non-coding RNA with its symbol, it acknowledges the coding genes (unlike in (5) for the non-coding RNA) and gives me two Xist rows, one with and one without an "NR_" identifier:
attributes = c("refseq_ncrna", "mgi_symbol")
  refseq_ncrna mgi_symbol
1                Hist1h1c
2                Hist1h3c
3                    Xist
4    NR_001463       Xist
5                     Yy1
>
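For what it's worth, the kind of combined table I was hoping for in (1) can be assembled by hand from the results of (5) and (6). Here is a minimal sketch in base R. I typed the values in from the output above, so it runs without a BioMart connection, and I am assuming the blank cells in (6) are empty strings. The names `coding`, `noncoding`, and `combined` are only illustrative.

```r
# Results of queries (5) and (6), typed in by hand from the output above.
coding <- data.frame(
  refseq_mrna    = c("NM_015786", "NM_175653", "NM_009537"),
  refseq_peptide = c("NP_056601", "NP_783584", "NP_033563"),
  mgi_symbol     = c("Hist1h1c", "Hist1h3c", "Yy1"),
  stringsAsFactors = FALSE
)
noncoding <- data.frame(
  # Assumption: the blank cells printed in (6) are empty strings.
  refseq_ncrna = c("", "", "", "NR_001463", ""),
  mgi_symbol   = c("Hist1h1c", "Hist1h3c", "Xist", "Xist", "Yy1"),
  stringsAsFactors = FALSE
)

# Keep only the rows that actually carry a non-coding identifier.
noncoding <- noncoding[noncoding$refseq_ncrna != "", ]

# Full outer join on the gene symbol, so a gene present in either
# result is kept; missing identifiers become NA.
combined <- merge(coding, noncoding, by = "mgi_symbol", all = TRUE)
combined <- combined[, c("mgi_symbol", "refseq_mrna",
                         "refseq_ncrna", "refseq_peptide")]
print(combined)
```

This gives one row per gene, with NA wherever a gene has no identifier of that type, but it takes two queries plus a merge to get there.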
If I do things like this with an OrgDb object, I get everything with one simple request:
desired <- c("Xist", "Yy1", "Hist1h1c", "Hist1h3c")
desiredRefs <- select(x = mouseOrgDb,
keys = keys(mouseOrgDb, keytype = "SYMBOL")[which(keys(mouseOrgDb, keytype = "SYMBOL") %in% desired)],
keytype = "SYMBOL",
columns = c("SYMBOL", "REFSEQ"))
> desiredRefs
SYMBOL REFSEQ
1 Yy1 NM_009537
2 Yy1 NP_033563
3 Yy1 XM_006515820
4 Yy1 XP_006515883
5 Hist1h1c NM_015786
6 Hist1h1c NP_056601
7 Xist NR_001463
8 Xist NR_001570
9 Hist1h3c NM_175653
10 Hist1h3c NP_783584
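The OrgDb result puts every RefSeq identifier in a single REFSEQ column, whereas biomaRt splits them by type. To compare the two, the identifiers can be labelled by their RefSeq prefix (NM_ = mRNA, NR_ = non-coding RNA, NP_ = protein, and XM_/XP_ = predicted mRNA/protein). A rough sketch, with the table above typed in by hand; the `category` column and the `prefixLabels` lookup are names I made up for illustration:

```r
# The select() output shown above, typed in by hand.
desiredRefs <- data.frame(
  SYMBOL = c("Yy1", "Yy1", "Yy1", "Yy1", "Hist1h1c", "Hist1h1c",
             "Xist", "Xist", "Hist1h3c", "Hist1h3c"),
  REFSEQ = c("NM_009537", "NP_033563", "XM_006515820", "XP_006515883",
             "NM_015786", "NP_056601", "NR_001463", "NR_001570",
             "NM_175653", "NP_783584"),
  stringsAsFactors = FALSE
)

# Lookup from RefSeq accession prefix to molecule type.
prefixLabels <- c(NM = "mRNA", NR = "ncRNA", NP = "peptide",
                  XM = "predicted mRNA", XP = "predicted peptide")

# Everything before the first underscore is the prefix.
prefix <- sub("_.*$", "", desiredRefs$REFSEQ)
desiredRefs$category <- unname(prefixLabels[prefix])

print(desiredRefs)
```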
(I initially had requested the "refseq_ncrna_predicted" and "refseq_peptide_predicted" attributes in my getBM queries, but that caused even more trouble.)
Looking back at my getBM request (6), I see that the empty Xist slot was probably for "NR_001570", since it seems to know that there were two, though it only gave me "NR_001463".
One response to my question could, of course, be to just forget about biomaRt and stick with OrgDb-style requests, but I would like to understand what is going on with biomaRt. Any advice would be appreciated.
Thanks.
Eric

Dear Thomas,
Thank you very much. I really appreciate it.
Eric