This may be related to an earlier getBM question I posted ("biomaRt returns different results for the same attribute, depending on which attributes I request"), but if so, I still need guidance:
I made a Mart object as follows:
Mmmart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "mmusculus_gene_ensembl")
Now I use getBM to request six different sets of attributes for one non-coding RNA (Xist) and three protein-coding genes, and the responses seem rather unpredictable, depending on which set of attributes I request:
(1)
> getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide"),
+       filters = "mgi_symbol",
+       values = c("Xist", "Yy1", "Hist1h1c", "Hist1h3c"),
+       mart = Mmmart)
  refseq_mrna refseq_ncrna refseq_peptide
1   NM_015786           NA      NP_056601
2   NM_175653           NA      NP_783584
3   NM_009537           NA      NP_033563
Why did I get nothing for my non-coding RNAs? There is a non-coding RNA in there, and Biomart returns it to me if I use the same command except changing the attributes that I request to only the non-coding RNA:
(2)
attributes = c("refseq_ncrna")
  refseq_ncrna
1    NR_001463
So why didn't my request in (1) give something like this:
  refseq_mrna refseq_ncrna refseq_peptide
1   NM_015786           NA      NP_056601
2   NM_175653           NA      NP_783584
3   NM_009537           NA      NP_033563
4          NA    NR_001463             NA
?
(3)
Now I try to add names to my request in (1) so that I know which pair of "NM_" and "NP_" identifiers go with which gene symbol, but it complains with an error:
attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide", "mgi_symbol")
Error in getBM(attributes = c("refseq_mrna", "refseq_ncrna", "refseq_peptide", :
  Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References
Are four requested attributes really too many, or are the attributes I've requested somehow incompatible?
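A possible workaround, sketched under assumptions: if the error reflects a cap on how many external-reference attributes a single query may carry (a guess; the error message does not confirm it), then requesting one RefSeq attribute at a time, always paired with mgi_symbol, and stacking the results would avoid it. The helper names `refseq_by_symbol` and `stack_one` below are hypothetical and not part of biomaRt; only `getBM` is.

```r
# Sketch only: query each RefSeq attribute separately, always together with
# mgi_symbol, then stack the results into one long table.
# refseq_by_symbol() and stack_one() are hypothetical helpers, not biomaRt API.
refseq_by_symbol <- function(symbols, mart,
                             attrs = c("refseq_mrna", "refseq_ncrna",
                                       "refseq_peptide")) {
  per_attr <- lapply(attrs, function(a) {
    res <- biomaRt::getBM(attributes = c(a, "mgi_symbol"),
                          filters = "mgi_symbol",
                          values = symbols,
                          mart = mart)
    stack_one(res, a)
  })
  do.call(rbind, per_attr)
}

# Reshape one two-column getBM result into (mgi_symbol, type, refseq),
# dropping rows whose identifier is NA or an empty string.
stack_one <- function(res, attr) {
  ids <- res[[attr]]
  keep <- !is.na(ids) & nzchar(ids)
  data.frame(mgi_symbol = res$mgi_symbol[keep],
             type = rep(attr, sum(keep)),
             refseq = ids[keep],
             stringsAsFactors = FALSE)
}
```

With a Mart object such as the one created above, the call would be `refseq_by_symbol(c("Xist", "Yy1", "Hist1h1c", "Hist1h3c"), Mmmart)`. The long layout (one identifier per row) is chosen because it does not require the mRNA, ncRNA, and peptide identifiers to line up on a single row.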
(5)
If I ask for the same thing except dropping the request for the non-coding RNA attribute, it works fine, but doesn't acknowledge anything for Xist, which is still in the values:
attributes = c("refseq_mrna", "refseq_peptide", "mgi_symbol")
  refseq_mrna refseq_peptide mgi_symbol
1   NM_015786      NP_056601   Hist1h1c
2   NM_175653      NP_783584   Hist1h3c
3   NM_009537      NP_033563        Yy1
(6)
But then when I do the converse and ask for the non-coding RNA with its symbol, it acknowledges the coding genes (unlike in (5) for the non-coding RNA) and gives me two Xist rows, one with and one without an "NR_" identifier:
attributes = c("refseq_ncrna", "mgi_symbol")
  refseq_ncrna mgi_symbol
1                Hist1h1c
2                Hist1h3c
3                    Xist
4    NR_001463       Xist
5                     Yy1
If I do things like this with an OrgDb object, I get everything with one simple request:
desired <- c("Xist", "Yy1", "Hist1h1c", "Hist1h3c")
desiredRefs <- select(x = mouseOrgDb,
                      keys = keys(mouseOrgDb, keytype = "SYMBOL")[which(keys(mouseOrgDb, keytype = "SYMBOL") %in% desired)],
                      keytype = "SYMBOL",
                      columns = c("SYMBOL", "REFSEQ"))
> desiredRefs
     SYMBOL       REFSEQ
1       Yy1    NM_009537
2       Yy1    NP_033563
3       Yy1 XM_006515820
4       Yy1 XP_006515883
5  Hist1h1c    NM_015786
6  Hist1h1c    NP_056601
7      Xist    NR_001463
8      Xist    NR_001570
9  Hist1h3c    NM_175653
10 Hist1h3c    NP_783584
(I initially had requested the "refseq_ncrna_predicted" and "refseq_peptide_predicted" attributes in my getBM queries, but that caused even more trouble.)
Looking back at my getBM request (6), I see that the empty Xist slot was probably for "NR_001570", since it seems to know that there were two, though it only gave me "NR_001463".
One response to my question could, of course, be to just forget about biomaRt and stick with OrgDb-style requests, but I would like to understand what is going on with biomaRt. Any advice would be appreciated.
Thanks.
Eric
Dear Thomas,
Thank you very much. I really appreciate it.
Eric