Mixed-up column names for getBM()?
@sarahsandmann-8159
Germany

Hi,

I have a question regarding biomaRt. When I execute:

library(biomaRt)
mart38<-useMart(biomart="ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
getBM(attributes=c("hgnc_symbol","rank","ensembl_gene_id","cdna","ensembl_transcript_id","peptide","ensembl_exon_id"),
filters="ensembl_transcript_id",
values="ENST00000634301",
mart=mart38)


The output is:

             rank                           ensembl_transcript_id                 cdna peptide hgnc_symbol ensembl_gene_id ensembl_exon_id
1 ENSG00000110395 ENSE00003791646;ENSE00003787952;ENSE00003787287 Sequence unavailable     CBL       3;1;2 ENST00000634301       HGNC:1541


So the column names don't seem to be assigned in the correct order. Furthermore, the output contains "Sequence unavailable" and "HGNC:1541" instead of the cDNA and peptide sequences. That seems odd, because if we run

getBM(attributes=c("ensembl_transcript_id","peptide"),
filters="ensembl_transcript_id",
values="ENST00000634301",
mart=mart38)


or

getBM(attributes=c("ensembl_transcript_id","cdna"),
filters="ensembl_transcript_id",
values="ENST00000634301",
mart=mart38)


the peptide and cDNA sequences are reported.

I mean, sure, we can get the required information by using several commands, but why is it not working when just using one command? Thanks a lot in advance for your help!

Best, Sarah

@james-w-macdonald-5106
United States

This looks like an issue on the Biomart server side, rather than the biomaRt package. I would imagine it has to do with the fact that you are asking the Biomart to give you two things that you cannot ask for simultaneously, which are the peptide and cDNA sequences. If you go to the Biomart website and try to do the same query as you have done here, you will note under the Sequences attribute page that you are limited to only one selection at a time (both Peptide and cDNA are on this page).

Anyway, looking at the results from the XML query sent to the Biomart server, it looks like the problem is on their end, as the header and body sent back are not ordered correctly. Probably this needs to be caught on the biomaRt side, because trying to fix incorrect return data seems like the harder thing to do.


James' answer looks spot on to me. This is because the Ensembl BioMart isn't designed to give you two different sequence types in one query. The web interface doesn't let you do this, but we can force the query through biomaRt, and the server doesn't know how to cope.

I think this isn't handled elegantly because biomaRt was designed as a generic tool at a time when there were lots of distinct BioMart servers, and adding specific edge cases like this wasn't practical. There's also the getSequence() function, which is Ensembl specific and (I think) doesn't allow you to try to get more than one type of sequence.
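
For reference, a minimal sketch of how getSequence() could be called for this transcript, one sequence type per call. The wrapper `get_sequences_by_type()` and its `fetch` argument are hypothetical names, not part of biomaRt; `id`, `type`, `seqType` and `mart` are existing getSequence() arguments, and the sketch assumes `"cdna"` and `"peptide"` are both accepted `seqType` values for this dataset.

```r
# Hypothetical wrapper (not part of biomaRt): call getSequence() once per
# sequence type and return the results as a named list of data frames.
get_sequences_by_type <- function(id, mart,
                                  seq_types = c("cdna", "peptide"),
                                  fetch = biomaRt::getSequence) {
  out <- lapply(seq_types, function(seq_type) {
    # Exactly one seqType per request, as the server expects.
    fetch(id = id, type = "ensembl_transcript_id",
          seqType = seq_type, mart = mart)
  })
  names(out) <- seq_types
  out
}

# Usage (needs the biomaRt package and network access to Ensembl):
# mart38 <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
#                            dataset = "hsapiens_gene_ensembl")
# seqs <- get_sequences_by_type("ENST00000634301", mart = mart38)
# seqs$cdna
# seqs$peptide
```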

Short answer is that BioMart doesn't let you do this query in one go, so it can't be enabled in biomaRt, but I'll take a look at how easy it is to detect this and warn the user, rather than returning scrambled results without any kind of warning.
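
Until then, the multi-query workaround mentioned in the question can be wrapped in a small helper. This is only a rough sketch: `get_bm_in_parts()` and its `fetch` argument are hypothetical names, not part of biomaRt, and it assumes every query can return an `ensembl_transcript_id` column to merge on. Each sequence type is requested in its own getBM() call and the pieces are joined afterwards.

```r
# Hypothetical helper (not part of biomaRt): run one getBM() call per
# attribute group, then merge the results on the transcript ID so that
# cDNA and peptide are never requested in the same query.
get_bm_in_parts <- function(attribute_groups, values, mart,
                            key = "ensembl_transcript_id",
                            fetch = biomaRt::getBM) {
  parts <- lapply(attribute_groups, function(attrs) {
    # Always ask for the key column so every piece can be joined.
    fetch(attributes = unique(c(key, attrs)),
          filters = key, values = values, mart = mart)
  })
  # Full outer join, so a transcript missing from one query is kept.
  Reduce(function(x, y) merge(x, y, by = key, all = TRUE), parts)
}

# Usage (needs the biomaRt package and network access to Ensembl):
# mart38 <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
#                            dataset = "hsapiens_gene_ensembl")
# get_bm_in_parts(
#   list(c("hgnc_symbol", "ensembl_gene_id"),
#        c("rank", "ensembl_exon_id"),
#        "cdna",
#        "peptide"),
#   values = "ENST00000634301", mart = mart38)
```

If the exon-level query returns one row per exon, the merged table will repeat the gene-level and sequence columns on each exon row.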
