Question

getBM function in biomaRt package


Yongchao Ge

@yongchao-ge-11664


Icahn School of Medicine at Mount Sinai…

Hi,

There is a bug in the getBM function of the biomaRt package: when the attributes include "cdna" or "gene_exon", the column names of the result are shifted (see the printed seq variable from the R code below). It would be nice to have this bug fixed.

## The R code

library(biomaRt)
mart <- useMart(biomart = "ensembl", host = "www.ensembl.org",
                dataset = "mmusculus_gene_ensembl")
seq <- getBM(filters = "ensembl_gene_id", values = "ENSMUSG00000000103",
             attributes = c("ensembl_transcript_id", "cdna"),  # "gene_exon" is even messier
             mart = mart)
print(seq)

## The output of print(seq)

  ensembl_transcript_id                cdna
1 AGAACTATGGGGCCAG.....  ENSMUST00000115891
2 AGAAAGACTGGTGAA.....   ENSMUST00000187148


score 1 · Answer 1 · 2017-01-12

I have committed a patch that should stop this problem from occurring. It should be available from biomaRt version 2.31.4 onwards.

The correct column names will now be assigned to the results, and the column order should always match the order in which the attributes were requested in the function call, e.g.:

seq <- getBM(filters = "ensembl_gene_id",
             values = "ENSMUSG00000000103",
             attributes = c("ensembl_transcript_id", "cdna"),
             mart = mart,
             bmHeader = FALSE)

apply(seq, 2, strtrim, 30)
     ensembl_transcript_id cdna                            
[1,] "ENSMUST00000115891"  "AGAACTATGGGGCCAGTCTCTGGAGAGCTC"
[2,] "ENSMUST00000187148"  "AGAAAGACTGGTGAACATCAAACGGCCGTT"

You can still get the plain text description for each column using bmHeader = TRUE.

Wolfgang Huber · Answer 2 · 2016-12-26

I agree this is strange behavior. As a temporary fix, try the argument bmHeader = TRUE, which labels each column with the plain-text description of the field rather than its ID, e.g.:

seq <- getBM(filters = "ensembl_gene_id",
             values = "ENSMUSG00000000103",
             attributes = c("ensembl_transcript_id", "cdna"),
             mart = mart,
             bmHeader = TRUE)

> colnames(seq)
[1] "cDNA sequences" "Transcript ID"

I'll take a look at making sure the ID version matches correctly in the next few weeks.
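For biomaRt versions earlier than 2.31.4, where upgrading is not an option, the swapped names can also be repaired by hand. The sketch below is illustrative only. It assumes the sole problem is that the two column names are swapped, as in the output shown in the question, and that transcript IDs can be recognised by the assumed pattern `^ENSMUST[0-9]+$` (mouse transcripts only). `fix_swapped_names` is a hypothetical helper, not part of biomaRt. A small hand-built data frame stands in for a real getBM result so the example runs offline.

```r
# Hypothetical helper (not part of biomaRt): if the ID and sequence columns
# came back with swapped names, swap the names back and put the columns in the
# requested order. Any other columns are kept after these two.
fix_swapped_names <- function(df, id_col = "ensembl_transcript_id",
                              seq_col = "cdna",
                              id_pattern = "^ENSMUST[0-9]+$") {
  # TRUE when every value in x matches the assumed transcript ID pattern
  looks_like_id <- function(x) all(grepl(id_pattern, x))
  if (!looks_like_id(df[[id_col]]) && looks_like_id(df[[seq_col]])) {
    # Swap the two labels so each name matches the data underneath it
    idx <- match(c(id_col, seq_col), names(df))
    names(df)[idx] <- c(seq_col, id_col)
  }
  # Requested order first, then any remaining columns
  others <- setdiff(names(df), c(id_col, seq_col))
  df[, c(id_col, seq_col, others), drop = FALSE]
}

# Mock result mimicking the shifted output from the question (sequences truncated)
seq <- data.frame(
  ensembl_transcript_id = c("AGAACTATGGGGCCAG", "AGAAAGACTGGTGAA"),
  cdna = c("ENSMUST00000115891", "ENSMUST00000187148"),
  stringsAsFactors = FALSE
)
fixed <- fix_swapped_names(seq)
print(fixed)
```

The ID check is only a heuristic: for other species or attributes the pattern would need changing, and with biomaRt 2.31.4 or later the repair should not be needed at all.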