getBM function in biomaRt package
Yongchao Ge ▴ 10
@yongchao-ge-11664
Last seen 7.9 years ago
Icahn School of Medicine at Mount Sinai…

Hi,

There is a bug in the getBM function of the biomaRt package: when the attributes include "cdna" or "gene_exon", the column names of the result are swapped (see the printout of the seq variable from the R code below). It would be nice to have this bug fixed.

##The R code

library(biomaRt)
mart <- useMart(biomart = "ensembl", host = "www.ensembl.org", dataset = "mmusculus_gene_ensembl")
seq <- getBM(filters = "ensembl_gene_id", values = "ENSMUSG00000000103",
             attributes = c("ensembl_transcript_id", "cdna"),  # "gene_exon" is also affected (very messy)
             mart = mart)
print(seq)

##the output of print(seq)

ensembl_transcript_id
1 AGAACTATGGGGCCAG.....
2 AGAAAGACTGGTGAA.....
     cdna
1 ENSMUST00000115891
2 ENSMUST00000187148

 

 

Mike Smith ★ 6.6k
@mike-smith
Last seen 30 minutes ago
EMBL Heidelberg

I have committed a patch that should stop this problem from occurring in the future. It should be available from biomaRt version 2.31.4.

The correct column names are now assigned to the results, and the column order should always match the order in which the attributes were requested in the function call, e.g.:

seq <- getBM(filter = "ensembl_gene_id",
             values = "ENSMUSG00000000103",
             attributes = c("ensembl_transcript_id","cdna"),
             mart = mart,
             bmHeader = FALSE)
apply(seq, 2, strtrim, 30)
     ensembl_transcript_id cdna                            
[1,] "ENSMUST00000115891"  "AGAACTATGGGGCCAGTCTCTGGAGAGCTC"
[2,] "ENSMUST00000187148"  "AGAAAGACTGGTGAACATCAAACGGCCGTT"

You can still get the plain text description for each column using bmHeader = TRUE.
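
To check whether a given biomaRt version already includes this patch, one option is to compare its version string against 2.31.4. The following is only a minimal sketch: `has_column_fix` is a hypothetical helper name (not part of biomaRt), and it does nothing more than compare version numbers.

```r
# Hypothetical helper (not part of biomaRt): TRUE if the given version
# string is at least 2.31.4, the version mentioned above as carrying the fix.
has_column_fix <- function(version) {
  package_version(version) >= package_version("2.31.4")
}

has_column_fix("2.31.3")  # a version before the patch
has_column_fix("2.31.4")  # the first version expected to include it

# With biomaRt installed, the installed version could be checked with:
# has_column_fix(as.character(packageVersion("biomaRt")))
```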

Mike Smith ★ 6.6k
@mike-smith
Last seen 30 minutes ago
EMBL Heidelberg

I agree this is strange behavior. As a temporary fix, you can try the argument bmHeader = TRUE. This uses the plain-text description of each field, rather than its ID, as the column name, e.g.:

seq<-getBM(filter="ensembl_gene_id",
           values="ENSMUSG00000000103",
           attributes=c("ensembl_transcript_id","cdna"),
           mart=mart,
           bmHeader = TRUE)
> colnames(seq)
[1] "cDNA sequences" "Transcript ID" 

I'll take a look at making sure the ID version matches correctly in the next few weeks.
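
Until the ID version is fixed, the swapped names could also be repaired after the query by looking at the column contents. This is a rough sketch under stated assumptions: `fix_swapped_names` is a hypothetical helper (not part of biomaRt), it assumes a two-column result for the attributes "ensembl_transcript_id" and "cdna", and it assumes mouse transcript IDs look like "ENSMUST" followed by digits. The toy data frame mimics the mislabelled output from the question, with the sequences truncated.

```r
# Hypothetical helper (not part of biomaRt). Assumes a two-column result
# for attributes "ensembl_transcript_id" and "cdna", and that mouse
# transcript IDs look like "ENSMUST" followed by digits.
fix_swapped_names <- function(df) {
  looks_like_id <- function(x) all(grepl("^ENSMUST[0-9]+$", x))
  # Position of the column whose values all look like transcript IDs
  id_col <- unname(which(vapply(df, looks_like_id, logical(1))))
  stopifnot(ncol(df) == 2, length(id_col) == 1)
  names(df)[id_col] <- "ensembl_transcript_id"
  names(df)[-id_col] <- "cdna"
  # Return the columns in the order the attributes were requested
  df[, c("ensembl_transcript_id", "cdna")]
}

# Toy data frame mimicking the mislabelled output (sequences truncated)
broken <- data.frame(
  ensembl_transcript_id = c("AGAACTATGGGGCCAGTCTCTGGAGAGCTC",
                            "AGAAAGACTGGTGAACATCAAACGGCCGTT"),
  cdna = c("ENSMUST00000115891", "ENSMUST00000187148"),
  stringsAsFactors = FALSE
)
fixed <- fix_swapped_names(broken)
colnames(fixed)
```

The helper stops with an error rather than guessing if the result does not have exactly two columns or if it cannot identify exactly one ID column.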
