Question

biomaRt getBM() function does not recognize gene ID character vector values generated from csv file for annotation



Asked by agdif

I have a multi-column CSV file. The first column contains Ensembl IDs that I would like to annotate using biomaRt in RStudio.

After reading the CSV file into R as a data frame, I converted the first column (the IDs) to a character vector, since the values argument of getBM() takes a vector of filter values (or a list of vectors when several filters are used). However, getBM() returns 0 observations of 4 variables, so none of the IDs in my character vector are being matched and annotated.

Any input as to how this can be resolved? Does it have to do with the data frame or character vector?

(file structure and R code below)

rbh,id,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
ENSACAP00000006835,gopAga1_00006538-RA,4.014843684,22.3613989,3.478310891,6.428809729,1.29E-10,1.58E-06
ENSACAP00000013416,gopAga1_00003775-RA,5.311678741,-3.207734101,0.845538991,-3.793715173,0.00014841,0.036437969
ENSACAP00000021108,gopAga1_00009907-RA,13.1840533,-2.49788257,0.67830511,-3.682535384,0.000230926,0.044218683
ENSACAP00000006847,gopAga1_00001219-RA,16.53058893,-1.282170299,0.351344313,-3.64932703,0.000262928,0.048092316
ENSACAP00000020399,gopAga1_00019120-RA,23.57386411,2.167299405,0.537139555,4.034890717,5.46E-05,0.021658211

#load matched id file into R as csv
dat=read.csv("/Users/cindyxu/Desktop/test/rbh_healthy_vs_unhealthy.csv", header=TRUE)

#store the ID column as a character vector
geneid=as.character(dat$rbh)

#load biomart and acar dataset
library("biomaRt")
ensembl=useMart("ensembl", dataset="acarolinensis_gene_ensembl")
genemap=getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"), filters="ensembl_gene_id", values=geneid, mart=ensembl)

Output: genemap has 0 obs. of 4 variables
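On the data frame versus character vector question: a plain character vector is what getBM() expects for a single filter, so that part should be fine as long as the values themselves are clean. Below is a minimal sketch, using three rows copied from the CSV above, of reading the column directly as character and tidying it. It only confirms that the vector is well-formed. It does not check whether the IDs match the filter passed to getBM().

```r
# A few rows copied from the CSV above; for the real file, pass its path
# to read.csv() instead of text = csv_text.
csv_text <- "rbh,id,baseMean
ENSACAP00000006835,gopAga1_00006538-RA,4.014843684
ENSACAP00000013416,gopAga1_00003775-RA,5.311678741
ENSACAP00000021108,gopAga1_00009907-RA,13.1840533"

# stringsAsFactors = FALSE keeps text columns as character
# (this only became the default in R 4.0.0).
dat <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Trim stray whitespace and drop empty or duplicated IDs.
ids <- unique(trimws(dat$rbh))
ids <- ids[nzchar(ids)]

is.character(ids)  # TRUE
length(ids)        # 3
```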


Edited by Mike Smith

score 1 · Answer 1 · 2018-07-05

Your rbh column contains Ensembl protein IDs (the ENSACAP prefix), not gene IDs (ENSACAG), so no rows match the ensembl_gene_id filter. Change your query to filter on ensembl_peptide_id instead, e.g.

> getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"),
+       filters="ensembl_peptide_id", 
+       values=geneid, mart=ensembl)

     ensembl_gene_id entrezgene hgnc_symbol                                                     description
1 ENSACAG00000006713  100559986       FOXP2             forkhead box P2 [Source:HGNC Symbol;Acc:HGNC:13875]
2 ENSACAG00000006970  100560550       IFT46 intraflagellar transport 46 [Source:HGNC Symbol;Acc:HGNC:26146]
3 ENSACAG00000013687  100552532                                                                            
4 ENSACAG00000023622         NA                                                                            
5 ENSACAG00000026443         NA
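To catch this kind of mismatch before querying, you can read the feature type off the IDs themselves. In Ensembl stable IDs, the letter immediately before the digits marks the feature type (G = gene, T = transcript, P = protein), as in ENSACAP versus ENSACAG above. The helper below, ensembl_filter_for(), is hypothetical and not part of biomaRt. It is a minimal sketch that maps each ID to the matching BioMart filter name and returns NA for anything it does not recognise.

```r
# Hypothetical helper (not part of biomaRt): map Ensembl stable IDs to the
# BioMart filter matching their feature type. The last letter before the
# digits encodes the type: G = gene, T = transcript, P = protein.
ensembl_filter_for <- function(ids) {
  filters <- c(G = "ensembl_gene_id",
               T = "ensembl_transcript_id",
               P = "ensembl_peptide_id")
  # "ENS", optional species code, type letter, digits, optional ".N" version.
  pattern <- "^ENS[A-Z]*([A-Z])[0-9]+(\\.[0-9]+)?$"
  type <- ifelse(grepl(pattern, ids), sub(pattern, "\\1", ids), NA_character_)
  # Unknown letters (e.g. E for exons) and non-Ensembl IDs give NA.
  unname(filters[type])
}

geneid <- c("ENSACAP00000006835", "ENSACAP00000013416")
unique(ensembl_filter_for(geneid))
# [1] "ensembl_peptide_id"

# With a single filter type, the result can be passed straight to getBM():
# getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "description"),
#       filters = unique(ensembl_filter_for(geneid)),
#       values = geneid, mart = ensembl)
```

If the IDs in a column turn out to be of mixed types, this returns more than one filter name, and they would need to be queried separately. Note also that in more recent Ensembl releases the Entrez attribute is named entrezgene_id rather than entrezgene. You can confirm this with listAttributes(ensembl).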