biomaRt getBM() function does not recognize gene ID character vector values generated from csv file for annotation
agdif ▴ 10

I have a multi-column csv file. In the first column are ensembl IDs that I would like to annotate using biomaRt in RStudio.

After reading the csv file into R as a data frame, I converted the first column with the IDs to a character vector, since the values parameter of getBM() expects a vector (or a list of vectors). However, getBM() returns 0 observations of 4 variables, so it does not seem to recognize my character vector as valid values to assign the annotation to.

Any input as to how this can be resolved? Does it have to do with the data frame or character vector? 

(file structure and R code below)

rbh,id,baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
ENSACAP00000006835,gopAga1_00006538-RA,4.014843684,22.3613989,3.478310891,6.428809729,1.29E-10,1.58E-06
ENSACAP00000013416,gopAga1_00003775-RA,5.311678741,-3.207734101,0.845538991,-3.793715173,0.00014841,0.036437969
ENSACAP00000021108,gopAga1_00009907-RA,13.1840533,-2.49788257,0.67830511,-3.682535384,0.000230926,0.044218683
ENSACAP00000006847,gopAga1_00001219-RA,16.53058893,-1.282170299,0.351344313,-3.64932703,0.000262928,0.048092316
ENSACAP00000020399,gopAga1_00019120-RA,23.57386411,2.167299405,0.537139555,4.034890717,5.46E-05,0.021658211

#load matched id file into R as csv
dat=read.csv("/Users/cindyxu/Desktop/test/rbh_healthy_vs_unhealthy.csv", header=TRUE)

#set column with gene IDs as a list of character vectors
geneid=as.character(dat$rbh)

#load biomart and acar dataset
library("biomaRt")
ensembl=useMart("ensembl", dataset="acarolinensis_gene_ensembl")
genemap=getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"), filters="ensembl_gene_id", values=geneid, mart=ensembl)

Output: genemap 0 obs. of 4 variables

 

biomart getBM() csv files annotation ensembl • 1.2k views
Mike Smith ★ 6.6k
EMBL Heidelberg

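Your rbh column contains protein IDs, not gene IDs: Ensembl protein IDs have a P before the number (ENSACAP...), while gene IDs have a G (ENSACAG...). The ensembl_gene_id filter therefore matches nothing, so you need to change your query accordingly, e.g.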

> getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"),
+       filters="ensembl_peptide_id", 
+       values=geneid, mart=ensembl)

     ensembl_gene_id entrezgene hgnc_symbol                                                     description
1 ENSACAG00000006713  100559986       FOXP2             forkhead box P2 [Source:HGNC Symbol;Acc:HGNC:13875]
2 ENSACAG00000006970  100560550       IFT46 intraflagellar transport 46 [Source:HGNC Symbol;Acc:HGNC:26146]
3 ENSACAG00000013687  100552532                                                                            
4 ENSACAG00000023622         NA                                                                            
5 ENSACAG00000026443         NA
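
If you want to catch this kind of mismatch before querying, one option is to look at the feature-type letter in each ID and pick the filter from it. The following is only a rough sketch in base R: the helper name ensembl_filter_for is made up for illustration (it is not part of biomaRt), and it assumes the usual Ensembl stable ID layout of "ENS", an optional species prefix, one feature-type letter (G, T or P) and 11 digits, optionally followed by a version suffix.

```r
# Hypothetical helper (not part of biomaRt): guess the matching biomaRt
# filter name from the feature-type letter of an Ensembl stable ID.
# Assumes IDs look like ENS + optional species prefix + G/T/P + 11 digits.
ensembl_filter_for <- function(ids) {
  ids <- trimws(as.character(ids))
  pattern <- "^ENS[A-Z]*([GTP])[0-9]{11}(\\.[0-9]+)?$"
  # Extract the letter just before the digits; NA for anything that
  # does not look like an Ensembl gene/transcript/protein ID.
  type <- ifelse(grepl(pattern, ids), sub(pattern, "\\1", ids), NA_character_)
  filters <- c(G = "ensembl_gene_id",
               T = "ensembl_transcript_id",
               P = "ensembl_peptide_id")
  unname(filters[type])
}

# IDs taken from the post: a protein ID, a gene ID, and a non-Ensembl ID.
ids <- c("ENSACAP00000006835", "ENSACAG00000006713", "gopAga1_00006538-RA")
ensembl_filter_for(ids)
```

Running table(ensembl_filter_for(dat$rbh), useNA = "ifany") on the rbh column should then show which filter the IDs correspond to, and whether any entries are not Ensembl IDs at all.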
