Question: biomaRt getBM() function does not recognize gene ID character vector values generated from csv file for annotation
9 months ago by
agdif0 wrote:

I have a multi-column csv file. The first column contains Ensembl IDs that I would like to annotate using biomaRt in RStudio.

After reading the csv file into R as a data frame, I converted the first column with the IDs to a character vector, since the values parameter of getBM() requires a list of vectors as its argument. However, when I run getBM() it returns 0 observations of 4 variables, so it does not recognize my character vector as the values to assign the annotation to.

Any input as to how this can be resolved? Does it have to do with the data frame or character vector? 

(file structure and R code below)


#load matched id file into R as csv
dat=read.csv("/Users/cindyxu/Desktop/test/rbh_healthy_vs_unhealthy.csv", header=TRUE)

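#set column with gene IDs as a character vector
#(this line is missing from the post; taking the first column is assumed from the description above)
geneid=as.character(dat[,1])
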
#load biomart and acar dataset
library(biomaRt)
ensembl=useMart("ensembl", dataset="acarolinensis_gene_ensembl")
genemap=getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"),
              filters="ensembl_gene_id",
              values=geneid, mart=ensembl)

Output: genemap 0 obs. of 4 variables


Answer: biomaRt getBM() function does not recognize gene ID character vector values generated from csv file for annotation
9 months ago by
Mike Smith (EMBL Heidelberg / de.NBI) wrote:

Your rbh column contains protein IDs, not gene IDs, so you need to change the filter in your query accordingly, e.g.:

> getBM(attributes=c("ensembl_gene_id", "entrezgene", "hgnc_symbol", "description"),
+       filters="ensembl_peptide_id", 
+       values=geneid, mart=ensembl)

     ensembl_gene_id entrezgene hgnc_symbol                                                     description
1 ENSACAG00000006713  100559986       FOXP2             forkhead box P2 [Source:HGNC Symbol;Acc:HGNC:13875]
2 ENSACAG00000006970  100560550       IFT46 intraflagellar transport 46 [Source:HGNC Symbol;Acc:HGNC:26146]
3 ENSACAG00000013687  100552532                                                                            
4 ENSACAG00000023622         NA                                                                            
5 ENSACAG00000026443         NA
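
A mismatch like this can often be caught before querying, because Ensembl stable IDs carry a feature letter just before the 11 digits: G for gene, T for transcript, P for protein. The following is only a minimal sketch in base R; guess_ensembl_filter is a hypothetical helper (not part of biomaRt), and the peptide ID in the example is invented for illustration.

```r
# Hypothetical helper (not a biomaRt function): infer which biomaRt filter
# matches a set of IDs from the feature letter in each Ensembl stable ID.
guess_ensembl_filter <- function(ids) {
  ids <- trimws(as.character(ids))
  # ENS + optional species prefix + feature letter + 11 digits + optional version
  pattern <- "^ENS[A-Z]*([GTP])[0-9]{11}(\\.[0-9]+)?$"
  feature <- ifelse(grepl(pattern, ids), sub(pattern, "\\1", ids), NA_character_)
  lookup <- c(G = "ensembl_gene_id",
              T = "ensembl_transcript_id",
              P = "ensembl_peptide_id")
  # IDs that do not look like Ensembl stable IDs come back as NA
  unname(lookup[feature])
}

# The first ID is taken from the output above; the second is made up.
example_ids <- c("ENSACAG00000006713", "ENSACAP00000006713", "not_an_id")
filters_found <- guess_ensembl_filter(example_ids)
print(table(filters_found, useNA = "ifany"))
```

If every ID in geneid maps to the same filter name, that name can be passed as the filters argument of getBM(); a mix of names or any NA values suggests the column needs cleaning first.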

