Question: Converting gene symbol list to Entrez IDs
9 months ago, imalumberjack wrote:

Hello all, 

I'm not very experienced with Bioconductor and R, and I am struggling to convert a list of gene symbols, read into R from a .csv file, to their corresponding Entrez ID(s). Does anyone have any tips for how to approach this? The code I have been using is the following:

> prog<-read.csv(file="mydata.csv", header=TRUE, sep="/")

> gns<-select(org.Hs.eg.db, prog, c("ENTREZID","GENENAME"))

Error in .testForValidKeys(x, keys, keytype, fks) :

  None of the keys entered are valid keys for 'ENTREZID'. Please use the keys method to see a listing of valid arguments.

Many thanks for your help!


Where did you get the list of gene symbols from? From a published paper? I ask because many published sources include gene symbols that are no longer current official symbols.

Your file has a "csv" extension, suggesting that it is a comma-separated file, but then you specify sep="/". What gives with that? Can you show us the first few lines of your file? Does your data file have a column containing gene symbols?

What will you do with the Entrez Gene Ids when you get them? What will be the next step?
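One way to answer the questions about the file itself is to look at its raw first lines before parsing it. The sketch below is only illustrative: it writes a small hypothetical stand-in for mydata.csv to a temporary file, and the column names "symbol" and "logFC" are assumptions, not taken from the original data.

```r
# Hypothetical stand-in for mydata.csv so the sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("symbol,logFC", "TP53,1.2", "BRCA1,-0.8"), csv_path)

# Raw text of the first few lines: shows which separator the file really uses.
first_lines <- readLines(csv_path, n = 3)
print(first_lines)

# Parse with the separator seen above, then check the column names and types.
prog <- read.csv(csv_path, header = TRUE, sep = ",", stringsAsFactors = FALSE)
str(prog)
```

If the raw lines show commas between fields, sep="/" would leave everything in a single column, which is worth ruling out before any ID conversion.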

written 9 months ago by Gordon Smyth
9 months ago, James W. MacDonald (United States) wrote:

You are passing a data.frame to select, rather than a character vector. Presumably one of the columns of prog contains the gene symbols, so you should subset to that column. Also note that in R versions before 4.0.0 the default of read.csv is to convert strings to factors, so you should probably add stringsAsFactors = FALSE to your call to read.csv.
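A minimal sketch of that advice follows, under some loud assumptions: the file is replaced by a tiny temporary CSV, the symbol column is assumed to be called "symbol", and the three gene symbols are placeholders. The error in the question suggests the keys were being checked as Entrez IDs, so the sketch also passes keytype = "SYMBOL" to say what the keys actually are.

```r
# Hypothetical stand-in for mydata.csv: one column named "symbol" (assumed name).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("symbol", "TP53", "BRCA1", "EGFR"), csv_path)

# Read strings as character rather than factor.
prog <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# select() wants a character vector of keys, so pull out the one column.
symbols <- unique(prog$symbol)

# Only run the lookup if the annotation package is installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  # keytype states that the keys are gene symbols, not Entrez IDs.
  gns <- AnnotationDbi::select(org.Hs.eg.db,
                               keys = symbols,
                               columns = c("ENTREZID", "GENENAME"),
                               keytype = "SYMBOL")
  print(head(gns))
}
```

Calling AnnotationDbi::select explicitly avoids picking up a different select (for example from dplyr) if other packages are attached.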

9 months ago, mat149 wrote:

Here is a code chunk that I use to convert zebrafish gene symbols to Entrez Gene IDs:

(Here "t" is a character vector of a few genes that I'm interested in, but you can use the symbol column of your "read.csv" object instead.)

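library(org.Dr.eg.db)      # zebrafish annotation package
keytypes(org.Dr.eg.db)     # list the ID types usable as fromType/toType
library(clusterProfiler)   # provides bitr()

# gene symbols to convert
t <- c("lepa","lepr","lepb","leprot")
# map each symbol to its Entrez ID plus pathway, GO, alias and gene name annotations
et <- bitr(t, fromType="SYMBOL", toType=c("ENTREZID","PATH","GO","ALIAS","GENENAME"), OrgDb="org.Dr.eg.db")
head(et)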

and the reverse:

tt<-c("100150233","567241","564348","550484")
ett <- bitr(tt, fromType="ENTREZID", toType="SYMBOL", OrgDb="org.Dr.eg.db")
head(ett)