How to get ENTREZID from Gene symbol in bioconductor

Srinivas Iyyer ▴ 600

@srinivas-iyyer-939

Last seen 9.6 years ago

Dear group,

I have a vector of gene symbols and I want to get the Entrez IDs for them. I want to use hgu133plus2 as my annotation environment. How can I do it?

Here is my vector:

msba <- c('AURKA','CCNB2','TRIP13','EZH2','TYMS')

When I have probe set IDs it is straightforward for me.

Thanks for your help.
Srini

hgu133plus2 • 2.7k views

updated 16.1 years ago by Marc Carlson ★ 7.2k • written 16.1 years ago by Srinivas Iyyer ▴ 600

Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

I would use this library to save a couple of steps:

library(org.Hs.eg.db)

Then you can choose one of two options. Use this if you are confident that your gene symbols are mainstream:

mget(msba, revmap(org.Hs.egSYMBOL))

Or, if you want to be more inclusive about your symbols, you can use this instead:

mget(msba, org.Hs.egALIAS2EG)

I see that in your case the first option is really better, since you find everything with that already.

Marc
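The two lookups can also be combined: try the primary-symbol map first, then fall back to the alias map for anything that was not found. The following is only a minimal sketch of that pattern. It uses two small hand-built environments as stand-ins for revmap(org.Hs.egSYMBOL) and org.Hs.egALIAS2EG so that it runs in plain R without any annotation package, and lookup_entrez is a hypothetical helper name, not part of any Bioconductor package.

```r
# Hand-built stand-ins for the real maps (illustrative entries only):
#   primary ~ revmap(org.Hs.egSYMBOL), alias ~ org.Hs.egALIAS2EG
primary <- list2env(list(AURKA = "6790", EZH2 = "2146"))
alias   <- list2env(list(AURKA = "6790", EZH2 = "2146", STK15 = "6790"))

# Hypothetical helper: look symbols up in the primary map first, then
# retry the ones that were not found in the alias map.
lookup_entrez <- function(symbols, primary, alias) {
  # ifnotfound = NA returns NA for unknown keys instead of raising an error
  ids <- mget(symbols, envir = primary, ifnotfound = NA)
  missing <- symbols[vapply(ids, function(x) all(is.na(x)), logical(1))]
  if (length(missing) > 0) {
    ids[missing] <- mget(missing, envir = alias, ifnotfound = NA)
  }
  ids
}

res <- lookup_entrez(c("AURKA", "STK15", "NOT_A_GENE"), primary, alias)
str(res)
```

With the real packages the same pattern should carry over by passing revmap(org.Hs.egSYMBOL) and org.Hs.egALIAS2EG in place of the toy environments, assuming mget() on those maps accepts ifnotfound = NA as it does for ordinary environments. Keep in mind that an alias can point to more than one Entrez Gene ID, so the elements of the result are not guaranteed to have length one.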

16.1 years ago • Marc Carlson ★ 7.2k

Dear Marc,

Thanks for the tip.

I obtained the gene symbols from the hgu133plus2SYMBOL environment (from the hgu133plus2 probe sets). I do not have a data matrix for these genes; I only have a list of gene symbols.

Is there a way to move between SYMBOL <-> probe set ID <-> SYMBOL/ENTREZID/... and the rest of the functionality?

I get an error with org.Hs.egSYMBOL:

> xx = mget(msba, revmap(org.Hs.egSYMBOL))
Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
  invalid key "KNTC2"

Thanks,
Srini

16.1 years ago • Srinivas Iyyer ▴ 600

The error you are listing here just means that the symbol KNTC2 is not in the environment you are searching. Since you say you got the list of genes from the hgu133plus2.db package, it makes me suspicious that your packages are not all from the same time period. Could you show me your sessionInfo()?

If your annotation packages are all from the same build, then the symbols that you get from hgu133plus2.db should be found inside the org.Hs.eg.db package. Otherwise all bets are off, since these annotations necessarily change over time (which is why we make a new set of builds every 6 months).

Using recent annotation packages from devel, I don't find KNTC2 as an "official" (by which I mean primary) gene symbol in either package (or at NCBI for humans). This symbol is listed at NCBI only as an "alternate symbol", which means you can only expect to get a value back if you use the org.Hs.egALIAS2EG map. That is because this map has all the standard symbols plus all the alternate symbols within it. In other words, this should work:

mget("KNTC2", org.Hs.egALIAS2EG)

I am guessing that you have an older hgu133plus2 annotation package, from a time when KNTC2 was considered to be the primary gene symbol for Entrez Gene ID 10403. That would cause the error you are reporting. But this is all speculation without your sessionInfo(). Here is mine:

> sessionInfo()
R version 2.7.0 Under development (unstable) (2008-03-06 r44691)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY=en_US;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices datasets  utils     methods
[8] base

other attached packages:
[1] hgu133plus2.db_2.1.3 org.Hs.eg.db_2.1.3   AnnotationDbi_1.1.25
[4] RSQLite_0.6-8        DBI_0.2-4            Biobase_1.17.15

In general I would urge extreme caution when using gene symbols to map to anything. They are absolutely awful as identifiers, since there is no guarantee of uniqueness and they are prone to changing on the whims of the people who coin them. We have done what we can to make them accessible, but please be careful when using gene symbols.

I am not sure what exactly you are asking with your more general mapping question, but the package hgu133plus2.db is really a "probe set centric" package. That means that everything in it maps (somehow) to a probe set ID. In contrast, the org.Hs.eg.db package is really an "Entrez Gene centric" package.

Hope this helps you,
Marc
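On the more general mapping question, AnnotationDbi releases newer than the ones in this thread provide a select() interface that can return probe set IDs and Entrez Gene IDs for a vector of symbols in one call. The following is a minimal sketch, assuming hgu133plus2.db is installed and recent enough to support select(). Which symbols resolve still depends on the annotation build, so the same caveats about gene symbols apply.

```r
library(hgu133plus2.db)  # probe-set-centric package; also loads AnnotationDbi

msba <- c("AURKA", "CCNB2", "TRIP13", "EZH2", "TYMS")

# Returns a data frame with one row per symbol / probe set / Entrez ID
# combination. The call is namespace-qualified because other packages
# (for example dplyr) also export a function named select().
tab <- AnnotationDbi::select(hgu133plus2.db,
                             keys    = msba,
                             keytype = "SYMBOL",
                             columns = c("PROBEID", "ENTREZID"))
head(tab)
```

A gene usually has several probe sets on this array, so expect more rows than input symbols. Going through the chip package keeps everything tied to probe set IDs, while the same call against org.Hs.eg.db (without the PROBEID column) would be the Entrez Gene centric route.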

16.1 years ago • Marc Carlson ★ 7.2k
