Question: Affymetrix probeset ids to gene symbols
11.3 years ago by
peter robinson300 wrote:
Dear all,

I have a list of Affymetrix probeset IDs from another program and would like to use annaffy to extract the corresponding gene names. I am still something of a novice at R and am probably doing something silly, but I found no answer in the package vignette. My script:

    library(annaffy)

    dat <- read.table('sign.txt.cdt', header=TRUE)
    psets <- dat[,3]
    symbols <- aafSymbol(as.character(psets), "moe430b.db")
    s <- as.character(symbols)

I was surprised that so few of the probeset IDs were identified by this script. What am I doing wrong?

Thanks,
Peter

    > s
     [1] "character(0)"  "character(0)"  "character(0)"
     [4] "character(0)"  "character(0)"  "character(0)"
     [7] "character(0)"  "character(0)"  "character(0)"
    [10] "character(0)"  "character(0)"  "character(0)"
    [13] "character(0)"  "character(0)"  "Egr3"
    [16] "character(0)"  "character(0)"  "character(0)"
    [19] "character(0)"  "character(0)"  "character(0)"
    [22] "character(0)"  "character(0)"  "character(0)"
    [25] "Irak2"         "character(0)"  "Coq10b"
    [28] "character(0)"  "BC063749"      "character(0)"
    [31] "4631422O05Rik" "character(0)"  "Coq10b"
    [34] "character(0)"  "character(0)"  "AI452195"
    [37] "character(0)"  "character(0)"  "character(0)"
    [40] "Mobkl2a"       "character(0)"  "character(0)"
    (...snip....)
annaffy • 2.4k views
Answer: Affymetrix probeset ids to gene symbols
11.3 years ago by
United States
Vincent J. Carey, Jr.6.3k wrote:
> I was surprised that so few of the probeset IDs were identified by this
> script. What am I doing wrong?

You got some hits, so it seems to me that conceptually the solution is OK. You do not need to use annaffy for this task.

    library(moe430b.db)
    mget(as.character(psets), moe430bSYMBOL)  # or moe430bGENENAME for actual names

would in principle work and would return a little more info.

If there are specific elements of psets that you think should map to names, but don't, state what they are and the symbols that you think they should resolve to. Also provide a sessionInfo()...
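The mget() suggestion above can be tried out without the annotation package installed. In this minimal sketch a plain environment, toy_map, stands in for moe430bSYMBOL; the probeset IDs and the helper name lookup_symbols are hypothetical, and only the symbols Egr3 and Irak2 come from the question. It assumes psets may arrive as a factor (as it can when read with read.table), so it is converted with as.character() first; ifnotfound = NA keeps unmapped IDs in the result as NA.

```r
# Toy stand-in for a probeset -> symbol map such as moe430bSYMBOL.
# The probeset IDs here are made up for illustration.
toy_map <- new.env()
assign("probe_1_at", "Egr3",  envir = toy_map)
assign("probe_3_at", "Irak2", envir = toy_map)

# IDs as they might come out of a data frame column (a factor).
psets <- factor(c("probe_1_at", "probe_2_at", "probe_3_at"))

# Hypothetical helper: look up each ID, returning NA where there is no mapping.
lookup_symbols <- function(ids, map) {
  hits <- mget(as.character(ids), envir = map, ifnotfound = NA)
  # Collapse the list to a named character vector, one entry per ID.
  vapply(hits, function(e) as.character(e)[1], character(1))
}

symbols  <- lookup_symbols(psets, toy_map)
unmapped <- names(symbols)[is.na(symbols)]  # IDs with no symbol

print(symbols)
print(unmapped)
```

With the real package loaded, the analogous call would be mget(as.character(psets), moe430bSYMBOL, ifnotfound = NA); the unmapped vector then holds the IDs worth reporting back, as requested above.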
Answer: Affymetrix probeset ids to gene symbols
11.3 years ago by
Thomas Hampton740 wrote:
getSYMBOL in package annotate is a nice way to handle this. I found it easier, at least.

Cheers,
Tom

On Jul 3, 2008, at 4:31 PM, Peter Robinson wrote:
> I have a list of affymetrix probeset ids from another program and
> would like to use annaffy to extract the corresponding gene names.
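As a rough sketch of the getSYMBOL route, the wrapper below (the name probe_to_symbol is hypothetical) calls annotate::getSYMBOL() with the chip name written the same way as elsewhere in this thread. It assumes both annotate and the chip annotation package are installed, and stops with a message otherwise. Unmapped IDs are expected to come back as NA rather than being dropped.

```r
# Hypothetical wrapper around annotate::getSYMBOL(); it refuses to run
# unless annotate and the chip annotation package are both installed.
probe_to_symbol <- function(ids, chip = "moe430b.db") {
  if (!requireNamespace("annotate", quietly = TRUE) ||
      !requireNamespace(chip, quietly = TRUE)) {
    stop("packages 'annotate' and '", chip, "' must be installed")
  }
  # as.character() guards against IDs stored as a factor.
  annotate::getSYMBOL(as.character(ids), chip)
}

# Example call, assuming psets holds the probeset IDs from the question:
# symbols <- probe_to_symbol(psets)
```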
Answer: Affymetrix probeset ids to gene symbols
11.3 years ago by
Kurt Vanhoutte10 wrote:
Dear Tom & co,

I used getSYMBOL but retrieved a limited and variable number of probes (1-5) with the same name. What could be the reason for this (in the context of >10 mismatch/perfect-match probes for each gene)?

Some background: We are applying a contrast analysis to a pathological Affy microarray dataset. The dataset is available in GEO as a series matrix txt file (22645 probes / 35 samples, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2240). We interrogated the set with the open-access R/Bioconductor packages (hgu133b, KEGG and annotate).

Short code. Loading the libraries:

    library(hgu133b.db)
    library(KEGG.db)
    library(annotate)

In particular we wanted to analyse the apoptosis genes from the KEGG apoptosis pathway:

    xx <- as.list(hgu133bPATH2PROBE)  # Alternatives?
    xx$'04210'  # genes

However, when we retrieve the gene names,

    listtemp <- getSYMBOL(xx$'04210', "hgu133b.db")

we get a variable number of probes (1-5) with the same name (see the appendix below), and we do not retrieve all genes from the KEGG pathway. Though the probes are all apoptosis genes, I did not anticipate finding 5 XIAP probes, for example.

Any suggestions to resolve this issue (in this case, the difference between the probes)?

Kind regards,
Kurt

Appendix:

    > listtemp <- getSYMBOL(xx$'04210', "hgu133b.db")
    > listtemp
    225471_s_at   226156_at   236664_at 225858_s_at   225859_at   228363_at 235222_x_at
         "AKT2"      "AKT2"      "AKT2"      "XIAP"      "XIAP"      "XIAP"      "XIAP"
    243026_x_at   237522_at   232660_at   231228_at   232012_at   231218_at   223518_at
         "XIAP"       "FAS"       "BAD"    "BCL2L1"     "CAPN1"     "CASP8"      "DFFA"
      228465_at   244383_at   231779_at   231699_at   235980_at 229392_s_at   229606_at
        "IRAK1"     "IRAK1"     "IRAK2"    "NFKBIA"    "PIK3CA"    "PIK3R2"    "PPP3CA"
      231304_at   244782_at   235780_at   225000_at   225011_at   230202_at   241325_at
       "PPP3R2"    "PPP3R2"    "PRKACB"   "PRKAR2A"   "PRKAR2A"      "RELA"    "PIK3R3"
      226551_at   227345_at   231775_at 237367_x_at   239629_at   222880_at 224229_s_at
        "RIPK1" "TNFRSF10D" "TNFRSF10A"     "CFLAR"     "CFLAR"      "AKT3"      "AKT3"
      242876_at   227553_at   227645_at   229415_at   244546_at
         "AKT3"    "PIK3R5"    "PIK3R5"      "CYCS"      "CYCS"

    > sessionInfo()
    R version 2.7.1 (2008-06-23)
    i386-pc-mingw32

    locale:
    LC_COLLATE=Dutch_Belgium.1252;LC_CTYPE=Dutch_Belgium.1252;LC_MONETARY=Dutch_Belgium.1252;LC_NUMERIC=C;LC_TIME=Dutch_Belgium.1252

    attached base packages:
    [1] tools     stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] annotate_1.18.0      xtable_1.5-2         KEGG.db_2.2.0
     [4] hgu133b.db_2.2.0     AnnotationDbi_1.2.2  RSQLite_0.6-9
     [7] DBI_0.2-4            affy_1.18.2          preprocessCore_1.2.0
    [10] affyio_1.8.0         Biobase_2.0.1
Hi Kurt,

There is not a one-to-one mapping between Affy probesets and genes. There can be many reasons for this. For instance, there may be splice variants that could be interrogated by different probesets (not that likely IMO, since they target the first 600 bp of the transcript). Another possibility is that different transcripts originally considered to be ESTs have subsequently been mapped to the same gene. I am sure there are other reasons for the many-to-one mapping of probesets to genes as well.

Best,
Jim

--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
Reply written 11.3 years ago by James W. MacDonald
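To see the many-to-one mapping concretely, the probesets can be grouped by the symbol they resolve to. The sketch below uses a subset of the listtemp vector posted above; the names by_gene and n_per_gene are illustrative. It only lists which probesets share a gene and makes no claim about which one to prefer.

```r
# A subset of the probeset -> symbol vector posted above (hgu133b).
listtemp <- c(
  "225471_s_at" = "AKT2", "226156_at"   = "AKT2", "236664_at" = "AKT2",
  "225858_s_at" = "XIAP", "225859_at"   = "XIAP", "228363_at" = "XIAP",
  "235222_x_at" = "XIAP", "243026_x_at" = "XIAP", "237522_at" = "FAS"
)

# Group probeset IDs by gene symbol and count probesets per gene.
by_gene    <- split(names(listtemp), listtemp)
n_per_gene <- lengths(by_gene)

print(by_gene$XIAP)
print(n_per_gene)
```

Here XIAP collects five probeset IDs and AKT2 three, matching the counts visible in the posted output.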
Answer: Affymetrix probeset ids to gene symbols
11.3 years ago by
MARIA STALTERI160 wrote:
Hi Kurt, Jim,

Affymetrix arrays such as the HG-U133B were designed to target the 600 bp at the 3' end of the transcript, not the start of the transcript. We have found that the many-to-one mappings between probesets and genes are often due to alternative splicing, use of alternative poly(A) sites, or annotation errors.

Cheers,
Maria