Is there a package or a way to convert "probesets" to "genes"
@cheng-yuan-kao-3472
Last seen 7.2 years ago
Taiwan
Hi there,

I have a question regarding Affy chip data. We ran many expression arrays and used limma to find the differentially expressed "genes" (control vs. treatment). However, I found that some probesets map to multiple genes according to the Affy annotation file. Conversely, multiple probesets can map to the same gene, and some probesets that map to the same gene can even be regulated in different directions. So what limma actually gave us is differentially expressed "probesets".

Say we have 500 up-regulated probesets, but what we really want to know is how many "genes" are up-regulated. I don't know how to reasonably convert probesets to genes, given the non-one-to-one relationship. What is the convention in the microarray field?

Any suggestion would be greatly appreciated.

Richie
Microarray affy limma convert • 1.7k views
Tobias Straub ▴ 430
@tobias-straub-2182
Last seen 10.3 years ago
Hi Richie,

One easy way to handle the multiple-probesets-per-gene problem is to keep only the one probeset with the highest variance across replicates. The 'nsFilter' function in the 'genefilter' package provides this operation for ExpressionSet objects. With this filter approach you might, of course, miss some differentially regulated splicing events.

Best regards,
Tobias

(In reply to Cheng-Yuan Kao's message of Nov 17, 2009, 12:01 AM.)

Dr. Tobias Straub, Adolf-Butenandt-Institute, München

Bioconductor mailing list: https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
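As a rough illustration of the "keep the most variable probeset per gene" idea described above, here is a minimal base R sketch. It is not the nsFilter implementation: the matrix `expr`, the map `gene`, and all values are invented for the example, and plain variance is used as the variability measure. With a real ExpressionSet, check ?nsFilter for the measure and options it actually applies.

```r
# Toy expression matrix: rows are probesets, columns are arrays (made-up values).
expr <- matrix(
  c(1, 2, 3, 4,
    5, 5, 5, 6,
    2, 8, 1, 9,
    7, 7, 8, 7),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4"), paste0("array", 1:4))
)

# Hypothetical probeset-to-gene map; geneA is measured by three probesets.
gene <- c(ps1 = "geneA", ps2 = "geneA", ps3 = "geneA", ps4 = "geneB")

# Variance of each probeset across the arrays.
v <- apply(expr, 1, var)

# For each gene, keep the ID of the probeset with the largest variance.
keep <- vapply(split(names(v), gene[names(v)]),
               function(ps) ps[which.max(v[ps])],
               character(1))

# One row per gene, labelled by gene instead of probeset.
expr_by_gene <- expr[keep, , drop = FALSE]
rownames(expr_by_gene) <- names(keep)
```

Here `keep` is a named vector (gene name to chosen probeset), so it also documents which probeset ended up representing each gene. The other probesets of geneA are simply discarded, which is where the splicing information mentioned above would be lost.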
Hi Richie,

I am not sure which data set and annotation package you are working with, so I will take the hgu133plus2 chip as an example. If you just want the Entrez genes corresponding to your probesets, you can simply do:

    library("hgu133plus2.db")
    # 'probesets' is a character vector of your probeset IDs
    entrezIDs <- unlist(mget(probesets, hgu133plus2ENTREZID))
    # drop unmapped probesets (NA) and repeated Entrez IDs
    entrezIDs <- entrezIDs[!is.na(entrezIDs) & !duplicated(entrezIDs)]

I hope this is what you want.

Cheers,
Yuan

(In reply to Tobias Straub's message of 17 Nov 2009, 08:15.)
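To connect the mapping step to the original question ("how many genes are up-regulated?"), here is a small base R sketch under stated assumptions. The data frame `res` is a hypothetical stand-in for a limma results table that already has an Entrez ID column; the column names and all values are invented. It counts genes whose probesets all go up and separately lists genes whose probesets disagree in direction.

```r
# Hypothetical limma-style results: one row per probeset (values are invented).
res <- data.frame(
  probeset = c("ps1", "ps2", "ps3", "ps4", "ps5"),
  entrez   = c("101", "101", "202", NA, "303"),
  logFC    = c(1.5, -1.2, 2.0, 1.1, 0.9),
  stringsAsFactors = FALSE
)

# Drop probesets without a gene annotation.
res <- res[!is.na(res$entrez), ]

# Direction of change per probeset, then the set of directions seen per gene.
direction <- ifelse(res$logFC > 0, "up", "down")
per_gene <- tapply(direction, res$entrez,
                   function(d) paste(sort(unique(d)), collapse = "/"))

# Genes whose probesets are all up, and genes with conflicting probesets.
n_up_genes <- sum(per_gene == "up")
discordant <- names(per_gene)[per_gene == "down/up"]
```

Reporting the discordant genes separately, rather than silently picking one probeset, is one possible design choice; whether it is appropriate depends on the chip and the biological question.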
Hi,

We have a C. elegans expression data set. Do you know what exactly [!is.na(entrezIDs) & !duplicated(entrezIDs)] does?

Thanks.

Cheng-Yuan

(In reply to Yuan Hao's message of Tue, Nov 17, 2009, 3:34 AM.)
Hi Cheng-Yuan,

That expression just removes any unmapped Entrez IDs (which are returned as NA) and then also removes any duplicates from the list. However, this only works when each of your probeset IDs maps to a single Entrez gene ID (a many-to-one relationship between probes and genes). By default, the annotation packages only display data for probesets that map like this, and for most probes on most platforms this will be perfectly adequate.

But if you really want to explore the many-to-many mappings between some genes and their more ambiguously designed probesets, then you need to look at the help page for the toggleProbes() method in AnnotationDbi. This method lets you expose these relationships so that you can see the more troublesome probes. Once you have done that, you will be able to see that some probes map to several different genes (a many-to-many relationship).

    library(AnnotationDbi)
    ?toggleProbes

That should help a bit if you really want to go there. Let me know if you have further questions.

Marc
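The filtering expression discussed above can be traced step by step on a toy vector. This is only a sketch: the probeset names and Entrez IDs are invented, and `multi_map` is a hypothetical list standing in for what a many-to-many mapping might look like once multi-mapping probes are exposed.

```r
# Invented probeset -> Entrez ID vector, shaped like the result of unlist(mget(...)).
entrezIDs <- c("1001_at" = "5", "1002_at" = NA, "1003_at" = "7", "1004_at" = "5")

not_missing <- !is.na(entrezIDs)       # FALSE for the unmapped probeset
first_seen  <- !duplicated(entrezIDs)  # FALSE for the second probeset of gene 5

# Keep entries that are mapped AND are the first occurrence of their gene ID.
unique_ids <- entrezIDs[not_missing & first_seen]

# A many-to-many map as a list: one probeset may carry several gene IDs.
multi_map <- list("1001_at" = "5", "1005_s_at" = c("8", "9"))
ambiguous <- names(multi_map)[lengths(multi_map) > 1]
```

Note that `!duplicated()` keeps whichever probeset happens to come first for a gene, so the result is a list of distinct genes, not a reasoned choice of representative probeset. With a real chip package, ?toggleProbes is the place to check how multi-mapping probes are exposed.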
Thanks a lot. This is very helpful, and I will look into the help pages as well.

I wonder how many of the "gene-based summaries" in the articles I read are actually "transcript-based summaries". Most of these papers report differentially expressed "genes". When I met the first authors and asked them the question I posted here, they told me that they had not done the analysis themselves and did not have an answer for me.

Finally, if I am going to do a transcript-based summary, I basically just need to use the number of differentially regulated probesets as the number of differentially regulated transcripts. Is this correct?

Richie

(In reply to Marc Carlson's message of Tue, Nov 17, 2009, 10:05 AM.)
Hi Cheng-Yuan,

Cheng-Yuan Kao wrote:
> We have C. elegans expression data set.
> Do you know what exactly [!is.na(entrezIDs) & !duplicated(entrezIDs)] does?

Yes, and so can you. All you need to do is use the built-in help pages:

    ?"!"
    ?is.na
    ?duplicated

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician, Douglas Lab
University of Michigan, Department of Human Genetics
5912 Buhl, 1241 E. Catherine St., Ann Arbor MI 48109-5618