Affy: probeset to gene expression with expresso


Martin Preusse ▴ 50


I am trying to get gene-level expression values from an Affy microarray, i.e. merge the values for probe sets representing the same gene.

I tried to use the 'expresso' function from the affy package, but I always end up with an ExpressionSet containing probe sets, not genes.

What is an easy way to summarize/merge probe sets to (Entrez) genes?

    library(affydata)
    library(affy)

    # get the 'Dilution' affy batch
    data(Dilution)

    eset <- expresso(Dilution,
                     bgcorrect.method = 'rma',
                     normalize.method = 'constant',
                     pmcorrect.method = 'pmonly',
                     summary.method = 'avgdiff')

    write.exprs(eset, 'testfile.txt')

P.S.: I know it might not be the best idea to average probe sets, but I would like to try ;)

Cheers
Martin


written 10.7 years ago by Martin Preusse ▴ 50 • updated 10.7 years ago by James W. MacDonald 65k


James W. MacDonald 65k


Hi Martin,

I just answered a very closely related question. See if this helps:

https://stat.ethz.ch/pipermail/bioconductor/2013-August/054353.html

Best,

Jim

On 8/13/2013 9:47 AM, Martin Preusse wrote:
> What is an easy way to summarize/merge probe sets to (entrez) genes?
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

written 10.7 years ago by James W. MacDonald 65k


Hi,

Thank you Jim.

Can I ask: I have always averaged the expressions and then completed pathway analysis for the genes rather than the probes. Do you consider it better to leave them as individual probes and assess individual expression at the pathway level? I'm torn as to which is the best approach.

Thanks,
Helen

-----Original Message-----
From: bioconductor-bounces@r-project.org [mailto:bioconductor-bounces@r-project.org] On Behalf Of James W. MacDonald
Sent: 13 August 2013 16:28
To: Martin Preusse
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Affy: probeset to gene expression with expresso

Hi Martin,

I just answered a very closely related question. See if this helps:

https://stat.ethz.ch/pipermail/bioconductor/2013-August/054353.html

written 10.7 years ago by Helen Smith ▴ 100


I am trying to figure out the same. There are ENDLESS publications dealing with exactly this topic.

Obviously, different probes bind to different parts of the transcript, so they might represent different transcripts of the same gene or genomic locus. Maybe a mapping to transcripts instead of genes is more useful.

Another issue is that not all probes bind to the transcript with the same affinity. Some probes might even be pure noise, so if you average all of them, the noise could swamp the signal from the more useful probes.

I am trying to dig deeper into this, but there is too much published. Does either of you have tips for good papers/reviews? Or maybe good books that help with getting into microarray analysis?

Martin

On Tuesday, 13 August 2013 at 17:49, Helen Smith wrote:
> Do you consider it better to leave it as individual probes and assess individual expression at the pathway level?

written 10.7 years ago by Martin Preusse ▴ 50


Hi Jim,

thanks, this is a very clever (and R-like) way to average the expressions on the expression matrix. In my example this would work for 'exprs(eset)' once I replace the probe set IDs with gene symbols.

Still, it would be great to know if this can be achieved with a convenience function (e.g. expresso).

Martin

On Tuesday, 13 August 2013 at 17:28, James W. MacDonald wrote:
> I just answered a very closely related question. See if this helps:
>
> https://stat.ethz.ch/pipermail/bioconductor/2013-August/054353.html
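The averaging step described in this reply (relabel the rows of the expression matrix by gene, then average rows that share a label) can be sketched in base R. This is only a minimal illustration and not the code from the linked post: the matrix `expr`, the probe set IDs, and the `gene_of` mapping are all invented, and in a real analysis the mapping would have to come from an annotation source that matches the chip.

```r
# Toy expression matrix: 5 probe sets (rows) x 3 arrays (columns).
# All IDs and values are invented for illustration.
expr <- matrix(
  c(10, 12, 11,
    14, 16, 15,
     5,  6,  7,
     8,  8,  8,
     2,  4,  6),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("ps1_at", "ps2_at", "ps3_at", "ps4_at", "ps5_at"),
    c("arrayA", "arrayB", "arrayC")
  )
)

# Hypothetical probe set -> gene mapping; NA marks an unannotated probe set.
gene_of <- c(ps1_at = "GENE1", ps2_at = "GENE1",
             ps3_at = "GENE2", ps4_at = NA, ps5_at = "GENE3")

# Drop unannotated probe sets, then sum the rows that share a gene.
keep  <- !is.na(gene_of[rownames(expr)])
genes <- gene_of[rownames(expr)][keep]
sums  <- rowsum(expr[keep, , drop = FALSE], group = genes)

# Divide each gene's row by its number of probe sets to get the mean.
n <- as.vector(table(genes)[rownames(sums)])
gene_expr <- sums / n

gene_expr
```

With a real ExpressionSet, `expr` would be `exprs(eset)`. The limma package also provides `avereps()`, which averages rows that share an ID, so it may be a convenient alternative once the rows carry gene labels.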

written 10.7 years ago by Martin Preusse ▴ 50


James W. MacDonald 65k


Hi Martin,

None of the functions in the affy package know anything about the annotation of the various probesets, so there is no facility there to summarize at anything but the probeset level.

However, you could use one of the MBNI remapped probesets, which pre-aggregate the probes into probesets based on a few different annotation databases.

Best,

Jim

On 8/13/2013 11:38 AM, Martin Preusse wrote:
> Still, it would be great to know if can be achieved with a convenience function (e.g. expresso).

written 10.7 years ago by James W. MacDonald 65k


Thanks! This clarifies everything ;) I misunderstood the 'summary' part of expresso.

On Tuesday, 13 August 2013 at 17:41, James W. MacDonald wrote:
> None of the functions in the affy package know anything about the annotation of the various probesets, so there is no facility there to summarize at anything but the probeset level.

written 10.7 years ago by Martin Preusse ▴ 50


James W. MacDonald 65k


I don't think there is an answer to these questions. Well, I think there might be several hundred or maybe thousands of answers (e.g., for each gene that is measured more than once there might be something reasonable to do, based on what the duplicates are measuring), but we can only do things in aggregate, and I don't think there is a simple solution that can be applied en masse to all duplicated transcripts without making pretty strong assumptions.

Because of this, I tend to default to the status quo and just report probeset-level data, because I don't have any idea what the 'right' thing to do is.

Best,

Jim

On 8/13/2013 12:03 PM, Martin Preusse wrote:
> Some probes might even be pure noise. So if you average all of them the noise could cancel the signal from the more useful probes.

written 10.7 years ago by James W. MacDonald 65k


True ;) There are a lot of algorithms (i.e. papers) trying to evaluate each probeset and answer this question. E.g. this one:

Jetset: selecting the optimal microarray probe set to represent a gene
http://www.biomedcentral.com/1471-2105/12/474

Unfortunately, as is so often the case in bioinformatics, there are a lot of papers, and no validation or comparison between them.

Martin

On Tuesday, 13 August 2013 at 19:47, James W. MacDonald wrote:
> I don't think there is a simple solution that can be applied en mass to all duplicated transcripts without making pretty strong assumptions.
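The alternative to averaging that papers like this pursue, keeping one representative probe set per gene, can be sketched in base R as follows. This is not the Jetset method itself: the per-probe-set `score` column here is simply invented, and how to compute a meaningful score is exactly the open question in this thread. All IDs and values are made up for illustration.

```r
# Toy expression matrix: 4 probe sets (rows) x 2 arrays (columns); invented values.
expr <- matrix(
  c(7.1, 7.3,
    9.0, 9.4,
    5.5, 5.2,
    6.0, 6.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ps1_at", "ps2_at", "ps3_at", "ps4_at"),
                  c("arrayA", "arrayB"))
)

# Hypothetical annotation: the gene for each probe set and a quality score
# (higher = better). Both columns are assumptions made for this sketch.
annot <- data.frame(
  probeset = rownames(expr),
  gene     = c("GENE1", "GENE1", "GENE2", "GENE2"),
  score    = c(0.40, 0.85, 0.70, 0.30),
  stringsAsFactors = FALSE
)

# Order by descending score, then keep the first (best) row for each gene.
ranked <- annot[order(annot$score, decreasing = TRUE), ]
best   <- ranked[!duplicated(ranked$gene), ]

# Subset the expression matrix to the chosen probe sets and label rows by gene.
gene_expr <- expr[best$probeset, , drop = FALSE]
rownames(gene_expr) <- best$gene

gene_expr
```

Unlike averaging, this keeps the measured values of a single probe set unchanged, at the cost of relying entirely on whatever score is used to choose it.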

written 10.7 years ago by Martin Preusse ▴ 50
