RMA normalization and MAS5.0 detection calls


haiyan wu ▴ 40

@haiyan-wu-1953

Last seen 9.6 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20061207/a91ebb1d/attachment.pl


ADD COMMENT • link updated 17.4 years ago by Naomi Altman ★ 6.0k • written 17.4 years ago by haiyan wu ▴ 40


James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 3 hours ago

United States

Hi Haiyan,

haiyan wu wrote:
> Hi all,
>
> I'm using Bioconductor to analyze some Affymetrix GeneChips. When I use RMA to normalize the probe sets, it gives no information on whether the probe sets are present or absent, so I get this information from MAS5 detection calls. But in some cases the DE probe sets that were selected seem to be absent. In other cases, when contrasting different treatments, some probe sets are present in treatment 1 and absent in treatment 2, but limma concludes that these probe sets have not changed between the two treatments.
>
> How can I solve this problem? Is it right to use only the RMA values for limma and ignore the present/absent calls?
>
> Regards!
> haiyan

There are many ways to approach an analysis, and I don't think there is any objective way to determine which is the best way. Some people do just what you have done, computing expression values using RMA or GCRMA and then using P/A calls to filter out those they think are not expressed.

The rationale for doing this is that the MM probes give a reasonable estimate of background for the majority of the probes on a given chip, so if there is no statistical difference between PM and MM, then you might be able to consider that gene unexpressed. However, RMA does not make use of the MM probes at all, and GCRMA only uses the MM data in aggregate (rather than in a probe-by-probe fashion), so it is not surprising that you get the results you mention for some probesets.

Another way to approach filtering probesets is based on the variability of the probesets over all samples. If the variance is low (below some constant c), then you might assume that the gene is not differentially expressed in any samples (which is different than saying it is expressed or not). These genes are uninteresting by definition, and can be removed from the dataset.

HTH,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
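As a rough illustration of the variance filter described above, here is a minimal Python sketch. It is not code from any poster in this thread: the function name `filter_by_variance`, the probeset IDs, the expression values, and the cutoff `c=0.5` are all hypothetical, and the values are assumed to be on the log2 scale.

```python
from statistics import variance


def filter_by_variance(expr, c):
    """Keep probesets whose across-sample variance is at least c.

    expr maps a probeset ID to its expression values, one per sample
    (assumed here to be on the log2 scale).
    """
    return {pid: vals for pid, vals in expr.items() if variance(vals) >= c}


# Hypothetical values for three probesets across four samples.
expr = {
    "probe_flat": [7.0, 7.1, 6.9, 7.0],
    "probe_changing": [6.0, 6.2, 9.1, 9.0],
    "probe_low_flat": [3.1, 3.0, 3.2, 3.1],
}

# The cutoff c is arbitrary here; choosing it is the open question in this thread.
kept = filter_by_variance(expr, c=0.5)
print(sorted(kept))
```

Note that the filter never looks at the group labels, so it says nothing about whether a probeset is expressed, only whether it varies across the samples.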

ADD COMMENT • link 17.4 years ago James W. MacDonald 65k


Hi all,

Actually, the problem of the statistical results not always making sense with the P/A calls also happens with MAS5 values, not only with RMA or GCRMA. Many years ago I was using Affy's two-sample comparison in the MAS 5.0 software, and I noticed that a probeset would be called "A" in sample 1 and "P" in sample 2, but supposedly sample 2 had higher expression than sample 1!! The calls and the expression level comparisons sometimes don't correspond, but this is because they use different algorithms and values in their computation, as Jim explained. I tend to like and use the calls in a conservative manner, but they may only be about 85% accurate (Choe et al. Genome Biology 2005, 6:R16).

> Another way to approach filtering probesets is based on the variability of the probesets over all samples. If the variance is low (below some constant c), then you might assume that the gene is not differentially expressed in any samples (which is different than saying it is expressed or not). These genes are uninteresting by definition, and can be removed from the dataset.

I still haven't convinced myself that I like this approach. And wouldn't it be better to filter on CV, which takes into account expression level, rather than variance? I know there was a recent exchange on what sort of cutoff value to use... I really need to find the time to play around with filtering on some aspect of variability, unless something has been published on it?

Cheers,
Jenny

Jenny Drnevich, Ph.D.
Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign
330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801 USA
ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu

ADD REPLY • link 17.4 years ago Jenny Drnevich ★ 2.2k


Hi Jenny,

Jenny Drnevich wrote:
> I still haven't convinced myself that I like this approach. And wouldn't it be better to filter on CV, which takes into account expression level, rather than variance? I know there was a recent exchange on what sort of cutoff value to use... I really need to find the time to play around with filtering on some aspect of variability - unless something has been published on it?

I think you want to use CV for data that show a mean/variance dependence. With RMA (and I suppose GCRMA) values, most of the dependence has been decoupled by taking logs. For instance, a plot of mean expression vs variance usually shows nearly constant variance except at the tails, where the variance appears to go down precipitously (which CV won't affect anyway).

Best,

Jim
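The point about logs decoupling the mean/variance dependence can be illustrated with a small Python sketch. This shows only the general effect of a log transform, not RMA itself, and the intensity values are made up: two hypothetical probesets have the same relative variation, but one is ten times brighter than the other.

```python
import math
from statistics import mean, stdev, variance


def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical raw-scale intensities; `high` is `low` scaled by 10.
low = [100.0, 120.0, 80.0, 110.0]
high = [10.0 * v for v in low]

log_low = [math.log2(v) for v in low]
log_high = [math.log2(v) for v in high]

# On the raw scale, variance grows with expression level (here by 10**2).
print(variance(high) / variance(low))

# CV removes that dependence on the raw scale.
print(cv(low), cv(high))

# After taking logs the plain variance is already the same for both probesets,
# because multiplying by a constant only adds a constant on the log scale.
print(variance(log_low), variance(log_high))
```

Under these assumptions, a variance filter on log-scale values behaves much like a CV filter on raw-scale values, which is consistent with the argument above.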

ADD REPLY • link 17.4 years ago James W. MacDonald 65k


Hi all,

Regardless of which approach is used, a cut-off needs to be applied. What sort of numbers do you use as your thresholds for variance or CV? Roughly what percentage of probesets does it exclude?

I have to admit to using a simple RMA signal of 150 (again, another arbitrary figure) for the higher of the two group means, and I have found that this eliminates a lot of noise.

Regards,
Rizwan

Dr Rizwan Sarwar
Physiological Genomics & Medicine
MRC-Clinical Sciences Centre
Imperial College London (Hammersmith Campus)
Du Cane Rd
W12 0NN
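The intensity cut-off described above, together with the "what percentage does it exclude" question, could be sketched as follows. This is one reading of the post rather than the poster's actual code: it assumes two groups per probeset, signals on the same scale as the threshold of 150, and that a probeset is kept when the higher group mean is at least the threshold. All names and values are hypothetical.

```python
from statistics import mean


def passes_intensity_filter(group_a, group_b, threshold=150.0):
    """True if the higher of the two group means reaches the threshold."""
    return max(mean(group_a), mean(group_b)) >= threshold


def fraction_excluded(data, threshold=150.0):
    """Fraction of probesets removed by the intensity filter."""
    excluded = [
        pid
        for pid, (group_a, group_b) in data.items()
        if not passes_intensity_filter(group_a, group_b, threshold)
    ]
    return len(excluded) / len(data)


# Hypothetical signals: probeset ID -> (group 1 values, group 2 values).
data = {
    "p1": ([50.0, 60.0, 55.0], [70.0, 65.0, 80.0]),
    "p2": ([100.0, 120.0, 110.0], [300.0, 280.0, 310.0]),
    "p3": ([400.0, 420.0, 390.0], [90.0, 100.0, 95.0]),
    "p4": ([20.0, 30.0, 25.0], [22.0, 28.0, 26.0]),
}

print(fraction_excluded(data))
```

Because the test uses the higher of the two means, a probeset that is bright in only one group (such as `p2` or `p3` here) is retained.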

ADD REPLY • link 17.4 years ago Sarwar, Rizwan ▴ 30


On Friday 08 December 2006 07:06, Sarwar, Rizwan wrote:
> Hi all
>
> Regardless of whichever approach is being used, a cut-off needs to be applied. What sort of numbers do you use as your thresholds for variance or CV? What %age (roughly) of probesets does it exclude?

A similar discussion happened just a few days ago on this list. You might want to refer to the list archives for details:

http://article.gmane.org/gmane.science.biology.informatics.conductor/10788

Sean

ADD REPLY • link 17.4 years ago Sean Davis 21k


I just noticed that what I wrote before was the opposite of what I meant:

> Actually, the problem of the statistical results not always making sense with the P/A calls also happens with MAS5 values, not RMA or GCRMA specifically. Many years ago I was using Affy's two sample comparison in the MAS 5.0 software, and I noticed that a probeset would be called "A" in sample 1 and "P" in sample 2, but supposedly sample 2 had higher expression than sample 1!!

It should read: sample 1 = A, sample 2 = P, but sample 1 > sample 2. Otherwise, why would I think it made no sense? :)

Have a good day!
Jenny

ADD REPLY • link 17.4 years ago Jenny Drnevich ★ 2.2k


Naomi Altman ★ 6.0k

@naomi-altman-380

Last seen 3.0 years ago

United States

If the probeset is mainly absent in one condition and mainly present in another, then I think you would not want to filter it out, since it seems to be an important gene for understanding the difference between conditions.

--Naomi

Naomi S. Altman
Associate Professor, Dept. of Statistics
Penn State University
University Park, PA 16802-2111
814-865-3791 (voice)
814-863-7114 (fax)
814-865-1348 (Statistics)
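One way to respect this when filtering on detection calls is to drop a probeset only if it is not mainly present in any condition. The Python sketch below is an illustration under assumptions, not a method given in this thread: the function names, the cutoff of more than half the samples called "P", and the example calls are all hypothetical. MAS5 detection calls are taken to be "P" (present), "M" (marginal), or "A" (absent).

```python
def present_fraction(calls):
    """Fraction of samples in one condition with a 'P' detection call."""
    return sum(1 for call in calls if call == "P") / len(calls)


def keep_probeset(calls_by_condition, min_fraction=0.5):
    """Keep a probeset if it is mainly present in at least one condition."""
    return any(
        present_fraction(calls) > min_fraction
        for calls in calls_by_condition.values()
    )


# Hypothetical detection calls for three probesets in two treatments.
calls = {
    "on_off": {"treatment1": ["P", "P", "P"], "treatment2": ["A", "A", "M"]},
    "never_present": {"treatment1": ["A", "A", "A"], "treatment2": ["A", "M", "A"]},
    "always_present": {"treatment1": ["P", "P", "M"], "treatment2": ["P", "P", "P"]},
}

kept = [pid for pid, by_condition in calls.items() if keep_probeset(by_condition)]
print(kept)
```

With this rule the `on_off` probeset, present in one treatment and absent in the other, survives the filter, while a probeset that is absent everywhere is removed.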

ADD COMMENT • link 17.4 years ago Naomi Altman ★ 6.0k
