P calls (VSN and RMA)
Isaac Neuhaus
@isaac-neuhaus-22
We agree with Rafael that using algorithms such as VSN and RMA greatly reduces the noise at the lower intensities (hurray!). However, there is still the biological fact that many of the probesets on any chip assay genes that are not expressed in the cell type being tested. Incorporating the null measurements for these probesets greatly affects any kind of multiple-testing correction applied in the analysis of the data. For this reason, we would also really like to filter out these unexpressed genes. One way is simply to use an intensity cut, but this seems arbitrary unless it is genuinely linked to the detection ability of the technology. Does anyone have comments or ideas?

Petra and Isaac

Rafael A. Irizarry wrote:

> in my opinion the main reason affy uses these is because MAS 5.0 has so
> much noise at "the bottom". if they didn't, all their large fold changes
> would be for genes with low expression. with RMA, what i use, you don't
> have this problem so i don't see the need to throw away information. there
> are other ways to get rid of the "noise" at the bottom: dChip (PM-only)
> and vsn are two examples.
>
> There is no designated place to stick them into exprSet. you could create
> another exprSet just for these or, since MAS doesn't give SEs,
> you could stick them in the se.exprs slot. a better (but you need to code
> some) solution is to extend the exprSet class to a new class that includes
> a slot for these calls.
>
> On Mon, 17 Mar 2003, Stephen Henderson wrote:
>
>> I'd like to use the Present (P) and Absent (A) calls for some rudimentary
>> filtering of data prior to analysis. Is there an appropriate slot for
>> inserting P and A calls within the exprSet object?
>>
>> I'd like to garner opinions: Does anyone else use these, think them
>> worthwhile, or perhaps use some other surrogate?
@wolfgang-huber-3550
Hi Isaac and Petra,

On Fri, 9 Jan 2004, Isaac Neuhaus wrote:

> still the biological fact that a lot of the probesets on any chip assay
> genes that are not expressed in the cell type being tested.
> Incorporating the null measurements for these probesets greatly affects
> any kind of multiple test corrections applied in analysis of the data.

The P/A calls from Affymetrix try to decide on the presence or absence of a gene in a single sample. For the kind of statistical analyses you are talking about, genes that are absent in some samples and present in others may still be very interesting, maybe the most interesting ones. On the other hand, to reduce the multiple testing problem, it is good to throw out genes that never do anything. However, this is necessarily a trade-off between false positives and false negatives, and the decision should, among other things, involve the respective costs of false positives and false negatives.

In my opinion, the P/A calls are so popular mostly because they give a false sense of objectivity and simplicity. ...but, as I said, that's just an opinion.

Best wishes
Wolfgang

-------------------------------------
Wolfgang Huber
Division of Molecular Genome Analysis
German Cancer Research Center
Heidelberg, Germany
Phone: +49 6221 424709
Fax: +49 6221 42524709
Http: www.dkfz.de/abt0840/whuber
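One possible reading of "genes that never do anything" is a non-specific filter on across-sample variability, applied without looking at the group labels. The following is only a minimal sketch under that assumption, not something proposed in the thread: the matrix `expr`, the choice of the interquartile range, and the cutoff `min_iqr` are all hypothetical.

```r
# Hypothetical data: 3 probesets x 4 samples on a log-like scale.
expr <- rbind(
  flat_low  = c(7.0, 7.1, 6.9, 7.0),
  flat_high = c(9.0, 9.1, 9.0, 8.9),
  changing  = c(7.0, 7.1, 9.0, 9.2)
)

# Variability of each probeset across samples, ignoring any group labels.
iqr <- apply(expr, 1, IQR)

# Hypothetical cutoff: keep probesets whose IQR exceeds min_iqr.
min_iqr <- 0.5
keep <- iqr > min_iqr
filtered <- expr[keep, , drop = FALSE]

print(round(iqr, 3))
print(rownames(filtered))
```

Note that such a filter keeps a probeset that is absent in some samples and present in others, which a per-sample P/A rule might discard. Where to put the cutoff remains the trade-off between false positives and false negatives described above.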
I have also been working on this problem. I compared the Affy "present" calls with calls based on various levels of normalized expression. Needless to say, these do not match well.

In our study, it is known that some genes are expressed at very low levels in one of our conditions and are not expressed under the other conditions. These genes were declared "not present" in all conditions both by Affy and by our (admittedly arbitrary) cut point (which was 50). I did a gene-by-gene ANOVA (which included all genes, even if "absent"). Interestingly enough, a few of these genes had a statistically significant ANOVA F-test, and a look at the expression values confirmed that this was due to much higher expression values (2-fold or more) in the known condition.

This seemed to me to indicate that perhaps we ought to consider lowering the cut point. However, if we do this, we also include a lot more genes that appear (by RT-PCR) to really be absent. So now I wonder if I can use the ANOVA to provide information about when a gene is present.

I appreciate this discussion, because it is an important issue for the group of biologists I work with.

--Naomi

Naomi S. Altman
Associate Professor, Bioinformatics Consulting Center
Dept. of Statistics, Penn State University
University Park, PA 16802-2111
814-865-3791 (voice), 814-863-7114 (fax), 814-865-1348 (Statistics)
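A gene-by-gene ANOVA of this kind can be sketched in base R. This is an illustration only: the matrix `expr`, the three conditions in `group`, and all values are made up, and the one-way model is an assumption about the design. The cut point of 50 is the one mentioned in the post.

```r
# Hypothetical data: 3 probesets x 6 samples, two samples per condition.
group <- factor(rep(c("A", "B", "C"), each = 2))
expr <- rbind(
  absent    = c(20, 22, 21, 19, 20, 22),
  low_in_C  = c(20, 22, 21, 19, 45, 48),
  expressed = c(300, 310, 295, 305, 298, 302)
)

# Intensity-based call: "absent" if every sample is below the cut point.
cut_point <- 50
below_cut <- apply(expr, 1, max) < cut_point

# One-way ANOVA F-test per probeset, run on all probesets regardless of the call.
f_test_p <- function(y, g) anova(lm(y ~ g))[["Pr(>F)"]][1]
p_values <- apply(expr, 1, f_test_p, g = group)

print(data.frame(below_cut, p_value = signif(p_values, 3)))
```

In this toy example `low_in_C` stays below the cut point in every sample yet shows a clear condition effect, which is the pattern described above: an intensity cut alone would have discarded it.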
@wolfgang-huber-3550
> ... One way is simply using an intensity cut, but this
> seems arbitrary, unless it is genuinely linked to detection ability of
> the technology. Does anyone have comments or ideas?

Btw, there is a very neat paper by Felix Naef that explains why, for some probes, the MM signal can be higher than the PM signal, even if the gene is expressed and no other interfering transcripts are around. From this it follows that a proper interpretation of MMs is rather more complex than in MAS 5.0 (and I am sure Rafael & Zhijin Wu would have much more to say about this).

Naef F, Magnasco MO. Solving the riddle of the bright mismatches: labeling and effective binding in oligonucleotide arrays. Phys Rev E Stat Nonlin Soft Matter Phys. 2003 Jul;68(1 Pt 1):011906. PMID: 12935175

Best wishes
Wolfgang
i agree with wolfgang. to reiterate, what is very important is that you remember that, due to uncertainty in measurement (not to mention Naef's observation), there is also uncertainty in P/A calls and any other filtering operation. what is lacking is a rigorous assessment of this uncertainty. thus, by filtering genes before looking at other stats (such as fold change) you will likely introduce false negatives but won't have a clue of how many. i haven't seen evidence suggesting that, when using RMA, this sacrifice in sensitivity is worth the gain in specificity, although for MAS 5.0 it certainly helps.

another issue is how you define P/A when you get differing calls in technical (or biological) replicates. i have seen many arbitrary choices used, such as requiring 80% presence... but again, i haven't seen evidence suggesting this helps when using RMA.
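To make the replicate problem concrete, here is a small sketch of a "require 80% presence" rule. The call matrix `calls` and the threshold `min_present` are hypothetical, and the rule is shown only to illustrate how arbitrary the choice is, not as a recommendation.

```r
# Hypothetical P/A calls: 3 probesets x 5 replicate arrays.
calls <- rbind(
  ps1 = c("P", "P", "P", "P", "A"),
  ps2 = c("P", "A", "P", "A", "A"),
  ps3 = c("A", "A", "A", "A", "A")
)

# Number of replicates in which each probeset is called present.
n_present <- rowSums(calls == "P")

# Probesets whose replicates disagree with each other.
discordant <- n_present > 0 & n_present < ncol(calls)

# One arbitrary rule: present in at least 4 of 5 replicates (80%).
min_present <- 4
called_present <- n_present >= min_present

print(data.frame(n_present, discordant, called_present))
```

Changing `min_present` from 4 to 2 would flip the call for `ps2`, and nothing in the calls themselves says which choice is right.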
@petra-b-ross-macdonald-598
Okay, I spent today checking how much these "non-hybridizing" probesets are costing me. I found that a conservative filter does not change my results much if I use a Bonferroni or Dunn-Sidak multiple test correction, but it matters a lot for a False Discovery Rate correction. Details below.

I used VSN data and did a filtering step based on the Affymetrix bacterial probes on the U133A, as follows.

Non-hybridizing probesets were removed from the analysis using the following criteria: Twelve probesets derived from the Bacillus subtilis genome are present on the U133A chip. These probesets should not hybridize to human cDNA, and are thus a control for selectivity. In the 16 samples analyzed, these twelve probesets showed intensity values with a median of 6.8 and a maximum of 7.9. The E. coli BioB transcript is spiked into the labeling reaction at a concentration of 1.5 pM, providing a control for sensitivity. Affymetrix specifies that this concentration is at the lower boundary for detection. In the 16 samples analyzed, intensity values for the six BioB probesets had a median of 8 and a minimum value of 7.4. Thus, the transition between sensitivity and selectivity occurs at intensity values between 7.4 and 7.9. Probesets for which any two or more of the 16 samples showed intensity values of 7.5 or greater were included in the statistical analysis. 4,444 probesets did not meet these criteria and were excluded.

Data were imported into Partek. A mixed-model ANOVA, using treatment as a fixed effect and treatment block as a random effect, was used to identify markers that showed a treatment effect. Using the Partek False Discovery Rate (FDR) tool, a step-up FDR analysis was performed on the p values, allowing us to identify p-value cutoffs for certain levels of significance.

Using the mixed ANOVA without filtering (22284 probesets), I identified no probesets that made an FDR cutoff of 0.05, and 11 that made a cutoff of 0.1. A list of the top 300 probesets for this ANOVA had an FDR of between 40 and 50%. Using the filtered set (17840 probesets), I identified 4 probesets that made an FDR cutoff of 0.05, and 300 that made a cutoff of 0.1. These 300 included 272 that were in the top 300 of the mixed ANOVA without filtering, i.e. 28 were lost in the filter step.

So filtering doesn't really change the probesets or rankings that come out of the ANOVA, but do you agree it gives the False Discovery Rate more meaning? It's a lot better if I can give the biologist back a list of 300 genes and tell them 10% are false positives than a list of 11 with one false positive. Maybe the 28 markers have to be looked at separately, to avoid the false negatives that Naomi was worried about.

Based on what I saw today, I think it is valid to make a "P calling" function for VSN data based on the intensity values for non-human probes on the arrays. Isaac?

cheers,
Petra

"Rafael A. Irizarry" wrote:

> i agree with wolfgang. to reiterate, what is very important is that you
> remember that due to uncertainty in measurement (not to mention Naef's
> observation), there is also
> uncertainty in P/A calls and any other filtering operation. what is
> lacking is a rigorous assessment of this uncertainty.
> thus, by filtering genes before looking
> at other stats (such as fold change) you will likely introduce false
> negatives but won't have a clue of how much. i haven't seen evidence
> suggesting that, when using RMA, this
> sacrifice in sensitivity is worth the gain in specificity. although, for
> MAS 5.0 it certainly helps.
>
> another issue is how do you define P/A when you get differing calls in
> technical (or biological) replicates. i have seen many arbitrary choices
> used such as requiring 80% presence... but again, i haven't seen evidence
> suggesting this helps, when using RMA.
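The filtering rule and its effect on a step-up FDR can be sketched in base R. This is a toy illustration, not the Partek analysis: the intensities in `expr` and the p-values in `p` are made up, only 4 samples are used instead of 16, and it assumes the step-up procedure is Benjamini-Hochberg, which is what `p.adjust(method = "BH")` implements.

```r
# Hypothetical intensities: 5 probesets x 4 samples (the post used 16 samples).
expr <- rbind(
  ps1 = c(9.1, 9.3, 8.8, 9.0),
  ps2 = c(7.6, 7.2, 7.7, 7.1),
  ps3 = c(6.8, 6.9, 7.6, 7.0),   # only one sample reaches the cutoff
  ps4 = c(6.5, 6.7, 6.6, 6.9),
  ps5 = c(8.2, 8.0, 8.4, 8.1)
)

# Hypothetical per-probeset p-values, e.g. from an ANOVA on all probesets.
p <- c(ps1 = 0.004, ps2 = 0.03, ps3 = 0.20, ps4 = 0.60, ps5 = 0.02)

# Rule from the post: keep a probeset if two or more samples are at or above 7.5.
cutoff <- 7.5
min_samples <- 2
keep <- rowSums(expr >= cutoff) >= min_samples

# Step-up (Benjamini-Hochberg) adjustment with and without the filter.
fdr_all      <- p.adjust(p, method = "BH")
fdr_filtered <- p.adjust(p[keep], method = "BH")

print(cbind(p = p[keep], fdr_all = fdr_all[keep], fdr_filtered = fdr_filtered))
```

The ranking of the retained probesets is unchanged, but their adjusted values shrink because fewer tests enter the correction. Any probeset removed by the filter is never tested, so a true change among the removed ones becomes a false negative that the adjusted values do not account for.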
Hi Petra,

two questions:

> I used VSN data and did a filtering step ...

Did you use just the PM values or PM and MM (and if so, how)?

> Thus, the transition between sensitivity and selectivity occurs at
> intensity values between 7.4 and 7.9.

From what other people have told me, I'd expect that there is not a universal value, but that it would be different for different probe sequences (according to the distribution of C, G, T, and A across the positions 1..25). Also, the exact value would depend on scanner settings etc. .... Opinions?

Best wishes
Wolfgang
Wolfgang:

For VSN and RMA we used only the PM values, with affy version 1.2.25.

    library(affy)
    library(vsn)   # provides the "vsn" normalization method

    # cels: vector of CEL file names; names: matching sample names
    raw.data <- ReadAffy(filenames=cels, sampleNames=names, verbose=FALSE)

    # VSN: no separate background correction, PM-only, median polish summary
    vsn.data <- expresso(raw.data, normalize.method="vsn", bg.correct=FALSE,
                         pmcorrect.method="pmonly", summary.method="medianpolish")

    # RMA with its default settings
    rma.data <- rma(raw.data)

Isaac
We used

    norm.data <- expresso(raw.data, bgcorrect.method="rma", normalize.method="quantiles",
                          pmcorrect.method="pmonly", summary.method="mas")

I did not look at the spiking controls (I should), but the investigator did RT-PCR for several of the genes that were expressed near 50 in all conditions. A couple of these were present at low levels, but most were not. 10 seems very low to me. After the flurry of January grants is over, I will have a look at the spiking controls on these arrays.

--Naomi
@petra-b-ross-macdonald-598
I agree there would not be a universal value for the transition point. I think that's why, in this experiment, twelve B. subtilis probesets that shouldn't hybridize can give intensities as high as 7.9, and six BioB probesets that should hybridize give intensities as low as 7.4. This is the region that Naomi talked about, where some low-level mRNAs are giving the same values as some absent mRNAs. I'd also expect these overlap values to shift around somewhat with every experiment batch processed with VSN (since I have seen absolute values for some genes in control samples jump quite a bit in different projects).

But within one dataset, it seems these negative control probesets provide an indicator of where the region is, and allow you to make decisions about which probesets to throw out if you want to use a False Discovery Rate metric. You could make a cut low in the region for sensitivity, or high for selectivity. In Naomi's case, if you really care about identifying low-level RNAs that are upregulated by a small amount, you could run the ANOVA to identify genes of interest without the cut, but run it again with the cut to get a better idea of what the p values really are.

I am going to try and find out more about the control spots. The Affy site doesn't have anything obvious about using them.

Cheers,
Petra
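Locating the overlap region from the control probesets, and seeing what a low or high cut costs, might look like the sketch below. The vectors `negative` and `positive` are invented values chosen only so that their extremes match the 7.9 and 7.4 quoted in the thread; they are not the actual control data, and `evaluate_cut` is a hypothetical helper.

```r
# Made-up control intensities (illustration only, not the real arrays).
negative <- c(6.5, 6.7, 6.8, 6.8, 7.0, 7.9)   # B. subtilis: should not hybridize
positive <- c(7.4, 7.8, 8.0, 8.0, 8.3, 8.6)   # BioB: spiked in near the detection limit

# The overlap region runs from the lowest positive to the highest negative control.
region <- c(lower = min(positive), upper = max(negative))

# For a candidate cut, report the fraction of negative controls excluded
# (selectivity) and the fraction of positive controls retained (sensitivity).
evaluate_cut <- function(cut) {
  c(cut = cut,
    negatives_excluded = mean(negative < cut),
    positives_retained = mean(positive >= cut))
}

print(region)
print(t(sapply(c(7.4, 7.5, 8.0), evaluate_cut)))
```

A cut at the low end of the region keeps every positive control but lets the brightest negative control through, while a cut above the region excludes every negative control at the price of losing some positive controls. Since the region is expected to move between batches, it would need to be recomputed for each dataset.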
ADD COMMENT
