GCRMA: low-intensity expression estimates / p-value distributions
@matthew-hannah-621
Hi,

I noticed this a while ago, but given some of the recent threads, now may be a good time for a general discussion. This will be easiest to follow if you view the attached plots on the Bioconductor archive site.

Basically, GCRMA changed its background (BG) parameter estimation from using a low quantile within strata of probe-affinity levels (version 1.1.0 and earlier) to a smoother approach based on loess. There is also a fast=FALSE option, which does not use the default, faster ad hoc algorithm (MLE vs. EB?).

If you compare v1.1.0 and v1.1.3 (the current stable release), each with and without fast=FALSE, there are significant differences in the expression estimates, particularly at the low end. This is not too surprising, as the data are noisy there and each method will have its own specifics. What is more interesting is the effect on differential expression. I looked at a simple 3 vs. 3 comparison (limma, eBayes) within a larger normalized dataset (~50 arrays), and as you can see, high p-values are over-represented when the default (fast=TRUE) version is used.

To me this raises the question of whether the statistical test is still valid, and also how to estimate true/false positives and negatives. I think (from a quick BioC search, though I found no documentation) that a step-up FDR procedure is used within p.adjust (which limma uses). Could such a distribution affect the validity of the FDR correction? Or is this the p-value equivalent of positive dependency among the test statistics?

This all stems from the different intensity distributions produced by the GCRMA variants. All are bimodal, which is likely because genes that are not present give the peak at lower intensities. I suspect these absent genes are responsible for the over-representation of high p-values, as their signal is just BG. For now, I prefer to work with the fast=FALSE version because of its more conventional p-value distributions.

As a thought: I assume extracting the area of the lower peak might be a nice way of estimating the number of 'present' genes. Any comments?
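For reference, R's p.adjust(method = "BH") (also available as method = "fdr") implements the Benjamini-Hochberg step-up procedure, which is known to control the FDR when the test statistics are independent or positively (regression) dependent. The sketch below is a minimal Python illustration of that step-up logic only; the function names bh_adjust and n_rejected are hypothetical and are not part of limma or any R package.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values.

    Mirrors the logic of R's p.adjust(method = "BH"): visit p-values from
    largest to smallest, scale each by m / rank, and keep a running minimum
    so adjusted values never increase as raw p-values decrease.
    """
    m = len(pvals)
    # Indices ordered from the largest raw p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    for k, i in enumerate(order):
        rank = m - k  # rank of pvals[i] in increasing order (1 = smallest)
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def n_rejected(pvals, alpha=0.05):
    """Number of hypotheses rejected when controlling the FDR at level alpha."""
    return sum(1 for q in bh_adjust(pvals) if q <= alpha)


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.5]
    print(bh_adjust(raw))
    print(n_rejected(raw, alpha=0.05))
```

Appending extra p-values close to 1 (for example from absent genes) increases m while leaving the ranks of the smallest p-values unchanged, so in practice it tends to make the smallest adjusted values larger, i.e. the correction becomes more conservative.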
Cheers,
Matt

Attachments (scrubbed by the mailing-list archive):
GCRMA_comparison.png (image/png, 10346 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20050314/1ce953b7/GCRMA_comparison.png
GCRMA_comparison_Limma.pvals.png (image/png, 9072 bytes): https://stat.ethz.ch/pipermail/bioconductor/attachments/20050314/1ce953b7/GCRMA_comparison_Limma.pvals.png
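The "peak area" idea at the end of the message could be sketched as a two-component mixture fitted to the log2 expression values, taking the estimated weight of the lower component as the area of the lower ("absent") peak. This is a minimal sketch under a loudly stated assumption: both peaks are treated as roughly Gaussian on the log2 scale, which is not a known property of GCRMA output. The fit uses a plain EM algorithm; the function names and the simulated numbers in the example are hypothetical and purely illustrative.

```python
import math
import random


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_gaussians(xs, iters=100):
    """EM fit of a 1-D two-component Gaussian mixture (ASSUMED model).

    Returns (w_low, mu_low, sd_low, mu_high, sd_high); w_low is the estimated
    area (mixing weight) of the lower-intensity peak.
    """
    n = len(xs)
    xs_sorted = sorted(xs)
    # Initialise the two components from the lower and upper halves of the data.
    lower, upper = xs_sorted[: n // 2], xs_sorted[n // 2:]
    mu1, mu2 = sum(lower) / len(lower), sum(upper) / len(upper)
    sd1 = sd2 = max(1e-3, (xs_sorted[-1] - xs_sorted[0]) / 4)
    w1 = 0.5
    for _ in range(iters):
        # E-step: responsibility of the lower component for each value.
        r = []
        for x in xs:
            a = w1 * normal_pdf(x, mu1, sd1)
            b = (1 - w1) * normal_pdf(x, mu2, sd2)
            r.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: update weight, means and standard deviations.
        s1 = min(max(sum(r), 1e-9), n - 1e-9)  # guard against empty components
        s2 = n - s1
        w1 = s1 / n
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / s1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / s2
        sd1 = max(1e-3, math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / s1))
        sd2 = max(1e-3, math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / s2))
    return w1, mu1, sd1, mu2, sd2


def simulate_log2_intensities(n_absent, n_present, seed=0):
    """Hypothetical data: an 'absent' peak near 4 and a 'present' peak near 9."""
    rng = random.Random(seed)
    absent = [rng.gauss(4.0, 0.7) for _ in range(n_absent)]
    present = [rng.gauss(9.0, 1.5) for _ in range(n_present)]
    return absent + present


if __name__ == "__main__":
    xs = simulate_log2_intensities(1500, 1000, seed=1)
    w_low, mu_low, sd_low, mu_high, sd_high = fit_two_gaussians(xs)
    print(f"lower peak: weight {w_low:.3f}, mean {mu_low:.2f}, sd {sd_low:.2f}")
    print(f"estimated number of 'present' values: {round((1 - w_low) * len(xs))}")
```

If the real peaks are skewed or heavy-tailed, a Gaussian mixture will misallocate the overlap region, so the weight is best read as a rough count rather than a calibrated one.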
@matthew-hannah-621
Hi,

Just looking for opinions again. Basically, I'd welcome any comments on the bimodal p-value distributions that can sometimes be observed when limma (although I guess this would apply to any test) is used to test for differential expression on GCRMA expression estimates.

In my last mail I didn't label the plots clearly (see the attachment links in that message, or the BioC archives):

First set of plots: comparisons of expression estimates using GCRMA 1.1.0 and 1.1.3, with fast=TRUE or FALSE. There are big differences.

Second set of plots: p-value distributions from a limma comparison (lmFit, eBayes) of 3 vs. 3 arrays (out of a larger 50-array set) for each GCRMA normalisation. The standard (fast=TRUE) GCRMA algorithms produce a peak at high p-values in addition to the usual distribution.

I also noticed the convest function in limma, but after a quick glance at the linked paper, it too shows the 'standard' p-value distribution (as produced by gcrma with fast=FALSE). In fact, I've not seen any discussion or received any comment on how these non-standard p-value distributions should be interpreted! Any takers? (See my previous mail for more details.)

Thanks,
Matt
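One way to make the "non-standard" shape concrete is to look at an estimate of the proportion of true null hypotheses (pi0), which is what convest reports. convest itself uses a convex decreasing density estimate; the sketch below swaps in Storey's simpler threshold estimator, pi0 = #{p > lambda} / (m(1 - lambda)), purely for illustration. Both rest on null p-values being Uniform(0, 1), so a surplus of p-values near 1 pushes the estimate upwards. The function names and the simulated p-values (uniform nulls, a small-p "changed" set, and an extra block near 1) are hypothetical and not taken from the arrays discussed here.

```python
import random


def storey_pi0(pvals, lam=0.5):
    """Storey-style estimate of the proportion of true null hypotheses.

    Counts p-values above the threshold lam and divides by the number expected
    there if every test were null with Uniform(0, 1) p-values: m * (1 - lam).
    Published implementations usually cap the result at 1; it is left uncapped
    here so that a departure from uniformity near 1 stays visible.
    """
    m = len(pvals)
    above = sum(1 for p in pvals if p > lam)
    return above / (m * (1.0 - lam))


def simulate_pvalues(n_null, n_changed, n_spike, seed=0):
    """Hypothetical p-values: uniform nulls, small p-values for 'changed'
    genes, plus an optional block squeezed into (0.9, 1)."""
    rng = random.Random(seed)
    nulls = [rng.random() for _ in range(n_null)]
    changed = [rng.random() ** 4 for _ in range(n_changed)]  # skewed towards 0
    spike = [rng.uniform(0.9, 1.0) for _ in range(n_spike)]
    return nulls + changed + spike


if __name__ == "__main__":
    conventional = simulate_pvalues(9000, 1000, 0, seed=1)
    with_spike = simulate_pvalues(9000, 1000, 3000, seed=1)
    print(f"pi0, conventional shape: {storey_pi0(conventional):.3f}")
    print(f"pi0, extra peak near 1:  {storey_pi0(with_spike):.3f}")
```

With these made-up inputs the first estimate falls below 1 and the second exceeds 1, which is impossible for a proportion; that is one signature of null p-values that are not uniform, and one possible reading of the extra high-p-value peak in the fast=TRUE histograms.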