Question: confusing P-value of one gene

Wolfgang Huber ♦ **13k** wrote:

Dear Xinwei,
As far as I can see, this is a case where transformation of the count
data (e.g. regularized log, or variance-stabilizing) and ANOVA with a
normal linear model should give useful results, since these
transformations behave like the logarithm for higher counts but avoid
the singularity around zero.
One place to start with that is chapter 2 of the DESeq2 vignette,
functions rlogTransformation and varianceStabilizingTransformation.
Doing the testing in this way is also not entirely without pitfalls
(e.g. it tends to have less statistical power esp. in the small sample
and low-counts regime, and may be susceptible to library-size related
biases). I'd be interested to hear how it turns out in your case.
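
A minimal sketch of the transform-then-ANOVA idea, using the CPM values quoted further down in this thread and base R only. It makes one loud assumption: `log2(cpm + 0.5)` is used as a crude stand-in for a regularized-log or variance-stabilizing transformation, and the offset 0.5 is arbitrary. The real DESeq2 functions work on the full count matrix and estimate the transformation from all genes, so their numbers would differ.

```r
# CPM values from the original post, in the order control, CX, RGF, CX&RGF
# (three replicates each).
cpm <- c(0, 0, 0, 0, 0.24, 0, 0, 0.14, 0.19, 25.14, 44.36, 34.62)
CX  <- factor(rep(c("no", "yes", "no", "yes"), each = 3))
RGF <- factor(rep(c("no", "no", "yes", "yes"), each = 3))

# Assumption: shifted log2 instead of rlog / VST. Like those transformations
# it is close to the logarithm for large values and finite at zero.
y <- log2(cpm + 0.5)

# Normal linear model with the same factorial structure as in the post.
fit <- lm(y ~ RGF + CX + RGF:CX)
tab <- anova(fit)

# P-value of the interaction term on the transformed scale.
p_interaction <- tab["RGF:CX", "Pr(>F)"]
print(tab)
```

On the transformed scale the zero controls have a finite value, so the interaction is well defined. With only one gene and three replicates per group this is an illustration of the mechanics, not a recommended analysis.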
Best wishes
Wolfgang
On Aug 29, 2013, at 4:15 am, Gordon K Smyth <smyth at wehi.edu.au>
wrote:
> Dear Xinwei,
>
> This is a correct result. The reason that the interaction is not
statistically significant is inherent in the log-linear model, and
hence in the definition of interaction for this sort of model.
>
> You are probably thinking that the cpm values are much higher for
the joint condition CX&RGF than for the other conditions, hence there
should be a positive interaction, and this should be statistically
significant.
>
> Indeed, had you tested the joint condition vs the other three
conditions it would certainly be significantly higher.
>
> However the interaction is different. The problem is that there are
zero counts for the controls. Hence the fold change from control to
CX is infinity, and the fold change from control to RGF is infinity.
Hence the counts in the joint condition can be indefinitely large even
in the absence of any positive interaction. Hence there is no evidence
for any positive interaction. In fact, you could make the counts for
the CX&RGF libraries as large as you like, and the interaction would
never become significant. To make this clear, the counts could have
been:
>
> 0 0 0 0 0 1 0 0 1 1e10 1e10 1e10
>
> and this would not give a significant interaction. So long as there
are zero counts for the controls, and at least one count for the single
treatments CX and RGF, the interaction will never become significant.
>
> You should ignore the logFC in this case, because the interaction
logFC is not defined in any meaningful way for this data.
>
> On the other hand, if you had any positive counts for the controls,
then the interaction would suddenly become significant, because the
fold changes from control to CX and control to RGF would now be
finite.
>
> I suspect that you might find it more meaningful to test for
>
> CX&RGF - (control+CX+RGF)/3
>
> This will certainly be significant. Or else test for CX&RGF vs each
of the other three individually.
>
> As I've said before, I am not a fan of factorial interaction models
for genomic data, and this is yet another example of why this is so.
>
> Best wishes
> Gordon
>
>
> On Wed, 28 Aug 2013, Xinwei Han wrote:
>
>> Hi,
>>
>> I manually checked p-values from edgeR and found the p-value of
this particular gene, AT1G04500, difficult to understand. The CPM of
this gene is like this:
>>
>> control replicate1: 0
>> control replicate2: 0
>> control replicate3: 0
>> CX replicate1: 0
>> CX replicate2: 0.24
>> CX replicate3: 0
>> RGF replicate1: 0
>> RGF replicate2: 0.14
>> RGF replicate3: 0.19
>> CX&RGF replicate1: 25.14
>> CX&RGF replicate2: 44.36
>> CX&RGF replicate3: 34.62
>>
>> I fitted a GLM with model.matrix(~RGF + CX + RGF:CX). To find
genes with a significant interaction effect, lrt <- glmLRT(fit, coef=4)
gives the following results for this gene:
>>
>> logFC: 5.43
>> logCPM: 3.19
>> LR: 0.012
>> PValue: 0.91
>>
>> I do not understand why such a dramatic change and such a large logFC
have a p-value of 0.91. I attached the data and R script I used. Could
you take a look to see whether I did something wrong in the script? Or
are there some other reasons for that?
>>
>> I used the latest version of R and edgeR. "ms" in the data and
script is
>> the control.
>>
>> Thanks
>> Xinwei
>>
>
>

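Gordon's argument about zero control counts can be checked numerically with a small sketch. It uses a Poisson `glm` from base R as a stand-in for edgeR's negative binomial fit, so the statistics will not match `glmLRT` output. The counts are hypothetical and patterned on his example, and the helper names `make_counts` and `interaction_lrt` are made up for this illustration.

```r
# Hypothetical counts in the order control, CX, RGF, CX&RGF (3 replicates each).
make_counts <- function(joint, control = c(0, 0, 0)) {
  data.frame(
    count = c(control, 0, 0, 1, 0, 0, 1, rep(joint, 3)),
    CX    = factor(rep(c("no", "yes", "no", "yes"), each = 3)),
    RGF   = factor(rep(c("no", "no", "yes", "yes"), each = 3))
  )
}

# Likelihood ratio test for the interaction: full model vs. main effects only.
# Assumption: Poisson family instead of edgeR's negative binomial model.
interaction_lrt <- function(d) {
  full    <- suppressWarnings(
    glm(count ~ RGF + CX + RGF:CX, family = poisson, data = d))
  reduced <- suppressWarnings(
    glm(count ~ RGF + CX, family = poisson, data = d))
  lr <- max(deviance(reduced) - deviance(full), 0)  # guard against rounding
  c(LR = lr, PValue = pchisq(lr, df = 1, lower.tail = FALSE))
}

# The interaction is the fourth design column, which is what coef = 4 tests.
design <- model.matrix(~ RGF + CX + RGF:CX, data = make_counts(100))
print(colnames(design))

# Zero controls: the joint counts grow, the interaction stays non-significant.
zero_controls <- sapply(c(1e2, 1e4, 1e6),
                        function(j) interaction_lrt(make_counts(j)))
print(zero_controls)

# A single read in one control replicate makes the fold changes finite.
one_control_read <- interaction_lrt(make_counts(100, control = c(0, 0, 1)))
print(one_control_read)
```

With all controls at zero, the main-effects model can push the baseline towards zero and the two treatment effects towards infinity, so it fits the joint condition almost perfectly and the likelihood ratio stays near zero. Once a control read is present that escape route closes and the interaction term is needed to explain the joint counts.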
modified 5.8 years ago by Gordon Smyth ♦ **37k** • written 5.8 years ago by Wolfgang Huber ♦ **13k**