Hi all, this might be a somewhat complicated post. I was trying to find something better than the log2 fold change to rank my DESeq2 results, for example to rank important genes by both log2 fold change and p-value. One of my colleagues proposed something I had never heard of, a "t-value", and provided the following code in R:
qnorm((1 - results$pvalue / 2)) * sign(results$log2FoldChange)
I am not that good with mathematics, but my understanding is that this computes a confidence level, with the +/- sign taken from the log2 fold change. I do not understand the value of this, or why one would not, for example, use the shrinkage factor with the sign taken from the log2 fold change, or even better the s-value instead of the p-value.
Also, qnorm is for normal distributions. RNA-seq data are not normally distributed (as far as I understand), and if I want to extend this analysis to other omics data that are not even close to normally distributed, I assume it would be wrong.
Could anyone give me a hint on what to read, or any other information? I have read "Analyzing RNA-seq data with DESeq2" but I cannot find any relevant information. Thank you all.
Thank you for your comment, could you please elaborate? t-statistics refer to Student's t-test, and in R there is the t.test() function to calculate it. What I am referring to seems to be a bit different. Even so, where is the t-statistic located and how is it calculated?
Asymptotically the Wald test is a t-test. And for ease of use the p-value for the Wald test in DESeq2 is calculated using pt(), so that's pretty much a t-test in my book. Anyway,
Sorry if I sound arrogant, but isn't the Wald test for non-parametric data, as RNA-seq data are, while the t-test is for parametric data? But if you look at the formula that was provided, this is no t-test. They called it a t-value, and it takes the p-value as calculated by a t-test or Wald test. That function does not calculate a p-value the way a t-test or Wald test would. So, for parametric data, will this function give the same result as a t-test? And then what about the sign? A p-value has no sign.
I don't know what you mean. The Wald test is a way to generate a t-test looking thing when using non-normal data. It is actually distributed Chi-square, which is a parameter, so no, it is not a non-parametric test.
But anyway, I just showed you that the formula your colleague gave you results in the exact same values you already get from using results() on your DESeqDataSet. Which is why I presume somebody is having a laugh. Why would you calculate something you already have?
Also, from ?nbinomWaldTest
Your colleague has shown you how to compute a z-score using the p-value and sign. Which is, as I already noted, just calculating the existing statistic that the p-value was computed from.
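A small base-R check of this point, as a sketch only: the vector z below is made up and merely stands in for Wald statistics, and the two-sided p-value is assumed to come from a standard normal. Under those assumptions the colleague's formula simply inverts the step that produced the p-value. With real DESeq2 output the same quantity should already be in the stat column of the results table.

```r
# Hypothetical Wald statistics standing in for the 'stat' column of a results table
z <- c(-4.1, -1.3, 0.2, 2.5, 5.0)

# Two-sided p-value under a standard normal reference distribution (assumption)
p <- 2 * pnorm(abs(z), lower.tail = FALSE)

# The colleague's formula: back-transform the p-value and re-attach the sign
z_back <- qnorm(1 - p / 2) * sign(z)
print(all.equal(z, z_back))

# Caveat: for tiny p-values, 1 - p/2 rounds to exactly 1 and qnorm() returns Inf
p_tiny <- 1e-20
print(qnorm(1 - p_tiny / 2))

# Working on the upper tail directly keeps the value finite
z_tiny <- qnorm(p_tiny / 2, lower.tail = FALSE)
print(z_tiny)
```

This also speaks to the worry about qnorm and normality: qnorm is applied to the p-value, not to the counts, so it only maps a tail probability back onto the z scale.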
Thank you for this reply, this makes more sense to me. My worry then, if you do not mind going a bit further, is whether this is an appropriate/desirable way to rank all genes, for example for use in GSEA?
I'd recommend lfcShrink.
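As a rough sketch of what building such a ranking could look like: the data frame below is mock data using the column names of a DESeq2 results table, and rank_genes is a hypothetical helper, not part of DESeq2. With real data the log2FoldChange column would come from as.data.frame() applied to the output of lfcShrink(), with the coefficient and shrinkage type chosen for your design.

```r
# Mock stand-in for a DESeq2 results-style table converted with as.data.frame();
# log2FoldChange would be the shrunken estimate, stat the Wald statistic
res_df <- data.frame(
  log2FoldChange = c(1.8, -0.4, NA, -2.3, 0.1),
  stat           = c(5.2, -0.9, NA, -6.1, 0.3),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Hypothetical helper: named numeric vector sorted from most up- to most down-regulated
rank_genes <- function(df, column = "log2FoldChange") {
  scores <- setNames(df[[column]], rownames(df))
  scores <- scores[!is.na(scores)]   # genes without an estimate cannot be ranked
  sort(scores, decreasing = TRUE)
}

ranked_lfc  <- rank_genes(res_df)          # rank by (shrunken) log2 fold change
ranked_stat <- rank_genes(res_df, "stat")  # rank by the Wald statistic instead
print(names(ranked_lfc))
```

Preranked enrichment tools typically take a named, sorted vector of this kind as input, so the choice of ranking metric comes down to which column is passed in.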
There have been some posts on this:
https://www.google.com/search?q=site%3Asupport.bioconductor.org+gsea+deseq2