Ranking DESeq2 results using t-values?
Theo (@theodoregeorgomanolis-7993), Germany

Hi all, this might be a somewhat complicated post. I was trying to find something better than the log2 fold change to rank my DESeq2 results, so that, for example, important genes are ranked by both log2 fold change and p-value. One of my colleagues proposed something I had never heard of, a "t-value", and provided the following R code:

qnorm((1 - results$pvalue / 2)) * sign(results$log2FoldChange)

I am not very good at mathematics, but my understanding is that this computes a confidence level and gives it the +/- sign of the log2 fold change. I do not understand the value of this, or why one would not instead use, for example, the shrunken log2 fold change (with its sign), or even better the s-value instead of the p-value.
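A minimal base-R sketch of what the colleague's formula does, assuming the p-value is two-sided and was computed from a standard normal null, i.e. p = 2 * pnorm(-abs(z)). The values in z are made up for illustration, not DESeq2 output; sign(z) stands in for sign(log2FoldChange), which agrees with it when the statistic is log2FoldChange / lfcSE.

```r
# Hypothetical signed test statistics (illustrative values only)
z <- c(-3.2, -0.5, 0.8, 2.1, 4.0)

# Two-sided p-values under a standard normal null: p = 2 * P(Z > |z|)
p <- 2 * pnorm(-abs(z))

# The colleague's formula: qnorm(1 - p/2) recovers |z|,
# and sign() restores the direction of the effect
z_back <- qnorm(1 - p / 2) * sign(z)

print(cbind(z, p, z_back))
```

In other words, the formula is not a new statistic; it just undoes the p-value calculation.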

Also, qnorm is for normal distributions, and RNA-seq data are not really normally distributed (as far as I understand). If I want to extend this analysis to other omics data that are not even close to normally distributed, I assume it would be wrong.
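One point that may help with the normality concern: in this formula qnorm is applied to the p-value, not to the counts, and it is a monotone rescaling. The base-R sketch below uses hypothetical p-values and fold-change signs to check that ranking by the signed qnorm score gives the same gene order as ranking by signed -log10(p). Whether the p-values themselves can be trusted depends on the assumptions of the test that produced them, not on this transform.

```r
# Hypothetical p-values and log2 fold changes (illustrative only)
p   <- c(0.03, 0.5, 1e-6, 0.2, 0.001)
lfc <- c(1.4, -0.2, -2.5, 0.7, 0.9)

# Signed normal-quantile score (the colleague's formula)
z_score <- qnorm(1 - p / 2) * sign(lfc)
# Signed -log10(p), another common ranking metric
log_score <- -log10(p) * sign(lfc)

# Both are monotone in p within each sign, so the gene order is identical
print(order(z_score))
print(order(log_score))
```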

Could anyone give me a hint, something to read, or any other information? I have read "Analyzing RNA-seq data with DESeq2" but cannot find anything relevant. Thank you all.

@james-w-macdonald-5106, United States

I think your friend (or possibly you) is having a laugh. You don't have to calculate the t-statistics for the DESeqResults object, as they are already there (the stat column).


Thank you for your comment; could you please elaborate? A t-statistic refers to Student's t-test, and in R there is the t.test() function to calculate it. What I am referring to seems to be a bit different. Even so, where is the t-statistic located, and how is it calculated?


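Asymptotically, the Wald test is a t-test (as the degrees of freedom grow, the t-distribution converges to the standard normal). By default, the p-value for the Wald test in DESeq2 is calculated from the standard normal (pnorm), and from pt only when useT=TRUE, so that's pretty much a t-test in my book. Anyway,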

## use example data
> library(DESeq2)
> example(results)
<snip>
## get some data
> z <- DataFrame(results(dds, contrast=c("group", "IIIB", "IIIA")))
> all.equal(z$stat, qnorm(1 - z$pvalue/2) * sign(z$log2FoldChange))
[1] TRUE

## QED
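On the asymptotic point, a small base-R sketch comparing two-sided p-values for one hypothetical statistic under t nulls with increasing degrees of freedom against the standard normal. It only illustrates the distributional convergence; it does not reproduce DESeq2's internals.

```r
# One hypothetical test statistic, evaluated under different null distributions
stat <- 2.5
dfs <- c(3, 10, 30, 100, 1e4)

# Two-sided p-values under t nulls with increasing degrees of freedom
p_t <- 2 * pt(-abs(stat), df = dfs)
# Two-sided p-value under the standard normal null (the df -> Inf limit)
p_norm <- 2 * pnorm(-abs(stat))

print(data.frame(df = dfs, p_t = p_t, p_norm = p_norm))
```

The t tails are heavier, so p_t is always a bit larger, but the gap shrinks toward zero as df grows.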

Sorry if I sound arrogant, but isn't the Wald test meant for non-parametric data, such as RNA-seq data, whereas the t-test is for parametric data? Also, if you look at the formula that was provided, it is not a t-test: it is called a t-value, but it takes the p-value already calculated by a t-test or Wald test. The function itself does not calculate a p-value the way a t-test or Wald test does. So, for parametric data, would this function give the same result as a t-test? And what about the sign? A p-value has no sign.


I don't know what you mean. The Wald test is a way to get a t-test-like statistic when working with non-normal data. Its square is asymptotically chi-square distributed, which is a parametric distribution, so no, it is not a non-parametric test.
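A base-R sketch of the chi-square connection, assuming a single coefficient: if the Wald statistic W is approximately standard normal under the null, then W^2 is approximately chi-square with 1 degree of freedom, and both routes give the same two-sided p-value. The statistics below are made-up values.

```r
# Hypothetical Wald statistics (coefficient / standard error), illustrative only
w <- c(-2.8, -1.1, 0.3, 1.9, 3.5)

# Two-sided p-value from the standard normal null distribution
p_norm <- 2 * pnorm(-abs(w))

# Same p-value from a chi-square with 1 degree of freedom on the squared statistic
p_chisq <- pchisq(w^2, df = 1, lower.tail = FALSE)

print(cbind(w, p_norm, p_chisq))
```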

But anyway, I just showed you that the formula your colleague gave you produces exactly the values you already get from calling results() on your DESeqDataSet, which is why I presume somebody is having a laugh. Why would you calculate something you already have?


Also, from ?nbinomWaldTest

 useT: whether to use a t-distribution as a null distribution, for
          significance testing of the Wald statistics. If FALSE, a
          standard normal null distribution is used. See next argument
          'df' for information about which t is used. If 'useT=TRUE'
          then further calls to 'results' will make use of
          'mcols(object)$tDegreesFreedom' that is stored by
          'nbinomWaldTest'.
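When useT=TRUE, the p-values come from a t null instead (with the degrees of freedom stored in mcols(object)$tDegreesFreedom, per the help text). A base-R sketch, assuming a two-sided p-value of the form 2 * pt(-abs(stat), df), shows that the qnorm back-transform then no longer returns the statistic, while the matching qt does. The statistics and df_t = 5 are made-up values.

```r
# Hypothetical statistics and t degrees of freedom (illustrative only)
stat <- c(-3.0, 1.2, 2.7)
df_t <- 5

# Two-sided p-value from a t null, as assumed for useT = TRUE
p <- 2 * pt(-abs(stat), df = df_t)

# Back-transforming with the normal quantile does NOT return the statistic...
z_from_p <- qnorm(p / 2, lower.tail = FALSE) * sign(stat)
# ...but the matching t quantile does
t_from_p <- qt(p / 2, df = df_t, lower.tail = FALSE) * sign(stat)

print(cbind(stat, p, z_from_p, t_from_p))
```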

Your colleague has shown you how to compute a z-score using the p-value and the sign of the log2 fold change, which, as I already noted, just recovers the existing statistic that the p-value was computed from.
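If one does reconstruct a signed z-score from p-values (for example, when a tool reports only p-values), a numerical caveat applies to the qnorm(1 - p/2) form: for very small p-values, 1 - p/2 rounds to exactly 1 in double precision and qnorm() returns Inf, so the strongest genes all tie. A minimal base-R sketch with made-up statistics compares it with the upper-tail form qnorm(p/2, lower.tail = FALSE), which avoids the subtraction.

```r
# Hypothetical signed statistics, including some very large ones
z <- c(-12, -9, 2, 10, 15)
p <- 2 * pnorm(-abs(z))   # tiny but representable p-values

# Colleague's form: for very small p, 1 - p/2 rounds to exactly 1,
# so qnorm() returns Inf and the ranking information is lost
naive <- qnorm(1 - p / 2) * sign(z)

# Upper-tail form avoids the subtraction and keeps full precision
stable <- qnorm(p / 2, lower.tail = FALSE) * sign(z)

print(cbind(z, p, naive, stable))
```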


Thank you for this reply; this makes more sense to me. If you do not mind going a bit further, my remaining worry is whether this is an appropriate/desirable way to rank all genes, for example for use in GSEA.


I'd recommend lfcShrink, i.e. ranking by the shrunken log2 fold changes.
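A hedged sketch of turning a results table into a ranking for pre-ranked GSEA, i.e. a named numeric vector such as the stats argument of fgsea::fgsea accepts. make_ranks is a hypothetical helper, the toy table stands in for as.data.frame() of lfcShrink output, and the lfcShrink call in the comment (coefficient name and type = "apeglm") is an assumed usage to adapt to your own resultsNames(dds).

```r
# Hypothetical helper: build a named, decreasingly sorted ranking vector
# from one score column of a results table (genes as row names)
make_ranks <- function(res, score_col) {
  scores <- res[[score_col]]
  names(scores) <- rownames(res)
  scores <- scores[!is.na(scores)]   # drop genes without a score
  sort(scores, decreasing = TRUE)
}

# Toy stand-in for a results table (illustrative values only).
# With DESeq2 one might instead use something like (assumed usage):
#   res_shr <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")
# and pass as.data.frame(res_shr).
res <- data.frame(
  log2FoldChange = c(2.1, -0.4, NA, 1.3, -3.0),
  row.names = paste0("gene", 1:5)
)

ranks <- make_ranks(res, "log2FoldChange")
print(ranks)
```

The score column is a parameter, so the same helper can rank by stat instead of shrunken log2 fold change if you prefer.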

There have been some posts on this:

https://www.google.com/search?q=site%3Asupport.bioconductor.org+gsea+deseq2
