Hi all,
I understand that the stat column holds the Wald statistic, but why does the entire column disappear after shrinking with lfcShrink()? How does shrinking affect the values in the stat column?
Additionally, I don't get why fgsea uses the stat column (whose values are presumably not affected by lfcShrink()) instead of a column of shrunken values.
I am neither the DESeq2 nor the fgsea developer/maintainer, but here are my two cents. fgsea uses whatever you give it as ranked input. There are some Biostars threads where the developer (if memory serves) recommends the signed stat column (signed meaning the value is negative when the fold change is negative, which the Wald statistic already is), but this is only one of many ranking methods. You can also use -log10(pvalue), which is somewhat similar, or, given that you have the shrunken effect sizes via lfcShrink(), use these fold changes. That is actually the crux of the entire lfcShrink() procedure: obtaining fold changes that are unlocked from the biased trend of large but unreliable (= large standard errors) FCs when counts are low and/or replicate numbers are low.
DESeq2 puts strong emphasis on effect sizes; to the best of my knowledge, the edgeR philosophy is more p-value-centered. Their recommendation (again, to the best of my knowledge) is to use the p-value after running glmTreat. Please search for threads on edgeR and gene ranking if you need details on that.
In any case, the choice of ranking method is completely up to you; fgsea accepts any ranked list of genes. The ranking metric should be continuous with minimal ties between genes, so the stat column, fold changes or p-values are probably preferable to adjusted p-values. I do not know how the s-values behave in terms of the number of ties; you would need to check that yourself.
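To make the ranking options above concrete, here is a minimal base R sketch. It assumes a results table with the usual DESeq2 column names (log2FoldChange, stat, pvalue, padj); the values are made up, and make_ranks() and count_ties() are hypothetical helpers written for this illustration, not functions from DESeq2 or fgsea. In a real analysis, stat and pvalue would come from results() and the shrunken log2FoldChange from lfcShrink().

```r
# Toy results table with made-up values, for illustration only.
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.1, -1.4, 0.3, 0.2, 1.0),
  stat           = c(5.2, -3.9, 0.8, 0.5, 2.7),
  pvalue         = c(2e-7, 1e-4, 0.42, 0.62, 0.007),
  padj           = c(1e-6, 2.5e-4, 0.62, 0.62, 0.012),
  stringsAsFactors = FALSE
)

# Hypothetical helper: name the metric by gene, drop NAs, sort decreasingly.
make_ranks <- function(values, genes) {
  ranks <- setNames(values, genes)
  sort(ranks[!is.na(ranks)], decreasing = TRUE)
}

# Hypothetical helper: number of genes sharing their value with another gene.
count_ties <- function(ranks) {
  sum(duplicated(ranks) | duplicated(ranks, fromLast = TRUE))
}

# Candidate ranking metrics mentioned in the thread.
rank_stat <- make_ranks(res$stat, res$gene)
rank_lfc  <- make_ranks(res$log2FoldChange, res$gene)
rank_pval <- make_ranks(-log10(res$pvalue) * sign(res$log2FoldChange), res$gene)
rank_padj <- make_ranks(-log10(res$padj) * sign(res$log2FoldChange), res$gene)

# Compare how many ties each metric produces.
tie_counts <- sapply(
  list(stat = rank_stat, lfc = rank_lfc, pvalue = rank_pval, padj = rank_padj),
  count_ties
)
print(tie_counts)
```

Any of these named, sorted vectors could then be passed as the stats argument of fgsea(), together with a list of pathways. In this toy table only the adjusted p-values produce ties, which is the reason given above for avoiding them as a ranking metric.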
"obtaining fold changes that are unlocked from the biased trend of large but unreliable (= large standard errors) FCs when counts are low"
This is my understanding as well: the reason downstream methods use the t-statistic is to deal with noisy LFCs.
lfcShrink() addresses this directly, and has been benchmarked with respect to the reproducibility of ranking genes by effect size.
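A small worked example of why a statistic is less sensitive to noisy LFCs than the raw LFC. The Wald statistic is the LFC estimate divided by its standard error, so a large but imprecise LFC ends up with a small statistic. The two genes and their values below are made up for illustration, and the sketch does not perform any shrinkage itself.

```r
# Made-up estimates: g_noisy has a large LFC with a large standard error
# (as with low counts), g_precise a moderate LFC with a small standard error.
est <- data.frame(
  gene  = c("g_noisy", "g_precise"),
  lfc   = c(4.0, 1.5),
  lfcSE = c(3.0, 0.2),
  stringsAsFactors = FALSE
)

# Wald-type statistic: estimate divided by its standard error.
est$stat <- est$lfc / est$lfcSE

# Ranking by the raw LFC puts the noisy gene first ...
top_by_lfc <- est$gene[order(est$lfc, decreasing = TRUE)]
# ... while ranking by the statistic puts the precisely estimated gene first.
top_by_stat <- est$gene[order(est$stat, decreasing = TRUE)]

print(top_by_lfc)
print(top_by_stat)
```

Shrinkage tackles the same problem from the other side: instead of dividing by the standard error, it pulls imprecise LFC estimates toward zero, so that the LFC itself becomes usable for ranking.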