Why does the stat column disappear after lfcShrink in DESeq2, and why does fgsea use the stat column (not affected by shrinking)?
kavator • 20
@kavator-22955

Hi all,

I understand that the stat column holds the Wald statistics, but why does the entire column disappear after shrinking with lfcShrink()? How does shrinking affect the values in the stat column?

Additionally, I don't understand why fgsea uses the stat column (whose values are presumably not affected by lfcShrink) instead of a column of shrunken values.

fgsea DESeq2 • 64 views

I am neither the DESeq2 nor the fgsea developer/maintainer, but here are my two cents. fgsea uses whatever you give it as ranked input. There are some Biostars threads where the developer (if memory serves) recommends the signed stat column (so with a minus if the fold change is negative), but this is only one of many ranking methods. You can also use -log10(pvalue), which is somewhat similar, or, given that you have the shrunken effect sizes via lfcShrink, use those fold changes. That is actually the crux of the entire lfcShrink procedure: obtaining fold changes that are unlocked from the biased trend of large but unreliable (= large standard errors) FCs when counts are low and/or replicate numbers are low. DESeq2 puts strong emphasis on effect sizes; to the best of my knowledge, the edgeR philosophy is more p-value-centered, and their recommendation (again, to the best of my knowledge) is to use the p-value after running glmTreat. Please search for threads on edgeR and gene ranking if you need details on that.
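To make the ranking options above concrete, here is a minimal sketch in plain Python (not R, and not the fgsea API). The rows, gene names and numbers are invented for illustration; only the column names follow a DESeq2 results table. As far as I know, the DESeq2 Wald stat already carries the sign of the fold change, so only the p-value needs to be signed by hand.

```python
import math

# Hypothetical rows mimicking DESeq2 results columns (all values invented).
results = [
    {"gene": "geneA", "log2FoldChange": 2.0, "stat": 4.0, "pvalue": 1e-4},
    {"gene": "geneB", "log2FoldChange": -1.5, "stat": -3.0, "pvalue": 1e-3},
    {"gene": "geneC", "log2FoldChange": 0.2, "stat": 0.5, "pvalue": 0.5},
]

def signed_neg_log10_p(row):
    """-log10(pvalue), carrying the sign of the fold change."""
    sign = 1.0 if row["log2FoldChange"] >= 0 else -1.0
    return sign * -math.log10(row["pvalue"])

def rank_genes(rows, metric):
    """Return (gene, score) pairs sorted from highest to lowest score."""
    scored = [(r["gene"], metric(r)) for r in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Three alternative rankings that could each be handed to a GSEA tool.
by_stat = rank_genes(results, lambda r: r["stat"])
by_lfc = rank_genes(results, lambda r: r["log2FoldChange"])
by_signed_p = rank_genes(results, signed_neg_log10_p)
```

With these toy numbers all three metrics give the same gene order; on real data they can disagree, especially for low-count genes.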

In any case, the choice of ranking method is entirely up to you; fgsea accepts any ranked list of genes. The ranking metric should be continuous with minimal ties between genes, so the stat column, fold changes or p-values are probably preferable to adjusted p-values. I do not know how the s-values behave in terms of the number of ties; you would need to check that yourself.
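One way to check a candidate ranking metric for ties is sketched below in plain Python. The helper name `tie_fraction` and the example values are my own assumptions, not taken from DESeq2 or fgsea; the same check could be run on s-values exported from a real results table.

```python
from collections import Counter

def tie_fraction(scores):
    """Fraction of scores that share their value with at least one other score."""
    counts = Counter(scores)
    tied = sum(n for n in counts.values() if n > 1)
    return tied / len(scores) if scores else 0.0

# Invented values: adjusted p-values tend to repeat, Wald statistics rarely do.
padj = [0.04, 0.04, 0.04, 0.9, 0.9, 1.0]
stat = [3.1, 2.9, 2.7, 0.4, -0.2, -0.05]

padj_ties = tie_fraction(padj)  # five of the six values are tied
stat_ties = tie_fraction(stat)  # no ties
```

A metric with a high tie fraction leaves the order of the tied genes arbitrary, which is the reason for preferring a continuous score.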


"obtaining fold changes that are unlocked from the biased trend of large but unreliable (= large standard errors) FCs when counts are low"

This is my understanding as well: the reason downstream methods use the t-statistic is to deal with noisy LFCs. lfcShrink() addresses this directly, and has been benchmarked with respect to the reproducibility of ranking genes by effect size.

@mikelove
United States

I don't know about fgsea or its required input.

The reason we remove the stat column is that we are no longer computing a Wald statistic of an MLE coefficient and its SE, but we are returning a posterior distribution for the LFC, with a mean and a SD. You can compute tail intervals using svalue=TRUE.
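As a rough illustration of the point above: the Wald statistic is the MLE log2 fold change divided by its standard error. After shrinkage the table reports a posterior mean and SD instead, so that ratio is no longer a Wald statistic. The sketch below is plain Python with invented numbers, not DESeq2 code.

```python
def wald_stat(lfc_mle, lfc_se):
    """Wald statistic: MLE log2 fold change divided by its standard error."""
    return lfc_mle / lfc_se

# Invented example: a noisy low-count gene versus a well-measured gene.
noisy = {"gene": "lowCount", "lfc_mle": 4.0, "lfc_se": 3.0}
stable = {"gene": "highCount", "lfc_mle": 1.0, "lfc_se": 0.2}

noisy_stat = wald_stat(noisy["lfc_mle"], noisy["lfc_se"])
stable_stat = wald_stat(stable["lfc_mle"], stable["lfc_se"])

# Ranking by the unshrunken LFC puts the noisy gene first;
# ranking by the Wald statistic puts the well-measured gene first.
top_by_lfc = max([noisy, stable], key=lambda g: g["lfc_mle"])["gene"]
top_by_stat = max(
    [noisy, stable], key=lambda g: wald_stat(g["lfc_mle"], g["lfc_se"])
)["gene"]
```

This is why the stat column is a popular ranking metric for unshrunken results: dividing by the SE penalises the large but unreliable fold changes that lfcShrink deals with in a different way.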