Question

Why does the stat column disappear after lfcShrink in DESeq2, and why does fgsea use the stat column (not affected by shrinking)?

kavator ▴ 30

@kavator-22955

Last seen 2.2 years ago

Singapore

Hi all,

I understand that the stat column holds the Wald statistic, but why does the entire column disappear after shrinking with lfcShrink()? How does shrinking affect the values in the stat column?

Additionally, I don't understand why fgsea uses the stat column (whose values are presumably not affected by lfcShrink) instead of a column of shrunken values.

fgsea DESeq2 • 2.8k views

updated 4.1 years ago by ATpoint ★ 4.7k • written 4.1 years ago by kavator ▴ 30

I am neither the DESeq2 nor the fgsea developer/maintainer, but here are my two cents. fgsea uses whatever you give it as ranked input. There are some Biostars threads where the developer (if memory serves) recommends the signed stat column (so put a minus if the fold change is negative), but this is only one of many ranking methods. You can also use -log10(pvalue), which is somewhat similar, or, given that you have the shrunken effect sizes via lfcShrink, use these fold changes. That is actually the crux of the entire lfcShrink procedure: obtaining fold changes that are free of the bias toward large but unreliable (= large standard errors) FCs when counts are low and/or replicate numbers are low. DESeq2 puts strong emphasis on effect sizes; to the best of my knowledge, the edgeR philosophy is more p-value-centered. Their recommendation (again, to the best of my knowledge) is to use the p-value after running glmTreat. Please search for threads on edgeR and gene ranking if you need details on that.

In any case, the choice of ranking method is entirely up to you; fgsea accepts any ranked list of genes. The ranking metric should be continuous with minimal ties between genes, so the stat column, fold changes, or p-values are probably preferable to adjusted p-values. I do not know how the s-values behave in terms of the number of ties; you would need to check that yourself.
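To make the point about ties concrete, here is a minimal sketch in plain Python. It is not fgsea or DESeq2 code: the gene names and numbers are made up for illustration, and `count_ties` and `bh_adjust` are hypothetical helpers written for this example. It shows how Benjamini-Hochberg adjustment can turn distinct p-values into tied adjusted p-values, and how a signed statistic gives a ranking without ties.

```python
from collections import Counter

def count_ties(values):
    """Number of entries that share their value with at least one other entry."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = n - pos  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up example values (assumptions, not real results).
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
stat = [4.2, -3.1, 1.5, -0.4, 0.9]          # signed test statistic
pvalues = [0.001, 0.04, 0.03, 0.5, 0.2]     # raw p-values, all distinct

padj = bh_adjust(pvalues)
ties_raw = count_ties(pvalues)
ties_adj = count_ties(padj)

# Genes ranked from most up-regulated to most down-regulated.
ranked = [g for _, g in sorted(zip(stat, genes), reverse=True)]

print("ties in raw p-values:", ties_raw)
print("ties in adjusted p-values:", ties_adj)
print("ranking by signed statistic:", ranked)
```

The monotonicity step in the adjustment is what produces the ties, which is one reason a continuous metric such as a signed statistic or a fold change tends to be a more convenient ranking input.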

4.1 years ago ATpoint ★ 4.7k

"obtaining fold changes that are unlocked from the biased trend of large but unreliable (= large standard errors) FCs when counts are low"

This is my understanding as well: the reason downstream methods use the t-statistic is to deal with noisy LFCs. lfcShrink() addresses this directly, and it has been benchmarked with respect to the reproducibility of ranking genes by effect size.

4.1 years ago Michael Love 43k

score 0 · Answer 1 · 2021-02-19

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

I don't know about fgsea or its required input.

The reason we remove the stat column is that we are no longer computing a Wald statistic from an MLE coefficient and its SE; instead, we return a posterior distribution for the LFC, with a mean and an SD. You can compute tail intervals using svalue=TRUE.
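As a rough illustration of what the stat column holds before shrinkage, a Wald statistic is the MLE coefficient divided by its standard error. The sketch below is plain Python with made-up numbers, not DESeq2 code; `wald_statistic` is a hypothetical helper named for this example.

```python
def wald_statistic(lfc_mle, lfc_se):
    """Wald statistic: the MLE log fold change divided by its standard error."""
    return lfc_mle / lfc_se

# Made-up values (assumptions): a low-count gene with a large but noisy LFC,
# and a well-measured gene with a moderate LFC.
noisy_gene = wald_statistic(lfc_mle=4.0, lfc_se=2.5)
stable_gene = wald_statistic(lfc_mle=1.5, lfc_se=0.25)

print("noisy gene stat:", noisy_gene)
print("stable gene stat:", stable_gene)
```

Dividing by the standard error is what lets the statistic down-weight the noisy gene even though its raw LFC is larger. After shrinkage the reported values are a posterior mean and SD rather than an MLE and its SE, so this ratio no longer corresponds to the quantity described above.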

4.1 years ago Michael Love 43k