Question

Bias of lowly expressed genes in DESeq2

Jason • 0

@f0b26999

Switzerland

Hi Michael,

I have treated vs. untreated (wt) samples, and I know that a subset of genes is very lowly expressed in wt but up-regulated in treated samples. When I run the DESeq2 analysis, most of these genes are at the top if I rank by adjusted p-value or by fold change, which makes sense. But in this case, it looks like the top-ranked genes (smallest adjusted p-value or largest fold change) are biased toward genes that are lowly expressed in wt. Their baseMeans are intermediate since they consider all the samples (treated+wt). Thus I think the shrinkage method will not help either if it is relative to baseMean. So I wonder how DESeq2 deals with such bias?

Thank you in advance for your answer.

DESeq2 • 922 views

ADD COMMENT • link updated 2.8 years ago by Michael Love 41k • written 2.8 years ago by Jason • 0

score 0 · Answer 1 · 2021-06-28

ATpoint ★ 4.0k

@atpoint-13662

Germany

"Their baseMeans are intermediate since they consider all the samples (treated+wt)"

If a gene is decently expressed in conditionA and almost shut down in conditionB, wouldn't you exactly expect an intermediate baseMean? And if this scenario is true, wouldn't you also expect that these are then the most significant changes, both in terms of effect size and significance?

I guess it would be good to add some details, such as the counts and the results output for these genes, and where they fall on the MA-plot.

ADD COMMENT • link 2.8 years ago ATpoint ★ 4.0k
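
To make the baseMean point concrete, here is a minimal sketch in plain Python. It assumes only that baseMean is the mean of the normalized counts over all samples; the count values are invented for illustration and do not come from this thread.

```python
from statistics import mean

# Hypothetical normalized counts for a single gene (invented values).
wt = [2, 1, 3]              # almost shut down in untreated samples
treated = [410, 385, 402]   # clearly expressed in treated samples

# baseMean averages over every sample, ignoring the group labels.
base_mean = mean(wt + treated)

print(f"wt mean      = {mean(wt):.1f}")
print(f"treated mean = {mean(treated):.1f}")
print(f"baseMean     = {base_mean:.1f}")
```

The baseMean lands between the two group means, so a gene that is nearly silent in one condition does not necessarily look "lowly expressed" when judged by baseMean alone.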

Hi ATpoint,

Thanks for your reply. Yes, I expect an intermediate baseMean. That is why I think shrinkage probably does not help much (correct me if I am wrong). I also expect these genes to be the most significant ones, but since the fold change X/Y is anti-correlated with Y (spurious correlation), I am worried that their high fold change is only due to the very small Y. How much should I trust their position at the top of the list, relative to the other significant genes, if I want to rank all the significant genes?

ADD REPLY • link 2.8 years ago Jason • 0
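
The worry about a very small Y can be illustrated with a plain ratio of group means. This is only a sketch of the concern, not of how DESeq2 estimates fold changes (DESeq2 fits a negative binomial GLM); the helper `log2_fold_change` and all numbers are hypothetical.

```python
import math

def log2_fold_change(treated_mean, wt_mean):
    """Naive log2 ratio of two group means (hypothetical helper)."""
    return math.log2(treated_mean / wt_mean)

# Low-expression gene: the wt mean moves by a single count, from 1 to 2.
low_a = log2_fold_change(200, 1)
low_b = log2_fold_change(200, 2)

# Higher-expression gene: the wt mean also moves by one count, from 100 to 101.
high_a = log2_fold_change(20000, 100)
high_b = log2_fold_change(20000, 101)

print(f"low-count gene : LFC changes by {low_a - low_b:.3f}")
print(f"high-count gene: LFC changes by {high_a - high_b:.3f}")
```

A one-count difference in the denominator moves the naive log2 fold change by a full unit for the low-count gene, but only by about a hundredth of a unit for the high-count gene. That sensitivity is why a raw ratio is a shaky basis for ranking when Y is tiny.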

You really should show some data; I doubt that this can be answered based on textual descriptions alone.

ADD REPLY • link 2.8 years ago ATpoint ★ 4.0k

Just a note:

The LFC shrinkage does not depend on the baseMean. It just uses the counts and the adaptive prior for LFC (looking across all genes). Unlike for dispersion estimation, our prior is experiment-wide for LFC, not specific to the gene's baseMean.

ADD REPLY • link 2.8 years ago Michael Love 41k
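
The idea of a single experiment-wide prior can be sketched with a textbook normal-normal model. This is a deliberately simplified illustration and not DESeq2's actual estimator: the function `shrink_lfc`, the fixed `prior_sd`, and the standard errors are all assumptions made for the example, with the standard error standing in for how much information a gene's counts carry.

```python
def shrink_lfc(mle_lfc, se, prior_sd):
    """Posterior mean of the LFC under a zero-centred normal prior
    and a normal likelihood with standard error `se` (hypothetical helper)."""
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * mle_lfc

# One prior width shared by every gene (hypothetical value). In a real
# analysis the prior would be estimated by looking across all genes.
prior_sd = 2.0

# Two genes with the same unshrunken LFC but different precision.
precise = shrink_lfc(6.0, se=0.3, prior_sd=prior_sd)  # counts are informative
noisy = shrink_lfc(6.0, se=3.0, prior_sd=prior_sd)    # counts are uninformative

print(f"precise gene: 6.0 -> {precise:.2f}")
print(f"noisy gene  : 6.0 -> {noisy:.2f}")
```

Both genes see the same prior; nothing in the sketch looks up a gene's baseMean. The estimate backed by informative counts barely moves, while the poorly supported one is pulled strongly toward zero.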

Hi Michael,

You said it "uses the counts". Does "counts" here mean the counts from all samples (treated + wt) or just from wt? And how does DESeq2 deal with genes that have very low counts only in wt but not in treated? Thanks.