Question

Large DE LogFC range


save (Italy)

I'm using DESeq2 to run a DE analysis between samples from two conditions. During the analysis I identified a batch effect due to sequencing time, which I modelled as a covariate in the design formula. The differential expression test (Wald test) gives a good number of significant genes (~100), but the log2 fold change (LFC) range looks unreliable, going from -30 to +30.

What could cause these extremely large values, and how can I fix the problem? I tried lfcShrink() to re-estimate the LFCs, but I'm not sure that is sufficient to get reliable results.

My second question is about the model design. Is it reasonable to add covariates to the model even if they don't show a strong effect on the data (looking at PCA or clustering)?

The code I used for the analysis follows. Thanks for your help!


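library(DESeq2)

# Group = sequencing-time batch (covariate); Condition = dis vs. hea
dds <- DESeqDataSetFromMatrix(countData = count, colData = coldata, design = ~ Group + Condition)

# Pre-filter: keep genes with at least 10 reads summed over all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)

# Wald test for dis vs. hea, then keep genes with adjusted p-value < 0.05
res <- results(dds, contrast=c("Condition","dis","hea"))
res <- res[which(res$padj < 0.05),]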

Tags: DESeq2

Written 2.2 years ago by save; updated 2.2 years ago by Michael Love.


You could at least show the counts of some of the genes with large FCs. A PCA plot would also help.
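A minimal sketch of how to produce both diagnostics with DESeq2, assuming simulated data from makeExampleDESeqDataSet() as a stand-in for the poster's dataset (its grouping column is `condition`; with the poster's `dds` you would use `intgroup = "Condition"`). The name `top` is illustrative only.

```r
library(DESeq2)

# Simulated stand-in for the real data (2000 genes, 6 samples, groups A/B)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)
dds <- DESeq(dds)
res <- results(dds)

# Row indices of the 5 genes with the largest absolute MLE log2 fold change
# (genes with NA fold changes are placed last by order())
top <- head(order(abs(res$log2FoldChange), decreasing = TRUE), 5)

# Raw counts for those genes: look for a group that is entirely zero
counts(dds)[top, ]

# Normalized counts of the most extreme gene, split by group
plotCounts(dds, gene = top[1], intgroup = "condition")

# PCA on variance-stabilized counts, colored by group
vsd <- vst(dds, blind = FALSE)
plotPCA(vsd, intgroup = "condition")
```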

Reply by ATpoint, 2.2 years ago

score 0 · Answer 1 · 2022-01-24

Michael Love (United States)

I agree with ATpoint: look at the genes with large LFC (from lfcShrink) using plotCounts. Large MLE LFCs usually come from genes with all 0's in one group.
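To see why all-zero groups produce such values, here is a small base-R sketch. It uses a Poisson GLM as a simplified stand-in for DESeq2's negative binomial GLM (an assumption for illustration, not DESeq2's actual fitting code). When one group is all zeros, the maximum-likelihood fold change is infinite, so the reported number only reflects where the iterations stopped; a single nonzero count makes it finite again.

```r
# Two toy genes: all zeros in "hea" vs. a few reads in "hea"
cond <- factor(rep(c("hea", "dis"), each = 3), levels = c("hea", "dis"))
y_zero <- c(0, 0, 0, 5, 8, 6)
y_some <- c(1, 0, 2, 5, 8, 6)

# "conddis" is the natural-log fold change dis vs. hea; convert to log2
lfc_mle <- function(y) {
  fit <- glm(y ~ cond, family = poisson)
  unname(coef(fit)["conddis"]) / log(2)
}

# True MLE is +Inf: the value is wherever the IRLS iterations stopped
lfc_zero <- lfc_mle(y_zero)
# Finite MLE: log2 of the ratio of group means, log2((19/3) / 1)
lfc_some <- lfc_mle(y_some)

c(all_zero = lfc_zero, some_reads = lfc_some)
```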

"Is it reasonable to add covariates to the model also if they don't show a strong effect on the data"

This is up to the analyst. You can add them to be careful, but it comes at a loss of degrees of freedom. So if they truly have no effect on any genes, it's best to leave out unnecessary covariates.
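One way to check whether a covariate affects any genes is a likelihood ratio test that compares the full design with a reduced design without the covariate. The sketch below assumes simulated data and a hypothetical two-level `Group` column standing in for the sequencing-time batch; for the poster's data the designs would be `~ Group + Condition` (full) and `~ Condition` (reduced).

```r
library(DESeq2)

# Simulated data (8 samples, condition A/B) plus a hypothetical batch column
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8)
dds$Group <- factor(rep(c("run1", "run2"), times = 4))
design(dds) <- ~ Group + condition

# LRT: for each gene, does dropping Group significantly worsen the fit?
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ condition)
res_lrt <- results(dds_lrt)

# Number of genes showing a Group effect at 10% FDR
sum(res_lrt$padj < 0.1, na.rm = TRUE)
```

Here the p-values come from the full-vs-reduced comparison, so a count near zero suggests the covariate explains little; with real data this remains a judgment call, as noted above.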

Answer by Michael Love, 2.2 years ago


Thanks for your comments. Regarding the large LFCs, I checked the counts of the DE genes: most of them are 0 in one condition, while other genes have low expression in all samples. Here is an example plot of counts:

[Figure: plot of counts for an example DE gene]

Here is also the PCA plot colored by condition:

[Figure: PCA plot colored by condition]

What do you suggest? Should I filter out more genes before running DESeq(), or is it enough to shrink the LFCs?
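For the filtering option raised here, a minimal sketch of a stricter pre-filter, in the style of the one suggested in recent versions of the DESeq2 vignette, compared with the total-count filter used above. It assumes simulated data and a smallest group size of 3. Note that a gene with zeros in every "hea" sample but at least 10 reads in three "dis" samples still passes, so filtering alone does not remove the all-zero-group pattern.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Filter used in the question: at least 10 reads summed over all samples
keep_total <- rowSums(counts(dds)) >= 10

# Stricter filter: at least 10 reads in at least as many samples as the
# smallest group (assumed to be 3 here)
smallestGroupSize <- 3
keep_group <- rowSums(counts(dds) >= 10) >= smallestGroupSize

c(total = sum(keep_total), per_group = sum(keep_group))
dds_filt <- dds[keep_group, ]
```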

Reply by save, 2.2 years ago


Shrinking the LFC is the recommendation from the vignette.

We spent a lot of time developing and benchmarking the posterior estimates of LFC.
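A minimal sketch of the shrinkage step, assuming simulated data and that the apeglm package is installed. `type = "apeglm"` needs a coefficient name rather than a contrast, so the reference level is set explicitly first. For the poster's data that would be `dds$Condition <- relevel(dds$Condition, ref = "hea")` before DESeq(), giving the coefficient "Condition_dis_vs_hea"; alternatively, `type = "ashr"` (ashr package) accepts `contrast = c("Condition","dis","hea")`.

```r
library(DESeq2)  # type = "apeglm" also requires the apeglm package

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 6)

# Make the reference level explicit so the coefficient name is predictable
dds$condition <- relevel(dds$condition, ref = "A")
dds <- DESeq(dds)
resultsNames(dds)  # "Intercept" "condition_B_vs_A"

# MLE (unshrunken) vs. apeglm posterior estimates of the log2 fold change
res_mle    <- results(dds, name = "condition_B_vs_A")
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "apeglm")

range(res_mle$log2FoldChange, na.rm = TRUE)
range(res_shrunk$log2FoldChange, na.rm = TRUE)
```

The p-values and adjusted p-values in `res_shrunk` are carried over from the MLE Wald test, so shrinkage changes the LFC estimates used for ranking and plotting, not which genes pass the padj cutoff.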

Reply by Michael Love, 2.2 years ago