Large DE LogFC range
@3d20f23f

I'm using DESeq2 to perform a differential expression (DE) analysis between samples from two conditions. During the analysis I identified a batch effect due to sequencing time, which I modelled as a covariate in the design formula. The Wald test returns a good number of significant genes (~100), but the log fold change (LFC) range does not look reliable: it goes from -30 to +30.

• What could be causing these extremely large values, and how can I solve the problem? I tried using lfcShrink() to re-estimate the LFCs, but I'm not sure that is sufficient to obtain reliable results.
• My second question is about the model design. Is it reasonable to add covariates to the model even if they don't show a strong effect on the data (judging from PCA or clustering)?


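library(DESeq2)

# Group: sequencing-time batch covariate; Condition: dis vs hea
dds <- DESeqDataSetFromMatrix(count, coldata, design = ~ Group + Condition)

# Pre-filter: keep genes with at least 10 reads summed over all samples
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)

# Wald test, "dis" vs "hea"; these are MLE (unshrunken) LFCs
res <- results(dds, contrast=c("Condition","dis","hea"))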


You could at least show the counts of some of the genes with large FCs. A PCA plot would also help.
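A minimal sketch of how such a PCA could be produced, colored by both the condition and the batch covariate. It runs on simulated data from makeExampleDESeqDataSet rather than the real counts; the column names Condition and Group mirror the design above, while the batch labels run1/run2 are hypothetical.

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in for the real data (hypothetical): 2000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$Condition <- factor(rep(c("hea", "dis"), each = 6), levels = c("hea", "dis"))
dds$Group <- factor(rep(c("run1", "run2"), times = 6))  # hypothetical batch labels
design(dds) <- ~ Group + Condition

# Variance-stabilizing transformation; blind = FALSE lets the design inform
# the dispersion trend used by the transformation
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# PCA on the most variable genes (plotPCA default ntop = 500),
# colored by the combination of Condition and Group
pca_df <- plotPCA(vsd, intgroup = c("Condition", "Group"), returnData = TRUE)
print(plotPCA(vsd, intgroup = c("Condition", "Group")))
```

Coloring by both variables shows at a glance whether samples separate more by batch than by condition.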

@mikelove

I agree with ATpoint: look at the genes with large LFCs (from lfcShrink) using plotCounts. Large MLE LFCs usually come from counts that are all 0 in one group.
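A minimal sketch of that inspection, assuming simulated data from makeExampleDESeqDataSet in place of the real counts (the Condition/Group columns and batch labels are hypothetical). It takes the genes with the largest absolute MLE LFC, flags those whose normalized counts are all zero in one condition, and plots one of them.

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in (hypothetical): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("hea", "dis"), each = 6), levels = c("hea", "dis"))
dds$Group <- factor(rep(c("run1", "run2"), times = 6))  # hypothetical batch labels
design(dds) <- ~ Group + Condition
dds <- DESeq(dds, quiet = TRUE)

res <- results(dds, contrast = c("Condition", "dis", "hea"))

# Row indices of the 5 genes with the largest absolute MLE LFC (NAs sorted last)
top <- head(order(abs(res$log2FoldChange), decreasing = TRUE), 5)

# TRUE if a gene's normalized counts are all zero in one of the two conditions
norm_counts <- counts(dds, normalized = TRUE)
zero_in_one_group <- apply(norm_counts[top, , drop = FALSE], 1, function(x) {
  all(x[dds$Condition == "dis"] == 0) || all(x[dds$Condition == "hea"] == 0)
})
print(data.frame(LFC = res$log2FoldChange[top], zero_in_one_group))

# Normalized counts of the top gene, split by condition
plotCounts(dds, gene = top[1], intgroup = "Condition")
```

On real data, genes flagged TRUE are the ones whose MLE LFC is essentially unbounded, which is where values like ±30 come from.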

"Is it reasonable to add covariates to the model also if they don't show a strong effect on the data"

This is up to the analyst. You can add them to be careful, but this comes at the cost of degrees of freedom. So if they truly have no effect on any genes, it's best to leave unnecessary covariates out.
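One possible way to check whether a covariate affects any genes (a suggestion, not something stated in the answer above) is a likelihood ratio test comparing the full design with a reduced design that drops the covariate. The sketch below uses simulated data from makeExampleDESeqDataSet; the Group batch labels are hypothetical, and the padj < 0.1 cutoff is an arbitrary choice.

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in (hypothetical): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("hea", "dis"), each = 6), levels = c("hea", "dis"))
dds$Group <- factor(rep(c("run1", "run2"), times = 6))  # hypothetical batch labels
design(dds) <- ~ Group + Condition

# LRT: full model ~ Group + Condition vs reduced model ~ Condition.
# A small adjusted p-value means dropping Group worsens the fit for that gene.
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ Condition, quiet = TRUE)
res_group <- results(dds_lrt)

n_batch_genes <- sum(res_group$padj < 0.1, na.rm = TRUE)
cat("Genes with a detectable Group effect (padj < 0.1):", n_batch_genes, "\n")
```

If essentially no genes respond to the covariate, that supports leaving it out; if many do, keeping it costs a degree of freedom but removes real unwanted variation.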


Thanks for your comments. Regarding the large LFCs, I checked the counts of the DE genes: most of them are 0 in one condition, while the others have low expression in all samples. I attached an example plot of counts:

I also attached the PCA plot, colored by condition:

What do you suggest? Should I filter out more genes before running DESeq(), or is shrinking the LFCs enough?
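For comparison, a stricter pre-filter of the kind suggested in the DESeq2 vignette requires a minimum count in at least as many samples as the smallest group, instead of a minimum total. A minimal sketch on simulated data (makeExampleDESeqDataSet; the Condition/Group columns are hypothetical stand-ins):

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in (hypothetical): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("hea", "dis"), each = 6), levels = c("hea", "dis"))
dds$Group <- factor(rep(c("run1", "run2"), times = 6))  # hypothetical batch labels
design(dds) <- ~ Group + Condition

# Filter used in the question: at least 10 reads summed over all samples
keep_sum <- rowSums(counts(dds)) >= 10

# Stricter filter: at least 10 reads in at least as many samples as the smallest group
smallest_group_size <- min(table(dds$Condition))
keep_strict <- rowSums(counts(dds) >= 10) >= smallest_group_size

cat("Genes kept (sum filter):   ", sum(keep_sum), "\n")
cat("Genes kept (strict filter):", sum(keep_strict), "\n")
dds_strict <- dds[keep_strict, ]
```

Note that this filter still keeps a gene that is zero in one condition if it is well expressed in every sample of the other, so it removes the low-count genes but not necessarily the all-zero-in-one-group ones.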


Shrinking the LFC is the recommendation from the vignette.

We spent a lot of time developing and benchmarking the posterior estimates of LFC.
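A minimal sketch of the shrinkage step, assuming simulated data from makeExampleDESeqDataSet and hypothetical Group labels. type = "apeglm" requires the separate apeglm package and a coefficient name rather than a contrast, so "hea" must be the reference level of Condition (relevel first if it is not).

```r
library(DESeq2)  # lfcShrink(type = "apeglm") also needs the apeglm package installed

set.seed(42)
# Simulated stand-in (hypothetical): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$Condition <- factor(rep(c("hea", "dis"), each = 6), levels = c("hea", "dis"))
dds$Group <- factor(rep(c("run1", "run2"), times = 6))  # hypothetical batch labels
design(dds) <- ~ Group + Condition
dds <- DESeq(dds, quiet = TRUE)

# With "hea" as the reference level the coefficient is "Condition_dis_vs_hea"
print(resultsNames(dds))

res_mle <- results(dds, name = "Condition_dis_vs_hea")
res_shr <- lfcShrink(dds, coef = "Condition_dis_vs_hea", type = "apeglm", quiet = TRUE)

# LFCs of genes with little information (e.g. low counts) are pulled toward 0
print(range(res_mle$log2FoldChange, na.rm = TRUE))
print(range(res_shr$log2FoldChange, na.rm = TRUE))
```

Significance calls can still come from the Wald test in res_mle; the shrunken estimates are what you would report or rank genes by.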