DESeq2 with many outliers
0
Entering edit mode
@dequattroconcetta-21510
Last seen 6 weeks ago
Italy

Hi! I am performing differential expression analysis and when I created the dds model and I got the following message:

txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

txi.rsem$length[txi.rsem$length == 0] <- 1
########Creation of dds model
ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples_selected, design = ~ condition)

using counts and average transcript lengths from tximport

ddsTxi$condition <- relevel(ddsTxi$condition, ref = cond2)

dds <- DESeq(ddsTxi)

estimating size factors
using 'avgTxLength' from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 7581 genes
-- DESeq argument 'minReplicatesForReplace' = 7
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

summary(res)
out of 47011 with nonzero total read count
LFC > 0 (up)       : 1, 0.0021%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 2008, 4.3%
low counts [2]     : 191, 0.41%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


I was wondering if I can keep the results that I obtained considering only the p-value, or it is better to get a more reliable result to perform the analysis modifying the minReplicatesForReplace option?

Thanks,

Concetta

DESeq2 • 133 views
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

With thousands of genes with an outlier, have you taken a look at some of these with plotCounts? You can also see whether it is a single sample that is tending to contribute to the outliers (see vignette).

You might find a pattern that can be addressed either by trimming genes in a non-specific manner.

0
Entering edit mode

Thank you! I have removed from the dds sub-optimal genes and I re-created the dds model and I did not get the message replacing outliers and refitting for 7581 genes . However when I ran summary(res) I got this results:

summary(res)
out of 47100 with nonzero total read count
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 1795, 3.8%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


Considering that I did not get any message when I ran DESeq(dds), I did not understand if now I can trust my model and the results.

Thank you.

Concetta

0
Entering edit mode

So there are still many outliers, which I would encourage you to look at with plotCounts.

0
Entering edit mode

I looked the plotCounts but I noticed that looking at different genes I have every time different samples that behave as outliers samples. What do you suggest to do?

0
Entering edit mode

If you think that, from those plots, the genes do contain outliers and are likely not exhibiting DE signal, then you can go with the results table above (the one with 1,795 outliers). If you think that there is DE but it is not being found, I would look at the PCA plot and see if you think that the biological variability within groups is smaller than across groups.