Question

DESeq2 with many outliers

0

Entering edit mode

dequattro.concetta ▴ 10

@dequattroconcetta-21510

Last seen 2.2 years ago

Italy

Hi! I am performing differential expression analysis and when I created the dds model and I got the following message:

txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

txi.rsem$length[txi.rsem$length == 0] <- 1
########Creation of dds model
ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples_selected, design = ~ condition)

using counts and average transcript lengths from tximport

ddsTxi$condition <- relevel(ddsTxi$condition, ref = cond2)

dds <- DESeq(ddsTxi)

    estimating size factors
    using 'avgTxLength' from assays(dds), correcting for library size
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    -- replacing outliers and refitting for 7581 genes
    -- DESeq argument 'minReplicatesForReplace' = 7
    -- original counts are preserved in counts(dds)
    estimating dispersions
    fitting model and testing

summary(res)
out of 47011 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1, 0.0021%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 2008, 4.3%
low counts [2]     : 191, 0.41%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I was wondering if I can keep the results that I obtained considering only the p-value, or it is better to get a more reliable result to perform the analysis modifying the minReplicatesForReplace option?

Thanks,

Concetta

DESeq2 • 1.5k views

ADD COMMENT • link updated 3.4 years ago by Michael Love 42k • written 3.4 years ago by dequattro.concetta ▴ 10

score 0 · Answer 1 · 2020-12-04

0

Entering edit mode

Michael Love 42k

@mikelove

Last seen 4 hours ago

United States

With thousands of genes with an outlier, have you taken a look at some of these with plotCounts? You can also see whether it is a single sample that is tending to contribute to the outliers (see vignette).

You might find a pattern that can be addressed either by trimming genes in a non-specific manner.

ADD COMMENT • link 3.4 years ago Michael Love 42k

0

Entering edit mode

Thank you! I have removed from the dds sub-optimal genes and I re-created the dds model and I did not get the message replacing outliers and refitting for 7581 genes . However when I ran summary(res) I got this results:

summary(res)    
out of 47100 with nonzero total read count
        adjusted p-value < 0.1
        LFC > 0 (up)       : 0, 0%
        LFC < 0 (down)     : 0, 0%
        outliers [1]       : 1795, 3.8%
        low counts [2]     : 0, 0%
        (mean count < 0)
        [1] see 'cooksCutoff' argument of ?results
        [2] see 'independentFiltering' argument of ?results

Considering that I did not get any message when I ran DESeq(dds), I did not understand if now I can trust my model and the results.

Thank you.

Concetta

ADD REPLY • link 3.4 years ago dequattro.concetta ▴ 10

0

Entering edit mode

So there are still many outliers, which I would encourage you to look at with plotCounts.

ADD REPLY • link 3.4 years ago Michael Love 42k

0

Entering edit mode

I looked the plotCounts but I noticed that looking at different genes I have every time different samples that behave as outliers samples. What do you suggest to do?

ADD REPLY • link 3.4 years ago dequattro.concetta ▴ 10

0

Entering edit mode

If you think that, from those plots, the genes do contain outliers and are likely not exhibiting DE signal, then you can go with the results table above (the one with 1,795 outliers). If you think that there is DE but it is not being found, I would look at the PCA plot and see if you think that the biological variability within groups is smaller than across groups.

ADD REPLY • link 3.4 years ago Michael Love 42k