DESeq2 with many outliers
1
0
Entering edit mode
@dequattroconcetta-21510
Last seen 2.1 years ago
Italy

Hi! I am performing differential expression analysis and when I created the dds model and I got the following message:

txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

txi.rsem$length[txi.rsem$length == 0] <- 1
########Creation of dds model
ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples_selected, design = ~ condition)

using counts and average transcript lengths from tximport

ddsTxi$condition <- relevel(ddsTxi$condition, ref = cond2)

dds <- DESeq(ddsTxi)

    estimating size factors
    using 'avgTxLength' from assays(dds), correcting for library size
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    -- replacing outliers and refitting for 7581 genes
    -- DESeq argument 'minReplicatesForReplace' = 7
    -- original counts are preserved in counts(dds)
    estimating dispersions
    fitting model and testing

summary(res)
out of 47011 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 1, 0.0021%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 2008, 4.3%
low counts [2]     : 191, 0.41%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

I was wondering if I can keep the results that I obtained considering only the p-value, or it is better to get a more reliable result to perform the analysis modifying the minReplicatesForReplace option?

Thanks,

Concetta

DESeq2 • 1.5k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

With thousands of genes with an outlier, have you taken a look at some of these with plotCounts? You can also see whether it is a single sample that is tending to contribute to the outliers (see vignette).

You might find a pattern that can be addressed either by trimming genes in a non-specific manner.

ADD COMMENT
0
Entering edit mode

Thank you! I have removed from the dds sub-optimal genes and I re-created the dds model and I did not get the message replacing outliers and refitting for 7581 genes . However when I ran summary(res) I got this results:

summary(res)    
out of 47100 with nonzero total read count
        adjusted p-value < 0.1
        LFC > 0 (up)       : 0, 0%
        LFC < 0 (down)     : 0, 0%
        outliers [1]       : 1795, 3.8%
        low counts [2]     : 0, 0%
        (mean count < 0)
        [1] see 'cooksCutoff' argument of ?results
        [2] see 'independentFiltering' argument of ?results

Considering that I did not get any message when I ran DESeq(dds), I did not understand if now I can trust my model and the results.

Thank you.

Concetta

ADD REPLY
0
Entering edit mode

So there are still many outliers, which I would encourage you to look at with plotCounts.

ADD REPLY
0
Entering edit mode

I looked the plotCounts but I noticed that looking at different genes I have every time different samples that behave as outliers samples. What do you suggest to do?

ADD REPLY
0
Entering edit mode

If you think that, from those plots, the genes do contain outliers and are likely not exhibiting DE signal, then you can go with the results table above (the one with 1,795 outliers). If you think that there is DE but it is not being found, I would look at the PCA plot and see if you think that the biological variability within groups is smaller than across groups.

ADD REPLY

Login before adding your answer.

Traffic: 838 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6