Which genes are outliers in DESeq2?
2
0
Entering edit mode
James • 0
@c2575e9b
Last seen 2.7 years ago
United States

Hello! I had a question that I could not find the answer to on the DESeq2 vignettes page or previous forum posts:

After running res <- results(dds) and summary(res), the code says I have 8 outliers (or 80 outliers for another dataset). Is there an easy way to figure out the identities of the genes that are identified as outliers? I could manually check differences in the res objects after running results(dds) and results(dds, cooksCutoff=FALSE), but that felt clunky and time-consuming.

Thank you!

DESeq2 RNAseq outlier • 2.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Outliers will have high maxCooks values in mcols(dds) -- the threshold is qf(.99, p, n - p) where n is sample size and p is number of coefficients in the model (length of resultsNames(dds)). These will also have a NA pvalue.

ADD COMMENT
0
Entering edit mode
ATpoint ★ 5.0k
@atpoint-13662
Last seen 13 days ago
Germany

You can take over some of the lines from results() to find them easily:

suppressPackageStartupMessages(library(DESeq2))

set.seed(10)
dds <- makeExampleDESeqDataSet()

dds <- DESeq(dds)
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
res <- results(dds)
summary(res)
#> 
#> out of 999 with nonzero total read count
#> adjusted p-value < 0.1
#> LFC > 0 (up)       : 0, 0%
#> LFC < 0 (down)     : 0, 0%
#> outliers [1]       : 4, 0.4%
#> low counts [2]     : 0, 0%
#> (mean count < 0)
#> [1] see 'cooksCutoff' argument of ?results
#> [2] see 'independentFiltering' argument of ?results

# From the results() function to get the outlier genes
m <- nrow(attr(dds, "dispModelMatrix"))
p <- ncol(attr(dds, "dispModelMatrix"))
cooksCutoff <- qf(0.99, p, m - p)
cooksOutlier <- mcols(dds)$maxCooks > cooksCutoff

w <- which(cooksOutlier)

# that is the outliers, four as reported in the summary(res)
rownames(dds)[w]
#> [1] "gene36"  "gene200" "gene216" "gene333"
ADD COMMENT

Login before adding your answer.

Traffic: 562 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6