Question

Which genes are outliers in DESeq2?

0

Entering edit mode

James • 0

@c2575e9b

Last seen 2.0 years ago

United States

Hello! I had a question that I could not find the answer to on the DESeq2 vignettes page or previous forum posts:

After running res <- results(dds) and summary(res), the code says I have 8 outliers (or 80 outliers for another dataset). Is there an easy way to figure out the identities of the genes that are identified as outliers? I could manually check differences in the res objects after running results(dds) and results(dds, cooksCutoff=FALSE), but that felt clunky and time-consuming.

Thank you!

DESeq2 RNAseq outlier • 1.7k views

ADD COMMENT • link updated 2.0 years ago by ATpoint ★ 4.7k • written 2.0 years ago by James • 0

score 0 · Answer 1 · 2023-02-03

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 3 hours ago

United States

Outliers will have high maxCooks values in mcols(dds) -- the threshold is qf(.99, p, n - p) where n is sample size and p is number of coefficients in the model (length of resultsNames(dds)). These will also have a NA pvalue.

ADD COMMENT • link 2.0 years ago Michael Love 43k

score 0 · Answer 2 · 2023-02-03

You can take over some of the lines from results() to find them easily:

suppressPackageStartupMessages(library(DESeq2))

set.seed(10)
dds <- makeExampleDESeqDataSet()

dds <- DESeq(dds)
#> estimating size factors
#> estimating dispersions
#> gene-wise dispersion estimates
#> mean-dispersion relationship
#> final dispersion estimates
#> fitting model and testing
res <- results(dds)
summary(res)
#> 
#> out of 999 with nonzero total read count
#> adjusted p-value < 0.1
#> LFC > 0 (up)       : 0, 0%
#> LFC < 0 (down)     : 0, 0%
#> outliers [1]       : 4, 0.4%
#> low counts [2]     : 0, 0%
#> (mean count < 0)
#> [1] see 'cooksCutoff' argument of ?results
#> [2] see 'independentFiltering' argument of ?results

# From the results() function to get the outlier genes
m <- nrow(attr(dds, "dispModelMatrix"))
p <- ncol(attr(dds, "dispModelMatrix"))
cooksCutoff <- qf(0.99, p, m - p)
cooksOutlier <- mcols(dds)$maxCooks > cooksCutoff

w <- which(cooksOutlier)

# that is the outliers, four as reported in the summary(res)
rownames(dds)[w]
#> [1] "gene36"  "gene200" "gene216" "gene333"