Question

DESeq2 output number of genes padj <0.1 is 0?

hs.lansdell ▴ 20

@hslansdell-14246

Morning! I've just run DESeq2 on my RNA-seq data with a dichotomous outcome, and the results say I have absolutely no differentially expressed genes...

My input is a count matrix with samples in columns and genes in rows, i.e:

XXX1 XXX2 XXX3

Gene 1

Gene 2

Gene 3

My sample information table:

condition

XXX1 y

XXX2 y

XXX3 n

My code:

library(DESeq2)

data <- read.csv("Input.csv", header = TRUE, row.names = 1, stringsAsFactors = FALSE)
# Drop the first character of each column name
colnames(data) <- substring(colnames(data), 2)

colData <- read.csv("Condition.csv", header = TRUE, row.names = 1)
# Double check names match up between Sample and data matrix
all(rownames(colData) == colnames(data))

dds <- DESeqDataSetFromMatrix(countData = data, colData = colData, design = ~condition)

dds$condition <- factor(dds$condition, levels = c("no", "yes"))

dds <- DESeq(dds)
res <- results(dds)
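One thing that may be worth checking in code like the above: the illustrative sample table codes the condition as y/n, while the factor levels are given as "no"/"yes". Whether the real Condition.csv uses y/n or yes/no is not stated, so the following is only a sketch on a hypothetical three-sample table. It shows that `factor()` silently turns values that are not among the supplied levels into NA.

```r
# Hypothetical sample table mirroring the illustrative layout in the question.
colData <- data.frame(condition = c("y", "y", "n"),
                      row.names = c("XXX1", "XXX2", "XXX3"))

# Levels that do not occur in the data turn every entry into NA.
mismatched <- factor(colData$condition, levels = c("no", "yes"))
print(table(mismatched, useNA = "ifany"))

# Levels that match the stored values keep the grouping, with "n" as reference.
matched <- factor(colData$condition, levels = c("n", "y"))
print(table(matched, useNA = "ifany"))
```

Running `table(dds$condition, useNA = "ifany")` before `DESeq()` is a quick way to confirm that both groups are present with the expected sample counts.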

My results:

summary(res)

out of 20338 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 0, 0%
LFC < 0 (down) : 0, 0%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)

sum(res$padj < 0.1, na.rm=TRUE)
[1] 0
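For context on what `padj` represents: by default `results()` adjusts p-values with the Benjamini-Hochberg method, which base R exposes as `p.adjust(method = "BH")`. The sketch below uses made-up p-values (not the data from this post) to illustrate how, with 20338 tests, even a fairly small raw p-value can end up with an adjusted value above 0.1.

```r
# Made-up p-values for illustration only: two small ones plus an evenly
# spaced remainder, 20338 tests in total as in the summary above.
n_genes <- 20338
p <- c(1e-5, 5e-5, seq(0.001, 1, length.out = n_genes - 2))

# Benjamini-Hochberg adjustment, the default method used by results().
padj <- p.adjust(p, method = "BH")

print(min(padj))        # smallest adjusted p-value
print(sum(padj < 0.1))  # number of tests passing the 0.1 threshold
print(0.1 / n_genes)    # raw p-value a single top gene needs to pass alone
```

In this toy example the smallest raw p-value of 1e-5 is adjusted to roughly 0.2, so nothing passes the 0.1 cutoff. Looking at `head(res[order(res$pvalue), ])` shows how small the raw p-values actually are in a real analysis.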

Since it doesn't follow to have 0 differentially expressed genes, I'm not sure what I've done wrong.

Thanks!

deseq2 adjusted pvalue • 2.1k views

ADD COMMENT • link updated 8.3 years ago by Michael Love 43k • written 8.3 years ago by hs.lansdell ▴ 20

score 0 · Answer 1 · 2017-11-03

Michael Love 43k

@mikelove

"Since it doesn't follow to have 0 differentially expressed genes, I'm not sure what I've done wrong."

I presume by this that you expect there to be some differentially expressed genes when comparing the two Y samples to the one N sample.

The reason you cannot detect DE here (presuming there are true differences) is due to the sample size.

2 vs 1 is actually the absolute minimum for the software to be able to estimate variance. A sample size of 3 vs 3 I would consider a practical minimum to have some power to detect large effect sizes, and that only works if there is limited biological variability.

Here is a paper which explores sensitivity as a function of sample size for RNA-seq:

https://www.ncbi.nlm.nih.gov/pubmed/27022035
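The dependence on sample size described above can be explored on simulated data. This is a minimal sketch, assuming DESeq2 is installed. It uses `makeExampleDESeqDataSet()` from DESeq2 to simulate counts with true differences between two groups. The helper `count_hits`, the choice of `betaSD = 1`, and the sample sizes tried are assumptions made for illustration, not settings taken from this thread or the linked paper.

```r
library(DESeq2)

set.seed(1)

# Hypothetical helper: simulate m samples split across two groups, run DESeq2,
# and count the genes with adjusted p-value below 0.1.
count_hits <- function(m, n_genes = 1000) {
  # betaSD = 1 gives the simulated genes nonzero true log2 fold changes.
  dds <- makeExampleDESeqDataSet(n = n_genes, m = m, betaSD = 1)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds)
  sum(res$padj < 0.1, na.rm = TRUE)
}

# 3 samples (2 vs 1), 6 samples (3 vs 3), and 20 samples (10 vs 10).
hits <- sapply(c(3, 6, 20), count_hits)
names(hits) <- c("2 vs 1", "3 vs 3", "10 vs 10")
print(hits)
```

The exact counts depend on the seed and the simulation settings, so they are only indicative of how detection changes with the number of replicates.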

ADD COMMENT • link 8.3 years ago Michael Love 43k

So, that was just to show how my files are arranged. My input has 183 samples and 20338 genes.

ADD REPLY • link 8.3 years ago hs.lansdell ▴ 20

And how do you know that the two groups have any differences in gene expression?

ADD REPLY • link 8.3 years ago Michael Love 43k