Question

Unusual DESeq2 results

0

Entering edit mode

taamslab • 0

@taamslab-19253

Last seen 5.3 years ago

Hello,

Im currently re performing some analyses of RNA-seq data via DESeq2, however since the first analysis (about a year ago) Ive got some very strange genes turning up as significantly expressed. These genes have exceedingly high p-values and fold changes (>2^15 !!) while also being lowly expressed ( Basemean of <100) and often driven by a single sample.
the experiment is as follows: an RNA seq analysis (Unstranded ) of Cytokine expressing cells from 3 separate patients, comparing 2 cell types (CD4/CD8) expressing 1 of 2 cytokines (pop1/pop2).

any advice would be appreciated. This is the code Ive been using (Cant give the data sets for reproduction due to confidentiality)

dds <- DESeqDataSetFromMatrix(countData = SFpops.454[,3:11], colData = samples.454, design = ~ EXPERIMENT.ID + CELL.CYTOKINE)

vsd <- vst(dds, blind=FALSE)

rld <- rlogTransformation(dds)

res <- results(dds2, contrast = (c("CELL.CYTOKINE", "CD8-pop1", "CD8-pop2")))

and a selection of output results below

genename

baseMean

log2FoldChange

lfcSE

stat

pvalue

padj

significant

TAAR7P	1.175253	25.03149	3.981587	6.286812	3.24E-10	2.22E-07
PRODH	0.815967	23.86939	3.989826	5.982564	2.20E-09	1.16E-06
FLJ46284	6.843331	23.5942	3.779361	6.242907	4.30E-10	2.79E-07
KRT81	0.998216	23.42814	4.022196	5.824714	5.72E-09	2.61E-06
AC007655.1	0.696205	23.32993	3.994779	5.840105	5.22E-09	2.40E-06
AC022916.4	0.698751	23.09876	4.023616	5.740795	9.42E-09	4.06E-06
SMIM25	3.920455	22.88505	2.405183	9.51489	1.82E-21	2.40E-17

Is there something altered in the package which makes these really lowly abundant genes now below the critical threshold?

deseq2 rnaseq • 427 views

ADD COMMENT • link updated 5.3 years ago by Michael Love 41k • written 5.3 years ago by taamslab • 0

score 0 · Answer 1 · 2019-01-07

Lowly expressed is not a barrier to having evidence of DE. A simple example: [10,10,10] vs [100,100,100]. Surely this is a DE gene for an experiment with low dispersion. The p-value will certainly be low enough for this gene to enter, e.g. a 1% FDR bounded set. The logFC will be very large, and undefined if the first group has all 0s, but if you use lfcShrink() you can obtain finite LFC even for such cases. I recommend to use lfcShrink() with type="apeglm" which provides reliable LFC estimates as opposed to the MLE which is printed in the table above. If you say the genes are driven by a single sample, usually that would be detected by the outlier filtering, but it can be missed sometimes, because our outlier filtering approach is not a perfect solution. You can also remove these cases via lfcShrink(). They will very likely have a reduced LFC when you apply shrinkage, and you can filter out such genes either using lfcThrehsold (which generates new posterior probabilities) or a post-hoc threshold on the shrunken LFC.