Unusual DESeq2 results
1
0
Entering edit mode
taamslab • 0
@taamslab-19253
Last seen 5.3 years ago

Hello,

Im currently re performing some analyses of RNA-seq data via DESeq2, however since the first analysis (about a year ago) Ive got some very strange genes turning up as significantly expressed. These genes have exceedingly high p-values and fold changes (>2^15 !!) while also being lowly expressed ( Basemean of <100) and often driven by a single sample.
the experiment is as follows: an RNA seq analysis (Unstranded ) of Cytokine expressing cells from 3 separate patients, comparing 2 cell types (CD4/CD8) expressing 1 of 2 cytokines (pop1/pop2).

any advice would be appreciated. This is the code Ive been using (Cant give the data sets for reproduction due to confidentiality)

 

dds <- DESeqDataSetFromMatrix(countData = SFpops.454[,3:11], colData = samples.454, design = ~ EXPERIMENT.ID + CELL.CYTOKINE)

vsd <- vst(dds, blind=FALSE)

rld <- rlogTransformation(dds)

res <- results(dds2, contrast = (c("CELL.CYTOKINE", "CD8-pop1", "CD8-pop2")))

and a selection of output results below

genename baseMean log2FoldChange lfcSE stat pvalue padj significant
TAAR7P 1.175253 25.03149 3.981587 6.286812 3.24E-10 2.22E-07
PRODH 0.815967 23.86939 3.989826 5.982564 2.20E-09 1.16E-06
FLJ46284 6.843331 23.5942 3.779361 6.242907 4.30E-10 2.79E-07
KRT81 0.998216 23.42814 4.022196 5.824714 5.72E-09 2.61E-06
AC007655.1 0.696205 23.32993 3.994779 5.840105 5.22E-09 2.40E-06
AC022916.4 0.698751 23.09876 4.023616 5.740795 9.42E-09 4.06E-06
SMIM25 3.920455 22.88505 2.405183 9.51489 1.82E-21 2.40E-17


Is there something altered in the package which makes these really lowly abundant genes now below the critical threshold?
 

deseq2 rnaseq • 427 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 5 hours ago
United States

Lowly expressed is not a barrier to having evidence of DE. A simple example: [10,10,10] vs [100,100,100]. Surely this is a DE gene for an experiment with low dispersion. The p-value will certainly be low enough for this gene to enter, e.g. a 1% FDR bounded set. The logFC will be very large, and undefined if the first group has all 0s, but if you use lfcShrink() you can obtain finite LFC even for such cases. I recommend to use lfcShrink() with type="apeglm" which provides reliable LFC estimates as opposed to the MLE which is printed in the table above. If you say the genes are driven by a single sample, usually that would be detected by the outlier filtering, but it can be missed sometimes, because our outlier filtering approach is not a perfect solution. You can also remove these cases via lfcShrink(). They will very likely have a reduced LFC when you apply shrinkage, and you can filter out such genes either using lfcThrehsold (which generates new posterior probabilities) or a post-hoc threshold on the shrunken LFC.

ADD COMMENT

Login before adding your answer.

Traffic: 771 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6