Hello there,
I am having trouble understanding the differential expression analysis in DESeq2. My data are RNA-seq of a pathogen in planta. I have 4 biological replicates each for the control and treatment samples, and I want to compare them. I used StringTie for assembly and then tximport to extract the gene-level counts. I have read the vignette multiple times, as well as several posts here, but I am still confused about something:
After running the DESeq function, I applied specific filters to my data: an lfcThreshold of 0.585 and an alpha of 0.05. My understanding, based on the post "lfcThrehold on p-values", is that if I filter the results after the statistical test, I will be doing a *post-hoc* test and invalidating the original results from the Wald test. But when I run the lfcShrink function, I cannot add lfcThreshold there, because the output would then be s-values, which I do not want. Yet in the summary I see that the LFC cutoff changes to 0 and that all the outliers and low-count genes are filtered out, but I get the same number of up- and down-regulated genes. I was wondering whether it is correct to apply lfcThreshold and then shrinkage, or whether I should stick with just one of them.
I have read other posts where people do both or just one, and it is never consistent; I know it is ultimately up to me. It is just confusing to follow, and I am hoping to get some clarification. Thank you so much.
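To make the distinction between a post-hoc LFC filter and a threshold test concrete, here is a minimal base-R sketch. The three genes, their LFCs, and their standard errors are hypothetical values chosen for illustration (they are not from the data in this post), and the threshold p-value formula is written to mirror the general form of a Wald test against |LFC| <= threshold rather than to reproduce DESeq2's internals exactly. Raw p-values are used instead of adjusted ones to keep the example short.

```r
# Hypothetical estimates for three genes (illustration only)
lfc   <- c(geneA = 0.70, geneB = 2.00, geneC = -0.60)  # log2 fold changes
lfcSE <- c(geneA = 0.30, geneB = 0.30, geneC = 0.10)   # standard errors
threshold <- 0.585                                     # about log2(1.5)

# Standard Wald test: null hypothesis is LFC = 0
p_zero <- 2 * pnorm(abs(lfc) / lfcSE, lower.tail = FALSE)

# Threshold Wald test: null hypothesis is |LFC| <= threshold
p_thresh <- pmin(1, 2 * pnorm((abs(lfc) - threshold) / lfcSE,
                              lower.tail = FALSE))

# Post-hoc approach: test against 0, then filter on the estimated LFC
posthoc <- p_zero < 0.05 & abs(lfc) > threshold

# Threshold approach: the test itself asks whether |LFC| exceeds the threshold
tested <- p_thresh < 0.05

print(data.frame(lfc, lfcSE, p_zero, p_thresh, posthoc, tested))
```

In this toy example geneA and geneC pass the post-hoc filter, because their estimates sit just above the cutoff, but they are not significant under the threshold test, since their standard errors leave it unclear whether the true LFC exceeds 0.585. Only geneB passes both.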
dds <- DESeq(dds)
dds <- estimateSizeFactors(dds)
res <- results(dds)
summary(res)
out of 13948 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2999, 22%
LFC < 0 (down) : 3372, 24%
outliers [1] : 169, 1.2%
low counts [2] : 1833, 13%
(mean count < 0)
res = res[complete.cases(res),]
summary(res)
out of 11947 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 2999, 25%
LFC < 0 (down) : 3372, 28%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 0)
res <- results(dds,
               name = "condition_treatment_vs_control.",
               alpha = 0.05, lfcThreshold = 0.585,
               altHypothesis = "greaterAbs")
summary(res)
out of 13948 with nonzero total read count
adjusted p-value < 0.05
LFC > 0.58 (up) : 1592, 11%
LFC < -0.58 (down) : 1839, 13%
outliers [1] : 169, 1.2%
low counts [2] : 2093, 15%
(mean count < 1)
resDEG <- lfcShrink(dds, coef = "condition_treatment_vs_control.", type = "apeglm",
                    res = res)
resSig <- subset(resDEG, padj < 0.05)
mcols(res, use.names = TRUE)
summary(resSig)
nrow(resSig)
out of 3431 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up) : 1593, 46%
LFC < 0 (down) : 1838, 54%
outliers [1] : 0, 0%
low counts [2] : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
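For readers who want to check how the two steps interact, the following is a small sketch on simulated data (not the dataset in this post). It assumes the DESeq2 and apeglm packages are installed; the coefficient name `condition_B_vs_A` comes from the simulated design and `coefName` is just an illustrative variable. The idea is that `results()` performs the threshold test, and passing that table to `lfcShrink()` through `res=` replaces the LFC estimates and standard errors while carrying the p-values and adjusted p-values over.

```r
# Sketch on simulated data; assumes DESeq2 and apeglm are installed
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8, betaSD = 1)  # 4 vs 4 samples
dds <- DESeq(dds)

coefName <- "condition_B_vs_A"  # coefficient name in the simulated design

# Threshold test: p-values and padj refer to |LFC| > 0.585
res <- results(dds, name = coefName, alpha = 0.05,
               lfcThreshold = 0.585, altHypothesis = "greaterAbs")

# Shrink the LFC estimates, reusing the table above; no lfcThreshold here,
# so the output keeps p-values and padj instead of s-values
resShrunk <- lfcShrink(dds, coef = coefName, type = "apeglm", res = res)

# The adjusted p-values are carried over; only the LFC estimates differ
same_padj   <- identical(res$padj, resShrunk$padj)
lfc_changed <- !isTRUE(all.equal(res$log2FoldChange,
                                 resShrunk$log2FoldChange))
print(c(same_padj = same_padj, lfc_changed = lfc_changed))
```

Because the significance calls come from `res`, the set of genes with `padj < 0.05` is the same before and after shrinkage in this sketch; the shrunken LFCs are mainly useful for ranking and plotting.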
Thank you so much, Dr. Love. I compared my results with and without lfcShrink, and I can now clearly see the difference. My concern is more that I want p-values and adjusted p-values rather than s-values, and I know that if I include lfcThreshold, as you mentioned, I will get them. I was confused about how to run both functions on my data. I will just continue my analysis with the traditional adjusted p-values.