Hi,
I am using DESeq2 for DE analysis in RNA-Seq experiment. I have 28 CLL patients raw counts and pooled male control and pooled female control raw counts. When I try to find differentially expressed miRNAs, I am getting wrong results (i.e. not matching with existing literature survey). e.g. hsa-mir-155 should be over-expressed but DESeq2 analysis shows under-expressed etc.
Here are my codes :
expDesign = data.frame(row.names = colnames(cts), condition = c("treated",..."treated","untreated","untreated"))
dds = DESeqDataSetFromMatrix(countData = cts, colData = expDesign, design = ~condition)
featureData = data.frame(gene=rownames(cts))
mcols(dds) = DataFrame(mcols(dds), featureData)
dds = DESeq(dds)
normalized = counts(dds, normalized=TRUE)
res = results(dds)
When I run DESeq2 on my raw counts then I found following results:
LFC(mir-155) = -2.0649
LFC(mir-7e) = 3.5927
LFC(mir-30a) = 4.1988
But after shrinkin LFC, I got the following result:
LFC(mir-155) = 12.3063
LFC(mir-7e) = 5.2165
LFC(mir-30a) = 6.9991
My questions are as follows:
- Which one of above LFC is I should consider as correct
- How to decide whether I should go for shrunken LFC
- How to decide the experiment design as group and condition. Here I have two conditions only (either treated or untreated).
- What is the role of normal while doing expression analysis in DESeq2.
Guidance much needed. Thanks.
Thanks @Michel : Here is my command for generating cts : cts = as.matrix(read.csv('./counts.csv', sep=", ", row.names=1))
So go over your scripts and make sure that the columns of this matrix are in the same order that you are telling DESeq2, and that it’s not in reverse order.
Also, when you compute LFC in DESeq2 make sure the numerator and denominator are correct. See the vignette: “Note on factor levels”
I have checked it many times. Columns of matrix are of same order them I am giving in condition. I'll check the vignette for notes on factor levels. Can you please answer the remaining questions also. An update I would like to give, in first attempt of DE analysis, hsa-mir-155 was down-regulated and when I calculate shrunkun lfc after checking dispersion plot, then hsa-mir-155 was up-regulated. Which one is right? What should I see or check to go for shrnkun lfc?
I’m sorry but I don’t really understand the above questions. Can you rephrase them? The shrunken LFC should not be opposite sign of the MLE. I haven’t seen this happen. Can you show this is really happening and provide all your code?
I have added more information in question, please check the question again.
There is no code here for shrinkage.
Sorry for that. I have done lfc shrinking separately from main program. That's why I forgot to attach in this question. There are two ways for shrinking. I am not sure which way to go. Here is the code :
res1 = lfcShrink(dds, coef = "Intercept", res = results(dds))
Ok so your two problems are that you are not setting factors levels as directed in the vignette (software cannot guess what is the reference in your experiment) and you are shrinking the intercept and comparing to the coefficient of condition differences.
ok. I got it. When I try with
coef = "condition_untreated_vs_treated
then I am getting change in LFC magnitude only not sign. So according to you I should use this instead of Intercept. By changing LFC coef as untreatedvstreated , I have got some good results but I still don't understand what are those statistical conditions or results which tells me that LFC results are not good and I should go for shrinking (because of may be high dispersion/small dataset/ high variance in LFC for very small gene counts ....)At this point I would recommend that you read the papers associated with the software, or partner with someone who can explain how the methods work. Our recommendation is to use the LFC shrinkage, and we have two papers (the 2014 and the new, apeglm paper) showing that it produces better estimates compared to the maximum likelihood estimate (MLE). I'm not going to recap those papers here because they are both open access and easy to read.
Ok.Thanks for your help.
How does DESeq2 take care of correlation in between raw counts of genes expression i.e. if two genes are positively or negatively correlated then it may affect GLM parameter estimation. Is it taken care by normalizing the counts or not?
Many genes are correlated, for example all up-regulated genes are correlated. This is not an issue.