Getting wrong result using DESeq2
1
0
Entering edit mode
vivekr • 0
@vivekr-14904
Last seen 5.6 years ago

Hi,

I am using DESeq2 for DE analysis in RNA-Seq experiment. I have 28 CLL patients raw counts and pooled male control and pooled female control raw counts. When I try to find differentially expressed miRNAs, I am getting wrong results (i.e. not matching with existing literature survey). e.g. hsa-mir-155 should be over-expressed but DESeq2 analysis shows under-expressed etc.

Here are my codes :

expDesign = data.frame(row.names = colnames(cts), condition = c("treated",..."treated","untreated","untreated"))
dds = DESeqDataSetFromMatrix(countData = cts, colData = expDesign, design = ~condition)
featureData = data.frame(gene=rownames(cts))
mcols(dds) = DataFrame(mcols(dds), featureData)
dds = DESeq(dds)
normalized = counts(dds, normalized=TRUE)
res = results(dds)

When I run DESeq2 on my raw counts then I found following results:

LFC(mir-155) = -2.0649
LFC(mir-7e) = 3.5927
LFC(mir-30a) = 4.1988

But after shrinkin LFC, I got the following result:

LFC(mir-155) = 12.3063
LFC(mir-7e) = 5.2165
LFC(mir-30a) = 6.9991

My questions are as follows:

  1. Which one of above LFC is I should consider as correct
  2. How to decide whether I should go for shrunken LFC
  3. How to decide the experiment design as group and condition. Here I have two conditions only (either treated or untreated).
  4. What is the role of normal while doing expression analysis in DESeq2.

Guidance much needed. Thanks.

deseq2 • 3.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

How did you construct “cts”?

In the vignette we say it is very important that the user not mix up the columns of the count matrix and the rows of colData. This would be my first guess what might have happened if you’re not seeing what’s expected.

ADD COMMENT
0
Entering edit mode

Thanks @Michel : Here is my command for generating cts :   cts = as.matrix(read.csv('./counts.csv', sep=", ", row.names=1))

ADD REPLY
0
Entering edit mode

So go over your scripts and make sure that the columns of this matrix are in the same order that you are telling DESeq2, and that it’s not in reverse order.

Also, when you compute LFC in DESeq2 make sure the numerator and denominator are correct. See the vignette: “Note on factor levels”

ADD REPLY
0
Entering edit mode

I have checked it many times. Columns of matrix are of same order them I am giving in condition. I'll check the vignette for notes on factor levels. Can you please answer the remaining questions also. An update I would like to give, in first attempt of DE analysis, hsa-mir-155 was down-regulated and when I calculate shrunkun lfc after checking dispersion plot, then hsa-mir-155 was up-regulated. Which one is right? What should I see or check to go for shrnkun lfc?

ADD REPLY
0
Entering edit mode

I’m sorry but I don’t really understand the above questions. Can you rephrase them? The shrunken LFC should not be opposite sign of the MLE. I haven’t seen this happen. Can you show this is really happening and provide all your code?

ADD REPLY
0
Entering edit mode

I have added more information in question, please check the question again.

ADD REPLY
0
Entering edit mode

There is no code here for shrinkage.

ADD REPLY
0
Entering edit mode

Sorry for that. I have done lfc shrinking separately from main program. That's why I forgot to attach in this question. There are two ways for shrinking. I am not sure which way to go. Here is the code : res1 = lfcShrink(dds, coef = "Intercept", res = results(dds))

ADD REPLY
0
Entering edit mode

Ok so your two problems are that you are not setting factors levels as directed in the vignette (software cannot guess what is the reference in your experiment) and you are shrinking the intercept and comparing to the coefficient of condition differences.

ADD REPLY
0
Entering edit mode

ok. I got it. When I try with coef = "condition_untreated_vs_treated then I am getting change in LFC magnitude only not sign. So according to you I should use this instead of Intercept. By changing LFC coef as untreatedvstreated , I have got some good results but I still don't understand what are those statistical conditions or results which tells me that LFC results are not good and I should go for shrinking (because of may be high dispersion/small dataset/ high variance in LFC for very small gene counts ....)

ADD REPLY
0
Entering edit mode

At this point I would recommend that you read the papers associated with the software, or partner with someone who can explain how the methods work. Our recommendation is to use the LFC shrinkage, and we have two papers (the 2014 and the new, apeglm paper) showing that it produces better estimates compared to the maximum likelihood estimate (MLE). I'm not going to recap those papers here because they are both open access and easy to read.

ADD REPLY
0
Entering edit mode

Ok.Thanks for your help.

ADD REPLY
0
Entering edit mode

How does DESeq2 take care of correlation in between raw counts of genes expression i.e. if two genes are positively or negatively correlated then it may affect GLM parameter estimation. Is it taken care by normalizing the counts or not?

ADD REPLY
0
Entering edit mode

Many genes are correlated, for example all up-regulated genes are correlated. This is not an issue.

ADD REPLY

Login before adding your answer.

Traffic: 332 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6