Question: Getting wrong result using DESeq2
0
gravatar for vivekr
9 weeks ago by
vivekr0
vivekr0 wrote:

Hi,

I am using DESeq2 for DE analysis in RNA-Seq experiment. I have 28 CLL patients raw counts and pooled male control and pooled female control raw counts. When I try to find differentially expressed miRNAs, I am getting wrong results (i.e. not matching with existing literature survey). e.g. hsa-mir-155 should be over-expressed but DESeq2 analysis shows under-expressed etc.

Here are my codes :

expDesign = data.frame(row.names = colnames(cts), condition = c("treated",..."treated","untreated","untreated"))
dds = DESeqDataSetFromMatrix(countData = cts, colData = expDesign, design = ~condition)
featureData = data.frame(gene=rownames(cts))
mcols(dds) = DataFrame(mcols(dds), featureData)
dds = DESeq(dds)
normalized = counts(dds, normalized=TRUE)
res = results(dds)

When I run DESeq2 on my raw counts then I found following results:

LFC(mir-155) = -2.0649
LFC(mir-7e) = 3.5927
LFC(mir-30a) = 4.1988

But after shrinkin LFC, I got the following result:

LFC(mir-155) = 12.3063
LFC(mir-7e) = 5.2165
LFC(mir-30a) = 6.9991

My questions are as follows:

  1. Which one of above LFC is I should consider as correct
  2. How to decide whether I should go for shrunken LFC
  3. How to decide the experiment design as group and condition. Here I have two conditions only (either treated or untreated).
  4. What is the role of normal while doing expression analysis in DESeq2.

Guidance much needed. Thanks.

deseq2 • 208 views
ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by vivekr0
Answer: Getting wrong result using DESeq2
0
gravatar for Michael Love
9 weeks ago by
Michael Love22k
United States
Michael Love22k wrote:

How did you construct “cts”?

In the vignette we say it is very important that the user not mix up the columns of the count matrix and the rows of colData. This would be my first guess what might have happened if you’re not seeing what’s expected.

ADD COMMENTlink written 9 weeks ago by Michael Love22k

Thanks @Michel : Here is my command for generating cts :   cts = as.matrix(read.csv('./counts.csv', sep=", ", row.names=1))

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by vivekr0

So go over your scripts and make sure that the columns of this matrix are in the same order that you are telling DESeq2, and that it’s not in reverse order.

Also, when you compute LFC in DESeq2 make sure the numerator and denominator are correct. See the vignette: “Note on factor levels”

ADD REPLYlink written 9 weeks ago by Michael Love22k

I have checked it many times. Columns of matrix are of same order them I am giving in condition. I'll check the vignette for notes on factor levels. Can you please answer the remaining questions also. An update I would like to give, in first attempt of DE analysis, hsa-mir-155 was down-regulated and when I calculate shrunkun lfc after checking dispersion plot, then hsa-mir-155 was up-regulated. Which one is right? What should I see or check to go for shrnkun lfc?

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by vivekr0

I’m sorry but I don’t really understand the above questions. Can you rephrase them? The shrunken LFC should not be opposite sign of the MLE. I haven’t seen this happen. Can you show this is really happening and provide all your code?

ADD REPLYlink written 9 weeks ago by Michael Love22k

I have added more information in question, please check the question again.

ADD REPLYlink written 9 weeks ago by vivekr0

There is no code here for shrinkage.

ADD REPLYlink written 9 weeks ago by Michael Love22k

Sorry for that. I have done lfc shrinking separately from main program. That's why I forgot to attach in this question. There are two ways for shrinking. I am not sure which way to go. Here is the code : res1 = lfcShrink(dds, coef = "Intercept", res = results(dds))

ADD REPLYlink written 9 weeks ago by vivekr0

Ok so your two problems are that you are not setting factors levels as directed in the vignette (software cannot guess what is the reference in your experiment) and you are shrinking the intercept and comparing to the coefficient of condition differences.

ADD REPLYlink written 9 weeks ago by Michael Love22k

ok. I got it. When I try with coef = "condition_untreated_vs_treated then I am getting change in LFC magnitude only not sign. So according to you I should use this instead of Intercept. By changing LFC coef as untreatedvstreated , I have got some good results but I still don't understand what are those statistical conditions or results which tells me that LFC results are not good and I should go for shrinking (because of may be high dispersion/small dataset/ high variance in LFC for very small gene counts ....)

ADD REPLYlink written 9 weeks ago by vivekr0

At this point I would recommend that you read the papers associated with the software, or partner with someone who can explain how the methods work. Our recommendation is to use the LFC shrinkage, and we have two papers (the 2014 and the new, apeglm paper) showing that it produces better estimates compared to the maximum likelihood estimate (MLE). I'm not going to recap those papers here because they are both open access and easy to read.

ADD REPLYlink written 9 weeks ago by Michael Love22k

Ok.Thanks for your help.

ADD REPLYlink written 9 weeks ago by vivekr0

How does DESeq2 take care of correlation in between raw counts of genes expression i.e. if two genes are positively or negatively correlated then it may affect GLM parameter estimation. Is it taken care by normalizing the counts or not?

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by vivekr0

Many genes are correlated, for example all up-regulated genes are correlated. This is not an issue.

ADD REPLYlink written 8 weeks ago by Michael Love22k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 375 users visited in the last hour