Hi Michael,
I am trying to understand the directionality of the log2 fold change that is reported by DESeq2.
I see in the manual that the first level of the factor is taken as the denominator for calculations by DESeq2. I also see that the order of the factor levels matters when results() is called. Can you elaborate on how those two work together? For example, let's suppose I am testing between level A and level B: would the results be wrong if level A was first in the DESeq() call, but level B was first in the results() call?
Thanks,
Adam
So...
results(dds, contrast=c("cond","B","A"))
returns log2(B/A)?
Yes, this should be printed at the top of the table also when you print it in R.
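To make the direction concrete, here is a minimal sketch on simulated data rather than anyone's real experiment. It assumes the DESeq2 package is installed and uses its makeExampleDESeqDataSet() helper, whose condition factor has the levels "A" and "B". The object names res_BA and res_AB are just illustrative.

```r
library(DESeq2)

set.seed(1)
# Simulated data set; its `condition` factor has levels "A" and "B"
dds <- makeExampleDESeqDataSet(n = 500, m = 6)
dds <- DESeq(dds, quiet = TRUE)

# In `contrast`, the numerator level comes first and the denominator second
res_BA <- results(dds, contrast = c("condition", "B", "A"))  # log2(B/A)
res_AB <- results(dds, contrast = c("condition", "A", "B"))  # log2(A/B)

# The description of the LFC column states the direction of the comparison
mcols(res_BA)$description[2]

# Same fit, so the LFC only changes sign and the p-values stay the same
all.equal(res_BA$log2FoldChange, -res_AB$log2FoldChange)
all.equal(res_BA$pvalue, res_AB$pvalue)
```

Because both calls use the same fitted dds, the two tables should differ only in the sign of the log2 fold change and the test statistic.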
Hi Michael,
I am honestly still confused regarding the comparison order of DESeq2. If I compare my samples with:
I got a list of 1042 DEGs, but if I change the comparison order to:
I got 1122 DEGs. So around 80 more DEGs this time.
I thought I would get the same list, only with the upregulated genes listed as downregulated and vice versa, depending on the comparison order. But I suppose that is not the case. Should wild-type or untreated samples always serve as the baseline?
thanks in advance for your help! Dewi
Can you post all your code and details about what you’re doing? That line of code usually just changes the sign of the test statistic and LFC. Are there more details?
Hi Michael,
Thanks for the quick response. Sure, I can post it here.
I pretty much followed the examples from the vignette but maybe you can give me a hint, where it went wrong. Thanks a lot!
Can you double check that the results() command on the same dds really gives two different lists in a clean R session?
The code in results() just flips the sign of the LFC and statistic so I’m wondering if there were other differences between your two runs that give different output.
For example, are the same number of genes included in both runs?
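One quick way to check this could look like the sketch below. The gene IDs here are made up for illustration; in practice the two vectors would be rownames(dds) taken from each of the two runs.

```r
# Hypothetical gene ID vectors; in practice these would be rownames(dds)
# from each of the two runs
genes_run1 <- c("gene1", "gene2", "gene3", "gene4")
genes_run2 <- c("gene1", "gene2", "gene4", "gene5")

# Same number of genes in both runs?
length(genes_run1) == length(genes_run2)

# Genes present in one run but missing from the other
only_run1 <- setdiff(genes_run1, genes_run2)
only_run2 <- setdiff(genes_run2, genes_run1)
only_run1
only_run2
```

If either setdiff() result is non-empty, the two runs did not start from the same set of genes, which alone could explain different DEG counts.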
You are right: if I change the comparison order/level only in results(), I get the same list, only with opposite signs. So I think the difference I saw is due to different statistical values produced somewhere before results()?
But this also got me thinking: which condition should then always be used as the baseline?
Anyway, thanks a lot for your help. I have been a big fan of DESeq2 since 2014, when I started using it for my bachelor thesis! :)
You're welcome :)
My guess is that you're not putting the same genes into the dds. You can do some exploration to see if this is the case. But I think it's also possible to have small numerical differences when DESeq() is run with a different coding of the design matrix. But results() will be identical.
As for the baseline, if you don't have a clear reference/control group, just pick one group and use that throughout.
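For what it's worth, fixing the baseline comes down to choosing the reference level of the factor. Here is a minimal base R sketch; the labels "treated" and "untreated" are hypothetical and stand in for whatever groups you have.

```r
# Hypothetical sample annotation; "treated" and "untreated" are made-up labels
condition <- factor(c("treated", "treated", "untreated", "untreated"))

# Levels are alphabetical by default, so "treated" would be the reference
levels(condition)

# Make "untreated" the reference (baseline) level instead
condition <- relevel(condition, ref = "untreated")
levels(condition)

# The design matrix coefficient is now named after the non-reference level
colnames(model.matrix(~ condition))
```

On a DESeq2 object the same idea would be applied to the design variable before running DESeq(), for example dds$condition <- relevel(dds$condition, ref = "untreated"), assuming the design variable is called condition.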