Difference between log2FC(MLE) and calculated log2 fold change in DESeq2 ver 1.20.0
1
0
Entering edit mode
cycy • 0
@cycy-21203
Last seen 2.1 years ago

I am curious about why the calculated log2 fold change value differs from the log2FoldChange of DESeq2 and want to know the cause.

Result (three condition/ Total 16 samples):

Condition 1 normalized counts: 0.000000 4.496866 8.383799 9.168738 5.433209

Condition 2 normalized counts: 5.609478 7.348834 6.021589 6.293060 6.732453

Condition 3 normalized counts: 4.727638 10.062812 8.112052 10.146985 8.873856 8.106829

Condition 1 average: 5.496522

Condition 2 average: 6.401083

Calculated log2 fold change: log2(6.401083/5.496522) = 0.219797

log2 fold change (MLE): condition Condition 2 vs Condition 1 : -0.00487575611632497

Can you tell me how to calculate log2 fold change? If it is difficult to tell me about the detailed method, I would like to know what factors(ex. baseMean lfcSE...) affect calculations and the cause of difference between plus(Calculated log2 fold change) and minus(log2 fold change (MLE): condition Condition 2 vs Condition 1).

deseq2 • 972 views
0
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

The MLE LFC is typically the log2 ratio of the mean of normalized counts, if it is a simple design with one factor.

You haven’t provided any of your code so I don’t know about your design or how you’ve run DESeq2 though.

0
Entering edit mode

txi.rsem.ex <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)

dds <- DESeqDataSetFromTximport(txi.rsem.ex, sampleTable, ~condition)

dds <- DESeq(dds, fitType = "parametric")

keep <- rowSums(counts(dds)) >= 10

dds <- dds[keep,]

dds <- DESeq(dds,fitType = "parametric")

head(counts(dds,normalized=TRUE),1)

ENSG00000000003.14_1 / 0.00000 4.496866 8.383799 9.168738 5.433209 5.609478 7.348834 6.021589 6.29306 6.732453 4.727638 10.06281 8.112052 10.14698 8.873856 8.106829

res = results(dds, contrast=c('condition','Condition2','Condition1'), independentFiltering=FALSE)

res

log2 fold change (MLE): condition Condition2 vs Condition1

Wald test p-value: condition Condition2 vs Condition1

DataFrame with 15669 rows and 6 columns

                        baseMean       log2FoldChange             lfcSE

<numeric>            <numeric>         <numeric>


ENSG00000000003.14_1/6.84488728991658 -0.00487575611632497 0.470063450907851

ENSG00000000419.12_1/ 471.531051580136 -0.61398371888155 0.26185744737714

ENSG00000000457.13_1/ 532.837380784179 0.0970014203408028 0.102600280199951

ENSG00000000460.16_2/ 112.059734944934 0.00740154312148043 0.190685869057898

ENSG00000000938.12_1/ 14989.1210747367 0.248168496953757 0.204711629956007

... ... ... ...

ENSG00000283496.1_1/ 0.603572642839276 2.59110989861822 3.00222231413428

ENSG00000283528.1_1/ 26.5572341542125 0.7132833237189 0.518081989743246

ENSG00000283632.1_1/ 2.18792323399264 0.509781159465374 0.897894139446614

ENSG00000283683.1_1/ 0.675640168262519 0.230315303559532 1.3305502133294

ENSG00000283697.1_1/ 0.686822760894622 -2.67178327010738 1.97936990724239

I would appreciate it if you check it again. Thank you for your quick reply.

0
Entering edit mode

Can you post this:

dds$condition  ADD REPLY 0 Entering edit mode dds$condition

[1] Condition1 Condition1 Condition1 Condition1 Condition1 Condition2 Condition2 Condition2 Condition2 [10] Condition2 Condition3 Condition3 Condition3 Condition3 Condition3 Condition3

0
Entering edit mode

I'm not sure why you see something different there. Is it different for all genes? Or just the first gene? Were there any messages printed when you ran DESeq()?

When I plug in these counts (rounded) into a simulated data set I get nearly the same LFC as using normalized counts (it is not necessarily that the MLE is identical to the simple ratio for a GLM, but typically very close).


> dds <- makeExampleDESeqDataSet(m=16)
> dds$condition <- factor(rep(1:3,c(5,5,6))) > counts(dds)[1,] <- c(0L,5L,8L,8L,5L,6L,7L,6L,6L,7L,5L,10L,8L,10L,9L,8L) > dds <- DESeq(dds) > as.data.frame(results(dds, name="condition_2_vs_1")[1,]) baseMean log2FoldChange lfcSE stat pvalue padj gene1 6.423135 0.3054816 0.6574549 0.4646427 0.6421874 0.9454032 > log2(mean(counts(dds,normalized=TRUE)[1,dds$condition=="2"])/mean(counts(dds,normalized=TRUE)[1,dds$condition=="1"])) [1] 0.3085395  ADD REPLY 0 Entering edit mode Not all genes are different and I did not see any messages when I ran DESeq2. C=counts(dds,normalized=TRUE) test1 = apply(C, 1, function(x){ log2( mean(x[6:10]) / mean(x[1:5]) ) }) test2 = data.frame(calculated=test1, log2FC=res$log2FoldChange)

dim(test2[which(abs(test2$calculated - test2$log2FC) > 0.2),])

[1] 1084 2

head(test2[which(abs(test2$calculated - test2$log2FC) > 0.2),],10)

                   calculated   log2FC


ENSG00000000003.14_1/ 0.21979688 -0.004875756

ENSG00000003137.8_1/ 5.21396910 4.260814729

ENSG00000003436.14_1/ -2.52976087 -2.313205639

ENSG00000005108.15_2/ 2.29681243 2.038137364

ENSG00000006042.11_1/ -2.00756362 -1.597780853

ENSG00000006659.12_1/ 0.93533078 0.519878947

ENSG00000006704.10_2/ -0.03160563 -0.246252646

ENSG00000007062.11_2/ 0.81472789 0.413966060

ENSG00000007171.16_2/ 1.58032878 1.373813189

ENSG00000007306.14_1/ 0.58076389 0.039064557

1
Entering edit mode

I took a look, it's the offsets (called size factors / normalization factors in DESeq2) in the GLM, that affect the contribution of the samples per gene, which means that the GLM gives a different answer than the simple ratio of averages. This is ok, there's no issue in the analysis.

For example, for the first gene:

> unname(normalizationFactors(dds)[1,])
[1] 0.3069981 0.6671314 1.1927767 1.1997289 1.2883730 1.4261578 0.9525320 1.1624838 1.4301469
[10] 1.1882741 0.6345664 1.0931338 0.7396403 1.2811688 1.3522870 1.1101752

0
Entering edit mode

Dear Mr. Love,

is it possible to tell more precisely how the normalization factors affect the final ratio as it is not exactly the log2 ratio of the means of normalized counts?

Sticking to the simple scenario depicted above: One factor with two levels A and B, I tried to inccorporate the normalization factors and later the size factors to somehow get the log2foldchanges as in the results() output, but in vain.

I am sorry if this was clarified elsewhere already or should rather be a new question on its own.

Thank you very much in advance

0
Entering edit mode

How about any notes printed during DESeq()?

Could you share the dds with me via email? I'm maintainer("DESeq2")