DESEQ2 LFC calculation
1
0
Entering edit mode
@amalthomas111-13189
Last seen 3.3 years ago
United States

Hello DESeq2 community,

Could you please tell me whether I am interpreting the LFC calculation right. 

Raw counts for geneX:
    condition1    condition1    condition1    condition1    condition1    condition2    condition2    condition2    condition2
    sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8    sample9
geneX    60    57    69    36    41    134    114    42    19

DESeq2 Normalized counts for geneX (stored in normaizedcounts):                                    
    condition1    condition1    condition1    condition1    condition1    condition2    condition2    condition2    condition2
    sample1    sample2    sample3    sample4    sample5    sample6    sample7    sample8    sample9
geneX    72.0426614733    61.9232190902    61.5316071546    45.8332912714    28.5786292082    116.8379135112    111.1193128782    54.2858927513    14.8136746784

> log2(mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition1"])])/mean(normaizedcounts[c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]))
[1] -0.4601916

> res = results(dds, alpha=0.05, contrast = c("condition","condition1","condition2"))
> res["geneX",]

log2 fold change (MLE): condition condition1 vs condition2 
Wald test p-value: condition condition1 vs condition2 
DataFrame with 1 row and 6 columns
                 baseMean log2FoldChange     lfcSE      stat       pvalue       padj
                <numeric>      <numeric> <numeric> <numeric>    <numeric>  <numeric>
geneX  429.1487       2.304769 0.6485406  3.553777 0.0003797409 0.01325116

The DESeq2 calculated LFC is 2.3 while calculating the log2 mean of normalized counts in the groups is -0.46. I wanted to know whether the LFC shrinkage is making the difference here. I checked the variance of each group. I found that the variance of "condition2" group for this gene is very high. So is this high difference in LFC calculation because DESeq2 shrinks the LFC of "condition2" group to zero by a higher degree than that of "condition1" samples?

> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition1"])]["geneX",])
[1] 289.4949
> var(normaizedcounts[,c(rownames(colData(dds))[colData(dds)$condition=="condition2"])]["geneX",])
[1] 2368.106

Using  DESeq2_1.16.1 version.

Thanks,

deseq2 • 2.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 3 days ago
United States

I'm not sure you've got the same gene there. The base mean is 429, but with the normalized counts you gave it should be 63. Are there more conditions? Note that there is no LFC shrinkage here: it says "log2 fold change (MLE)"

ADD COMMENT
0
Entering edit mode

Yes, there are more conditions. Actually, there are about 45 samples belonging to different conditions. Also, my design has more covariates. 

> dds  = DESeqDataSetFromMatrix(countData= df, colData= design.matrix, ~ culture + dissociation +  celllinetargetorgan)
> dds =  DESeq(dds)​

> res = results(dds, alpha=0.05, contrast = c("celllinetargetorgan","Brx50brai","Brx50H"))​
> normalizedcounts = counts(dds, normalized=T)​
> normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",]
X50brai.56.3 X50brai.58.3 X50brai.83.2 X50brai.84.2 X50brai.59.3 
    72.04266     61.92322     61.53161     45.83329     28.57863 
> normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",]
    X50HR      X50H  X50HD.R1  X50HD.R2 
 54.28589  14.81367 116.83791 111.11931 
> res["ENSG00000141655",]
log2 fold change (MLE): celllinetargetorgan Brx50brai vs Brx50H 
Wald test p-value: celllinetargetorgan Brx50brai vs Brx50H 
DataFrame with 1 row and 6 columns
                 baseMean log2FoldChange     lfcSE      stat       pvalue       padj
                <numeric>      <numeric> <numeric> <numeric>    <numeric>  <numeric>
ENSG00000141655  429.1487       2.304769 0.6485406  3.553777 0.0003797409 0.01325116

> log2(mean(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",]) / mean(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",]))
[1] -0.4601916


> var(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50brai"])]["ENSG00000141655",])
[1] 289.4949
> var(normalizedcounts[,c(rownames(colData(dds))[colData(dds)$celllinetargetorgan=="Brx50H"])]["ENSG00000141655",])
[1] 2368.106

If it is not LFC shrinkage, what could be the reason behind difference in DESeq2 LFC (2.3) and manually calculated LFC?

ADD REPLY
1
Entering edit mode

The LFC is not the ratio of the mean of normalized counts when additional variables are in the design. It's a bit more complex to explain, but this is how you get language in research papers that say "controlling for the variables...". I'd recommend discussing this meaning more in depth with a statistical collaborator if you are interested.

ADD REPLY
0
Entering edit mode

Thank you very much, Michael.

ADD REPLY

Login before adding your answer.

Traffic: 578 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6