Question

DESeq2 reports a high logFC that is not visible in the count matrix



Raj • 0

@raj-9784


USA

Hi, could someone please suggest a probable reason for the following contradiction I see with DESeq2?

DESeq2 reports a high logFC for a gene, but the median expression of that gene does not differ between conditions as reported, either in the normalized counts (obtained from DESeq2) or in the raw count matrix. For example, DESeq2 reports a logFC of 5, yet I see literally zero difference between the medians of groupA and groupB.

The code I used is straight from the tutorial:

```r
library(DESeq2)

# Sample table: one row per sample, rows named by patient ID
DF.CD = data.frame(condition = factor(treat))
rownames(DF.CD) = as.character(patient_ID)
all.equal(colnames(matrix), rownames(DF.CD))
#[1] TRUE

# Build the dataset, fit the model and extract groupA vs groupB results
DDS = DESeqDataSetFromMatrix(countData = matrix, colData = DF.CD, design = ~ condition)
DDS.ALL = DESeq(DDS, test = "Wald", fitType = "parametric")
RES = results(DDS.ALL, contrast = c("condition", "groupA", "groupB"), alpha = 0.05)
```

I thought the matrix might be the issue, so I looked at the difference in both the raw counts and the normalized counts. Both show the same trend.
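A quick way to see how medians and means can disagree is to summarise the gene per group with both. DESeq2's log2 fold change comes from a negative binomial model of the group means (with size factors entering as normalization offsets), so a comparison of medians can miss an effect driven by a few samples. The sketch below is base R only; `summarise_gene()`, the toy counts and the `"GENE_ID"` placeholder are hypothetical, and the log2 ratio of group means is only a rough proxy for DESeq2's estimate, not its actual computation.

```r
# summarise_gene() is a hypothetical helper, not a DESeq2 function.
# It reports per-group medians and means for one gene, plus the log2
# ratio of the group means (a crude stand-in for a mean-based LFC).
summarise_gene <- function(x, group, num, den) {
  by_group <- split(x, factor(group))
  med <- vapply(by_group, median, numeric(1))
  avg <- vapply(by_group, mean, numeric(1))
  list(median = med,
       mean = avg,
       log2_ratio_of_means = log2(avg[[num]] / avg[[den]]))
}

# Toy normalized counts (hypothetical): both groups have median 1,
# but four groupA samples have very large counts.
x <- c(0, 1, 1, 1, 1, 2, 0, 1, 100, 500, 1000, 5000,  # groupA
       0, 1, 1, 1, 2, 2, 0, 1)                        # groupB
group <- rep(c("groupA", "groupB"), times = c(12, 8))
s <- summarise_gene(x, group, num = "groupA", den = "groupB")
print(s)

# Applied to the objects from the question (assumes DESeq2 is loaded and
# "GENE_ID" is replaced by a real row name), the call would look like:
# x <- counts(DDS.ALL, normalized = TRUE)["GENE_ID", ]
# summarise_gene(x, DF.CD$condition, num = "groupA", den = "groupB")
```

With these made-up values the group medians are identical while the log2 ratio of means is large, which is the same pattern described in the question.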

Thanks much for helping,

deseq2

3.7 years ago Raj • 0

score 1 · Answer 1 · 2020-08-28



Michael Love 42k

@mikelove


United States

Can you post the plotCounts output for the gene with the large LFC?

3.7 years ago Michael Love 42k



Hi Michael, thanks very much for the reply. Please find the image here: DEG in raw counts, normalized counts and DESeq2 plotCounts

Here is what the result for the gene looks like:

> RES.PosNeg[which(RES.PosNeg$log2FoldChange > 5 & RES.PosNeg$padj < 0.05),]
log2 fold change (MLE): condition Positive vs Negative 
Wald test p-value: condition Positive vs Negative 
DataFrame with 1 row and 6 columns
                         baseMean   log2FoldChange            lfcSE
                       <numeric>        <numeric>        <numeric>
ENSG00000173237 12.3322353283969 5.04851575572932 1.08757304864703
                            stat               pvalue                padj
                       <numeric>            <numeric>           <numeric>
ENSG00000173237 4.64200152992926 3.45050305598895e-06 0.00185415246358606

3.7 years ago Raj • 0



This looks like the LFC is explained by four samples in the Positive group with large normalized counts (100-5000). So it makes sense to me, and is not surprising.

If you want to test for differences with less sensitivity to such large-count samples, you can use methods such as SAMseq or Swish, which implement Wilcoxon testing (Swish is designed for input from Salmon). Alternatively, testing on the log2 scale, as is done in limma-voom, would likely deprioritize such genes.
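To illustrate why rank-based or log-scale testing is less sensitive to a few very large counts, here is a base-R sketch on hypothetical toy counts. It does not run SAMseq, Swish or limma-voom; it only contrasts the log2 ratio of group means with (a) the difference of mean log2(count + 1) values and (b) base R's `wilcox.test`, the rank-sum idea that the Wilcoxon-based methods build on.

```r
# Toy counts (hypothetical): equal medians, four large values in "Positive".
neg <- c(0, 1, 1, 1, 2, 2, 0, 1)
pos <- c(0, 1, 1, 1, 1, 2, 0, 1, 100, 500, 1000, 5000)

# log2 ratio of the group means: dominated by the four large samples.
lfc_means <- log2(mean(pos) / mean(neg))

# Difference of mean log2(count + 1): large values are compressed,
# which is the general idea behind testing on the log2 scale.
diff_log <- mean(log2(pos + 1)) - mean(log2(neg + 1))

# Wilcoxon rank-sum test: only the ordering of values matters, so four
# extreme samples carry no more weight than four moderately high ones.
# exact = FALSE uses the normal approximation (the data contain ties).
wt <- wilcox.test(pos, neg, exact = FALSE)

cat("log2 ratio of means:      ", round(lfc_means, 2), "\n")
cat("difference of log2 means: ", round(diff_log, 2), "\n")
cat("Wilcoxon p-value:         ", signif(wt$p.value, 3), "\n")
```

On these made-up values the log2 ratio of means is about 9, the log2-scale difference is under 3, and the Wilcoxon p-value is well above 0.05; real genes will of course behave differently.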

3.7 years ago Michael Love 42k