Question

DESeq2 discrepancy in log2FC significance and normalized counts significance

Marina • 0

Australia

Hi all,

I am looking to identify the differentially expressed genes in a cell line upon treatment. For some genes, the log2FC passes my chosen thresholds (log2FC > 1.5, adj. p < 0.05), yet when I cross-check their normalized counts, the two groups show no significant difference. Since the log2FC is derived from the normalized counts, I'm having difficulty understanding why this would happen, even allowing for variability between replicates, small sample size, etc.

I would greatly appreciate it if someone could clear this up for me, and confirm whether I need the normalized counts to validate what I see at the log2FC level.

Cheers

RNASeqData DESeq2

score 0 · Answer 1 · 2023-08-02

ATpoint ★ 4.0k

Germany

Hello. Generally, to diagnose this sort of question you need to show at least the results line for that gene and the plot based on the normalized counts. Note that the logFC is not simply the ratio of normalized counts: it is estimated as part of the model fitted to the raw counts, taking into account the size factors and all covariates, plus some other magic happens under the hood to make the logFC estimates reliable, especially when counts and sample size are low. That is the "Moderated" part of the DESeq2 paper title, "Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2". See also the thread "Log2FoldChange and normalized count values are not consistent?".
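To make the first point concrete, here is a toy sketch in base R with made-up numbers for a single gene. It is a simplification, not DESeq2's actual code: a Poisson GLM stands in for DESeq2's negative binomial GLM, the size factors are invented, and no LFC shrinkage is applied. Even so, the log2 fold change fitted on the raw counts with log(size factor) as an offset differs from the log2 ratio of the group means of normalized counts, because the model effectively weights each sample's normalized count by its size factor.

```r
# Toy illustration with hypothetical data for ONE gene.
# Simplification: Poisson GLM instead of DESeq2's negative binomial GLM,
# and no LFC shrinkage.

raw <- c(20, 35, 18, 90, 40, 150)                 # hypothetical raw counts
size_factors <- c(0.8, 1.4, 0.9, 1.1, 0.6, 1.3)   # hypothetical size factors
group <- factor(rep(c("DMSO", "treated"), each = 3),
                levels = c("DMSO", "treated"))

# (a) Naive: log2 ratio of the group means of normalized counts
norm_counts <- raw / size_factors
naive_lfc <- log2(mean(norm_counts[group == "treated"]) /
                  mean(norm_counts[group == "DMSO"]))

# (b) Model-based: log-link count GLM on RAW counts, log(size factor) as offset
fit <- glm(raw ~ group, family = poisson(link = "log"),
           offset = log(size_factors))
glm_lfc <- unname(coef(fit)["grouptreated"]) / log(2)  # natural log -> log2

# For this Poisson model each group estimate is sum(raw) / sum(size_factors),
# i.e. a mean of normalized counts weighted by the size factors
group_level <- function(g) {
  sum(raw[group == g]) / sum(size_factors[group == g])
}
closed_form <- log2(group_level("treated") / group_level("DMSO"))

round(c(naive = naive_lfc, glm = glm_lfc, closed_form = closed_form), 3)
```

In DESeq2 itself the fit also uses the gene's dispersion estimate, and shrunken estimates come from `lfcShrink()`, so the reported log2FC can move further away from a naive ratio, particularly for low-count genes.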

Thank you so much for your reply.

Below I've attached only the code I used to normalize the raw counts; "vsd" was used to perform the t-test.

As an example, Exosc6 had a log2FC > 1.5 and adj. p-value < 0.05 when comparing the 2.5uM and DMSO groups. However, a two-tailed paired t-test (t.test) between these two groups on the normalized counts (vsd) gives a p-value of 0.079.
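For reference, the check described here corresponds to something like the base-R call below. The VST values are hypothetical and assume three matched DMSO / 2.5uM pairs. A paired t-test reduces to a one-sample t-test on the per-pair differences, so with three pairs it has only 2 degrees of freedom; it is also a different test from DESeq2's Wald test on the fitted count model, so the two p-values are not expected to agree.

```r
# Hypothetical VST values for one gene in three matched DMSO / 2.5uM pairs
vst_dmso    <- c(8.1, 9.0, 8.4)
vst_treated <- c(9.9, 10.2, 10.8)

# Two-sided paired t-test, as described in the post
tt <- t.test(vst_treated, vst_dmso, paired = TRUE, alternative = "two.sided")

# Equivalent one-sample t-test on the per-pair differences
tt_diff <- t.test(vst_treated - vst_dmso, mu = 0)

# With 3 pairs the test has 3 - 1 = 2 degrees of freedom
c(df = unname(tt$parameter), p_paired = tt$p.value, p_diff = tt_diff$p.value)
```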

Is this a common occurrence and is it something to worry about?

To clarify: if I'd like to validate the genes by qPCR, do I have to ensure that the normalized counts show a significant difference between the groups to confirm the log2FC and p-value observed?

Another question: I noticed that the normalized counts were 7.557 for a lot of the samples, and I think this may be the default value for genes that show no expression. How is this value calculated?
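One way to check the hunch about 7.557 is to test whether all entries with a raw count of zero share a single transformed value. The sketch below uses simulated data from `makeExampleDESeqDataSet()` as a stand-in (an assumption; with the real data you would use the `dds` and `vsd` objects from the code in this reply). Because the VST is applied to size-factor-normalized counts, a zero count stays zero after normalization in every sample, so all zeros map to the same value; as far as I understand, the exact number depends on the dispersion trend fitted to each data set. `varianceStabilizingTransformation()` is used here instead of `vst()` because `vst()` works on a subsample of genes and may stop on a small toy data set.

```r
library(DESeq2)  # also attaches SummarizedExperiment, which provides assay()

set.seed(1)
# Simulated stand-in for the real data (hypothetical): 1000 genes x 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- estimateSizeFactors(dds)

vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

raw_counts <- counts(dds)
vst_values <- assay(vsd)

# Transformed values of every entry whose raw count is zero
zero_vals <- vst_values[raw_counts == 0]
length(zero_vals)
unique(round(zero_vals, 6))  # a single value if all zeros share one floor

# The transformation is increasing, so non-zero counts should sit above it
min(vst_values[raw_counts > 0])
```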

Thank you again, please let me know if more information is required :)


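library(DESeq2)

# Create a DESeqDataSet object, which DESeq2 uses to store read counts, phenotypic data and gene annotation.
# 'counts' (gene x sample matrix) and 'pheno' (sample table) are my own objects, not shown here.
# Note: design = ~1 is intercept-only; it is enough for size factors and transformations, but cannot test group differences.
dds <- DESeqDataSetFromMatrix(countData=round(counts), colData = pheno, design = ~1)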

# Filtering: keep genes with more than 10 reads summed across all samples.
min.filt <- rowSums(counts(dds)) > 10

sum(min.filt)   # number of genes kept (length(min.filt) would return the total number of genes)

dds <- dds[min.filt, ]

dds <- estimateSizeFactors(dds)

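# Variance-stabilizing transformation (VST) of the size-factor-normalized counts (log2-like scale).
# With design = ~1, blind=FALSE uses the intercept-only design, i.e. no group information.
vsd <- vst(dds, blind=FALSE)
# Regularized log transformation (alternative to the VST)
rld <- rlog(dds, blind=FALSE)
# log2(normalized counts + 1)
ntd <- normTransform(dds)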
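Since the differential expression code itself is not shown, here is a sketch of how to pull out what was asked for above (the results line for one gene and its normalized counts), run on simulated data so it is self-contained. Assumptions to adapt: the column names `condition` and `replicate`, the levels A/B (in this thread they would be e.g. DMSO and 2.5uM), and the gene ID `gene1` (Exosc6 here). The design must include the grouping variable; including `replicate` as a blocking factor is one way to mirror the paired structure implied by the paired t-test.

```r
library(DESeq2)

set.seed(42)
# Simulated stand-in: 1000 genes x 6 samples, 'condition' with levels A and B
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)
# Hypothetical pairing: samples 1-3 (A) matched with samples 4-6 (B)
dds$replicate <- factor(rep(1:3, times = 2))
# Block on replicate, test condition
design(dds) <- ~ replicate + condition

dds <- DESeq(dds)
resultsNames(dds)

# Unshrunken (MLE) log2 fold change, Wald p-value and adjusted p-value
res <- results(dds, contrast = c("condition", "B", "A"))
res["gene1", ]

# Shrunken log2 fold change for the same comparison
res_shrunk <- lfcShrink(dds, coef = "condition_B_vs_A", type = "normal")
res_shrunk["gene1", c("baseMean", "log2FoldChange")]

# Normalized counts behind the estimate, returned as a data frame
pc <- plotCounts(dds, gene = "gene1", intgroup = "condition", returnData = TRUE)
pc
# plotCounts(dds, gene = "gene1", intgroup = "condition")  # draws the plot
```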

Marina • 8 months ago