Dear community,
a quick question on how fold changes are calculated in DESeq2. Below I have the counts for a gene X in 5 treatments and a control, with 4 replicates:
Control : 1 0 0 0
Treatment 1: 1 3 475 0
Treatment 2: 0 0 0 0
Treatment 3: 1 0 0 0
Treatment 4: 1 1 0 0
Treatment 5: 0 0 0 0
After running DESeq2 (code below), these are my results comparing each treatment to the control:
baseMean log2FoldChange lfcSE stat pvalue padj
treat1 22.81609 9.069945 3.5469 2.557085 NA NA
treat2 22.8160 -15.59239 3.69023 -4.225318 NA NA
treat3 22.81609 0.0142733 3.6902 0.003867862 NA NA
treat4 22.81609 0.9338621 3.6900 0.253077 NA NA
treat5 22.81609 -15.42883 3.69023 -4.180993 NA NA
I understand the NA p-values, as DESeq is probably taking that "475" as an outlier. My question is: when log2FC is calculated in a treatment1_vs_control comparison for example, is it based directly on the counts of treat 1 and control, or does it involve the overall counts of that gene across treatments? And any idea of why the logFC is so contrasting in treatments 2, 3, 4 and 5, even though their counts were somewhat similar?
#Adding data
cts <- (read.csv("count_matrix.csv",sep=",",check.names=FALSE,row.names="gene_id"))
coldata <- read.csv("metadata.csv", row.names=1)
coldata$treatment <- factor(coldata$treatment)
head(cts,5)
coldata
##Check rownames
rownames(coldata) <- sub("fb", "", rownames(coldata))
all(rownames(coldata) %in% colnames(cts))
all(rownames(coldata) == colnames(cts))
cts <- cts[, rownames(coldata)]
all(rownames(coldata) == colnames(cts))
##Create a DESeq Data Set
dds <- DESeqDataSetFromMatrix(countData = cts,
colData = coldata,
design = ~ treatment)
dds
featureData <- data.frame(gene=rownames(cts))
mcols(dds) <- DataFrame(mcols(dds), featureData)
mcols(dds)
##Pre-filtering
keep <- rowSums(counts(dds)) >= 1
dds <- dds[keep,]
##Setting levels
dds$treatment <- relevel(dds$treatment, ref = "control")
##Running Deseq
dds <- DESeq(dds)
res <- results(dds)
res
###Analyzing Treatment vs Control (1 analysis per Treatment)
res_treatment1 <- results(dds, contrast=c("treatment","treatment1","control"))
##Log fold change shrinkage for visualization and ranking
resultsNames(dds)
resLFC_treatment1 <- lfcShrink(dds, coef="treatment_treatment1_vs_mock", type="apeglm")
resLFC_treatment1
####p-values and adjusted p-values
#We can order our results table by the smallest p value:
resOrdered <- res_treatment1L[order(res_treatment1$pvalue),]
summary(res_treatment1)
I agree with you, no conclusions to be made here, I was looking at this gene just because those contrasting FC were making it stand out. So I was trying to understand how FCs results were calculated.
Well, the fold change between not zero and zero should be pretty high, right? So it's pretty high. I think doing LFCshrink would attenuate that effect somewhat.