Enter the body of text here
Code should be placed in three backticks as shown below
I have the following model on DESeq2 where I am blocking for replicate.
dds <- DESeqDataSetFromMatrix(countData = CPEB4_featureCounts_3utr_matrix,
colData = CPEB4_sample_list,
design = ~ replicate + sample_name)
dds <- DESeq(dds)
These are the metadata:
sample_name replicate
0195_2022 INPUT 4
0196_2022 IgG 4
0197_2022 CPEB4 4
0198_2022 INPUT 5
0199_2022 IgG 5
0200_2022 CPEB4 5
2125_2021 INPUT 1
2126_2021 IgG 1
2127_2021 CPEB4 1
2235_2021 INPUT 2
2237_2021 CPEB4 2
2238_2021 INPUT 3
2239_2021 IgG 3
2240_2021 CPEB4 3
I want to extract the contrast "CPEB4 - IgG"
I can do it by using the results function like this:
CPEB4vsIgG <- results(dds, contrast=c("sample_name","CPEB4","IgG"))
I get the following DEGs:
out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 598, 3.5%
LFC < 0 (down) : 30, 0.17%
outliers [1] : 0, 0%
low counts [2] : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
However, I could also manually calculate the coefficient (I usually do this when I have more complex contrasts), like this:
mod_mat <- model.matrix(design(dds), colData(dds))
CPEB4 <- colMeans(mod_mat[dds$sample_name == "CPEB4", ])
IgG <- colMeans(mod_mat[dds$sample_name == "IgG", ])
CPEB4vsIgG_2 <- results(dds, contrast = (CPEB4 - IgG))
However, with this code I get a slightly different list of DEGs:
summary(CPEB4vsIgG_2)
out of 17300 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 672, 3.9%
LFC < 0 (down) : 81, 0.47%
outliers [1] : 0, 0%
low counts [2] : 7637, 44%
(mean count < 41)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
If I check the coefficient for the two groups I am subtracting it looks like everything is fine:
print(CPEB4)
(Intercept) replicate2 replicate3 replicate4 replicate5 sample_nameIgG
1.0 0.2 0.2 0.2 0.2 0.0
sample_nameINPUT
0.0
print(IgG)
(Intercept) replicate2 replicate3 replicate4 replicate5 sample_nameIgG
1.00 0.00 0.25 0.25 0.25 1.00
sample_nameINPUT
0.00
Why is there this difference?
If I create a model without taking into account the replicate I have the same results with the two approaches.
Thank you I got it!
I actually changed the coefficient for the replicates to be the same between CPBE4 and IgG and I got the exact same result as in the DESeq2 function.
Therefore we must be extra careful when using the second approach in case of unbalanced experimental design!
One small clarification. Considering this kind of unbalanced design, which approach is the correct one? Calling the contrast with
contrast=c("sample_name","CPEB4","IgG")
or the second apporach in my example?I use the coefficient in the model that represents the CPEB4 vs IgG effect, e.g. using the named contrast.