How to account for solvent controls in DESeq2 when comparing follicular vs luteal phase?
1
1
Entering edit mode
snijesh ▴ 200
@snijesh-20358
Last seen 42 minutes ago
India

I have an RNA-seq dataset with 8 samples (genes as rows, samples as columns):

AA - FP SC T1   (Follicular phase, solvent control, replicate 1)
AB - FP T1      (Follicular phase, treated, replicate 1)
AC - FP SC T2   (Follicular phase, solvent control, replicate 2)
AD - FP T2      (Follicular phase, treated, replicate 2)
AG - LP SC T1   (Luteal phase, solvent control, replicate 1)
AE - LP T1      (Luteal phase, treated, replicate 1)
AF - LP SC T2   (Luteal phase, solvent control, replicate 2)
AH - LP T2      (Luteal phase, treated, replicate 2)

My goal is to perform differential expression analysis between FP and LP, but I want to remove the effect of solvent controls so that I capture only the biological differences between the two phases.

What would be the correct way to handle the solvent controls in DESeq2 for this type of design?

Possible approaches I considered

  1. Subtract SC from treated counts within each replicate before running DESeq2 (e.g., AB-AA = FP_T1_corrected).: This reduces to 4 corrected samples: FP_T1, FP_T2, LP_T1, LP_T2. But here I am concerned about whether the subtraction can create negative counts, and only 2 replicates per group remain.

  2. Model SC directly in DESeq2 using an interaction term: Metadata includes phase (FP/LP), condition (SC/Treated), and replicate (T1/T2).

design ~ replicate + phase + condition + phase:condition

Then use the interaction term (phaseLP.conditionTreated) to test whether LP vs FP differences hold after adjusting for SC.

What is the best practice in this situation?

DESeq2 • 291 views
ADD COMMENT
2
Entering edit mode
@james-w-macdonald-5106
Last seen 2 days ago
United States

Your approach 1 is a bad idea. Subtracting the gene counts won't do what you want to do.

I would personally check to see if there is any evidence that the solvent has an effect on gene expression in the two phases. I sort of doubt it does, but if there are no genes that are differentially expressed between the controls at the two phases, then you can safely ignore the solvent effect. The interaction term is in general much lower powered than a direct contrast, so ideally you would assume that the solvent effects are consistent, and a direct comparison of the treated in each phase will provide those genes that are affected differently in the follicular vs luteal phase.

0
Entering edit mode

I have checked DEGs between Solvent Control (SC) of FP and LP and there are noticebale differences in terms of DEGs.

# Checking SC in FP and LP
# Read featureCounts table
counts = read.table("counts.txt", header=TRUE, sep="\t", row.names=1)

# Metadata
targets = data.frame(ID = c("YHDAA","YHDAB","YHDAC","YHDAD","YHDAE","YHDAF","YHDAG","YHDAH"),
                     Name = c("FP_SC_T1","FP_T1","FP_SC_T2","FP_T2", "LP_SC_T1","LP_T1","LP_SC_T2","LP_T2"),
                     phase = c("FP","FP","FP","FP","LP","LP","LP","LP"),
                     Group = c("FP_SC","FP","FP_SC","FP","LP_SC","LP","LP_SC","LP"),
                     condition = c("SC","Treated","SC","Treated","SC","Treated","SC","Treated"))

# converting 'NA' values to 0
counts[is.na(counts)] <- 0
# make sample_info and counts in same order
counts = counts[, targets$ID]
#generate dds object
dds <- DESeqDataSetFromMatrix(countData = counts, colData = targets, design = ~ Group)

# Remove genes with 0 counts
keep_genes <- rowSums(counts(dds)) > 0
dds <- dds[ keep_genes, ]
# genes to have more than a sum total of 10 reads of support in all samples
dds = dds[rowSums(counts(dds)) > 10, ]

dds = DESeq(dds, betaPrior=FALSE, parallel=TRUE, BPPARAM=SnowParam(5))
res = results(dds, contrast = c("Group", "FP_SC", "LP_SC"), parallel = TRUE)

This has identified quite a good number of DEGs.

Next approach,

#DESeq2 object with interaction design
dds <- DESeqDataSetFromMatrix(countData = counts, colData = targets, design = ~ phase * condition)
# Equivalent to: ~ phase + condition + phase:condition

How do I calculate the res here by negating the SC effect?

res <- results(dds, name = "phaseLP.conditionTreated")

ADD REPLY
0
Entering edit mode

Not sure I understand. You ask how to do something and then show how one would do so.

0
Entering edit mode

I would like to clarify whether the command res <- results(dds, name = "phaseLP.conditionTreated") is the correct approach to generate differentially expressed genes (DEGs) when aiming to account for or negate the effect of the solvent control.

ADD REPLY

Login before adding your answer.

Traffic: 920 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6