What is the best way to compare a subset of data for differential expression analysis? The DESeq2 vignette says to use contrasts, but most people in the literature seem to pre-subset their data instead.
Can someone explain the best way to do this? What are the differences between the methods? I understand that subsetting the data alters the dispersion estimates, but what about case #1 versus case #2 below? Don't they both consider all the samples? Does altering the dispersion estimates matter?
For example, let's say that I have extracted RNA from treated and control tissue at two time points, 24 hours and 48 hours. I am interested in which genes are differentially expressed between treated and control at 48 hours. The way I see it, there are three ways to do this, but I am not sure which one is correct.
- Use contrasts
This is the method recommended in the vignette. Here I have a separate column in the metadata for each factor: a treatment column with values treated or control, and a time_point column with values 48h or 24h.
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = meta,  # metadata with separate treatment and time_point columns
  design = ~ time_point + treatment + time_point:treatment
)
dds <- DESeq(dds)  # fit the model before extracting results
# Coefficient names as reported by resultsNames(dds), with control and 24h as reference levels
res <- results(
  dds,
  contrast = list(
    c("treatment_treated_vs_control",
      "time_point48h.treatmenttreated")
  )
)
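To make the coefficients behind this contrast concrete, here is a minimal base-R sketch. It assumes toy metadata (two replicates per condition, made up for illustration) with control and 24h as the reference levels. It only inspects the model matrix and does not run DESeq2. DESeq2 reports the same terms under its own names in resultsNames(dds).

```r
# Toy metadata (hypothetical): 2 treatments x 2 time points, 2 replicates each.
meta <- data.frame(
  treatment  = factor(rep(c("control", "treated"), each = 2, times = 2),
                      levels = c("control", "treated")),
  time_point = factor(rep(c("24h", "48h"), each = 4),
                      levels = c("24h", "48h"))
)

# Same design formula as in #1.
mm <- model.matrix(~ time_point + treatment + time_point:treatment, data = meta)
print(colnames(mm))

# One design row for each 48h condition.
treated_48 <- mm[meta$treatment == "treated" & meta$time_point == "48h", , drop = FALSE][1, ]
control_48 <- mm[meta$treatment == "control" & meta$time_point == "48h", , drop = FALSE][1, ]

# The difference picks out the treatment main effect plus the interaction term.
diff_48 <- treated_48 - control_48
print(diff_48)
```

The nonzero entries of diff_48 fall on the treatment coefficient and the interaction coefficient, which is why the contrast in #1 lists both names.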
- Subset the data using metadata
Here the metadata would contain a single column (groups) that combines treatment and time, instead of two separate columns. For example, treated_24h, control_24h, treated_48h, control_48h.
dds <- DESeqDataSetFromMatrix(
  countData = counts,
  colData = meta,
  design = ~ groups
)
dds <- DESeq(dds)  # fit the model before extracting results
res <- results(dds, contrast = c("groups", "treated_48h", "control_48h"))
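For reference, the combined groups column used in #2 could be derived from the two separate columns of #1. This is a minimal base-R sketch on toy metadata (the rows are made up for illustration), assuming the column names treatment and time_point:

```r
# Toy metadata (hypothetical) with the two separate columns from #1.
meta <- data.frame(
  treatment  = rep(c("control", "treated"), times = 2),
  time_point = rep(c("24h", "48h"), each = 2)
)

# Paste the two factors together to get one level per treatment/time combination.
meta$groups <- factor(paste(meta$treatment, meta$time_point, sep = "_"))

print(levels(meta$groups))
```

The resulting level names are the ones passed to the contrast argument in #2.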
- Pre-subset the data in R and then run DESeq2
Here I could use either metadata, but I will use the one from #1.
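meta_48h <- meta[meta$time_point == "48h", ]
# Also filter the count data; assumes colnames(counts) match rownames(meta)
counts_48h <- counts[, rownames(meta_48h)]
dds <- DESeqDataSetFromMatrix(
  countData = counts_48h,
  colData = meta_48h,
  design = ~ treatment
)
dds <- DESeq(dds)  # fit the model before extracting results
res <- results(dds, contrast = c("treatment", "treated", "control"))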
So which way is "correct"? What are the pros and cons?
Thank you!
