Note: I do not have biological replicates. I have read the warning messages and am aware that an analysis without replicates probably will not yield any meaningful results. I want to make sure that I have the concept down for future reference.
What is the appropriate way of performing differential expression analysis when I have:
- 6 samples from the same cell line
- 3 sample types based on the phenotype (A, B, C)
- 2 time points: 0 hours (C) and 48 hours (T)
In short, the 6 samples are as follows: cell-AC, cell-AT, cell-BC, cell-BT, cell-CC, cell-CT
Q1: If I want to perform differential expression analysis between just the sample types (A vs. B, A vs. C, or B vs. C), is it correct to first subset the DESeqDataSet so that only samples of the types being compared are included, and then set the design to ~type?
Q2: If I want to do sample-to-sample comparisons (AC vs. BC, AT vs. BT, ...; all 15 possible comparisons), is it correct to first subset the DESeqDataSet so that it only includes the samples of interest (e.g. only AC and BC for AC vs. BC), and then set the design to ~type+condition?
Edit: I think the second question is already covered by the answer to a previously asked question, "A: DESEq2 comparison with mulitple cell types under 2 conditions".
Q3: It looks like for my samples, control samples cluster together and treatment samples cluster together. Is it okay to group the samples based on the condition to carry out differential expression analysis (AC & BC & CC vs. AT & BT & CT)? If so, would the design have to be set to ~type+condition or just ~condition?
Thank you, Michael!
Is it true that newer releases of DESeq2 now throw an error when you try to run DESeq without replicates?
Yes. We carried over the option to run a no-replicate analysis from DESeq, but the results weren't really meaningful, and I didn't think it was appropriate to offer the option anymore, so we deprecated it over a release cycle and then removed it. With no replicates, you can compute the vst() and then make plots. This is much more reasonable than any kind of testing approach.
Note that you have sufficient replicates for a design with main effects here.
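One way to see this is to count residual degrees of freedom: the number of samples minus the number of independent coefficients in the design. The sketch below is only an illustration in plain Python, not DESeq2 code. It builds treatment-coded design matrices by hand for the six samples in the question, and the helper names (`dummy_columns`, `matrix_rank`, `residual_df`) are made up for this example.

```python
from fractions import Fraction

# The six samples from the question: type A/B/C crossed with condition C/T.
types = ["A", "A", "B", "B", "C", "C"]
conds = ["C", "T", "C", "T", "C", "T"]
n = len(types)


def dummy_columns(levels, values):
    """Indicator columns for every level except the first (the reference)."""
    return [[1 if v == lev else 0 for v in values] for lev in levels[1:]]


def matrix_rank(rows):
    """Rank of a small matrix by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][col] / m[rank][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def residual_df(columns):
    """Samples minus the number of independent coefficients."""
    rows = [[col[i] for col in columns] for i in range(n)]
    return n - matrix_rank(rows)


intercept = [1] * n
type_cols = dummy_columns(["A", "B", "C"], types)
cond_cols = dummy_columns(["C", "T"], conds)
# Interaction columns: elementwise product of each type column with the condition column.
inter_cols = [[t * c for t, c in zip(tc, cond_cols[0])] for tc in type_cols]

df_by_design = {
    "~ condition": residual_df([intercept] + cond_cols),
    "~ type": residual_df([intercept] + type_cols),
    "~ type + condition": residual_df([intercept] + type_cols + cond_cols),
    "~ type + condition + type:condition": residual_df(
        [intercept] + type_cols + cond_cols + inter_cols
    ),
}

for design, df in df_by_design.items():
    print(f"{design}: {df} residual degrees of freedom")
```

The main-effects designs leave degrees of freedom for estimating variability (2 for ~type+condition), whereas the design with the interaction, which gives each of the six samples its own group, leaves none. That second case is the no-replicate situation.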
I see. It was my understanding that I would need more than one sample belonging to each group (in this example, 3 cell-AC, 3 cell-AT, etc., since I believe 3 is the minimum for cell lines).
Hi Michael,
For comparisons within each time point, could I use GFOLD, which doesn't require replicates, to perform the differential expression analysis? Would it be bad practice to report findings from two different tools?
GFOLD and taking the difference between vst() samples are similar approaches.
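As a rough intuition for what a difference on a transformed scale buys you, here is a toy sketch. It is neither vst() nor GFOLD: it uses a plain shifted log, log2(count + pseudocount), as a crude stand-in, and the counts and the function name `shifted_log_diff` are invented for illustration. Both genes have the same raw 4-fold change, but the low-count gene gets a smaller difference on the shifted-log scale.

```python
import math


def shifted_log_diff(count_a, count_b, pseudocount=1.0):
    """Difference of shifted-log values, log2(b + pc) - log2(a + pc).

    A crude stand-in for a difference on a variance-stabilized scale;
    it is NOT the transformation that vst() or GFOLD actually uses.
    """
    return math.log2(count_b + pseudocount) - math.log2(count_a + pseudocount)


# Hypothetical counts for one gene in two samples (no replicates).
low_gene = (2, 8)        # raw fold change of 4, very few reads
high_gene = (2000, 8000)  # raw fold change of 4, many reads

low_diff = shifted_log_diff(*low_gene)
high_diff = shifted_log_diff(*high_gene)

print(f"low-count gene:  {low_diff:.3f}")
print(f"high-count gene: {high_diff:.3f}")
```

Ranking genes by such a difference is descriptive only. Without replicates there is no estimate of variability to attach a p-value to.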