Hi,
I would like to know what analyses are feasible for the following dataset I have been given:
1 cell population subjected to the following conditions:
- Untreated
- Drug vehicle
- Drug A 1uM
- Drug A 5uM
- Drug A 10uM
- Drug B 1uM
- Drug B 5uM
- Drug B 10uM
(There is also an additional control cell population sample, which was left untreated and not subjected to any of the other conditions above.)
To restate, there are no biological replicates of any condition (they are all from 1 patient). I understand that because of this, I am limited in the statistical comparisons I can use. This is what I am doing currently:
- I have plotted the samples on a heatmap of Euclidean distances to show that the Drug A samples cluster separately from everything else (we do not expect Drug B to have an effect).
- I have generated a table of genes differentially expressed between grouped Drug A and Drug B samples (ignoring dose).
- I am generating a table of transformed count values for particular genes of interest for each dose of each drug, so we can at least look at whether each gene has a pattern of dose dependency (not applying any statistical test).
I am keen to know if there is anything more I can do with this dataset. Would it be valid to perform a likelihood ratio test to look for genes which show a drug-dependent effect with dose, as suggested in "DESeq2 timecourse - How to set the experimental design?"
Many thanks
Thank you for your help. I have talked to the investigators.
For looking at dose-dependent changes:
We expect the log10 of the drug doses to have a linear relationship to gene expression over this range.
If I understand correctly, I could rewrite the numeric doses (1, 5, 10) as log10 values (0, 0.69897, 1), so that the model assumes log2 gene expression changes linearly with log10 dose. Should I be looking at raw expression here instead? Can you point me in the right direction as to how best to write this design?
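As a quick check of the arithmetic above, here is a minimal sketch in plain Python (not DESeq2 code) that converts the three doses to a log10 scale. The variable names are illustrative only.

```python
import math

# Doses of each drug in uM, as listed in the experimental conditions.
doses_uM = [1, 5, 10]

# log10-transformed doses, to be used as the numeric covariate.
log10_doses = [math.log10(d) for d in doses_uM]

print([round(x, 5) for x in log10_doses])  # [0.0, 0.69897, 1.0]
```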
Baseline/vehicle:
The vehicle is used to deliver drugs A and B. It is present at the same concentration in the 'drug vehicle' sample and in all drug A and B samples. I would therefore not be including it in the dose-dependent linear model. Given there are no replicates of the 'drug vehicle' and 'untreated' samples, is the only way to compare these 2 samples to look for genes with large fold changes in expression?
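For reference, a fold-change-only comparison of two unreplicated samples could be sketched as below. This is plain Python with hypothetical counts and gene names, and it uses simple total-count scaling plus a pseudocount, which is an assumption for illustration and not DESeq2's normalization.

```python
import math

# Hypothetical raw counts, for illustration only (not from the dataset).
untreated = {"geneX": 120, "geneY": 15, "geneZ": 900}
vehicle = {"geneX": 480, "geneY": 14, "geneZ": 450}

def log2_fold_changes(num, den, pseudocount=1.0):
    """log2(num / den) per gene, after rescaling num to den's total count."""
    scale = sum(den.values()) / sum(num.values())
    return {
        gene: math.log2((num[gene] * scale + pseudocount) / (den[gene] + pseudocount))
        for gene in num
    }

# Vehicle relative to untreated; no p-values are possible without replicates.
lfc = log2_fold_changes(vehicle, untreated)
for gene, value in sorted(lfc.items(), key=lambda kv: -abs(kv[1])):
    print(gene, round(value, 2))
```

The pseudocount damps fold changes for low-count genes, which otherwise dominate a ranking by fold change alone.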
Many thanks
log10 is a problem because the baseline has dose=0, and log10(0) is undefined. You want a scale on which all four doses (0, 1, 5, 10) have a numeric value.
I don't have any particular advice about what functional form to use. The simplest to code would be linear changes in log expression, but this is a statistical and biological design choice for the analyst, and goes beyond what I can offer in terms of software support.
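To make the dose=0 issue concrete, here is a small sketch in plain Python. The shifted scale log10(dose + 1) is only one hypothetical way of keeping the baseline on a numeric scale, shown as an assumption and not as a recommendation.

```python
import math

doses_uM = [0, 1, 5, 10]  # 0 = baseline (untreated or vehicle)

def safe_log10(d):
    """Return log10(d), or None where it is undefined (d <= 0)."""
    try:
        return math.log10(d)
    except ValueError:
        return None

# Plain log10: the baseline has no usable value.
plain = [safe_log10(d) for d in doses_uM]

# Hypothetical alternative: shift by 1 before taking the log, so dose 0 maps to 0.
shifted = [math.log10(d + 1) for d in doses_uM]

print(plain)
print([round(x, 3) for x in shifted])
```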
If you pick either untreated or vehicle to be the baseline, you can use a design of ~drug:dose. You should code the untreated baseline as dose=0, drug="A", though it won't make a difference - it will be the baseline for both drugs.
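The reason the drug label of the baseline does not matter can be seen by writing out the design matrix by hand. The sketch below is plain Python and assumes the usual expansion of ~drug:dose into an intercept plus one dose slope per drug. The column layout is my assumption and is worth checking against model.matrix in R.

```python
# Each sample: (label, drug, dose). The baseline is coded as drug "A", dose 0.
samples = [
    ("baseline", "A", 0),
    ("A_1uM", "A", 1),
    ("A_5uM", "A", 5),
    ("A_10uM", "A", 10),
    ("B_1uM", "B", 1),
    ("B_5uM", "B", 5),
    ("B_10uM", "B", 10),
]

def design_row(drug, dose):
    """Columns: intercept, drugA:dose, drugB:dose."""
    return [1, dose if drug == "A" else 0, dose if drug == "B" else 0]

for label, drug, dose in samples:
    print(label, design_row(drug, dose))

# With dose = 0 both slope columns are zero, whichever drug label is used.
print(design_row("A", 0) == design_row("B", 0))  # True
```

Because the baseline row reduces to the intercept alone, it anchors the fitted lines for both drugs.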