Hello, I'm working with some RNA-seq data from a clinical trial. There are about 160 patients (40 per group: low, medium, and high doses of a drug, plus a placebo group). Samples were taken at baseline, week 4, and week 12. I would like to identify genes that are differentially expressed between drug and placebo, and to explore whether there is some sort of dose response in gene expression.

Can someone offer some suggestions on how to approach this analysis using DESeq2? For drug vs placebo, would a design that includes the factors treatment, week, and patient be applicable and allow me to look at the contrasts of interest (for example, high dose vs placebo, or combined doses vs placebo)?

I have Ensembl gene identifiers and plan to annotate them with HGNC symbols. Not all Ensembl IDs map to a known HGNC symbol. What is the typical way to handle this? Should genes without known symbols be excluded prior to analysis? If not, how would I handle genes that are identified as differentially expressed but do not map to a known symbol? Thanks in advance.
First, you can read over the time series example in our RNA-seq workflow:
You have 3 treatments instead of 1, but you can use a similar approach to find genes which show treatment-specific changes over time. Make sure to set a reasonable reference level for condition and time using relevel().
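As a minimal sketch of the releveling step, here is how it might look on a toy sample table in base R. The column names (condition, time) and the level labels are assumptions for illustration, not taken from your data.

```r
# Hypothetical sample table; column names and level labels are assumptions.
coldata <- data.frame(
  condition = factor(c("high", "low", "medium", "placebo")),
  time      = factor(c("week12", "baseline", "week4", "baseline"))
)

# R orders factor levels alphabetically by default, so without releveling
# the reference level for condition would be "high", not "placebo".
levels(coldata$condition)

# Make placebo and baseline the reference levels.
coldata$condition <- relevel(coldata$condition, ref = "placebo")
coldata$time      <- relevel(coldata$time, ref = "baseline")

# The first level of each factor is now the reference.
levels(coldata$condition)
levels(coldata$time)
```

On a DESeqDataSet the same call can be applied to the column directly, for example dds$condition <- relevel(dds$condition, ref = "placebo"), before running DESeq().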
In order to control for patient differences, and because the patients are nested within treatment group, you can follow the advice posted here:
This means you would add a condition:patient term to the design.
You should use a design that looks something like ~condition + condition:patient + time + condition:time. Then you can test differences over time for any condition group with a likelihood ratio test:
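Before fitting, it may be worth checking that the design matrix is full rank, since that depends on how the patient factor is coded. The sketch below uses a toy layout (4 groups, 2 patients per group, 3 time points) and base R only. The column name patient.n, an index that restarts at 1 within each treatment group, is an assumption modeled on the nested-individuals advice in the DESeq2 vignette; all other labels are made up as well.

```r
# Toy sample table: 4 groups x 2 patients x 3 time points (all labels are assumptions).
coldata <- expand.grid(
  time      = c("baseline", "week4", "week12"),
  patient.n = c("1", "2"),
  condition = c("placebo", "low", "medium", "high"),
  stringsAsFactors = FALSE
)
coldata$condition <- factor(coldata$condition,
                            levels = c("placebo", "low", "medium", "high"))
coldata$time      <- factor(coldata$time,
                            levels = c("baseline", "week4", "week12"))
# Patient index nested within group: restarts at 1 in every treatment group.
coldata$patient.n <- factor(coldata$patient.n)
# Globally unique patient ID, for comparison.
coldata$patient   <- factor(paste(coldata$condition, coldata$patient.n, sep = "_"))

# Same design, with the two different patient codings.
mm_nested <- model.matrix(~ condition + condition:patient.n + time + condition:time,
                          coldata)
mm_global <- model.matrix(~ condition + condition:patient + time + condition:time,
                          coldata)

# A design is usable only if its columns are linearly independent.
is_full_rank <- function(mm) qr(mm)$rank == ncol(mm)

is_full_rank(mm_nested)  # nested index
is_full_rank(mm_global)  # globally unique IDs
```

In this toy layout the nested index gives a full-rank matrix, while globally unique patient IDs produce many all-zero interaction columns. If your patient column holds unique IDs, recoding it to a within-group index before building the design is one way around that.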
dds <- DESeq(dds, test="LRT", reduced=~condition + condition:patient + time)
res <- results(dds)
For testing individual treatments vs control, I think the easiest approach would be to subset the dds to those samples (the samples from that treatment and the control group), and then run the two lines above.
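A rough sketch of that subsetting step, shown on toy stand-ins for the sample table and count matrix so it runs in base R. The object names, level labels, and counts are all assumptions. The commented lines at the end show how the same idea would look on the DESeqDataSet itself.

```r
# Toy stand-ins for colData(dds) and counts(dds); names and values are assumptions.
coldata <- data.frame(
  condition = factor(rep(c("placebo", "low", "medium", "high"), each = 2),
                     levels = c("placebo", "low", "medium", "high")),
  row.names = paste0("sample", 1:8)
)
counts <- matrix(1:16, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), rownames(coldata)))

# Keep only the samples from one treatment group plus the placebo group.
keep        <- coldata$condition %in% c("placebo", "high")
coldata_sub <- droplevels(coldata[keep, , drop = FALSE])  # drop the unused dose levels
counts_sub  <- counts[, keep, drop = FALSE]

levels(coldata_sub$condition)

# On the DESeqDataSet the equivalent would be along these lines:
#   dds_sub <- dds[, dds$condition %in% c("placebo", "high")]
#   dds_sub$condition <- droplevels(dds_sub$condition)
#   dds_sub <- DESeq(dds_sub, test="LRT", reduced=~condition + condition:patient + time)
#   res_high <- results(dds_sub)
```

Dropping the unused factor levels matters here: otherwise the design would still carry columns for the dose groups that are no longer in the subset.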
Note that with likelihood ratio tests, there is not a single LFC being tested, but many, and the LFC column in the results table is just one of the many being tested. See the paragraph in ?results on this for more details.