Hi,
I’m trying to do some differential expression analysis which I believe that it should be trivial but I couldn’t find any reference for what I’m looking for. I’m pretty sure that the solution will be much shorter than the description below for what I’m looking for (sorry for that…):
I would like to find differentially expressed genes that indicate that a treatment A has a higher impact comparing a treatment B, where each of the treatments is compared to DIFFERENT reference levels as follows:
- I have overall of 24 samples.
- 6 first samples are of 6 different patients (group A) before a treatment ”A” was given. I consider the 6 patients as biological replicants, as I’m not interested to differentiate between patients.
- Next 6 samples are of the same 6 patients as above, but after the treatment “A” was given. Once again, I consider all these 6 samples as biological replicants.
- Next 6 samples are of another group (group B) of 6 patients, before the treatment “B” was given. All of these 6 samples are considered as biological replicants.
- Last 6 samples are of the 6 patients of group B, after the treatment “B” was given.
I would like to find a differential expression for the following: [fold change of ” after treatment A” vs ”before treatment A”], compared to [fold change of ”after treatment B” vs ” before treatment B”].
I managed easily to apply DESeq2 for a differential expression of ”before treatment X” vs ”after treatment X”, but now I have 2 result matrices that include LOG2FC, pvalue and adjusted pvalue, and I’m not sure if and how I should proceed and compare between them. I prefer that DESeq2 will do the entire required analysis from the beginning, if possible.
I was looking in the DESeq2’s vignette and I tried the following code which works well, but I have no idea if it does what I mean:
d <- expand.grid(condition=c(rep("beforeT",6), rep("afterT",6)), treatment=c("A", "B"))
coldataBOTH <- data.frame(row.names=colnames(BOTH), d)
ddsBOTH <- DESeqDataSetFromMatrix(countData=BOTH, colData=coldataBOTH, design=~condition+treatment)
#(the matrix BOTH is the raw count matrix that includes 24 libraries)
mcols(ddsBOTH) <- geneID
ddsBOTH <- DESeq(ddsBOTH)
resB <- results(ddsBOTH)
I suspect that what I really need is described under either the section “Multi-factor designs”or probably under ”Interactions” in the following great, just recently updated, vignette:
https://bioconductor.org/packages/3.7/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions
However, I’m not sure that I could find there a good correspondence with what I’m trying to do.
I would really appreciate if someone can show me an explicit example, tailored for my case (I guess that it should be just an addition of 2-3 lines to my code)…
If DESEQ2 is not the right tool I would be happy to get any recommendation for a proper one. Or any other statistical suggestions.
p.s – just for the case that the someone in the future will encounter my question and it’s not what he will be looking for, I found some hints for what I am trying to do in previous topics in this forum:
DESEQ2--compare two log2-transformed fold changes
But once again I failed to relate and implement the help that was given in those topics into my case.
A difference between fold changes can be found with an interaction. Have you seen the diagram in the vignette? Before I post code, I’m curious why you dont want to account for patient in the comparison.
Hi Michael,
Thanks so much for your reply. Do you mean the diagrams in the Interaction section in the vignette? – I had a look into it and although they were clear enough, I’m not sure how to deduce from these figures to what I need.
One of my problems is that I’m new for the entire RNAseq and differential expression business. My supervisor and I first thought that since we would like to know whether the treatment is successful or not, generally speaking, we should not take into account differentiations in the patients. We are not looking for personal medicine or personal treatment so we hoped that collecting data from 6 different patients for each treatment would allow us considering them as biological replicates. However, your question hit me and I now believe that since we deal with different patients we should also consider differentiating between treatments while considering the different patients. Either way, I would like then to implement both options:
Another problem of mine is that I’m even not a rookie in R. I have only (awfully) basic knowledge in programming, so I study mainly from examples. I’m not sure that I fully understand the format of the design equation, and especially not meaning of “:” when specifying multiple factors. I tried the following code in order to differentiate between treatments referenced to their base line, considering the different patients:
> d2 <- expand.grid(condition=c(rep("beforeT",6), rep("afterT",6)), treatment=c("A", "B"), patient=c("P"))
>d2$patient <- c(rep(paste0("P", 1:6),2), rep(paste0("P", 7:12),2))
>coldataBOTH <- data.frame(row.names=colnames(BOTH), d2)
>ddsBOTH <- DESeqDataSetFromMatrix(countData=BOTH, colData=coldataBOTH, >design=~treatment+treatment:condition+treatment:condition:patient)
#(the matrix BOTH is the raw count matrix that includes 24 libraries)
But I get “Model matrix not full rank” error.
I also tried instead to combine all factors into a single factor:
>ddsBOTH$group <- factor(paste0(c(rep("beforeT",6), rep("afterT",6)), c(rep("A", 12), rep("B",12)), c(rep(paste0("P", 1:6),2), rep(paste0("P", 7:12),2))))
>design(ddsBOTH) <- ~group
but I’m not sure how to extract now my interesting compares…
Any help will be highly appreciated!
Simon, Michael, Many thanks to both of you!