Dear community,
I’m pretty new to computational biology and to transcriptomics. I’m currently trying to characterize how the overexpression of three transcription factors (x, y, z) acts on the transcriptome of a cell line.
The design has 5 groups of samples (x, y, z, xyz, Ctrl), each with 5 replicates. Genes with zero counts were filtered out.
As far as I can tell, each TF induces or represses a set of genes, as shown by differential expression analysis with DESeq2 (using the Ctrl condition as the reference). I ran a PCA with FactoMineR on all expressed genes, using counts transformed with the regularized log (rlog), as suggested in the DESeq2 workflow. Plotting the samples on the first three PCs, I noticed something rather curious.
It looks as if each TF (x, y, z) “pulls” the Ctrl state along its own path, while the path for the simultaneous overexpression (xyz) seems to be the sum of the three individual effects (x + y + z). Can anyone suggest a method to test this hypothesis?
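One way to make the “sum of paths” idea concrete is to compare displacement vectors between group centroids. The sketch below is a minimal, descriptive illustration, assuming you export a samples-by-PCs score matrix (e.g. the `ind$coord` element of a FactoMineR `PCA()` result) plus a vector of group labels. The function names and the within-group bootstrap are hypothetical choices, not an established method. Each group's displacement is its centroid minus the Ctrl centroid; under additivity, d_xyz - (d_x + d_y + d_z) should be small relative to d_xyz.

```python
import numpy as np


def group_centroids(scores, labels):
    """Mean score vector of each group (rows of `scores` are samples)."""
    labels = np.asarray(labels)
    return {g: scores[labels == g].mean(axis=0) for g in np.unique(labels)}


def additivity_residual(scores, labels, ctrl="Ctrl", singles=("x", "y", "z"), combo="xyz"):
    """Return (residual, relative residual norm).

    d_g = centroid(g) - centroid(ctrl); residual = d_combo - sum of d_single.
    The relative norm divides by ||d_combo|| (undefined if combo equals ctrl).
    """
    c = group_centroids(scores, labels)
    d = {g: c[g] - c[ctrl] for g in (*singles, combo)}
    residual = d[combo] - sum(d[g] for g in singles)
    rel = np.linalg.norm(residual) / np.linalg.norm(d[combo])
    return residual, rel


def bootstrap_relative_residual(scores, labels, n_boot=1000, seed=0, **kw):
    """Percentile 95% interval of the relative residual, resampling replicates within each group."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    groups = {g: np.flatnonzero(labels == g) for g in np.unique(labels)}
    stats = []
    for _ in range(n_boot):
        idx = np.concatenate([rng.choice(ix, size=ix.size, replace=True) for ix in groups.values()])
        _, rel = additivity_residual(scores[idx], labels[idx], **kw)
        stats.append(rel)
    return np.percentile(stats, [2.5, 97.5])


if __name__ == "__main__":
    # Synthetic toy data only: 5 replicates per group in 3 PCs, additive by construction.
    rng = np.random.default_rng(1)
    shift = {"Ctrl": np.zeros(3), "x": np.array([2.0, 0.0, 0.0]),
             "y": np.array([0.0, 2.0, 0.0]), "z": np.array([0.0, 0.0, 2.0])}
    shift["xyz"] = shift["x"] + shift["y"] + shift["z"]
    labels = np.repeat(list(shift), 5)
    scores = np.vstack([shift[g] + rng.normal(0.0, 0.1, 3) for g in labels])
    _, rel = additivity_residual(scores, labels)
    print("relative residual:", rel)
    print("bootstrap 95% interval:", bootstrap_relative_residual(scores, labels, n_boot=500))
```

The same functions accept the full rlog matrix (samples as rows, genes as columns), which avoids depending on how much variance the first three PCs capture. The bootstrap only reflects replicate variability. A per-gene test with a DESeq2 contrast is the more formal check.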
Thanks in advance for any suggestion.
Sebastiano
Dear Ryan,
Thank you for your reply. You are right: I was not precise in describing my hypothesis, and I apologize for that. In any case, your suggestions are exactly on the mark.
I'm interested in both: first, identifying genes that differ significantly between xyz and the average of the individual overexpressions ((x + y + z)/3); and second, testing how the combined overexpression of x, y, and z relates to their individual effects.
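Written as linear combinations of the DESeq2 coefficients, the two questions differ only in the weights on the single-TF terms. Here, with Ctrl as the reference level, $\beta_g$ denotes the log2 fold change of group $g$ versus Ctrl; this notation is introduced only for illustration:

```latex
\begin{aligned}
\text{xyz vs. average of singles:} \quad & \beta_{xyz} - \tfrac{1}{3}\,(\beta_{x} + \beta_{y} + \beta_{z}) \\
\text{xyz vs. sum of singles:} \quad & \beta_{xyz} - (\beta_{x} + \beta_{y} + \beta_{z})
\end{aligned}
```

Testing each combination against zero gives a per-gene answer. The second one is the per-gene counterpart of the additive “sum of paths” pattern seen in the PCA. A gene whose second combination is near zero behaves additively on the log scale, which means the fold changes multiply on the count scale.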
Following your suggestion, to test whether the average of the individual overexpressions of x, y, and z ((x + y + z)/3) differs from the simultaneous delivery of xyz, I should use the contrast argument of the DESeq2 results function. Unfortunately, I don't know how to do that. So far I've only used the contrast argument for pairwise comparisons (see code below):
Could you please post a code example?
Regarding limma-voom, I'll read up on it and then write back.
Thanks a lot.
Sebastiano
I'm more familiar with edgeR and limma, which use a different way of specifying contrasts that lets you write the arithmetic expressions exactly as I wrote them above. I think with DESeq2 you need to construct a numeric vector with -1/3 for x, y, and z and +1 for xyz (and 0 for control, since it's not involved in the contrast). See the DESeq2 help page for `results`.
In DESeq2 you can specify a list where the first character vector contains the numerator terms and the second character vector contains the denominator terms. Then you specify `listValues`, e.g. `c(1, -1/3)`.
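Both forms end up as weights on the model coefficients. The sketch below, written in Python rather than R, uses hypothetical helpers. The first builds a numeric contrast by coefficient name, so no weight can land in the wrong slot. The second converts the list form (numerator, denominator, `listValues`) into the same weights. The coefficient names are an assumption modelled on DESeq2's `condition_<level>_vs_Ctrl` pattern for a `~ condition` design with Ctrl as reference. With R's default alphabetical factor levels, xyz would sit between x and y, so check the actual order with `resultsNames(dds)`.

```python
def contrast_vector(results_names, weights):
    """Numeric contrast aligned with results_names; coefficients not named get weight 0."""
    missing = set(weights) - set(results_names)
    if missing:
        raise ValueError(f"not in results_names: {sorted(missing)}")
    return [float(weights.get(name, 0.0)) for name in results_names]


def list_contrast_to_weights(numerator, denominator, list_values=(1.0, -1.0)):
    """Mimic the list form: numerator terms get list_values[0], denominator terms list_values[1]."""
    weights = {name: float(list_values[0]) for name in numerator}
    weights.update({name: float(list_values[1]) for name in denominator})
    return weights


def apply_contrast(coefs, contrast):
    """The linear combination sum_i c_i * beta_i that the contrast tests against 0."""
    if len(coefs) != len(contrast):
        raise ValueError("coefficient and contrast lengths differ")
    return sum(c * b for c, b in zip(contrast, coefs))


if __name__ == "__main__":
    # Hypothetical resultsNames(dds) for ~ condition with Ctrl as the reference level.
    names = ["Intercept", "condition_x_vs_Ctrl", "condition_xyz_vs_Ctrl",
             "condition_y_vs_Ctrl", "condition_z_vs_Ctrl"]
    singles = ["condition_x_vs_Ctrl", "condition_y_vs_Ctrl", "condition_z_vs_Ctrl"]
    w = list_contrast_to_weights(["condition_xyz_vs_Ctrl"], singles, (1, -1/3))
    print(contrast_vector(names, w))
```

Building the vector by name keeps it correct even if the factor levels are reordered. The list form does the same thing inside DESeq2.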
Thank you, Michael,
I think this solves my linear-combination problem. If I understand correctly, it tests for genes that are significantly DE between the simultaneous overexpression of xyz and the "average" of the individual deliveries, i.e. the linear combination xyz - (x + y + z)/3.
Yes, that's correct.
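To see why control gets weight 0 and what this contrast estimates, here is a least-squares sketch for a single gene. It assumes log-scale values and treatment coding, like R's default with Ctrl as the reference. Ordinary least squares here is a stand-in for DESeq2's negative binomial GLM: the contrast weights combine the coefficients the same way, but DESeq2's coefficient estimates and tests differ. The function names are hypothetical.

```python
import numpy as np


def treatment_design(labels, reference="Ctrl"):
    """Intercept plus one 0/1 indicator column per non-reference group."""
    labels = np.asarray(labels)
    others = [g for g in dict.fromkeys(labels.tolist()) if g != reference]
    cols = [np.ones(labels.size)] + [(labels == g).astype(float) for g in others]
    names = ["Intercept"] + [f"{g}_vs_{reference}" for g in others]
    return np.column_stack(cols), names


def contrast_estimate(y, labels, weights, reference="Ctrl"):
    """Fit y ~ group by least squares and return sum_i w_i * beta_i."""
    X, names = treatment_design(labels, reference)
    unknown = set(weights) - set(names)
    if unknown:
        raise ValueError(f"unknown coefficients: {sorted(unknown)}")
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    c = np.array([weights.get(n, 0.0) for n in names])
    return float(c @ beta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat(["Ctrl", "x", "y", "z", "xyz"], 5)
    y = rng.normal(8.0, 1.0, labels.size)  # synthetic log2-scale values for ONE gene
    w = {"xyz_vs_Ctrl": 1.0, "x_vs_Ctrl": -1/3, "y_vs_Ctrl": -1/3, "z_vs_Ctrl": -1/3}
    m = {g: y[labels == g].mean() for g in ["x", "y", "z", "xyz"]}
    # Both numbers agree up to rounding: the Ctrl mean cancels out of the contrast.
    print(contrast_estimate(y, labels, w), m["xyz"] - (m["x"] + m["y"] + m["z"]) / 3)
```

Because each single-TF coefficient is a group mean minus the Ctrl mean, the -1/3 weights remove exactly one copy of the Ctrl mean that the xyz coefficient carries. This is why the intercept's weight is 0.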