I am working with an RNA-seq dataset in which we are looking at the potential synergistic effects of two genes. By synergy I mean that the two gene products, acting together as proteins on the same cell, have a greater effect than the sum of their individual effects on that cell. All 4 mouse samples are stimulated with the proteins expressed by these genes:
WT
Single KO of gene 1 (SKO1)
Single KO of gene 2 (SKO2)
Double KO of both genes (DKO)
Since I'm interested in asking "What genes are differentially expressed as a result of the synergistic action of gene 1 and gene 2?", I've come to the conclusion that I need to take my WT sample (which should represent gene expression from both the synergistic and the individual effects of the proteins) and run a differential analysis against both SKO1 and SKO2. The DKO sample's expression can also be added to account for the basal level of gene expression without the two stimuli; the differential "equation" thus becomes:
(WT+DKO) - (SKO1 + SKO2) = Synergy
*I am not taking the average of SKO1 and SKO2 on purpose; to note synergistic effects, we must compare to an additive effect. I am specifically interested in the genes expressed due to synergy, not the individual additions of the stimuli.
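To make the "equation" concrete, here is a minimal Python sketch (not my edgeR code) of the per-gene comparison I have in mind. It assumes the comparison is made on log2 expression values; that choice of scale is my assumption, and on it "additive" really means that the two fold changes multiply. The gene names and values are made up purely for illustration.

```python
# Hypothetical log2 expression values per condition (illustrative, not real data).
genes = {
    "additive_gene":    {"WT": 8.0,  "SKO1": 6.0, "SKO2": 7.0, "DKO": 5.0},
    "synergistic_gene": {"WT": 11.0, "SKO1": 6.0, "SKO2": 7.0, "DKO": 5.0},
}

def synergy_score(expr):
    """(WT + DKO) - (SKO1 + SKO2), evaluated on the log2 scale."""
    return (expr["WT"] + expr["DKO"]) - (expr["SKO1"] + expr["SKO2"])

def additive_expectation(expr):
    """WT level expected if the two single-KO effects simply add on top of DKO."""
    return expr["DKO"] + (expr["SKO1"] - expr["DKO"]) + (expr["SKO2"] - expr["DKO"])

scores = {name: synergy_score(expr) for name, expr in genes.items()}
for name, score in scores.items():
    print(name, score)
```

The score is the same quantity as the WT value minus the additive expectation, so a gene whose two effects simply add scores zero, and only a departure from additivity gives a nonzero value.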
I've thought of a few ways to attempt this (my analysis is in edgeR, but I'm happy to work with DESeq2 or other packages as well), but I am worried that these methods are not technically sound. One method is to combine the raw counts of SKO1 and SKO2, as well as those of WT and DKO, and then run the dispersion estimation, glmFit, and makeContrasts steps. I've tried this both pre- and post-normalization, and my outputs are similar (and make biological sense, as the one gene we know to be synergistic is among the top genes). However, even with sensible outputs, and after going through the edgeR manual and other posts, I'm not sure what changes in the analysis when I combine my counts, especially because the “library sizes” of the combined counts are almost twice as large as those of the individual samples.
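One thing I can check by hand is what summing raw counts does once library size is taken into account. The sketch below is plain Python with made-up counts and library sizes, and a simple counts-per-million function standing in for edgeR's normalization (an assumption; edgeR's effective library sizes also include normalization factors). It shows that the CPM of a pooled sample is the library-size-weighted mean of the individual CPMs rather than their sum, so the pooled comparison may not correspond to the equation above.

```python
def cpm(count, lib_size):
    """Counts per million for one gene in one library."""
    return count / lib_size * 1e6

# Hypothetical counts for one gene and total library sizes (illustrative only).
wt_count, wt_lib = 400, 20_000_000
dko_count, dko_lib = 50, 25_000_000

# Pool the two samples by summing raw counts, as described above.
pooled_count = wt_count + dko_count
pooled_lib = wt_lib + dko_lib  # roughly double an individual library

pooled_cpm = cpm(pooled_count, pooled_lib)
sum_of_cpms = cpm(wt_count, wt_lib) + cpm(dko_count, dko_lib)
weighted_mean_cpm = (
    wt_lib * cpm(wt_count, wt_lib) + dko_lib * cpm(dko_count, dko_lib)
) / pooled_lib

print(pooled_cpm, sum_of_cpms, weighted_mean_cpm)
```

If this reasoning holds, the doubled library size cancels the doubling of the counts, and the pooled sample behaves like an average of its two members.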
Is there any precedent for combining counts of non-replicate samples and looking at a differential output?
Would it make any sense to do this at another stage (e.g. at the FASTQ/read level, before counts are generated) if combining counts is simply wrong? If not, are there any other ways I could get at this question of synergy through differential analysis with this data?