Hi, I am performing differential expression analysis of RNA-seq data using GLMs in edgeR.
I have an experimental design with 4 groups of subjects in a 2^k factorial layout (k = 2, i.e. 2x2), with 6 different subjects per group: a control group, a group with condition A, a group with condition B, and a group with both conditions A and B. Each subject is also treated with a compound, and we have samples at baseline and at 3 time points after treatment, so 4 samples per subject.
I am using a model like this: ~condA+condB+condA:condB+treatment
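To make the layout concrete, here is a rough sketch of the sample sheet and the design matrix this formula produces, using base R only. The subject IDs and level names (ctrl, A, B, AB, t0 to t3) are placeholders, and I am assuming treatment is coded as a 4-level factor (baseline plus the 3 time points).

```r
# Hypothetical sample sheet: 4 groups x 6 subjects x 4 time points = 96 samples.
# All labels below are placeholders, not the real sample names.
groups <- data.frame(
  group = c("ctrl", "A", "B", "AB"),
  condA = c("no", "yes", "no", "yes"),
  condB = c("no", "no", "yes", "yes")
)

# 6 subjects per group, then 4 samples (time points) per subject.
subjects <- groups[rep(1:4, each = 6), ]
subjects$subject <- sprintf("S%02d", 1:24)
samples <- subjects[rep(1:24, each = 4), ]
samples$treatment <- rep(c("t0", "t1", "t2", "t3"), times = 24)
rownames(samples) <- NULL

# Make the reference levels explicit (no condition, baseline).
samples$condA <- factor(samples$condA, levels = c("no", "yes"))
samples$condB <- factor(samples$condB, levels = c("no", "yes"))
samples$treatment <- factor(samples$treatment, levels = c("t0", "t1", "t2", "t3"))

# Same formula as above; note that subject does not appear in it.
design <- model.matrix(~ condA + condB + condA:condB + treatment, data = samples)
dim(design)
colnames(design)
```

With this coding, the condA coefficient is the effect of A in the absence of B, averaged over the samples of each subject as if they were independent.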
Now to my question: within condition A, for example, there are samples from different subjects but also several samples from the same subject (4 per subject, because of the treatment time course). How should this be handled properly when identifying differential expression for, say, condition A vs control (or vs condition B)?
I cannot include a factor for each subject, because each subject belongs to only one group. It has been suggested that I use mixed-effects models, with subject as a random effect. I am not sure how to implement this for RNA-seq data, though.
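To show what I mean by not being able to include subject as a fixed factor, here is a small base-R check on a placeholder layout (24 hypothetical subjects, 4 samples each, made-up level names). Since every subject sits in exactly one group, the condition columns are linear combinations of the subject columns, and the design matrix is no longer of full rank.

```r
# Placeholder layout: 24 subjects, each in exactly one of the 4 groups,
# each sampled at 4 time points (96 samples in total).
subject   <- factor(rep(sprintf("S%02d", 1:24), each = 4))
condA     <- factor(rep(c("no", "yes", "no", "yes"), each = 24))
condB     <- factor(rep(c("no", "no", "yes", "yes"), each = 24))
treatment <- factor(rep(c("t0", "t1", "t2", "t3"), times = 24))

# Adding subject as a fixed effect next to the between-subject conditions.
with_subject <- model.matrix(~ subject + condA + condB + condA:condB + treatment)

n_coef <- ncol(with_subject)     # coefficients the formula asks for
n_rank <- qr(with_subject)$rank  # coefficients that are actually estimable
c(coefficients = n_coef, rank = n_rank)
```

The rank comes out lower than the number of columns, so the condA, condB and interaction effects cannot be separated from the subject effects in this parameterization.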
I have also run a separate analysis on the baseline samples only (before treatment), with the treatment factor removed from the model. The p-values for the condition factors correlate well with those from the analysis of all samples under the model above, but in the baseline-only analysis very few (or no) genes can be called significant. Do I get significant genes with the complete data because of the larger sample size, or because the model treats the repeated samples from each subject (not true replicates, since they are treated) as independent?
Any thoughts on how to analyze this data properly, or arguments for why my current strategy is acceptable, would be very welcome!