I'm sorry if this is a recurrent question but the more I read the more I get confused!
I have RNA-seq data from patients for which I have cells either "control" (CT) or "stimulated" (S). So for each patient, I have extracted a cell type and I'm comparing 2 treatments. If I want to analyze for deferentially expressed genes between CT and S, accounting for the patient effect, what design should I use ?
I have seen different ways of doing this in Limma, EdgeR and DESeq2 manuals. My opinion was that I should use:
but I have been told that another way to go would be:
which should (I thought) be the same as above, except here I have to use the function
makeContrasts as an additional step. But the 2 models give me different results!
Another issue I have is that I have a huge variability in the read/sample for this data set, i.e. ranging from 6M reads to 30M. Even after normalization for library size and using the function
voomWithQualityWeights (in Limma) I still get a library size effect, since I clearly see that my samples cluster together (in a heatmap for example) depending on their amount of total reads. Plus, if I look at the FPKM values of house keeping genes (using this ref. : http://www.sciencedirect.com/science/article/pii/S0168952513000899) I can see differential expression across samples, correlating with library size, even after filtering and normalization. Any idea if there is a way to correct for this?
Many thanks for your help!