Hi,
I would like to hear some opinions and get some help on how to set up the design for a differential expression (DE) analysis.
First, a short description of the data:
The data come from cell line development. Three different host cell lines were treated and processed in the same way, and each showed improvement. We are now interested in the differences between each improved subclone and its original host cell line, in order to find possible targets for rational cell line development.

The PCA plot of the vsd-normalised data looks as expected: the samples cluster by cell line, and within each cell line the subclones separate from the host.
Now to my question:
Given the different biological contexts (three different cell lines), should I subset the matrix and analyse each cell line on its own, or should I keep all samples in one big matrix and set up the design using the columns levelC1 to levelC3, respectively?
```
> sample_ann
                   ID        levels condition  cellline       levelC1       levelC2       levelC3
 1     Cellline1.rep1     Cellline1      host Cellline1     Cellline1     Cellline1     Cellline1
 2     Cellline1.rep2     Cellline1      host Cellline1     Cellline1     Cellline1     Cellline1
 3 Cellline1_imp.rep1 Cellline1_imp  subclone Cellline1 Cellline1_imp     Cellline1     Cellline1
 4 Cellline1_imp.rep2 Cellline1_imp  subclone Cellline1 Cellline1_imp     Cellline1     Cellline1
 5     Cellline2.rep1     Cellline2      host Cellline2     Cellline2     Cellline2     Cellline2
 6     Cellline2.rep2     Cellline2      host Cellline2     Cellline2     Cellline2     Cellline2
 7 Cellline2_imp.rep1 Cellline2_imp  subclone Cellline2     Cellline2 Cellline2_imp     Cellline2
 8 Cellline2_imp.rep2 Cellline2_imp  subclone Cellline2     Cellline2 Cellline2_imp     Cellline2
 9     Cellline3.rep1     Cellline3      host Cellline3     Cellline3     Cellline3     Cellline3
10     Cellline3.rep2     Cellline3      host Cellline3     Cellline3     Cellline3     Cellline3
11 Cellline3_imp.rep1 Cellline3_imp  subclone Cellline3     Cellline3     Cellline3 Cellline3_imp
12 Cellline3_imp.rep2 Cellline3_imp  subclone Cellline3     Cellline3     Cellline3 Cellline3_imp
```
The standard pipeline in my group is to keep samples that belong together and were sequenced together in one matrix. However, this is the first time we have actually sequenced three distinct cell lines, and we could argue for either approach.
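To see what each levelC column actually groups together, the annotation can be rebuilt and cross-tabulated in base R. This is a sketch: the construction below is my own reproduction of the printed table, not the original code. As far as I can tell from the printout, levelC1 keeps the Cellline1 host and subclone apart but merges host and subclone of Cellline2 and Cellline3 into one group each, so in a model using levelC1 those merged groups would carry the host-vs-subclone difference of the other two lines as within-group variation.

```r
# Rebuild the sample annotation printed above (3 cell lines x host/subclone x 2 reps).
cellline  <- rep(paste0("Cellline", 1:3), each = 4)
condition <- rep(rep(c("host", "subclone"), each = 2), times = 3)
lev       <- ifelse(condition == "host", cellline, paste0(cellline, "_imp"))

sample_ann <- data.frame(
  ID        = paste0(lev, ".rep", rep(1:2, times = 6)),
  levels    = lev,
  condition = condition,
  cellline  = cellline,
  stringsAsFactors = FALSE
)

# levelCk: host and subclone stay separate only for cell line k;
# samples of the other two cell lines are labelled by cell line name alone.
for (k in 1:3) {
  in_k <- sample_ann$cellline == paste0("Cellline", k)
  sample_ann[[paste0("levelC", k)]] <-
    ifelse(in_k, sample_ann$levels, sample_ann$cellline)
}

# Which original groups end up in each level of levelC1?
print(table(original = sample_ann$levels, levelC1 = sample_ann$levelC1))
```

In the printed table, the `Cellline2` column of levelC1 collects both `Cellline2` and `Cellline2_imp` samples, and likewise for Cellline3.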
Thanks!

Thank you for the fast reply.
Based on the PCA, I would actually expect the subclone-vs-host effect to differ between the cell lines. However, I am still unsure why, in this case, it would be better to keep the whole matrix rather than subsetting it so that each cell line has its own matrix.
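One argument usually given for keeping everything in one matrix (and, as far as I recall, the one the DESeq2 FAQ makes) is that each gene's dispersion is then estimated from all 12 samples instead of from 4, provided the within-group variability is similar across cell lines. A base-R sketch that counts residual degrees of freedom for both options makes the difference concrete. It assumes the interaction design `~ cellline + cellline:condition` for the combined model, chosen here for illustration because it still gives each cell line its own subclone-vs-host effect, as the PCA suggests is needed.

```r
# Rebuild the two design factors for the 12 samples.
ann <- data.frame(
  cellline  = factor(rep(paste0("Cellline", 1:3), each = 4)),
  condition = factor(rep(rep(c("host", "subclone"), each = 2), times = 3),
                     levels = c("host", "subclone"))
)

# Residual degrees of freedom: number of samples minus rank of the design.
resid_df <- function(X) nrow(X) - qr(X)$rank

# Option A: all 12 samples in one model, with a separate
# subclone-vs-host coefficient for each cell line (6 columns).
X_full <- model.matrix(~ cellline + cellline:condition, data = ann)

# Option B: one model per cell line, 4 samples each (2 columns).
X_sub <- lapply(split(ann, ann$cellline),
                function(d) model.matrix(~ condition, data = d))

resid_df(X_full)          # 6, shared by all three comparisons
sapply(X_sub, resid_df)   # 2 for each cell line
```

Six residual degrees of freedom versus two per subset is the main practical gain of the combined model; the cost is the assumption that the three cell lines have comparable within-group variability.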
That question is actually answered in the DESeq2 vignette FAQ.
The above design would work. Another, equivalent design (one that gives the same answers) would be to combine line and condition into one factor called `group`, with levels `line1host`, `line1subclone`, etc., and then use the `contrast` argument of `results()` to make the comparisons. Whichever is easier.
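The combined-factor approach can be sketched in base R with `model.matrix()`. Two assumptions: the `group` factor is built with `paste0()`, so its levels read `Cellline1host`, `Cellline1subclone`, and so on, rather than the `line1host` shorthand above; and `~ cellline + cellline:condition` is my choice of comparison design, not necessarily the one the reply refers to. The check shows that both design matrices span the same column space, which is why the fitted model, and hence each subclone-vs-host comparison, comes out the same.

```r
ann <- data.frame(
  cellline  = factor(rep(paste0("Cellline", 1:3), each = 4)),
  condition = factor(rep(rep(c("host", "subclone"), each = 2), times = 3),
                     levels = c("host", "subclone"))
)
# Combine cell line and condition into one factor, e.g. "Cellline1host".
ann$group <- factor(paste0(ann$cellline, ann$condition))

X_inter <- model.matrix(~ cellline + cellline:condition, data = ann)
X_group <- model.matrix(~ group, data = ann)

# Same column space: each matrix has rank 6, and stacking them side by
# side does not increase the rank.
rank_of <- function(X) qr(X)$rank
same_space <- rank_of(X_inter) == rank_of(X_group) &&
  rank_of(cbind(X_inter, X_group)) == rank_of(X_inter)
print(same_space)

# With DESeq2 (not run here; `counts` is a hypothetical count matrix):
# dds <- DESeqDataSetFromMatrix(countData = counts, colData = ann, design = ~ group)
# dds <- DESeq(dds)
# res1 <- results(dds, contrast = c("group", "Cellline1subclone", "Cellline1host"))
```

Repeating the `results()` call with `Cellline2subclone`/`Cellline2host` and `Cellline3subclone`/`Cellline3host` gives the other two within-line comparisons from the same fitted object.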