I have a question about how to analyze a mix of biological and semi-technical replicates.
The experiment I am analyzing consists of 3 cell lines × 3 replicates per cell line × 2 conditions. The 3 replicates of a cell line are independently treated, processed and sequenced, so they aren't "hard" technical replicates, but neither are they biological replicates in the way the 3 cell lines are. They are more highly correlated with each other (they cluster more closely in a PCA) than with samples from the other cell lines. The experiment is paired: each sample is split, and the two halves are treated with A and B, respectively. In each replicate round the 3 cell lines were sequenced together, giving 3 groups (repl_group in the table below).
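To make the pairing concrete, here is a minimal Python sketch with invented normalized counts for a single gene (hypothetical values, not from this experiment). The within-pair log2 ratio B/A cancels each pair's baseline expression, and the per-cell-line mean shows one way to summarize the three semi-technical replicates of a cell line.

```python
import math
from statistics import mean

# Hypothetical normalized counts for ONE gene, keyed by pair (cell line + repl_group).
# Invented values: each cell line has a different baseline level.
counts = {
    "c1-1": {"A": 100, "B": 200}, "c1-2": {"A": 150, "B": 300}, "c1-3": {"A": 120, "B": 240},
    "c2-1": {"A": 400, "B": 400}, "c2-2": {"A": 380, "B": 760}, "c2-3": {"A": 420, "B": 840},
    "c3-1": {"A": 50, "B": 100},  "c3-2": {"A": 60, "B": 120},  "c3-3": {"A": 55, "B": 110},
}

# Paired log2 fold change (B vs A) within each split sample; the pair baseline cancels.
pair_lfc = {p: math.log2(c["B"] / c["A"]) for p, c in counts.items()}

# Mean paired LFC over the 3 semi-technical replicates of each cell line.
cell_line_lfc = {
    cl: mean(v for p, v in pair_lfc.items() if p.startswith(cl.lower() + "-"))
    for cl in ("C1", "C2", "C3")
}

print(pair_lfc)
print(cell_line_lfc)
```

In these toy numbers, C1 and C3 have different baselines but identical paired LFCs of 1, while in C2 one discordant replicate (c2-1) pulls the cell-line mean down to about 0.67.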
What is the best way to analyze these data? Is a paired analysis (~condition + pair) appropriate? Or should I average the semi-technical replicates? How else could I account for the different degrees of correlation within and between cell lines?
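If the semi-technical replicates are collapsed, note that count-based tools such as DESeq2 model raw integer counts, so replicates are usually summed rather than averaged (DESeq2's `collapseReplicates()` sums counts within levels of a grouping factor, and is intended for technical replicates). A minimal Python sketch of that summing step, assuming the sample table from the question and using hypothetical counts for two made-up genes (`g1`, `g2`):

```python
from collections import defaultdict

# Sample table from the question: (condition, cell_line, repl_group).
samples = [(cond, cl, rep)
           for cond in ("A", "B")
           for rep in (1, 2, 3)
           for cl in ("C1", "C2", "C3")]

# Hypothetical raw counts: g1 doubles under B, g2 is unchanged, and
# repl_group adds a small offset. These numbers are invented.
BASE = {"C1": 100, "C2": 200, "C3": 300}

def toy_count(gene, cond, cl, rep):
    fold = 2 if (gene == "g1" and cond == "B") else 1
    return BASE[cl] * fold + rep

# Collapse: sum the 3 semi-technical replicates within each condition x cell line.
collapsed = {}
for gene in ("g1", "g2"):
    totals = defaultdict(int)
    for cond, cl, rep in samples:
        totals[(cond, cl)] += toy_count(gene, cond, cl, rep)
    collapsed[gene] = dict(totals)

for gene, cols in collapsed.items():
    print(gene, cols)
```

After collapsing, each condition has 3 columns (one per cell line), so the design reduces to ~condition + cell_line, and the model no longer sees replicate-to-replicate variation within a cell line.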
Analyzing with ~condition + pair or ~condition + cell_line yields DEGs fairly similar to those from analyzing only one replicate group, with consistent GO enrichment (but many more DEGs). Still, I wonder whether using the semi-technical replicates in the same way as biological replicates inflates the type I error rate. It doesn't seem to, judging by the consistent GO fold enrichment of some interesting terms.
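One way to see how these designs differ is to build their model matrices. The sketch below uses NumPy with treatment coding (first level as reference, like R's default contrasts); the helper `dummies` and the variable names are my own. It shows that pair is nested in cell_line, so ~condition + cell_line + pair is rank-deficient, and that ~condition + pair leaves 8 residual degrees of freedom, i.e. it treats the 9 pairs as independent units even though 3 of them share each cell line.

```python
import numpy as np

# Sample table from the question: condition x repl_group x cell_line (18 samples).
rows = [(cond, cl, rep)
        for cond in ("A", "B")
        for rep in (1, 2, 3)
        for cl in ("C1", "C2", "C3")]

def dummies(values, levels):
    """Treatment-coded indicators: one column per level except the first (reference)."""
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in values])

condition = [r[0] for r in rows]
cell_line = [r[1] for r in rows]
pair = [f"{r[1].lower()}-{r[2]}" for r in rows]   # e.g. "c1-1"
intercept = np.ones((len(rows), 1))

X_cond = dummies(condition, ["A", "B"])
X_cell = dummies(cell_line, ["C1", "C2", "C3"])
X_pair = dummies(pair, sorted(set(pair)))          # 9 pairs -> 8 columns

designs = {
    "~condition + pair": np.hstack([intercept, X_cond, X_pair]),
    "~condition + cell_line": np.hstack([intercept, X_cond, X_cell]),
    # cell_line columns are sums of pair columns, so this one is rank-deficient
    "~condition + cell_line + pair": np.hstack([intercept, X_cond, X_cell, X_pair]),
}

for name, X in designs.items():
    rank = np.linalg.matrix_rank(X)
    print(f"{name}: {X.shape[1]} columns, rank {rank}, residual df {X.shape[0] - rank}")
```

The ~condition + pair matrix has 10 columns and full rank (8 residual df), ~condition + cell_line has 4 columns (14 residual df) and ignores the pairing, and adding cell_line to the pair model gives 12 columns but still only rank 10.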
    condition  cell_line  repl_group  pair
    A          C1         1           c1-1
    A          C2         1           c2-1
    A          C3         1           c3-1
    A          C1         2           c1-2
    A          C2         2           c2-2
    A          C3         2           c3-2
    A          C1         3           c1-3
    A          C2         3           c2-3
    A          C3         3           c3-3
    B          C1         1           c1-1
    B          C2         1           c2-1
    B          C3         1           c3-1
    B          C1         2           c1-2
    B          C2         2           c2-2
    B          C3         2           c3-2
    B          C1         3           c1-3
    B          C2         3           c2-3
    B          C3         3           c3-3
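The sample table above can also be generated programmatically, for example to write a sample sheet (colData) and check the design before fitting. A minimal Python sketch using only the standard library; the checks confirm that every pair appears once per condition and belongs to a single cell line:

```python
import csv
import io
from collections import Counter

# Rebuild the sample table: condition x repl_group x cell_line, same row order as above.
rows = []
for condition in ("A", "B"):
    for repl_group in (1, 2, 3):
        for cell_line in ("C1", "C2", "C3"):
            rows.append({"condition": condition, "cell_line": cell_line,
                         "repl_group": repl_group,
                         "pair": f"{cell_line.lower()}-{repl_group}"})

pairs = {r["pair"] for r in rows}
pair_by_condition = Counter((r["pair"], r["condition"]) for r in rows)

# Sanity checks: 18 samples, 9 pairs, each pair seen exactly once per condition.
assert len(rows) == 18 and len(pairs) == 9
assert len(pair_by_condition) == 18 and all(n == 1 for n in pair_by_condition.values())
# Pairs are nested in cell lines: each pair maps to exactly one cell line.
assert all(len({r["cell_line"] for r in rows if r["pair"] == p}) == 1 for p in pairs)

# Write the sample sheet as CSV text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["condition", "cell_line", "repl_group", "pair"])
writer.writeheader()
writer.writerows(rows)
sample_sheet = buf.getvalue()
print(sample_sheet)
```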