I have several samples in 3 biological replicates (2015, 2016_1, 2016_2).
I want to compare every condition against every other condition.
My PCA on rlog-transformed values (here) makes me think I should analyse the 2015 group separately. Maybe I'm wrong!
In the 2015 group I have 3 conditions, each with 2 technical replicates (here). Can I analyse them as if they were biological replicates to get p-values? The estimate of each gene's biological variability would be biased, so I would find more differentially expressed genes, no?
Or is the best idea to analyse everything together with several factors (condition adjusted for biological group)?
Thanks
I would recommend merging the technical replicates rather than treating them as the same kind of replicate as the biological replicates: you could either pool the reads or (more straightforwardly) average the read counts. As you say, treating them as biological replicates would bias the variance estimate: technical replicates vary less than biological ones, so the variance would be underestimated and too many genes would be called significant.
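As a rough illustration of the merging step, here is a minimal Python sketch that collapses technical replicates into one column per library, either by summing (equivalent to pooling the reads) or by averaging. The sample names, gene names, counts and the function `collapse_technical_replicates` are all made up for the example. One thing to keep in mind: averaging can produce non-integer values, whereas count-based tools such as DESeq2 expect integer counts, so summing may be the more convenient choice there.

```python
from collections import defaultdict


def collapse_technical_replicates(counts, sample_to_library, how="sum"):
    """Merge technical replicates into one count column per library.

    counts:            {sample_name: {gene: read_count}}
    sample_to_library: {sample_name: library_id}; samples sharing a
                       library_id are technical replicates of each other.
    how:               "sum" (pool the reads) or "mean" (average the counts).
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")

    # Group the per-sample count dictionaries by the library they came from.
    runs_by_library = defaultdict(list)
    for sample, library in sample_to_library.items():
        runs_by_library[library].append(counts[sample])

    merged = {}
    for library, runs in runs_by_library.items():
        merged[library] = {}
        for gene in runs[0]:
            total = sum(run[gene] for run in runs)
            merged[library][gene] = total if how == "sum" else total / len(runs)
    return merged


# Hypothetical data: two technical replicates for each of two 2015 conditions.
counts = {
    "2015_A_tech1": {"gene1": 10, "gene2": 0},
    "2015_A_tech2": {"gene1": 14, "gene2": 2},
    "2015_B_tech1": {"gene1": 30, "gene2": 5},
    "2015_B_tech2": {"gene1": 26, "gene2": 7},
}
sample_to_library = {
    "2015_A_tech1": "2015_A",
    "2015_A_tech2": "2015_A",
    "2015_B_tech1": "2015_B",
    "2015_B_tech2": "2015_B",
}

pooled = collapse_technical_replicates(counts, sample_to_library, how="sum")
averaged = collapse_technical_replicates(counts, sample_to_library, how="mean")
```

After this step each 2015 condition is represented by a single column, so it sits alongside the 2016 samples as one biological replicate per condition.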
Regarding the batch effect due to year, I prefer to decide how to deal with it based on evidence from positive control genes rather than on the global view given by PCA. If genes known to respond to treatment show a significant batch effect, I tend to keep a batch factor in the analysis. If they show a batch:treatment interaction, I may analyse the batches separately, either with a nested design or by subsetting the data (subsetting is problematic in your case, as you have only technical replication in the 2015 batch). If they show no batch effect, I use a model with no batch term in it. My intuition in your case would be to omit the 'batch' effect and take the penalty of increased noise, as you don't have many degrees of freedom to play with. Obviously, the appearance of positive controls in any resulting gene list is near-tautological, so it shouldn't be the basis of research claims!
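The decision rule above can be written down as a small sketch. This is only a summary of my own rule of thumb, not part of any package: the function name `choose_design` and its two boolean inputs are hypothetical, and you would have to fill them in yourself by inspecting your positive control genes. The returned strings use R-style design formulas of the kind passed to tools such as DESeq2.

```python
def choose_design(batch_effect, batch_by_treatment_interaction):
    """Suggest a model from what the positive control genes show.

    batch_effect:                   controls differ significantly between batches
    batch_by_treatment_interaction: the treatment response of the controls
                                    itself differs between batches
    """
    if batch_by_treatment_interaction:
        # The treatment effect is not comparable across batches.
        return "analyse batches separately (nested design or subset the data)"
    if batch_effect:
        # Additive shift between batches: adjust for it.
        return "~ batch + condition"
    # No evidence of a batch effect: spend no degrees of freedom on it.
    return "~ condition"


print(choose_design(batch_effect=True, batch_by_treatment_interaction=False))
```

The interaction is checked first because, if it is present, an additive batch term alone would not be enough.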
PCA plots show what the major global changes are, but the biology might be concentrated in components you haven't looked at, which is why I prefer an approach targeted at the 'expected' biology. When you have no positive controls, though, you are forced into more empirical approaches like PCA. It's always tempting to try out multiple approaches and see which set of results makes more biological sense, but this uses up researcher degrees of freedom, which should be reflected in the adjusted p-values (and in careful avoidance of circular claims about what 'biological sense' means!).