I want to perform a differential expression analysis on RNA-Seq data. During data exploration, the PCA plot indicated that a batch effect was present (plot not shown). I obtained additional information about the experiment and, indeed, the samples were processed by two different people. The metadata of the experiment is shown in this table:
   sample   treatment  lab_tech
1  sample1  control    tech1
2  sample2  control    tech1
3  sample3  control    tech1
4  sample4  treatA     tech2
5  sample5  treatA     tech2
6  sample6  treatA     tech1
7  sample7  treatA     tech1
My first idea was to perform a differential expression analysis between samples 4/5 and samples 6/7. Since all four samples received treatA and differ only in the lab technician, the genes called differentially expressed in that comparison would probably reflect the batch effect. I could then use that list to "correct" the results of the differential expression analysis of control vs treatA. But then I started wondering whether the batch effect could instead be modelled by including it in the design. However, I do not know how to formulate a correct design, and I am not even sure this is possible. Can anyone help?
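To make the question concrete, here is a minimal sketch of what I imagine such a design could look like. It assumes an additive model with an intercept, one indicator for lab_tech (reference level tech1) and one for treatment (reference level control); in R-based tools this would roughly correspond to a formula like ~ lab_tech + treatment. The helper names build_design and matrix_rank are hypothetical and only for illustration. The sketch just checks whether the design matrix has full column rank, i.e. whether the treatment effect could in principle be estimated separately from the technician effect with this sample layout. It does not fit any expression model.

```python
from fractions import Fraction

# Metadata copied from the table in the post
treatment = ["control", "control", "control", "treatA", "treatA", "treatA", "treatA"]
lab_tech = ["tech1", "tech1", "tech1", "tech2", "tech2", "tech1", "tech1"]


def build_design(lab_tech, treatment):
    """Additive design: intercept, tech2 indicator, treatA indicator.

    Reference levels (tech1, control) are an assumption of this sketch.
    """
    return [
        [1, int(tech == "tech2"), int(trt == "treatA")]
        for tech, trt in zip(lab_tech, treatment)
    ]


def matrix_rank(rows):
    """Rank by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a non-zero entry
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


design = build_design(lab_tech, treatment)
full_rank = matrix_rank(design) == len(design[0])

for row in design:
    print(row)
print("full column rank:", full_rank)
```

If I read this correctly, the design is full rank only because tech1 processed both control and treatA samples (samples 6 and 7). Had tech2 processed all treatA samples, the technician and treatment columns would be identical and the two effects could not be separated.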
Thanks in advance.