I'm currently analyzing a series of RNA-Seq experiments that are splitted into 4 runs. In order to account for the batch effect of each experiments, I'm trying to correct my gene counts using sva, as recommended inside the Bioconductor RNASeq workflow (http://www.bioconductor.org/help/workflows/rnaseqGene/#removing-hidden-batch-effects)
As I proceed in this phase, I got very confused regarding the methods and the procedures, so I wonder if you could answer me to these questions:
1) Do I need to account for the batch effect when I specify the design in the dds creation? e.g.: assuming that I'm screening for the differences between two different cell types, should I say design = ~type + batch or just design = type? Why?
2)in the RNASeq bioconductor workflow, when exporting the counts matrix to sva, it assumed that the data has already been normalized. Specifically, counts are exported in this fashion:
dat <- counts(dds, normalized=TRUE
This seems pretty confusing to me, as my understanding is that it is needed to remove the batch effect before performing the normalization. Is this true for DeSeq2? Do I need only to Estimate the size Factor before performing sva normalization? Do you have code examples that you can share?
3) Do you have examples other than the one reported in the tutorial for the batch effect removal? Are there any other bioConductor libraries that integrates the batch effect normalization into the DeSeq2 Workflow?
Apologies if the questions sounds naive, I'm a self learner and I think I still have a long road to walk!
Thanks in advance,