I am conducting some RNA-Seq experiments to determine differentially expressed genes after treatment with various antimicrobials. I have used RUVSeq to remove a batch effect present in my data set and DESeq2 to estimate differential expression.
I used the EDASeq::plotPCA function on the SeqExpressionSet returned by the RUVs function, and it shows that samples subjected to the same treatment now cluster closer together. I then used this SeqExpressionSet to estimate differential expression in DESeq2 with the code:
dds <- DESeqDataSetFromMatrix(countData = counts(set_postRUVs_W1),
                              colData = pData(set_postRUVs_W1),
                              design = ~ group + W_1)
dds <- DESeq(dds)
I would now like to use some of the exploratory analysis techniques in the DESeq2 vignette to compare my data before and after removing the batch effect, specifically how the different samples and the most variable genes cluster. Following the code in the vignette, I see no difference in clustering before and after removing the batch effect. The same goes for a PCA of the rlog-transformed dds object: the samples cluster just as they did before the batch removal with RUVSeq.
Excuse my naïvety, but is this because the sample distances are determined using only the sizeFactors column of the DESeq2 object, without factoring in the W_1 column from the set_postRUVs_W1 object?
I'd be grateful for any advice on how to address this, or on whether I have missed something in moving between RUVSeq and DESeq2.