I was wondering if somebody could clarify something for me regarding the variance stabilising transformation (VST) and batch correction, and how to subsequently extract a matrix of batch-corrected VST counts.
I have run an experiment as follows:
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampledata, design = ~ Organ + Extraction + Age)
dds <- DESeq(dds, reduced = ~ Organ + Extraction, test = "LRT", parallel = TRUE)
Extraction is the batch (for this run there are only two batches, 1 and 2), and I only want to see genes that are DE as a result of Age. Organ and batch (Extraction) effects are therefore included in the reduced model. I am happy with the inclusion of Extraction in the reduced model; I cannot see any clear batch-related effects when plotting a PCA, and the batches are well mixed.
I would like to do further downstream analysis outside DESeq2, however, and so need a log- or VST-transformed counts table for this analysis. Although these effects are modelled within DESeq2, does running
vsd <- vst(dds)
and then extracting the counts from this take the batch effects across samples into account? Or does the variance stabilisation automatically take care of this?
If not, is there a way to extract a VST of counts with batch effects accounted for?
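For concreteness, this is the kind of thing I have in mind. It is only a minimal sketch, and it rests on assumptions I have not confirmed for my data: that `vst()` by itself leaves batch differences in the transformed values, and that `limma::removeBatchEffect()` applied to `assay(vsd)` is a sensible way to take them out while protecting Organ and Age through the `design` argument. The simulated dataset and the factor levels below are hypothetical stand-ins for my real `countdata` and `sampledata`.

```r
library(DESeq2)
library(limma)

set.seed(1)

# Hypothetical stand-in for the real data: 3000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Hypothetical sample annotation mirroring the design in the question
dds$Organ      <- factor(rep(c("organA", "organB"), each = 6))
dds$Age        <- factor(rep(rep(c("young", "old"), each = 3), times = 2))
dds$Extraction <- factor(rep(c("1", "2"), times = 6))  # the batch
design(dds)    <- ~ Organ + Extraction + Age

# Variance stabilising transformation; blind = FALSE uses the design
# when estimating dispersions, but does not remove any batch effect
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)

# Design matrix of the effects to keep (everything except the batch)
mm <- model.matrix(~ Organ + Age, data = as.data.frame(colData(vsd)))

# Fit a linear model per gene and subtract the estimated batch term
mat_corrected <- limma::removeBatchEffect(mat, batch = vsd$Extraction, design = mm)

# Optionally store the corrected values back for plotPCA() and similar
vsd_corrected <- vsd
assay(vsd_corrected) <- mat_corrected
```

`mat_corrected` would then be the genes-by-samples matrix I pass to the classifier. One thing I am unsure about is whether correcting all samples together like this is appropriate when some of them are later held out as a test set.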
My downstream analysis is machine learning classification. Until now I have been using a VST counts matrix, which has caused no real issues during classification across all samples, ages, etc. However, I will soon have an additional 3 batches, so I want to make sure these effects are minimised as far as possible in the future.