Question: Extracting VST matrix with batch effects removed
6 weeks ago, A40 wrote:

Hi all,

Was just wondering if somebody would be able to clarify something for me regarding variance stabilising transformation and batch correction and subsequently extracting a matrix of batch corrected VST counts.

I have run an experiment as follows:

library(DESeq2)
dds <- DESeqDataSetFromMatrix(countData = countdata, colData = sampledata, design = ~ Organ + Extraction + Age)
dds <- DESeq(dds, test = "LRT", reduced = ~ Organ + Extraction, parallel = TRUE)


Extraction is the batch (for this run there are only two batches, 1 and 2), and I only want to see genes that are DE as a result of Age. Organ and batch (Extraction) effects are therefore included in the reduced model. I am happy with including Extraction in the reduced model: I cannot see any clear batch-related effects when plotting a PCA, and the samples are well mixed across batches.

However, I would like to do further downstream analysis outside DESeq2, so I need a log or VST transformed counts table for that analysis. Although these effects are modelled within DESeq2, does running vsd <- vst(dds) and then extracting the counts from it take the batch effects across samples into account? In other words, does the variance stabilisation automatically take care of this?

If not, is there a way to extract a VST of counts with batch effects accounted for?

My downstream analysis is machine learning classification. I have been using a VST counts matrix until now, which has caused no real issues during classification across all samples, ages etc. However, I will soon have an additional 3 batches, so I want to make sure these effects are minimised in the future.

Many thanks!!

deseq2 batchcorrection

Hi,

have a look at this post: https://support.bioconductor.org/p/62954/

Thank you! So by running removeBatchEffect in limma, the mean shifts are removed in the same way they would be when including batch in the reduced model? Is this a correct interpretation of Michael's comment?

so then, would:

library(limma)
# removeBatchEffect() returns a plain matrix, so newCounts needs no assay() call
newCounts <- removeBatchEffect(assay(vsd), batch = vsd$Extraction)
write.csv(newCounts, file = "batch corrected.csv")


produce the counts matrix I am after?

Yes, we actually have it in the FAQ now:

http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot
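
For reference, here is a minimal sketch of the approach that FAQ entry describes. vst() does not use the design to remove variation from the data, so the batch shift has to be taken out of the transformed matrix afterwards with limma's removeBatchEffect(). The column names Extraction, Organ and Age come from the design in the question. The simulated dataset, the organ and age labels, and the output file name are assumptions for illustration only. Passing a design matrix for the variables of interest is intended to protect the Organ and Age signal while the batch term is removed.

```r
library(DESeq2)
library(limma)

# Hypothetical stand-in for the poster's data: 2000 simulated genes, 12 samples.
# With real data, start from the existing dds object instead.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds$Extraction <- factor(rep(c("1", "2"), times = 6))            # batch
dds$Organ <- factor(rep(rep(c("liver", "brain", "heart"), each = 2), times = 2))
dds$Age <- factor(rep(c("young", "old"), each = 6))

# Variance stabilising transformation; this alone does not remove batch effects.
vsd <- vst(dds)
mat <- assay(vsd)

# Design matrix of the effects to keep (everything except the batch).
mm <- model.matrix(~ Organ + Age, colData(vsd))

# Remove the Extraction shift from the VST values while preserving Organ and Age.
corrected <- removeBatchEffect(mat, batch = vsd$Extraction, design = mm)

# corrected is a plain numeric matrix (genes x samples), ready to export.
write.csv(corrected, file = "batch_corrected_vst.csv")
```

The corrected matrix is meant for visualisation and downstream tools such as classifiers. The DE testing itself should still use the raw counts with Extraction in the design, as in the original DESeq() call.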

Brilliant! Thank you so much! I will proceed like this!

The code above for generating the new counts matrix is ok?

thanks!

I said yes and it's also what's listed in the link I sent, no?