Question

DESeq2 batch correction best practice.

Asked by 94133 (@94133-14305), USA, Stanford

I have a clear batch effect caused by sequencing some samples paired-end and others single-end, on different days. I'd like to correct for this in the DESeq2 analysis, as suggested in the vignette ("If there is unwanted variation present in the data (e.g. batch effects) it is always recommend[ed] to correct for this").

I added the sequencer batch variable to the design, but I see only a very modest change in the resulting PCA plot. Does this truly reflect the change in the design?

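library(DESeq2)
# Known batch variable first, variable of interest (condition) last
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ sequencer + condition)

# blind = FALSE lets the design inform the dispersion trend; it does not remove batch variation
vsd <- vst(ddsTxi, blind = FALSE)
plotPCA(vsd, intgroup = "replicate", returnData = FALSE)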
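For comparison, the batch term in the design is used when DESeq() fits its GLM; vst() does not subtract it from the transformed values. A minimal sketch of that GLM step, assuming DESeq2's simulated makeExampleDESeqDataSet() as a stand-in for the poster's txi/samples objects, plus a hypothetical `sequencer` column balanced across condition:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the tximport data: 2000 genes, 8 samples, condition A/B
dds <- makeExampleDESeqDataSet(n = 2000, m = 8)
# Hypothetical batch column; PE/SE alternate so batch is balanced across condition
dds$sequencer <- factor(rep(c("PE", "SE"), times = 4))
design(dds) <- ~ sequencer + condition

# The GLM estimates sequencer and condition coefficients jointly for every gene
dds <- DESeq(dds)
resultsNames(dds)

# Condition effect, adjusted for sequencer
res <- results(dds, name = "condition_B_vs_A")
summary(res)
```

Here the sequencer coefficient is estimated alongside condition inside the model, which is why adding it to the design changes the test results without visibly changing the VST-based PCA.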

I then estimated the batch effect with RUVSeq and with limma; judging by PCA, both do a decent job of removing it. What is best practice here? I'm thinking of using PCA to choose the best batch correction. Is there a recommended vignette for adding limma/RUVSeq correction to the design?

Thank you kindly for your time.

Tags: deseq2, limma, RUVSeq

Written 5.5 years ago by 94133; last updated 5.5 years ago by Michael Love.

Some practical advice: if the only difference is single-end vs. paired-end, simply take the R1 FASTQ file from the paired-end data and treat it as single-end. Repeat mapping/quantification and be done with it; this eliminates that batch. You can always treat paired-end data as single-end, but obviously not vice versa. Is this the only source of batch effect here? It would help to add more details, such as the PCA plot and a table describing all relevant metadata of this experiment.

Reply by ATpoint, 5.5 years ago

Thanks for this suggestion! Yes, this is certainly something I should do. I suspect PE vs. SE is not the only batch variable, because the libraries were prepared at different times and sequenced on different machines, etc., but it will be interesting to compare the results.

Reply by 94133, 5.5 years ago

Answer 1 (2020-06-23)

Michael Love (@mikelove), United States

“I added the sequencer batch effect to the design but only see a very modest change in the PCA plot that's produced. Does this truly reflect the change in the design?”

The vignette has a frequently asked questions section, and your question is answered there.

Mike, thanks for your reply. It's not clear to me how removeBatchEffect compares to including a batch variable in the DESeq2 design. Is removeBatchEffect simply a good approximation of how DESeq2 handles the batch effect? I've scoured the vignette and searched online without finding this explained, but I'm probably just not understanding. Thanks again.

Reply by 94133, 5.5 years ago

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot

In short: VST does not remove variation in the counts from variables in the design.
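The FAQ linked above suggests, for visualization only, removing the batch term from the VST values with limma::removeBatchEffect() while protecting the condition effect through its design argument. A sketch of that recipe, again assuming simulated data from makeExampleDESeqDataSet() and a hypothetical `sequencer` column standing in for the poster's batch:

```r
library(DESeq2)
library(limma)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8)
dds$sequencer <- factor(rep(c("PE", "SE"), times = 4))  # hypothetical batch column
design(dds) <- ~ sequencer + condition

# vst() requires at least nsub (default 1000) rows, hence n = 2000 above
vsd <- vst(dds, blind = FALSE)

# Regress sequencer out of the log-scale values, keeping condition in the model
mat <- assay(vsd)
mm <- model.matrix(~ condition, colData(vsd))
mat <- removeBatchEffect(mat, batch = vsd$sequencer, design = mm)
assay(vsd) <- mat

# PCA on the batch-removed values
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
```

The corrected matrix is meant for plots like this one; the differential expression test itself should still use the raw counts with the ~ sequencer + condition design.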

Reply by Michael Love, 5.5 years ago

Yes, I think I understand that part, but that's not my question. How does the way DESeq2 accounts for design variables in the counts compare with removeBatchEffect?

Reply by 94133, 5.5 years ago

Let's talk about these two options:

1) DESeq() with ~batch + condition
2) PCA plot of the data across condition after running removeBatchEffect

These are conceptually similar, but there are differences. (1) uses counts and a GLM, while (2) works on transformed values (approximately log2-scaled counts). (1) estimates the contributions of batch and condition simultaneously, while (2) first removes batch variation and then plots the points colored by condition. Actually, if you use the design argument in removeBatchEffect, it becomes more similar to (1), in that it estimates the batch and condition effects simultaneously.
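To make the difference concrete, here is a base-R sketch of a removeBatchEffect-style correction, written from our reading of limma's approach: sum-to-zero batch coding, a joint least-squares fit of the design and batch columns, then subtracting only the fitted batch part. The helper name remove_batch_sketch and the toy numbers are hypothetical. With batch unbalanced across condition, the intercept-only version absorbs part of the condition effect into the batch estimate, while passing a condition design preserves it.

```r
# Hypothetical helper mirroring the idea behind limma::removeBatchEffect():
# y is a genes x samples matrix of log-scale values.
remove_batch_sketch <- function(y, batch, design = matrix(1, ncol(y), 1)) {
  batch <- factor(batch)
  contrasts(batch) <- contr.sum(levels(batch))          # sum-to-zero batch coding
  X_batch <- model.matrix(~ batch)[, -1, drop = FALSE]  # drop the intercept column
  X <- cbind(design, X_batch)
  # Least-squares fit of all genes at once: coefficients are (columns of X) x genes
  beta <- qr.coef(qr(X), t(y))
  beta_batch <- beta[-seq_len(ncol(design)), , drop = FALSE]
  # Subtract only the fitted batch component; design effects stay in the data
  y - t(X_batch %*% beta_batch)
}

# Toy, noise-free data: for gene1, condition B adds 2 and batch b2 adds 5;
# gene2 carries only a batch shift of 3. Batch is unbalanced across condition.
condition <- factor(c("A", "A", "A", "B", "B", "B"))
batch     <- factor(c("b1", "b1", "b2", "b2", "b2", "b1"))
y <- rbind(gene1 = 2 * (condition == "B") + 5 * (batch == "b2"),
           gene2 = 3 * (batch == "b2"))

# Batch removed first, condition ignored (like option 2 without design)
naive <- remove_batch_sketch(y, batch)
# Batch and condition estimated simultaneously (closer to option 1)
joint <- remove_batch_sketch(y, batch, design = model.matrix(~ condition))

cond_diff <- function(m) {
  mean(m["gene1", condition == "B"]) - mean(m["gene1", condition == "A"])
}
cond_diff(naive)  # about 1.78: part of the condition effect was removed as "batch"
cond_diff(joint)  # 2 (up to rounding): the simulated condition effect is preserved
```

If limma is installed, limma::removeBatchEffect(y, batch = batch, design = model.matrix(~ condition)) would be expected to agree closely with the joint version. DESeq2's option (1) differs further in that it models the raw counts with a negative binomial GLM rather than fitting log-scale values by least squares.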

Reply by Michael Love, 5.5 years ago