Question

MDS plot; Batch effects

0

Entering edit mode

chipolino • 0

@chipolino-15565

Last seen 6.0 years ago

Hi,

I am doing RNA-seq analysis using limma and edgeR, I have 48 samples and 2 batches. I built MDS plots (see below), and now my question is do I have a batch effect I need to correct for or not? My understanding is if batches cluster together on MDS/PCA plot, it's an evidence that batch effects are present, am I right?

limma rna-seq edger batches • 3.3k views

ADD COMMENT • link updated 6.5 years ago by Aaron Lun ★ 29k • written 6.5 years ago by chipolino • 0

score 2 · Answer 1 · 2019-04-28

2

Entering edit mode

Aaron Lun ★ 29k

@alun

Last seen 22 hours ago

The city by the bay

There is clear separation by batch in your MDS plots, so yes, you should include a batch term in your model. In fact, I would generally default to including the batch term anyway, rather than waiting until I did a MDS plot to make the decision. It costs little to have an additional batch term but can protect you from misleading results, especially if the conditions are not evenly distributed across batches. This is most relevant for subtle batch effects that might not have shown up in the first few dimensions of a MDS plot.

On that note, you didn't mention what your other experimental factors are. I can only hope that they are not confounded with the batch, otherwise you're stuffed.

P.S. Modelling the batch effect is not the same as correcting it. The former involves estimating the batch effect as part of the linear model and is preferred for any statistically rigorous DE analysis, the latter should only be used (e.g., with removeBatchEffect) for visualization and clustering.

ADD COMMENT • link 6.5 years ago Aaron Lun ★ 29k

0

Entering edit mode

Thank you so much for your answer! I have 16 different cell conditions, and 3 replicates for each. 3 replicates are performed in different cell donors (3 donors; 1 replicate for each condition in each donor). My goal is to compare conditions between each other using DGE. Here how the MDS plot looks like (3,4 dimensions, first one colored by batch, second - by donor):

So I even have a clear separation by donor. So should I include donor factor to linear model as well?

P.S. So I don't need to correct for batches prior the DGE analysis with limma right? I just include batch factor to linear model?

ADD REPLY • link 6.5 years ago chipolino • 0

0

Entering edit mode

It seems like donor is nested in batch, so just block on the former instead.
Yes, just include donor in the design matrix.

ADD REPLY • link 6.5 years ago Aaron Lun ★ 29k

0

Entering edit mode

But how do I know, if it worked? is there any way to visualize the data again after voom or lmFit/eBayes?

ADD REPLY • link 6.5 years ago chipolino • 0

0

Entering edit mode

But how do I know, if it worked?

Well, if you get a different number of DE genes, then clearly it had an effect.

is there any way to visualize the data again after voom or lmFit/eBayes?

For any given gene, you can plot the distribution of residuals for each donor. You should see that they are more-or-less normally distributed around zero if the model properly accounts for the donor effect.

ADD REPLY • link 6.5 years ago Aaron Lun ★ 29k