Question: DESeq2 PCA plot on fitted values
0
Sachin Pundhir0 wrote:

I am analyzing a RNA-seq data having some batch effect using DESeq2.

To account for the effect, I add it as a covariate to the regression model.

Next, to determine, how well the effect is corrected, I plot
two PCA plots

1. plotMDS(assays(dds)[["counts"]

] - raw count

2. plotMDS(assays(dds)[["mu"]] - fitted values.

My question is: Is it the correct way to check how well the regression
model worked on accounting for the batch effect?

deseq • 2.0k views  modified 5.0 years ago by Michael Love25k • written 5.0 years ago by Sachin Pundhir0

*moved it down

ADD REPLYlink modified 16 months ago • written 16 months ago by C T90
Answer: DESeq2 PCA plot on fitted values
1
Michael Love25k wrote:

No the "mu" here is not the right matrix to look at, as it contains the batch effect terms. In ?DESeq, you can rearrange the formula to get:

mu_ij = s_j 2^(beta_intercept + beta_batch + ...)

It's not easy to observe how the model corrects for batch effect, as it happens inside the model, in the balance of the coefficients from batch and the other variables. The model accounts for mean shift (in log common scale counts) which can be explained by batch.

You might look at the difference between the counts and fitted values on the log common scale to see if they associate with batch:

log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1)

Thanks, based on the suggestion, I made two PCA plots (https://www.dropbox.com/s/nk55yr1cgvux6hr/DESeq2_pca_plot.pdf?dl=0)

a) on normalized read counts

plotMDS(log(counts(dds, normalized=TRUE) + 1))

b) on the difference between the counts and fitted values

plotMDS(log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1))

By looking at them, can we infer that the batch effect on WT_1 and KO_1 has been accounted for? or else like in Limma where we can make PCA plots to see how well the batch effect has been removed, is there a way to visualize or quantify how well the batch effect has been accounted for by the model in DESeq2.

With only 2 samples per batch, it's hard to see how much of a batch effect there is in the first place. The batch 1 samples are different from the others, but not in the same direction, and the batch 2 and 3 samples do not tightly cluster by batch before or after. This looks ok to me.

Sorry to revive this old thread. But, I just want to make sure.

So, if I do

plotMDS(log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1)

I am using the fitted values that include the batch effect, correct? So, if samples are closer together in this MDS plot, it should represents how similar the samples are after the variance associated with batch effect is accounted for, correct?