DESeq2 - Acquiring batch-corrected values for PCA and hierarchical clustering
2
6
Entering edit mode
salahm ▴ 70
@salahm-7045
Last seen 7.5 years ago

Dear list,

I am analyzing a RNAseq dataset with a very obvious batch-effect using the DESeq2 (v1.4.5) R package. I account for the batch-effect in my model and in terms of differential expression it really improves the analysis. Now, I would like to visualize the data by principal component analysis and hierarchical clustering, which requires the adjusted values.  However, I cannot figure out how to get the batch-corrected values. Thanks you for the help!

Best,

Salah

deseq2 deseq • 9.7k views
8
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

For the DE analysis, adding the batch term accounts for mean shifts in the gene counts, and allows one to isolate the effect of a condition of interest, as you know. Typically, we want to look at the PCA plot of the samples in order to see the natural relationship of the samples to each other, so including the batch effects. However, if you want to see the variation among samples, excluding mean shifts due to batch, you can take the matrix of VST or rlog transformed values, and run limma's removeBatchEffect() function on this matrix, then use that matrix as you would the VST or rlog transformed matrix. This function removes mean shifts which can be accounted for using a batch variable.

0
Entering edit mode

With all due respec to Gordon Smyth and other authors of limma which is a great package, in my experience the empirical Bayes moderated regression in ComBat (package sva) simply works better than the simple linear model implemented in limma (at least last time I checked limma).

0
Entering edit mode

Hi Michael,

In our data, I see a strong batch effect when I just do a straight up rlog transformation. However, remove the batch effects as you suggested (removeBatchEffect()) seems to work. Now, if I want to still look at differentially expressed features, should I just account for the batch effect by adding it as another factor in the design or should I somehow use the batch effect removed values? Personally, I think it will make sense to add the batch effect as a linear factor (design = ~Sample+Batch) and adjust the results for it instead of subtracting the batch effect. But honestly I am not sure.

I know you guys don't suggest using rlog/vst transformed values for differential expression analysis, but if I have to use the batch effect removed data (rlog transformed), then how could I do that?

I hope my question makes sense and isn't that stupid.

Thanks,

Praful

0
Entering edit mode

hi Praful,

Yes, you should just add batch as another factor in the design.

We do not have any kind of functionality in DESeq2 for differential expression for non-count data. Using the batch within the count-based GLM is the recommended approach, as we have information about the statistical properties of the raw counts.

0
Entering edit mode

Hi Michael,

Thank you for the prompt response.

Praful

0
Entering edit mode

Hi, Michael --

Can you explain how what you are suggesting here relates to the use of the 'blind' argument in the rlog and varianceStabilizingTransformation functions?  Specifically, and supposing I have included a batch term in my DESeq2 model, if I do rlog(blind=T) followed by removeBatchEffect() to remove mean shifts, how would that differ from just doing rlog(blind=F)?  Would not the latter also remove mean shifts due to the batch effect?

Bob

0
Entering edit mode

they are fairly unrelated.

'blind' only makes a difference in the estimation of the global shape of mean-dispersion trend (whether or not the dispersions which inform this trend are aware of the sample grouping or not).

either way, the sample information is not *directly* used by the transformation.

there is some more description of blind in the vignette and the man pages for the transformations.

1
Entering edit mode
salahm ▴ 70
@salahm-7045
Last seen 7.5 years ago

Thanks! It works very well!

Best,

Salah