Question: DESeq2 - Acquiring batch-corrected values for PCA and hierarchical clustering

Dear list,

I am analyzing an RNA-seq dataset with a very obvious batch effect using the DESeq2 (v1.4.5) R package. I account for the batch effect in my model, and in terms of differential expression it really improves the analysis. Now I would like to visualize the data by principal component analysis and hierarchical clustering, which requires the adjusted values. However, I cannot figure out how to get the batch-corrected values. Thank you for the help!

Best,

Salah

 

Tags: deseq, deseq2
modified 4.3 years ago • written 4.4 years ago by salahm
Answer: DESeq2 - Acquiring batch-corrected values for PCA and hierarchical clustering

For the DE analysis, adding the batch term to the design accounts for mean shifts in the gene counts due to batch, and allows one to isolate the effect of the condition of interest, as you know. Typically, we want the PCA plot to show the natural relationship of the samples to each other, which includes the batch effects. However, if you want to see the variation among samples excluding mean shifts due to batch, you can take the matrix of VST or rlog transformed values and run limma's removeBatchEffect() function on it, then use the result as you would the original VST or rlog transformed matrix. This function removes mean shifts that can be accounted for by a batch variable.
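
A minimal sketch of the approach described above, using simulated data in place of a real experiment. The batch labels, the object names (`vsd`, `mat_corrected`, `hc`) and the choice of varianceStabilizingTransformation() over rlog() are assumptions made for illustration; passing a `design` to removeBatchEffect() to protect the condition effect is an optional extra, not something stated in the answer.

```r
library(DESeq2)
library(limma)

# Simulated counts as a stand-in for a real DESeqDataSet (12 samples, 2 conditions).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical batch labels, balanced across the two conditions.
dds$batch <- factor(rep(c("b1", "b2"), times = 6))
design(dds) <- ~ batch + condition

# Transform the counts; rlog(dds, blind = FALSE) could be used instead.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)
mat <- assay(vsd)

# Remove mean shifts attributable to batch. The design matrix tells limma
# which effect (condition) should be preserved while batch is removed.
cond_design <- model.matrix(~ condition, data = as.data.frame(colData(vsd)))
mat_corrected <- removeBatchEffect(mat, batch = vsd$batch, design = cond_design)

# Use the corrected matrix exactly as the transformed matrix would be used.
assay(vsd) <- mat_corrected
plotPCA(vsd, intgroup = c("condition", "batch"))

# Hierarchical clustering of samples on the corrected values.
hc <- hclust(dist(t(mat_corrected)))
plot(hc)
```

The corrected matrix is meant for visualization only; the raw counts in `dds` are left untouched for the differential expression analysis.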

modified 2.5 years ago • written 4.3 years ago by Michael Love

With all due respect to Gordon Smyth and the other authors of limma, which is a great package, in my experience the empirical Bayes moderated regression in ComBat (package sva) simply works better than the simple linear model implemented in limma (at least the last time I checked limma).

written 4.3 years ago by Peter Langfelder

Hi Michael,

In our data, I see a strong batch effect when I just do a straight rlog transformation. However, removing the batch effects as you suggested (removeBatchEffect()) seems to work. Now, if I still want to look at differentially expressed features, should I account for the batch effect by adding it as another factor in the design, or should I somehow use the batch-effect-removed values? Personally, I think it makes sense to add batch as a factor in the design (design = ~Sample+Batch) and adjust the results for it instead of subtracting the batch effect. But honestly I am not sure.

I know you guys don't suggest using rlog/vst transformed values for differential expression analysis, but if I have to use the batch effect removed data (rlog transformed), then how could I do that?

I hope my question makes sense and isn't that stupid.

Thanks,

Praful

written 4.0 years ago by aggp110

hi Praful,

Yes, you should just add batch as another factor in the design.

We do not have any kind of functionality in DESeq2 for differential expression for non-count data. Using the batch within the count-based GLM is the recommended approach, as we have information about the statistical properties of the raw counts.
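
As a rough illustration of that recommendation, the sketch below fits the count-based GLM with batch in the design and tests the condition effect. The data are simulated and the batch labels and level names ("b1", "b2", "A", "B") are assumptions for the example, not values from this thread.

```r
library(DESeq2)

# Simulated counts as a stand-in for real data (conditions "A" and "B").
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical batch labels, balanced across the two conditions.
dds$batch <- factor(rep(c("b1", "b2"), times = 6))

# Batch enters the model as an extra factor; the variable of interest goes last.
design(dds) <- ~ batch + condition

# Fit the negative binomial GLM on the raw counts.
dds <- DESeq(dds)

# Condition effect (B vs A), adjusted for batch.
res <- results(dds, contrast = c("condition", "B", "A"))
head(res[order(res$padj), ])
```

The batch-corrected transformed values are not used anywhere in this fit; they serve only for PCA and clustering.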

modified 4.0 years ago • written 4.0 years ago by Michael Love

Hi Michael,

Thank you for the prompt response. 

Praful

written 4.0 years ago by aggp110

Hi, Michael --

Can you explain how what you are suggesting here relates to the use of the 'blind' argument in the rlog and varianceStabilizingTransformation functions?  Specifically, and supposing I have included a batch term in my DESeq2 model, if I do rlog(blind=T) followed by removeBatchEffect() to remove mean shifts, how would that differ from just doing rlog(blind=F)?  Would not the latter also remove mean shifts due to the batch effect?

Thanks for your help.

Bob

modified 3.7 years ago • written 3.7 years ago by Bob Thurman

they are fairly unrelated.

'blind' only makes a difference in the estimation of the global shape of the mean-dispersion trend (that is, whether the dispersions which inform this trend are aware of the sample grouping).

either way, the sample information is not *directly* used by the transformation.

there is some more description of blind in the vignette and the man pages for the transformations.
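
One way to check this on a toy example: the sketch below injects an artificial batch shift into simulated counts and compares the transformed values under both settings of 'blind'. The 4-fold inflation, the gene subset `idx`, and the helper `batch_shift()` are hypothetical choices made only for this illustration.

```r
library(DESeq2)

set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds$batch <- factor(rep(c("b1", "b2"), times = 6))

# Artificial batch effect: inflate the first 200 genes 4-fold in batch b2.
idx <- 1:200
counts(dds)[idx, dds$batch == "b2"] <- counts(dds)[idx, dds$batch == "b2"] * 4L
design(dds) <- ~ batch + condition

# Same transformation, with and without knowledge of the design.
vsd_blind  <- varianceStabilizingTransformation(dds, blind = TRUE)
vsd_design <- varianceStabilizingTransformation(dds, blind = FALSE)

# Per-gene difference in mean transformed value between the two batches.
batch_shift <- function(mat, batch) {
  rowMeans(mat[, batch == "b2"]) - rowMeans(mat[, batch == "b1"])
}
shift_blind  <- batch_shift(assay(vsd_blind), dds$batch)
shift_design <- batch_shift(assay(vsd_design), dds$batch)

# The batch shift in the affected genes is still present in both cases.
summary(shift_blind[idx])
summary(shift_design[idx])
```

In this setup neither setting removes the shift between batches; that step is left to removeBatchEffect() on the transformed matrix.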

written 3.7 years ago by Michael Love
Answer: DESeq2 - Acquiring batch-corrected values for PCA and hierarchical clustering

Thanks! It works very well! 

Best,

Salah

written 4.3 years ago by salahm