Can SVs identified from sva be used for removeBatchEffect() in DESeq2
1
0
Entering edit mode
Xiao • 0
@821e15d0
Last seen 9 months ago
United States

Hi community,

I used sva on DESeq2 normalized count.

mod=model.matrix(~group, colData(dds))

mod0=model.matrix(~1, colData(dds))

sv=svaseq(counts(dds, normalized=TRUE), mod, mod0)

Is it recommended to use these SVs in removeBatchEffect(), and how should it be done if it is recommended?

Thanks.

sva DESeq2 BatchEffect limma • 1.2k views
ADD COMMENT
0
Entering edit mode
ATpoint ★ 4.5k
@atpoint-13662
Last seen 1 day ago
Germany

I feel like this has been asked and answered many times before. Generally, the recommendations for DESeq2 (and edgeR/limma, see their manuals and many posts on this here on the support site) is to include covariates (like batch, surrogate variables, anything...) into the design rather than using direct adjustment methods.

That means, you would add the SVs to colData(dds) and then do ~SV1+SV2+SV(n)+your_actual_design. You could use removeBatchEffect() on the output of vst or rlog to visualize / approximate the impact of adding these variables, e.g. by doing PCA with and without correction. But this is not input for DESeq2 as it uses raw counts. The removeBatchEffect does not return raw or integer counts, so it's not compatible.

See for example https://github.com/mikelove/preNivolumabOnNivolumab/blob/main/preNivolumabOnNivolumab.knit.md which is a little case study from Mike Love using RUVseq (same principles apply for sva) with DESeq2.

ADD COMMENT
0
Entering edit mode

Thanks ATpoint for your comments. I totally agree with the way how the SVs should be included in the design matrix.

My question is more on removeBatchEffect(). If I understand what you've said correctly, assuming normTransform() is used, the correction step should be

mod=model.matrix(~group, data=colData(dds))

mod0=model.matrix(~SV1+SV2+SV(n), data=colData(dds))

ntd=normTransform(dds)

ntd_cor=ntd

assay(ntd_cor)=limma::removeBatchEffect(assay(ntd), covariates=mod0, design=mod)

.

Since the SVs is calculated based on the normalized count, as suggested in here, is it more appropriate to use removeBatchEffect() based on the normalized count and then return to log scale, such as

assay(ntd_cor)=log2(limma::removeBatchEffect(counts(dds, normalized=TRUE), covariates=mod0, design=mod)+1)

?

ADD REPLY
0
Entering edit mode
assay(ntd_cor)=limma::removeBatchEffect(assay(ntd), covariates=mod0, design=mod) 

...should work. No need for log2, in- and output of the function is log2.

ADD REPLY
0
Entering edit mode

Sorry about the typo.

It is indeed

assay(ntd_cor)=limma::removeBatchEffect(assay(ntd), covariates=mod0, design=mod)

.

Do you think it necessary to apply removeBatchEffect() with the expression data and covariates in the same scale? I am a bit concerned here since assay(ntd) is in log2 scale while the covariates are in the normalized count scale.

ADD REPLY
0
Entering edit mode

I later find that removeBatchEffect() assumes log-expression values as input. So it should be fine to use

assay(ntd_cor)=limma::removeBatchEffect(assay(ntd), covariates=mod0, design=mod)

as you've suggested.

ADD REPLY
0
Entering edit mode

Yes, see the help page for this function and the DESeq2 vignette which covers this.

ADD REPLY

Login before adding your answer.

Traffic: 482 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6