Question

Batch-corrected expression matrix NOT on a log2 scale

0

Entering edit mode

jma1991 ▴ 70

@jma1991-11856

Last seen 20 months ago

Cumbernauld

I would like to generate a batch-corrected expression matrix from RNA-seq data which is NOT on a log2 scale.

After looking at similar posts, it is commonly advised to use limma's removeBatchEffect on the output generated by the rlog function from DESeq2:

dds <- makeExampleDESeqDataSet()

dds$batch <- factor(rep(c("A","B"), each = 3))

rld <- rlog(dds)

mod <- model.matrix(~ condition, colData(rld))

assay(rld) <- limma::removeBatchEffect(assay(rld), batch = rld$batch, design = mod)

Would it be correct to simply inverse the log2 transformation to get the desired matrix?

normcounts <- counts(dds, normalized = TRUE) # normalized counts with batch

normcounts.batch <- 2 ^ assay(rld) # normalized counts without batch

Doing it this way, the batch-corrected counts seem to be on a different scale than the normalized counts matrix. Therefore, would it be better to run the batch effect removal on the normalized counts matrix:

mod <- model.matrix(~ condition, colData(rld))

normcounts.batch <- limma::removeBatchEffect(normcounts, batch = rld$batch, design = mod)

I'm unsure as to whether the removeBatchEffect function requires the input values to be on a log2 scale?

DESeq2 limma • 1.6k views

ADD COMMENT • link updated 3.3 years ago by Kevin Blighe ★ 4.0k • written 3.3 years ago by jma1991 ▴ 70

0

Entering edit mode

Should data be on log-scale if using removeBatchEffect => yes, according to e.g. this exchange on the mailing list and ?limma::removeBatchEffect() (the text explaining what the input data x should be) for reasons explained in that first link. Both the authors of limma/edgeR (Batch-corrected expression matrix NOT on a log2 scale) and DESeq2 (see its vignette) seem to recommend that correcting batch effects for DE analysis should be part of the model so unless one have comparable experts knowledge I'd follow that advise. For everything else however, e.g. plots and other downstream applications, you need something else that directly corrects the counts. I personally made good experience with ComBat-Seq which lives in the sva package of Biocoductor. The advantages over other methods that have originally be developped for array-like data is that a) it has been developped specifically for RNA-seq, b) it takes raw counts and returns corrected raw counts so you can plug it into any downstream normalization you want/need, c) it preserves zeros as zeros and does not introduce negative values and d) it returns values not on log scale so it is completely on you how to transform data and you do not have to bother with conversion from log scale back to normal.

ADD REPLY • link 3.3 years ago ATpoint ★ 4.6k

score 0 · Answer 1 · 2021-08-09

Hi,

Doing it this way, the batch-corrected counts seem to be on a different scale than the normalized counts matrix. Therefore, would it be better to run the batch effect removal on the normalized counts matrix:

This is due to the fact that there is a lot more occurring during these transformations than a simple natural log or log [base 2] transformation.

As opposed to trying this after you have already created a DESeq2 object, I think that it would be better to correct at the raw count [prior to DESeq2] stage using ComBat-Seq: https://github.com/zhangyuqing/ComBat-seq

In this case, you would not have to worry about adjusting for batch, or removing it on the transformed ['logged'] levels using the limma function. You should still check it on an MDS plot or PCA bi-plot, though.

In this case, your design formula would just be ~ condition

Kevin

Note, in your code as presented, as you imply that there is a batch effect in your data, your design formula should probably have been:

~ condition + batch

Also, limma::removeBatchEffect() is not designed to work with count data. We would normally apply limma::removeBatchEffect() to the variance-stabilised or regularised log expression levels. Please see: Why after VST are there still batches in the PCA plot?