Batch-corrected expression matrix NOT on a log2 scale
1
0
Entering edit mode
jma1991 ▴ 70
@jma1991-11856
Last seen 12 months ago
Cumbernauld

I would like to generate a batch-corrected expression matrix from RNA-seq data which is NOT on a log2 scale.

After looking at similar posts, it is commonly advised to use limma's removeBatchEffect on the output generated by the rlog function from DESeq2:

dds <- makeExampleDESeqDataSet()

dds$batch <- factor(rep(c("A","B"), each = 3))

rld <- rlog(dds)

mod <- model.matrix(~ condition, colData(rld))

assay(rld) <- limma::removeBatchEffect(assay(rld), batch = rld$batch, design = mod)

Would it be correct to simply inverse the log2 transformation to get the desired matrix?

normcounts <- counts(dds, normalized = TRUE) # normalized counts with batch

normcounts.batch <- 2 ^ assay(rld) # normalized counts without batch

Doing it this way, the batch-corrected counts seem to be on a different scale than the normalized counts matrix. Therefore, would it be better to run the batch effect removal on the normalized counts matrix:

mod <- model.matrix(~ condition, colData(rld))

normcounts.batch <- limma::removeBatchEffect(normcounts, batch = rld$batch, design = mod)

I'm unsure as to whether the removeBatchEffect function requires the input values to be on a log2 scale?

DESeq2 limma • 1.2k views
ADD COMMENT
0
Entering edit mode

Should data be on log-scale if using removeBatchEffect => yes, according to e.g. this exchange on the mailing list and ?limma::removeBatchEffect() (the text explaining what the input data x should be) for reasons explained in that first link. Both the authors of limma/edgeR (Batch-corrected expression matrix NOT on a log2 scale) and DESeq2 (see its vignette) seem to recommend that correcting batch effects for DE analysis should be part of the model so unless one have comparable experts knowledge I'd follow that advise. For everything else however, e.g. plots and other downstream applications, you need something else that directly corrects the counts. I personally made good experience with ComBat-Seq which lives in the sva package of Biocoductor. The advantages over other methods that have originally be developped for array-like data is that a) it has been developped specifically for RNA-seq, b) it takes raw counts and returns corrected raw counts so you can plug it into any downstream normalization you want/need, c) it preserves zeros as zeros and does not introduce negative values and d) it returns values not on log scale so it is completely on you how to transform data and you do not have to bother with conversion from log scale back to normal.

ADD REPLY
0
Entering edit mode
Kevin Blighe ★ 3.9k
@kevin
Last seen 7 days ago
Republic of Ireland

Hi,

Doing it this way, the batch-corrected counts seem to be on a different scale than the normalized counts matrix. Therefore, would it be better to run the batch effect removal on the normalized counts matrix:

This is due to the fact that there is a lot more occurring during these transformations than a simple natural log or log [base 2] transformation.

As opposed to trying this after you have already created a DESeq2 object, I think that it would be better to correct at the raw count [prior to DESeq2] stage using ComBat-Seq: https://github.com/zhangyuqing/ComBat-seq

In this case, you would not have to worry about adjusting for batch, or removing it on the transformed ['logged'] levels using the limma function. You should still check it on an MDS plot or PCA bi-plot, though.

In this case, your design formula would just be ~ condition

Kevin



Note, in your code as presented, as you imply that there is a batch effect in your data, your design formula should probably have been:

~ condition + batch

Also, limma::removeBatchEffect() is not designed to work with count data. We would normally apply limma::removeBatchEffect() to the variance-stabilised or regularised log expression levels. Please see: Why after VST are there still batches in the PCA plot?

ADD COMMENT
0
Entering edit mode

IMO using ComBat or ComBat-Seq prior to DE analysis will introduce spurious differential expression. It is impossible to batch correct the data as if the batch effect never existed. The only correct procedure is to batch correct as part of the linear model.

ADD REPLY
0
Entering edit mode

Yes, I agree with you, and was unsure on the intent of the user on obtaining batch-corrected 'normalised' counts.

ADD REPLY

Login before adding your answer.

Traffic: 866 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6