Question

How to use combat in order to remove batch effects?

Emma ▴ 10

@emma-25007

Hello! I have RNA-seq data and I need to use ComBat to remove the batch effects. Somehow, when I run it, it isn't actually doing anything.

dds <- DESeqDataSetFromMatrix(countData=data, 
                              colData=metadata, 
                              design=~Batch + dex, tidy = TRUE)
dds <- DESeq(dds, betaPrior=TRUE)
normalized_counts <- counts(dds, normalized=TRUE)
log2 = log2(normalized_counts+1)

modcombat = model.matrix(~dex, metadata)

Here, metadata is a data frame containing the treatment or control label (in the dex column), the batch, and the name of each patient.

com<-ComBat(log2, metadata$Batch, mod = modcombat)

There are supposed to be 4 different batches, but the values in the com variable are the same as those in log2.

What could be wrong? Would appreciate any help!

BatchEffect DESeq2 RNASeqData • 13k views

updated 2.9 years ago by BioNovice247 • 0 • written 3.7 years ago by Emma ▴ 10

score 5 · Answer 1 · 2021-03-12

Kevin Blighe ★ 4.0k

@kevin

Republic of Ireland

Hi,

It may help to show how you created the dds object.

Nevertheless, I would not use ComBat in this way. In your case, I would either use ComBat-seq on the raw counts prior to any DESeq2 command, or, I would use limma::removeBatchEffect in this way:

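dds <- DESeq(dds)
normalized_counts <- counts(dds, normalized = TRUE)
# Variance-stabilise the counts; blind = FALSE uses the design stored in dds
vsd <- vst(dds, blind = FALSE)
mat <- assay(vsd)
# Remove the batch effect from the transformed values
mat <- limma::removeBatchEffect(mat, batch = vsd$Batch)
assay(vsd) <- mat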

Please also see the thread "Why after VST are there still batches in the PCA plot?"
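
A variation on the snippet above, following the pattern shown in the DESeq2 vignette, passes a design matrix to removeBatchEffect so that the treatment effect is protected while the batch effect is removed. The sketch below is illustrative only: it runs on simulated data from makeExampleDESeqDataSet, and the Batch and dex columns are assigned by hand as hypothetical stand-ins for the real metadata.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for the real data: 1000 genes, 16 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 16)

# Hypothetical metadata: 4 batches, each containing both treatment levels
dds$Batch <- factor(rep(1:4, times = 4))
dds$dex   <- dds$condition
design(dds) <- ~ Batch + dex

# varianceStabilizingTransformation() is used instead of vst() because the
# simulated matrix is small; blind = FALSE makes it use the design above
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Design matrix holding the effect to keep (treatment), without the batch term
keep <- model.matrix(~ dex, data = as.data.frame(colData(vsd)))

# Remove the batch effect while protecting the treatment effect
corrected <- removeBatchEffect(assay(vsd), batch = vsd$Batch, design = keep)

# Store the corrected values in a copy, e.g. for plotPCA()
vsd_corrected <- vsd
assay(vsd_corrected) <- corrected
```

Comparing plotPCA(vsd, intgroup = "Batch") with the same call on vsd_corrected is one way to see whether the batch separation has been reduced.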

Kevin

3.7 years ago by Kevin Blighe ★ 4.0k

But I want to batch correct the normalized and log2 values, and the vst only takes integer numbers (my normalized values are not integers). Also, I was told to use ComBat on my data by the unit at my university. Why do you think it's not a good idea?

Thanks!

3.7 years ago by Emma ▴ 10

The unit at your university is incorrect, unfortunately, unless they meant ComBat-seq?

ComBat was not, and is not, designed for bulk RNA-seq data. It was originally developed for microarray data, which are measured on a very different scale from RNA-seq counts. By applying ComBat to log2(normalized_counts+1), you are not really following good practice.

If you definitely want batch-corrected normalised counts, then use ComBat-seq and apply it to the raw counts prior to any DESeq2 function. In this case, you would be batch-correcting the raw counts, and would ultimately, therefore, obtain batch-corrected normalised counts, too. Please see: https://github.com/zhangyuqing/ComBat-seq
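
A minimal sketch of that order of operations is given below, assuming the ComBat_seq function shipped in the sva package (version 3.36 or later). The count matrix and metadata are simulated here purely for illustration; the Batch and dex column names mirror the question, and everything else (gene names, sample names, effect sizes) is hypothetical.

```r
library(sva)     # provides ComBat_seq()
library(DESeq2)

set.seed(1)
n_genes <- 200

# Hypothetical sample sheet: 4 batches, treatment balanced within each batch
metadata <- data.frame(
  Batch = factor(rep(1:4, each = 4)),
  dex   = factor(rep(c("control", "treated"), times = 8))
)
rownames(metadata) <- paste0("sample", seq_len(nrow(metadata)))

# Simulated raw integer counts with a gene-specific effect for each batch
base_mu  <- runif(n_genes, min = 20, max = 500)
batch_fc <- matrix(2^rnorm(n_genes * 4, sd = 0.5), nrow = n_genes, ncol = 4)
mu <- base_mu * batch_fc[, as.integer(metadata$Batch)]
raw_counts <- matrix(
  rnbinom(length(mu), mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), rownames(metadata))
)

# Step 1: batch-correct the RAW counts, before any DESeq2 function is called;
# group tells ComBat-seq which biological signal to preserve
adjusted <- ComBat_seq(raw_counts, batch = metadata$Batch, group = metadata$dex)

# Step 2: build the DESeq2 object from the adjusted counts and normalise
dds <- DESeqDataSetFromMatrix(countData = adjusted,
                              colData   = metadata,
                              design    = ~ dex)
dds <- estimateSizeFactors(dds)
normalized_counts <- counts(dds, normalized = TRUE)  # batch-corrected, normalised
```

Only estimateSizeFactors is called here because the sketch stops at normalised counts; a full differential expression analysis would go on to call DESeq on the same object.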

3.7 years ago by Kevin Blighe ★ 4.0k

Hi Kevin.

For applications such as WGCNA, one would want VST-normalized RNA-seq counts, but unlike ComBat, ComBat-seq requires the input matrix to be raw counts, so I'm a bit confused about where the appropriate normalization step comes in. The authors of ComBat-seq mention that, after adjustment for batch effects, the data can be used directly as input for algorithms such as DESeq2 (which have their own internal normalization). Am I correct to assume that, after batch effect correction with ComBat-seq, the data can be VST-normalized and used in downstream applications such as WGCNA?

Thanks in advance!

2.9 years ago by BioNovice247 • 0