Question

ComBat Normalisation - data is integers and some floats, confused?!

zeegzaag • 0

@user-24987

Last seen 2.2 years ago

United Kingdom

Hello,

I have bulk RNA-seq data in TPM format. An initial PCA showed clear batch effects, so I ran ComBat normalisation, which seems to reduce them. However, I am confused by the ComBat output: most of the values are integers, but a handful are floats, and I don't understand why. I should add that I applied a log2(x + 1) transformation to the data before running ComBat.

Please could someone tell me:

1) Is it OK to run ComBat on data in TPM format?
2) Is it OK that I log2-transformed the data before running ComBat?
3) Why are most of the values integers with some floats, rather than all integers or all floats?

Thank you in advance.

combat rnaseq combatnormalisation jeffleek sva • 1.9k views

ADD COMMENT • link updated 6 months ago by Hamza • 0 • written 3.1 years ago by zeegzaag • 0

score 0 · Answer 1 · 2021-03-10

Kevin Blighe ★ 3.9k

@kevin

Last seen 1 day ago

Republic of Ireland

Hi,

I trust that you mean ComBat-seq, not the original ComBat? - see https://github.com/zhangyuqing/ComBat-seq

If you mean the original ComBat, then I would not use that on TPM. It was designed for microarray data.

If you are worried about batch effects, then the standard approach is to include batch as a covariate in your design formula, such as ~ condition + batch. In this way, when test statistics are derived for, e.g., condition, they will be adjusted for the effect of batch.
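To make the idea of a batch covariate concrete, here is a minimal base R sketch on a single simulated gene. The sample layout, effect sizes, and variable names are all made up for illustration; in a real analysis the same formula would be passed to a tool such as DESeq2, edgeR, or limma rather than to lm().

```r
# Toy example: one gene, two conditions, two batches (all values are made up).
set.seed(1)
condition <- factor(rep(c("ctrl", "treated"), times = 4))
batch     <- factor(rep(c("A", "B"), each = 4))

# Simulated log-expression: true condition effect = 1, true batch effect = 3
expr <- 5 + 1 * (condition == "treated") + 3 * (batch == "B") +
  rnorm(8, sd = 0.1)

# Design matrix with batch as a covariate, as in ~ condition + batch
design <- model.matrix(~ condition + batch)
colnames(design)

# The condition coefficient is estimated after accounting for batch
fit <- lm(expr ~ condition + batch)
coef(fit)
```

Because batch has its own column in the design, the shift between batches is absorbed by that coefficient instead of inflating the residual variance or being confused with the condition effect.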

If you then need batch-adjusted expression levels for downstream use, e.g. log2 CPMs or DESeq2's regularised log or variance-stabilised expression levels, then please use limma::removeBatchEffect() on these.
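The following base R sketch illustrates the general idea behind removing a batch term from log-scale expression values: fit a linear model containing both the biology to keep and the batch, then subtract only the fitted batch component. It is a simplified illustration on made-up data, not limma's implementation; the commented line at the end shows the limma call it is meant to mirror. Adjusted values of this kind are usually intended for visualisation or clustering (e.g. PCA), while the statistical testing itself keeps batch in the design.

```r
# Toy log-expression matrix: 3 genes x 8 samples (all values are made up).
set.seed(2)
condition <- factor(rep(c("ctrl", "treated"), times = 4))
batch     <- factor(rep(c("A", "B"), each = 4))
logexpr <- matrix(rnorm(24, mean = 5, sd = 0.1), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), paste0("s", 1:8)))
logexpr[, batch == "B"] <- logexpr[, batch == "B"] + 3  # add a batch shift

# Design for the biology to keep, plus sum-to-zero batch contrasts to remove
design  <- model.matrix(~ condition)
batch_x <- model.matrix(~ batch,
                        contrasts.arg = list(batch = "contr.sum"))[, -1, drop = FALSE]

# Fit every gene at once, then subtract only the fitted batch component
fit  <- lm.fit(cbind(design, batch_x), t(logexpr))
beta <- fit$coefficients[-seq_len(ncol(design)), , drop = FALSE]
adjusted <- logexpr - t(batch_x %*% beta)

# With limma installed, the corresponding call would be:
# adjusted <- limma::removeBatchEffect(logexpr, batch = batch, design = design)
```

Passing the design of interest matters: it tells the fit which variation is biological and should be left in the adjusted matrix.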

Kevin

ADD COMMENT • link 3.1 years ago Kevin Blighe ★ 3.9k

Hi Kevin,

Thank you for replying. Yes, I used ComBat-seq. I have access to two gene expression datasets: one is in TPM format and the other is in FPKM format.

This is what I did....

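# Encode the sequencing centre of each sample as a numeric batch label
batch_all <- as.numeric(as.factor(metadata$Centres))
# Run ComBat-seq on the expression matrix (genes x samples), with no biological group
ComBAT_TPM_exprs <- sva::ComBat_seq(data, batch = batch_all, group = NULL)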

Thank you for your advice, I will give that a go and have a search through the forum for this topic.

ADD REPLY • link 3.1 years ago zeegzaag • 0

Hey, that is interesting; however, I think that ComBat-seq requires raw counts, so you will not be able to use it on TPM or FPKM data. Removing a batch effect between TPM and FPKM data will be difficult without access to the raw counts.
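A quick way to check whether a matrix looks like raw counts before passing it to a count-based method, and to locate the rows that hold fractional values afterwards, is sketched below. Both helper names are hypothetical and the matrices are made up. As far as I understand the sva source, ComBat-seq leaves genes that are all zero within a batch unadjusted, which may be one reason an output mixes integers with the original floats; that is worth verifying against the installed version.

```r
# Hypothetical helper: TRUE if a matrix looks like raw counts
# (finite, non-negative, whole numbers).
looks_like_raw_counts <- function(mat) {
  mat <- as.matrix(mat)
  all(is.finite(mat)) && all(mat >= 0) && all(mat == round(mat))
}

# Hypothetical helper: indices of rows containing any non-integer value
rows_with_fractions <- function(mat) {
  mat <- as.matrix(mat)
  unname(which(rowSums(mat != round(mat)) > 0))
}

# Made-up examples
raw_counts <- matrix(c(10, 0, 250, 3, 7, 42), nrow = 3)
tpm_like   <- matrix(c(10.5, 0, 250.2, 3.1, 7.9, 42.4), nrow = 3)
log_tpm    <- log2(tpm_like + 1)
mixed      <- rbind(c(12, 30), c(0, 4.7), c(8, 15))  # only row 2 has a float

looks_like_raw_counts(raw_counts)  # whole numbers only
looks_like_raw_counts(tpm_like)    # contains fractions
looks_like_raw_counts(log_tpm)     # contains fractions
rows_with_fractions(mixed)         # rows to inspect further
```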

ADD REPLY • link 3.1 years ago Kevin Blighe ★ 3.9k

Hi Kevin, I have a question on this topic; sorry for reviving this old thread.

In order to use a certain R package, I need fragment-size-adjusted values such as TPM or FPKM. However, the raw counts are affected by batch effects, which I want to remove using ComBat-seq. Would it be possible to compute TPM or FPKM from the counts after correction by ComBat-seq?

ADD REPLY • link 6 months ago Hamza • 0
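For reference, the arithmetic of turning a count matrix into TPM can be sketched in base R as below. The helper name, the toy matrix, and the gene lengths are all assumptions for illustration; the thread does not settle whether computing TPM from ComBat-seq-adjusted counts is statistically appropriate, so this only shows the mechanics.

```r
# Hypothetical helper: convert a genes x samples count matrix to TPM.
# `lengths_bp` is an assumed vector of gene (or effective transcript)
# lengths in base pairs, one per row of `counts`.
counts_to_tpm <- function(counts, lengths_bp) {
  stopifnot(nrow(counts) == length(lengths_bp), all(lengths_bp > 0))
  rate <- counts / (lengths_bp / 1000)   # reads per kilobase, per gene
  t(t(rate) / colSums(rate)) * 1e6       # scale each sample to one million
}

# Made-up adjusted counts and gene lengths
adjusted_counts <- matrix(c(100, 200, 700, 50, 50, 900), nrow = 3,
                          dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
lengths_bp <- c(1000, 2000, 7000)

tpm <- counts_to_tpm(adjusted_counts, lengths_bp)
colSums(tpm)  # each sample sums to one million
```

A sample whose counts are all zero would produce a division by zero here, so such columns would need to be handled separately.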