ComBat Normalisation - data is integers and some floats, confused?!
zeegzaag • 0

Hello,

I have bulk RNA-seq data in TPM format. An initial PCA showed batch effects in this data, so I ran ComBat normalisation, which seems to improve this. However, I am really confused by the output from ComBat: the majority of the values are integers, and a handful are floats. I don't understand why this is. I should add that before running ComBat on this dataset, I applied a log2(x + 1) transformation.

Please could someone tell me:
1) Is it OK to run ComBat on data in TPM format?
2) Is it OK that I log2-transformed the data before running ComBat?
3) Why are the majority of the values integers, with some floats? Why not all integers or all floats?

combat rnaseq combatnormalisation jeffleek sva • 423 views
@kevin

Hi,

I trust that you mean ComBat-seq, not the original ComBat? - see https://github.com/zhangyuqing/ComBat-seq

If you mean the original ComBat, then I would not use that on TPM. It was designed for microarray data.

If you are worried about batch effects, then the standard approach is to include batch as a covariate in your design formula, such as ~ condition + batch. In this way, when test statistics are derived for, e.g., condition, they will be adjusted for the effect of batch.
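As a rough illustration of what ~ condition + batch means in practice, here is a minimal base R sketch that builds the corresponding design matrix. The sample sheet (`samples`, with 6 samples, 2 conditions and 2 batches) is made up for the example and is not the poster's data.

```r
# Hypothetical sample sheet: 6 samples, 2 conditions, 2 batches
samples <- data.frame(
  condition = factor(c("control", "control", "control",
                       "treated", "treated", "treated")),
  batch     = factor(c("A", "B", "A", "B", "A", "B"))
)

# Design matrix with batch as a covariate alongside the condition of interest
design <- model.matrix(~ condition + batch, data = samples)
print(design)
```

The `conditiontreated` column is the effect you would test, and the `batchB` column absorbs the batch difference. The same formula can be supplied as the `design` when building a DESeq2 dataset, or the matrix can be passed to limma's `lmFit()`.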

If you then need batch-adjusted expression levels downstream, e.g., log2 CPMs or DESeq2's regularised log or variance-stabilised expression levels, then please use limma::removeBatchEffect() on these.
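A minimal sketch of that step, assuming limma is installed and that you already have a log2-scale expression matrix. The matrix `log_expr`, the `condition` and `batch` factors, and the artificial shift added to batch B are all simulated here purely for illustration.

```r
# Simulated log2-scale expression: 100 genes x 6 samples (hypothetical data)
set.seed(1)
log_expr <- matrix(rnorm(600, mean = 5), nrow = 100,
                   dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
condition <- factor(c("control", "control", "control",
                      "treated", "treated", "treated"))
batch     <- factor(c("A", "B", "A", "B", "A", "B"))

# Add an artificial shift to batch B so there is something to remove
log_expr[, batch == "B"] <- log_expr[, batch == "B"] + 2

if (requireNamespace("limma", quietly = TRUE)) {
  # `design` protects the condition effect while the batch term is removed
  adjusted <- limma::removeBatchEffect(
    log_expr,
    batch  = batch,
    design = model.matrix(~ condition)
  )
  # Per-gene difference between batch means after adjustment
  print(summary(rowMeans(adjusted[, batch == "B"]) -
                rowMeans(adjusted[, batch == "A"])))
}
```

The adjusted values are intended for visualisation and clustering (PCA, heatmaps), not as input to the differential expression test itself; for testing, keep batch in the design formula.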

Kevin


Hi Kevin,

Thank you for replying. Yes, I used ComBat-seq. I have access to two gene expression datasets: one is in TPM format and the other is in FPKM format.

This is what I did:

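# Encode each sample's centre as a numeric batch label
batch_all <- as.numeric(as.factor(metadata$Centres))
# Run ComBat-seq with batch only (no biological group supplied)
ComBAT_TPM_exprs <- sva::ComBat_seq(data, batch = batch_all, group = NULL)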


Thank you for your advice, I will give that a go and have a search through the forum for this topic.


Hey, that is interesting; however, I think that ComBat-seq requires raw counts. So, you will not be able to use TPM or FPKM. Attempting to remove a batch effect between TPM and FPKM data will be difficult without having access to the raw counts.
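One quick way to check whether a matrix even looks like raw counts before passing it to ComBat-seq is sketched below in base R. The helper `looks_like_raw_counts()` and the two small matrices are hypothetical, written only for this example.

```r
# Return TRUE if every value is a finite, non-negative whole number
looks_like_raw_counts <- function(mat) {
  mat <- as.matrix(mat)
  is.numeric(mat) && all(is.finite(mat)) && all(mat >= 0) && all(mat == round(mat))
}

# Hypothetical examples: a small count matrix and a TPM-like matrix
counts <- matrix(c(0, 12, 305, 7, 0, 48), nrow = 3)
tpm    <- matrix(c(0.0, 3.7, 120.4, 1.2, 0.0, 15.9), nrow = 3)

looks_like_raw_counts(counts)  # TRUE
looks_like_raw_counts(tpm)     # FALSE
```

Passing this check does not prove the data are raw counts (rounded TPMs would pass too), but failing it is a clear sign that the input is not what ComBat-seq expects.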