ComBat Normalisation - data is integers and some floats, confused?!
zeegzaag • 0
@user-24987
Last seen 2.8 years ago
United Kingdom

Hello,

I have bulk RNA-seq data in TPM format. An initial PCA showed batch effects in these data, so I ran ComBat normalisation, which seems to reduce them. However, I am confused by the output: most of the values are integers and a handful are floats, and I don't understand why. I should add that I applied a log2 + 1 transformation to the data before running ComBat.

Please could someone tell me:

1) whether it is OK to run ComBat on data in TPM format;
2) whether it is OK that I log2-transformed the data before running ComBat;
3) why most of the values are integers with some floats, rather than all integers or all floats?

Thank you in advance.

combat rnaseq combatnormalisation jeffleek sva • 2.6k views
Kevin Blighe ★ 4.0k
@kevin
Last seen 16 days ago
Republic of Ireland

Hi,

I trust that you mean ComBat-seq, not the original ComBat? See https://github.com/zhangyuqing/ComBat-seq

If you mean the original ComBat, then I would not use that on TPM. It was designed for microarray data.

If you are worried about batch effects, then the standard protocol is to account for them by including batch as a covariate in your design formula, such as ~ condition + batch. That way, when test statistics are derived for, e.g., condition, they will be adjusted for the effect of batch.
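To make the design formula concrete, here is a minimal sketch using only base R's model.matrix(). The sample sheet (the `coldata` data frame and its `condition` and `batch` columns) is made up for illustration and is not the poster's data.

```r
# Hypothetical sample sheet: 8 samples, 2 conditions, 2 batches (illustrative only)
coldata <- data.frame(
  condition = factor(rep(c("control", "treated"), times = 4)),
  batch     = factor(rep(c("A", "B"), each = 4))
)

# Design matrix for ~ condition + batch: the batch column absorbs the
# batch shift, so the condition coefficient is estimated net of it
design <- model.matrix(~ condition + batch, data = coldata)

print(colnames(design))
# "(Intercept)" "conditiontreated" "batchB"
```

With DESeq2, the same formula would typically be supplied as the `design` argument when the dataset object is built from raw counts, and with limma a matrix like `design` is what gets passed to the model fit.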

If you then need batch-adjusted expression levels for downstream use, e.g. log2 CPMs or DESeq2's regularised log or variance-stabilised expression levels, then please use limma::removeBatchEffect() on these.
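As a rough sketch of that last step, assuming limma is installed and that the input is already on a log scale: the expression matrix below is simulated, and the names `logexpr`, `condition` and `batch` are placeholders rather than anything from the original data.

```r
library(limma)

set.seed(1)
# Simulated log2-scale expression: 100 genes x 8 samples (illustrative only)
condition <- factor(rep(c("control", "treated"), times = 4))
batch     <- factor(rep(c("A", "B"), each = 4))
logexpr   <- matrix(rnorm(100 * 8, mean = 5), nrow = 100)

# Add an artificial shift of +2 to every gene in batch B
logexpr[, batch == "B"] <- logexpr[, batch == "B"] + 2

# `design` names the effect to preserve (condition); the batch term is removed
design    <- model.matrix(~ condition)
corrected <- removeBatchEffect(logexpr, batch = batch, design = design)

# The gap between the batch means should vanish after correction
mean(logexpr[, batch == "B"])   - mean(logexpr[, batch == "A"])
mean(corrected[, batch == "B"]) - mean(corrected[, batch == "A"])
```

The corrected matrix is meant for visualisation and similar downstream uses such as PCA or clustering; for differential expression testing, batch stays in the design formula instead.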

Kevin


Hi Kevin,

Thank you for replying. Yes, I used ComBat-seq. I have access to two gene expression datasets: one is in TPM format and the other is in FPKM format.

This is what I did:

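# Encode each sample's centre as a numeric batch label
batch_all <- as.numeric(as.factor(metadata$Centres))
# Run ComBat-seq with batch only (no biological group supplied)
ComBAT_TPM_exprs <- sva::ComBat_seq(data, batch = batch_all, group = NULL)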

Thank you for your advice, I will give that a go and have a search through the forum for this topic.


Hey, that is interesting; however, I think that ComBat-seq requires raw counts. So, you will not be able to use TPM or FPKM. Attempting to remove a batch effect between TPM and FPKM data will be difficult without having access to the raw counts.
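One quick way to see whether a matrix is suitable input is to test whether it still looks like raw counts. The sketch below uses base R only; `looks_like_raw_counts()` is a hypothetical helper (not part of sva), the toy matrices are invented, and log2(x + 1) is assumed as the transformation.

```r
# Hypothetical helper (not part of sva): TRUE if a matrix looks like raw counts,
# i.e. numeric, no NAs, non-negative, and whole numbers throughout
looks_like_raw_counts <- function(m) {
  is.numeric(m) && !anyNA(m) && all(m >= 0) && all(m == round(m))
}

# Toy inputs (illustrative only): raw counts vs. log2-transformed TPM
raw_counts <- matrix(c(0, 12, 250, 3, 0, 87), nrow = 2)
log_tpm    <- log2(matrix(c(0, 1.5, 30.2, 3, 0.7, 8.9), nrow = 2) + 1)

looks_like_raw_counts(raw_counts)  # TRUE
looks_like_raw_counts(log_tpm)     # FALSE

# Only when the check passes would a call such as this be appropriate:
# adjusted <- sva::ComBat_seq(raw_counts, batch = batch)
```

Note that a transformed matrix can still contain a few whole numbers (here log2(0 + 1) = 0 and log2(3 + 1) = 2), so the check looks at every value rather than a sample of them.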


Hi Kevin, I had a question on this topic, sorry for reviving this old question.

In order to use a certain R package, I need counts adjusted for fragment size, such as TPM or FPKM. However, the raw counts are influenced by batch effects, which I want to remove using ComBat-seq. Would it be possible to use TPM or FPKM after correction with ComBat-seq?
