Hello,
I have bulk RNA-seq data in TPM format. An initial PCA showed that the data had batch effects, so I ran ComBat, which seems to have improved things. However, I am really confused by the output I got from ComBat. Most of the values are integers, but a handful are floats, and I don't understand why. I should add that before running ComBat on this dataset, I applied a log2(x + 1) transformation.
Please could someone tell me: 1) is it OK to run ComBat on data in TPM format? 2) is it OK that I log2-transformed the data before running ComBat? 3) why are most of the values integers with some floats, rather than all integers or all floats?
Thank you in advance.
Hi Kevin,
Thank you for replying. Yes, I used ComBat-seq. I have access to two gene expression datasets: one is in TPM format and the other is in FPKM format.
This is what I did....
Thank you for your advice, I will give that a go and have a search through the forum for this topic.
Hey, that is interesting; however, I think that ComBat-seq requires raw counts. So, you will not be able to use TPM or FPKM. Attempting to remove a batch effect between TPM and FPKM data will be difficult without having access to the raw counts.
Hi Kevin, I have a question on this topic; sorry for reviving this old thread.
In order to use a certain R package, I need fragment-size-adjusted values such as TPM or FPKM. However, my raw counts are affected by batch effects, and I want to remove these using ComBat-seq. Would it be possible to compute TPM or FPKM from the counts after they have been corrected by ComBat-seq?
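For reference, the count-to-TPM conversion being asked about is mechanical once gene lengths are known: divide each count by the gene length in kilobases, then scale so the sample sums to one million. Below is a minimal sketch in plain Python for a single sample. The function name `counts_to_tpm`, the variable names, and the toy numbers are hypothetical and only for illustration, and the sketch assumes you already have batch-corrected counts plus a matching list of gene lengths. It does not address whether TPM derived from corrected counts is appropriate for any particular downstream package.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert one sample's gene-level counts to TPM.

    counts:     read counts, one per gene (e.g. batch-corrected counts)
    lengths_kb: gene lengths in kilobases, in the same gene order
    """
    if len(counts) != len(lengths_kb):
        raise ValueError("counts and lengths_kb must have the same length")
    # Reads per kilobase: normalise each count by its gene length
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rpk)
    if total == 0:
        # No reads in this sample; avoid dividing by zero
        return [0.0] * len(counts)
    # Rescale so the values in the sample sum to one million
    return [r / total * 1e6 for r in rpk]


# Hypothetical toy sample with three genes (made-up numbers)
example_counts = [100, 200, 300]
example_lengths_kb = [1.0, 2.0, 3.0]
tpm = counts_to_tpm(example_counts, example_lengths_kb)
```

In this toy sample every gene has the same reads-per-kilobase value, so all three genes end up with the same TPM and the values sum to one million.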