How to remove batch effect from RNA-seq without count data?
@user-24527
Last seen 9 weeks ago
Hongkou

Hi everyone,

I need to remove the batch effect between two RNA-seq datasets and obtain corrected expression profiles for downstream analysis such as clustering. One dataset is from TCGA; the other is provided only as FPKM/TPM values. However, the current tools for bulk RNA-seq data are count-based (ComBat_seq, svaseq and RUVSeq), while other tools, such as removeBatchEffect, ComBat and sva, were designed for microarray data. Is there any way to solve my problem? And if I also have microarray data, can I remove the batch effect between RNA-seq data and microarray data?

RUVSeq FPKM removeBatchEffect RNA-seq sva • 235 views
ATpoint ▴ 700
@atpoint-13662
Last seen 5 hours ago
Germany

This has been asked many times before; you may want to browse Biostars and this forum, plus Google, for answers. Generally, you cannot just collect random samples from the internet and expect to combine them meaningfully, especially if they come from completely different labs and batch is confounded with condition (or cell type, or whatever your group information is). If you start from raw counts in RNA-seq (assuming the experimental design is not confounded), people often use ComBat-Seq from sva, or removeBatchEffect on the normalized counts on the log scale. You most likely cannot combine RNA-seq and microarrays directly at the count/intensity level; these are completely different technologies with unique characteristics. Maybe some kind of rank-based meta-analysis would serve you better, again assuming the comparison is not confounded by technology, which it most likely is.
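To make the two conditions above concrete, here is a minimal sketch in plain Python (used only for illustration; the tools named in this thread are R packages). The function names `is_fully_confounded` and `center_batches` are hypothetical and do not come from sva or limma. The first checks whether every biological group sits in a single batch, in which case batch and biology cannot be separated. The second is a deliberately simplified stand-in for batch adjustment on the log scale: it only shifts each batch mean of one gene to that gene's overall mean, and it is not the linear-model fit that removeBatchEffect performs nor the empirical Bayes procedure of ComBat.

```python
from collections import defaultdict
from statistics import mean


def is_fully_confounded(batches, groups):
    """Return True when there are several batches and no biological
    group appears in more than one of them (hypothetical helper)."""
    batches_per_group = defaultdict(set)
    for batch, group in zip(batches, groups):
        batches_per_group[group].add(batch)
    several_batches = len(set(batches)) > 1
    return several_batches and all(
        len(seen) == 1 for seen in batches_per_group.values()
    )


def center_batches(log_values, batches):
    """Shift each batch of one gene's log-scale values so that the
    batch mean equals the gene's overall mean (simplified sketch)."""
    overall = mean(log_values)
    by_batch = defaultdict(list)
    for value, batch in zip(log_values, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(v) for b, v in by_batch.items()}
    return [v - batch_mean[b] + overall for v, b in zip(log_values, batches)]
```

If `is_fully_confounded` returns True, any centering of this kind would remove the biological difference together with the batch difference, which is why the design has to be checked first.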


Thanks.

I have browsed many answers before, but most of them did not address whether log(FPKM/RPKM) values can be used directly as input for batch effect removal tools. The samples in the two datasets are of the same tumor type, and I just want to remove the batch effect arising from the data sources. In fact, I tried this by taking log(TPM), converted from FPKM, as input for ComBat, and it seemed to work: the samples no longer clustered by dataset. But I am unsure whether the result is reliable, as ComBat is designed for microarray data.
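For reference, the FPKM to log(TPM) step described above can be sketched as follows, in plain Python for illustration only. It relies on the standard per-sample identity TPM_i = FPKM_i / sum(FPKM) * 1e6. The function names and the pseudocount of 1 are assumptions made for this example, not something stated in the thread, and the sketch says nothing about whether the resulting values are a suitable input for ComBat.

```python
import math


def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values to TPM by rescaling them
    so that they sum to one million."""
    total = sum(fpkm)
    if total <= 0:
        raise ValueError("FPKM values must have a positive sum")
    return [value / total * 1e6 for value in fpkm]


def log2_tpm(tpm, pseudocount=1.0):
    """Log2-transform TPM values; the pseudocount (an assumed choice)
    keeps genes with zero expression finite."""
    return [math.log2(value + pseudocount) for value in tpm]
```

The conversion is applied to each sample separately, over all genes of that sample, before any values are compared across samples or datasets.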

And you are right that comparing RNA-seq and microarray data directly is unreasonable.