Question

How to remove batch effect from RNA-seq without count data?

Xiaojie Cheng (@user-24527) · Hongkou

Hi everyone,

I need to remove the batch effect between two RNA-seq datasets and obtain a corrected expression profile for downstream analysis such as clustering. One dataset is from TCGA; the other provides only FPKM/TPM values. However, the current tools for bulk RNA-seq batch correction are count-based (ComBat_seq, svaseq and RUVSeq), while other tools such as removeBatchEffect, ComBat and sva were designed for microarray data. Is there any way to solve my problem? And if I also have microarray data, can I remove the batch effect between the RNA-seq data and the microarray data?

Thanks in advance

Tags: RUVSeq · FPKM · removeBatchEffect · RNA-seq · sva

Written 4.9 years ago by Xiaojie Cheng; updated 4.1 years ago by ATpoint.

score 0 · Answer 1 · 2021-03-24

ATpoint (@atpoint-13662) · Germany

This has been asked many times before; you may want to browse Biostars, this forum and Google for answers. Generally, you cannot just collect random samples from the internet and expect to combine them meaningfully, especially if they come from completely different labs and batch is confounded with condition (or cell type, or whatever your grouping variable is). If you start from raw counts in RNA-seq (and the experimental design is not confounded), people often use ComBat-Seq from sva, or limma's removeBatchEffect on log-scale normalized counts. You most likely cannot combine RNA-seq and microarrays directly at the count/intensity level; these are completely different technologies with unique characteristics. Some kind of rank-based meta-analysis may serve you better, again assuming it is not confounded by technology, which it most likely is.
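In the simple case of one batch factor and no other covariates, the correction removeBatchEffect applies on the log scale amounts to shifting each batch so that, gene by gene, its mean equals the unweighted average of the batch means. The Python sketch below illustrates that idea on toy numbers. It is a simplified analogue written for illustration, not limma's code: the function name `center_batches` and the input layout (a dict of gene → log-scale values) are assumptions, and it has no equivalent of limma's `design` argument, which is what keeps biological groups from being removed along with the batch.

```python
from statistics import mean


def center_batches(log_expr, batches):
    """Remove additive batch shifts from log-scale expression values (illustrative sketch).

    log_expr: dict mapping a gene name to a list of log-scale values, one per sample.
    batches:  list of batch labels in the same sample order.
    Each batch is shifted so its per-gene mean equals the unweighted mean of the
    batch means. No biological covariates are protected in this simplified version.
    """
    labels = sorted(set(batches))
    corrected = {}
    for gene, values in log_expr.items():
        # Mean of this gene within each batch.
        batch_means = {
            b: mean([v for v, s in zip(values, batches) if s == b]) for b in labels
        }
        # Common target: the average of the batch means.
        target = mean(list(batch_means.values()))
        corrected[gene] = [v - (batch_means[s] - target) for v, s in zip(values, batches)]
    return corrected


# Toy example: two samples per source, with the second source shifted upwards.
log_expr = {"geneA": [5.0, 6.0, 8.0, 9.0]}
batches = ["TCGA", "TCGA", "other", "other"]
print(center_batches(log_expr, batches))  # {'geneA': [6.5, 7.5, 6.5, 7.5]}
```

After centering, both sources have the same mean for geneA. Any genuine biological difference between the two sources would be removed as well, which is why this kind of correction is only defensible when batch is not confounded with the groups you care about.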

Written 4.9 years ago by ATpoint.

Thanks.

I have browsed many answers before, but most of them did not address whether log(FPKM/RPKM) values can be used directly as input for batch-effect removal tools. Samples from the two datasets are of the same tumor type, and I just want to remove the batch effect arising from the data sources. In fact, I tried this by taking log(TPM), converted from FPKM, as input for ComBat, and it seemed to work: the samples no longer clustered by dataset. But I am unsure whether the result is reliable, since ComBat was designed for microarray data.
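For reference, TPM can be obtained from FPKM within each sample by rescaling so the values sum to one million, and a transform such as log2(TPM + 1) gives the log-scale input that tools like removeBatchEffect and ComBat are usually given. The sketch below assumes one sample stored as a dict of gene → FPKM; the function names are hypothetical and the pseudocount of 1 is a common convention, not a requirement. Note that the conversion only rescales each sample; it does not remove any batch effect by itself.

```python
import math


def fpkm_to_tpm(fpkm):
    """Convert one sample's FPKM values (dict gene -> FPKM) to TPM.

    TPM_i = FPKM_i / sum_j(FPKM_j) * 1e6, computed within the sample.
    """
    total = sum(fpkm.values())
    return {gene: value * 1e6 / total for gene, value in fpkm.items()}


def log2_pseudocount(values, pseudocount=1.0):
    """Apply log2(x + pseudocount) to every value in a dict."""
    return {gene: math.log2(v + pseudocount) for gene, v in values.items()}


sample_fpkm = {"g1": 10.0, "g2": 30.0, "g3": 60.0}
tpm = fpkm_to_tpm(sample_fpkm)
print(tpm)  # {'g1': 100000.0, 'g2': 300000.0, 'g3': 600000.0}
print(log2_pseudocount(tpm))
```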

And you are right that comparing RNA-seq and microarrays directly is unreasonable.
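One ingredient of the rank-based approach suggested earlier in the thread is to replace each sample's values by within-sample ranks, so that RNA-seq values (e.g. log TPM) and microarray intensities end up on a common, platform-free scale. The sketch below shows only that transform. The function name and the tie handling (average ranks, scaled to (0, 1]) are illustrative choices, and the transform by itself neither removes gene-specific platform biases nor constitutes a full meta-analysis.

```python
def rank_normalize(sample):
    """Replace each gene's value with its within-sample rank divided by the number of genes.

    sample: dict mapping gene name -> expression value (any platform, any scale).
    Tied values share the average of the ranks they occupy.
    """
    items = sorted(sample.items(), key=lambda kv: kv[1])
    n = len(items)
    ranks = {}
    i = 0
    while i < n:
        # Extend j over the block of values tied with items[i].
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[items[k][0]] = avg_rank / n
        i = j + 1
    return ranks


# Toy values: log2(TPM + 1) from RNA-seq and log2 intensities from an array.
rnaseq = {"g1": 2.1, "g2": 8.4, "g3": 5.0}
array = {"g1": 6.0, "g2": 11.2, "g3": 7.3}
print(rank_normalize(rnaseq) == rank_normalize(array))  # True
```

The two samples have very different absolute scales but the same gene ordering, so their rank profiles coincide; a gene that one platform systematically over- or under-measures would still be misplaced in the ranking.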

Written 4.9 years ago by Xiaojie Cheng.

The issue has nothing to do with which mathematical manipulations you have subjected your counts to. It's a fundamental property of current RNA-seq library preps: they are strongly affected by batch effects, and you can't just remove them like you'd pick the pepperoni off a pizza.

Just because ComBat gave you a result that superficially looks like you want it to, that is not at all a guarantee that its manipulations are valid.
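A toy example makes the point concrete. The sketch below (hypothetical helper names, made-up numbers) first checks whether batch is fully confounded with condition, then applies per-batch mean-centering, the idea behind a single-factor removeBatchEffect call without a design. When each batch holds only one condition, centering wipes out the condition difference together with the batch shift, yet the output looks perfectly "batch-free"; when both conditions occur in both batches, the condition difference survives.

```python
from statistics import mean


def is_fully_confounded(batches, conditions):
    """True if every batch contains a single condition and conditions differ across batches."""
    per_batch = {}
    for b, c in zip(batches, conditions):
        per_batch.setdefault(b, set()).add(c)
    one_condition_each = all(len(cs) == 1 for cs in per_batch.values())
    return one_condition_each and len(set(conditions)) > 1


def center_batches(values, batches):
    """Shift each batch of one gene's log-scale values to the mean of the batch means."""
    labels = sorted(set(batches))
    means = {b: mean([v for v, s in zip(values, batches) if s == b]) for b in labels}
    target = mean(list(means.values()))
    return [v - (means[s] - target) for v, s in zip(values, batches)]


def condition_difference(values, conditions, a, b):
    """Mean of condition a minus mean of condition b."""
    mean_a = mean([v for v, c in zip(values, conditions) if c == a])
    mean_b = mean([v for v, c in zip(values, conditions) if c == b])
    return mean_a - mean_b


# Confounded: batch A holds only tumors, batch B only normals.
batches = ["A", "A", "B", "B"]
confounded = ["tumor", "tumor", "normal", "normal"]
values = [9.0, 9.0, 5.0, 5.0]
print(is_fully_confounded(batches, confounded))  # True
fixed = center_batches(values, batches)
print(condition_difference(fixed, confounded, "tumor", "normal"))  # 0.0

# Balanced: each batch holds both conditions; batch B is shifted up by 2.
balanced = ["tumor", "normal", "tumor", "normal"]
values = [9.0, 5.0, 11.0, 7.0]
print(is_fully_confounded(batches, balanced))  # False
fixed = center_batches(values, batches)
print(condition_difference(fixed, balanced, "tumor", "normal"))  # 4.0
```

The balanced case recovers the true difference here only because each batch has equal numbers of both conditions; with unbalanced groups, limma's removeBatchEffect should be told about the conditions through its `design` argument.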

Written 4.1 years ago by swbarnes2.

score 0 · Answer 2 · 2022-01-13

Zeynab (@c71537a7) · Iran

Hi, I have the same problem.

Written 4.1 years ago by Zeynab.

And what kind of answer do you expect to such a comment?

Written 4.1 years ago by ATpoint.