Question

Combine RNA-seq


Young Tae • 0

@97b21f08

Last seen 10 months ago

South Korea

Hello, I have a basic question about RNA-seq data preprocessing. I currently have two different RNA-seq datasets. One uses Ensembl gene IDs, while the other uses Ensembl gene IDs with a version suffix. Both are in raw count format, and I want to merge them into a single dataset and correct for batch with sva::ComBat_seq without losing any information. Is it possible to merge Ensembl gene IDs with and without versions? Or should I use biomaRt's getBM to find the common ones? Thank you for your help!

BatchEffect R rnaseq • 1.4k views

updated 21 months ago by swbarnes2 ★ 1.4k • written 21 months ago by Young Tae • 0

score 0 · Answer 1 · 2023-04-24


ATpoint ★ 4.6k

@atpoint-13662

Last seen 11 hours ago

Germany

The obvious answer is to remove the version using gsub("\\..*", "", x), where x is the vector of gene IDs. However, the fact that the gene IDs differ means that the datasets have been processed differently. That is not good; you should process them identically to avoid introducing batch effects. Even if both are processed the same way in silico, be aware that batch correction rests on some assumptions, one being that batch is not nested with (that is, not fully confounded with) the groups you have. Cannot comment further without details.
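As a rough illustration of the version stripping and the nesting check, here is a minimal base-R sketch. The count matrices, sample names, and group labels are made-up toy data, and the check that every batch contains every group is a crude stand-in for inspecting the design properly. Intersecting the IDs also drops genes that appear in only one dataset, so this does not merge "without losing any information".

```r
# Toy raw-count matrices (hypothetical data): rows are genes, columns are samples.
counts_a <- matrix(
  c(10L, 0L, 5L, 3L, 7L, 2L), nrow = 3,
  dimnames = list(
    c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
    c("a1", "a2")
  )
)
counts_b <- matrix(
  c(8L, 1L, 4L, 6L, 9L, 0L), nrow = 3,
  dimnames = list(
    c("ENSG00000000003.15", "ENSG00000000419.12", "ENSG00000000457.14"),
    c("b1", "b2")
  )
)

# Strip everything from the first dot onwards, as suggested in the answer.
rownames(counts_b) <- gsub("\\..*", "", rownames(counts_b))

# Stripping can create duplicate IDs; stop here if it did, because
# duplicates would need to be resolved before merging.
stopifnot(
  !anyDuplicated(rownames(counts_a)),
  !anyDuplicated(rownames(counts_b))
)

# Keep only genes present in both datasets and bind the samples together.
shared <- intersect(rownames(counts_a), rownames(counts_b))
merged <- cbind(
  counts_a[shared, , drop = FALSE],
  counts_b[shared, , drop = FALSE]
)

# Cross-tabulate batch against biological group (hypothetical labels).
batch <- rep(c("A", "B"), times = c(ncol(counts_a), ncol(counts_b)))
group <- c("ctrl", "treat", "ctrl", "treat")
design_check <- table(batch, group)

# Crude check: every batch should contain every group. An empty cell
# suggests batch and group are at least partly confounded.
balanced <- all(design_check > 0)

# With the sva package installed and a real-sized matrix, the correction
# itself would then look like this (not run here):
# adjusted <- sva::ComBat_seq(merged, batch = batch, group = group)
```

Printing design_check before any correction is a quick way to see whether a group exists in only one batch, in which case batch correction cannot separate the two effects.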



One dataset was processed using GRCh38, while the other was processed using GRCh37. Both contain Homo sapiens samples sequenced on the same Illumina HiSeq 2500 platform. If I filter on cpm() values to reduce the number of features, would that be a good way to start the analysis?
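For reference, the kind of CPM-based filtering meant here could look like the following base-R sketch. The counts and both thresholds are made-up values chosen for illustration; edgeR provides cpm() and filterByExpr() for real use. Note that this only reduces the number of features and does not address the difference in genome assembly.

```r
# Toy raw-count matrix (hypothetical data): rows are genes, columns are samples.
counts <- matrix(
  c(100L, 0L, 900L,
    200L, 1L, 799L,
    150L, 0L, 850L),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Counts per million: divide each column by its library size, scale by 1e6.
lib_size <- colSums(counts)
cpm_mat <- t(t(counts) / lib_size) * 1e6

# Keep genes that reach the CPM cutoff in a minimum number of samples.
# Both thresholds are illustrative assumptions, not recommendations.
min_cpm <- 1
min_samples <- 2
keep <- rowSums(cpm_mat >= min_cpm) >= min_samples

filtered <- counts[keep, , drop = FALSE]
```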

Reply written 21 months ago by Young Tae • 0


I do not know the project or its aim, but generally a good approach would be to use the exact same preprocessing pipeline for both datasets. In the lab we cannot always avoid batches; in silico we can. There is no point in using different pipelines.

Reply written 21 months ago by ATpoint ★ 4.6k

score 0 · Answer 2 · 2023-04-24


swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 3 hours ago

San Diego

Save yourself the headache of trying to harmonize them. Get fastqs for both, and realign both to the same genome and annotation. It's probably easier, and way safer.
