Question

Batch effect correction for harmonized TCGA RNA-seq data

KCLiv

Dear all,

I plan to use TCGA RNA-seq data for my analysis. Since there are two datasets (legacy and harmonized), I am deciding which one to use. The harmonized one seems more standardized to me (please correct me if I am wrong). As we all know, batch effects are a really big issue. My questions are:

1) Has the harmonized data already been corrected for batch effects? I tried plotting a PCA, but I did not find any confounding pattern by either sequencing center or platform (HiSeq or GA; in colon cancer, some of the samples were sequenced on the GA platform). So, before I go ahead with the analysis, I want to make sure that the data have already been corrected.

2) If not, is correction really necessary, and which method would be a suitable way to do it?
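A minimal sketch of the PCA check described in question 1, written in Python with NumPy. It assumes a genes x samples count matrix and one batch label per sample; here the data are simulated and the `HiSeq`/`GA` labels are hypothetical. The helper names `log_cpm`, `pca_scores` and `batch_r2` are illustrative, not from any package. Rather than only eyeballing the plot, it reports how much of each principal component's variance is explained by the batch label (a one-way ANOVA R²). In R, DESeq2's `plotPCA()` on `vst()`-transformed data, with `intgroup` set to the batch column, is the more usual route.

```python
import numpy as np

def log_cpm(counts):
    """Log2 counts-per-million with a pseudocount of 1 (genes x samples)."""
    lib_sizes = counts.sum(axis=0)
    return np.log2(counts / lib_sizes * 1e6 + 1.0)

def pca_scores(expr, n_components=2):
    """Sample coordinates on the top principal components of gene-centered data."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    # SVD of the genes x samples matrix; rows of vt are directions in sample space
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components].T * s[:n_components]  # samples x components

def batch_r2(pc, labels):
    """Fraction of one PC's variance explained by a categorical label (one-way ANOVA R^2)."""
    labels = np.asarray(labels)
    grand_mean = pc.mean()
    total = ((pc - grand_mean) ** 2).sum()
    between = sum(
        (labels == g).sum() * (pc[labels == g].mean() - grand_mean) ** 2
        for g in np.unique(labels)
    )
    return between / total

# Simulated example: 30 "HiSeq" and 10 "GA" samples (hypothetical labels),
# with an artificial platform effect on half of the genes.
rng = np.random.default_rng(0)
n_genes, n_samples = 500, 40
platform = np.array(["HiSeq"] * 30 + ["GA"] * 10)
means = np.tile(rng.gamma(2.0, 50.0, size=(n_genes, 1)), (1, n_samples))
means[: n_genes // 2] *= np.where(platform == "GA", 2.0, 1.0)
counts = rng.poisson(means)

scores = pca_scores(log_cpm(counts))
for k in range(scores.shape[1]):
    print(f"PC{k + 1}: R^2 with platform = {batch_r2(scores[:, k], platform):.2f}")
```

A low R² on the top components does not prove that no batch effect exists; it only suggests that batch is not among the dominant sources of variation.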

Thank you very much,

deseq2 limma

Written 6.2 years ago by KCLiv; updated 6.2 years ago by Michael Love.

The data processing steps are here: https://docs.gdc.cancer.gov/Data/BioinformaticsPipelines/ExpressionmRNA_Pipeline/

In short, if you obtain the HTSeq count files from the GDC, you can assume that nothing has been done to account for batch. On the other hand, if you obtain data via a third-party source, such as cBioPortal or TCGAbiolinks, check with that source to see what extra processing (if any) it performed.
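For reference, a sketch of loading the HTSeq count files mentioned above, which makes the point concrete: they hold raw integer counts per gene, with no batch adjustment applied. It assumes the standard HTSeq-count output layout of two tab-separated columns (gene ID, count) followed by summary rows whose names start with `__` (such as `__no_feature` and `__ambiguous`); those rows are not genes and are set aside. The function names and the toy file contents are hypothetical; a compressed file on disk would be opened with `gzip.open(path, "rt")` instead of `io.StringIO`.

```python
import io

def read_htseq_counts(handle):
    """Split HTSeq-count output into gene counts and the trailing __ summary rows."""
    counts, summary = {}, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        gene_id, value = line.split("\t")
        # HTSeq-count reports unassigned reads in rows such as __no_feature
        target = summary if gene_id.startswith("__") else counts
        target[gene_id] = int(value)
    return counts, summary

def build_matrix(per_sample):
    """Combine {sample: {gene: count}} into a genes x samples table.
    All samples are expected to share exactly the same gene IDs."""
    sample_ids = sorted(per_sample)
    gene_ids = sorted(per_sample[sample_ids[0]])
    for s in sample_ids:
        if set(per_sample[s]) != set(gene_ids):
            raise ValueError(f"gene IDs in {s} differ from {sample_ids[0]}")
    rows = [[per_sample[s][g] for s in sample_ids] for g in gene_ids]
    return gene_ids, sample_ids, rows

# Toy file contents with made-up gene IDs, for illustration only
file_a = "GENE_A\t120\nGENE_B\t0\n__no_feature\t5012\n__ambiguous\t311\n"
file_b = "GENE_A\t95\nGENE_B\t4\n__no_feature\t4870\n__ambiguous\t290\n"
per_sample = {name: read_htseq_counts(io.StringIO(text))[0]
              for name, text in [("sample_1", file_a), ("sample_2", file_b)]}
genes, samples, matrix = build_matrix(per_sample)
print(genes, samples, matrix)
# ['GENE_A', 'GENE_B'] ['sample_1', 'sample_2'] [[120, 95], [0, 4]]
```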

Edit: if you want to check for sources of bias in the data, consider performing surrogate variable analysis. You can then either account for the surrogate variables in your regression model, or directly adjust your expression data to remove their effects. There are many questions about this topic both here and on Biostars.
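A minimal sketch of the two options mentioned above, under simplifying assumptions. It assumes log-scale expression (genes x samples) and a design matrix containing only the known biological variables (here a hypothetical tumor/normal indicator). Surrogate variables are approximated by the top right singular vectors of the residuals left after fitting that design; this is a simplified, unsupervised stand-in, not the iteratively re-weighted algorithm implemented by `sva()` in the Bioconductor sva package. `remove_covariates` then regresses the surrogates out while keeping the biological term, similar in spirit to passing them to the `covariates` argument of limma's `removeBatchEffect()`. All helper names and data here are hypothetical.

```python
import numpy as np

def residual_surrogates(expr, design, n_sv):
    """Simplified surrogate variables: top right singular vectors of the residuals
    after regressing every gene on the known design (not the sva package's IRW-SVA)."""
    # expr: genes x samples; design: samples x p, including an intercept column
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)  # p x genes
    resid = expr - (design @ beta).T
    _, _, vt = np.linalg.svd(resid, full_matrices=False)
    return vt[:n_sv].T  # samples x n_sv

def remove_covariates(expr, design, covariates):
    """Fit expr ~ design + covariates per gene and subtract only the covariate part,
    so the biological signal captured by the design is kept."""
    full = np.hstack([design, covariates])
    beta, *_ = np.linalg.lstsq(full, expr.T, rcond=None)  # (p + k) x genes
    p = design.shape[1]
    return expr - (covariates @ beta[p:]).T

# Simulated data: a known condition plus a hidden batch that is not in the design
rng = np.random.default_rng(1)
n_genes, n = 300, 40
condition = np.repeat([0.0, 1.0], n // 2)  # hypothetical tumor/normal indicator
batch = np.tile([0.0, 1.0], n // 2)        # hidden batch, unknown to the model
expr = rng.normal(0.0, 0.3, size=(n_genes, n))
expr[:50] += 1.5 * condition               # biological signal
expr[100:250] += 1.0 * batch               # unmodeled batch effect

design = np.column_stack([np.ones(n), condition])
sv = residual_surrogates(expr, design, n_sv=1)
cleaned = remove_covariates(expr, design, sv)
print(f"|corr(SV1, batch)| = {abs(np.corrcoef(sv[:, 0], batch)[0, 1]):.3f}")
```

For differential expression, the first option (adding the surrogate variables as extra columns of the DESeq2 or limma design) is generally preferable to testing on the adjusted matrix, because the adjusted values hide the degrees of freedom spent on estimating the surrogates; the limma documentation describes `removeBatchEffect()` as intended for plotting and exploration rather than as a step before linear modelling.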

Reply by Kevin Blighe, 6.2 years ago

Answer 1 · 2019-10-28

Michael Love

Is there a specific DESeq2 question here? I don't offer general bioinformatics analysis suggestions on the support site.

Comment by Michael Love, 6.2 years ago