batch effect correction for TCGA RNAseq harmonized data
KCLiv ▴ 20
@kcliv-16251

Dear all,

I plan to use TCGA RNA-seq data for my analysis. Since there are two datasets (legacy and harmonized), I am deciding which one to use. To me, the harmonized one seems more standardized (please correct me if I am wrong). As we all know, batch effects are a serious issue. My questions are:

1) Has the harmonized data already been corrected for batch effects? I tried plotting a PCA, but I did not find any confounding pattern by either sequencing center or platform (HiSeq or GA; in the case of colon cancer, some of the samples were sequenced on the GA platform). So, before I go ahead with the analysis, I want to make sure whether the data have already been corrected.

2) If not, is correction really necessary, and which method would be a suitable way to do it?

Thank you very much,

deseq2 limma • 1.7k views

The data processing steps are here: https://docs.gdc.cancer.gov/Data/BioinformaticsPipelines/ExpressionmRNA_Pipeline/

In short, if you obtain the HTSeq count files from the GDC, then you can assume that nothing has been done to account for batch. On the other hand, if you obtain data via some third-party source, like cBioPortal, TCGAbiolinks, etc., then check with those individual sources to see what extra processing (if any) they performed.

Edit: if you want to check for sources of bias in the data, then aim to perform surrogate variable analysis. You can then either account for these via regression modeling, or directly adjust your expression data to eliminate these effects. There are many questions both here and on Biostars about this particular topic.
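
As a rough illustration of the two ideas above (checking whether a known factor such as platform drives the leading principal components, and adjusting the expression values by regression), here is a minimal sketch in Python with NumPy on simulated log-scale data. Everything in it is an assumption for illustration: the function names `pc_batch_r2` and `remove_batch`, the simulated batch labels, and the effect sizes are hypothetical and not part of any GDC or Bioconductor pipeline. It handles a known batch factor only; it is not surrogate variable analysis, which estimates unknown factors and for which the sva package is the usual tool in R. The regression step is similar in spirit to what `removeBatchEffect` in limma does.

```python
import numpy as np


def pc_batch_r2(expr, batch, n_pcs=2):
    """R^2 of each leading principal component regressed on batch labels."""
    batch = np.asarray(batch)
    # Center each gene, then take sample scores from the SVD
    centered = expr - expr.mean(axis=1, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = vt[:n_pcs] * s[:n_pcs, None]
    r2 = []
    for pc in scores:
        # Between-batch sum of squares over total sum of squares
        group_means = np.array([pc[batch == b].mean() for b in batch])
        ss_between = np.sum((group_means - pc.mean()) ** 2)
        ss_total = np.sum((pc - pc.mean()) ** 2)
        r2.append(ss_between / ss_total)
    return np.array(r2)


def remove_batch(expr, batch, design):
    """Regress a known batch factor out of log-scale expression values.

    expr   : genes x samples matrix (log scale)
    batch  : batch label per sample
    design : samples x p matrix (with intercept) of effects to preserve
    """
    batch = np.asarray(batch)
    levels = np.unique(batch)
    # Sum-to-zero coding so the intercept remains the overall mean
    B = np.zeros((batch.size, levels.size - 1))
    for j, lev in enumerate(levels[:-1]):
        B[batch == lev, j] = 1.0
    B[batch == levels[-1], :] = -1.0
    # Fit expression ~ design + batch for all genes at once
    X = np.hstack([design, B])
    coef, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    batch_coef = coef[design.shape[1]:, :]
    # Subtract only the fitted batch component
    return expr - (B @ batch_coef).T


# Simulated data: 200 genes, 40 samples, two platforms, balanced condition
rng = np.random.default_rng(0)
n_genes, n_samples = 200, 40
batch = np.repeat(["HiSeq", "GA"], n_samples // 2)
condition = np.tile([0.0, 1.0], n_samples // 2)
design = np.column_stack([np.ones(n_samples), condition])

expr = rng.normal(size=(n_genes, n_samples))
expr[:, batch == "GA"] += rng.normal(scale=2.0, size=(n_genes, 1))  # gene-specific batch shift
expr[:20] += 1.5 * condition  # biological signal in the first 20 genes

before = pc_batch_r2(expr, batch)
adjusted = remove_batch(expr, batch, design)
after = pc_batch_r2(adjusted, batch)

print("PC1 R^2 with batch, before adjustment:", round(float(before[0]), 3))
print("PC1 R^2 with batch, after adjustment: ", round(float(after[0]), 3))
```

Adjusted values like these are mainly useful for visualization or clustering. For differential expression testing, the "regression modeling" route mentioned above generally means keeping the original counts and adding the batch term (or the surrogate variables) to the design formula instead.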

@mikelove

Is there a specific DESeq2 question here? I don't offer general bioinformatic analysis suggestions on the support site.
