I have thousands of samples from TCGA, retrieved using TCGAbiolinks. I want to remove the batch effect from the datasets. It's mentioned that the batch can be detected from the sample ID itself.
How do we identify the batch info from the sample ID?
My IDs look as follows:
TCGA-LL-A73Y-01A-11R-A33A-13
TCGA-AO-A03U-01B-21R-A10I-13
TCGA-E9-A1NH-01A-11R-A14C-13
TCGA-BH-A1EY-01A-11R-A13P-13
TCGA-AO-A1KS-01A-11R-A13P-13
TCGA-B6-A0I6-01A-11R-A035-13
TCGA-E9-A229-01A-31R-A156-13
TCGA-D8-A27H-01A-11R-A16E-13
TCGA-A2-A0EM-01A-11R-A035-13
TCGA-E2-A1II-01A-11R-A143-13
TCGA-BH-A0H3-01A-11R-A12O-13
TCGA-E2-A1IL-01A-11R-A14C-13
TCGA-BH-A0GY-01A-11R-A057-13
TCGA-BH-A0DG-01A-21R-A12O-13
I have looked at this link to get information on the sample ID, but it does not specifically mention batches.
Is the batch a combination of PlateId, ShipDate, and Tissue Source Site, or can I consider plates or TSS alone as the batch?
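
For reference, here is a minimal sketch of how the IDs above can be split into their fields, assuming the standard seven-part TCGA aliquot barcode layout (project, tissue source site, participant, sample type plus vial, portion plus analyte, plate, center). The names `TcgaBarcode` and `parse_barcode` are hypothetical helpers written for this illustration, not part of TCGAbiolinks. It only extracts the plate and TSS so they can be tabulated as candidate batch variables; it does not decide which one is the batch. ShipDate is not encoded in the barcode, so it would have to come from separate metadata.

```python
from collections import Counter
from typing import NamedTuple


class TcgaBarcode(NamedTuple):
    """Fields of a seven-part TCGA aliquot barcode (hypothetical helper type)."""
    project: str
    tss: str          # tissue source site
    participant: str
    sample_type: str  # e.g. "01" = primary solid tumor
    vial: str
    portion: str
    analyte: str      # e.g. "R" = RNA
    plate: str
    center: str


def parse_barcode(barcode: str) -> TcgaBarcode:
    """Split a barcode such as TCGA-LL-A73Y-01A-11R-A33A-13 into its fields."""
    parts = barcode.strip().split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}: {barcode!r}")
    project, tss, participant, sample_vial, portion_analyte, plate, center = parts
    return TcgaBarcode(
        project=project,
        tss=tss,
        participant=participant,
        sample_type=sample_vial[:2],
        vial=sample_vial[2:],
        portion=portion_analyte[:2],
        analyte=portion_analyte[2:],
        plate=plate,
        center=center,
    )


# The first five IDs from the question.
ids = [
    "TCGA-LL-A73Y-01A-11R-A33A-13",
    "TCGA-AO-A03U-01B-21R-A10I-13",
    "TCGA-E9-A1NH-01A-11R-A14C-13",
    "TCGA-BH-A1EY-01A-11R-A13P-13",
    "TCGA-AO-A1KS-01A-11R-A13P-13",
]

parsed = [parse_barcode(b) for b in ids]

# Count samples per candidate batch variable.
plate_counts = Counter(p.plate for p in parsed)
tss_counts = Counter(p.tss for p in parsed)

print(plate_counts)
print(tss_counts)
```

Tabulating the counts this way shows how many samples share each plate or TSS, which helps check whether either variable is confounded with the biological groups before it is used for batch correction.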
Snijesh, did you work this out? There seems to be no reply. I also need to know the answer to this question, and I do not understand why it is so hard to find when it is such a key problem. William