Question: Which one to pick if there are duplicate TCGA samples?
19 months ago by Biologist70

Hi @Sean Davis and @Martin Morgan,

I have downloaded the TCGA BRCA raw sequencing data (FASTQ files) from the GDC Legacy Archive, 1256 cases in total. I have the UUIDs for them, so I used the "GenomicDataCommons" package to convert the UUIDs to TCGA barcodes. After this conversion I see that some barcodes are duplicated.

 UUID                                  samplenames
 5516dd59-3d95-4bc6-84e7-5719b1bbcabf  TCGA-A7-A26F-01B
 a907f2d1-92ad-4a1b-b439-20e5a7347d5b  TCGA-A7-A26F-01A
 b570a72f-5e6c-4301-923b-9992662409ca  TCGA-A7-A26F-01B
 ba22d7e6-3e70-4a43-9dc1-59069b39e8c2  TCGA-A7-A26F-01B
 eb068925-2dcc-4e18-838f-903ac8d2b661  TCGA-A7-A26F-01A

As you can see, "TCGA-A7-A26F-01B" maps to three UUIDs and "TCGA-A7-A26F-01A" maps to two UUIDs.
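
To make the duplication concrete, here is a minimal sketch (in Python, purely for illustration) that groups the UUIDs listed above by their sample-level barcode. The names `uuid_to_barcode`, `group_by_barcode` and `duplicated_barcodes` are made up for this example and are not part of any package.

```python
from collections import defaultdict

# UUID -> sample-level barcode pairs, copied from the listing above.
uuid_to_barcode = {
    "5516dd59-3d95-4bc6-84e7-5719b1bbcabf": "TCGA-A7-A26F-01B",
    "a907f2d1-92ad-4a1b-b439-20e5a7347d5b": "TCGA-A7-A26F-01A",
    "b570a72f-5e6c-4301-923b-9992662409ca": "TCGA-A7-A26F-01B",
    "ba22d7e6-3e70-4a43-9dc1-59069b39e8c2": "TCGA-A7-A26F-01B",
    "eb068925-2dcc-4e18-838f-903ac8d2b661": "TCGA-A7-A26F-01A",
}


def group_by_barcode(mapping):
    """Return a dict mapping each barcode to the list of UUIDs that share it."""
    groups = defaultdict(list)
    for uuid, barcode in mapping.items():
        groups[barcode].append(uuid)
    return dict(groups)


def duplicated_barcodes(mapping):
    """Keep only the barcodes that are attached to more than one UUID."""
    return {
        barcode: uuids
        for barcode, uuids in group_by_barcode(mapping).items()
        if len(uuids) > 1
    }


if __name__ == "__main__":
    # Print each duplicated barcode with the number of UUIDs attached to it.
    for barcode, uuids in sorted(duplicated_barcodes(uuid_to_barcode).items()):
        print(barcode, len(uuids))
```

The sketch only detects the duplicates; it does not say which UUID to keep, which is what the questions below are about.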

Questions:

1) Is there a way to get the full TCGA barcode including the aliquot, like "TCGA-A7-A26F-01A-21R-A169-07", from the UUIDs? Then I could choose the sample based on the analyte or plate number.

2) From Firebrowse.org I downloaded "gdac.broadinstitute.org_BRCA.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz", which has 1212 samples. Among them there is only one sample for case "TCGA-A7-A26F", namely "TCGA-A7-A26F-01A-21R-A169-07".

3) How can I find out whether a sample is FFPE or not?

4) What should I do when two UUIDs have the same full TCGA barcode, including the same aliquot, as below?

 eb068925-2dcc-4e18-838f-903ac8d2b661 TCGA-A7-A26F-01A-21R-A169-07
 a907f2d1-92ad-4a1b-b439-20e5a7347d5b TCGA-A7-A26F-01A-21R-A169-07
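
For reference on questions 1 and 4, this is a minimal sketch (again in Python, for illustration only) of how a full aliquot barcode splits into its fields, assuming the standard seven-part TCGA layout. `AliquotBarcode` and `parse_aliquot_barcode` are hypothetical names for this example, not functions from GenomicDataCommons or TCGAbiolinks.

```python
from typing import NamedTuple


class AliquotBarcode(NamedTuple):
    """Fields of a full TCGA aliquot barcode (assumed seven-part layout)."""
    project: str      # e.g. "TCGA"
    tss: str          # tissue source site, e.g. "A7"
    participant: str  # e.g. "A26F"
    sample_type: str  # two digits, e.g. "01" (primary solid tumor)
    vial: str         # one letter, e.g. "A"
    portion: str      # two digits, e.g. "21"
    analyte: str      # one letter, e.g. "R" (RNA)
    plate: str        # e.g. "A169"
    center: str       # e.g. "07"


def parse_aliquot_barcode(barcode: str) -> AliquotBarcode:
    """Split a barcode such as TCGA-A7-A26F-01A-21R-A169-07 into its fields."""
    parts = barcode.split("-")
    if len(parts) != 7:
        raise ValueError(f"expected 7 dash-separated fields, got {len(parts)}: {barcode!r}")
    project, tss, participant, sample, portion_analyte, plate, center = parts
    # The 4th field packs sample type + vial; the 5th packs portion + analyte.
    if len(sample) != 3 or len(portion_analyte) != 3:
        raise ValueError(f"unexpected sample or portion field in {barcode!r}")
    return AliquotBarcode(
        project, tss, participant,
        sample[:2], sample[2],
        portion_analyte[:2], portion_analyte[2],
        plate, center,
    )


if __name__ == "__main__":
    fields = parse_aliquot_barcode("TCGA-A7-A26F-01A-21R-A169-07")
    print(fields.analyte, fields.plate)
```

Both UUIDs in question 4 parse to identical fields, so the barcode alone cannot separate them.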

Thank you

rnaseq R tcga tcgabiolinks gdc