The Genomic Data Commons (GDC) hosts a gene-wise copy number summary for each cancer type, with genes as rows and samples as columns. The column headings are aliquot UUIDs. How can these be matched to other data types, such as a MAF file of SNVs that uses TCGA barcodes as the sample identifier?
Hi, thank you very much for this library.
I have noticed a mismatch for some UUIDs when converting from file_id. For example, '56467ebd-af89-4413-84b5-1e00699a2744' returns 'TCGA-2L-AAQM-01A-11D-A396-01', but the GDC portal returns 'TCGA-IB-A5SO' instead. I am converting the masked copy number segment data, and I noticed that a number of these mismatches come from cases with multiple aliquots. Could you confirm this, or have I done something wrong?
My code is simply: UUIDtoBarcode('56467ebd-af89-4413-84b5-1e00699a2744', fromtype = "fileid")
Thank you in advance.
Hi e0338272, Thank you for your report. I will look into this today. It seems like the function should be returning multiple identifiers. I'll check the package's tests. Follow this issue for updates: https://github.com/waldronlab/TCGAutils/issues/24 Best, Marcel
It seems the UUID you have, '56467ebd-af89-4413-84b5-1e00699a2744', is the ID of a file that contains multiple barcodes (https://imgshare.io/image/vYlHe)
Thanks, this has been fixed.
-Marcel
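
For readers who want to line these identifiers up with a MAF file, here is a minimal base R sketch of the two ideas discussed above: a single file ID can correspond to several barcodes, and full aliquot barcodes can be truncated to a shared level before joining. It is only an illustration under stated assumptions and does not show how TCGAutils works internally. The `map` and `maf` tables, the `file-1` / `file-2` IDs, and every barcode except the one quoted in the thread are made up. `Tumor_Sample_Barcode` is the usual MAF column name, but check your own file.

```r
# Hypothetical lookup table: one file ID may list several aliquot barcodes.
# Only the first barcode comes from the thread; the rest are invented.
map <- data.frame(
  file_id = c("file-1", "file-1", "file-2"),
  barcode = c("TCGA-2L-AAQM-01A-11D-A396-01",
              "TCGA-2L-AAQM-10A-01D-A399-01",
              "TCGA-XX-0000-01A-11D-0000-01"),
  stringsAsFactors = FALSE
)

# Keep every barcode per file instead of only the first match.
barcodes_by_file <- split(map$barcode, map$file_id)

# Truncate aliquot barcodes to a shared level:
# characters 1-12 identify the participant, 1-16 the sample plus vial.
map$participant <- substr(map$barcode, 1, 12)
map$sample <- substr(map$barcode, 1, 16)

# Characters 14-15 are the sample type code (01-09 tumor, 10-19 normal).
map$is_tumor <- as.integer(substr(map$barcode, 14, 15)) < 10

# Hypothetical MAF excerpt with one mutation in the thread's tumor aliquot.
maf <- data.frame(
  Hugo_Symbol = "KRAS",
  Tumor_Sample_Barcode = "TCGA-2L-AAQM-01A-11D-A396-01",
  stringsAsFactors = FALSE
)
maf$sample <- substr(maf$Tumor_Sample_Barcode, 1, 16)

# Join on the truncated sample barcode, using tumor aliquots only.
matched <- merge(map[map$is_tumor, ], maf, by = "sample")
```

Joining at the sample level rather than on the full aliquot barcode is a deliberate choice here: copy number and SNV data are often generated from different aliquots of the same sample, so their full barcodes may differ after character 16.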