Matching TCGA Aliquot ID to UUID or Barcode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 19 days ago
Australia

Genomic Data Commons hosts a gene-wise copy number summary for each cancer, which has genes as rows and samples as columns. The column headings are aliquot UUIDs. How may these be matched to other data types, such as a MAF file of SNVs which contains TCGA barcodes as the sample identifier?

TCGAutils TCGAbiolinks GenomicDataCommons • 1.9k views
@marcel-ramos-7325
Last seen 22 days ago
United States

Hi Dario, Thanks for your question. I've added support for this in TCGAutils 1.5.5.

library(TCGAutils)
UUIDtoBarcode("d85d8a17-8aea-49d3-8a03-8f13141c163b", "aliquot_ids")
#>            analytes.aliquots.aliquot_id analytes.aliquots.submitter_id
#> 13 d85d8a17-8aea-49d3-8a03-8f13141c163b   TCGA-CV-5443-01A-01D-1510-01


Created on 2019-07-17 by the reprex package (v0.3.0)
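
Once each aliquot UUID has a barcode, the copy number columns can be relabelled and compared with the MAF. The following is only a sketch with toy data: `uuid_map`, `cn`, `maf_barcodes` and `sample_id` are hypothetical names, the second UUID and both the second copy number barcode and the MAF barcode are made up, and the mapping table is assumed to have been built from `UUIDtoBarcode()` output with its columns renamed. Because different assays are often run on different aliquots of the same sample, the comparison here is done on the first 15 characters of the barcode (project, tissue source site, participant and sample type) rather than on the full aliquot barcode.

```r
# Toy mapping of aliquot UUID -> barcode (second row is invented for illustration)
uuid_map <- data.frame(
  aliquot_id = c("d85d8a17-8aea-49d3-8a03-8f13141c163b",
                 "00000000-0000-0000-0000-000000000002"),
  barcode = c("TCGA-CV-5443-01A-01D-1510-01",
              "TCGA-XX-0002-01A-01D-0000-01"),
  stringsAsFactors = FALSE
)

# Toy gene-by-aliquot copy number matrix with UUID column names
cn <- matrix(c(0, 1, -1, 0), nrow = 2,
             dimnames = list(c("GENE1", "GENE2"),
                             c("00000000-0000-0000-0000-000000000002",
                               "d85d8a17-8aea-49d3-8a03-8f13141c163b")))

# Look up each column's UUID in the mapping and fail early if any is missing
idx <- match(colnames(cn), uuid_map$aliquot_id)
stopifnot(!anyNA(idx))
colnames(cn) <- uuid_map$barcode[idx]

# Invented MAF barcode: same sample, different aliquot (plate and center differ)
maf_barcodes <- c("TCGA-CV-5443-01A-01D-1512-08")

# Truncate to the sample level, e.g. "TCGA-CV-5443-01"
sample_id <- function(x) substr(x, 1, 15)

# Samples present in both data types
shared <- intersect(sample_id(colnames(cn)), sample_id(maf_barcodes))
```

If vial-level matching is needed instead, the same helper could keep 16 characters (e.g. `TCGA-CV-5443-01A`).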

Hi, thank you very much for this library.

I seem to have noticed a mismatch for some UUIDs when converting from file_id. For example, '56467ebd-af89-4413-84b5-1e00699a2744' returns 'TCGA-2L-AAQM-01A-11D-A396-01', but the GDC portal returns 'TCGA-IB-A5SO' instead. I am converting the masked copy number segment data, and I noticed that a number of these mismatches come from cases with multiple aliquots. Could you confirm this, or have I perhaps done something wrong?

My code is simply: UUIDtoBarcode('56467ebd-af89-4413-84b5-1e00699a2744', from_type = "file_id")

Hi e0338272, Thank you for your report. I will look into this today. It seems like the function should be returning multiple identifiers. I'll check the package's tests. Follow this issue for updates: https://github.com/waldronlab/TCGAutils/issues/24 Best, Marcel

It seems the UUID you have, '56467ebd-af89-4413-84b5-1e00699a2744', is a file ID for a file that contains multiple barcodes (https://imgshare.io/image/vYlHe)

Thanks, this has been fixed.
-Marcel

@tiago-chedraoui-silva-8877
Last seen 21 months ago
Brazil - University of São Paulo/ Los A…

In TCGAbiolinks, when reading the copy number data we use the GDC API to map the aliquot id to barcode: https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/master/R/prepare.R#L1182-L1211

It looks like the function takes a barcode as input and returns the aliquot ID. What about converting an aliquot ID to a barcode?

Hi Dario, I just tested Marcel's code and it works fine. I think the easiest way would be to use TCGAutils.

With my code, you would need to change the filter from barcode to aliquot ID.

https://github.com/BioinformaticsFMRP/TCGAbiolinks/blob/master/R/prepare.R#L1190 -> change cases.submitter_id to samples.portions.analytes.aliquots.aliquot_id. But this would give you all aliquots/barcodes for a patient.

So Marcel's code is a better solution.
