Good morning,
I am struggling to complete a simple task using TCGA-related packages.
I need to obtain a manifest with all TCGA sample IDs (normal and primary tumor) and patient IDs that satisfy the following conditions:
Whole exome sequencing + DNA methylation profile (450k) + Gene expression profile (RNA-seq)
Is it possible to submit a comprehensive query for such a request, without going through a series of single manifests, filtering and merging?
Moreover, the use of case UUIDs and their relationship to the TCGA barcodes adopted in the past is still a little tricky for me, so if anyone can point me to a good resource to understand this better, that would be great.
Thank you for your attention and help,
Gian
Given the way TCGAbiolinks is structured, it is not possible to run the requested query in one step. You would need to go through a series of single manifests, filtering and merging. I'm not sure which WXS data you wanted, or which RNA-seq, but the code below can be easily modified.
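To make the "filter and merge" step concrete, here is a minimal sketch in plain Python. It is not TCGAbiolinks code: the barcodes are made up, and `patient_id`, `patients_with_all_assays` and `filter_manifest` are hypothetical helper names. It only assumes that you already have one list of aliquot barcodes per data type (for example, from three separate manifests) and that the patient is identified by the first three barcode fields.

```python
# Hypothetical aliquot barcodes per data type. Real lists would come from
# three separate manifests or queries (WXS, 450k methylation, RNA-seq).
barcodes_by_assay = {
    "WXS": [
        "TCGA-AA-0001-01A-01D-0001-08",  # patient 0001, primary tumor
        "TCGA-AA-0001-10A-01D-0001-08",  # patient 0001, blood normal
        "TCGA-AA-0002-01A-01D-0001-08",  # patient 0002, primary tumor
    ],
    "450k": [
        "TCGA-AA-0001-01A-01D-0002-05",
        "TCGA-AA-0002-01A-01D-0002-05",
        "TCGA-AA-0003-01A-01D-0002-05",
    ],
    "RNA-seq": [
        "TCGA-AA-0001-01A-01R-0003-07",
        "TCGA-AA-0001-11A-01R-0003-07",  # patient 0001, solid tissue normal
        "TCGA-AA-0003-01A-01R-0003-07",
    ],
}


def patient_id(barcode):
    """Return the patient-level prefix: project-TSS-participant."""
    return "-".join(barcode.split("-")[:3])


def patients_with_all_assays(by_assay):
    """Patients that have at least one aliquot in every data type."""
    patient_sets = [{patient_id(b) for b in codes} for codes in by_assay.values()]
    return set.intersection(*patient_sets)


def filter_manifest(by_assay, keep):
    """Keep only the aliquots whose patient is in `keep`."""
    return {
        assay: [b for b in codes if patient_id(b) in keep]
        for assay, codes in by_assay.items()
    }


shared = patients_with_all_assays(barcodes_by_assay)
merged = filter_manifest(barcodes_by_assay, shared)

for assay, codes in merged.items():
    print(assay, codes)
```

The intersection is taken at the patient level, so tumor and normal aliquots of a shared patient are both kept; restricting to specific sample types would be an extra filter on the fourth barcode field.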
The barcode was designed to be human-readable and to carry information about the sample (center, TSS, etc.), whereas the UUID carries no such information. So far, I don't know of a trivial transformation between the two.
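To illustrate what "readable" means here, the sketch below splits a barcode into its fields in plain Python. The field layout (project, TSS, participant, sample and vial, portion and analyte, plate, center) follows the published TCGA barcode scheme; `parse_barcode` is a hypothetical helper, and only the three sample type codes relevant to this thread are listed. Truncated barcodes are handled by simply returning the fields that are present.

```python
# Sample type codes relevant to this thread (the full TCGA code table is longer).
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}


def parse_barcode(barcode):
    """Split a TCGA barcode into named fields; works on truncated barcodes."""
    parts = barcode.split("-")
    if len(parts) < 3 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA barcode: {barcode!r}")

    fields = {
        "project": parts[0],
        "tss": parts[1],            # tissue source site
        "participant": parts[2],
        "patient": "-".join(parts[:3]),
    }
    if len(parts) > 3:              # e.g. "01C": sample type code + vial letter
        fields["sample_code"] = parts[3][:2]
        fields["vial"] = parts[3][2:] or None
        fields["sample_type"] = SAMPLE_TYPES.get(parts[3][:2], "other")
    if len(parts) > 4:              # e.g. "01D": portion number + analyte letter
        fields["portion"] = parts[4][:2]
        fields["analyte"] = parts[4][2:] or None
    if len(parts) > 5:
        fields["plate"] = parts[5]
    if len(parts) > 6:
        fields["center"] = parts[6]
    return fields


full = parse_barcode("TCGA-02-0001-01C-01D-0182-01")
truncated = parse_barcode("TCGA-02-0001")

print(full)
print(truncated)
```

A UUID has no comparable structure, which is why going from one identifier to the other needs a lookup rather than string manipulation.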
I'm not sure if this will be helpful to you, but the "sampleMap" files in ExperimentHub (created for use by curatedTCGAData) contain the aliquot and case barcodes, and the TCGAutils package provides a utility for filtering by sample type. There's a warning in the example below because some barcodes are truncated (these get kept in the sample type filtering):
One more note, TCGAutils also provides simplified utilities for mapping barcode <--> UUID and UUID <--> UUID, including:
For example (note, these functions use the GDC REST API, and not all kinks have been worked out, so if you try this on all barcodes some will return errors):
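For a rough idea of what a barcode-to-UUID lookup against the GDC REST API involves, here is a sketch in plain Python that only builds the request bodies and does not send them. This is not the TCGAutils implementation. The endpoint URL and the `submitter_id` and `case_id` field names are my assumptions about the GDC `cases` endpoint and should be checked against the GDC API documentation; `build_case_lookup` and the barcodes are hypothetical. Splitting the barcodes into small batches is one way to find out which ones trigger errors.

```python
import json

# Assumed GDC endpoint for case-level records; verify against the GDC API docs.
GDC_CASES_ENDPOINT = "https://api.gdc.cancer.gov/cases"


def build_case_lookup(patient_barcodes, chunk_size=100):
    """Build one JSON request body per batch of patient barcodes.

    Each body asks for the case UUID (`case_id`, assumed field name) of the
    cases whose `submitter_id` (assumed to hold the patient barcode) is in
    the batch. Nothing is sent over the network here.
    """
    payloads = []
    for start in range(0, len(patient_barcodes), chunk_size):
        chunk = patient_barcodes[start:start + chunk_size]
        payloads.append({
            "filters": {
                "op": "in",
                "content": {"field": "submitter_id", "value": chunk},
            },
            "fields": "submitter_id,case_id",
            "format": "JSON",
            "size": str(len(chunk)),
        })
    return payloads


# Hypothetical patient barcodes, batched two at a time.
payloads = build_case_lookup(
    ["TCGA-AA-0001", "TCGA-AA-0002", "TCGA-AA-0003"], chunk_size=2
)
print(GDC_CASES_ENDPOINT)
print(json.dumps(payloads[0], indent=2))
```

Each body would be sent as a POST with a JSON content type; if a batch fails, it can be retried in smaller batches to isolate the problematic barcodes.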
Thank you very much for both your answers, that was exactly what I needed!
Best