How to obtain clinical data from TCGA via Bioconductor GenomicDataCommons
@0df7ded5
Dear community,
I am completely new to TCGA and Bioconductor, and I am confused about how to obtain more clinical data (e.g. for survival analysis, gender, RNA-seq read count data, ...) for some cases I have. For every "patient" I have:
gdc_file_uuid (e.g. 52F6329C-CDC6-4196-A4A0-58952332905C)
filename (e.g. UNCID_1552290.d6b7779f-a245-48ee-b9a8-2570c023a531.sorted_genome_alignments.bam)
case_uuid (e.g. 2be42cc2-9b97-4821-afc2-d1e42eb3932d)
How can I use this in the R package GenomicDataCommons to get more clinical data?
I would be glad for any help!
Kind regards,
Hashirama
TCGA
GenomicDataCommons
@sean-davis-490
The GenomicDataCommons package can take a set of case UUIDs and return quite a bit of clinical detail. See available_expand(cases()) for the types of data that can be returned. Here is some code to get you started:
library(GenomicDataCommons)
cases() %>%
  expand(c('diagnoses', 'demographic', 'diagnoses.pathology_details')) %>%
  GenomicDataCommons::filter(case_id %in% c("2be42cc2-9b97-4821-afc2-d1e42eb3932d")) %>%
  results() %>%
  tibble::as_tibble() %>%
  dplyr::glimpse()
Results:
Rows: 1
Columns: 22
$ id                      <chr> "2be42cc2-9b97-4821-afc2-d1e42eb3932d"
$ slide_ids               <named list> <"9a182c4a-6085-4829-a3d0-c46114f0875b", "4236…
$ submitter_slide_ids     <named list> <"TCGA-HZ-7926-01Z-00-DX1", "TCGA-HZ-79…
$ disease_type            <chr> "Ductal and Lobular Neoplasms"
$ analyte_ids             <named list> <"05fce9a0-fa4d-4a30-ad33-a4f04bf84abf"…
$ submitter_id            <chr> "TCGA-HZ-7926"
$ submitter_analyte_ids   <named list> <"TCGA-HZ-7926-01A-11R", "TCGA-HZ-7926-10A-01W…
$ aliquot_ids             <named list> <"1925e7c2-1730-48a4-8257-772fc4448d9b"…
$ submitter_aliquot_ids   <named list> <"TCGA-HZ-7926-10A-01D-2153-01", "TCGA-HZ-7926…
$ diagnoses               <named list> [<data.frame[1 x 28]>]
$ diagnosis_ids           <named list> "f172c483-6888-5e06-9e5c-0b2bb4be64dd"
$ created_datetime        <lgl> NA
$ sample_ids              <named list> <"8b7bd592-74f0-48e3-9e21-8005ab8d419e"…
$ demographic             <df[,14]> <data.frame[1 x 14]>
$ submitter_sample_ids    <named list> <"TCGA-HZ-7926-01A", "TCGA-HZ-7926-10A"…
$ submitter_diagnosis_ids <named list> "TCGA-HZ-7926_diagnosis"
$ primary_site            <chr> "Pancreas"
$ updated_datetime        <chr> "2019-08-06T14:42:37.317113-05:00"
$ case_id                 <chr> "2be42cc2-9b97-4821-afc2-d1e42eb3932d"
$ portion_ids             <named list> <"de913076-84e6-4ed7-8f2f-16cdd2a7f7b0"…
$ state                   <chr> "released"
$ submitter_portion_ids   <named list> <"TCGA-HZ-7926-01A-11", "TCGA-HZ-7926-1…
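The diagnoses and demographic columns above are nested data frames, so for survival analysis it may be easier to work with flat tables. As a rough sketch (not part of the original answer): the package also provides gdc_clinical(), which takes a character vector of case UUIDs and returns a list of data frames (main, diagnoses, demographic, exposures). The helper names fetch_clinical() and survival_table() below are made up for illustration, and the column names vital_status, days_to_death and days_to_last_follow_up are assumed to be present for your cases; check names(clin$demographic) and names(clin$diagnoses) first.

```r
# Hypothetical helper: fetch flat clinical tables for a vector of case UUIDs.
# Requires the GenomicDataCommons package and network access when called.
fetch_clinical <- function(case_uuids) {
  GenomicDataCommons::gdc_clinical(case_uuids)
}

# Hypothetical helper: build one row per case with gender, a survival time
# and an event indicator.
# Assumes the columns vital_status, days_to_death (demographic) and
# days_to_last_follow_up (diagnoses) exist in the returned tables.
survival_table <- function(clin) {
  demo <- clin$demographic
  diag <- clin$diagnoses

  # A case can have several diagnosis rows; keep the first one per case
  diag <- diag[!duplicated(diag$case_id), , drop = FALSE]

  out <- merge(
    demo[, c("case_id", "gender", "vital_status", "days_to_death")],
    diag[, c("case_id", "days_to_last_follow_up")],
    by = "case_id", all.x = TRUE
  )

  # event = 1 for deceased cases, 0 otherwise (NA if vital status is missing)
  dead <- tolower(out$vital_status) == "dead"
  out$event <- as.integer(dead)

  # time = days to death for events, days to last follow-up for censored cases
  out$time <- ifelse(dead, out$days_to_death, out$days_to_last_follow_up)
  out
}
```

With the case from the question, this would be called as survival_table(fetch_clinical("2be42cc2-9b97-4821-afc2-d1e42eb3932d")); the resulting time and event columns can then be passed to something like survival::Surv(time, event).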
cross-posted: https://www.biostars.org/p/9499402