Missing clinical data in TCGAbiolinks data
Talip ▴ 10
@talip-zengin-14290
Türkiye

Hello, I use TCGAbiolinks routinely to analyse molecular abnormalities with heatmaps and survival plots, using the commands below. For the past two days, TCGAanalyze_survival has failed with the error below because the days_to_death column is missing from the expression data downloaded and prepared with GDCquery, GDCdownload and GDCprepare. What has changed in the last two days, and how can I solve this problem?

query_exp2 <- GDCquery(project = paste0("TCGA-", cancer),
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "HTSeq - Counts",
                       sample.type = "Primary solid Tumor",
                       barcode = uniq_tsb_exp)

GDCdownload(query_exp2, files.per.chunk = 100)

GeneExp_paired2 <- GDCprepare(query_exp2, save = TRUE, save.filename = paste0(cancer, "_GeneExp_paired2.rda"))

TCGAanalyze_survival(data = colData(GeneExp_paired2),
                     clusterCol = "subtype_iCluster.Group",
                     main = "TCGA Kaplan-Meier Survival Plot for Consensus Clusters",
                     legend = "RNA Group",
                     height = 10,
                     risk.table = FALSE,
                     conf.int = FALSE,
                     color = c("black","red","blue","green3"),
                     filename = paste0(cancer, "_survival_expression_subtypes0.png"))

Error in TCGAanalyze_survival(data = colData(GeneExp_paired2), clusterCol = "subtype_iCluster.Group", : Columns vital_status, days_to_death and days_to_last_follow_up should be in data frame

I tried the same code with two versions of TCGAbiolinks, but both gave the same error.

package.version("TCGAbiolinks")
[1] "2.9.2"
GeneExp_paired2$days_to_death
NULL

package.version("TCGAbiolinks")
[1] "2.10.5"
GeneExp_paired2$days_to_death
NULL

I also tried to get the days_to_death column from the clinical data, but it is missing there, too. The code I used is below:

clinic <- GDCquery_clinic(project = paste0("TCGA-", cancer), type = "clinical")
clinic$days_to_death
NULL
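A quick way to confirm which of the columns named in the error above are absent is to compare them against the column names of the data frame before calling TCGAanalyze_survival. The following is a minimal sketch in base R; the toy data frame and its values are hypothetical and only stand in for as.data.frame(colData(GeneExp_paired2)).

```r
# Column names are taken from the error message in this post.
required_cols <- c("vital_status", "days_to_death", "days_to_last_follow_up")

# Hypothetical stand-in for as.data.frame(colData(GeneExp_paired2)).
toy_clinical <- data.frame(
  barcode = c("P1", "P2", "P3"),
  vital_status = c("Dead", "Alive", "Alive"),
  days_to_last_follow_up = c(NA, 820, 1310),
  stringsAsFactors = FALSE
)

# setdiff() returns the required names that are absent from the data frame.
missing_cols <- setdiff(required_cols, colnames(toy_clinical))

if (length(missing_cols) > 0) {
  message("Missing survival columns: ", paste(missing_cols, collapse = ", "))
} else {
  message("All survival columns are present.")
}
```

With real data, replacing toy_clinical with the prepared colData should show whether days_to_death is the only column affected.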

Thanks in advance.

TCGAbiolinks days_to_death clinical data missing column • 1.6k views

The API is no longer returning days to death. It also seems to have been removed from some of the documentation (https://gdc.cancer.gov/clinical-data-elements), although year of death has been added. I sent an email to GDC to check, but for the moment the function will not work without that information.

Thanks for noticing the problem.


I wonder whether "Days To Last Followup" could serve the same purpose as days to death when paired with "Vital status", i.e. for a patient who has passed away, days to the last follow-up = days to death.
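As a rough illustration of that idea, the sketch below recreates a days_to_death column from days_to_last_follow_up for patients whose vital status is "dead". It is only a sketch under an unconfirmed assumption: that GDC fills in days_to_last_follow_up for deceased patients, which nothing in this thread verifies. The toy data frame and its values are hypothetical.

```r
# Hypothetical clinical table without a days_to_death column.
toy_clinical <- data.frame(
  barcode = c("P1", "P2", "P3", "P4"),
  vital_status = c("Dead", "Alive", "Dead", "Alive"),
  days_to_last_follow_up = c(400, 820, 95, NA),
  stringsAsFactors = FALSE
)

# Only fill the column in if it is absent, so real values are never overwritten.
if (!"days_to_death" %in% colnames(toy_clinical)) {
  # Compare case-insensitively, since the capitalisation may vary.
  is_dead <- tolower(toy_clinical$vital_status) == "dead"
  # ASSUMPTION: for deceased patients, last follow-up coincides with death.
  toy_clinical$days_to_death <- ifelse(
    is_dead,
    toy_clinical$days_to_last_follow_up,
    NA_real_
  )
}

print(toy_clinical[, c("barcode", "vital_status", "days_to_death")])
```

If days_to_last_follow_up turns out to be empty for deceased patients, this workaround yields NA and cannot replace the original column.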

@tiago-chedraoui-silva-8877
Brazil - University of São Paulo/ Los A…

There was a bug in the GDC API: while they were making changes, the data was removed. They have fixed it, and the metadata should now be correct.


Thanks very much for your quick reply and effort. It is very appreciated :)
