Question: TCGAbiolinks gives error when parsing a txt file (instead of expected xml)
18 months ago
andre.verissimo wrote:

I was trying to download clinical data from the TCGA-KIRC project and it is failing.

I found out it was downloading a TXT file and trying to read it as XML (id: 64a1b6e7-d037-4502-bbad-0d07849fc32e, file: nationwidechildrens.org_clinical_nte_kirc.txt).


>   gdc$clinical        <- GDCprepare_clinic(query$clinical, clinical.info = 'patient')
  |==                                                                                                          |   2%Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) : 
  Start tag expected, '<' not found [4]

The code to replicate is below, using Bioconductor 3.7 and TCGAbiolinks 2.8.1:

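library(TCGAbiolinks)

project <- 'TCGA-KIRC'
query <- list()
# Query all clinical files for the project (no file-type filter)
query$clinical <- GDCquery(project = project,
                           data.category = "Clinical")

download.out <- GDCdownload(query$clinical, method = 'api')

gdc <- list()
# Fails here: one of the downloaded files is a TXT file, not XML
gdc$clinical <- GDCprepare_clinic(query$clinical, clinical.info = 'patient')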
18 months ago by andre.verissimo

I believe the same question was also asked on Biostars:

andre.verissimo
Answer: TCGAbiolinks gives error when parsing a txt file (instead of expected xml)
18 months ago
Brazil - University of São Paulo/ Los Angeles - Cedars-Sinai Medical Center
Tiago Chedraoui Silva wrote:

Hello, I fixed the documentation yesterday.

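It seems the parsed TXT files were added to the same group as the XML files, so the unfiltered query returns both and GDCprepare_clinic tries to parse the TXT files as XML.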
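
If you want to confirm which downloaded files are the problem, here is a rough base R sketch. The helper name `find_non_xml_files` is made up for illustration and is not part of TCGAbiolinks. It only checks whether each file starts with `<`, which is the same thing the parser error complains about, so treat it as a heuristic and not as real XML validation.

```r
# Hypothetical helper (not part of TCGAbiolinks): return the files under
# `dir` whose first non-blank character is not '<', i.e. files that an
# XML parser would reject with "Start tag expected, '<' not found".
find_non_xml_files <- function(dir) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  looks_like_xml <- vapply(files, function(path) {
    # The first few lines are enough to see how the file starts.
    head_lines <- readLines(path, n = 5, warn = FALSE)
    first_text <- trimws(paste(head_lines, collapse = ""))
    startsWith(first_text, "<")
  }, logical(1))
  files[!looks_like_xml]
}
```

Calling `find_non_xml_files("GDCdata")` should list the TXT files, assuming the download went to `GDCdata`, which I believe is the default directory for GDCdownload.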

You need to add file.type = "xml" as a filter in the query:

 query <- GDCquery(project = 'TCGA-KIRC', data.category = "Clinical", file.type = "xml")
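
For completeness, here is a sketch of the whole query, download and prepare sequence with the filter applied. The wrapper function `prepare_kirc_clinical` is a made-up name used only so that nothing touches the network until you call it. The `file.type` argument is the one described in this answer for the TCGAbiolinks version discussed in this thread; other releases may expose this filter differently, so check `?GDCquery` for your installed version.

```r
# Hypothetical wrapper: defining it does nothing; calling it needs the
# TCGAbiolinks package installed and a reachable GDC server.
prepare_kirc_clinical <- function(project = "TCGA-KIRC") {
  # Restrict the query to XML files so the parsed TXT files are excluded.
  query <- TCGAbiolinks::GDCquery(
    project       = project,
    data.category = "Clinical",
    file.type     = "xml"
  )
  # Download the matching files through the GDC API.
  TCGAbiolinks::GDCdownload(query, method = "api")
  # Parse the patient-level clinical information from the XML files.
  TCGAbiolinks::GDCprepare_clinic(query, clinical.info = "patient")
}
```

Then `clinical <- prepare_kirc_clinical()` should return the patient table once the GDC server is reachable.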

18 months ago by Tiago Chedraoui Silva
Thanks for the reply. I've been trying to test this so I can mark the question as resolved, but it keeps reporting `GDC server down, try to use this package later`. I will comment back once it works.
17 months ago by andre.verissimo
