Problem downloading TCGA projects with the method "client"
sann • 0
Last seen 20 months ago

Hi, I am trying to download TCGA projects with the GDCdownload function. The download fails with an error when I use method = "client", while method = "api" works fine. I would prefer the "client" method because it is supposed to be more stable than the "api" method. The code below reproduces the error.

The following code is copied from the terminal:

> library(TCGAbiolinks)
> query <-
+         GDCquery(
+             project = "TCGA-ESCA",
+             data.category = "Transcriptome Profiling",
+             data.type = "Gene Expression Quantification",
+             workflow.type = "HTSeq - Counts"
+         )
o GDCquery: Searching in GDC database
Genome of reference: hg38
oo Accessing GDC. This might take a while...
ooo Project: TCGA-ESCA
oo Filtering results
ooo By data.type
ooo By workflow.type
oo Checking data
ooo Check if there are duplicated cases
ooo Check if there results for the query
o Preparing output
> GDCdownload(query, method = "client")
Downloading data for project TCGA-ESCA
trying URL ''
Content type 'application/zip' length 15221595 bytes (14.5 MB)
downloaded 14.5 MB

Error in unzip(basename(bin)) : invalid zip name argument
In addition: Warning message:
In if (grepl("^https?://", url)) { :
  the condition has length > 1 and only the first element will be used
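
One possible reading of the two messages above, sketched in base R. The warning says the `url` passed to the download step had length greater than 1, and `unzip()` rejects a zip name that is not a single string. The vector `bin` below is a hypothetical stand-in with made-up URLs; this is an inference from the messages, not something confirmed against the TCGAbiolinks source.

```r
# Hypothetical stand-in: assume the download step received two URLs
# instead of one (both URLs are invented for illustration).
bin <- c("https://example.org/client_a.zip",
         "https://example.org/client_b.zip")

# basename() is vectorised, so it returns two file names.
zip_names <- basename(bin)
length(zip_names)

# unzip() expects exactly one zip file name; capture the error message
# instead of stopping the script.
result <- tryCatch(
  unzip(zip_names),
  error = function(e) conditionMessage(e)
)
print(result)
```

If this reading is right, the error comes from the number of matched download links rather than from the query itself, which would also explain why the same failure appears on a different operating system.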

At this point the script stops: the TCGA project data is not downloaded, so I cannot work with it. The same code with method = "api" runs, but the "api" method is less stable.

Here is the output of sessionInfo():

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TCGAbiolinks_2.15.3

loaded via a namespace (and not attached):
  [1] pkgcond_0.1.0               colorspace_1.4-1
  [3] ggsignif_0.6.0              selectr_0.4-2
  [5] hwriter_1.3.2               testextra_0.1.0.1
  [7] XVector_0.26.0              GenomicRanges_1.38.0
  [9] ggpubr_0.2.5                ggrepel_0.8.1
 [11] bit64_0.9-7                 AnnotationDbi_1.48.0
 [13] xml2_1.2.2                  codetools_0.2-16
 [15] splines_3.6.2               R.methodsS3_1.8.0
 [17] doParallel_1.0.15           DESeq_1.38.0
 [19] geneplotter_1.64.0          knitr_1.28
 [21] jsonlite_1.6.1              Rsamtools_2.2.3
 [23] km.ci_0.5-2                 broom_0.5.5
 [25] annotate_1.64.0             dbplyr_1.4.2
 [27] png_0.1-7                   R.oo_1.23.0
 [29] readr_1.3.1                 compiler_3.6.2
 [31] httr_1.4.1                  backports_1.1.5
 [33] assertthat_0.2.1            Matrix_1.2-18
 [35] limma_3.42.2                prettyunits_1.1.1
 [37] tools_3.6.2                 gtable_0.3.0
 [39] glue_1.3.1                  GenomeInfoDbData_1.2.2
 [41] dplyr_0.8.4                 ggthemes_4.2.0
 [43] rappdirs_0.3.1              ShortRead_1.44.3
 [45] Rcpp_1.0.3                  Biobase_2.46.0
 [47] vctrs_0.2.3                 Biostrings_2.54.0
 [49] nlme_3.1-144                rtracklayer_1.46.0
 [51] iterators_1.0.12            xfun_0.12
 [53] stringr_1.4.0               testthat_2.3.1
 [55] rvest_0.3.5                 lifecycle_0.1.0
 [57] XML_3.99-0.3                edgeR_3.28.1
 [59] zoo_1.8-7                   postlogic_0.1.0.1
 [61] zlibbioc_1.32.0             scales_1.1.0
 [63] aroma.light_3.16.0          hms_0.5.3
 [65] parallel_3.6.2              SummarizedExperiment_1.16.1
 [67] RColorBrewer_1.1-2          curl_4.3
 [69] memoise_1.1.0               gridExtra_2.3
 [71] KMsurv_0.1-5                ggplot2_3.3.0
 [73] downloader_0.4              biomaRt_2.42.0
 [75] latticeExtra_0.6-29         stringi_1.4.6
 [77] RSQLite_2.2.0               genefilter_1.68.0
 [79] S4Vectors_0.24.3            foreach_1.4.8
 [81] GenomicFeatures_1.38.2      BiocGenerics_0.32.0
 [83] BiocParallel_1.20.1         GenomeInfoDb_1.22.0
 [85] rlang_0.4.4                 pkgconfig_2.0.3
 [87] matrixStats_0.55.0          bitops_1.0-6
 [89] lattice_0.20-38             purrr_0.3.3
 [91] GenomicAlignments_1.22.1    bit_1.1-15.2
 [93] tidyselect_1.0.0            plyr_1.8.5
 [95] magrittr_1.5                R6_2.4.1
 [97] IRanges_2.20.2              generics_0.0.2
 [99] DelayedArray_0.12.2         DBI_1.1.0
[101] mgcv_1.8-31                 pillar_1.4.3
[103] survival_3.1-8              RCurl_1.98-1.1
[105] tibble_2.1.3                EDASeq_2.20.0
[107] crayon_1.3.4                survMisc_0.5.5
[109] purrrogress_0.1.1           BiocFileCache_1.10.2
[111] jpeg_0.1-8.1                progress_1.2.2
[113] locfit_1.5-9.1              grid_3.6.2
[115] sva_3.34.0                  data.table_1.12.8
[117] blob_1.2.1                  digest_0.6.25
[119] xtable_1.8-4                tidyr_1.0.2
[121] R.utils_2.9.2               openssl_1.4.1
[123] stats4_3.6.2                munsell_0.5.0
[125] survminer_0.4.6             parsetools_0.1.2
[127] askpass_1.1

Could someone please tell me what I am doing wrong and how I can fix this?

Thanks a lot!

bioconductor GDCdownload • 241 views

Just to add: the same problem also occurs on macOS Catalina 10.15.4 (R 3.6.3, platform x86_64-apple-darwin15.6.0 (64-bit)). I suggest using the "api" method and downloading the files in chunks, as that seems to work reliably.
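
A minimal sketch of what downloading in chunks means, using base R only. The file IDs are hypothetical placeholders for the files returned by the query, and `download_in_chunks` is an illustrative wrapper name, not part of TCGAbiolinks. The wrapper relies on the `files.per.chunk` argument of GDCdownload and is only defined here, not called, since calling it needs the package and network access.

```r
# Hypothetical file IDs standing in for the files found by GDCquery.
file_ids <- sprintf("file-%02d", 1:25)
files_per_chunk <- 10

# Files 1-10 go to chunk 1, 11-20 to chunk 2, and so on.
chunk_index <- ceiling(seq_along(file_ids) / files_per_chunk)
chunks <- split(file_ids, chunk_index)

# Number of files in each chunk.
print(lengths(chunks))

# Illustrative wrapper (hypothetical name): lets GDCdownload do the
# chunking itself through its files.per.chunk argument.
download_in_chunks <- function(query, n = 10) {
  TCGAbiolinks::GDCdownload(query, method = "api", files.per.chunk = n)
}
```

With the query from the original post, the call would be `download_in_chunks(query)`. Smaller chunks mean more requests but less data lost if a single request fails.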

