GDCdownload does not work
miki716 • 0
@miki716-18914
Last seen 6.0 years ago

Hi, I am trying to use GDCdownload (from the TCGAbiolinks package), but I get an error every time I try to download my data. I run the query, and when I try to download, it downloads for a while and then stops with an error message. I have tried other queries (even ones from the Bioconductor web page) and I get the same message.

The code is obtained from the paper titled "TCGA Workflow: Analyze cancer genomics and epigenomics data using Bioconductor packages" (Silva).

Here is my code and the messages I am getting:

 

> query.met.gbm=GDCquery(project="TCGA-GBM", legacy=TRUE, data.category="DNA methylation", platform="Illumina Human Methylation 450", barcode=c("TCGA-76-4926-01B-01D-1481-05", "TCGA-28-5211-01C-11D-1844-05"))
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg19
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
[1] "https://api.gdc.cancer.gov/legacy/files/?pretty=true&expand=cases.samples.portions.analytes.aliquots,cases.project,center,analysis,cases.samples&size=988&filters=%7B%22op%22:%22and%22,%22content%22:[%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22cases.project.project_id%22,%22value%22:[%22TCGA-GBM%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.data_category%22,%22value%22:[%22DNA%20methylation%22]%7D%7D,%7B%22op%22:%22in%22,%22content%22:%7B%22field%22:%22files.platform%22,%22value%22:[%22Illumina%20Human%20Methylation%20450%22]%7D%7D]%7D&format=JSON"
ooo Project: TCGA-GBM
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By barcode
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> GDCdownload(query.met.gbm)
Downloading data for project TCGA-GBM
GDCdownload will download 2 files. A total of 42.603084 MB
Downloading as: Tue_Dec_18_11_13_35_2018.tar.gz
Downloading: 20 MB     <simpleWarning in file.create(to[okay]): cannot create file 'GDCdata/TCGA-GBM/legacy/DNA_methylation/Methylation_beta_value/0faddb5f-fe60-4269-90cd-736048a5b061/jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-76-4926-01B-01D-1481-05.txt', reason 'No such file or directory'>
<simpleWarning in file.create(to[okay]): cannot create file 'GDCdata/TCGA-GBM/legacy/DNA_methylation/Methylation_beta_value/abb0c4c1-9249-4582-8fa3-34c0a5e3b8e6/jhu-usc.edu_GBM.HumanMethylation450.8.lvl-3.TCGA-28-5211-01C-11D-1844-05.txt', reason 'No such file or directory'>


> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.10.0

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2            selectr_0.4-1               rjson_0.2.20                hwriter_1.3.2               circlize_0.4.5              XVector_0.22.0             
  [7] GenomicRanges_1.34.0        GlobalOptions_0.1.0         ggpubr_0.2                  matlab_1.0.2                ggrepel_0.8.0               bit64_0.9-7                
 [13] AnnotationDbi_1.44.0        xml2_1.2.0                  codetools_0.2-15            splines_3.5.1               R.methodsS3_1.7.1           doParallel_1.0.14          
 [19] DESeq_1.34.0                geneplotter_1.60.0          knitr_1.21                  jsonlite_1.6                Rsamtools_1.34.0            km.ci_0.5-2                
 [25] broom_0.5.1                 annotate_1.60.0             cluster_2.0.7-1             R.oo_1.22.0                 readr_1.3.0                 compiler_3.5.1             
 [31] httr_1.4.0                  backports_1.1.3             assertthat_0.2.0            Matrix_1.2-14               lazyeval_0.2.1              limma_3.38.3               
 [37] prettyunits_1.0.2           tools_3.5.1                 bindrcpp_0.2.2              gtable_0.2.0                glue_1.3.0                  GenomeInfoDbData_1.2.0     
 [43] dplyr_0.7.8                 ggthemes_4.0.1              ShortRead_1.40.0            Rcpp_1.0.0                  Biobase_2.42.0              Biostrings_2.50.1          
 [49] nlme_3.1-137                rtracklayer_1.42.1          iterators_1.0.10            xfun_0.4                    stringr_1.3.1               rvest_0.3.2                
 [55] XML_3.98-1.16               edgeR_3.24.2                zoo_1.8-4                   zlibbioc_1.28.0             scales_1.0.0                aroma.light_3.12.0         
 [61] hms_0.4.2                   parallel_3.5.1              SummarizedExperiment_1.12.0 RColorBrewer_1.1-2          curl_3.2                    ComplexHeatmap_1.20.0      
 [67] memoise_1.1.0               gridExtra_2.3               KMsurv_0.1-5                ggplot2_3.1.0               downloader_0.4              biomaRt_2.38.0             
 [73] latticeExtra_0.6-28         stringi_1.2.4               RSQLite_2.1.1               genefilter_1.64.0           S4Vectors_0.20.1            foreach_1.4.4              
 [79] GenomicFeatures_1.34.1      BiocGenerics_0.28.0         BiocParallel_1.16.2         shape_1.4.4                 GenomeInfoDb_1.18.1         rlang_0.3.0.1              
 [85] pkgconfig_2.0.2             matrixStats_0.54.0          bitops_1.0-6                lattice_0.20-35             purrr_0.2.5                 bindr_0.1.1                
 [91] cmprsk_2.2-7                GenomicAlignments_1.18.0    bit_1.1-14                  tidyselect_0.2.5            plyr_1.8.4                  magrittr_1.5               
 [97] R6_2.3.0                    IRanges_2.16.0              generics_0.0.2              DelayedArray_0.8.0          DBI_1.0.0                   mgcv_1.8-24                
[103] pillar_1.3.1                survival_2.42-3             RCurl_1.95-4.11             tibble_1.4.2                EDASeq_2.16.0               crayon_1.3.4               
[109] survMisc_0.5.5              GetoptLong_0.1.7            progress_1.2.0              locfit_1.5-9.1              grid_3.5.1                  sva_3.30.0                 
[115] data.table_1.11.8           blob_1.1.1                  ConsensusClusterPlus_1.46.0 digest_0.6.18               xtable_1.8-3                tidyr_0.8.2                
[121] R.utils_2.7.0               stats4_3.5.1                munsell_0.5.0               survminer_0.4.3  
GDCdow tcgabiolinks error tcgadownload download_error • 4.3k views
@james-w-macdonald-5106
Last seen 20 hours ago
United States

So the critical part of the error (which is pretty self-explanatory) is this part:

cannot create file '<blah blah blah>' reason 'No such file or directory'

Which means, in this instance, that the directory you are trying to write to doesn't exist, so R can't create a file there. Ideally there would be some error checking that does something like

if(!dir.exists(dirname(<some random path name>))) dir.create(dirname(<some random path name>), recursive = TRUE)

to ensure that whatever path you are specifying actually exists first. But failing that, you do get a pretty clear error, IMO.
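As a runnable illustration of that kind of check, here is a minimal sketch. The helper name `ensure_parent_dir` and the example path under `tempdir()` are made up for illustration; this is not how TCGAbiolinks handles it internally.

```r
# Hypothetical helper (not part of TCGAbiolinks): make sure the parent
# directory of `path` exists before anything tries to create the file.
ensure_parent_dir <- function(path) {
  parent <- dirname(path)
  if (!dir.exists(parent)) {
    # recursive = TRUE also creates any missing intermediate directories
    dir.create(parent, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(dir.exists(parent))
}

# Illustrative path in a temporary location, loosely mimicking the GDCdata layout
target <- file.path(tempdir(), "GDCdata", "TCGA-GBM", "example.txt")
ensure_parent_dir(target)
created <- file.create(target)
```

Without the `ensure_parent_dir()` call, `file.create()` would emit the same kind of "cannot create file ... No such file or directory" warning whenever the nested directories are missing.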

@tiago-chedraoui-silva-8877
Last seen 4.3 years ago
Brazil - University of São Paulo/ Los A…

From the Microsoft documentation: "In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters." (https://docs.microsoft.com/en-us/windows/desktop/fileio/naming-a-file#maximum-path-length-limitation)

 

In Windows 10 you can enable long paths: https://www.howtogeek.com/266621/how-to-make-windows-10-accept-file-paths-over-260-characters/


Huh?

> z <- "GDCdata/TCGA-GBM/legacy/DNA_methylation/Methylation_beta_value/0faddb5f-fe60-4269-90cd-736048a5b061/jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-76-4926-01B-01D-1481-05.txt"
> length(strsplit(z, "")[[1]])
[1] 176

You should consider the full path, not the relative path.

z <- file.path(getwd(),"GDCdata/TCGA-GBM/legacy/DNA_methylation/Methylation_beta_value/0faddb5f-fe60-4269-90cd-736048a5b061/jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-76-4926-01B-01D-1481-05.txt")
length(strsplit(z, "")[[1]])
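To compare the full path against the limit for different working directories, something like the following sketch could be used. The helper name `exceeds_max_path` and the two example base directories are assumptions for illustration; the 260 figure is the MAX_PATH value quoted above, which also counts a terminating null character, so the check uses `>=`.

```r
# Windows MAX_PATH as quoted above; it includes the terminating null
# character, so a path of 260 visible characters is already too long.
max_path <- 260L

# Hypothetical helper: TRUE if base directory + relative path reaches the limit.
exceeds_max_path <- function(relative_path, base = getwd(), limit = max_path) {
  full <- file.path(base, relative_path)
  nchar(full) >= limit
}

rel <- "GDCdata/TCGA-GBM/legacy/DNA_methylation/Methylation_beta_value/0faddb5f-fe60-4269-90cd-736048a5b061/jhu-usc.edu_GBM.HumanMethylation450.6.lvl-3.TCGA-76-4926-01B-01D-1481-05.txt"

nchar(rel)  # 176, the relative length reported earlier in the thread

# A short base directory leaves plenty of room
short_ok <- exceeds_max_path(rel, base = "D:/R")

# An illustrative, deliberately long base directory pushes it over the limit
long_bad <- exceeds_max_path(rel, base = paste0("C:/Users/", strrep("x", 100)))
```

This only measures the final destination path. Any longer intermediate path used while the download is being unpacked would not be caught by it.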

Yes, I saw this and changed it, but I am getting the same result. And anyway, the full path is shorter than 260 characters, so that should not be the problem.

Well, I don't understand why, but now it is working. I put it directly in D:\\R and now it works. Thanks!
