Difficulty downloading hg38 CNV data in TCGAbiolinks
raf4
United States

Dear List,

I am trying to download CNV data for esophageal cancer (TCGA-ESCA) with TCGAbiolinks. Only one file is downloaded. In that file, samples are identified by UUIDs such as "75a8bcb9-cac9-4fee-8757-bb802f4d355f", and copy numbers are coded as -1, 0, 1. When I use the TCGA web interface, samples are identified by their sample barcodes, and copy numbers are given as the actual inferred copy number (1, 2, 10, etc.). However, the problem is that those data come in 183 separate folders, so I would prefer to use TCGAbiolinks, which produces a data frame, if possible.
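If you do end up with the per-sample folders from the web interface, they can be combined into one data frame in base R. This is a minimal sketch, not a TCGAbiolinks function. `merge_gene_cn_files` is a hypothetical helper. It assumes each file is tab-separated with a header containing `gene_id` and `copy_number` columns, so check your own files' headers first. The sample identifiers (for example barcodes taken from a sample sheet downloaded alongside the files) are supplied by you.

```r
# Hypothetical helper (not part of TCGAbiolinks): merge one-file-per-sample
# gene-level copy-number tables into a single genes x samples data frame.
# ASSUMPTION: each file is tab-separated with a header that includes
# "gene_id" and "copy_number" columns; adjust id_col/value_col otherwise.
merge_gene_cn_files <- function(paths, sample_ids,
                                id_col = "gene_id", value_col = "copy_number") {
  stopifnot(length(paths) == length(sample_ids))
  tables <- Map(function(path, sid) {
    df <- read.delim(path, stringsAsFactors = FALSE)
    out <- df[, c(id_col, value_col)]
    names(out) <- c(id_col, sid)   # name the value column after the sample
    out
  }, paths, sample_ids)
  # Full outer join on the gene identifier: genes absent from a file become NA
  merged <- Reduce(function(a, b) merge(a, b, by = id_col, all = TRUE), tables)
  rownames(merged) <- merged[[id_col]]
  merged
}

# Usage with two tiny synthetic files (made-up values, for illustration only)
demo_dir <- tempfile("cn_demo")
dir.create(demo_dir)
f1 <- file.path(demo_dir, "s1.tsv")
f2 <- file.path(demo_dir, "s2.tsv")
write.table(data.frame(gene_id = c("g1", "g2"), copy_number = c(2, 4)),
            f1, sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(gene_id = c("g2", "g3"), copy_number = c(1, 2)),
            f2, sep = "\t", quote = FALSE, row.names = FALSE)
cn <- merge_gene_cn_files(c(f1, f2), sample_ids = c("sampleA", "sampleB"))
print(cn)
```

A full outer join (`all = TRUE`) is used so that a gene missing from one sample's file shows up as `NA` rather than silently dropping the row for every sample.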

> library(TCGAbiolinks)
> library(xlsx)
> library(DT)
> query.cnv <- GDCquery(project = "TCGA-ESCA", data.category = "Copy Number Variation", 
data.type = "Gene Level Copy Number Scores", platform="Affymetrix SNP 6.0")
o GDCquery: Searching in GDC database
Genome of reference: hg38
oo Accessing GDC. This might take a while...
ooo Project: TCGA-ESCA
oo Filtering results
ooo By platform
ooo By data.type
oo Checking data
ooo Check if there are duplicated cases
ooo Check if there results for the query
o Preparing output
> GDCdownload(query.cnv, method = "client")
Downloading data for project TCGA-ESCA
trying URL 'https://gdc.cancer.gov/files/public/file/gdc-client_v1.6.1_OSX_x64.zip'
Content type 'application/zip' length 15650334 bytes (14.9 MB)
downloaded 14.9 MB

GDCdownload will download: 8.026253 MB                                                                                                                   
Executing GDC client with the following command:
./gdc-client download -m gdc_manifest.txt
100% [############################################] Time:  0:00:01   4.3 MiB/s Successfully downloaded: 1
> sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] DT_0.19             xlsx_0.6.5          TCGAbiolinks_2.20.1

loaded via a namespace (and not attached):
 [1] bitops_1.0-7                matrixStats_0.61.0          bit64_4.0.5                 filelock_1.0.2              progress_1.2.2             
 [6] httr_1.4.2                  GenomeInfoDb_1.28.4         tools_4.1.1                 utf8_1.2.2                  R6_2.5.1                   
[11] DBI_1.1.1                   BiocGenerics_0.38.0         colorspace_2.0-2            withr_2.4.2                 tidyselect_1.1.1           
[16] prettyunits_1.1.1           bit_4.0.4                   curl_4.3.2                  compiler_4.1.1              rvest_1.0.2                
[21] Biobase_2.52.0              xml2_1.3.2                  DelayedArray_0.18.0         scales_1.1.1                readr_2.0.2                
[26] rappdirs_0.3.3              stringr_1.4.0               digest_0.6.28               R.utils_2.11.0              XVector_0.32.0             
[31] pkgconfig_2.0.3             htmltools_0.5.2             MatrixGenerics_1.4.3        dbplyr_2.1.1                fastmap_1.1.0              
[36] htmlwidgets_1.5.4           rlang_0.4.12                RSQLite_2.2.8               generics_0.1.0              jsonlite_1.7.2             
[41] vroom_1.5.5                 dplyr_1.0.7                 R.oo_1.24.0                 RCurl_1.98-1.5              magrittr_2.0.1             
[46] GenomeInfoDbData_1.2.6      Matrix_1.3-4                Rcpp_1.0.7                  munsell_0.5.0               S4Vectors_0.30.2           
[51] fansi_0.5.0                 lifecycle_1.0.1             R.methodsS3_1.8.1           stringi_1.7.5               SummarizedExperiment_1.22.0
[56] zlibbioc_1.38.0             plyr_1.8.6                  BiocFileCache_2.0.0         grid_4.1.1                  blob_1.2.2                 
[61] parallel_4.1.1              crayon_1.4.1                lattice_0.20-45             Biostrings_2.60.2           xlsxjars_0.6.1             
[66] hms_1.1.1                   KEGGREST_1.32.0             knitr_1.36                  pillar_1.6.4                GenomicRanges_1.44.0       
[71] TCGAbiolinksGUI.data_1.12.0 biomaRt_2.48.3              stats4_4.1.1                XML_3.99-0.8                glue_1.4.2                 
[76] downloader_0.4              data.table_1.14.2           BiocManager_1.30.16         png_0.1-7                   vctrs_0.3.8                
[81] tzdb_0.1.2                  selectr_0.4-2               gtable_0.3.0                purrr_0.3.4                 tidyr_1.1.4                
[86] assertthat_0.2.1            cachem_1.0.6                ggplot2_3.3.5               xfun_0.27                   tibble_3.1.5               
[91] rJava_1.0-5                 AnnotationDbi_1.54.1        memoise_2.0.0               IRanges_2.26.0              ellipsis_0.3.2             

Thanks and best wishes, Rich

Miami, US

I just added support for this data type. Note that the data.type is "Gene Level Copy Number", not "Gene Level Copy Number Scores":

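library(TCGAbiolinks)

# Query gene-level copy numbers (integer values) for TCGA-ESCA
query.cnv <- GDCquery(
    project = "TCGA-ESCA",
    data.category = "Copy Number Variation",
    data.type = "Gene Level Copy Number"
)

# Download the files, then combine them into a single R object
GDCdownload(query.cnv)
data <- GDCprepare(query.cnv)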
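If you want to compare these integer copy numbers with the -1/0/1 values in the "Gene Level Copy Number Scores" file, a rough sketch is to call each gene as lost, neutral, or gained relative to a fixed baseline. `cn_to_calls` is a hypothetical helper. The diploid baseline of 2 is an assumption. As far as I know, the GDC scores are derived with GISTIC2, so this simple threshold will not reproduce them exactly.

```r
# Hypothetical helper: collapse integer copy numbers into -1 / 0 / 1 calls
# (loss / neutral / gain) relative to an ASSUMED baseline of 2 copies.
# This is NOT the GISTIC2-based method behind the GDC score files.
# Accepts a numeric vector or matrix (not a data.frame).
cn_to_calls <- function(cn, baseline = 2) {
  calls <- sign(cn - baseline)    # below baseline -> -1, equal -> 0, above -> 1
  storage.mode(calls) <- "integer"  # keeps matrix dimensions; NA stays NA
  calls
}

cn_to_calls(c(1, 2, 10, NA))

m <- matrix(c(0, 2, 3, 2), nrow = 2)
cn_to_calls(m)
```

The baseline is a parameter rather than hard-coded, because samples with whole-genome doubling may be better described with a different reference ploidy.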



Thank you. It worked.

Best wishes, Rich

