Dear List,
I am trying to download CNV data for esophageal cancer with TCGAbiolinks. One file is downloaded. In the file I get with TCGAbiolinks, samples are identified by designations such as "75a8bcb9-cac9-4fee-8757-bb802f4d355f", and copy numbers are coded as -1, 0, 1. When I use the TCGA web interface, samples are identified by their sample numbers, and copy numbers are given as the actual inferred copy number (1, 2, 10, etc.). However, the data from the web interface come in 183 separate folders, so I would prefer to use TCGAbiolinks, which produces a data frame, if possible.
> library(TCGAbiolinks)
> library(xlsx)
> library(DT)
> query.cnv <- GDCquery(project = "TCGA-ESCA", data.category = "Copy Number Variation",
data.type = "Gene Level Copy Number Scores", platform="Affymetrix SNP 6.0")
--------------------------------------
o GDCquery: Searching in GDC database
--------------------------------------
Genome of reference: hg38
--------------------------------------------
oo Accessing GDC. This might take a while...
--------------------------------------------
ooo Project: TCGA-ESCA
--------------------
oo Filtering results
--------------------
ooo By platform
ooo By data.type
----------------
oo Checking data
----------------
ooo Check if there are duplicated cases
ooo Check if there results for the query
-------------------
o Preparing output
-------------------
> GDCdownload(query.cnv, method = "client")
Downloading data for project TCGA-ESCA
trying URL 'https://gdc.cancer.gov/files/public/file/gdc-client_v1.6.1_OSX_x64.zip'
Content type 'application/zip' length 15650334 bytes (14.9 MB)
==================================================
downloaded 14.9 MB
GDCdownload will download: 8.026253 MB
Executing GDC client with the following command:
./gdc-client download -m gdc_manifest.txt
100% [############################################] Time: 0:00:01 4.3 MiB/s
Successfully downloaded: 1
>
> sessionInfo( )
R version 4.1.1 (2021-08-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DT_0.19 xlsx_0.6.5 TCGAbiolinks_2.20.1
loaded via a namespace (and not attached):
[1] bitops_1.0-7 matrixStats_0.61.0 bit64_4.0.5 filelock_1.0.2 progress_1.2.2
[6] httr_1.4.2 GenomeInfoDb_1.28.4 tools_4.1.1 utf8_1.2.2 R6_2.5.1
[11] DBI_1.1.1 BiocGenerics_0.38.0 colorspace_2.0-2 withr_2.4.2 tidyselect_1.1.1
[16] prettyunits_1.1.1 bit_4.0.4 curl_4.3.2 compiler_4.1.1 rvest_1.0.2
[21] Biobase_2.52.0 xml2_1.3.2 DelayedArray_0.18.0 scales_1.1.1 readr_2.0.2
[26] rappdirs_0.3.3 stringr_1.4.0 digest_0.6.28 R.utils_2.11.0 XVector_0.32.0
[31] pkgconfig_2.0.3 htmltools_0.5.2 MatrixGenerics_1.4.3 dbplyr_2.1.1 fastmap_1.1.0
[36] htmlwidgets_1.5.4 rlang_0.4.12 RSQLite_2.2.8 generics_0.1.0 jsonlite_1.7.2
[41] vroom_1.5.5 dplyr_1.0.7 R.oo_1.24.0 RCurl_1.98-1.5 magrittr_2.0.1
[46] GenomeInfoDbData_1.2.6 Matrix_1.3-4 Rcpp_1.0.7 munsell_0.5.0 S4Vectors_0.30.2
[51] fansi_0.5.0 lifecycle_1.0.1 R.methodsS3_1.8.1 stringi_1.7.5 SummarizedExperiment_1.22.0
[56] zlibbioc_1.38.0 plyr_1.8.6 BiocFileCache_2.0.0 grid_4.1.1 blob_1.2.2
[61] parallel_4.1.1 crayon_1.4.1 lattice_0.20-45 Biostrings_2.60.2 xlsxjars_0.6.1
[66] hms_1.1.1 KEGGREST_1.32.0 knitr_1.36 pillar_1.6.4 GenomicRanges_1.44.0
[71] TCGAbiolinksGUI.data_1.12.0 biomaRt_2.48.3 stats4_4.1.1 XML_3.99-0.8 glue_1.4.2
[76] downloader_0.4 data.table_1.14.2 BiocManager_1.30.16 png_0.1-7 vctrs_0.3.8
[81] tzdb_0.1.2 selectr_0.4-2 gtable_0.3.0 purrr_0.3.4 tidyr_1.1.4
[86] assertthat_0.2.1 cachem_1.0.6 ggplot2_3.3.5 xfun_0.27 tibble_3.1.5
[91] rJava_1.0-5 AnnotationDbi_1.54.1 memoise_2.0.0 IRanges_2.26.0 ellipsis_0.3.2
>
Thanks and best wishes, Rich
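For readers with the same problem, here is a minimal base-R sketch of one way the UUID column names could be replaced by sample identifiers once a UUID-to-sample lookup table is available. Everything in it is illustrative: the `cnv` table, the `id_map` lookup, the `rename_samples()` helper, and the gene and sample labels are made-up stand-ins, not actual TCGAbiolinks output. How the lookup table is obtained is not shown. It might come from the query's results table or from a helper such as `TCGAutils::UUIDtoBarcode()`, but that is an assumption to check against the package documentation.

```r
# Toy stand-in for a gene-level CNV table: one row per gene, one column per
# sample, with sample columns named by UUID. All values are made up.
cnv <- data.frame(
  "Gene Symbol" = c("GENE_A", "GENE_B"),
  "75a8bcb9-cac9-4fee-8757-bb802f4d355f" = c(-1L, 0L),
  "00000000-0000-0000-0000-000000000000" = c(1L, 0L),
  check.names = FALSE
)

# Hypothetical lookup table from UUID to a human-readable sample label.
id_map <- data.frame(
  uuid = c("00000000-0000-0000-0000-000000000000",
           "75a8bcb9-cac9-4fee-8757-bb802f4d355f"),
  barcode = c("SAMPLE-B", "SAMPLE-A"),
  stringsAsFactors = FALSE
)

# Rename the sample columns via the lookup table; a column whose UUID is
# missing from the table keeps its original name.
rename_samples <- function(cnv, id_map, id_col = "Gene Symbol") {
  is_sample <- names(cnv) != id_col
  old_names <- names(cnv)[is_sample]
  hit <- match(old_names, id_map$uuid)
  names(cnv)[is_sample] <- ifelse(is.na(hit), old_names, id_map$barcode[hit])
  cnv
}

cnv_named <- rename_samples(cnv, id_map)
print(names(cnv_named))
```

Note that this only relabels the columns. It does not turn the -1/0/1 scores into inferred copy numbers; those would have to come from a different data type.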
Tiago,
Thank you. It worked.
Best wishes, Rich