Hi everybody,
when I use TCGAbiolinks, I get an error while preparing the data. It would be great if somebody had an answer to my problem, thank you!
Downloading the data with GDCdownload works without problems, but as soon as I try to prepare the data with GDCprepare to get a SummarizedExperiment, I get this error:
Downloading genome information (try:0)
Using: Homo sapiens genes (GRCh37.p13)
Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column
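For what it is worth, the last line looks like the message that base R's merge() raises when the column named in by.y does not exist in the second data frame. One guess, and it is only a guess, is that the gene annotation table downloaded in the step above does not contain the key column the merge expects. Here is a minimal sketch that triggers the same kind of error outside TCGAbiolinks. The data frames expr and genes and their column names are made up for illustration and are not TCGAbiolinks objects:

```r
# Hypothetical stand-ins: 'expr' plays the role of an expression table and
# 'genes' the role of a downloaded annotation table. Both are invented here.
expr  <- data.frame(gene_id = c(1, 2, 3), value = c(5.1, 0.4, 2.2))
genes <- data.frame(id = c(1, 2, 3), symbol = c("g1", "g2", "g3"))

# merge() stops because 'genes' has no column called "gene_id".
msg <- tryCatch(
  {
    merge(expr, genes, by.x = "gene_id", by.y = "gene_id")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Checking the key against the column names shows what is missing.
missing_key <- setdiff("gene_id", colnames(genes))
print(missing_key)

# Merging on the column that actually exists in 'genes' works.
fixed <- merge(expr, genes, by.x = "gene_id", by.y = "id")
print(nrow(fixed))
```

If the cause is similar here, comparing colnames() of the two tables that go into the merge should show which key column is missing.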
Here is the code I use to download and prepare the expression data of the TCGA-KIRC project:
library(TCGAbiolinks)

query <- GDCquery(project = "TCGA-KIRC",
                  legacy = TRUE,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  sample.type = "Primary solid Tumor",
                  file.type = "normalized_results")
GDCdownload(query, method = "api")
data <- GDCprepare(query,
                   save = TRUE,
                   save.filename = "Gene_Expression_Quantification.rda",
                   remove.files.prepared = TRUE)
Thank you so much!
Felix
> sessionInfo()
R version 3.4.0 Patched (2017-06-17 r72807)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] TCGAbiolinks_2.5.4

loaded via a namespace (and not attached):
[1] circlize_0.4.0 fastmatch_1.1-0 aroma.light_3.6.0 plyr_1.8.4
[5] igraph_1.0.1 selectr_0.3-1 ConsensusClusterPlus_1.40.0 lazyeval_0.2.0
[9] splines_3.4.0 BiocParallel_1.10.1 pathview_1.16.0 GenomeInfoDb_1.12.2
[13] ggplot2_2.2.1 digest_0.6.12 foreach_1.4.3 GOSemSim_2.2.0
[17] viridis_0.4.0 GO.db_3.4.1 magrittr_1.5 memoise_1.1.0
[21] cluster_2.0.6 doParallel_1.0.10 limma_3.32.2 ComplexHeatmap_1.14.0
[25] Biostrings_2.44.1 readr_1.1.1 annotate_1.54.0 matrixStats_0.52.2
[29] R.utils_2.5.0 colorspace_1.3-2 rvest_0.3.2 ggrepel_0.6.5
[33] dplyr_0.7.0 RCurl_1.95-4.8 jsonlite_1.5 hexbin_1.27.1
[37] graph_1.54.0 genefilter_1.58.1 supraHex_1.14.0 zoo_1.8-0
[41] survival_2.41-3 iterators_1.0.8 ape_4.1 glue_1.1.0
[45] survminer_0.4.0 gtable_0.2.0 zlibbioc_1.22.0 XVector_0.16.0
[49] GetoptLong_0.1.6 DelayedArray_0.2.7 kernlab_0.9-25 Rgraphviz_2.20.0
[53] shape_1.4.2 prabclus_2.2-6 BiocGenerics_0.22.0 DEoptimR_1.0-8
[57] scales_0.4.1 DOSE_3.2.0 DESeq_1.28.0 mvtnorm_1.0-6
[61] DBI_0.7 edgeR_3.18.1 ggthemes_3.4.0 Rcpp_0.12.11
[65] cmprsk_2.2-7 viridisLite_0.2.0 xtable_1.8-2 foreign_0.8-67
[69] matlab_1.0.2 mclust_5.3 km.ci_0.5-2 stats4_3.4.0
[73] httr_1.2.1 fgsea_1.2.1 RColorBrewer_1.1-2 fpc_2.1-10
[77] modeltools_0.2-21 XML_3.98-1.8 R.methodsS3_1.7.1 flexmix_2.3-14
[81] nnet_7.3-12 locfit_1.5-9.1 rlang_0.1.1 reshape2_1.4.2
[85] AnnotationDbi_1.38.1 munsell_0.4.3 tools_3.4.0 downloader_0.4
[89] RSQLite_1.1-2 broom_0.4.2 stringr_1.2.0 knitr_1.16
[93] robustbase_0.92-7 survMisc_0.5.4 purrr_0.2.2.2 KEGGREST_1.16.0
[97] dendextend_1.5.2 EDASeq_2.10.0 nlme_3.1-131 whisker_0.3-2
[101] R.oo_1.21.0 KEGGgraph_1.34.0 DO.db_2.9 xml2_1.1.1
[105] biomaRt_2.32.1 compiler_3.4.0 curl_2.6 png_0.1-7
[109] tibble_1.3.3 geneplotter_1.54.0 stringi_1.1.5 GenomicFeatures_1.28.3
[113] lattice_0.20-35 trimcluster_0.1-2 Matrix_1.2-10 psych_1.7.5
[117] KMsurv_0.1-5 GlobalOptions_0.0.12 data.table_1.10.4 bitops_1.0-6
[121] rtracklayer_1.36.3 GenomicRanges_1.28.3 qvalue_2.8.0 R6_2.2.2
[125] latticeExtra_0.6-28 hwriter_1.3.2 ShortRead_1.34.0 gridExtra_2.2.1
[129] IRanges_2.10.2 codetools_0.2-15 MASS_7.3-47 assertthat_0.2.0
[133] SummarizedExperiment_1.6.3 rjson_0.2.15 GenomicAlignments_1.12.1 Rsamtools_1.28.0
[137] mnormt_1.5-5 S4Vectors_0.14.3 GenomeInfoDbData_0.99.0 diptest_0.75-7
[141] parallel_3.4.0 hms_0.3 clusterProfiler_3.4.3 grid_3.4.0
[145] tidyr_0.6.3 class_7.3-14 rvcheck_0.0.8 ggpubr_0.1.3
[149] Biobase_2.36.2
Hi Felix,
I know this is an old issue, but I am facing the same problem right now. Did you manage to sort it out?
Thanks,
Krzysztof