Question: Error using TCGAbiolinks "Error in, y) : 'by' must specify a uniquely valid column"
Hi everybody,

When I try to use TCGAbiolinks, I get an error while preparing the data. It would be awesome if somebody had an answer to my problem, thank you!

I have no problem downloading the data via GDCdownload, but as soon as I try to prepare the data (GDCprepare) to get a SummarizedExperiment, I get this error:

Downloading genome information (try:0) Using: Homo sapiens genes (GRCh37.p13)
Error in, y) : 'by' must specify a uniquely valid column
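
As far as I can tell, this message is the one base R's merge() raises (from its internal fix.by helper) when the column given in 'by' is missing from, or ambiguous in, one of the two tables. Here is a minimal sketch that reproduces the same message. The data frames and column names are hypothetical toy examples, not the actual objects TCGAbiolinks builds internally:

```r
# Hypothetical toy tables; these are NOT the actual TCGAbiolinks objects.
expr  <- data.frame(gene_id = c("A", "B"), value = c(1.5, 2.0))
annot <- data.frame(ensembl_gene_id = c("A", "B"), chr = c("1", "2"))

# 'gene_id' does not exist in 'annot', so merge() cannot resolve 'by'.
msg <- tryCatch(
  {
    merge(expr, annot, by = "gene_id")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Naming the key column of each table explicitly avoids the error.
merged <- merge(expr, annot, by.x = "gene_id", by.y = "ensembl_gene_id")
print(merged)
```

So my guess is that one of the tables merged during the SummarizedExperiment step (possibly the downloaded genome information) does not contain the column that the merge expects, but I have not verified this in the package source.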

Here is the code I use to download and prepare the expression data of the TCGA-KIRC project:

query <- GDCquery(project = "TCGA-KIRC",
                  legacy = TRUE,
                  data.category = "Gene expression",
                  data.type = "Gene expression quantification",
                  sample.type = "Primary solid Tumor",
                  file.type =  "normalized_results")

GDCdownload(query, method = "api")

data <- GDCprepare(query, save = TRUE,
  save.filename = "Gene_Expression_Quantification.rda",
  remove.files.prepared = TRUE)


Thank you so much!


> sessionInfo()
R version 3.4.0 Patched (2017-06-17 r72807)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TCGAbiolinks_2.5.4

loaded via a namespace (and not attached):
  [1] circlize_0.4.0              fastmatch_1.1-0             aroma.light_3.6.0           plyr_1.8.4                 
  [5] igraph_1.0.1                selectr_0.3-1               ConsensusClusterPlus_1.40.0 lazyeval_0.2.0             
  [9] splines_3.4.0               BiocParallel_1.10.1         pathview_1.16.0             GenomeInfoDb_1.12.2        
 [13] ggplot2_2.2.1               digest_0.6.12               foreach_1.4.3               GOSemSim_2.2.0             
 [17] viridis_0.4.0               GO.db_3.4.1                 magrittr_1.5                memoise_1.1.0              
 [21] cluster_2.0.6               doParallel_1.0.10           limma_3.32.2                ComplexHeatmap_1.14.0      
 [25] Biostrings_2.44.1           readr_1.1.1                 annotate_1.54.0             matrixStats_0.52.2         
 [29] R.utils_2.5.0               colorspace_1.3-2            rvest_0.3.2                 ggrepel_0.6.5              
 [33] dplyr_0.7.0                 RCurl_1.95-4.8              jsonlite_1.5                hexbin_1.27.1              
 [37] graph_1.54.0                genefilter_1.58.1           supraHex_1.14.0             zoo_1.8-0                  
 [41] survival_2.41-3             iterators_1.0.8             ape_4.1                     glue_1.1.0                 
 [45] survminer_0.4.0             gtable_0.2.0                zlibbioc_1.22.0             XVector_0.16.0             
 [49] GetoptLong_0.1.6            DelayedArray_0.2.7          kernlab_0.9-25              Rgraphviz_2.20.0           
 [53] shape_1.4.2                 prabclus_2.2-6              BiocGenerics_0.22.0         DEoptimR_1.0-8             
 [57] scales_0.4.1                DOSE_3.2.0                  DESeq_1.28.0                mvtnorm_1.0-6              
 [61] DBI_0.7                     edgeR_3.18.1                ggthemes_3.4.0              Rcpp_0.12.11               
 [65] cmprsk_2.2-7                viridisLite_0.2.0           xtable_1.8-2                foreign_0.8-67             
 [69] matlab_1.0.2                mclust_5.3                  km.ci_0.5-2                 stats4_3.4.0               
 [73] httr_1.2.1                  fgsea_1.2.1                 RColorBrewer_1.1-2          fpc_2.1-10                 
 [77] modeltools_0.2-21           XML_3.98-1.8                R.methodsS3_1.7.1           flexmix_2.3-14             
 [81] nnet_7.3-12                 locfit_1.5-9.1              rlang_0.1.1                 reshape2_1.4.2             
 [85] AnnotationDbi_1.38.1        munsell_0.4.3               tools_3.4.0                 downloader_0.4             
 [89] RSQLite_1.1-2               broom_0.4.2                 stringr_1.2.0               knitr_1.16                 
 [93] robustbase_0.92-7           survMisc_0.5.4              purrr_0.2.2.2               KEGGREST_1.16.0            
 [97] dendextend_1.5.2            EDASeq_2.10.0               nlme_3.1-131                whisker_0.3-2              
[101] R.oo_1.21.0                 KEGGgraph_1.34.0            DO.db_2.9                   xml2_1.1.1                 
[105] biomaRt_2.32.1              compiler_3.4.0              curl_2.6                    png_0.1-7                  
[109] tibble_1.3.3                geneplotter_1.54.0          stringi_1.1.5               GenomicFeatures_1.28.3     
[113] lattice_0.20-35             trimcluster_0.1-2           Matrix_1.2-10               psych_1.7.5                
[117] KMsurv_0.1-5                GlobalOptions_0.0.12        data.table_1.10.4           bitops_1.0-6               
[121] rtracklayer_1.36.3          GenomicRanges_1.28.3        qvalue_2.8.0                R6_2.2.2                   
[125] latticeExtra_0.6-28         hwriter_1.3.2               ShortRead_1.34.0            gridExtra_2.2.1            
[129] IRanges_2.10.2              codetools_0.2-15            MASS_7.3-47                 assertthat_0.2.0           
[133] SummarizedExperiment_1.6.3  rjson_0.2.15                GenomicAlignments_1.12.1    Rsamtools_1.28.0           
[137] mnormt_1.5-5                S4Vectors_0.14.3            GenomeInfoDbData_0.99.0     diptest_0.75-7             
[141] parallel_3.4.0              hms_0.3                     clusterProfiler_3.4.3       grid_3.4.0                 
[145] tidyr_0.6.3                 class_7.3-14                rvcheck_0.0.8               ggpubr_0.1.3               
[149] Biobase_2.36.2       


So, apparently the error occurs during the SummarizedExperiment generation, as GDCprepare works if I use the following code:

df <- GDCprepare(query,
                 save.filename = "Gene_Expression_Quantification.rda",
                 summarizedExperiment = FALSE)

Nevertheless, I would like to get a SummarizedExperiment, as I would like to combine the expression data easily with the clinical data.
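
To illustrate what I mean by combining the two, here is a rough sketch in base R of the manual alternative. It assumes one row per sample, which the data frame from GDCprepare may not have (it may need transposing first). The tables, the gene column GENE1, and the clinical column names are hypothetical toy values. The only thing taken from TCGA itself is that the first 12 characters of a barcode identify the patient:

```r
# Hypothetical toy data standing in for the expression and clinical tables.
expr <- data.frame(
  barcode = c("TCGA-AA-0001-01A-11R-1234-07", "TCGA-AA-0002-01A-11R-1234-07"),
  GENE1   = c(10.2, 8.7),
  stringsAsFactors = FALSE
)
clinical <- data.frame(
  bcr_patient_barcode = c("TCGA-AA-0001", "TCGA-AA-0002"),
  vital_status        = c("alive", "dead"),
  stringsAsFactors = FALSE
)

# The first 12 characters of a TCGA sample barcode identify the patient.
expr$patient <- substr(expr$barcode, 1, 12)

# Join expression and clinical data on the patient identifier.
combined <- merge(expr, clinical, by.x = "patient", by.y = "bcr_patient_barcode")
print(combined)
```

This works for a quick look, but keeping the assay and the sample annotation in sync by hand is exactly what I hoped the SummarizedExperiment would do for me.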

Thank you so much for your help!

Felix Geist
PhD student, DKFZ Heidelberg


