The dataAssy object gives me exactly the same output as in your case, though I get some warnings in the previous steps and an error in colnames. I have pasted the complete session output below.
> library(TCGAbiolinks)
Warning messages:
1: replacing previous import by ‘grid::arrow’ when loading ‘TCGAbiolinks’
2: replacing previous import by ‘grid::unit’ when loading ‘TCGAbiolinks’
> library(SummarizedExperiment)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall,
clusterEvalQ, clusterExport, clusterMap,
parApply, parCapply, parLapply, parLapplyLB,
parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame,
as.vector, cbind, colnames, do.call,
duplicated, eval, evalq, Filter, Find, get,
grep, grepl, intersect, is.unsorted, lapply,
lengths, Map, mapply, match, mget, order,
paste, pmax, pmax.int, pmin, pmin.int,
Position, rank, rbind, Reduce, rownames,
sapply, setdiff, sort, table, tapply, union,
unique, unlist, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material;
view with 'browseVignettes()'. To cite
Bioconductor, see 'citation("Biobase")', and
for packages 'citation("pkgname")'.
> library(TCGAbiolinks)
>
> cancer <- "BRCA"
> PlatformCancer <- "IlluminaHiSeq_RNASeqV2"
> dataType <- "rsem.genes.results"
> pathCancer <- paste0("../data",cancer)
>
> datQuery <- TCGAquery(tumor = cancer, platform = PlatformCancer, level = "3")
> lsSample <- TCGAquery_samplesfilter(query = datQuery)
>
> # get subtype information
> dataSubt <- TCGAquery_subtype(tumor = cancer)
>
> # Which samples are Primary Solid Tumor
> dataSmTP <- TCGAquery_SampleTypes(barcode = lsSample$IlluminaHiSeq_RNASeqV2, typesample = "TP")
>
> # Which samples are Solid Tissue Normal
> dataSmTN <- TCGAquery_SampleTypes(barcode = lsSample$IlluminaHiSeq_RNASeqV2, typesample ="NT")
>
> # get clinical data
> dataClin <- TCGAquery_clinic(tumor = cancer, clinical_data_type = "clinical_patient")
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Downloading:1 files
| Path:./nationwidechildrens.org_BRCA.bio.Level_2.0.42.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
|=============================================| 100%
Tumor type: BRCA
| | 0%
Adding disease collumn to data frame
>
> TCGAdownload(data = datQuery,
+ path = pathCancer,
+ type = dataType,
+ samples = c(dataSmTP,dataSmTN))
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
| Downloading:1211 files
| Path:../dataBRCA/unc.edu_BRCA.IlluminaHiSeq_RNASeqV2.Level_3.1.11.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
|=============================================| 100%
>
> dataAssy <- TCGAprepare(query = datQuery,
+ dir = pathCancer,
+ type = dataType,
+ save = TRUE,
+ summarizedExperiment = TRUE,
+ samples = c(dataSmTP,dataSmTN),
+ filename = paste0(cancer,"_",PlatformCancer,".rda"))
|=============================================================================================================| 100%
Adding metadata to the rse object...
Saving the data...
Data saved in: BRCA_IlluminaHiSeq_RNASeqV2.rda
Warning messages:
1: In fread(files[i], header = TRUE, sep = "\t", stringsAsFactors = FALSE) :
Stopped reading at empty line 14607 but text exists afterwards (discarded): RALA|589
2: In data.table::data.table(...) :
Item 2 is of size 14605 but maximum size is 20531 (recycled leaving remainder of 5926 items)
3: In fread(files[i], header = TRUE, sep = "\t", stringsAsFactors = FALSE) :
Stopped reading at empty line 2346 but text exists afterwards (discarded): C21orf57|54059 500.0
4: In data.table::data.table(...) :
Item 2 is of size 2344 but maximum size is 20531 (recycled leaving remainder of 1779 items)
>
> dataAssy
class: RangedSummarizedExperiment
dim: 20330 1211
metadata(3): Query: TCGAprepareParameters
FilesInfo:
assays(2): raw_counts scaled_estimate
rownames(20330): A1BG|1 A1CF|29974 ...
ZZEF1|23140 ZZZ3|26009
rowRanges metadata column names(3): gene_id
entrezgene
transcript_id.transcript_id_TCGA-E9-A1RD-11A-33R-A157-07
colnames(1211): TCGA-E9-A1RD-11A-33R-A157-07
TCGA-E9-A1RC-01A-11R-A157-07 ...
TCGA-D8-A1J9-01A-11R-A13Q-07
TCGA-AC-A6IX-01A-12R-A32P-07
colData names(10): sample patient ... Siglust
PAM50
> dataPrep <- TCGAanalyze_Preprocessing(object = dataAssy, cor.cut = 0.6)
Error in `colnames<-`(`*tmp*`, value = c("TCGA-E9-A1RD-11A-33R-A157-07", :
length of 'dimnames' [2] not equal to array extent
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=English_India.1252 LC_CTYPE=English_India.1252 LC_MONETARY=English_India.1252 LC_NUMERIC=C LC_TIME=English_India.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] SummarizedExperiment_1.0.2 Biobase_2.30.0 GenomicRanges_1.22.4 GenomeInfoDb_1.6.3 IRanges_2.4.6
[6] S4Vectors_0.8.11 BiocGenerics_0.16.1 TCGAbiolinks_1.0.5
loaded via a namespace (and not attached):
[1] nlme_3.1-122 bitops_1.0-6 matrixStats_0.50.1
[4] devtools_1.10.0 doParallel_1.0.10 RColorBrewer_1.1-2
[7] httr_1.1.0 Rgraphviz_2.14.0 tools_3.2.3
[10] R6_2.1.2 affyio_1.40.0 KernSmooth_2.23-15
[13] DBI_0.3.1 colorspace_1.2-6 GGally_1.0.1
[16] preprocessCore_1.32.0 chron_2.3-47 graph_1.48.0
[19] rvest_0.3.1 xml2_0.1.2 sandwich_2.3-4
[22] rtracklayer_1.30.1 caTools_1.17.1 scales_0.3.0
[25] hexbin_1.27.1 mvtnorm_1.0-5 genefilter_1.52.1
[28] affy_1.48.0 DESeq_1.22.1 stringr_1.0.0
[31] supraHex_1.8.0 digest_0.6.9 Rsamtools_1.22.0
[34] R.utils_2.2.0 XVector_0.10.0 limma_3.26.7
[37] RSQLite_1.0.0 BiocInstaller_1.20.1 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[40] zoo_1.7-12 hwriter_1.3.2 BiocParallel_1.4.3
[43] gtools_3.5.0 xlsx_0.5.7 dplyr_0.4.3
[46] R.oo_1.19.0 RCurl_1.95-4.7 magrittr_1.5
[49] modeltools_0.2-21 heatmap.plus_1.3 futile.logger_1.4.1
[52] Matrix_1.2-3 Rcpp_0.12.3 munsell_0.4.2
[55] ape_3.4 R.methodsS3_1.7.0 stringi_1.0-1
[58] multcomp_1.4-3 edgeR_3.12.0 MASS_7.3-45
[61] zlibbioc_1.16.0 gplots_2.17.0 plyr_1.8.3
[64] grid_3.2.3 gdata_2.17.0 lattice_0.20-33
[67] Biostrings_2.38.3 splines_3.2.3 xlsxjars_0.6.1
[70] GenomicFeatures_1.22.12 annotate_1.48.0 EDASeq_2.4.1
[73] igraph_1.0.1 rjson_0.2.15 geneplotter_1.48.0
[76] codetools_0.2-14 biomaRt_2.26.1 futile.options_1.0.0
[79] XML_3.98-1.3 ShortRead_1.28.0 downloader_0.4
[82] latticeExtra_0.6-26 lambda.r_1.1.7 data.table_1.9.6
[85] foreach_1.4.3 gtable_0.1.2 reshape_0.8.5
[88] assertthat_0.1 ggplot2_2.0.0 dnet_1.0.7
[91] aroma.light_3.0.0 coin_1.1-2 xtable_1.8-0
[94] ConsensusClusterPlus_1.24.0 survival_2.38-3 rJava_0.9-8
[97] iterators_1.0.8 GenomicAlignments_1.6.3 AnnotationDbi_1.32.3
[100] memoise_1.0.0 cluster_2.0.3 TH.data_1.0-7
One possibility is that one of the many packages you have loaded defines a generic colnames<- or dimnames<- that interferes with the version of these functions TCGAbiolinks is expecting. Immediately after the error occurs, run the command traceback()
> dataPrep <- TCGAanalyze_Preprocessing(object = dataAssy, cor.cut = 0.6)
Error in `colnames<-`(`*tmp*`, value = c("TCGA-E9-A1RD-11A-33R-A157-07", :
length of 'dimnames' [2] not equal to array extent
> traceback() ## WHAT IS YOUR OUTPUT HERE?
Also, it would be helpful to see the output of
OK, I ran it on Windows and I still have no problem. But I don't get the warnings below. Maybe some files were corrupted during download. Could you please send me your dataAssy object?
1: In fread(files[i], header = TRUE, sep = "\t", stringsAsFactors = FALSE) :
Stopped reading at empty line 14607 but text exists afterwards (discarded): RALA|589
2: In data.table::data.table(...) :
Item 2 is of size 14605 but maximum size is 20531 (recycled leaving remainder of 5926 items)
3: In fread(files[i], header = TRUE, sep = "\t", stringsAsFactors = FALSE) :
Stopped reading at empty line 2346 but text exists afterwards (discarded): C21orf57|54059 500.0
4: In data.table::data.table(...) :
Item 2 is of size 2344 but maximum size is 20531 (recycled leaving remainder of 1779 items)
Here is the link to the dataAssy object that was created on my system:
https://drive.google.com/file/d/0B9SRy5XoOWiENlU4UmNTMVBYRjA/view?usp=sharing
Your object is equal to mine except for two samples. I believe somehow some files were corrupted during download. And the package does not check for the data integrity.
> assay(dataAssy)[15000:15002,492]
SCO1|6341 SCO2|9997 SCOC|60592
5.00 4015.42 3398.00
> assay(dataAssy2)[15000:15002,492]
SCO1|6341 SCO2|9997 SCOC|60592
1731 644 9153
The two files are unc.edu.6130b450-8b88-4a9a-b462-a34ec94183c9.1163157.rsem.genes.results and unc.edu.97b5ef6f-d621-4093-ab77-d60dcf706173.1152807.rsem.genes.results.
Could you remove them from dataBRCA and run TCGAdownload and TCGAprepare again?
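One way to locate damaged files before removing them, sketched with base R only. The helper name find_suspect_files and the demo files are made up for illustration and are not part of TCGAbiolinks. The sketch assumes that every intact file in the directory has the same number of lines and contains no blank lines.

```r
# Hypothetical helper (not a TCGAbiolinks function): flag result files that
# contain a blank line or whose line count differs from the most common one.
find_suspect_files <- function(dir, pattern = "\\.rsem\\.genes\\.results$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE, recursive = TRUE)
  if (length(files) == 0) return(character(0))
  lines <- lapply(files, readLines, warn = FALSE)
  n_lines <- lengths(lines)
  has_blank <- vapply(lines, function(x) any(!nzchar(trimws(x))), logical(1))
  # Treat the most frequent line count as the expected one (an assumption).
  expected <- as.integer(names(which.max(table(n_lines))))
  files[has_blank | n_lines != expected]
}

# Self-contained demo on small temporary files with made-up content.
demo_dir <- tempfile("rsem_demo")
dir.create(demo_dir)
full <- c("gene_id\traw_count", sprintf("G%d|%d\t%d", 1:5, 1:5, 10L * (1:5)))
writeLines(full, file.path(demo_dir, "a.rsem.genes.results"))
writeLines(full, file.path(demo_dir, "b.rsem.genes.results"))
# Truncated file: only the first three lines survived.
writeLines(full[1:3], file.path(demo_dir, "c.rsem.genes.results"))
# Same line count, but with a blank line in the middle.
writeLines(append(full[1:5], "", after = 3), file.path(demo_dir, "d.rsem.genes.results"))

suspect <- find_suspect_files(demo_dir)
print(basename(suspect))
```

On the real data, a call such as find_suspect_files("../dataBRCA") could list candidates to delete with file.remove() before rerunning TCGAdownload and TCGAprepare.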
Also my object is here
You can run this command to see if the prepared data are equal
Also, TCGAanalyze_Preprocessing had a bug. The fix should be in Bioconductor tonight (version 1.0.7). It is already available in the GitHub repository.
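For checking whether two prepared objects agree, a minimal base R sketch is shown here. The helper differing_samples and the sample names S1 to S3 are hypothetical; the sketch assumes both matrices have identical row and column names.

```r
# Hypothetical helper: names of the columns (samples) in which two matrices
# with identical dimnames disagree in at least one row.
differing_samples <- function(a, b) {
  stopifnot(identical(dimnames(a), dimnames(b)))
  differs <- colSums(a != b, na.rm = TRUE) > 0
  colnames(a)[differs]
}

# Toy data: gene names and values borrowed from the output above,
# sample names invented for the demo.
genes <- c("SCO1|6341", "SCO2|9997", "SCOC|60592")
samples <- c("S1", "S2", "S3")
mine <- matrix(c(10, 20, 30, 1731, 644, 9153, 7, 8, 9),
               nrow = 3, dimnames = list(genes, samples))
yours <- mine
yours[, "S2"] <- c(5.00, 4015.42, 3398.00)  # simulate one corrupted sample

print(differing_samples(mine, yours))
```

With the real objects, the same helper could be applied to assay(dataAssy) and assay(dataAssy2) to list the samples that differ.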
We just added code to check data integrity; it is still being tested. But could you reinstall and run your code again?
The corrupted files should be redownloaded and TCGAprepare should not show any more warnings.
@tiagochst Thank you for looking into this error. I installed the new version of TCGAbiolinks, 1.0.7, and ran through case 1 again. I no longer get the error in colnames, but now there is an error at the data filtering and DEG steps.
Here is the complete run, including the traceback and session info, as well as a comparison of the dataAssy objects:
Comparing your dataAssy object with the one generated on my system (which can be accessed at https://drive.google.com/open?id=0B9SRy5XoOWiEbllNalIxaTZSVk0), they are not the same and differ as follows:
As there were outliers, the data frame lost some columns. But
is considering the object has them all. The code should be something like?
About the data download, I saw you are using 1.0.7, it does not have the verification of downloaded files. So it is not going to correct the files. Please install it with:
and rerun the code.
I am not able to install with devtools. Am I doing something wrong?
nlme is outdated. Try updating the packages or installing the latest version manually from https://cran.r-project.org/web/packages/nlme/index.html (the latest version is 3.1-124).
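A quick way to check whether an installed package is older than a required version, as a sketch in base R. The helper name needs_update is made up for illustration; the version 3.1-124 is the one mentioned above.

```r
# Hypothetical helper: TRUE if the package is missing or older than `minimum`.
needs_update <- function(pkg, minimum) {
  installed <- tryCatch(utils::packageVersion(pkg), error = function(e) NULL)
  is.null(installed) || installed < package_version(minimum)
}

# R treats "3.1-122" and "3.1.122" as the same version when comparing.
print(package_version("3.1-122") < package_version("3.1-124"))
print(needs_update("nlme", "3.1-124"))
```

If the check returns TRUE, install.packages("nlme") in a fresh R session is one way to update before retrying the GitHub installation.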
I updated nlme; now the installation process comes up with three more errors:
Strangely, one of the complaints is about Java, but Java on my system is up to date:
How can I resolve this?
I am able to install TCGAbiolinks from the GitHub repo with a few warnings, but while downloading the corrupted file it enters an infinite loop.
I am still not able to install from GitHub in RStudio on Windows; here is the output:
> devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks")
Downloading GitHub repo BioinformaticsFMRP/TCGAbiolinks@master
from URL https://api.github.com/repos/BioinformaticsFMRP/TCGAbiolinks/zipball/master
Installing TCGAbiolinks
Skipping 2 unavailable packages: ALL, TxDb.Hsapiens.UCSC.hg19.knownGene
Installing 1 package: IRanges
Warning: package ‘IRanges’ is in use and will not be installed
"C:/PROGRA~1/R/R-32~1.3/bin/x64/R" --no-site-file --no-environ --no-save --no-restore CMD INSTALL \
"C:/Users/bioxcel/AppData/Local/Temp/Rtmp0gncV8/devtools147c13c166eb/BioinformaticsFMRP-TCGAbiolinks-c73bb0f" \
--library="C:/Users/bioxcel/Documents/R/win-library/3.2" --install-tests
* installing *source* package 'TCGAbiolinks' ...
Warning in file.copy(f, instdir, TRUE) :
problem copying .\NAMESPACE to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\NAMESPACE: Permission denied
Warning in file(file, ifelse(append, "a", "w")) :
cannot open file 'C:/Users/bioxcel/Documents/R/win-library/3.2/TCGAbiolinks/DESCRIPTION': No such file or directory
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
ERROR: installing package DESCRIPTION failed for package 'TCGAbiolinks'
* restoring previous 'C:/Users/bioxcel/Documents/R/win-library/3.2/TCGAbiolinks'
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\bioxcel\Documents\R\win-library\3.2\00LOCK-BioinformaticsFMRP-TCGAbiolinks-c73bb0f\TCGAbiolinks\CITATION to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\CITATION: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\data: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\bioxcel\Documents\R\win-library\3.2\00LOCK-BioinformaticsFMRP-TCGAbiolinks-c73bb0f\TCGAbiolinks\DESCRIPTION to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\DESCRIPTION: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\help: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\html: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\bioxcel\Documents\R\win-library\3.2\00LOCK-BioinformaticsFMRP-TCGAbiolinks-c73bb0f\TCGAbiolinks\INDEX to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\INDEX: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\Meta: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\bioxcel\Documents\R\win-library\3.2\00LOCK-BioinformaticsFMRP-TCGAbiolinks-c73bb0f\TCGAbiolinks\NAMESPACE to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\NAMESPACE: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem copying C:\Users\bioxcel\Documents\R\win-library\3.2\00LOCK-BioinformaticsFMRP-TCGAbiolinks-c73bb0f\TCGAbiolinks\NEWS to C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\NEWS: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\R: No such file or directory
Warning in file.copy(lp, dirname(pkgdir), recursive = TRUE, copy.date = TRUE) :
problem creating directory C:\Users\bioxcel\Documents\R\win-library\3.2\TCGAbiolinks\tests: No such file or directory
Error: Command failed (1)
> traceback()
10: stop("Command failed (", status, ")", call. = FALSE)
9: system_check(r_path, options, c(r_profile(), r_env_vars(), env_vars),
...)
8: force(code)
7: withr::with_dir(path, system_check(r_path, options, c(r_profile(),
r_env_vars(), env_vars), ...))
6: R(paste("CMD INSTALL ", shQuote(built_path), " ", opts, sep = ""),
quiet = quiet)
5: install(source, ..., quiet = quiet, metadata = metadata)
4: FUN(X[[i]], ...)
3: vapply(remotes, install_remote, ..., FUN.VALUE = logical(1))
2: install_remotes(remotes, quiet = quiet, ...)
1: devtools::install_github(repo = "BioinformaticsFMRP/TCGAbiolinks")
As you said this might be a Windows-specific error, I tried it with RStudio Server. I am able to install TCGAbiolinks there, but I get an error while downloading the files; it seems to me that the TCGA link for the BRCA RNASeq datasets has changed.
I see this error. If I
and then run
I get to
Entering '1' and exploring a bit, it seems I'm at the last several lines of the function
and that the subset leading to objectW0 drops one column, so the attempt to update the colnames() has the wrong length; it should be
I'm not sure why the example is not reproducible by @tiagochst; maybe the upstream files have changed, and the attempt to reproduce uses a cache?
I was impressed with the clear design of the package, especially that my download recovered rather than starting over!
Yes, it's a bug. Using a higher cor.cut I could reproduce the error, but it is quite strange that it was not reproducible with the same cor.cut.
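The failure mode described above can be reproduced in base R without any TCGA data. This is only a toy illustration of the pattern (dropping a column, then assigning the full set of names), not the actual TCGAbiolinks code; the matrix, the sample names, and the keep vector are invented.

```r
# Toy expression matrix: 3 genes x 4 samples.
expr <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("sample", 1:4)))

# Pretend sample3 was flagged as an outlier and dropped.
keep <- c(TRUE, TRUE, FALSE, TRUE)
filtered <- expr[, keep, drop = FALSE]

# Buggy pattern: reuse all four names for a matrix that now has three columns.
err <- tryCatch({
  colnames(filtered) <- colnames(expr)
  NULL
}, error = function(e) conditionMessage(e))
print(err)  # an error about the length of 'dimnames', as in the report above

# Safer pattern: subset the names with the same index used for the data.
colnames(filtered) <- colnames(expr)[keep]
print(colnames(filtered))
```

Keeping the data and its names tied to one index means the two cannot go out of step, whichever correlation cutoff ends up dropping samples.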