Some experiments on ArrayExpress contain only phenotypic information because the processed data live elsewhere. In particular, only the .idf and .sdrf files might be present. These files can be useful in their own right even if ADF files are not posted, because ArrayExpress strictly subsumes GEO and so is a more canonical source. Currently, the package assumes ADF files are present, e.g., in line 3 of `readPhenoData`.
Reprex:
> library(ArrayExpress)
> habib_ae <- getAE('GSE85721')
> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
ArrayExpress: Reading pheno data from SDRF
Error in which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,  :
  argument to 'which' is not logical
> # Because line 3 of readPhenoData results in an empty AnnotatedDataFrame
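The error message itself can be reproduced without ArrayExpress. The following is a minimal base-R sketch, assuming a plain empty data frame (`pd_empty`, a made-up stand-in for `pData(ph)`) behaves like the empty pheno table here: `sapply()` over zero rows returns an empty list rather than a logical vector, and `which()` rejects it.

```r
# Empty data frame standing in for pData(ph) after every row was filtered out
# (hypothetical stand-in; not the real AnnotatedDataFrame).
pd_empty <- data.frame(Array.Data.File = character(0), stringsAsFactors = FALSE)

# sapply() over zero indices returns an empty list, not a logical vector.
res <- sapply(seq_len(nrow(pd_empty)),
              function(i) all(pd_empty[i, ] == "", na.rm = TRUE))

# which() refuses a non-logical argument; capture the message instead of stopping.
err <- tryCatch(which(res), error = function(e) conditionMessage(e))
print(err)

# vapply() with a declared logical(1) result stays type-stable on empty input,
# so which() simply returns an empty integer vector.
safe <- which(vapply(seq_len(nrow(pd_empty)),
                     function(i) all(pd_empty[i, ] == "", na.rm = TRUE),
                     logical(1)))
print(safe)
```

Swapping `sapply()` for `vapply()` would only avoid the crash; the pheno data would still be empty, so it does not address the underlying assumption about Array.Data.File.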
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] ArrayExpress_1.34.0        GEOquery_2.40.0
 [3] MultiAssayExperiment_1.0.0 SummarizedExperiment_1.4.0
 [5] GenomicRanges_1.26.2       GenomeInfoDb_1.10.2
 [7] Zeisel2015Data_0.9         rmarkdown_1.3.9002
 [9] RColorBrewer_1.1-2         Biobase_2.34.0
[11] stringr_1.1.0              Biostrings_2.42.1
[13] XVector_0.14.0             IRanges_2.8.1
[15] S4Vectors_0.12.1           BiocGenerics_0.20.0
[17] data.table_1.10.4          preprocessData_0.10.3
[19] knitr_1.15.1               devtools_1.12.0

loaded via a namespace (and not attached):
 [1] httr_1.2.1           splines_3.3.2        foreach_1.4.3
 [4] shiny_1.0.0          assertthat_0.1       yaml_2.1.14
 [7] RSQLite_1.1-2        backports_1.0.5      lattice_0.20-34
[10] limma_3.30.10        digest_0.6.12        oligoClasses_1.36.0
[13] colorspace_1.3-2     preprocessCore_1.36.0 htmltools_0.3.5
[16] httpuv_1.3.3         Matrix_1.2-8         plyr_1.8.4
[19] XML_3.98-1.5         affxparser_1.46.0    zlibbioc_1.20.0
[22] xtable_1.8-2         scales_0.4.1         whisker_0.3-2
[25] affyio_1.44.0        getopt_1.20.0        ff_2.2-13
[28] optparse_1.3.2       tibble_1.2           pkgmaker_0.22
[31] ggplot2_2.2.1        withr_1.0.2          oligo_1.38.0
[34] lazyeval_0.2.0       magrittr_1.5         crayon_1.3.2
[37] mime_0.5             memoise_1.0.0        evaluate_0.10
[40] doParallel_1.0.10    NMF_0.20.6           xml2_1.1.1
[43] shinydashboard_0.5.3 BiocInstaller_1.24.0 tools_3.3.2
[46] registry_0.3         gridBase_0.4-7       munsell_0.4.3
[49] cluster_2.0.5        rngtools_1.2.4       compiler_3.3.2
[52] grid_3.3.2           RCurl_1.95-4.8       iterators_1.0.8
[55] rstudioapi_0.6       bitops_1.0-6         gtable_0.2.0
[58] codetools_0.2-15     DBI_0.5-1            roxygen2_6.0.0
[61] reshape2_1.4.2       R6_2.2.0             bit_1.1-12
[64] commonmark_1.1       rprojroot_1.2        desc_1.1.0
[67] stringi_1.1.2        Rcpp_0.12.9
Hi Andrew, the error occurs in readPhenoData when it looks for Array.Data.File, which is empty because the experiment has no raw data. This results in an empty ph object. It is possible, however, to modify readPhenoData to look at Assay.Name instead, which gives a working function and a full object as a result. I am not sure, though, whether any dependencies would break if this were implemented in the package.
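A rough sketch of that fallback idea, written against a plain data frame rather than the package internals. The helper name `pick_sample_column` and the toy SDRF values are made up for illustration and are not part of ArrayExpress; the real function works on an AnnotatedDataFrame and would need more care.

```r
# Hypothetical helper (not part of ArrayExpress): choose the SDRF column used
# to index samples. Prefers Array.Data.File and falls back to Assay.Name when
# the former is missing or entirely blank.
pick_sample_column <- function(pd,
                               candidates = c("Array.Data.File", "Assay.Name")) {
  for (col in candidates) {
    if (col %in% names(pd)) {
      vals <- gsub(" ", "", as.character(pd[[col]]))
      if (any(!is.na(vals) & vals != "")) return(col)
    }
  }
  stop("no usable sample identifier column found in SDRF")
}

# Toy SDRF-like table (made-up values): no raw data files, but assay names exist.
sdrf <- data.frame(
  Assay.Name               = c("sample1", "sample2", ""),
  Array.Data.File          = c("", "", ""),
  Characteristics.organism = c("Mus musculus", "Mus musculus", ""),
  stringsAsFactors = FALSE
)

id_col <- pick_sample_column(sdrf)

# Keep only rows with a non-blank identifier, then use it as the row names.
keep <- gsub(" ", "", sdrf[[id_col]]) != ""
sdrf_kept <- sdrf[keep, , drop = FALSE]
rownames(sdrf_kept) <- sdrf_kept[[id_col]]
print(sdrf_kept)
```

One caveat with this approach: row names must be unique, so an SDRF in which Assay.Name repeats (for example, one row per channel) would need deduplication before the last step.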
Hope this helps.
Regards,
Andrew
> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debugging in: ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debug: {
message("ArrayExpress: Reading pheno data from SDRF")
ph = try(read.AnnotatedDataFrame(sdrf, path = path, row.names = NULL,
blank.lines.skip = TRUE, fill = TRUE, varMetadata.char = "$",
quote = "\""))
ph = ph[gsub(" ", "", ph$Array.Data.File) != ""]  # every Array.Data.File is blank, so the index selects nothing and ph becomes an empty AnnotatedDataFrame
sampleNames(ph) = ph$Array.Data.File
ph@varMetadata["Array.Data.File", "labelDescription"] = "Index"
ph@varMetadata["Array.Data.File", "channel"] = as.factor("_ALL_")
emptylines = which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,
] == "", na.rm = TRUE)))  # this inevitably fails because ph has no rows