Question: ArrayExpress fails with experiments without array data but could still return parsed .idf and .sdrf
2.7 years ago, Andrew_McDavid wrote:

Some experiments on ArrayExpress contain only phenotypic information because the processed data live elsewhere. In such cases only the .idf and .sdrf files may be present, but these files are useful in their own right even when no ADF files are posted: ArrayExpress strictly subsumes GEO, so it is the more canonical source. Currently the package assumes ADF files are present, e.g., line 3 of `readPhenoData`.


> library(ArrayExpress)
> habib_ae <- getAE('GSE85721')
> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
ArrayExpress: Reading pheno data from SDRF
Error in which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,  : 
  argument to 'which' is not logical
> #Because line 3 of readPhenoData results in an empty AnnotatedDataFrame
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ArrayExpress_1.34.0        GEOquery_2.40.0           
 [3] MultiAssayExperiment_1.0.0 SummarizedExperiment_1.4.0
 [5] GenomicRanges_1.26.2       GenomeInfoDb_1.10.2       
 [7] Zeisel2015Data_0.9         rmarkdown_1.3.9002        
 [9] RColorBrewer_1.1-2         Biobase_2.34.0            
[11] stringr_1.1.0              Biostrings_2.42.1         
[13] XVector_0.14.0             IRanges_2.8.1             
[15] S4Vectors_0.12.1           BiocGenerics_0.20.0       
[17] data.table_1.10.4          preprocessData_0.10.3     
[19] knitr_1.15.1               devtools_1.12.0           

loaded via a namespace (and not attached):
 [1] httr_1.2.1            splines_3.3.2         foreach_1.4.3        
 [4] shiny_1.0.0           assertthat_0.1        yaml_2.1.14          
 [7] RSQLite_1.1-2         backports_1.0.5       lattice_0.20-34      
[10] limma_3.30.10         digest_0.6.12         oligoClasses_1.36.0  
[13] colorspace_1.3-2      preprocessCore_1.36.0 htmltools_0.3.5      
[16] httpuv_1.3.3          Matrix_1.2-8          plyr_1.8.4           
[19] XML_3.98-1.5          affxparser_1.46.0     zlibbioc_1.20.0      
[22] xtable_1.8-2          scales_0.4.1          whisker_0.3-2        
[25] affyio_1.44.0         getopt_1.20.0         ff_2.2-13            
[28] optparse_1.3.2        tibble_1.2            pkgmaker_0.22        
[31] ggplot2_2.2.1         withr_1.0.2           oligo_1.38.0         
[34] lazyeval_0.2.0        magrittr_1.5          crayon_1.3.2         
[37] mime_0.5              memoise_1.0.0         evaluate_0.10        
[40] doParallel_1.0.10     NMF_0.20.6            xml2_1.1.1           
[43] shinydashboard_0.5.3  BiocInstaller_1.24.0  tools_3.3.2          
[46] registry_0.3          gridBase_0.4-7        munsell_0.4.3        
[49] cluster_2.0.5         rngtools_1.2.4        compiler_3.3.2       
[52] grid_3.3.2            RCurl_1.95-4.8        iterators_1.0.8      
[55] rstudioapi_0.6        bitops_1.0-6          gtable_0.2.0         
[58] codetools_0.2-15      DBI_0.5-1             roxygen2_6.0.0       
[61] reshape2_1.4.2        R6_2.2.0              bit_1.1-12           
[64] commonmark_1.1        rprojroot_1.2         desc_1.1.0           
[67] stringi_1.1.2         Rcpp_0.12.9          




Hi Andrew, the error happens in readPhenoData when it filters on Array.Data.File, which is empty because the experiment has no raw data. That leaves an empty ph object. It is possible, however, to modify readPhenoData to look at Assay.Name instead, which gives a working function and a full object as a result. I am not sure, though, which dependencies would break if this were implemented in the package.

Hope this helps.
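
To make the Assay.Name idea concrete, here is a minimal base-R sketch of the fallback. It is not the package's code: the helper `pick_index_column` and the two-row SDRF table are made up for illustration, and it reads the file with `read.delim` rather than `read.AnnotatedDataFrame` so that it runs without Bioconductor installed.

```r
# Write a tiny SDRF-like table with no raw data files (made-up values).
sdrf_path <- tempfile(fileext = ".sdrf.txt")
writeLines(c(
  "Source Name\tAssay Name\tArray Data File\tCharacteristics[cell type]",
  "s1\tassay_1\t\tneuron",
  "s2\tassay_2\t\tastrocyte"
), sdrf_path)

# check.names = TRUE (the default) turns "Array Data File" into Array.Data.File.
# colClasses = "character" keeps blank fields as "" instead of guessing a type.
sdrf <- read.delim(sdrf_path, colClasses = "character", quote = "\"")

# Hypothetical helper: prefer Array.Data.File, fall back to Assay.Name
# when the first column is missing or entirely blank.
pick_index_column <- function(df) {
  has_values <- function(col) {
    col %in% names(df) && any(gsub(" ", "", df[[col]]) != "", na.rm = TRUE)
  }
  if (has_values("Array.Data.File")) return("Array.Data.File")
  if (has_values("Assay.Name")) return("Assay.Name")
  stop("neither Array.Data.File nor Assay.Name has any values")
}

index_col <- pick_index_column(sdrf)

# Same blank-row filter as in readPhenoData, applied to the chosen column.
keep <- !is.na(sdrf[[index_col]]) & gsub(" ", "", sdrf[[index_col]]) != ""
pheno <- sdrf[keep, , drop = FALSE]
rownames(pheno) <- pheno[[index_col]]

print(index_col)
print(pheno)
```

With a real download, the same logic would presumably be pointed at `file.path(habib_ae$path, habib_ae$sdrf)`; note that row names must be unique, so this only works when each assay name occurs once.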




> pd <- ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debugging in: ArrayExpress:::readPhenoData(habib_ae$sdrf, habib_ae$path)
debug: {
    message("ArrayExpress: Reading pheno data from SDRF")
    ph = try(read.AnnotatedDataFrame(sdrf, path = path, row.names = NULL,
        blank.lines.skip = TRUE, fill = TRUE, varMetadata.char = "$",
        quote = "\""))
    ph = ph[gsub(" ", "", ph$Array.Data.File) != ""]  # the comparison selects no rows, so ph becomes an empty AnnotatedDataFrame
    sampleNames(ph) = ph$Array.Data.File
    ph@varMetadata["Array.Data.File", "labelDescription"] = "Index"
    ph@varMetadata["Array.Data.File", "channel"] = as.factor("_ALL_")
    emptylines = which(sapply(seq_len(nrow(pData(ph))), function(i) all(pData(ph)[i,  # this inevitably fails on an empty ph
        ] == "", na.rm = TRUE)))
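
The reason the last statement in the trace fails can be reproduced without the package. The sketch below assumes only base R; `empty_pd` is a stand-in for the emptied `pData(ph)`, and the `vapply` variant is one possible guard, not something the package currently does.

```r
# A zero-row data frame stands in for pData(ph) after the filter removed every row.
empty_pd <- data.frame(Array.Data.File = character(0), stringsAsFactors = FALSE)

row_is_blank <- function(i) all(empty_pd[i, ] == "", na.rm = TRUE)

# sapply() over a zero-length index returns list(), not logical(0) ...
flags <- sapply(seq_len(nrow(empty_pd)), row_is_blank)
print(is.list(flags))

# ... and which() refuses anything that is not a logical vector.
res <- try(which(flags), silent = TRUE)
print(inherits(res, "try-error"))

# vapply() fixes the return type, so the zero-row case gives logical(0)
# and which() returns an empty integer vector instead of raising an error.
safe_flags <- vapply(seq_len(nrow(empty_pd)), row_is_blank, logical(1))
print(which(safe_flags))
```

Guarding the type this way only avoids the error; the resulting object would still be empty unless the row filter itself falls back to a column that has values.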




Written 2.7 years ago by andrew.tikhonov