ExperimentHub()[["EH359"]] has character expression values
@wolfgang-huber-3550
EMBL European Molecular Biology Laborat…

I am not sure what the right protocol is for reporting bugs in ExperimentHub, but here we go. It appears that the dataset ExperimentHub()[["EH359"]] (apparently a.k.a. curatedMetagenomicData::ZellerG_2014.marker_abundance) is an ExpressionSet whose exprs is a character matrix rather than a numeric one. The matrix can be converted to numeric, and all elements seem to represent legitimate numbers, but I wonder whether this should be fixed upstream.

 

library("ExperimentHub")

eh = ExperimentHub()

# snapshotDate(): 2016-10-26
zeller = eh[["EH359"]]
# see ?curatedMetagenomicData and browseVignettes('curatedMetagenomicData') for documentation
# loading from cache ‘/Users/huber//.ExperimentHub/359’
str(exprs(zeller))
# chr [1:130272, 1:156] "1.8115942029" "17.0542635659" "55.5555555556" ...
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:130272] "gi|333126069|ref|NZ_AEMJ01000490.1|:c656-105" "gi|381149847|ref|NZ_JH604847.1|:635-1279" "gi|331001572|ref|NZ_GL883724.1|:311-544" "gi|381150020|ref|NZ_JH605020.1|:16763-17575" ...
#  ..$ : chr [1:156] "CCIS00146684ST-4-0" "CCIS00281083ST-3-0" "CCIS02124300ST-4-0" "CCIS02379307ST-4-0" ...
nonum = is.na(as.numeric(exprs(zeller)))
table(nonum)
#nonum
#  FALSE
#20322432

sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.12.1 (Sierra)

locale:
[1] C/UTF-8/C/C/C/C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] curatedMetagenomicData_1.0.0 phyloseq_1.18.0             
 [3] magrittr_1.5                 ExperimentHubData_1.0.0     
 [5] AnnotationHubData_1.4.0      futile.logger_1.4.3         
 [7] GenomicRanges_1.26.1         GenomeInfoDb_1.10.0         
 [9] IRanges_2.8.0                S4Vectors_0.12.0            
[11] Biobase_2.34.0               ExperimentHub_1.0.0         
[13] AnnotationHub_2.6.0          BiocGenerics_0.20.0         
[15] fortunes_1.5-3              

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    splines_3.3.1                
 [3] jsonlite_1.1                  foreach_1.4.3                
 [5] shiny_0.14.2                  interactiveDisplayBase_1.12.0
 [7] RBGL_1.50.0                   Rsamtools_1.26.1             
 [9] RSQLite_1.0.0                 lattice_0.20-34              
[11] RUnit_0.4.31                  chron_2.3-47                 
[13] digest_0.6.10                 XVector_0.14.0               
[15] colorspace_1.2-7              htmltools_0.3.5              
[17] httpuv_1.3.3                  Matrix_1.2-7.1               
[19] plyr_1.8.4                    OrganismDbi_1.16.0           
[21] GEOquery_2.40.0               XML_3.98-1.4                 
[23] biomaRt_2.30.0                rBiopaxParser_2.14.0         
[25] zlibbioc_1.20.0               xtable_1.8-2                 
[27] scales_0.4.0                  getopt_1.20.0                
[29] optparse_1.3.2                BiocParallel_1.8.1           
[31] biocViews_1.42.0              mgcv_1.8-15                  
[33] ggplot2_2.1.0                 SummarizedExperiment_1.4.0   
[35] GenomicFeatures_1.26.0        survival_2.40-1              
[37] mime_0.5                      MASS_7.3-45                  
[39] nlme_3.1-128                  xml2_1.0.0                   
[41] vegan_2.4-1                   graph_1.52.0                 
[43] BiocInstaller_1.24.0          tools_3.3.1                  
[45] data.table_1.9.6              stringr_1.1.0                
[47] munsell_0.4.3                 cluster_2.0.5                
[49] AnnotationDbi_1.36.0          lambda.r_1.1.9               
[51] Biostrings_2.42.0             ade4_1.7-4                   
[53] rhdf5_2.18.0                  grid_3.3.1                   
[55] RCurl_1.95-4.8                iterators_1.0.8              
[57] biomformat_1.2.0              AnnotationForge_1.16.0       
[59] igraph_1.0.1                  bitops_1.0-6                 
[61] multtest_2.30.0               gtable_0.2.0                 
[63] codetools_0.2-15              DBI_0.5-1                    
[65] curl_2.2                      reshape2_1.4.2               
[67] R6_2.2.0                      GenomicAlignments_1.10.0     
[69] rtracklayer_1.34.1            futile.options_1.0.0         
[71] permute_0.9-4                 ape_3.5                      
[73] stringi_1.1.2                 Rcpp_0.12.7                  
[75] BiocCheck_1.10.0             
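
For anyone who needs numeric values while the stored object still holds characters, here is a minimal sketch of a local workaround. The helper name to_numeric_matrix is hypothetical (it is not part of ExperimentHub or curatedMetagenomicData), and the toy matrix only mimics the shape of the problem using the first few values shown above. It converts a character matrix to numeric, keeps the dimnames, and stops if any element fails to parse.

```r
# Hypothetical helper (not from any package): convert a character matrix
# to a numeric matrix while keeping dim and dimnames, and fail loudly if
# any non-missing element cannot be parsed as a number.
to_numeric_matrix <- function(m) {
  stopifnot(is.matrix(m), is.character(m))
  out <- suppressWarnings(as.numeric(m))
  failed <- is.na(out) & !is.na(m)
  if (any(failed)) {
    stop(sum(failed), " element(s) could not be parsed as numbers")
  }
  dim(out) <- dim(m)
  dimnames(out) <- dimnames(m)
  out
}

# Toy stand-in for exprs(zeller); row and column names are made up.
toy <- matrix(
  c("1.8115942029", "17.0542635659", "55.5555555556", "0"),
  nrow = 2,
  dimnames = list(c("marker1", "marker2"), c("sampleA", "sampleB"))
)
num <- to_numeric_matrix(toy)
str(num)

# With Biobase loaded, the same helper could presumably be applied as:
#   exprs(zeller) <- to_numeric_matrix(exprs(zeller))
```

The explicit check mirrors the table(nonum) step above: it confirms that nothing was silently turned into NA during the conversion.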

 

ExperimentHub curatedMetagenomicData • 774 views
@schifferl
Boston University, Boston, MA

The issue is related to this function:

merged_eset <- function(assay_data, pheno_data, experiment_data) {
    as.matrix(assay_data) %>%
    ExpressionSet(., pheno_data, experimentData = experiment_data)
}

I have made the following change to ensure that all the matrices are numeric. The datasets will take just over 24 hours to regenerate and will be updated within ExperimentHub early next week. Sorry for the confusion.

merged_eset <- function(assay_data, pheno_data, experiment_data) {
    data.matrix(assay_data) %>%
    ExpressionSet(., pheno_data, experimentData = experiment_data)
}
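
To illustrate the difference between the two calls, here is a small base-R sketch on a made-up data frame (the real assay_data is not shown in this thread, so its column types are an assumption). as.matrix() returns a character matrix as soon as one column is non-numeric, whereas data.matrix() always returns a numeric matrix. One caveat worth checking: data.matrix() is documented to convert character columns to factors first and then to integer codes, so a column of number-like strings comes back as codes rather than as the parsed values. Whether that matters for the actual pipeline depends on how assay_data is built.

```r
# Made-up stand-in for assay_data: one column of number-like strings,
# one genuinely numeric column.
assay_data <- data.frame(
  sampleA = c("3.5", "10.25"),
  sampleB = c(3, 4),
  row.names = c("marker1", "marker2"),
  stringsAsFactors = FALSE
)

# as.matrix(): any non-numeric column makes the whole matrix character.
m_as <- as.matrix(assay_data)
is.character(m_as)

# data.matrix(): always numeric, but the character column is turned into
# factor codes (levels sort as "10.25" < "3.5"), not into 3.5 and 10.25.
m_data <- data.matrix(assay_data)
m_data[, "sampleA"]

# Parsing the strings explicitly keeps the actual values and the dimnames.
m_safe <- as.matrix(assay_data)
storage.mode(m_safe) <- "double"
m_safe[, "sampleA"]
```

If the character columns in the real data can hold number-like strings, the explicit storage.mode() route (or as.numeric() with the dimensions restored) is the more conservative choice.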
@levi-waldron-3429
CUNY Graduate School of Public Health a…

Thanks Wolfgang - it's a pipeline bug. We'll fix it and post here when that's done.

@schifferl
Boston University, Boston, MA

Hello again, just wanted to give an update that the issue with the character matrix is fixed. All data in ExperimentHub should be correct! : )
