Hi Mustafa,
Nearly 13k of the SRA samples have no characteristics or GEO accession numbers, as the code below shows. There's nothing we can really do about that, although SRAdb updates sometimes add new GEO accession numbers. Incomplete sample metadata is a known problem that Shannon Ellis and others have tried to address in different ways. See http://biorxiv.org/content/early/2017/06/03/145656, http://metasra.biostat.wisc.edu/publication.html, and SHARQ beta http://www.cs.cmu.edu/~ckingsf/sharq/about.html, among others.
Regarding the row.names issue, if you post some reproducible code, I bet other people could help you out. It would also help if you could highlight which step is actually failing. In any case, if you are combining rows, you could make the row names unique before combining them.
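Without the failing code it is hard to say which step is the culprit, but here is a minimal sketch of making row names unique before combining rows. The two data frames `a` and `b`, their `reads` column, and the run IDs are made up for illustration; they are not taken from your data or from recount. The sketch uses only base R (`make.unique()` and `rbind()`).

```r
# Two hypothetical tables whose row names collide on "SRR2"
a <- data.frame(reads = c(10, 20), row.names = c("SRR1", "SRR2"))
b <- data.frame(reads = c(30, 40), row.names = c("SRR2", "SRR3"))

# Build one set of unique names across both tables;
# make.unique() appends a numeric suffix to repeated names
all_names <- make.unique(c(rownames(a), rownames(b)))

# Assign the de-duplicated names back to each table
rownames(a) <- all_names[seq_len(nrow(a))]
rownames(b) <- all_names[nrow(a) + seq_len(nrow(b))]

# Combining rows now involves no duplicated row names
combined <- rbind(a, b)
rownames(combined)
```

Doing the renaming yourself, rather than relying on whatever the combining function does with duplicates, keeps the final names predictable. If the duplicates mean the same sample really appears twice, you may prefer to drop the repeated rows instead of renaming them.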
Best,
Leonardo
> library(recount)
> m <- all_metadata()
> table(sum(is.na(m$characteristics)) == 1)
FALSE TRUE
37278 12821
> table(is.na(m$geo_accession))
FALSE TRUE
37395 12704
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] recount_1.4.0 SummarizedExperiment_1.8.0 DelayedArray_0.4.0 matrixStats_0.52.2 Biobase_2.38.0
[6] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
loaded via a namespace (and not attached):
[1] bitops_1.0-6 bit64_0.9-7 RColorBrewer_1.1-2 progress_1.1.2 httr_1.3.1 GenomicFiles_1.14.0
[7] tools_3.4.2 backports_1.1.1 doRNG_1.6.6 R6_2.2.2 rpart_4.1-11 Hmisc_4.0-3
[13] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2 nnet_7.3-12 gridExtra_2.3 prettyunits_1.0.2
[19] RMySQL_0.10.13 bit_1.1-12 compiler_3.4.2 htmlTable_1.9 derfinder_1.12.0 xml2_1.1.1
[25] pkgmaker_0.22 rtracklayer_1.38.0 scales_0.5.0 checkmate_1.8.5 readr_1.1.1 stringr_1.2.0
[31] digest_0.6.12 Rsamtools_1.30.0 foreign_0.8-69 rentrez_1.1.0 GEOquery_2.46.1 XVector_0.18.0
[37] base64enc_0.1-3 pkgconfig_2.0.1 htmltools_0.3.6 BSgenome_1.46.0 htmlwidgets_0.9 rlang_0.1.2
[43] RSQLite_2.0 bindr_0.1 jsonlite_1.5 BiocParallel_1.12.0 acepack_1.4.1 dplyr_0.7.4
[49] VariantAnnotation_1.24.0 RCurl_1.95-4.8 magrittr_1.5 GenomeInfoDbData_0.99.1 Formula_1.2-2 Matrix_1.2-11
[55] Rcpp_0.12.13 munsell_0.4.3 stringi_1.1.5 zlibbioc_1.24.0 qvalue_2.10.0 plyr_1.8.4
[61] bumphunter_1.20.0 grid_3.4.2 blob_1.1.0 lattice_0.20-35 Biostrings_2.46.0 splines_3.4.2
[67] GenomicFeatures_1.30.0 hms_0.3 derfinderHelper_1.12.0 locfit_1.5-9.1 knitr_1.17 rngtools_1.2.4
[73] reshape2_1.4.2 codetools_0.2-15 biomaRt_2.34.0 XML_3.98-1.9 glue_1.2.0 downloader_0.4
[79] latticeExtra_0.6-28 data.table_1.10.4-3 foreach_1.4.3 gtable_0.2.0 purrr_0.2.4 tidyr_0.7.2
[85] assertthat_0.2.0 ggplot2_2.2.1 xtable_1.8-2 survival_2.41-3 tibble_1.3.4 iterators_1.0.8
[91] GenomicAlignments_1.14.0 AnnotationDbi_1.40.0 registry_0.3 memoise_1.1.0 bindrcpp_0.2 cluster_2.0.6