Getting pheno tables from recount datasets without any NA values in the characteristics and geo_accession fields and without duplicated row.names

Dear Bioconductor developers,

I have a basic question about the datasets in the recount repository. I am working on advanced statistical analyses for most of the recount datasets,

and we noticed that most of the pheno tables in recount have NA values in the characteristics and geo_accession fields.

Could anyone please help me get the pheno tables for all the projects in recount without any NA values in the characteristics and geo_accession fields? I have also run into a critical obstacle with duplicated "row.names". Could anyone advise me on how to overcome that problem, please?

Thanks so much to anyone who can suggest something or give me a practical guide.


recount summarizedexperiment • 532 views

Hi Mustafa,

Nearly 13k of the SRA samples don't have any characteristics or GEO accession numbers, as shown with the code below. There's nothing we can really do about it. Sometimes updates in SRAdb include new GEO accession numbers. Incomplete sample metadata is a problem that Shannon Ellis and others have tried to address in different ways. Check SHARQ beta and elsewhere.


Regarding the row.names issue, if you can share some reproducible code, I bet other people could help you out. It would also be great if you could highlight which step is actually failing. In any case, if you are combining rows, you could set the row names to be unique before combining them.
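As a rough illustration of making row names unique before combining, here is a minimal sketch in base R. The two small tables and their sample names are made up for the example, and I'm assuming the pheno tables can be handled as ordinary data frames; your actual objects may need a different approach.

```r
# Two hypothetical pheno tables whose row names overlap ("sample2")
pheno_a <- data.frame(project = c("P1", "P1"), reads = c(10, 20),
    row.names = c("sample1", "sample2"))
pheno_b <- data.frame(project = c("P2", "P2"), reads = c(30, 40),
    row.names = c("sample2", "sample3"))

# Build one vector of unique names across both tables, then assign them back
all_names <- make.unique(c(rownames(pheno_a), rownames(pheno_b)))
rownames(pheno_a) <- all_names[seq_len(nrow(pheno_a))]
rownames(pheno_b) <- all_names[nrow(pheno_a) + seq_len(nrow(pheno_b))]

# Combine the rows now that no name is repeated
combined <- rbind(pheno_a, pheno_b)
```

make.unique() appends a numeric suffix to repeated names. If you would rather keep track of where each row came from, pasting the project ID in front of each row name is another option.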





> library(recount)
> m <- all_metadata()
> table(sum(is.na(m$characteristics)) == 1)

FALSE  TRUE 
37278 12821 
> table(is.na(m$geo_accession))

FALSE  TRUE 
37395 12704 
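
If you only want the samples that have both fields filled in, one option is to subset the table on those two conditions. The sketch below uses a small made-up data frame as a stand-in for the output of all_metadata(), so the run IDs and values are hypothetical, and it treats a sample as lacking characteristics when every entry for it is NA. It will not recover missing metadata; it only drops the incomplete rows.

```r
# Toy stand-in for the metadata table; the real one is much larger and
# stores `characteristics` as a list-like column (one vector per sample)
pheno <- data.frame(
    run = c("SRR1", "SRR2", "SRR3", "SRR4"),
    geo_accession = c("GSM1", NA, "GSM3", "GSM4"),
    stringsAsFactors = FALSE
)
pheno$characteristics <- list(
    c("tissue: liver", "age: 40"),
    c("tissue: brain"),
    NA_character_,
    c("tissue: lung")
)

# TRUE when a sample has at least one non-NA characteristic
has_chars <- !vapply(pheno$characteristics, function(x) all(is.na(x)), logical(1))
# TRUE when a sample has a GEO accession
has_geo <- !is.na(pheno$geo_accession)

# Keep only the samples with both fields present
complete_pheno <- pheno[has_chars & has_geo, ]
```

With the real metadata you would replace the toy `pheno` with `m` from the session above and expect to lose roughly a quarter of the samples.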

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] recount_1.4.0              SummarizedExperiment_1.8.0 DelayedArray_0.4.0         matrixStats_0.52.2         Biobase_2.38.0            
 [6] GenomicRanges_1.30.0       GenomeInfoDb_1.14.0        IRanges_2.12.0             S4Vectors_0.16.0           BiocGenerics_0.24.0       

loaded via a namespace (and not attached):
 [1] bitops_1.0-6             bit64_0.9-7              RColorBrewer_1.1-2       progress_1.1.2           httr_1.3.1               GenomicFiles_1.14.0     
 [7] tools_3.4.2              backports_1.1.1          doRNG_1.6.6              R6_2.2.2                 rpart_4.1-11             Hmisc_4.0-3             
[13] DBI_0.7                  lazyeval_0.2.1           colorspace_1.3-2         nnet_7.3-12              gridExtra_2.3            prettyunits_1.0.2       
[19] RMySQL_0.10.13           bit_1.1-12               compiler_3.4.2           htmlTable_1.9            derfinder_1.12.0         xml2_1.1.1              
[25] pkgmaker_0.22            rtracklayer_1.38.0       scales_0.5.0             checkmate_1.8.5          readr_1.1.1              stringr_1.2.0           
[31] digest_0.6.12            Rsamtools_1.30.0         foreign_0.8-69           rentrez_1.1.0            GEOquery_2.46.1          XVector_0.18.0          
[37] base64enc_0.1-3          pkgconfig_2.0.1          htmltools_0.3.6          BSgenome_1.46.0          htmlwidgets_0.9          rlang_0.1.2             
[43] RSQLite_2.0              bindr_0.1                jsonlite_1.5             BiocParallel_1.12.0      acepack_1.4.1            dplyr_0.7.4             
[49] VariantAnnotation_1.24.0 RCurl_1.95-4.8           magrittr_1.5             GenomeInfoDbData_0.99.1  Formula_1.2-2            Matrix_1.2-11           
[55] Rcpp_0.12.13             munsell_0.4.3            stringi_1.1.5            zlibbioc_1.24.0          qvalue_2.10.0            plyr_1.8.4              
[61] bumphunter_1.20.0        grid_3.4.2               blob_1.1.0               lattice_0.20-35          Biostrings_2.46.0        splines_3.4.2           
[67] GenomicFeatures_1.30.0   hms_0.3                  derfinderHelper_1.12.0   locfit_1.5-9.1           knitr_1.17               rngtools_1.2.4          
[73] reshape2_1.4.2           codetools_0.2-15         biomaRt_2.34.0           XML_3.98-1.9             glue_1.2.0               downloader_0.4          
[79] latticeExtra_0.6-28      data.table_1.10.4-3      foreach_1.4.3            gtable_0.2.0             purrr_0.2.4              tidyr_0.7.2             
[85] assertthat_0.2.0         ggplot2_2.2.1            xtable_1.8-2             survival_2.41-3          tibble_1.3.4             iterators_1.0.8         
[91] GenomicAlignments_1.14.0 AnnotationDbi_1.40.0     registry_0.3             memoise_1.1.0            bindrcpp_0.2             cluster_2.0.6  
