[recount3] Merge gene counts tables from multiple projects (SRP)
@antonio-miguel-de-jesus-domingues-5182

My question is somewhat related to "Downloading count matrix for full recount3 dataset", but rather than a single study I am interested in analysing multiple datasets together. For example:

library("recount3")

## Table of all available human projects
human_projects <- available_projects()

projects <- c("SRP009615", "SRP013565")
proj_info <- subset(
    human_projects,
    project %in% projects & project_type == "data_sources"
)
proj_info
       project organism file_source     project_home project_type n_samples
33   SRP013565    human         sra data_sources/sra data_sources      2212
1838 SRP009615    human         sra data_sources/sra data_sources        12

## Create a RangedSummarizedExperiment (RSE) object at the gene level
rse_gene <- create_rse(proj_info, type = "gene")

sessionInfo()

leads to an error:

Error in create_rse(proj_info, type = "gene") : 
  'project_info' should only have one row

I could loop (or use lapply()) over the projects to get the count tables and metadata for each one, and then merge everything into a new SummarizedExperiment object, but I wanted to ask whether there is already a way of doing this with the recount3 API.

Cheers.

recount3 recountWorkflow • 1.1k views
> devtools::session_info()                                    
─ Session info ───────────────────────────────────────────────────────────────
 setting  value                                               
 version  R version 4.1.1 (2021-08-10)                        
 os       Ubuntu 20.04.2 LTS                                  
 system   x86_64, linux-gnu                                   
 ui       X11                                                 
 language (EN)                                                
 collate  C.UTF-8                                             
 ctype    C.UTF-8                                             
 tz       Etc/UTC                                             
 date     2021-10-05                                          

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version  date       lib source        
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
 Biobase              * 2.52.0   2021-05-19 [1] Bioconductor  
 BiocFileCache          2.0.0    2021-05-19 [1] Bioconductor  
 BiocGenerics         * 0.38.0   2021-05-19 [1] Bioconductor  
 BiocIO                 1.2.0    2021-05-19 [1] Bioconductor  
 BiocParallel           1.26.0   2021-05-19 [1] Bioconductor  
 Biostrings             2.60.0   2021-05-19 [1] Bioconductor  
 bit                    4.0.4    2020-08-04 [1] CRAN (R 4.1.0)
 bit64                  4.0.5    2020-08-30 [1] CRAN (R 4.1.0)
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.0)
 blob                   1.2.2    2021-07-23 [1] CRAN (R 4.1.0)
 cachem                 1.0.6    2021-08-19 [1] CRAN (R 4.1.1)
 callr                  3.7.0    2021-04-20 [1] CRAN (R 4.1.0)
 cli                    3.0.1    2021-07-17 [1] CRAN (R 4.1.0)
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.0)
 curl                   4.3.2    2021-06-23 [1] CRAN (R 4.1.0)
 data.table           * 1.14.0   2021-02-21 [1] CRAN (R 4.1.0)
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.1.0)
 dbplyr                 2.1.1    2021-04-06 [1] CRAN (R 4.1.0)
 DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
 desc                   1.3.0    2021-03-05 [1] CRAN (R 4.1.0)
 devtools               2.4.2    2021-06-07 [1] CRAN (R 4.1.0)
 dplyr                  1.0.7    2021-06-18 [1] CRAN (R 4.1.0)
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
 fansi                  0.4.2    2021-01-15 [1] CRAN (R 4.1.0)
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
 filelock               1.0.2    2018-10-05 [1] CRAN (R 4.1.0)
 fs                     1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.1.0)
 GenomeInfoDb         * 1.28.0   2021-05-19 [1] Bioconductor  
 GenomeInfoDbData       1.2.6    2021-09-27 [1] Bioconductor  
 GenomicAlignments      1.28.0   2021-05-19 [1] Bioconductor  
 GenomicRanges        * 1.44.0   2021-05-19 [1] Bioconductor  
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
 httr                   1.4.2    2020-07-20 [1] CRAN (R 4.1.0)
 IRanges              * 2.26.0   2021-05-19 [1] Bioconductor  
 lattice                0.20-45  2021-09-22 [1] CRAN (R 4.1.1)
 lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.1.1)
 magrittr             * 2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
 Matrix                 1.3-4    2021-06-01 [1] CRAN (R 4.1.0)
 MatrixGenerics       * 1.4.0    2021-05-19 [1] Bioconductor  
 matrixStats          * 0.61.0   2021-09-17 [1] CRAN (R 4.1.1)
 memoise                2.0.0    2021-01-26 [1] CRAN (R 4.1.0)
 pillar                 1.6.3    2021-09-26 [1] CRAN (R 4.1.1)
 pkgbuild               1.2.0    2020-12-15 [1] CRAN (R 4.1.0)
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
 pkgload                1.2.2    2021-09-11 [1] CRAN (R 4.1.1)
 prettyunits            1.1.1    2020-01-24 [1] CRAN (R 4.1.0)
 processx               3.5.2    2021-04-30 [1] CRAN (R 4.1.0)
 ps                     1.6.0    2021-02-28 [1] CRAN (R 4.1.0)
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
 R.methodsS3            1.8.1    2020-08-26 [1] CRAN (R 4.1.0)
 R.oo                   1.24.0   2020-08-26 [1] CRAN (R 4.1.0)
 R.utils                2.11.0   2021-09-26 [1] CRAN (R 4.1.1)
 R6                     2.5.1    2021-08-19 [1] CRAN (R 4.1.1)
 rappdirs               0.3.3    2021-01-31 [1] CRAN (R 4.1.0)
 Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.0)
 RCurl                  1.98-1.5 2021-09-17 [1] CRAN (R 4.1.1)
 recount3             * 1.2.1    2021-05-25 [1] Bioconductor  
 remotes                2.4.0    2021-06-02 [1] CRAN (R 4.1.0)
 restfulr               0.0.13   2017-08-06 [1] CRAN (R 4.1.0)
 rjson                  0.2.20   2018-06-08 [1] CRAN (R 4.1.0)
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.0)
 rprojroot              2.0.2    2020-11-15 [1] CRAN (R 4.1.0)
 Rsamtools              2.8.0    2021-05-19 [1] Bioconductor  
 RSQLite                2.2.5    2021-03-27 [1] CRAN (R 4.1.0)
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.0)
 rtracklayer            1.52.0   2021-05-19 [1] Bioconductor  
 S4Vectors            * 0.30.0   2021-05-19 [1] Bioconductor  
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
 SummarizedExperiment * 1.22.0   2021-05-19 [1] Bioconductor  
 testthat               3.0.4    2021-07-01 [1] CRAN (R 4.1.0)
 tibble                 3.1.4    2021-08-25 [1] CRAN (R 4.1.1)
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.1.0)
 usethis                2.0.1    2021-02-10 [1] CRAN (R 4.1.0)
 utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.0)
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.0)
 XML                    3.99-0.8 2021-09-17 [1] CRAN (R 4.1.1)
 XVector                0.32.0   2021-05-19 [1] Bioconductor  
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
 zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  

[1] /mnt/projects/genomics/envs/dge_R4.0/lib/R/library
@lcolladotor

Hi,

Unless there's already a "collection" in recount3 involving data from multiple studies, we don't have any specific tools in recount3 for merging across studies. You'll have to use cbind(), which SummarizedExperiment provides for RangedSummarizedExperiment objects. The rowRanges() should be the same (assuming the same annotation), so it comes down to the colData() of each object having the same colnames(colData()). This will be the case for any SRA studies, but if you are merging across SRA, GTEx, and/or TCGA, then you'll need to decide how to merge (or drop) the colData() columns.
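
As a rough illustration of the cbind() approach, here is a minimal sketch, not a recount3 feature. The helper names combine_rse() and load_and_merge() are made up for this example, and keeping only the colData() columns shared by every object is just one possible way to handle mismatched metadata (for SRA-only merges nothing should be dropped).

```r
library(SummarizedExperiment)

## Hypothetical helper: cbind() a list of (Ranged)SummarizedExperiment
## objects after restricting colData() to the columns they all share.
## Dropping non-shared columns is an assumption of this sketch; you may
## prefer to rename or fill in columns instead.
combine_rse <- function(rse_list) {
    stopifnot(length(rse_list) > 0)
    shared_cols <- Reduce(
        intersect,
        lapply(rse_list, function(rse) colnames(colData(rse)))
    )
    rse_list <- lapply(rse_list, function(rse) {
        colData(rse) <- colData(rse)[, shared_cols, drop = FALSE]
        rse
    })
    ## cbind() requires the same rows (same annotation) in every object
    do.call(cbind, unname(rse_list))
}

## Hypothetical wrapper: create one RSE per project, since create_rse()
## accepts a single-row 'project_info', then combine them.
load_and_merge <- function(projects, organism = "human", type = "gene") {
    all_projects <- recount3::available_projects(organism = organism)
    proj_info <- subset(
        all_projects,
        project %in% projects & project_type == "data_sources"
    )
    rse_list <- lapply(seq_len(nrow(proj_info)), function(i) {
        recount3::create_rse(proj_info[i, ], type = type)
    })
    combine_rse(rse_list)
}
```

With these defined, load_and_merge(c("SRP009615", "SRP013565")) would download both projects and return a single object. Note that SRP013565 has 2212 samples, so this can take a while and use a fair amount of memory.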

Best, Leo
