Our team has curated a collection of 107 gene-expression datasets from different sources (e.g., GEO, ArrayExpress) that are saved in a publicly accessible repository (Open Science Framework). All these datasets contain expression levels for the same genes across a wide variety of samples related to breast cancer. We also have metadata and gene information and are storing all this information in SummarizedExperiment objects.
We want to create a Bioconductor/ExperimentHub package or packages that make this data available for analysis by others. Would it be best to create one package for each dataset (107 packages total)? Or is there a way to create one package that includes all 107 datasets (each in a separate SummarizedExperiment object)?
Many thanks, and sorry if we missed something in the documentation about this!
The scRNAseq package (source on GitHub) might serve as a helpful example of a single package that provides multiple datasets.
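To make the single-package idea a bit more concrete: an ExperimentHub data package describes its resources in a metadata table (conventionally `inst/extdata/metadata.csv`) with one row per resource, so 107 datasets can simply be 107 rows in one package. Below is a minimal base-R sketch of building such a table. The package name `breastCancerExprData`, the dataset identifiers, the file paths, and the human species values are all hypothetical placeholders, and only a subset of the columns ExperimentHub expects is shown, so please check the ExperimentHub documentation for the full list.

```r
# Hypothetical identifiers; in practice there would be one per dataset (107 total).
dataset_ids <- sprintf("dataset_%03d", 1:3)

# One metadata row per dataset. Column names follow the ExperimentHub
# metadata convention, but this is only an illustrative subset of them.
metadata <- data.frame(
    Title = dataset_ids,
    Description = paste("Breast cancer gene-expression data,", dataset_ids),
    Species = "Homo sapiens",   # assumption: human samples
    TaxonomyId = 9606,          # NCBI taxonomy ID for Homo sapiens
    RDataClass = "SummarizedExperiment",
    DispatchClass = "Rds",      # assumes each object is saved with saveRDS()
    # Hypothetical path of each serialized object relative to the hosting location
    RDataPath = file.path("breastCancerExprData", paste0(dataset_ids, ".rds")),
    stringsAsFactors = FALSE
)

# Written to a temporary directory here; in a package it would go to
# inst/extdata/metadata.csv.
out_file <- file.path(tempdir(), "metadata.csv")
write.csv(metadata, file = out_file, row.names = FALSE)
```

Once such a package is in the hub, users should be able to list its resources with something like `query(ExperimentHub(), "breastCancerExprData")` and load an individual dataset by its hub ID, which is roughly how the datasets in scRNAseq are retrieved.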