Our team has curated a collection of 107 gene-expression datasets from different sources (e.g., GEO, ArrayExpress) that are saved in a publicly accessible repository (Open Science Framework). All these datasets contain expression levels for the same genes across a wide variety of samples related to breast cancer. We also have metadata and gene information and are storing all this information in SummarizedExperiment objects.
We want to create a Bioconductor/ExperimentHub package or packages that make this data available for analysis by others. Would it be best to create one package for each dataset (107 packages total), or is there a way to create one package that includes all 107 datasets (each in a separate SummarizedExperiment object)?
Many thanks, and sorry if we missed something in the documentation about this!
The scRNAseq package (source on GitHub) might serve as a helpful example of a single package that provides multiple datasets.
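To make the "one package, many datasets" idea concrete: as far as I know, an ExperimentHub data package describes its resources in a metadata table (`inst/extdata/metadata.csv`) with one row per resource, so 107 datasets would be 107 rows in a single package. Below is a rough sketch of generating such a table. It is in Python only for illustration (in a real package this would normally be an R script), the package name and dataset IDs are made up, and only a subset of the columns is shown, so check the ExperimentHubData documentation for the full required set.

```python
import csv
import io

# Hypothetical dataset identifiers; the real collection would use its own names.
DATASET_IDS = [f"dataset_{i:03d}" for i in range(1, 108)]  # 107 entries


def build_metadata_rows(dataset_ids, package="BreastCancerExprData"):
    """Return one metadata row (a dict) per dataset.

    The package name is an assumption for illustration, and only a subset
    of the columns a real metadata.csv needs is included here.
    """
    rows = []
    for ds in dataset_ids:
        rows.append({
            "Title": ds,
            "Description": f"Breast cancer gene-expression dataset {ds}",
            "RDataClass": "SummarizedExperiment",
            "DispatchClass": "Rds",
            "RDataPath": f"{package}/{ds}.rds",
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = build_metadata_rows(DATASET_IDS)
metadata_csv = to_csv(rows)
print(len(rows), "resources described in one metadata table")
```

Each row points at a separately stored SummarizedExperiment, so users of the single package can still fetch only the datasets they need.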
FYI, the scRNAseq package uses a non-ExperimentHub storage platform; see here for more details.
The idea was to provide some features that ExperimentHub might eventually adopt, namely the ability for each package maintainer to perform uploads without having to ask Lori for new keys or approval. Maintainers could also give probationary access to other users who want to contribute to their package, e.g., if someone else wanted to upload a single-cell dataset, I (with my scRNAseq maintainer hat on) could give them one-time access to push a new object, without involvement from the platform administrator (well, this is also me, but with a different hat on). It might also be cheaper - I haven't seen ExperimentHub's numbers, but my platform runs at $0/month right now, as I manage to stay in the free tier for storage and requests. Obviously this would go up with more datasets but someone (maybe me again) could probably tolerate paying a few dollars a month for several terabytes of storage.
Anyway, ExperimentHub passed on my stuff, but I think it's pretty cost-effective and low-maintenance given the traffic. I just looked at it for the first time in a year and it's racked up an easy 6k requests in the past 24 hours. Cloudflare Workers are truly fantastic.