Sharing a collection of datasets in ExperimentHub
@stephen-piccolo-6761
United States

Our team has curated a collection of 107 gene-expression datasets from different sources (e.g., GEO, ArrayExpress) that are saved in a publicly accessible repository (Open Science Framework). All these datasets contain expression levels for the same genes across a wide variety of samples related to breast cancer. We also have metadata and gene information and are storing all this information in SummarizedExperiment objects.

We want to create a Bioconductor/ExperimentHub package or packages that make this data available for analysis by others. Would it be best to create one package for each dataset (107 packages total)? Or is there a way to create one package that includes all 107 datasets (each in a separate SummarizedExperiment object)?

Many thanks, and sorry if we missed something in the documentation about this!

Bioconductor • 115 views
shepherl 4.2k
@lshep
United States

107 packages would be a lot! It is certainly possible (and recommended) to implement one package that uses ExperimentHub to manage downloading the datasets. Each SummarizedExperiment object would be a separate ExperimentHub resource, registered by its own row in the package's metadata.csv file, and within the one package you could provide specialized filtering and querying of ExperimentHub for a particular resource of interest. Feel free to ask further questions here or at hubs @ bioconductor.org
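
To make the one-row-per-dataset idea concrete, here is a minimal base-R sketch of a script that could generate such a metadata.csv. Treat it as an illustration under assumptions, not the required recipe: the column names follow my recollection of the hub package guidelines and should be checked against the current ExperimentHubData/HubPub documentation, and the dataset identifiers, package name (`hypotheticalPkg`), maintainer, BiocVersion, source type, species, and URL are all placeholders.

```r
# Hypothetical identifiers standing in for the 107 curated datasets.
dataset_ids <- sprintf("dataset_%03d", 1:107)

# One row per SummarizedExperiment object; scalar values are recycled
# across all rows. Every value below is a placeholder (assumption).
metadata <- data.frame(
  Title              = paste0(dataset_ids, "_SummarizedExperiment"),
  Description        = paste("Breast cancer gene-expression dataset", dataset_ids),
  BiocVersion        = "3.19",                  # assumed release
  Genome             = NA_character_,
  SourceType         = "TXT",                   # assumed source format
  SourceUrl          = "https://osf.io/",       # placeholder, not the real project URL
  SourceVersion      = NA_character_,
  Species            = "Homo sapiens",          # assumed
  TaxonomyId         = 9606,
  Coordinate_1_based = NA,
  DataProvider       = "GEO/ArrayExpress",      # placeholder
  Maintainer         = "Jane Doe <jane.doe@example.org>",  # hypothetical
  RDataClass         = "SummarizedExperiment",
  DispatchClass      = "Rds",                   # assumes objects saved with saveRDS()
  RDataPath          = paste0("hypotheticalPkg/", dataset_ids, ".rds"),
  Tags               = "BreastCancer:GeneExpression",
  stringsAsFactors   = FALSE
)

# In a real package this file would live at inst/extdata/metadata.csv;
# a temporary directory is used here so the sketch can run anywhere.
out_file <- file.path(tempdir(), "metadata.csv")
write.csv(metadata, file = out_file, row.names = FALSE)
```

Once the resources are registered in the hub, users would typically locate them by calling `query()` on an `ExperimentHub()` object, and the package could wrap that lookup in its own helper functions.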


The scRNAseq package (source on GitHub) might serve as a helpful example of a single package that provides multiple datasets.
