Large example data with Bioconductor package
@mgierlinski-7369
United Kingdom

I'm writing an R package I intend to submit to Bioconductor. The aim of the package is to analyse high-throughput proteomics data, starting from the MaxQuant evidence file. I feel it is necessary to show step-by-step examples of such an analysis in a vignette. However, this would require attaching large files, tens of MB each, to the package.

What is the best practice in such cases?

I could reduce the size by selecting a small random sample of rows from the actual evidence data, but this would drastically reduce the number of peptides and proteins, and with it the statistical power, making the example unrealistic.

Alternatively, I could create a separate package containing only example data. The vignette of the original package would request the data package and demonstrate how to process real data.

Any recommendations?

shepherl 3.8k
@lshep
United States

There are a couple of approaches you could take. In order of recommendation:

  1. See if there is an existing dataset already available in Bioconductor that you could use. There are a number of Experiment Data packages that might have appropriate data. You can search the packages here

  2. Create an ExperimentHub data package that stores the large files in AWS S3 for use. Documentation for creating such a package can be found here

  3. The third approach is, as you suggested, a separate associated data package that provides the data. The software package would still have to meet the size requirement of 4MB, but the associated data package can be larger (whether it holds the full data or a subset). We would still recommend reducing the size by selecting smaller subsets: a proof-of-principle example with real data is still helpful even if it is unrealistic.
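
As a rough sketch of the subsetting idea, one possible way to shrink an evidence table is to sample whole proteins rather than random rows, so that every retained protein keeps all of its peptide evidence. This is only an illustration, not an established recipe: the toy table below is simulated, and its column names (`Proteins`, `Sequence`, `Intensity`) and the output file name are assumptions standing in for a real MaxQuant evidence file.

```r
# Toy stand-in for a MaxQuant evidence table. Column names and values are
# assumptions made for illustration only; this is not real data.
set.seed(1)
n_rows <- 5000
evidence <- data.frame(
  Proteins  = sample(sprintf("P%04d", 1:200), n_rows, replace = TRUE),
  Sequence  = replicate(n_rows, paste(sample(LETTERS, 8), collapse = "")),
  Intensity = rlnorm(n_rows, meanlog = 15)
)

# Sample whole proteins instead of random rows, so each retained protein
# keeps every one of its evidence rows.
keep <- sample(unique(evidence$Proteins), 20)
evidence_small <- evidence[evidence$Proteins %in% keep, ]

# Write a gzip-compressed, tab-delimited file (hypothetical file name),
# for example as something that could be placed under inst/extdata.
out <- file.path(tempdir(), "evidence_small.txt.gz")
con <- gzfile(out, "w")
write.table(evidence_small, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# Compare the file size with the 4MB limit mentioned above.
size_mb <- file.size(out) / 1024^2
```

With real data, the number of proteins to keep would be tuned until `size_mb` fits the limit. The file could then be read back in the vignette with `read.delim()`.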
