Question: Large example data with Bioconductor package
M.Gierlinski (United Kingdom) wrote 24 months ago:

I'm writing an R package that I intend to submit to Bioconductor. The aim of the package is to analyse high-throughput proteomics data, starting from the MaxQuant evidence file. I feel it is necessary to show step-by-step examples of such an analysis in a vignette. However, this would require attaching large files, tens of MB each, to the package.

What is the best practice in such cases?

I could reduce the size by selecting a small sample of random rows from the actual evidence data, but this would drastically reduce the numbers of peptides and proteins, and with them the statistical power, making the example unrealistic.

Alternatively, I could create a separate package containing only example data. The vignette of the original package would request the data package and demonstrate how to process real data.

Any recommendations?

Answer: Large example data with Bioconductor package
shepherl (United States) wrote 24 months ago:

There are a couple of approaches you could take. In order of recommendation:

  1. See if there is an existing dataset already available in Bioconductor that you could use. There are a number of Experiment Data packages that might have appropriate data. You can search the packages here

  2. Create an ExperimentHub data package that stores the large files on AWS S3. Documentation for creating such a package can be found here

  3. As you have suggested, create a separate associated data package that provides the data. The software package itself would still have to meet the size requirement of 4MB, but the associated data package can be larger (whether it holds the full data or a subset).

We would still recommend reducing the size by selecting smaller subsets, as a proof-of-principle example with real data is still helpful even if unrealistic.
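
One possible way to build such a subset, sketched below under stated assumptions, is to sample whole proteins instead of random rows, so that each retained protein keeps all of its peptide-level evidence. This is only an illustration, not an official Bioconductor recipe: the table is simulated, the column names (`Proteins`, `Sequence`, `Intensity`) are stand-ins for whatever the real evidence file contains, and `subset_by_protein` is a hypothetical helper.

```r
# Simulated stand-in for an evidence table; column names are assumptions.
set.seed(1)
n_rows <- 20000
evidence <- data.frame(
  Proteins  = sprintf("P%04d", sample(1:2000, n_rows, replace = TRUE)),
  Sequence  = sprintf("PEP%05d", sample(1:15000, n_rows, replace = TRUE)),
  Intensity = rlnorm(n_rows, meanlog = 15, sdlog = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: sample protein identifiers, then keep every row
# that belongs to a sampled protein.
subset_by_protein <- function(ev, n_proteins, protein_col = "Proteins") {
  ids  <- unique(ev[[protein_col]])
  keep <- sample(ids, min(n_proteins, length(ids)))
  ev[ev[[protein_col]] %in% keep, , drop = FALSE]
}

small <- subset_by_protein(evidence, n_proteins = 200)

# Write the subset as a gzip-compressed, tab-separated file and measure
# its size, to compare against the 4MB limit quoted above.
out <- tempfile(fileext = ".txt.gz")
con <- gzfile(out, "w")
write.table(small, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)
size_mb <- file.size(out) / 1024^2
```

The number of proteins to keep is a free parameter; it could be lowered until `size_mb` fits within the limit, with the full data left to a separate data package.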
