We were recently asked what the best practice is for posting Salmon quantification files to GEO to allow for computational reproducibility.
We recommend the following:
- tar the entire sample output directory (the directory containing quant.sf)
- upload each sample tar file to GEO as the quantification file
This is allowed by GEO and, most importantly, it makes the analysis reproducible and allows tximeta to identify the reference transcripts. Salmon outputs plain text quantification files called quant.sf, but these individual files don't contain all the information about the processing. The other files in the sample output directory contain metadata with really important information: which transcripts were used for quantification, the parameter settings, the version of Salmon, and even the estimated bias parameters. It would be awkward to stuff all this information into the header of the quant.sf file, and there's really no downside to uploading the sample tar file as the sample quantification file to GEO.
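
If you have many samples, the tar step can be scripted. Below is a minimal sketch in Python, not an official tool. It assumes a layout in which one parent directory holds one subdirectory per sample, each containing quant.sf. The function name tar_salmon_samples and the example directory names are made up for illustration.

```python
import tarfile
from pathlib import Path


def tar_salmon_samples(quant_root, out_dir):
    """Create one tar archive per Salmon sample output directory.

    Assumption: a sample directory is any immediate subdirectory of
    quant_root that contains a quant.sf file.
    """
    quant_root = Path(quant_root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for sample_dir in sorted(p for p in quant_root.iterdir() if p.is_dir()):
        if not (sample_dir / "quant.sf").is_file():
            continue  # not a Salmon output directory, skip it
        archive = out_dir / f"{sample_dir.name}.tar"
        with tarfile.open(archive, "w") as tar:
            # Add the whole directory (quant.sf plus all metadata files),
            # keeping the sample name as the top-level folder in the archive.
            tar.add(sample_dir, arcname=sample_dir.name)
        archives.append(archive)
    return archives


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever your Salmon output lives.
    root = Path("quants")
    if root.is_dir():
        for path in tar_salmon_samples(root, "geo_upload"):
            print(path)
```

Each resulting tar file contains the complete sample directory, so the metadata files travel together with quant.sf.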
Feel free to post questions/comments here.