We were recently asked what the best practice is for posting Salmon quantification files to GEO so that the analysis is computationally reproducible.
We recommend the following:
- tar the entire sample output directory (the directory containing quant.sf)
- upload each sample tar file to GEO as the quantification file
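The two steps above can be scripted when there are many samples. The following is a minimal sketch in Python, not an official Salmon or GEO tool: the function name tar_sample_dirs and the folder layout (one sub-directory per sample under a common parent) are assumptions made for illustration. It writes one uncompressed tar file per directory that contains quant.sf.

```python
import tarfile
from pathlib import Path


def tar_sample_dirs(quant_root, out_dir):
    """Create one tar archive per Salmon sample directory under quant_root.

    A directory counts as a sample directory if it directly contains quant.sf.
    Returns the list of archive paths that were written.
    (Hypothetical helper; the layout of quant_root is an assumption.)
    """
    quant_root = Path(quant_root)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    archives = []
    for sample_dir in sorted(p for p in quant_root.iterdir() if p.is_dir()):
        if not (sample_dir / "quant.sf").is_file():
            continue  # skip anything that is not a Salmon output directory
        archive = out_dir / f"{sample_dir.name}.tar"
        with tarfile.open(archive, "w") as tar:
            # arcname keeps the sample name as the top-level folder in the
            # archive, so the whole directory (not just quant.sf) is included
            tar.add(sample_dir, arcname=sample_dir.name)
        archives.append(archive)
    return archives
```

For example, tar_sample_dirs("quants", "geo_upload") would write geo_upload/<sample>.tar for every sample directory under quants; both folder names here are placeholders.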
This is allowed by GEO, and most importantly, it makes the analysis reproducible and allows tximeta to identify the reference transcripts. Salmon outputs plain text quantification files called quant.sf, but these individual files don't contain all the information about the processing. The other files in the sample output directory contain metadata with really important information: which transcripts were used for quantification, the parameter settings, the version of Salmon, and even the estimated bias parameters. It would be awkward to stuff all this information into the header of the quant.sf file, and there's really no downside to uploading the sample tar file as the sample quantification file to GEO.
Feel free to post questions/comments here.