Question: tximeta remote versus local reference files
4 months ago by
igor20
United States
igor20 wrote:

BioC 3.8 was just released, and there is a great new package, tximeta (at least if you work with Salmon). I am still trying to understand properly how it works. The vignette contains the following section:

However, to avoid downloading remote GTF files during this vignette, we will point to a GTF file saved locally (in the tximportData package). We link the transcriptome of the Salmon index to its locally saved GTF. The standard recommended usage of tximeta would be the code chunk above, or to specify a remote GTF source, not a local one. The following code is therefore not recommended for a typical workflow, but is particular to the vignette code.

Why is it not recommended to point to the locally saved GTF? I assume this is to allow for more reproducible analysis. But the primary inputs to the tool are Salmon output files that were generated with a locally saved index, and the same index is passed to makeLinkedTxome() as well. If some reference files are already local anyway, why not just keep them all local?

tximeta
modified 4 months ago by Michael Love • written 4 months ago by igor20
Answer: tximeta remote versus local reference files
4 months ago by
Michael Love
United States
Michael Love wrote:

hi Igor,

The first tximeta question!

If the GTFs are downloaded programmatically (by tximeta) and saved in a local cache (maintained by BiocFileCache), there is no chance of attaching the wrong GTF. By contrast, a user could accidentally pick the wrong GTF from a directory, which introduces a point of error. If I look in the "annotation" directory on my cluster, I have dozens of GTF files, and some of them look very similar, differing only slightly in the file ending (version number, or cDNA vs ncRNA vs all). So as long as the txome has a match in my hash table (which will expand as we develop the software and hook up to external resources), I can guarantee that the correct metadata gets attached. Also, GTF files are relatively small (typically <100 MB), so they download in a few seconds on a good connection.

But we provide the linkedTxome mechanism (makeLinkedTxome()) as a way for users to connect txomes that aren't found in the hash table for whatever reason.
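To make the two paths concrete, here is a minimal Python sketch of the idea. It is not tximeta's implementation (tximeta is an R/Bioconductor package). It assumes, as a simplification, that a transcriptome is identified by a SHA-256 digest of its sequences. The names `txome_digest`, `lookup_gtf`, `known` and `linked` are hypothetical stand-ins for tximeta's internal hash table and for txomes registered with makeLinkedTxome().

```python
import hashlib


def txome_digest(sequences):
    """Hypothetical, simplified scheme: SHA-256 over transcript sequences in order.

    A newline separator is appended after each sequence so that, e.g.,
    ["AC", "GT"] and ["ACG", "T"] do not produce the same digest.
    """
    h = hashlib.sha256()
    for seq in sequences:
        h.update(seq.encode("ascii") + b"\n")
    return h.hexdigest()


def lookup_gtf(sequences, known, linked):
    """Return (where_found, gtf_source) for a transcriptome.

    known  -- hypothetical digest -> GTF source table shipped with the tool
    linked -- hypothetical digest -> GTF source table of user-linked txomes
    """
    digest = txome_digest(sequences)
    if digest in known:
        # Match in the curated hash table: the GTF is fixed by content, not file name.
        return ("hash_table", known[digest])
    if digest in linked:
        # Fallback: the user explicitly linked this txome to a GTF.
        return ("linked", linked[digest])
    # No match: no annotation can be attached automatically.
    return ("unknown", None)


if __name__ == "__main__":
    ref_a = ["ACGTACGT", "GGCCTTAA"]
    ref_b = ["ACGTACGT", "GGCCTTAT"]  # differs from ref_a by one base
    known = {txome_digest(ref_a): "remote GTF, release A"}
    linked = {txome_digest(ref_b): "local GTF, linked by user"}
    print(lookup_gtf(ref_a, known, linked))
    print(lookup_gtf(ref_b, known, linked))
    print(lookup_gtf(["AAAA"], known, linked))
```

The design point this illustrates is that the lookup is keyed on sequence content rather than on a file name, so two similarly named GTFs in one directory cannot be confused: a single-base difference already yields a different digest.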

Does that help explain the motivation?