Importing RSEM transcript-level data using tximport
@patrick-kimes-6796
Boston, MA, USA

I'd like to read in transcript-level results from RSEM using the tximport package, but this does not appear to be supported.

The tximport vignette only describes importing sample.genes.results (gene-level) data, and the tximport::tximport function is hard-coded to set txIn = FALSE when type = "rsem". Also, in the same block of code, abundances are read from the FPKM column of the RSEM output rather than the TPM column, which seems inconsistent with the TPMs read in for Salmon and kallisto.
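
Why the column choice matters: within one sample, TPM is FPKM rescaled so that the values sum to one million, TPM_i = FPKM_i / sum_j FPKM_j * 10^6. The two columns therefore differ by a per-sample factor, so FPKMs from RSEM and TPMs from Salmon or kallisto are not on the same scale. A minimal Python sketch of that rescaling (not tximport code; the FPKM values are invented):

```python
def fpkm_to_tpm(fpkm):
    """Rescale one sample's FPKM values so they sum to 1e6, i.e. convert to TPM."""
    total = sum(fpkm)
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    # Multiply before dividing so simple inputs stay exact in floating point.
    return [f * 1e6 / total for f in fpkm]


# Hypothetical FPKM values for three transcripts in a single sample.
sample_fpkm = [10.0, 30.0, 60.0]
tpm = fpkm_to_tpm(sample_fpkm)
print(tpm)  # [100000.0, 300000.0, 600000.0]
```

Because the scaling factor depends on the whole sample, the same transcript can have identical FPKM in two samples but different TPM, which is why reading one quantifier's FPKM alongside another's TPM is inconsistent.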

Any thoughts on why these decisions were made? The changes needed to (1) allow transcript-level RSEM results and (2) use TPMs instead of FPKMs seem fairly quick.

@mikelove
United States

hi Patrick,

Initially, tximport was written to help users summarize transcript-level measurements to the gene level, calculate the appropriate gene-level offset from the average effective transcript length, and provide a uniform way to do this, arranging and naming the matrices so that downstream packages could be run in a particular way (with benchmarking and a Methods write-up behind it) rather than in an ad hoc manner (e.g. ignoring the bias-corrected effective lengths and just using the counts). Because RSEM does its own summarization to the gene level (nearly the same as ours, with minor differences when a subset of samples have TPM = 0 for the gene), I didn't code up defaults for importing transcript-level measurements (type="rsem" with txOut=TRUE). Now that a few people have asked for this, I think I should add it when I find the time. First I need to put some example data in tximportData so that I have examples and tests for it.
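
As a rough illustration of the gene-level summarization described above, here is a minimal Python sketch for a single sample. It assumes the abundance-weighted length approach: counts and TPMs are summed over a gene's transcripts, and the gene length is the TPM-weighted average of the transcripts' effective lengths. The function name, input layout, and the fallback used when every transcript of a gene has TPM = 0 are assumptions for this sketch, not tximport's exact rules (the zero-TPM case is precisely where the post says tximport and RSEM differ slightly).

```python
from collections import defaultdict


def summarize_to_gene(tx_to_gene, counts, tpm, eff_length):
    """Collapse one sample's transcript-level values to gene level (sketch).

    Counts and TPMs are summed per gene; the gene length is the
    TPM-weighted mean of the gene's transcript effective lengths.
    """
    gene_counts = defaultdict(float)
    gene_tpm = defaultdict(float)
    weighted_len = defaultdict(float)
    lengths = defaultdict(list)
    for tx, gene in tx_to_gene.items():
        gene_counts[gene] += counts[tx]
        gene_tpm[gene] += tpm[tx]
        weighted_len[gene] += tpm[tx] * eff_length[tx]
        lengths[gene].append(eff_length[tx])

    gene_length = {}
    for gene, total_tpm in gene_tpm.items():
        if total_tpm > 0:
            gene_length[gene] = weighted_len[gene] / total_tpm
        else:
            # Assumption for this sketch: use the plain mean effective length
            # when all of the gene's transcripts have TPM = 0 in this sample.
            gene_length[gene] = sum(lengths[gene]) / len(lengths[gene])
    return dict(gene_counts), dict(gene_tpm), gene_length


# Hypothetical single-sample values for three transcripts of two genes.
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
counts = {"tx1": 100.0, "tx2": 50.0, "tx3": 0.0}
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 0.0}
eff_length = {"tx1": 1000.0, "tx2": 2000.0, "tx3": 500.0}

gene_counts, gene_tpm, gene_length = summarize_to_gene(tx_to_gene, counts, tpm, eff_length)
print(gene_length)  # {'geneA': 1250.0, 'geneB': 500.0}
```

Using the weighted average rather than the raw counts alone is what lets downstream tools correct for changes in which isoforms a gene expresses across samples.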

Note: you can always import any kind of table by manually specifying the arguments geneIdCol, txIdCol, abundanceCol, countsCol, and lengthCol.
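
Conceptually, those arguments just name the columns to pull out of a tab-delimited quantification table. The Python sketch below uses a hypothetical helper, import_table, whose parameters mirror txIdCol, abundanceCol, countsCol and lengthCol; it is not part of tximport. It assumes the standard header of an RSEM *.isoforms.results file (transcript_id, gene_id, length, effective_length, expected_count, TPM, FPKM, IsoPct), and the two data rows are made up. In tximport itself, the corresponding choices would presumably be txIdCol="transcript_id", abundanceCol="TPM", countsCol="expected_count" and lengthCol="effective_length".

```python
import csv
import io


def import_table(handle, tx_id_col, abundance_col, counts_col, length_col):
    """Read a tab-delimited quantification table, keeping only the named columns.

    Hypothetical helper for illustration; not tximport's API.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    wanted = {tx_id_col, abundance_col, counts_col, length_col}
    missing = wanted - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    table = {}
    for row in reader:
        table[row[tx_id_col]] = {
            "abundance": float(row[abundance_col]),
            "counts": float(row[counts_col]),
            "length": float(row[length_col]),
        }
    return table


# Two made-up rows laid out like an RSEM *.isoforms.results file.
example = io.StringIO(
    "transcript_id\tgene_id\tlength\teffective_length\texpected_count\tTPM\tFPKM\tIsoPct\n"
    "tx1\tgeneA\t1200\t1000.00\t100.00\t30.00\t25.00\t75.00\n"
    "tx2\tgeneA\t2200\t2000.00\t50.00\t10.00\t8.33\t25.00\n"
)
rsem_tx = import_table(
    example,
    tx_id_col="transcript_id",
    abundance_col="TPM",  # TPM rather than FPKM, as agreed in this thread
    counts_col="expected_count",
    length_col="effective_length",
)
print(rsem_tx["tx1"])  # {'abundance': 30.0, 'counts': 100.0, 'length': 1000.0}
```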

Re: FPKM vs TPM, I don't know why my original code used the FPKM column. I agree it makes more sense to use TPM.

I'll put both of these changes on my to-do list for devel, but first I need to generate the isoform-level output files in tximportData so I can test against them. In the meantime, you can use the arguments listed above.

I should have read the docs more closely; thanks for pointing me to the manual options! (Also, sorry, I probably shouldn't have assumed that these changes would be "fairly quick". Thanks for the hard work.)

I just pushed new quantifications for RSEM, Salmon and kallisto to tximportData, so I'll have something to test on when I add transcript-level import for RSEM.

Added transcript-level import for RSEM in version 1.7.3:

https://github.com/mikelove/tximport/commit/f80fcaac7411ae590688c237088072a313772668