Thanks for posting. I think the sample-specific biases shown in the paper could be addressed through tximport's effective length offset, provided the upstream quantification method captures the bias in one of the sample-specific terms it estimates.
I'm familiar with Salmon, which has a fragment length distribution (FLD) term by default and an optional position bias term that can be estimated per sample (`--posBias`). The positional bias model is flexible across short and long transcripts because it bins transcripts by their length, as was suggested by Roberts (2011). I believe that these two terms should capture the effects seen in the downstream gene counts and gene lengths in this paper. I believe RSEM also has an optional sample-specific positional bias term. Most methods have a sample-specific FLD term.
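To make concrete why a sample-specific FLD changes the effective length, here is a minimal sketch of the textbook calculation: the expected number of valid fragment start positions, averaged over the FLD truncated to the transcript length. It is written in Python purely for illustration. The function name and the toy distributions are hypothetical, and this is not Salmon's or RSEM's actual implementation, which includes further details such as the positional bias term.

```python
def effective_length(transcript_len, fld):
    """Effective length of one transcript under a fragment length distribution.

    fld: dict mapping fragment length -> probability (hypothetical toy input).
    """
    # Only fragments that fit inside the transcript can originate from it.
    usable = {f: p for f, p in fld.items() if f <= transcript_len}
    total = sum(usable.values())
    if total == 0:
        # Assumption for this sketch: fall back to the raw length
        # when no fragment length in the FLD fits the transcript.
        return float(transcript_len)
    # A fragment of length f has (transcript_len - f + 1) possible start sites;
    # average that over the truncated, renormalized FLD.
    return sum(p * (transcript_len - f + 1) for f, p in usable.items()) / total


# Two samples with different fragment length distributions (toy numbers).
fld_a = {150: 0.2, 200: 0.6, 250: 0.2}
fld_b = {250: 0.2, 300: 0.6, 350: 0.2}

eff_a = effective_length(1000, fld_a)
eff_b = effective_length(1000, fld_b)
print(eff_a, eff_b)
```

The same 1000 bp transcript ends up with a shorter effective length in the sample with longer fragments, which is the kind of sample-specific difference the offset is meant to absorb.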
You could try it out, and then run CQN or EDASeq on the estimated counts you get with tximport and `countsFromAbundance="lengthScaledTPM"` to see if the biases are effectively removed.
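As a rough sketch of the arithmetic behind `lengthScaledTPM`, as I understand it: each gene's TPM is multiplied by that gene's effective length averaged over samples, and each sample's column is then rescaled to its original library size. The Python below illustrates that calculation on toy numbers. The function name and inputs are hypothetical; this is not tximport's code, and in practice you would just set the argument in R.

```python
def length_scaled_tpm(tpm, eff_len, counts):
    """Toy version of length-scaled TPM counts.

    tpm, eff_len, counts: lists of rows (genes), each row a list over samples.
    """
    n_genes, n_samples = len(tpm), len(tpm[0])
    # Average each gene's effective length across samples, so the
    # sample-specific part of the length no longer enters the counts.
    mean_len = [sum(row) / n_samples for row in eff_len]
    # Abundance times average length is proportional to a count.
    raw = [[tpm[g][s] * mean_len[g] for s in range(n_samples)]
           for g in range(n_genes)]
    out = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        lib_size = sum(counts[g][s] for g in range(n_genes))
        col_sum = sum(raw[g][s] for g in range(n_genes))
        for g in range(n_genes):
            # Rescale the column so it sums to the original library size.
            out[g][s] = raw[g][s] / col_sum * lib_size
    return out


# Two genes, two samples: identical TPM, but sample 2 has shorter
# effective lengths, so its raw estimated counts differ (toy numbers).
tpm = [[500000.0, 500000.0], [500000.0, 500000.0]]
eff_len = [[800.0, 700.0], [1800.0, 1700.0]]
counts = [[300.0, 290.0], [700.0, 710.0]]

scaled = length_scaled_tpm(tpm, eff_len, counts)
print(scaled)
```

In this toy case the raw counts differ between samples only because of the effective lengths, and the scaled counts come out equal across samples, which is the behavior you would want before handing the matrix to CQN or EDASeq.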
If you see a residual bias, you can always use the offset from CQN or EDASeq as well. If you want both methods to contribute to removing the bias, I suppose you should provide the lengthScaledTPM counts to CQN / EDASeq, so that they do not over-adjust for biases already corrected by the effective length correction that tximport calculates.