I have completed multiple RNA-seq experiments using a targeted approach (a custom Illumina panel covering only the genes of interest). I quantified each run with salmon and then imported the results with tximport, using a targeted tx2gene file that contains only the genes of interest.
txi <- tximport(files, type = "salmon", tx2gene = t2gene, countsFromAbundance = "lengthScaledTPM")
I opted for "lengthScaledTPM" in order to account for both transcript length and library size. My first run included only 12 samples and my last included 48, which further encouraged me to use the "lengthScaledTPM" argument. Note that I continue to receive samples for this study (the second phase will begin soon). Incoming samples will arrive in different batches, and I will run them as I accumulate n=12 (if many arrive at once, I can run as many as 48).
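For reference, here is my understanding of what the two countsFromAbundance options compute, as a minimal sketch in base R. The numbers and the names abundance, gene_length, raw_counts, and scale_to_library are made up for illustration, and this reflects my reading of the tximport behaviour rather than its actual code.

```r
# Toy gene-level inputs for 3 genes x 2 samples (all values invented).
abundance <- matrix(c(100, 300, 600,
                      200, 200, 600),
                    nrow = 3,
                    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Assumed effective gene length per sample (average transcript length).
gene_length <- matrix(c(1000, 2000, 500,
                        1200, 2000, 500),
                      nrow = 3, dimnames = dimnames(abundance))

# Estimated read counts per sample; only the column sums are used below.
raw_counts <- matrix(c(50, 300, 150,
                       120, 200, 180),
                     nrow = 3, dimnames = dimnames(abundance))

# Hypothetical helper: rescale each column of x so that it sums to the
# original library size (the column sum of the estimated counts).
scale_to_library <- function(x, counts) {
  sweep(x, 2, colSums(counts) / colSums(x), `*`)
}

# "scaledTPM": abundance scaled up to library size only.
scaled_tpm <- scale_to_library(abundance, raw_counts)

# "lengthScaledTPM": abundance multiplied by the across-sample mean gene
# length, then scaled up to library size.
length_scaled_tpm <- scale_to_library(abundance * rowMeans(gene_length), raw_counts)
```

If this sketch is right, the mean length is taken over whichever samples are imported together, so I would expect the "lengthScaledTPM" values to depend on which samples are included in a given tximport call.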
Some on the forum have stated they avoid using the transcript length correction because they do not feel confident about the transcript-specific values (I believe they mean the assignment of a particular read to one transcript for a gene versus another for that same gene). It is possible I misunderstood the comment.
So my main question is:
- Is it appropriate to select the "lengthScaledTPM" option for the targeted custom panel sequencing approach and then combine the txi$counts matrices from different runs for downstream comparison of the samples? I care about both within-sample gene differences and between-sample gene differences.
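By "combine" I mean something like the following sketch, where counts_run1 and counts_run2 are hypothetical stand-ins (with toy values) for the txi$counts matrices from two separate runs. Matching on gene IDs is only a safeguard here, since the panel should give the same genes in every run.

```r
# Hypothetical per-run count matrices (genes x samples), toy values only.
counts_run1 <- matrix(c(10, 20, 30, 40), nrow = 2,
                      dimnames = list(c("geneA", "geneB"), c("r1_s1", "r1_s2")))
counts_run2 <- matrix(c(5, 15, 25, 35, 45, 55), nrow = 3,
                      dimnames = list(c("geneB", "geneA", "geneC"), c("r2_s1", "r2_s2")))

# Keep only genes present in both runs and align the row order by gene ID
# before binding the sample columns together.
shared_genes <- intersect(rownames(counts_run1), rownames(counts_run2))
combined_counts <- cbind(counts_run1[shared_genes, , drop = FALSE],
                         counts_run2[shared_genes, , drop = FALSE])
```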
My other question concerns the forum comment about the lack of confidence in transcript-level values. If geneA has 5 transcripts, and sample1 gives me a zero for transcript3 while sample3 gives me a zero for transcript4, how confident should I be in those zeros given the quasi-mapping nature of salmon? Would this potential uncertainty be a reason to use "scaledTPM", which only addresses library size, instead of "lengthScaledTPM"?
Thank you in advance for any help on this :)