Hi!
I've run a Cox PH regression at the transcript level, using variance-stabilized counts (VST from DESeq2) computed from salmon quantifications, as suggested in this post: https://support.bioconductor.org/p/103464/
Now I would like to repeat the analysis at the gene level. According to the tximport documentation, I should set countsFromAbundance="lengthScaledTPM" or "scaledTPM" (I'm worried about possible differences in how much each isoform contributes to the overall gene expression) and use the resulting counts for downstream analysis in DESeq2. My question is: is this recommended/safe, given that I will then apply VST and Cox PH regression? I ask because, as far as I understand, VST should only be applied to raw counts.
I was also thinking about a custom approach for this analysis, but I'm not sure whether it would introduce some kind of bias. I would take the Cox PH results from the isoform-level analysis and select isoforms with an increased or reduced hazard ratio (HR), then group those isoforms by the gene they belong to. For example, say gene1 has 6 isoforms: Isoform1/2 have a reduced HR, Isoform3/4 have an increased HR, and Isoform5/6 have a flat HR (HR = 1). I would then build a custom tx2gene table like this:
Isoform1 reduced_gene1
Isoform2 reduced_gene1
Isoform3 increased_gene1
Isoform4 increased_gene1
Isoform5 flat_gene1
Isoform6 flat_gene1
So basically I'm creating "fake" genes that split the isoforms by HR. After that I would run the pipeline described above (tximport, VST, Cox PH, etc.). I don't see an obvious reason why this approach would introduce bias, but I'm not an expert in tximport/DESeq2 (nor in Cox PH, by the way...), so I'm a bit paranoid.
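In case it makes the idea more concrete, here is a minimal sketch of how I would generate the table above. It is in Python only because that is what I script in. The function names, the 0.1 tolerance around HR = 1 and the hazard ratios are all made up for illustration, and the TXNAME/GENEID header is just a label I picked (as far as I understand, tximport only cares about the column order: transcript ID first, gene ID second).

```python
import csv
import io


def hr_class(hr, tol=0.1):
    """Label a hazard ratio as reduced, increased, or flat.

    `tol` is a hypothetical tolerance: HRs within 1 +/- tol count as flat.
    """
    if hr < 1 - tol:
        return "reduced"
    if hr > 1 + tol:
        return "increased"
    return "flat"


def build_tx2gene(cox_results, tol=0.1):
    """Map each transcript to a "fake" gene named <class>_<gene>.

    cox_results: iterable of (transcript_id, gene_id, hazard_ratio).
    """
    return [(tx, f"{hr_class(hr, tol)}_{gene}") for tx, gene, hr in cox_results]


# Made-up hazard ratios, for illustration only (not real results)
cox_results = [
    ("Isoform1", "gene1", 0.60),
    ("Isoform2", "gene1", 0.70),
    ("Isoform3", "gene1", 1.50),
    ("Isoform4", "gene1", 1.80),
    ("Isoform5", "gene1", 1.00),
    ("Isoform6", "gene1", 1.02),
]

tx2gene = build_tx2gene(cox_results)

# Write a two-column, tab-separated table: transcript ID, then gene ID
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["TXNAME", "GENEID"])
writer.writerows(tx2gene)
tsv_text = buf.getvalue()
print(tsv_text)
```

The resulting file would then be read into R and passed as the tx2gene argument of tximport, in place of the real transcript-to-gene mapping.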
Thanks a lot in advance!