Doubts about: Salmon -> Tximport -> Deseq2(vst) -> cox-regression
Entering edit mode
Last seen 3.1 years ago


I've performed Cox PH regression analysis using variance stabilized counts (produced with VST from Deseq2) obtained with salmon (at the transcript level) as suggested in this post

Now, I would like to perform the same analysis at the gene level; according to the tximport documentation I should set countsFromAbundance="lengthScaledTPM" or "scaledTPM" (since I'm worried about eventual differeces in the contribution by different isoform to the overall gene expression), and use the resulting counts for downstream analysis on Deseq2. My doubt is: is it recommended/safe to do that considering that subsequently I will do VST and Cox PH regression? I have doubts because from what I've understood it is recommended to use only raw counts for VST.

Also, I was thinking about a custom approach for this analysis, but I'm not sure if it will introduce some kind of bias: I would like to take the CoxPH results from the analysis at the isoform level and select isoforms with increased/reduced hazard ratio. Then I would like to group those isoform taking in account the gene they belong to, say for example we have 6 isoforms that belongs to gene1, Isoform1/2 have a reduced HR, Isoform3/4 have an increased HR and Isoform5/6 have a flat HR (1). I would like to produce a custom tix2gene table in this way:

Isoform1    reduced_gene1
Isoform2    reduced_gene1
Isoform3    increased_gene1
Isoform4    increased_gene1
Isoform5    flat_gene1
Isoform6    flat_gene1

So basically I'm creating "fake" genes to divide the isoforms based on the HR. After that I would like to perform the pipeline that I was explaining before (tximport, VST, CoxPH etc). Apparently I see no reason why this approach should introduce some biases, but of course I'm not an expert of tximport/Deseq2 (nor of CoxPH by the way...) so I have some paranoia.

Thanks a lot in advance!

deseq2 vst cox-regression Salmon Tximport • 570 views
Entering edit mode
Last seen 5 days ago
United States

You do not need to set countsFromAbundance="lengthScaledTPM" or "scaledTPM" in order to do the gene-level aggregation. You can do:

dds <- DESeqDataSetFromTximport(txi, design)
vsd <- vst(dds)

The normalization inside of the vst call will deal with changes to the gene length from differential usage of isoforms.

As for your second question, I have a concern about summarizing using the association of the isoforms with outcome. You are strengthening signal based on outcome, and then testing again association with outcome. You can make toy data where it's pretty clearly a problem:

z <- matrix(rnorm(100*100),ncol=100)
y <- rnorm(100)
r <- cor(z, y)
cor.test(rowMeans(z[,r > 0]), y)

##  Pearson's product-moment correlation
## data:  rowMeans(z[, r > 0]) and y
## t = 6.0627, df = 98, p-value = 2.504e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3630954 0.6518191
## sample estimates:
##       cor
## 0.5222663

Login before adding your answer.

Traffic: 516 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6