Question: Doubts about: Salmon -> Tximport -> Deseq2(vst) -> cox-regression
gravatar for filippo.martignano
7 months ago by
filippo.martignano0 wrote:


I've performed Cox PH regression analysis using variance stabilized counts (produced with VST from Deseq2) obtained with salmon (at the transcript level) as suggested in this post

Now, I would like to perform the same analysis at the gene level; according to the tximport documentation I should set countsFromAbundance="lengthScaledTPM" or "scaledTPM" (since I'm worried about eventual differeces in the contribution by different isoform to the overall gene expression), and use the resulting counts for downstream analysis on Deseq2. My doubt is: is it recommended/safe to do that considering that subsequently I will do VST and Cox PH regression? I have doubts because from what I've understood it is recommended to use only raw counts for VST.

Also, I was thinking about a custom approach for this analysis, but I'm not sure if it will introduce some kind of bias: I would like to take the CoxPH results from the analysis at the isoform level and select isoforms with increased/reduced hazard ratio. Then I would like to group those isoform taking in account the gene they belong to, say for example we have 6 isoforms that belongs to gene1, Isoform1/2 have a reduced HR, Isoform3/4 have an increased HR and Isoform5/6 have a flat HR (1). I would like to produce a custom tix2gene table in this way:

Isoform1    reduced_gene1
Isoform2    reduced_gene1
Isoform3    increased_gene1
Isoform4    increased_gene1
Isoform5    flat_gene1
Isoform6    flat_gene1

So basically I'm creating "fake" genes to divide the isoforms based on the HR. After that I would like to perform the pipeline that I was explaining before (tximport, VST, CoxPH etc). Apparently I see no reason why this approach should introduce some biases, but of course I'm not an expert of tximport/Deseq2 (nor of CoxPH by the way...) so I have some paranoia.

Thanks a lot in advance!

ADD COMMENTlink modified 6 months ago by Michael Love25k • written 7 months ago by filippo.martignano0
Answer: Doubts about: Salmon -> Tximport -> Deseq2(vst) -> cox-regression
gravatar for Michael Love
6 months ago by
Michael Love25k
United States
Michael Love25k wrote:

You do not need to set countsFromAbundance="lengthScaledTPM" or "scaledTPM" in order to do the gene-level aggregation. You can do:

dds <- DESeqDataSetFromTximport(txi, design)
vsd <- vst(dds)

The normalization inside of the vst call will deal with changes to the gene length from differential usage of isoforms.

As for your second question, I have a concern about summarizing using the association of the isoforms with outcome. You are strengthening signal based on outcome, and then testing again association with outcome. You can make toy data where it's pretty clearly a problem:

z <- matrix(rnorm(100*100),ncol=100)
y <- rnorm(100)
r <- cor(z, y)
cor.test(rowMeans(z[,r > 0]), y)

##  Pearson's product-moment correlation
## data:  rowMeans(z[, r > 0]) and y
## t = 6.0627, df = 98, p-value = 2.504e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3630954 0.6518191
## sample estimates:
##       cor
## 0.5222663
ADD COMMENTlink written 6 months ago by Michael Love25k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 330 users visited in the last hour