DEXSeq: Testing only a subset of exons
i.sudbery ▴ 40
@isudbery-8266
European Union

In DEXSeq, is it possible (and advisable) to test only a subset of exons for differential usage?

I am currently working on a project where we are only interested in retained intron usage. Being able to test only a subset of events would (a) increase power and (b) reduce compute time (currently about 16 hours on 6 cores). I have built transcript models that include extra exons for the retained introns and counted reads over them. I have named the retained introns so that they can be identified later.

I know that when you build a DEXSeqDataSet, the counts matrix contains, for each exon and library, the number of reads in that exon and the number in all other exons of the same gene. This suggests that I should be able to subset the DEXSeqDataSet so that it contains only the retained introns, with something like:

dxd <- dxd[vector_of_retained_introns,]

However, I wonder whether subsetting like this will interfere with the dispersion estimation. I suppose I could estimate the dispersions before subsetting, but I seem to remember that it's the dispersion estimation that takes most of the time (so I'd gain the power, but not save the time).
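The two orderings discussed above (subset first, or estimate dispersions on everything and then subset) can be sketched as a small helper. This is only an illustration under assumptions: `test_subset` and its argument names are hypothetical, `dxd` is taken to be an existing DEXSeqDataSet, `keep` plays the role of `vector_of_retained_introns`, and DEXSeq must already be attached with `library(DEXSeq)` before the function is called. Estimating the size factors before subsetting is a choice made here, so that they are based on all counting bins rather than on the retained introns alone.

```r
# Sketch only: `test_subset`, `keep` and `dispersions_from` are hypothetical
# names introduced for illustration. Requires library(DEXSeq) to be attached
# before the function is called.
test_subset <- function(dxd, keep, dispersions_from = c("subset", "full")) {
  dispersions_from <- match.arg(dispersions_from)

  # Assumption: size factors are estimated on the full object, before any
  # rows are dropped, so they reflect all counting bins.
  dxd <- estimateSizeFactors(dxd)

  if (dispersions_from == "full") {
    # Dispersions (and the mean-variance trend) come from every feature;
    # only the testing step is restricted to the subset.
    dxd <- estimateDispersions(dxd)
    dxd <- dxd[keep, ]
  } else {
    # Dispersions are estimated from the retained introns only; this is
    # faster but relies on the subset being large enough for the trend fit.
    dxd <- dxd[keep, ]
    dxd <- estimateDispersions(dxd)
  }

  dxd <- testForDEU(dxd)
  DEXSeqResults(dxd)
}
```

With DEXSeq attached, a call might look like `res <- test_subset(dxd, vector_of_retained_introns, "full")`. Multiple-testing correction is then applied over the subset only, which is where any gain in power would come from.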

Cheers,

Ian

---


Hi Ian,

It should not be a problem, as long as you have enough features to do the mean-variance fit. As a sanity check, it may be worth verifying whether the final dispersion estimates change drastically (although I would not expect them to).
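One way to carry out such a sanity check is to compare the per-feature dispersions from the two runs on the features they share. The helper below is a minimal sketch in base R; `compare_dispersions` is a hypothetical name, and it assumes the dispersions have already been pulled out as numeric vectors named by feature (for example, something along the lines of `setNames(mcols(dxd)$dispersion, rownames(dxd))`, which is an assumption about how the values are stored rather than something stated in this thread).

```r
# Sketch only: compare two named vectors of dispersion estimates.
# `disp_full` and `disp_subset` are hypothetical inputs, named by feature.
compare_dispersions <- function(disp_full, disp_subset) {
  shared <- intersect(names(disp_full), names(disp_subset))
  full <- disp_full[shared]
  sub  <- disp_subset[shared]

  # Drop features where either estimate is missing or non-positive.
  ok <- is.finite(full) & is.finite(sub) & full > 0 & sub > 0
  full <- full[ok]
  sub  <- sub[ok]

  c(
    n_shared    = length(full),
    median_fold = median(sub / full),        # > 1 means higher in the subset run
    log_cor     = cor(log(full), log(sub))   # agreement on the log scale
  )
}

# Toy values, made up purely to show the call:
toy_full   <- c(a = 0.10, b = 0.20, c = 0.40, d = NA)
toy_subset <- c(a = 0.11, b = 0.22, c = 0.44)
toy_summary <- compare_dispersions(toy_full, toy_subset)
```

A median fold close to 1 and a high correlation on the log scale would suggest that subsetting has not changed the estimates much.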

Did you try to increase the number of cores to reduce computing times?

Alejandro


Hi Alejandro,

That depends on what you mean by change. The dispersions are slightly higher in the retained-intron-only analysis (the best fit is about 1.1-fold higher). However, everything that was significant in the complete analysis is also significant in the subsetted analysis, along with some additional events.
My problem now is that I can't fit the fold changes in the subsetted analysis: the full analysis works fine, but the subsetted one produces:

Error in exp(alleffects) : non-numeric argument to mathematical function

As for the analysis time, running the subsetted analysis takes about 30 minutes on 4 cores. I don't think it is the dispersion estimation that takes a long time. I've been running the analysis with DEXSeq(), but all 4 cores are active for only a very short fraction of that time, say the first couple of minutes. For the rest, the code appears to be using only one core.
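To see which step actually dominates the run time, one option is to call the individual steps instead of the DEXSeq() wrapper, pass the same parallel back-end to each, and time them separately. The sketch below assumes DEXSeq and BiocParallel are installed and DEXSeq is attached; `run_steps_timed` is a hypothetical helper, and `"condition"` is only a placeholder for whatever variable the design uses. `MulticoreParam` relies on forking, so a different back-end such as `SnowParam` may be needed on Windows.

```r
# Sketch only: `run_steps_timed` is a hypothetical helper. Requires
# library(DEXSeq) to be attached before the function is called.
run_steps_timed <- function(dxd, workers = 4, fit_var = "condition") {
  bp <- BiocParallel::MulticoreParam(workers = workers)
  timings <- list()

  dxd <- estimateSizeFactors(dxd)

  # Each step receives the same back-end; the elapsed times show which
  # step is the bottleneck.
  timings$dispersions <- system.time(
    dxd <- estimateDispersions(dxd, BPPARAM = bp)
  )
  timings$testing <- system.time(
    dxd <- testForDEU(dxd, BPPARAM = bp)
  )
  timings$fold_changes <- system.time(
    dxd <- estimateExonFoldChanges(dxd, fitExpToVar = fit_var, BPPARAM = bp)
  )

  list(dxd = dxd, timings = timings)
}
```

Running the steps one at a time also shows exactly which call raises an error such as the `exp(alleffects)` one above, though this sketch does not attempt to diagnose or fix it.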

Ian
