Question: DEXSeq analysis with large number of samples
16 months ago, nemanja.vucic wrote:


I am wondering whether it is even possible to perform differential exon usage analysis with DEXSeq on a large number of samples, in my case almost 600. The samples are unequally divided between two conditions (1:10 ratio), and the DEXSeq object contains almost 500K rows (exons) and 1.2K columns (samples x2). After normalization, the dispersion estimation step has been running on 16 CPUs for several days. I tried subsampling the DEXSeq object: for 17 features (exons) on 16 CPUs the analysis took 3 minutes, which works out to approximately 3 minutes of CPU time per exon. At that rate, the analysis of the full dataset would effectively never finish.
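As a rough check of the estimate above, the extrapolation can be written out in a few lines of R. This is only a back-of-the-envelope sketch: it assumes run time scales linearly with the number of exons, that the 16 CPUs were fully used, and that the 3-minute test run had no fixed start-up overhead, none of which is established in the post. The variable names are illustrative.

```r
# Figures taken from the post; the linear scaling is an assumption.
wall_minutes_test <- 3       # wall-clock time of the subsampled run
n_cpus            <- 16      # CPUs used in both runs
n_exons_test      <- 17      # exons in the subsampled object
n_exons_full      <- 500000  # "almost 500K" exons in the full object

# CPU-minutes spent per exon in the test run (about 2.8)
cpu_min_per_exon <- wall_minutes_test * n_cpus / n_exons_test

# Total CPU-days for the full object under linear scaling (about 980)
total_cpu_days <- cpu_min_per_exon * n_exons_full / 60 / 24

# Wall-clock days on the same 16 CPUs (about 61)
wall_days <- total_cpu_days / n_cpus

print(round(c(cpu_min_per_exon = cpu_min_per_exon,
              total_cpu_days   = total_cpu_days,
              wall_days        = wall_days), 1))
```

Under these assumptions the full run would take on the order of two months on 16 CPUs, which is consistent with the job still running after several days.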

dexseq estimatedispersions • 315 views
modified 16 months ago by Michael Love • written 16 months ago by nemanja.vucic

Yeah, the GLMs can take very long to fit when the models are large. One option is to configure a BPPARAM with a cluster backend and distribute the work across many jobs. With such a large number of samples, some of the steps in DEXSeq might not be needed. I would have a look at the diffSplice function from limma; it is designed to address the same question as DEXSeq and does not have a problem dealing with large datasets.
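For reference, passing a BPPARAM to the DEXSeq steps might look roughly like the sketch below. It assumes an existing DEXSeqDataSet called `dxd` (not constructed here), and the wrapper `run_dexseq_parallel` is a hypothetical helper written for illustration, not part of DEXSeq. `MulticoreParam` parallelises on a single machine; spreading the work over several machines would need a different BiocParallel backend, which is not shown.

```r
library(DEXSeq)
library(BiocParallel)

# Hypothetical helper: runs the standard DEXSeq steps with a shared BPPARAM.
# `dxd` is assumed to be a DEXSeqDataSet built elsewhere.
run_dexseq_parallel <- function(dxd, bpparam) {
  dxd <- estimateSizeFactors(dxd)                     # normalization
  dxd <- estimateDispersions(dxd, BPPARAM = bpparam)  # the slow, per-exon GLM step
  dxd <- testForDEU(dxd, BPPARAM = bpparam)           # differential exon usage test
  dxd
}

# Single-machine backend with 16 workers, matching the 16 CPUs in the question.
bp <- MulticoreParam(workers = 16)

# Usage, once `dxd` exists:
# dxd <- run_dexseq_parallel(dxd, bp)
# res <- DEXSeqResults(dxd)
```

Parallelisation only divides the total CPU time by the number of workers, so with roughly 500K exons and about 600 samples the per-exon fit cost remains the bottleneck.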

written 16 months ago by Alejandro Reyes

I already tried configuring BPPARAM and splitting the analysis across many AWS instances, but without much success, so I'll try diffSplice from limma. Thanks for the quick response.
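A minimal sketch of the diffSplice route is given below, using simulated exon counts so that it runs on its own. The simulated data, the gene and exon identifiers, and the 6 versus 60 sample split (chosen only to mimic the 1:10 imbalance) are all assumptions for illustration; with real data the count matrix and annotation would come from the existing exon-level counts.

```r
library(limma)
library(edgeR)

set.seed(1)

# Simulated exon-level counts: 200 genes x 5 exons, 66 samples (illustrative only)
n_genes        <- 200
exons_per_gene <- 5
n_exons        <- n_genes * exons_per_gene
group          <- factor(rep(c("A", "B"), times = c(6, 60)))

counts <- matrix(rnbinom(n_exons * length(group), mu = 50, size = 5),
                 nrow = n_exons)
geneid <- rep(paste0("gene", seq_len(n_genes)), each = exons_per_gene)
exonid <- paste0(geneid, ":E", rep(seq_len(exons_per_gene), times = n_genes))
rownames(counts) <- exonid

# Exon annotation travels with the counts in the DGEList
dge <- DGEList(counts = counts,
               genes  = data.frame(GeneID = geneid, ExonID = exonid))

# Drop low-count exons, then compute normalization factors
keep <- filterByExpr(dge, group = group)
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)

# voom + linear model at the exon level
design <- model.matrix(~ group)
v      <- voom(dge, design)
fit    <- lmFit(v, design)

# Test for differential exon usage within genes
ex  <- diffSplice(fit, geneid = "GeneID", exonid = "ExonID")

# Gene-level ranking for the group coefficient (column 2 of the design)
top <- topSplice(ex, coef = 2, test = "simes", number = 10)
print(top)
```

Because this approach fits ordinary linear models rather than one GLM per exon, it generally scales to hundreds of samples much more readily than the DEXSeq dispersion step.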

written 16 months ago by nemanja.vucic