Question: DEXSeq analysis with a large number of samples
8 months ago, nemanja.vucic wrote:


Is it even possible to perform differential exon usage analysis with DEXSeq on a large number of samples, almost 600 in my case? The samples are unequally divided between two conditions (1:10 ratio), and the DEXSeq object contains almost 500K rows (exons) and 1.2K columns (samples ×2). After normalization, the dispersion estimation step has been running on 16 CPUs for several days. I tried subsampling the DEXSeq object: for 17 features (exons) on 16 CPUs the analysis took 3 minutes, which works out to roughly 3 minutes of CPU time per exon. At that rate, the analysis of the full dataset would never finish.
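To put "would never finish" into numbers, the subsampling timing above can be extrapolated. This is only a back-of-the-envelope sketch: it assumes the cost grows linearly with the number of exons and that work splits perfectly across CPUs (both optimistic assumptions), and the helper name `estimate_wall_days` is hypothetical.

```python
# Back-of-the-envelope extrapolation of the subsampling test from the post.
# Figures from the post: 17 exons took 3 min wall-clock on 16 CPUs;
# the full object has almost 500K exon rows.
SUBSET_EXONS = 17
SUBSET_WALL_MIN = 3.0
CPUS = 16
TOTAL_EXONS = 500_000


def estimate_wall_days(total_exons: int, cpus: int) -> float:
    """Hypothetical helper: wall-clock days for `total_exons` on `cpus` CPUs,
    assuming linear scaling in exons and perfect parallel speed-up."""
    cpu_min_per_exon = SUBSET_WALL_MIN * CPUS / SUBSET_EXONS  # about 2.8
    total_cpu_min = cpu_min_per_exon * total_exons
    return total_cpu_min / cpus / 60 / 24


if __name__ == "__main__":
    per_exon = SUBSET_WALL_MIN * CPUS / SUBSET_EXONS
    print(f"CPU-minutes per exon: {per_exon:.2f}")
    print(f"Wall-clock days on {CPUS} CPUs: {estimate_wall_days(TOTAL_EXONS, CPUS):.0f}")
```

Under these assumptions the full dataset needs about 61 days on 16 CPUs, and even ten times as many CPUs would still need about 6 days.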

Tags: dexseq, estimatedispersions
(modified 8 months ago by Michael Love • written 8 months ago by nemanja.vucic)

Yes, the GLMs can take a very long time to fit when the models are large. One option is simply to configure BPPARAM with a cluster configuration and distribute the work across many jobs. With such a large number of samples, some of the DEXSeq steps might not be needed. I would have a look at the diffSplice function from limma: it is designed to address the same question as DEXSeq and has no problem dealing with large datasets.
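One way to see why the models get large: DEXSeq's default full model is `~ sample + exon + condition:exon`, fitted per exon counting bin on two rows per sample (counts for this exon and for the rest of the gene). The `sample` term alone contributes one coefficient per sample. The sketch below builds a full-rank equivalent of that design for one exon; the helper name `dexseq_style_design` is hypothetical, and the exact column coding DEXSeq uses internally may differ, but the size comes out the same: 2N rows by N + 2 columns.

```python
import numpy as np


def dexseq_style_design(conditions):
    """Hypothetical helper: full-rank design for ONE exon bin, assuming the
    DEXSeq-style formula ~ sample + exon + condition:exon.

    conditions: length-N sequence of 0/1 group labels, one per sample.
    Returns a (2N, N + 2) matrix.
    """
    conditions = np.asarray(conditions)
    n = len(conditions)
    # Row layout: first N rows = "this exon", next N rows = "rest of the gene".
    sample_idx = np.concatenate([np.arange(n), np.arange(n)])
    is_exon = np.concatenate([np.ones(n), np.zeros(n)])
    cond = np.concatenate([conditions, conditions]).astype(float)

    intercept = np.ones((2 * n, 1))
    # Treatment coding for the sample factor: N - 1 dummy columns.
    sample_dummies = (sample_idx[:, None] == np.arange(1, n)[None, :]).astype(float)
    exon_col = is_exon[:, None]
    # condition:exon interaction. A condition main effect is not included
    # because it would be collinear with the per-sample columns.
    interaction = (is_exon * cond)[:, None]
    return np.hstack([intercept, sample_dummies, exon_col, interaction])


if __name__ == "__main__":
    # Roughly the 1:10 split of ~600 samples described in the question.
    X = dexseq_style_design([0] * 55 + [1] * 545)
    print(X.shape)  # (1200, 602)
```

With ~600 samples every exon therefore gets a GLM with about 600 coefficients, and each IRLS iteration has to solve a least-squares problem of that size, so the cost per exon grows roughly with the cube of the sample count (and dispersion estimation refits the model repeatedly).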

(written 8 months ago by Alejandro Reyes)

I already tried configuring BPPARAM and splitting the analysis across many AWS instances, but without much success, so I'll try diffSplice from limma. Thanks for the quick response.
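A rough sketch of why a diffSplice-style approach scales better: the exon-level model only needs an intercept and a condition effect, and that one small design is shared by every exon, so all exons can be fitted in a single least-squares solve. Each exon's log-fold-change is then compared with the gene's overall log-fold-change. This is a deliberately simplified illustration, NOT limma's implementation: it has no voom precision weights and no empirical Bayes moderation, it compares each exon against the mean over all of the gene's exons (including itself), and the helper name `exon_vs_gene_logfc` is hypothetical.

```python
import numpy as np


def exon_vs_gene_logfc(log_expr, gene_ids, conditions):
    """Hypothetical, simplified sketch of the idea behind limma's diffSplice.

    log_expr:   (n_exons, n_samples) matrix of log-expression values
    gene_ids:   length n_exons sequence of gene labels
    conditions: length n_samples sequence of 0/1 group labels
    Returns each exon's log-fold-change minus the mean log-fold-change
    of all exons in the same gene (no weights, no moderation).
    """
    log_expr = np.asarray(log_expr, dtype=float)
    gene_ids = np.asarray(gene_ids)
    cond = np.asarray(conditions, dtype=float)
    # One small design (intercept + condition) shared by every exon,
    # so all exons are fitted with a single least-squares call.
    design = np.column_stack([np.ones_like(cond), cond])
    coef, *_ = np.linalg.lstsq(design, log_expr.T, rcond=None)
    exon_lfc = coef[1]  # condition effect for each exon
    rel = np.empty_like(exon_lfc)
    for g in np.unique(gene_ids):
        mask = gene_ids == g
        rel[mask] = exon_lfc[mask] - exon_lfc[mask].mean()
    return rel


if __name__ == "__main__":
    # Gene A: exon log-fold-changes 1 and 3; gene B: a single exon.
    expr = [[0, 0, 1, 1], [0, 0, 3, 3], [5, 5, 7, 7]]
    print(exon_vs_gene_logfc(expr, ["A", "A", "B"], [0, 0, 1, 1]))
```

In this toy example gene A's exons deviate by -1 and +1 from the gene-level change, while a single-exon gene has nothing to deviate from.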

(written 8 months ago by nemanja.vucic)