Question: DEXSeq analysis with large number of samples
3 months ago, nemanja.vucic wrote:

Hi, 

I am wondering whether it is feasible to run a differential exon usage analysis with DEXSeq on a large number of samples, in my case almost 600. The samples are split unequally between two conditions (1:10 ratio), and the DEXSeq object contains almost 500K rows (exons) and 1.2K columns (2 x the number of samples). After normalization, the dispersion estimation step has been running on 16 CPUs for several days. I tried subsampling the DEXSeq object: for 17 features (exons) on 16 CPUs the analysis took 3 minutes, which works out to approximately 3 minutes of CPU time per exon. At that rate, the analysis of the full dataset would never finish.
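
The extrapolation from the 17-exon benchmark can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes the per-exon cost stays constant, that work scales linearly across CPUs, and that "almost 500K" is rounded to 500,000. The function and variable names are illustrative and do not come from DEXSeq.

```python
def estimate_runtime(n_test_exons, test_minutes, n_cpus, n_total_exons):
    """Extrapolate from a small benchmark, assuming linear scaling (an assumption)."""
    # CPU time spent per exon in the benchmark: wall time * CPUs / exons
    cpu_minutes_per_exon = test_minutes * n_cpus / n_test_exons
    # Total CPU time for the full dataset at the same per-exon cost
    total_cpu_minutes = cpu_minutes_per_exon * n_total_exons
    # Wall-clock days when the work is spread evenly over the same CPUs
    wall_days = total_cpu_minutes / n_cpus / 60 / 24
    return cpu_minutes_per_exon, wall_days


# Figures from the post: 17 exons, 3 minutes, 16 CPUs, ~500K exons in total
per_exon, days = estimate_runtime(17, 3, 16, 500_000)
print(f"{per_exon:.1f} CPU-minutes per exon, about {days:.0f} days on 16 CPUs")
# prints "2.8 CPU-minutes per exon, about 61 days on 16 CPUs"
```

Under these assumptions the full run would take on the order of two months on the same 16 CPUs, which is consistent with the dispersion step still running after several days.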

modified 3 months ago by Michael Love • written 3 months ago by nemanja.vucic

Yeah, the GLMs can take a very long time to fit when the models are large. One option is to configure a BPPARAM with a cluster backend and distribute the work across many jobs. With such a large number of samples, some of the steps in DEXSeq might not be needed. I would have a look at the diffSplice function from limma: it is designed to address the same question as DEXSeq and does not have a problem dealing with large datasets.

written 3 months ago by Alejandro Reyes

I already tried configuring BPPARAM and splitting the analysis across many AWS instances, but without much success, so I'll try diffSplice from limma. Thanks for the quick response.

written 3 months ago by nemanja.vucic