Question: DESeq2 performance information for counts tables with very large rows number
atisou (Switzerland) wrote 3.0 years ago:

Hello,

I am testing different approaches to analyse differential read counts *per position*, between 2 conditions with 2 replicates each. I was thinking of using DESeq2, but modified so that the read count table contains positions instead of genes.

I would therefore end up with a count matrix of, say, 1 million rows (instead of the few thousand rows in the count table usually produced by htseq-count).

What would the performance/CPU time be in this case?

Any info on parallelized computation is welcome. I know there is the option to set parallel = TRUE after loading library(BiocParallel) and calling register(MulticoreParam(x)).

For instance, would that still be worth it if I used, say, x=16 or x=32?

Thanks

Hatice

 

Answer: DESeq2 performance information for counts tables with very large rows number
Michael Love (United States) wrote 3.0 years ago:

Firstly, you should also check out the derfinder package, which already implements per-position differential testing.

http://biostatistics.oxfordjournals.org/content/15/3/413.long

http://www.biorxiv.org/content/early/2015/02/19/015370.abstract

DESeq2 will scale linearly with the number of rows. You can expect the typical reductions in running time from parallelization: some reduction, but not proportional to the number of cores, because of parallelization overhead.

Note that this is definitely *not* the recommended usage of DESeq2, which is counts of fragments aligned to unique features. You will have strong correlations across positions within a feature, and this may violate the procedures which share information across genes for dispersion and fold change estimation. Basically, you would be going off course as far as DESeq2 usage goes, and it's up to you to prove that your implementation is reasonable.

In addition, you would probably want to try something faster, like limma on per-position coverage (but again, this is already implemented in derfinder).
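To check the scaling and parallelization points on your own hardware, one option is to time DESeq2 on simulated tables of increasing size and extrapolate. The following is only a rough sketch under stated assumptions: it uses `makeExampleDESeqDataSet` from DESeq2 as a stand-in for a real per-position table, and the helper `time_deseq`, the row counts, and the choice of 4 workers are illustrative, not taken from this thread. Simulated rows are independent, so this measures running time only and says nothing about whether the results would be valid on correlated per-position counts.

```r
library(DESeq2)
library(BiocParallel)

# Hypothetical helper: time one DESeq() run on a simulated count table
# with n_rows rows and 4 samples (2 conditions x 2 replicates).
time_deseq <- function(n_rows, parallel = FALSE, workers = 1) {
  set.seed(1)
  dds <- makeExampleDESeqDataSet(n = n_rows, m = 4)
  # MulticoreParam relies on forking and is not supported on Windows;
  # SnowParam is the usual alternative there.
  bp <- if (parallel) MulticoreParam(workers = workers) else SerialParam()
  elapsed <- system.time(
    dds <- DESeq(dds, quiet = TRUE, parallel = parallel, BPPARAM = bp)
  )[["elapsed"]]
  list(rows = nrow(dds), seconds = elapsed)
}

# Serial timings at a few sizes, to see whether time grows roughly linearly.
sizes <- c(2000, 4000, 8000)
serial <- sapply(sizes, function(n) time_deseq(n)$seconds)
print(data.frame(rows = sizes, seconds = serial))

# Naive linear extrapolation from the largest run to 1 million rows.
per_row <- serial[length(sizes)] / sizes[length(sizes)]
cat("Estimated serial time for 1e6 rows (s):", per_row * 1e6, "\n")

# Same largest table with 4 workers; the speedup is typically below 4x.
par4 <- time_deseq(sizes[length(sizes)], parallel = TRUE, workers = 4)$seconds
cat("Observed speedup with 4 workers:", serial[length(sizes)] / par4, "\n")
```

Repeating the last step with 8, 16, and 32 workers on a larger table would show where the gain from extra cores flattens out on a given machine.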

Answer: DESeq2 performance information for counts tables with very large rows number
atisou (Switzerland) wrote 3.0 years ago:

Of course you are right, and we are aware of these issues (we are not going to make DESeq2 say more than it is meant to :) we are only running some preliminary method trials).

"DESeq2 will scale linearly with number of rows." That's what I wanted to know.

Thanks for your insights.

H.
