Question: Running DESeq with 1000 samples
Guest User wrote, 3.4 years ago:
Hi,

I'm using DESeq to find the differentially expressed genes between 2 populations. The RNA-seq data set has a total sample size of around 1000. However, even when I set the memory limit of R to 6 GB, it still reports the error that it cannot allocate a vector of a certain size. I wonder whether it's possible to use DESeq on such a huge data set, and how much memory would be enough.

Thank you!

Output of sessionInfo(): NA
Simon Anders (Zentrum für Molekularbiologie, Universität Heidelberg) wrote, 3.4 years ago:
Hi

On 09/07/14 20:58, Maoqi Xu [guest] wrote:
> I'm using DESeq to find the differential expressed genes between 2
> populations. The RNA-seq data set has a total sample size of around
> 1000. However, even I set the memory limit of R to 6 Gb, it still
> reports the error that it cannot allocate vector of certain size. I
> wonder if it's possible to use DESeq on this huge data set and how
> much memory should be enough.

You really have one thousand RNA-Seq libraries? This is impressive.

First: As Steve already pointed out, please consider using DESeq2.

On the other hand: The main point of tools like DESeq2 or edgeR is to use information sharing, such as Bayesian shrinkage, to get decent power even if the sample size is only modest. With so much data, you can keep things very simple, especially if you really just have a standard two-group comparison with no other covariates.

I would use DESeq2 only to normalize the data and then do a Wilcoxon rank-sum test on the normalized counts, for each gene separately, or, even better, use a permutation test.

Simon
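
As a rough illustration of the "normalize, then rank-sum test per gene" idea above, here is a minimal, language-neutral sketch in plain Python (standard library only). It is not DESeq2's code: `size_factors` re-implements the median-of-ratios idea in simplified form, `rank_sum_p` uses the normal approximation with a tie correction (reasonable for hundreds of samples per group, no continuity correction), and all function names are hypothetical. Multiple-testing correction is not shown.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] = count of gene g in sample s."""
    n_samples = len(counts[0])
    # Only genes without zeros have a finite log geometric mean.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for s in range(n_samples):
        log_ratios = [math.log(row[s]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0                            # all values identical
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))

def rank_sum_per_gene(counts, groups):
    """groups[s] is 0 or 1; returns one p-value per gene on normalized counts."""
    sf = size_factors(counts)
    pvals = []
    for row in counts:
        norm = [c / f for c, f in zip(row, sf)]
        a = [v for v, g in zip(norm, groups) if g == 0]
        b = [v for v, g in zip(norm, groups) if g == 1]
        pvals.append(rank_sum_p(a, b))
    return pvals
```

Each gene is processed independently here, so memory use stays close to the size of the count matrix itself.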
Steve Lianoglou (Genentech) wrote, 3.4 years ago:
Hi,

On Wed, Jul 9, 2014 at 11:58 AM, Maoqi Xu [guest] <guest at bioconductor.org> wrote:
> Hi,
> I'm using DESeq to find the differential expressed genes between 2 populations. The RNA-seq data set has a total sample size of around 1000. However, even I set the memory limit of R to 6 Gb, it still reports the error that it cannot allocate vector of certain size. I wonder if it's possible to use DESeq on this huge data set and how much memory should be enough.

First: if you're just starting your project, you should prefer to use DESeq2.

Second: you'll need some serious horsepower -- someone will likely swoop in with a precise calculation, but I wouldn't expect this to work on a machine with 8 GB of RAM -- maybe 16 GB would be enough, but if you're routinely working on data at this scale I hope you've got a big iron machine with ~64 GB or more RAM.

One option would be to do the "hard bits" on Amazon's cloud using Bioconductor's latest and greatest AMI:

http://www.bioconductor.org/help/bioconductor-cloud-ami/

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
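
For a back-of-the-envelope feel for the memory question, the sketch below computes the size of one dense copy of the count matrix. The gene count of 30,000 is an assumption (the thread never states it), and the helper name is hypothetical. The 8 bytes per value corresponds to R's double-precision numerics. This says nothing about how many intermediate copies any particular package makes, which is what actually determines peak memory.

```python
def dense_matrix_bytes(n_genes, n_samples, bytes_per_value=8):
    """Bytes needed for one dense genes-by-samples matrix of doubles."""
    return n_genes * n_samples * bytes_per_value

n_genes = 30_000      # assumption: not given in the thread
n_samples = 1_000     # "around 1000" samples, per the question
limit_bytes = 6e9     # the 6 GB limit mentioned in the question

one_copy = dense_matrix_bytes(n_genes, n_samples)
copies_that_fit = int(limit_bytes // one_copy)

print(f"one dense copy: {one_copy / 1e6:.0f} MB")
print(f"copies that fit in the limit: {copies_that_fit}")
```

Under these assumptions a single copy is modest, so an allocation failure at 6 GB would point to many same-sized intermediates (or larger per-gene structures) rather than to the raw data alone.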
Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote, 3.4 years ago:

Dear Maoqi Xu,

You could use limma-voom instead, which will handle 1000 samples in a few seconds without the need for extra memory.

See:

  http://genomebiology.com/2014/15/2/R29

If you particularly wanted to stick to an exact negative binomial analysis, then you could consider edgeR which uses considerably less memory than DESeq for large datasets, but for so many samples voom would seem the way to go.

Best wishes
Gordon
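
For readers unfamiliar with voom, its first step is a log-counts-per-million transform of the raw counts, which is part of why it scales easily to many samples. The sketch below shows only that transform, in plain Python, using column sums as library sizes (an assumption; normalized library sizes can be used instead). The function name is hypothetical, and the mean-variance trend, precision weights, and the limma linear model fit are not shown.

```python
import math

def log_cpm(counts):
    """voom-style log2 counts per million; counts[g][s] = count of gene g in sample s.

    Uses log2((count + 0.5) / (library_size + 1) * 1e6), with the library size
    taken as the column sum (an assumption made for this sketch).
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [
        [math.log2((row[s] + 0.5) / (lib_sizes[s] + 1.0) * 1e6) for s in range(n_samples)]
        for row in counts
    ]
```

The offsets of 0.5 and 1 keep the logarithm finite for zero counts.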

Guest195 (France) wrote, 2.5 years ago:

Sorry to re-open the conversation.

I am new to RNA-seq, and I wonder at what sample size it starts to be reasonable to perform a classical non-parametric test instead of a dedicated RNA-seq method?

Thank you!
Gordon Smyth replied, 2.5 years ago:

There is no sample size that would make me want to use a Wilcoxon test or genewise permutation test to test for differential expression with RNA-seq data. We use voom-limma for large RNA-seq datasets.

There are many reasons why I wouldn't use a permutation test. Here are a few examples: It can't properly account for variations in sequencing depth. It is unable to adjust for batch effects. It can't incorporate quality weights or adjust for heteroscedasticity. It doesn't estimate the magnitude of change. It doesn't extend to pathway signature analyses.

PS. Rather than adding a question to an old thread, it would be better to start a new question with a title that better describes your question. Then you wouldn't need to apologize about re-opening the conversation.