Running DESeq with 1000 samples
Guest User
@guest-user-4897
Hi, I'm using DESeq to find the differentially expressed genes between 2 populations. The RNA-seq data set has a total sample size of around 1000. However, even though I set the memory limit of R to 6 GB, it still reports an error that it cannot allocate a vector of a certain size. I wonder if it's possible to use DESeq on this huge data set, and how much memory would be enough. Thank you!

-- output of sessionInfo(): NA
Simon Anders
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…
Hi

On 09/07/14 20:58, Maoqi Xu [guest] wrote:
> I'm using DESeq to find the differential expressed genes between 2
> populations. The RNA-seq data set has a total sample size of around
> 1000. However, even I set the memory limit of R to 6 Gb, it still
> reports the error that it cannot allocate vector of certain size. I
> wonder if it's possible to use DESeq on this huge data set and how
> much memory should be enough.

You really have one thousand RNA-Seq libraries? This is impressive.

First: As Steve already pointed out, please consider using DESeq2.

On the other hand: The main point of tools like DESeq2 or edgeR is to use information sharing, such as Bayesian shrinkage, to get decent power even if the sample size is only modest. With so much data, you can keep things very simple, especially if you really just have a standard two-group comparison with no other covariates.

I would use DESeq2 only to normalize the data and then do a Wilcoxon rank-sum test on the normalized counts, for each gene separately, or, even better, use a permutation test.

Simon
@steve-lianoglou-2771
United States
Hi,

On Wed, Jul 9, 2014 at 11:58 AM, Maoqi Xu [guest] <guest at bioconductor.org> wrote:
> Hi,
> I'm using DESeq to find the differential expressed genes between 2 populations. The RNA-seq data set has a total sample size of around 1000. However, even I set the memory limit of R to 6 Gb, it still reports the error that it cannot allocate vector of certain size. I wonder if it's possible to use DESeq on this huge data set and how much memory should be enough.

First: if you're just starting your project, you should use DESeq2.

Second: you'll need some serious horsepower. Someone will likely swoop in with a precise calculation, but I wouldn't expect this to work on a machine with 8 GB of RAM. Maybe 16 GB would be enough, but if you're routinely working with data at this scale, I hope you've got a big-iron machine with ~64 GB of RAM or more.

One option would be to do the "hard bits" on Amazon's cloud using Bioconductor's latest and greatest AMI:

http://www.bioconductor.org/help/bioconductor-cloud-ami/

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Genentech
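To put Steve's estimate in perspective: one dense matrix of doubles (R stores numeric values as 8-byte doubles) takes genes × samples × 8 bytes, and a fitting routine that keeps several matrices of that shape alive at once needs a multiple of that. The sketch below only does this arithmetic; the gene count and the number of simultaneous copies are hypothetical inputs, not measurements of DESeq's actual memory use.

```python
def dense_matrix_gib(n_genes, n_samples, bytes_per_value=8):
    """Memory for one dense genes x samples matrix of doubles, in GiB."""
    return n_genes * n_samples * bytes_per_value / 2**30


# Hypothetical inputs: ~60,000 annotated genes, 1,000 libraries.
one_matrix = dense_matrix_gib(60_000, 1_000)
print(f"one matrix: {one_matrix:.2f} GiB")

# A fit holding several same-shaped matrices (counts, normalized counts,
# fitted means, ...) needs a multiple of this; 10 copies is an assumption.
print(f"10 copies:  {10 * one_matrix:.2f} GiB")
```

With these inputs one matrix is about 0.45 GiB, so a fit that really did hold ten such copies would need about 4.5 GiB before any other overhead, most of the 6 GB limit mentioned in the question.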
@gordon-smyth
WEHI, Melbourne, Australia

Dear Maoqi Xu,

You could use limma-voom instead, which will handle 1000 samples in a few seconds without the need for extra memory.

See:

  http://genomebiology.com/2014/15/2/R29

If you particularly want to stick to an exact negative binomial analysis, you could consider edgeR, which uses considerably less memory than DESeq for large datasets. For so many samples, however, voom would seem the way to go.
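voom works on log2 counts per million rather than on raw counts, then estimates a mean-variance trend that gives each observation a precision weight for limma. The following is a sketch of only that first transformation, assuming nested lists of counts and raw column sums as library sizes; the offsets 0.5 and 1 follow the log-CPM definition in the voom paper linked above. The trend fitting, weighting, and limma model fit are not reproduced, and the function name is hypothetical.

```python
import math


def voom_log_cpm(counts, lib_sizes=None):
    """log2 counts per million with voom's offsets:
    log2((count + 0.5) / (lib_size + 1) * 1e6).

    counts: one list per gene, holding one count per sample.
    lib_sizes: per-sample library sizes; defaults to the column sums.
    """
    n_samples = len(counts[0])
    if lib_sizes is None:
        lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # The 0.5 offset avoids log(0) for genes with zero counts.
    return [
        [math.log2((c + 0.5) / (lib_sizes[j] + 1) * 1e6) for j, c in enumerate(row)]
        for row in counts
    ]
```

Each value depends only on its own count and its sample's library size, so the transformation is a single pass over the count table with no per-gene iterative fitting.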

Best wishes
Gordon

Guest195
@guest195-8087
France

Sorry to re-open the conversation.
I am new to RNA-seq, and I wonder at what sample size it becomes reasonable to perform a classical non-parametric test instead of a dedicated RNA-seq method?

Thank you!


There is no sample size that would make me want to use a Wilcoxon test or genewise permutation test to test for differential expression with RNA-seq data. We use voom-limma for large RNA-seq datasets.

There are many reasons why I wouldn't use a permutation test. Here are a few examples: it can't properly account for variations in sequencing depth; it is unable to adjust for batch effects; it can't incorporate quality weights or adjust for heteroscedasticity; it doesn't estimate the magnitude of change; and it doesn't extend to pathway signature analyses.
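For reference, a genewise two-group permutation test reduces to something like the sketch below, assuming the values for one gene are already normalized and using the difference in group means as the statistic (an illustrative choice; the thread does not specify one). Consistent with the points above, all it produces is a p-value: sequencing depth, batch, and quality weights enter only through whatever was done to the input beforehand. The function name is hypothetical.

```python
import random
from statistics import fmean


def permutation_pvalue(x, y, n_perm=9999, seed=1):
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled n_perm times; adding 1 to numerator and
    denominator counts the observed labelling, so p is never exactly 0.
    """
    rng = random.Random(seed)
    observed = abs(fmean(x) - fmean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_x]) - fmean(pooled[n_x:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With n_perm permutations the smallest attainable p-value is 1/(n_perm + 1), i.e. 10^-4 for the default here, which matters once thousands of genes are tested and the p-values are adjusted for multiple testing.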

PS. Rather than adding a question to an old thread, it would be better to start a new question with a title that better describes your question. Then you wouldn't need to apologize for re-opening the conversation.
