4.5 years ago by
Zentrum für Molekularbiologie, Universität Heidelberg
On 09/07/14 20:58, Maoqi Xu [guest] wrote:
> I'm using DESeq to find the differential expressed genes between 2
> populations. The RNA-seq data set has a total sample size of around
> 1000. However, even I set the memory limit of R to 6 Gb, it still
> reports the error that it cannot allocate vector of certain size. I
> wonder if it's possible to use DESeq on this huge data set and how
> much memory should be enough.
You really have one thousand RNA-Seq libraries? This is impressive.
First: As Steve already pointed out, please consider using DESeq2.
On the other hand: The main point of tools like DESeq2 or edgeR is to
use information sharing, such as Bayesian shrinkage, to get decent
results even if the sample size is only modest.
With so much data, you can keep things very simple, especially if you
really just have a standard two-group comparison with no other
covariates. I would use DESeq2 only to normalize the data and then do
a Wilcoxon rank-sum test on the normalized counts, for each gene
separately, or, even better, use a permutation test.
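
To make the idea concrete, here is a rough sketch of the two steps in plain
Python, using only the standard library. It is not DESeq2 code and does not
call DESeq2: the normalization is my own minimal version of the
median-of-ratios idea, and the function names (size_factors, normalize,
permutation_pvalue), the choice of the difference in group means as test
statistic, and the default of 2000 permutations are all assumptions made
for illustration.

```python
import math
import random
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative, not DESeq2 itself).

    counts[g][s] is the raw count of gene g in sample s.
    """
    n_samples = len(counts[0])
    # Keep only genes without zero counts so the geometric mean is defined.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    log_geo_means = [mean(row) for row in log_rows]
    # Per sample: median over genes of log(count / geometric mean).
    return [
        math.exp(median(row[s] - g for row, g in zip(log_rows, log_geo_means)))
        for s in range(n_samples)
    ]


def normalize(counts, factors):
    """Divide every count by the size factor of its sample."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


def permutation_pvalue(values, is_group_a, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene.

    Test statistic (an assumption): absolute difference in group means.
    """
    rng = random.Random(seed)

    def stat(labels):
        a = [v for v, flag in zip(values, labels) if flag]
        b = [v for v, flag in zip(values, labels) if not flag]
        return abs(mean(a) - mean(b))

    observed = stat(is_group_a)
    labels = list(is_group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # random reassignment of the group labels
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

You would call permutation_pvalue once per gene on that gene's row of the
normalized matrix, and then adjust the resulting p-values for multiple
testing. Note that the smallest attainable p-value is 1 / (n_perm + 1), so
with many genes you will likely want far more permutations than the default
used here.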