DESeq2 takes a very long time to run on 1000 samples
Ruifeng • 0
@19dfc7b8
Last seen 29 days ago
United States

I've been using DESeq2 to find differentially expressed genes in an RNA-seq dataset. There are about 700 case samples and 300 control samples.

The design formula is "~ CONDITION + age + sex". DESeq2 takes a really long time to finish: about 16 h with 8 cores and 32 GB of memory.

Here is the code I am running:

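library(DESeq2)
library(BiocParallel)

## Build the dataset from the rounded count matrix and the sample table
dds <- DESeqDataSetFromMatrix(countData = round(cts),
                              colData = covarianceTable,
                              design = ~ CONDITION + age + sex)

## Register 8 workers for the parallel steps
register(MulticoreParam(8))

## It takes a long time to run.................
dds_subset <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(8))
resultsNames(dds_subset)

## Shrink the log fold changes for the case vs. control coefficient
resLFC <- lfcShrink(dds_subset, coef = "CONDITION_Case_vs_Control", type = "apeglm",
                    parallel = TRUE, BPPARAM = MulticoreParam(8))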


The lfcShrink function also takes a long time.

So my questions are:

Is this the expected running time for DESeq2 on an expression matrix of 58,000 genes × 1,000 samples?

Is there any way to reduce the time needed to run DESeq2 on my dataset?

(I have tried using only protein-coding genes (~20,000 genes). It still takes a long time, about 12 h.)

I would really appreciate any suggestions. Many thanks in advance!

Best,

Ruifeng

DESeq2 • 134 views

The benefits of DESeq2 mainly kick in when sample size (and therefore per-gene information) is limited. With 1000 samples I would simply use limma-voom. For reference, see these related threads:

DESeq2 with many samples

Running DESeq with 1000 samples

DESeq2 taking long time to run with 270 samples in 10 groups.
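
For anyone who wants to try the limma-voom route, here is a minimal sketch of what it could look like with the same design. The simulated `cts` and `covarianceTable` are toy stand-ins for the poster's objects (the real ones are not shown in the thread), and the coefficient name "CONDITIONCase" is an assumption that depends on how the CONDITION factor levels are set, so check colnames(design) on real data.

```r
library(limma)
library(edgeR)

## Toy stand-ins for the poster's objects (assumption: the real `cts` is a
## genes x samples count matrix and `covarianceTable` has one row per sample)
set.seed(1)
n_genes <- 200
n_samples <- 20
cts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
covarianceTable <- data.frame(
  CONDITION = factor(rep(c("Control", "Case"), each = n_samples / 2),
                     levels = c("Control", "Case")),
  age = round(runif(n_samples, 30, 80)),
  sex = factor(rep(c("F", "M"), times = n_samples / 2)),
  row.names = colnames(cts)
)

## Same design as in the question
design <- model.matrix(~ CONDITION + age + sex, data = covarianceTable)

## Filter low-count genes and compute TMM normalization factors
dge <- DGEList(counts = round(cts))
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)

## voom transformation, linear model fit, and moderated t-statistics
v <- voom(dge, design)
fit <- lmFit(v, design)
fit <- eBayes(fit)

## Case vs. control results for all retained genes
res <- topTable(fit, coef = "CONDITIONCase", number = Inf)
head(res)
```

On real data, only the toy-data block needs to be dropped; the rest operates on `cts` and `covarianceTable` as named in the question.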


Yes, I agree with this as well for bulk RNA-seq.

@mikelove
Last seen 16 hours ago
United States

You can use glmGamPoi here. See the vignette section on how to run it from DESeq().
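
A rough sketch of what that call could look like, run here on toy data from makeExampleDESeqDataSet() rather than the poster's dataset. As far as I understand the DESeq2 documentation, the glmGamPoi interface requires test="LRT" and a reduced design, and the glmGamPoi package must be installed. For the design in the question, the reduced formula would presumably be ~ age + sex (an assumption, not something stated in the thread).

```r
library(DESeq2)

## Toy dataset with a two-level `condition` factor (design is ~ condition)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 200, m = 12)

## Fit with glmGamPoi; the LRT compares the full design to the reduced one.
## For the question's design this might be:
##   DESeq(dds, test = "LRT", fitType = "glmGamPoi", reduced = ~ age + sex)
dds <- DESeq(dds, test = "LRT", fitType = "glmGamPoi", reduced = ~ 1)

res <- results(dds)
head(res)
```

Note that the p-values here come from the likelihood ratio test rather than the Wald test used in the original code.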