I've been using DESeq2 to find differentially expressed genes in a dataset of RNA-seq samples. There are about 700 case and 300 control samples.
The design formula is "~ CONDITION + age + sex". DESeq2 takes a really long time to finish: about 16 h with 8 cores and 32 GB of memory.
Here is the code I am running:
dds <- DESeqDataSetFromMatrix(countData = round(cts),
                              colData = covarianceTable,
                              design = ~ CONDITION + age + sex)
register(MulticoreParam(8))
## It takes a long time to run
dds_subset <- DESeq(dds, parallel = TRUE, BPPARAM = MulticoreParam(8))
resultsNames(dds_subset)
resLFC <- lfcShrink(dds_subset, coef = "CONDITION_Case_vs_Control", type = "apeglm",
                    parallel = TRUE, BPPARAM = MulticoreParam(8))
The lfcShrink function also takes a long time.
So my questions are:
Is this the expected running time for DESeq2 on an expression matrix of 58000 genes * 1000 samples?
Is there any way to reduce the time needed to run DESeq2 on my dataset?
(I have tried using only protein-coding genes (~20 000 genes); it still takes a long time, about 12 h.)
I would really appreciate any suggestions. Many thanks in advance!