Hi,
I am running the DESeq function in DESeq2 (version 1.16.1) and it is taking too long. I left it running overnight and it is still not done.
I suspect this is because my dataset is too big, since I was able to run it on a smaller one. My countdata contains 56730 genes from 475 samples, and I reduced the metadata to only 4 variables for the 475 samples. Previously I ran DESeq with 20000 genes and 280 samples, and it took no more than 15 minutes. Is this expected given the larger size, or are there other ways to make this function run faster? Based on a previous post (from about 3.3 years ago), I also tried converting all metadata columns to factors. Thanks in advance for your input!
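For reference, converting every metadata column to a factor can be done with one base-R call. The sketch below uses a small made-up data frame standing in for hg38.coremeta (the column values are hypothetical); on the real object the same `hg38.coremeta[] <- lapply(hg38.coremeta, factor)` line would apply.

```r
# Toy stand-in for hg38.coremeta; values are hypothetical
meta <- data.frame(
  agequart = c(1, 2, 3, 4),
  muse_IDH1_status = c("WT", "MUT", "WT", "MUT"),
  seizure_history = c("YES", "NO", "NO", "YES"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor; assigning into meta[] keeps the data.frame class
meta[] <- lapply(meta, factor)

str(meta)
```

One design consequence worth checking: if agequart is stored as numbers (1 to 4), converting it to a factor changes the model from a single numeric slope to a separate coefficient for each quartile.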
dds.adj2<-DESeqDataSetFromMatrix(countData = hg38.counts, colData = hg38.coremeta, design=~agequart+muse_IDH1_status+seizure_history)
vsd.adj2<-vst(dds.adj2, blind = TRUE)
dds.adj2<-estimateSizeFactors(dds.adj2)
dds.adj2<-DESeq(dds.adj2)
> dds.adj2<-DESeq(dds.adj2)
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
Can you clarify how to remove genes with very small counts? Should I use normalized counts (rather than raw counts) and remove genes whose normalized counts are below 10 in most samples?
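A minimal sketch of a low-count filter, similar in spirit to the pre-filtering example in the DESeq2 vignette, which works on raw counts: keep a gene only if it has at least `min_count` reads in at least `min_samples` samples. The helper name `filter_low_counts`, the thresholds, and the toy matrix are all hypothetical; `min_samples` is often chosen to match the smallest group of interest.

```r
# Hypothetical helper: keep genes (rows) with at least `min_count` raw reads
# in at least `min_samples` samples (columns)
filter_low_counts <- function(counts, min_count = 10, min_samples = 3) {
  keep <- rowSums(counts >= min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

# Toy raw count matrix: 4 genes x 5 samples (made-up numbers)
toy <- matrix(c(
   0,  1,  0,  2,  0,   # geneA: never reaches 10
  15, 20, 12, 30, 11,   # geneB: >= 10 in every sample
   0,  0, 50,  0,  0,   # geneC: high in a single sample only
  10,  9, 10, 10,  3    # geneD: >= 10 in exactly three samples
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("gene", LETTERS[1:4]), paste0("s", 1:5)))

filtered <- filter_low_counts(toy, min_count = 10, min_samples = 3)
rownames(filtered)
```

With a DESeqDataSet, the same logical vector can be computed from `counts(dds.adj2)` and used to subset rows with `dds.adj2[keep, ]` before calling DESeq; fewer genes means proportionally less fitting work and less memory.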
I tried with 100 samples and all 56730 genes, and DESeq took about 15 minutes. When I tried again with the entire countdata, I got the error below. Thanks!
using pre-existing size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Error: cannot allocate vector of size 205.6 Mb
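The size in that error message can be checked with simple arithmetic: a dense genes × samples matrix of doubles takes 8 bytes per entry, and R reports "Mb" in units of 1024^2 bytes. The sketch below (the function name `dense_matrix_mib` is hypothetical) shows that 205.6 Mb is consistent with exactly one such matrix for the full dataset. Because the fitting step likely keeps several genes × samples matrices in memory at once, peak usage is a multiple of this figure, and it shrinks roughly in proportion to the number of genes removed by filtering.

```r
# Size in MiB of one dense genes x samples matrix of doubles (8 bytes each),
# using 1024^2 bytes per "Mb" as in R's allocation error messages
dense_matrix_mib <- function(n_genes, n_samples, bytes_per_value = 8) {
  n_genes * n_samples * bytes_per_value / 1024^2
}

full  <- dense_matrix_mib(56730, 475)  # full dataset that failed
small <- dense_matrix_mib(20000, 280)  # earlier dataset that finished

round(full, 1)
round(small, 1)
```

The full matrix is roughly 4.8 times larger than the earlier one, before counting any of the extra copies made during fitting.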