Is the running time of the DESeq function much longer for a design formula containing two variables than for one containing a single variable?
Sep • 0
@06de5a1f
Last seen 12 hours ago
Germany

Hi,

I have a question regarding the running time of the DESeq function in DESeq2.

Is the running time of the DESeq function much longer for a design formula containing two variables than for one containing a single variable?

For example, is the running time for condition 1 much longer than for condition 2?

Condition 1:

dds_1 <- DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ subject + condition)

dds_1 <- DESeq(dds_1, parallel = TRUE)

Condition 2:

dds_1 <- DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ condition)

dds_1 <- DESeq(dds_1, parallel = TRUE)

I would like to add that in both conditions the number of samples and the number of genes are the same.

DESeq2 • 206 views

I cannot come up with a more precise answer than 'not much'. A normal-sized analysis with tens to hundreds of samples will take a few seconds, unless you pump it with many covariates. Is it really much of an issue? Do you experience any problems?
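To make the "many covariates" point concrete, here is a minimal sketch in base R that counts how many coefficients each design formula expands to. The sample sheet is hypothetical (50 subjects, each sampled pre and post, for 100 samples in total); the real `sample_info` may look different. Every extra coefficient has to be estimated for every feature, so a wide design matrix is one plausible reason for a longer fit.

```r
# Hypothetical sample sheet: 50 subjects, each sampled pre and post (100 samples).
sample_info <- data.frame(
  subject   = factor(rep(sprintf("S%02d", 1:50), each = 2)),
  condition = factor(rep(c("pre", "post"), times = 50), levels = c("pre", "post"))
)

# Design matrices that the two formulas expand to.
mm_condition <- model.matrix(~ condition, data = sample_info)
mm_paired    <- model.matrix(~ subject + condition, data = sample_info)

# Number of coefficients fitted per feature under each design.
n_coef_condition <- ncol(mm_condition)  # intercept + condition
n_coef_paired    <- ncol(mm_paired)     # intercept + 49 subject terms + condition

print(c(condition_only = n_coef_condition, subject_plus_condition = n_coef_paired))
```

Under these assumptions the paired design has 51 columns against 2 for the condition-only design, so the two runs are not doing the same amount of work per feature even though the count matrix is identical.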


The data comprises 100 samples and around 3 million covariates.

When I ran the code for condition 2 (DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ condition)), it took around 5 to 6 hours to give me the result. However, the code for condition 1 (DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ subject + condition)) has now been running for around 23 hours and I still do not have any result. It is at the gene-wise dispersion estimates step and has not given me any error.

Do you think this is OK, or is something wrong?

@mikelove
Last seen 1 day ago
United States

Do you mean 3 million features? Can you explain what type of data you are using with DESeq2?


Yes. The data is peptide profiles from pre and post SARS-CoV-2 infection samples.


I'm not sure this is appropriate for DESeq2, as I don't know anything about the distribution of this type of data.

How large are the counts for these 3 million features?

Without knowing whether this type of data is appropriate to model with the Negative Binomial (NB), I'd recommend limma-voom with filtering on the minimal count. It's faster and more robust to non-Negative-Binomial data.
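For reference, a minimal sketch of what a limma-voom run with minimal-count filtering could look like. The data here are simulated stand-ins (2000 features, 10 hypothetical subjects with pre and post samples), not the poster's peptide data, and the sketch assumes a matrix of raw counts is available as input. It needs the Bioconductor packages edgeR and limma.

```r
library(edgeR)   # DGEList, filterByExpr, calcNormFactors
library(limma)   # voom, lmFit, eBayes, topTable

set.seed(1)
# Simulated stand-in data (hypothetical): 2000 features x 20 samples, 10 paired subjects.
n_feat <- 2000
sample_info <- data.frame(
  subject   = factor(rep(paste0("S", 1:10), each = 2)),
  condition = factor(rep(c("pre", "post"), times = 10), levels = c("pre", "post"))
)
counts <- matrix(rnbinom(n_feat * 20, mu = 50, size = 5), nrow = n_feat,
                 dimnames = list(paste0("pep", seq_len(n_feat)), NULL))

# Same paired design as in the question: subject as a blocking factor.
design <- model.matrix(~ subject + condition, data = sample_info)

y    <- DGEList(counts = counts)
keep <- filterByExpr(y, design = design)   # drop features with too few counts
y    <- y[keep, , keep.lib.sizes = FALSE]
y    <- calcNormFactors(y)

# voom estimates the mean-variance trend and returns precision weights for lmFit.
v   <- voom(y, design)
fit <- eBayes(lmFit(v, design))
res <- topTable(fit, coef = "conditionpost", number = Inf)
head(res)
```

The filtering step matters for speed as well as for the statistics: with millions of features, removing those with consistently tiny counts can shrink the problem considerably before any model is fitted.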


The distribution is Poisson-like, the variance of the features is larger than their mean, and the measurements are counts per million. Do you believe that DESeq2 won't work for this type of data?
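One quick way to check the "variance larger than mean" statement is to compare per-feature means and variances directly. The sketch below uses base R on simulated counts (a hypothetical stand-in, not the real data); the comparison against the Poisson expectation only makes sense on raw counts, not on values already scaled to counts per million.

```r
set.seed(42)
# Simulated stand-in for a feature-by-sample count matrix (hypothetical values).
counts <- matrix(rnbinom(500 * 20, mu = 50, size = 5), nrow = 500)

feature_mean <- rowMeans(counts)
feature_var  <- apply(counts, 1, var)

# Fraction of features whose variance exceeds their mean
# (under a Poisson model the two would be roughly equal).
frac_overdispersed <- mean(feature_var > feature_mean)

# Rough method-of-moments dispersion, assuming var = mean + alpha * mean^2.
alpha_hat    <- (feature_var - feature_mean) / feature_mean^2
median_alpha <- median(alpha_hat)

print(c(frac_overdispersed = frac_overdispersed, median_alpha = median_alpha))
```

A fraction close to 1 together with a clearly positive dispersion points to overdispersion relative to Poisson, but it does not by itself show that the Negative Binomial is the right model.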


For CPM you should definitely use limma-voom.