Is the running time of the DESeq function much longer for a design with two variables than for a design with only one variable?

Hi,

I have a question regarding the running time of the DESeq function in DESeq2.

Is the running time of the DESeq function much longer for a design formula containing two variables than for one containing only one variable?

For example, is the running time for Condition 1 much longer than for Condition 2?

Condition 1:

```r
dds_1 <- DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ subject + condition)

dds_1 <- DESeq(dds_1, parallel = TRUE)
```

Condition 2:

```r
dds_1 <- DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ condition)

dds_1 <- DESeq(dds_1, parallel = TRUE)
```

I would like to add that in both conditions the number of samples and the number of genes are the same.

Thanks a lot in advance for the answer.

Tags: DESeq2

Posted 14 months ago by Sep; updated 14 months ago by Michael Love

I cannot come up with a more precise answer than 'not much'. For a normal-sized analysis with tens to hundreds of samples, that will take a few seconds unless you add many covariates. It's really not much of an issue. Are you experiencing any problems?
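
To make the point about covariates concrete, here is a minimal base-R sketch that compares the number of coefficients implied by the two design formulas from the question. The `sample_info` layout below (50 subjects, each measured under two conditions) is purely hypothetical and is not the poster's real data. DESeq2 fits a GLM for every feature, so a design with many more coefficients means more work per feature; how much longer a real run takes is not something this sketch can predict.

```r
# Hypothetical paired layout (an assumption, not the poster's real data):
# 50 subjects, each measured under two conditions, giving 100 samples.
sample_info <- data.frame(
  subject   = factor(rep(sprintf("S%02d", 1:50), each = 2)),
  condition = factor(rep(c("pre", "post"), times = 50),
                     levels = c("pre", "post"))
)

# Design matrices corresponding to the two formulas in the question.
mm_condition <- model.matrix(~ condition, data = sample_info)
mm_paired    <- model.matrix(~ subject + condition, data = sample_info)

# Number of coefficients that would be estimated for every feature.
ncol(mm_condition)  # intercept + condition
ncol(mm_paired)     # intercept + 49 subject terms + condition
```

With this layout the single-variable design has 2 columns, while the design that also includes `subject` has 51.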

Posted 14 months ago by ATpoint

The data comprises 100 samples and around 3 million covariates.

When I ran the code for Condition 2 (DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ condition)), it took around 5 to 6 hours to give me the result. However, the code for Condition 1 (DESeqDataSetFromMatrix(countData = bigdf_t, colData = sample_info, design = ~ subject + condition)) has now been running for around 23 hours and I still do not have a result. It is at the gene-wise dispersion estimates step and has not given any error.

Do you think this is OK, or is something wrong?

Posted 14 months ago by Sep

Do you mean 3 million features? Can you explain what type of data you are using with DESeq2?

Posted 14 months ago by Michael Love

Yes. The data are peptide profiles from pre- and post-SARS-CoV-2 infection samples.

Posted 14 months ago by Sep

I'm not sure this is appropriate for DESeq2; I don't know anything about the distribution of this kind of data.

How large are the counts for these 3 million features?

I'd recommend limma-voom with filtering on a minimal count, since I don't know whether this type of data is appropriate to model with a negative binomial (NB) distribution. It's faster and more robust to non-negative-binomial data.

Posted 14 months ago by Michael Love

The distribution is Poisson-like, the variance of each feature is larger than its mean, and the measurements are counts per million. Do you believe that DESeq2 won't work for this type of data?

Posted 14 months ago by Sep

For CPM you should definitely use limma-voom.
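
For readers who want a starting point, the following is a minimal sketch of a limma-voom workflow with count-based filtering. It is an illustration under stated assumptions, not the exact analysis recommended for this dataset. The count matrix and `sample_info` are simulated stand-ins, the paired `~ subject + condition` design is carried over from the question, and the input is assumed to be a matrix of counts, which is what `voom` is documented to take. Whether the poster's CPM values can be used in this way is not settled in this thread.

```r
library(edgeR)
library(limma)

set.seed(1)

# Simulated stand-in data (an assumption): 6 subjects, each with a "pre"
# and a "post" sample, and 2000 features with negative binomial counts.
n_subj <- 6
sample_info <- data.frame(
  subject   = factor(rep(paste0("S", seq_len(n_subj)), each = 2)),
  condition = factor(rep(c("pre", "post"), times = n_subj),
                     levels = c("pre", "post"))
)
counts <- matrix(
  rnbinom(2000 * 2 * n_subj, mu = 50, size = 5),
  nrow = 2000,
  dimnames = list(paste0("feature", 1:2000), paste0("sample", 1:(2 * n_subj)))
)

# Paired design: subject as a blocking factor, condition as the effect of interest.
design <- model.matrix(~ subject + condition, data = sample_info)

# Filter out features with consistently low counts, then normalize.
dge  <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge  <- dge[keep, , keep.lib.sizes = FALSE]
dge  <- calcNormFactors(dge)

# voom estimates the mean-variance trend and returns log-CPM with weights;
# lmFit and eBayes then fit and moderate a linear model per feature.
v   <- voom(dge, design)
fit <- eBayes(lmFit(v, design))

# Top features for the post vs. pre comparison.
tt <- topTable(fit, coef = "conditionpost")
```

On real data, `counts` and `sample_info` would be replaced by the actual matrix and sample table, and the filtering step is where most of a very large feature set is likely to be reduced.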

Posted 14 months ago by Michael Love
