Question: Subsetting a large DESeqDataSet to extract results with the contrast argument
l.s.wijaya wrote, 6 days ago:

Hi all,

I am running a DESeq2 analysis, but I am stuck at extracting results with results(dds, contrast = ...) after running DESeq() on a large DESeqDataSet. Extracting a single comparison takes extremely long (about 3 hours per comparison), and I have 265 comparisons to extract. I plan to subset the dds object after running DESeq(), since I don't want to lose genes before normalization. However, I always get an error when running results(dds, contrast = ...) on the subset object:

Error in counts(object) %*% whichSamples : non-conformable arguments

Then I tried to relevel the variable containing the numerator and denominator for the contrast, which gave me another error:

Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues

It seems that the contrast argument could not recognize the numerator and denominator in the new subset object. I also read that releveling the factor after running DESeq() isn't allowed, as it confuses the software: https://support.bioconductor.org/p/102317/

If there is a way to solve this problem, please share it with me. Thanks in advance.

Answer: Subsetting a large DESeqDataSet to extract results with the contrast argument
Michael Love (United States) wrote, 6 days ago:

How many samples do you have? Also, how many samples per group?


In total I have 845 samples, each with 21111 genes. The number of samples per group varies: some groups have only 6, others have 116. I have 8 groups in total, so I plan to subset the matrix into 8 parts.

(written 6 days ago by l.s.wijaya, modified 6 days ago)

Two things: I personally use limma-voom for datasets with 100s of individuals in my lab, as I've said before on the support site.

Also, the development branch of DESeq2 (to be released in one month) is 10x faster for datasets with >100 samples.
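
For readers who want to try the limma-voom route mentioned above, here is a minimal sketch. It is not necessarily the workflow used in the reply: the simulated counts, the three group labels (A, B, C), and the contrast names are all assumptions for illustration. The point it shows is that the linear model is fit once, and each additional comparison is just one more column in the contrast matrix.

```r
library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom(), lmFit(), makeContrasts(), contrasts.fit(), eBayes(), topTable()

set.seed(1)
# Simulated counts (assumption): 500 genes x 12 samples in three hypothetical groups.
counts <- matrix(rnbinom(500 * 12, mu = 100, size = 5), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:12)))
group <- factor(rep(c("A", "B", "C"), each = 4))

# Normalize library sizes, then build a design with one column per group.
dge <- calcNormFactors(DGEList(counts = counts, group = group))
design <- model.matrix(~ 0 + group)   # columns: groupA, groupB, groupC

# voom estimates the mean-variance trend; lmFit fits the model once for all groups.
v <- voom(dge, design)
fit <- lmFit(v, design)

# Each comparison of interest is one column of the contrast matrix.
cm <- makeContrasts(BvsA = groupB - groupA,
                    CvsA = groupC - groupA,
                    levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

# Full table for one comparison.
tab_BA <- topTable(fit2, coef = "BvsA", number = Inf)
head(tab_BA)
```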

(written 6 days ago by Michael Love)

It would be great to try the second option, thanks a lot. However, is there anything worth trying to solve this issue in the meantime, such as a way to subset the dds object? I can also consider the first option.

(written 6 days ago by l.s.wijaya)

You can subset the dds and run DESeq() and results() for pairs of groups.

(written 6 days ago by Michael Love)

And if you wanted to share the dispersion estimate across all groups, you could do DESeq() on the whole dataset, then subset and run only nbinomWaldTest() and results() on pairs.
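
A minimal sketch of that workflow, assuming a single-factor design ~ condition and using simulated data from makeExampleDESeqDataSet() in place of the real dataset. The group labels (A, B, C) and the helper name wald_for_pair are made up for illustration. Unused factor levels are dropped after subsetting so that the design matrix only has columns for the groups still present.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real dataset (assumption): 1000 genes, 12 samples.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
# Hypothetical three-group layout, 4 samples per group.
dds$condition <- factor(rep(c("A", "B", "C"), each = 4))
design(dds) <- ~ condition

# Fit once on all samples so the dispersion estimates use every group.
dds <- DESeq(dds)

# Hypothetical helper: keep two groups, rerun only the Wald test
# (reusing the stored dispersions), then extract the contrast.
wald_for_pair <- function(dds, numerator, denominator) {
  sub <- dds[, dds$condition %in% c(numerator, denominator)]
  sub$condition <- droplevels(sub$condition)
  sub <- nbinomWaldTest(sub)
  results(sub, contrast = c("condition", numerator, denominator))
}

res_BA <- wald_for_pair(dds, "B", "A")
head(res_BA)
```

With many comparisons, the same helper could be called in a loop over the pairs of interest, for example with lapply() over a list of numerator/denominator pairs.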

(written 6 days ago by Michael Love)

Currently, I am trying to subset the object returned by DESeq(), but I encountered the errors described above. What do you mean by running results() on pairs? Perhaps, if I manage to subset the object, I can run nbinomWaldTest() and then calculate the log2 fold changes with results().

(written 6 days ago by l.s.wijaya, modified 6 days ago)

Yes, the second is what I’m recommending.

(written 5 days ago by Michael Love)

By the way, is there a way to spread results() across cores? For instance, with DESeq() I can use BPPARAM to distribute the work across workers. It would be super helpful to do the same for results(). I tried mclapply, for instance, but results() was still run by only one worker. results() is the step that takes the longest.

(written 5 days ago by l.s.wijaya)

Your first step should be looking up help for functions.

?results

See BPPARAM

Also if you want to keep going with DESeq2, go ahead and use the development branch which is 10x faster for this size of dataset. That’s more than you’ll get with parallelization.
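
As a sketch of what the help page describes: results() takes parallel and BPPARAM arguments, and with parallel = TRUE the genes are split across the workers of the BiocParallel back-end. The simulated dataset and the choice of two workers below are assumptions for illustration. MulticoreParam relies on forked processes, so on Windows a different back-end such as SnowParam would be needed.

```r
library(DESeq2)
library(BiocParallel)

set.seed(1)
# Simulated stand-in for the real dataset (assumption); condition has levels A and B.
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)
dds <- DESeq(dds)

# Two workers is an arbitrary choice for this sketch.
bp <- MulticoreParam(workers = 2)

# parallel = TRUE must be set explicitly; passing BPPARAM alone is not enough.
res_par <- results(dds, contrast = c("condition", "B", "A"),
                   parallel = TRUE, BPPARAM = bp)
head(res_par)
```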

(written 5 days ago by Michael Love)

Since the development branch of DESeq2 will be released in one month, I will try parallelization for now and keep an eye on the update. Thanks a lot for the help!

(written 5 days ago by l.s.wijaya)