Search
Question: DESeq2 - selective calculation of differential gene expression
0
13 months ago by
Germany
LilithElina0 wrote:

Hi,

due to a strange error in DESeq my group is finally being forced to switch to DESeq2 (probably a good thing). There is one problem we are not sure how to solve, though:

In the old DESeq, we would normalise the data and then call nbinomTest() for each comparison we wanted to make. Now, the DESeq() function does all in one, which is nice up until you don't want to compare everything with everything else. We have a data frame with over 400 samples and only want to compare them to the wild type, not to each other. I can't find any possibility to do so, though, neither with DESeq() nor with nbinomWaldTest().

Is it in this case recommended to reduce the data set before using DESeq2, for each comparison, or should we try to run the DESeq() function once and then save it to extract only the relevant results?

modified 13 months ago by Michael Love18k • written 13 months ago by LilithElina0
1
13 months ago by
Michael Love18k
United States
Michael Love18k wrote:

hi Lilith,

Yes, definitely a good thing. DESeq2 is more sensitive than DESeq, and the older version isn't being developed since ~2012/2013.

Can you explain a bit more, how do the 400 samples break down into groups? What are the sample sizes per group? Is there any other relevant structure to the samples?

Hey Michael,

I'm afraid they're either single or duplicates (depending on the project). We know replicates are important, but with 400 - 500 clinical isolates it just wasn't feasible. So the problem is that we don't really want to compare more than 400 samples against each other. In the old protocols, we always normalised over all samples at once and then did the comparisons we needed, but that doesn't seem possible any more now.

Sarah

That's fine, but what does table(condition) look like?

How many WT do you have, which you are comparing against?

The wild type is also just one sample, apparently, so the conditions df is basically just a df with the sample names as "conditions".

You want to compare each of 400 samples against a single WT sample? I just want to offer the best advice possible, but maybe it would be good if you could explain what kind of data this is? For how many of the 400 samples is there any replication, if only a duplicate? When you have duplicates, is it a technical replicate (just sequencing the same library) or is it a biological replicate? This is all relevant to what kind of advice I would give you.

The way you can make the comparisons is to run DESeq() once, with WT as the reference level of the condition variable, and then extract results tables for each comparison:

res <- results(dds, contrast=c("condition", "X", "WT"))

where you would fill in "X" with whichever sample you want to compare to WT.

In the example that caused me to ask this question we have no replicates at all, not even for the WT. We did RNA-Seq of ~400 clinical isolates (bacteria, grown in the lab, not ex vivo samples) and we're looking at their general transcription profiles as well as differential expression in comparison to the lab-grown WT.

Running DESeq() on all the data and just extracting the results we need seems to work, thank you. Apparently my colleague only looks at a few samples at a time. Do you think using save() or saveRDS() is a good way to keep the DESeq() results available?

Yes, save() the dds object makes sense