Question: Using name parameter in DESeq2-results function
8 months ago by
muratcokol0 wrote:

Hi. I am trying to make sense of a previously written DESeq2 analysis that uses the “name” parameter to obtain results. Sample code is below; the first two lines are straightforward (building the input object and running DESeq). My question is: what does the results function return when the name parameter points to only a single condition? What comparison is made? Thanks in advance for the input. I would also appreciate a pointer to the documentation for this usage.

ddsInput <- DESeqDataSetFromMatrix(countdata, coldata, ~ condition)

dds <- DESeq(ddsInput, betaPrior=TRUE, modelMatrixType="expanded")

myresults <- results(dds, name = "condition1")


Hi Michael, thank you very much for your input, it was extremely helpful.

I have a follow-up question on this thread. I implemented the behavior of “name” according to your description as shown below, where I do a contrast of one condition to all other conditions.

expnames <- unique(coldata)
expnow <- "1"
otherexps <- str_c("condition", setdiff(expnames, expnow))

newresults <- results(dds, contrast=list("condition1"), otherexps))

I give the results for the two versions below. As you will see, baseMean, stat, pvalue, and padj replicate perfectly, but the log2FoldChange is exactly double in the new results compared to the old results. Would you have an explanation as to why this is the case?

myresults

         baseMean log2FoldChange     lfcSE       stat     pvalue       padj
gene__A 0.2054465     0.11371753 0.1943700 0.58505706 0.55850935         NA
gene__B 6.4477539     0.45373690 0.3068560 1.47866394 0.13923015 0.23173656

newresults

         baseMean log2FoldChange     lfcSE       stat     pvalue       padj
gene__A 0.2054465     0.22743636 0.3887400 0.58506039 0.55850711         NA
gene__B 6.4477539     0.90747168 0.6137120 1.47866052 0.13923107 0.23173510

(note, the question here is specific to old settings for running DESeq2, which are not the defaults since v1.14)

A few differences: when using a list, you need to set listValues.

It still won’t be the same because the list approach is comparing X vs the rest, while name is comparing X vs all (including X).

Hi Michael, if the difference between the “name” and “contrast” approaches is the inclusion of X, wouldn’t all result metrics be different? Right now some are exactly the same and some are exactly double.

Also, does this mean I cannot replicate the “name” approach by using “contrast”? Because if I write:

allexps <- str_c("condition", expnames)
newresults2 <- results(dds, contrast=list("condition1", allexps))

then DESeq outputs the following error message:

Error in checkContrast(contrast, resNames) :
  elements in the contrast list should only appear in the numerator (first element of contrast list) or the denominator (second element of contrast list), but not both

Let me take a step back. True of all versions of DESeq2: the ‘name’ and ‘contrast’ arguments are simply ways to add and subtract coefficients. The names of the coefficients are given by resultsNames(dds). You can read about how to specify these in ?results

After reading that section, do you have any remaining questions?

I had read the ?results page before starting this thread but could not find some of these details. In your previous answer, for example, you said that when the “name” parameter is used without a comparison, DESeq2 compares to the “mean” of all samples. As far as I can see, the ?results page does not describe this usage.

After reading ?results again, my previous question remains: in the examples I gave above, why is the log2FoldChange in newresults exactly double that in myresults, while the p-values are exactly the same? How should one interpret this outcome?

My reply about using ‘name’ giving a level compared to a middle point is advice specific to the old settings, which aren’t used anymore. The main reason for moving away was that it was difficult for users to interpret.

There is still the description of the meaning of coefficients in an expanded model matrix in all the documentation, though we moved away from this approach. When you use ‘name’ you pull out a single coefficient.

In your example above you are taking one coefficient and subtracting two other coefficients. If you want to average over those levels, you’d have to use listValues; here you would set listValues=c(1,-1/2) to take the numerator coefficient minus the average of the denominator coefficients.

If you want a single coefficient you would use ‘name’.
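To make the role of listValues concrete, here is a minimal base-R sketch of the arithmetic only. It does not call DESeq2. The coefficient values and the helper list_contrast() are hypothetical, and the sketch assumes three conditions whose expanded-model coefficients are centred so that they sum to zero. In DESeq2 itself the corresponding call would be along the lines of results(dds, contrast=list("condition1", otherexps), listValues=c(1,-1/2)).

```r
# Hypothetical coefficients for one gene under an expanded model matrix with
# three conditions (illustrative values, chosen to sum to zero).
beta <- c(condition1 = 0.4, condition2 = -0.1, condition3 = -0.3)

numerator   <- "condition1"
denominator <- c("condition2", "condition3")

# Hypothetical helper mimicking how a list contrast combines coefficients:
# list_values[1] * sum(numerator coefs) + list_values[2] * sum(denominator coefs)
list_contrast <- function(beta, numerator, denominator, list_values = c(1, -1)) {
  list_values[1] * sum(beta[numerator]) + list_values[2] * sum(beta[denominator])
}

# Default weights c(1, -1): condition1 minus the SUM of the other coefficients.
default_lfc <- list_contrast(beta, numerator, denominator)

# Weights c(1, -1/2): condition1 minus the AVERAGE of the two other coefficients.
averaged_lfc <- list_contrast(beta, numerator, denominator, c(1, -1/2))

# What 'name' would pull out: the single coefficient.
name_lfc <- unname(beta["condition1"])

print(c(name = name_lfc, default_list = default_lfc, averaged_list = averaged_lfc))
```

With these made-up numbers the default list contrast comes out at twice the single coefficient, while the averaged version is a genuine "condition1 vs. mean of the rest" comparison and is neither equal to nor double the 'name' value.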

Is this a copy paste issue or is this how you have it in your code?

newresults <- results(dds, contrast=list("condition1"), otherexps))

Did you mean to finish the list after "condition1"?

My apologies - it is a copy paste mistake. It should have read:

newresults <- results(dds, contrast=list("condition1", otherexps))

So the 'name' version is the correct one. Below is a sketch of why you get double that value when you use contrast with a list and don't specify 'listValues' to average over the other conditions (so you are summing over all the other conditions):

(The following only applies to the old settings, which haven't been the default in DESeq2 for the past two releases)

Suppose you're looking on the log scale at the mean counts for each condition, and we'll write that as x_1, ..., x_n for n conditions.

Using 'name' is giving you something like:

x_1 - mean_i x_i

Using the 'list' that you have above is giving you:

(x_1 - mean_i x_i) - sum_{i != 1} [x_i - mean_j x_j]

= (x_1 - mean_i x_i) - (sum_{i} [x_i - mean_j x_j] - (x_1 - mean_i x_i))

= 2 * (x_1 - mean_i x_i) - sum_{i} [x_i - mean_j x_j]

= 2 * (x_1 - mean_i x_i)
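The identity above can be checked numerically with a few lines of base R. This is only a sketch of the algebra: the log-scale means in x are made-up values, and shrinkage (which shifts the middle point in the real analysis) is ignored.

```r
# Hypothetical log-scale condition means for one gene (illustrative values).
x <- c(5.0, 4.2, 3.9, 4.5)

# Expanded-model-style coefficient for each condition: its distance from the mean.
d <- x - mean(x)

# 'name' version: x_1 - mean_i x_i
name_version <- d[1]

# List version with default weights: numerator coefficient minus the SUM
# of all the other coefficients.
list_version <- d[1] - sum(d[-1])

# The deviations sum to zero, so the list version is twice the 'name' version.
stopifnot(isTRUE(all.equal(list_version, 2 * name_version)))
print(c(name = name_version, list = list_version))
```

Because the log fold change and its standard error are both scaled by the same factor of two, the Wald statistic and p-value are unchanged, which matches the tables earlier in the thread.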

Hi Michael, thank you for this elegant explanation.

There is one question I asked above that hasn't been answered yet; perhaps you could help with this as well: is it possible to use “contrast” to get the same result as “name”?

Yes, but only trivially, like so:

results(dds, name=x)

results(dds, contrast=list(x))

Awesome. Thanks once again for your help and patience with all the questions.

8 months ago by
Michael Love18k wrote:

First I will note that we've moved away from these settings: betaPrior=FALSE has been the default for the past two releases. But the documentation is still there, and it's in the DESeq2 publication. The interpretation of name="condition1" under these older settings is the LFC from condition1 samples compared to the "mean" of the samples from all conditions. It's better to say the middle point, as it moves from the geometric to the arithmetic mean depending on the amount of shrinkage.

Expanded model matrices are described in the DESeq2 publication in the Methods.

It's described here in the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#expanded-model-matrices

And there's a bit of information under 'modelMatrixType' in ?DESeq and in the Details section of ?nbinomWaldTest.
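For readers unfamiliar with the term, the difference between a standard and an expanded model matrix can be sketched in base R without DESeq2. This is an illustration of the idea only, under the assumption of a single factor named condition with levels "1", "2", "3"; it is not the code DESeq2 uses internally.

```r
# Hypothetical sample table: three conditions, two samples each.
coldata <- data.frame(condition = factor(rep(c("1", "2", "3"), each = 2)))

# Standard model matrix: intercept plus one column per NON-reference level.
mm_standard <- model.matrix(~ condition, data = coldata)

# Expanded-style matrix: intercept plus an indicator column for EVERY level,
# obtained by switching off the usual treatment contrasts for the factor.
mm_expanded <- model.matrix(
  ~ condition,
  data = coldata,
  contrasts.arg = list(condition = contrasts(coldata$condition, contrasts = FALSE))
)

print(colnames(mm_standard))
print(colnames(mm_expanded))
```

In the expanded form every level, including the first, has its own column, which is why a coefficient such as condition1 exists at all and why it is read as a deviation from a middle point rather than from a reference level.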