Question: What's the relationship between "design" and "contrasts"?
20 months ago by
kjo30
kjo30 wrote:

The explanations I've heard of "design" and "contrasts" make them sound like "overlapping"/non-independent concepts.  I.e. it would be possible to give values to these two parameters that are inconsistent with each other.

Is that the case?  Or can the two parameters be specified completely independently from each other?

modified 20 months ago by Ryan C. Thompson6.8k • written 20 months ago by kjo30
20 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

A design matrix is simply the matrix representation of the system of linear equations that you wish to solve using regression, as described here: https://en.wikipedia.org/wiki/Design_matrix. Each column represents a coefficient, and each row represents a sample. It's possible to represent the same model using two different design matrices, just as it is possible to represent the same space using two different coordinate systems.
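To make the "two coordinate systems" point concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the toy data (three groups A, B, C with two samples each), the column names, and the small `solve_least_squares` helper, which is not a function from any library. It fits the same one-way model with two different design matrices and shows that the coefficients differ while the fitted values are identical.

```python
from fractions import Fraction

def solve_least_squares(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.

    Hypothetical helper for this sketch; assumes X has full column rank.
    """
    n_coef = len(X[0])
    # Augmented matrix [X^T X | X^T y], kept as exact fractions.
    aug = [
        [Fraction(sum(row[i] * row[j] for row in X)) for j in range(n_coef)]
        + [Fraction(sum(row[i] * yi for row, yi in zip(X, y)))]
        for i in range(n_coef)
    ]
    for col in range(n_coef):
        # Move a row with a nonzero pivot into position, then normalize it.
        pivot = next(r for r in range(col, n_coef) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n_coef):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def fitted_values(X, coefficients):
    """Multiply the design matrix by the coefficient vector."""
    return [sum(x * b for x, b in zip(row, coefficients)) for row in X]

# Toy data: samples 1-2 are group A, 3-4 are group B, 5-6 are group C.
y = [4, 6, 9, 11, 1, 3]

# Design 1, columns (A, B, C): each coefficient is a group mean.
X_means = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]

# Design 2, columns (Intercept, B_vs_A, C_vs_A): group A is the baseline.
X_treat = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]

coef_means = solve_least_squares(X_means, y)   # group means of A, B, C
coef_treat = solve_least_squares(X_treat, y)   # mean of A, then B - A, C - A

fit_means = fitted_values(X_means, coef_means)
fit_treat = fitted_values(X_treat, coef_treat)

print("coefficients differ:", coef_means != coef_treat)
print("fitted values agree:", fit_means == fit_treat)
```

Both designs span the same column space, so they describe the same model; only the meaning of the individual coefficients changes.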

Once you fit a model using a given design matrix, you generally want to conduct statistical tests based on the model. If one of the coefficients represents the quantity you wish to test, then you can simply test that coefficient directly, and you have no need for contrasts. However, if you want to test for differences between two coefficients or some other more complicated relationship between coefficients, then you require a contrast. A contrast is nothing more than a simple arithmetic expression involving only the coefficients and constants. For example, if you have coefficients named A, B, and C, then "(A + B)/2 - C" is an example of a contrast. (Depending on what A, B, and C are, this contrast may or may not have a meaningful interpretation.) So it should be clear that any contrast is only defined in reference to a specific design matrix. Using a contrast with a different design matrix than the one it was written for will either fail (if the dimensions/names don't match) or give a nonsensical result (if the dimensions and names happen to match by coincidence).
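As a rough illustration of why a contrast only makes sense relative to one design, the sketch below evaluates "(A + B)/2 - C" in plain Python. The toy data, the coefficient names (`Intercept`, `B_vs_A`, `C_vs_A`), and the `apply_contrast` helper are all hypothetical and chosen for this example. It relies on the fact that, for a one-way layout, a design with one column per group yields the group means as coefficients, while a baseline-plus-differences design yields the mean of A and the differences from A.

```python
from fractions import Fraction

# Hypothetical toy data: two measurements in each of three groups.
values = {"A": [4, 6], "B": [9, 11], "C": [1, 3]}
mean = {g: Fraction(sum(v), len(v)) for g, v in values.items()}

# Fitted coefficients under two parametrizations of the same model.
cell_means = {"A": mean["A"], "B": mean["B"], "C": mean["C"]}
treatment = {
    "Intercept": mean["A"],
    "B_vs_A": mean["B"] - mean["A"],
    "C_vs_A": mean["C"] - mean["A"],
}

def apply_contrast(weights, coefficients):
    """Evaluate a contrast given as {coefficient name: weight}."""
    missing = set(weights) - set(coefficients)
    if missing:
        raise KeyError(f"contrast refers to unknown coefficients: {sorted(missing)}")
    return sum(w * coefficients[name] for name, w in weights.items())

half = Fraction(1, 2)

# "(A + B)/2 - C" written for the cell-means design.
contrast_cell = {"A": half, "B": half, "C": -1}
# The same quantity rewritten for the treatment design: B_vs_A/2 - C_vs_A.
contrast_treat = {"B_vs_A": half, "C_vs_A": -1}

right = apply_contrast(contrast_cell, cell_means)
equivalent = apply_contrast(contrast_treat, treatment)

# Mismatch caught by name: the treatment design has no coefficient called "A".
try:
    apply_contrast(contrast_cell, treatment)
    name_check_failed = False
except KeyError:
    name_check_failed = True

# Mismatch NOT caught: matching by position only, the weights are applied to
# coefficients they were never written for and the result is meaningless.
positional = sum(w * c for w, c in zip(contrast_cell.values(), treatment.values()))

print(right, equivalent, name_check_failed, positional)
```

The two correctly matched contrasts agree, the name-based lookup fails loudly, and the positional version silently returns a different number, which is the "nonsensical result" case.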

For more information, I highly recommend you read a textbook on linear regression. ISLR is a good one: http://www-bcf.usc.edu/~gareth/ISL/

Yes, and to follow up on Ryan, in terms of code, DESeq2 will make sure that, if you specify a contrast, it refers to specific coefficients that were fit based on the design formula. E.g. if you don't include a variable in the design, its coefficients won't be present to use with either the 'name' or the 'contrast' argument of results(). results() performs a lot of checks to make sure that when you specify a contrast, it corresponds to a reasonable linear combination of coefficients, and if it doesn't, results() tries to tell you what was wrong.