Question

Does DESeq2 make comparisons between all samples vs. all controls, or between matched samples and controls in different groups?

Maria • 0

Does DESeq2 make comparisons between all samples vs. all controls (A), or between matched samples and controls in different groups (B)?

E.g. Groups: 1, 2, 3 (treated) Controls: 1C, 2C, 3C (untreated)

(very simplistically)

(A) [1 + 2 + 3 treated] vs [1C + 2C + 3C]

or

(B) [1 vs. 1C] + [2 vs. 2C] + [3 vs. 3C]

The vignette states 'Typically, we recommend users to run samples from all groups together, and then use the contrast argument of the results function to extract comparisons of interest after fitting the model using DESeq.', so does this mean that method (A) above will be how this works, once the contrast is defined?

The reason I ask is that I am working with a large bulk RNA-seq dataset with hundreds of samples, and I want to understand what impact removing individual low-quality samples will have on running DESeq2 and on its output. Removing individual samples will leave me with a mix of samples, some with associated/paired controls and some without. Would it be better in this case to also remove every sample that no longer has a matching treated sample or control?

Thank you.

DESeq2 RNAseq bulk DifferentialExpression • 2.1k views

Updated 3.3 years ago by James W. MacDonald • written 3.3 years ago by Maria

score 0 · Answer 1 · 2022-10-04

What the quote from the vignette is saying is that if you have, say, four groups, then it's usually best to keep them all in the model and make specific contrasts to compare the groups as you see fit. So you could have three treatments and one control, fit a single model with all four groups, and then make individual comparisons between each of the treatments and the control.
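To make the "fit once, then pull out contrasts" idea concrete, here is a minimal sketch in plain Python. It is an ordinary Gaussian group-means analogue, not DESeq2's negative binomial model, and the group names and values are made up for illustration. The point is only that the model is fitted once over all four groups, and each treatment-vs-control comparison is then extracted from that single fit.

```python
from math import sqrt
from statistics import mean

# Hypothetical log-scale expression values for one gene (illustrative only).
groups = {
    "control": [5.1, 4.8, 5.3],
    "trtA": [6.0, 6.4, 5.9],
    "trtB": [5.2, 5.0, 5.5],
    "trtC": [7.1, 6.8, 7.4],
}

# Fit once: one mean per group, one residual variance pooled over ALL groups.
means = {g: mean(v) for g, v in groups.items()}
rss = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
df = sum(len(v) for v in groups.values()) - len(groups)
pooled_var = rss / df


def contrast(a, b):
    """Estimate and standard error for mean(a) - mean(b) from the single fit."""
    est = means[a] - means[b]
    se = sqrt(pooled_var * (1 / len(groups[a]) + 1 / len(groups[b])))
    return est, se


# Extract each treatment-vs-control comparison after fitting.
comparisons = {g: contrast(g, "control") for g in groups if g != "control"}
for name, (est, se) in comparisons.items():
    print(f"{name} vs control: estimate={est:.3f}, se={se:.3f}")
```

Every comparison here uses the same pooled variance, so samples from groups not named in a given contrast still contribute to the variability estimate.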

You don't say what you mean by 'matched'. This can mean paired, where e.g. 1 and 1C are treated and control samples from the same subject, or it can mean that you tried to get subjects for the treated and control groups that have very similar phenotypes (age, sex, etc.). If you mean the samples are paired, then you should have an additional subject factor in your model to account for the pairing. In a conventional linear model this is algebraically the same as doing (B), where you first compute the differences between treated and control within each subject, and then test whether the average difference is different from zero.
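The equivalence between a subject factor and approach (B) can be checked numerically. The sketch below uses an ordinary Gaussian linear model rather than DESeq2's negative binomial GLM, and hypothetical values for a single gene in five subjects. It computes the test statistic twice: once from within-subject differences, and once from an additive subject + treatment model, which has a closed-form least-squares fit because the design is balanced.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-scale values for one gene; index i is the same subject
# in both lists (illustrative data only).
treated = [8.1, 6.9, 9.4, 7.2, 8.8]
control = [7.4, 6.1, 8.9, 6.8, 7.9]
n = len(treated)

# Route (B): difference within each subject, then test mean difference vs 0.
diffs = [t - c for t, c in zip(treated, control)]
t_paired = mean(diffs) / sqrt(variance(diffs) / n)

# Additive model: value = subject effect + treatment effect.
# With complete pairs, fitted = subject mean + treatment mean - grand mean.
grand = mean(treated + control)
subj_means = [(t + c) / 2 for t, c in zip(treated, control)]
trt_means = {"treated": mean(treated), "control": mean(control)}

rss = 0.0
for i in range(n):
    for label, y in (("treated", treated[i]), ("control", control[i])):
        fitted = subj_means[i] + trt_means[label] - grand
        rss += (y - fitted) ** 2

# Residual df: 2n observations minus (n subject terms + 1 treatment term).
mse = rss / (n - 1)
effect = trt_means["treated"] - trt_means["control"]
t_model = effect / sqrt(2 * mse / n)

print(f"paired-difference t = {t_paired:.6f}")
print(f"subject-factor t    = {t_model:.6f}")
```

The two statistics agree up to floating-point error. This holds only when every subject has both a treated and a control sample.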

If you simply tried to get matched subjects for treated and control, then you don't use that information directly. In other words, there is no expectation that two people with similar age, height, weight, etc. will be correlated, so you should not pretend that they are. Instead, the goal of matching is to make those phenotypic variables orthogonal to the treatment so they won't bias the result. In a matched analysis (or an analysis where you might not have tried to match), you will essentially do (A), where you compare the means of the groups.

Also, because the mean of the within-pair differences equals the difference between the group means (as long as every pair is complete), the numerator of your statistic will be the same regardless. Pairing only affects the denominator, by (hopefully) reducing the variability that the difference is measured against.
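A small numerical check of the numerator/denominator point, again as a Gaussian sketch with made-up values rather than anything DESeq2 computes directly. The subjects in this hypothetical data differ a lot from each other but respond similarly to treatment, which is the situation where pairing helps.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical log-scale values for one gene; index i is the same subject
# in both lists (illustrative data only).
treated = [8.1, 6.9, 9.4, 7.2, 8.8]
control = [7.4, 6.1, 8.9, 6.8, 7.9]
n = len(treated)

diffs = [t - c for t, c in zip(treated, control)]

# Numerators: difference of group means vs mean of within-pair differences.
num_unpaired = mean(treated) - mean(control)
num_paired = mean(diffs)

# Denominators: the unpaired standard error keeps between-subject spread,
# while the paired standard error only sees spread of the differences.
se_unpaired = sqrt(variance(treated) / n + variance(control) / n)
se_paired = sqrt(variance(diffs) / n)

print(f"numerators:   unpaired={num_unpaired:.3f}, paired={num_paired:.3f}")
print(f"denominators: unpaired={se_unpaired:.3f}, paired={se_paired:.3f}")
```

The numerators match, while the paired denominator is much smaller for this data. If treated and control values within a subject were not positively correlated, pairing would not shrink the denominator.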