I have 30 paired-end samples with 6 replicates in 5 groups (4 treatments + 1 baseline control). I am trying to decide how to setup the DE analysis. The glm approach would be the best design, but I'm not clear how to setup the design and comparisons with the baseline sample. Would I use the baseline group as an intercept and compare all treatments to baseline? How would I also include comparisons between each treatment that also takes into account the baseline?
OK, thanks. This was my first thought, but the PI mentioned "removing the baseline" from all samples before comparing each treatment. I'm not sure if this is logical or even a parameter that can be applied.
You could do that, but algebraically it's the same as not doing it. In other words
reduces to
That's essentially what you would get from fitting a model using a factor with control as baseline
Then all the coefficients (except for the first) are the difference between a treatment and the baseline, and any contrasts will be like my first example, above. But as I said, those reduce to just the difference between treatments, and you will get the exact same results that you would get if you followed Ryan's advice and made direct comparisons between treatments.
Thanks very much.
I just created a design matrix and used glmFit. I noticed that my baseline group, "T0", is now appearing as the last column instead of the first in the design matrix. I assume, as you mentioned, because it is arranged alphabetically.
Does this mean that the fitted model is using the first column (which is a treatment group) as a control for comparison between the other treatments? If so, how do I correct this?
Yes. The column with all 1s is the baseline. You can correct that by either using
relevel
on your group factor, or by specifying the 'levels' argument when you generate the factors. SeeI corrected the baseline reference group using the relevel function, but I'm a little confused as to whether I use the GLM approach where 0+ is included or where it is omitted. I have 5 groups, 4 treatments and 1 baseline (control). Thoughts?
This was answered in the original two responses in this thread. Ryan recommended using a cell means model (~ 0 + group) and I told you that it doesn't matter and explained algebraically why that was.
The only reason to use one over the other is to simplify your life, not because you will get different results.