Search
Question: edgeR RNA-seq analysis using glm
0
22 months ago by
es87410
es87410 wrote:

I have 30 paired-end samples with 6 replicates in 5 groups (4 treatments + 1 baseline control). I am trying to decide how to setup the DE analysis. The glm approach would be the best design, but I'm not clear how to setup the design and comparisons with the baseline sample. Would I use the baseline group as an intercept and compare all treatments to baseline? How would I also include comparisons between each treatment that also takes into account the baseline?

modified 22 months ago by Ryan C. Thompson6.8k • written 22 months ago by es87410
3
22 months ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson6.8k wrote:

Unless you have a reason not to, I would follow the advice in the User's Guide and use a design of ~0 + group, which will give you 5 coefficients, each representing the mean expression of one of the 5 treatment groups. You are then free to form any contrasts you like between groups.

OK, thanks. This was my first thought, but the PI mentioned "removing the baseline" from all samples before comparing each treatment. I'm not sure if this is logical or even a parameter that can be applied.

You could do that, but algebraically it's the same as not doing it. In other words

(treatment_1 - baseline) - (treatment_2 - baseline)

reduces to

treatment_1 - treatment_2

That's essentially what you would get from fitting a model using a factor with control as baseline

group <- factor(<groups go here>)
## assuming control comes first, alphabetically
design <- model.matrix(~ group)

Then all the coefficients (except for the first) are the difference between a treatment and the baseline, and any contrasts will be like my first example, above. But as I said, those reduce to just the difference between treatments, and you will get the exact same results that you would get if you followed Ryan's advice and made direct comparisons between treatments.

Thanks very much.

I just created a design matrix and used glmFit. I noticed that my baseline group, "T0", is now appearing as the last column instead of the first in the design matrix. I assume, as you mentioned, because it is arranged alphabetically.

Does this mean that the fitted model is using the first column (which is a treatment group) as a control for comparison between the other treatments? If so, how do I correct this?

Yes. The column with all 1s is the baseline. You can correct that by either using relevel on your group factor, or by specifying the 'levels' argument when you generate the factors. See

?relevel
?factor

I corrected the baseline reference group using the relevel function, but I'm a little confused as to whether I use the GLM approach where 0+ is included or where it is omitted. I have 5 groups, 4 treatments and 1 baseline (control). Thoughts?

This was answered in the original two responses in this thread. Ryan recommended using a cell means model (~ 0 + group) and I told you that it doesn't matter and explained algebraically why that was.

The only reason to use one over the other is to simplify your life, not because you will get different results.