Question: edgeR RNA-seq analysis using glm
gravatar for es874
2.1 years ago by
es87410 wrote:

I have 30 paired-end samples with 6 replicates in 5 groups (4 treatments + 1 baseline control). I am trying to decide how to setup the DE analysis. The glm approach would be the best design, but I'm not clear how to setup the design and comparisons with the baseline sample. Would I use the baseline group as an intercept and compare all treatments to baseline? How would I also include comparisons between each treatment that also takes into account the baseline?


ADD COMMENTlink modified 2.1 years ago by Ryan C. Thompson7.0k • written 2.1 years ago by es87410
gravatar for Ryan C. Thompson
2.1 years ago by
The Scripps Research Institute, La Jolla, CA
Ryan C. Thompson7.0k wrote:

Unless you have a reason not to, I would follow the advice in the User's Guide and use a design of ~0 + group, which will give you 5 coefficients, each representing the mean expression of one of the 5 treatment groups. You are then free to form any contrasts you like between groups.

ADD COMMENTlink written 2.1 years ago by Ryan C. Thompson7.0k

OK, thanks. This was my first thought, but the PI mentioned "removing the baseline" from all samples before comparing each treatment. I'm not sure if this is logical or even a parameter that can be applied.

ADD REPLYlink written 2.1 years ago by es87410

You could do that, but algebraically it's the same as not doing it. In other words

(treatment_1 - baseline) - (treatment_2 - baseline)

reduces to

treatment_1 - treatment_2

That's essentially what you would get from fitting a model using a factor with control as baseline

group <- factor(<groups go here>) 
## assuming control comes first, alphabetically
design <- model.matrix(~ group)

Then all the coefficients (except for the first) are the difference between a treatment and the baseline, and any contrasts will be like my first example, above. But as I said, those reduce to just the difference between treatments, and you will get the exact same results that you would get if you followed Ryan's advice and made direct comparisons between treatments.


ADD REPLYlink written 2.1 years ago by James W. MacDonald48k

Thanks very much.

I just created a design matrix and used glmFit. I noticed that my baseline group, "T0", is now appearing as the last column instead of the first in the design matrix. I assume, as you mentioned, because it is arranged alphabetically.

Does this mean that the fitted model is using the first column (which is a treatment group) as a control for comparison between the other treatments? If so, how do I correct this?

ADD REPLYlink written 2.1 years ago by es87410

Yes. The column with all 1s is the baseline. You can correct that by either using relevel on your group factor, or by specifying the 'levels' argument when you generate the factors. See

ADD REPLYlink written 2.1 years ago by James W. MacDonald48k

I corrected the baseline reference group using the relevel function, but I'm a little confused as to whether I use the GLM approach where 0+ is included or where it is omitted. I have 5 groups, 4 treatments and 1 baseline (control). Thoughts?

ADD REPLYlink modified 2.1 years ago • written 2.1 years ago by es87410

This was answered in the original two responses in this thread. Ryan recommended using a cell means model (~ 0 + group) and I told you that it doesn't matter and explained algebraically why that was.

The only reason to use one over the other is to simplify your life, not because you will get different results.

ADD REPLYlink written 2.1 years ago by James W. MacDonald48k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 438 users visited in the last hour