Question

edgeR RNA-seq analysis using glm

es874 ▴ 20

I have 30 paired-end samples: 6 replicates in each of 5 groups (4 treatments + 1 baseline control). I am trying to decide how to set up the DE analysis. The GLM approach would be the best choice, but I'm not clear on how to set up the design and the comparisons with the baseline group. Would I use the baseline group as the intercept and compare all treatments to baseline? How would I also include comparisons between treatments that take the baseline into account?

edger rna-seq • 1.7k views

updated 8.2 years ago by Ryan C. Thompson ★ 7.9k • written 8.2 years ago by es874 ▴ 20

score 3 · Accepted Answer · 2016-11-07

Ryan C. Thompson ★ 7.9k

Icahn School of Medicine at Mount Sinai…

Unless you have a reason not to, I would follow the advice in the User's Guide and use a design of ~0 + group, which will give you 5 coefficients, each representing the mean expression of one of the 5 treatment groups. You are then free to form any contrasts you like between groups.
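As a rough illustration of that design, here is a minimal base-R sketch. The group labels (T0 for the baseline, A to D for the treatments) and the helper make_contrast are hypothetical, introduced only for this example. Each contrast is simply a numeric vector with one entry per design column.

```r
# Hypothetical labels: "T0" is the baseline, "A".."D" are the four treatments,
# with 6 replicates per group (30 samples in total).
group <- factor(rep(c("T0", "A", "B", "C", "D"), each = 6),
                levels = c("T0", "A", "B", "C", "D"))

# Cell-means design: one column per group, no intercept.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Illustrative helper (not part of edgeR): contrast vector for "a minus b".
make_contrast <- function(a, b, design) {
  v <- setNames(numeric(ncol(design)), colnames(design))
  v[a] <- 1
  v[b] <- -1
  v
}

A_vs_T0 <- make_contrast("A", "T0", design)  # treatment A vs baseline
A_vs_B  <- make_contrast("A", "B", design)   # treatment A vs treatment B

print(A_vs_T0)
print(A_vs_B)
```

Assuming a model has already been fitted with glmFit using this design (call it fit, a hypothetical name), a vector like A_vs_B could then be passed to glmLRT through its contrast argument.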

written 8.2 years ago by Ryan C. Thompson ★ 7.9k

OK, thanks. This was my first thought, but the PI mentioned "removing the baseline" from all samples before comparing each treatment. I'm not sure if this is logical or even a parameter that can be applied.

written 8.2 years ago by es874 ▴ 20

You could do that, but algebraically it's the same as not doing it. In other words,

(treatment_1 - baseline) - (treatment_2 - baseline)

reduces to

treatment_1 - treatment_2

That's essentially what you would get from fitting a model using a factor with control as the baseline:

## 'groups' stands for your own vector of group labels, one per sample
group <- factor(groups)
## assuming control comes first, alphabetically
design <- model.matrix(~ group)

Then all the coefficients (except for the first) are the difference between a treatment and the baseline, and any contrasts will be like my first example, above. But as I said, those reduce to just the difference between treatments, and you will get the exact same results that you would get if you followed Ryan's advice and made direct comparisons between treatments.
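The equivalence can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a base-R Poisson GLM as a stand-in for edgeR's negative binomial fit, simulated counts for a single gene, and hypothetical group labels. It fits both parameterizations and compares the trt1 vs trt2 difference from each.

```r
set.seed(1)

# Hypothetical labels; "control" sorts first alphabetically,
# so it becomes the reference level.
group <- factor(rep(c("control", "trt1", "trt2", "trt3", "trt4"), each = 6))

# Simulated counts for a single gene (illustrative data only).
mu <- c(control = 50, trt1 = 100, trt2 = 80, trt3 = 50, trt4 = 60)
y <- rpois(length(group), lambda = mu[as.character(group)])

# Intercept model: coefficients are (treatment - control) on the log scale.
fit_ref <- glm(y ~ group, family = poisson)
# Cell-means model: coefficients are the log group means.
fit_means <- glm(y ~ 0 + group, family = poisson)

# trt1 vs trt2 from the intercept model:
# (trt1 - control) - (trt2 - control)
d_ref <- unname(coef(fit_ref)["grouptrt1"] - coef(fit_ref)["grouptrt2"])
# trt1 vs trt2 from the cell-means model: trt1 - trt2
d_means <- unname(coef(fit_means)["grouptrt1"] - coef(fit_means)["grouptrt2"])

print(c(intercept_model = d_ref, cell_means_model = d_means))
print(all.equal(d_ref, d_means, tolerance = 1e-6))
```

The two designs span the same column space, so the fitted group means, and therefore any difference between them, agree up to numerical tolerance.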

written 8.2 years ago by James W. MacDonald 67k

Thanks very much.

I just created a design matrix and used glmFit. I noticed that my baseline group, "T0", now appears as the last column of the design matrix instead of the first. I assume this is because, as you mentioned, the levels are ordered alphabetically.

Does this mean that the fitted model is using the first column (which is a treatment group) as a control for comparison between the other treatments? If so, how do I correct this?

written 8.2 years ago by es874 ▴ 20

Yes. The column with all 1s (the intercept) is the baseline, and it corresponds to the first level of your group factor. You can correct that either by using relevel on your group factor, or by specifying the 'levels' argument when you generate the factor. See

?relevel
?factor
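A short sketch of both options, assuming hypothetical treatment labels A to D that sort before the baseline "T0":

```r
# Hypothetical labels: treatments "A".."D" sort before the baseline "T0".
group_labels <- rep(c("A", "B", "C", "D", "T0"), each = 6)

# Default: levels are sorted alphabetically, so "A" becomes the reference.
group <- factor(group_labels)
print(levels(group))

# Option 1: move "T0" to the front of an existing factor.
group_relevel <- relevel(group, ref = "T0")

# Option 2: set the level order when the factor is created.
group_levels <- factor(group_labels, levels = c("T0", "A", "B", "C", "D"))

# Either way, the intercept now represents T0 and the remaining
# columns are treatment-minus-T0 differences.
print(colnames(model.matrix(~ group_relevel)))
```

Both options give the same level order; relevel is convenient when the factor already exists, while the levels argument makes the intended order explicit at creation time.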

written 8.2 years ago by James W. MacDonald 67k

I corrected the baseline reference group using the relevel function, but I'm a little confused about whether to use the GLM design with 0+ included or with it omitted. I have 5 groups: 4 treatments and 1 baseline (control). Thoughts?

written 8.2 years ago by es874 ▴ 20

This was answered in the original two responses in this thread. Ryan recommended using a cell means model (~ 0 + group) and I told you that it doesn't matter and explained algebraically why that was.

The only reason to use one over the other is to simplify your life, not because you will get different results.

written 8.2 years ago by James W. MacDonald 67k