Question

Removing batch effect on edgeR when interactons are looked at

0

Entering edit mode

David R ▴ 90

@david-rengel-6321

Last seen 14 months ago

European Union

Hi,

I am running some RNASeq analysis with two experimental factors, namely time (two-level factor) and genotype (three-level factor). In cases like this, I like using edgeR cause, among other reasons, it makeContrasts feature is very handy. Thus, I would create a design factor combining all levels of time and genotype. This would allow me to easily look, for instance, for interactions between Genotype 2 and 1 over time by doing. Like this:

design.factor <- rep (c("Gtype1.Time1","Gtype1.Time2", "Gtype2.Time3","Gtype2.Time2", "Gtype3.Time1","Gtype3.Time2"), each=4)

design.1 <- model.matrix(~0+design.factor)

colnames(design.1)=sub("design.factor","",colnames(design.1))

my.contrasts=makeContrasts(

Inter =  (Gtype2.Time2  – Gtype2.Time1)  –  (Gtype1.Time2  - Gtype1.Time1),

Levels=design.1)

However, in this particular experiment there is a strong replicate effect. Biological replicates where carried out over time in four independent experiments. I would like to correct this effect, but this is where things get a little bit complicated when I run the code here below:

reps <- rep(c("R1","R2","R3","R4"),6)

design.factor <- rep (c("Gtype1.Time1","Gtype1.Time2", "Gtype2.Time3","Gtype2.Time2", "Gtype3.Time1","Gtype3.Time2"), each=4)

design.2 <- model.matrix(~0+reps+design.factor)

colnames(design.2)=sub("reps|design.factor","",colnames(design.2))

colnames(design.1)

[1] “R1” “R2” “R3” “R4” “

colnames(design.1)

[1] "R1”  "R2"  "R3"  "R4"  "Gtype1.Time2" "Gty

pe2.Time2" "Gtype2.Time3"  "Gtype3.Time1" "Gtype3.Time2"

As you cans see, my “Gtype1.Time1” level has disappeared due, I guess, to the way the degrees of freedom have been handled. Then my questions are:

1-Does all that mean that, for instance, the "Gtype1.Time2" coefficient will be in fact the difference "Gtype1.Time2" - "Gtype1.Time1".

2-Can I run interaction-type contrasts as described above? An dif this is the case, how?

3-Should I try tools such as SVASeq instead?

I thank you in advance for any help on this.

Best,

David

rnaseq edger statistics model matrix • 1.8k views

ADD COMMENT • link updated 8.4 years ago by Steve Lianoglou ★ 13k • written 8.4 years ago by David R ▴ 90

score 2 · Answer 1 · 2017-10-02

2

Entering edit mode

Aaron Lun ★ 29k

@alun

Last seen 2 hours ago

The city by the bay

You can do everything you want to do by applying a bit of arithmetic to the terms in design.2. However, the simplest solution is just to set up your design matrix in a friendlier manner:

design.3 <- model.matrix(~0 + design.factor + reps)
colnames(design.3) <- sub("reps|design.factor","",colnames(design.3))

Printing colnames(design.3) gives me:

[1] "Gtype1.Time1" "Gtype1.Time2" "Gtype2.Time2" "Gtype2.Time3" "Gtype3.Time1"
[6] "Gtype3.Time2" "R2"           "R3"           "R4"

... where the first 6 coefficients represent the fitted expression for each group in R1. The last three coefficients represent the log-fold change of R2/3/4 over R1. You can then perform comparisons between groups as you did for design.1.

ADD COMMENT • link 8.4 years ago Aaron Lun ★ 29k

0

Entering edit mode

Hi Aaron,

Thanks a lot for your helpful reply, I had not thought about this way out. Let me ask: do you think I am "losing" any power from the fact that I would use only R1-coefficients or this is not the case cause the contrasts are going to be made on the fitted values along al 4 replicates.

Thaks again,

David.

ADD REPLY • link 8.4 years ago David R ▴ 90

0

Entering edit mode

No, there is no loss of power, information is used from all four replicates. It's an additive model, so there is an implicit assumption that the differences between groups for R1 are the same for R2, etc.

ADD REPLY • link 8.4 years ago Aaron Lun ★ 29k

0

Entering edit mode

Thanks a lot Aaron,

Best wishes,

David

ADD REPLY • link 8.4 years ago David R ▴ 90