Paired RNA-seq experiment with 3 factors using duplicateCorrelation
aec ▴ 90

Dear all,

I have a complex design for limma-voom with paired samples and 3 factors:

ID  INDIVIDUAL  TREATMENT  GROUP  RESPONSE
1   1           Pre        A      Good
2   1           Post       A      Good
3   2           Pre        A      Good
4   2           Post       A      Good
5   3           Pre        A      Good
6   3           Post       A      Good
7   4           Pre        B      Good
8   4           Post       B      Good
9   5           Pre        B      Bad
10  5           Post       B      Bad
11  6           Pre        A      Bad
12  6           Post       A      Bad
13  7           Pre        A      Good
14  7           Post       A      Good
15  8           Pre        A      Bad
16  8           Post       A      Bad
17  9           Pre        B      Good
18  9           Post       B      Good
19  10          Pre        A      Good
20  10          Post       A      Good

I need both within- and between-patient comparisons, so I am using limma's duplicateCorrelation function.

The problem is that I am a bit confused about how to extract the coefficients for these multiple comparisons:

1.a) Pre : Group A vs Group B

1.b) Post: Group A vs Group B

2.a) Group A : Pre vs Post

2.b) Group B: Pre vs Post

3.a) Pre, Group A : Good vs Bad

3.b) Pre, Group B : Good vs Bad

4.a) Post, Group A: Good vs Bad

4.b) Post, Group B: Good vs Bad

5.a) Good, Group A: Pre vs Post

5.b) Good, Group B: Pre vs Post

6.a) Bad, Group A: Pre vs Post

6.b) Bad, Group B: Pre vs Post

Here is the code:

# sample annotation factors from the info table
TREATMENT = as.factor(info$TREATMENT)
GROUP = as.factor(info$GROUP)
RESPONSE = as.factor(info$RESPONSE)
INDIVIDUAL = as.factor(info$INDIVIDUAL)

# additive design without an intercept
design = model.matrix(~0 + TREATMENT + GROUP + RESPONSE)

v = voom(y, design)
cor = duplicateCorrelation(v, design, block=INDIVIDUAL)
fit = lmFit(v, design, block=INDIVIDUAL, correlation=cor$consensus)

How can I extract the desired coefficients with this model?

Aaron Lun ★ 28k

To do all of the comparisons you've described, there's one simple solution:

EVERYTHING <- factor(paste0(TREATMENT, ".", GROUP, ".", RESPONSE))
design <- model.matrix(~0 + EVERYTHING)
colnames(design) <- levels(EVERYTHING)

... and then run the same limma commands as you've described. Each coefficient in this design represents the average expression of a particular combination of levels. Then you just compare between specific combinations using makeContrasts, which covers comparisons 3a to 6b. For 1a, you can compare the average expression across all pre/group A combinations to that across all pre/group B combinations - see the last contrast in Section 3.2.3 of the edgeR user's guide for an example. The same reasoning applies to the other comparisons up to 2b.
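
For example, a minimal sketch of comparisons 1a and 3a under this design (the level names Pre.A.Good and so on are assumptions based on the paste0() call above; substitute your actual colnames(design)):

# sketch only: level names assumed to follow the paste0(TREATMENT, ".", GROUP, ".", RESPONSE) pattern
con <- makeContrasts(
    PreAvsB = (Pre.A.Good + Pre.A.Bad)/2 - (Pre.B.Good + Pre.B.Bad)/2,  # comparison 1a
    PreAGoodvsBad = Pre.A.Good - Pre.A.Bad,                             # comparison 3a
    levels = design)
fit2 <- contrasts.fit(fit, con)   # 'fit' is the lmFit() result with the duplicateCorrelation settings
fit2 <- eBayes(fit2)
topTable(fit2, coef = "PreAvsB")  # top genes for comparison 1a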


As always, thanks for your advice, Aaron.

Following your answer, here are my contrasts:

design = model.matrix(~0 + condition)   # condition is the combined TREATMENT.GROUP.RESPONSE factor

colnames(design)
# [1] "Post.GroupA.Bad"  "Post.GroupA.Good"
# [3] "Post.GroupB.Bad"  "Post.GroupB.Good"
# [5] "Pre.GroupA.Bad"   "Pre.GroupA.Good"
# [7] "Pre.GroupB.Bad"   "Pre.GroupB.Good"

v = voom(y, design)
cor = duplicateCorrelation(v, design, block=info$INDIVIDUAL)
fit = lmFit(v, design, block=info$INDIVIDUAL, correlation=cor$consensus)

cm = makeContrasts(
    GroupBvsGroupAforPRE = ((Pre.GroupB.Bad + Pre.GroupB.Good)/2) - ((Pre.GroupA.Bad + Pre.GroupA.Good)/2),
    GroupBvsGroupAforPOST = ((Post.GroupB.Bad + Post.GroupB.Good)/2) - ((Post.GroupA.Bad + Post.GroupA.Good)/2),
    PostvsPreforGroupA = ((Post.GroupA.Bad + Post.GroupA.Good)/2) - ((Pre.GroupA.Bad + Pre.GroupA.Good)/2),
    PostvsPreforGroupB = ((Post.GroupB.Bad + Post.GroupB.Good)/2) - ((Pre.GroupB.Bad + Pre.GroupB.Good)/2),
    BadvsGoodforPreGroupA = Pre.GroupA.Bad - Pre.GroupA.Good,
    BadvsGoodforPreGroupB = Pre.GroupB.Bad - Pre.GroupB.Good,
    BadvsGoodforPostGroupA = Post.GroupA.Bad - Post.GroupA.Good,
    BadvsGoodforPostGroupB = Post.GroupB.Bad - Post.GroupB.Good,
    PostvsPreforGoodGroupA = Post.GroupA.Good - Pre.GroupA.Good,
    PostvsPreforGoodGroupB = Post.GroupB.Good - Pre.GroupB.Good,
    PostvsPreforBadGroupA = Post.GroupA.Bad - Pre.GroupA.Bad,
    PostvsPreforBadGroupB = Post.GroupB.Bad - Pre.GroupB.Bad,
    levels = design)
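
Downstream, I would presumably apply these contrasts along these lines (a sketch; decideTests() is used here with its default thresholds):

fit2 = contrasts.fit(fit, cm)
fit2 = eBayes(fit2)
summary(decideTests(fit2))   # per-contrast counts of down (-1), not significant (0) and up (1) genes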

Are they correct?


Looks fine to me.


Even so, it is hard to interpret the results:

                          -1      0     1
GroupBvsGroupAforPRE       0  25911     0
GroupBvsGroupAforPOST      0  25909     2
PostvsPreforGroupA         0  25911     0
PostvsPreforGroupB         0  25909     2
PostvsPre                  0  25910     1
PostvsPreforGood         325  25463   123
PostvsPreforBad            0  25911     0
BadvsGoodforPre           65  25828    18
BadvsGoodforPost           0  25911     0
BadvsGoodforPreGroupA      1  25910     0
BadvsGoodforPreGroupB      2  25905     4
BadvsGoodforPostGroupA     0  25911     0
BadvsGoodforPostGroupB     0  25911     0
PostvsPreforGoodGroupA     0  25911     0
PostvsPreforGoodGroupB   952  24266   693
PostvsPreforBadGroupA      0  25911     0
PostvsPreforBadGroupB      0  25911     0

The DE genes found in PostvsPreforGood overlap with those of PostvsPreforGoodGroupB, so all the genes found in PostvsPreforGood appear to be driven by Group B (for PostvsPreforGoodGroupA we have 0 DE genes).

Moreover, how can it be that we have over 1,600 DE genes for PostvsPreforGoodGroupB but almost nothing for the rest of the comparisons? I would expect to find something, for example, for BadvsGoodforPostGroupB.


All "post versus pre" comparisons are more powerful than the "bad vs good" comparisons, because the former are performed within each patient. While duplicateCorrelation allows limma to perform cross-patient comparisons, it needs to account for the patient-to-patient variability, which results in reduced power. So there may well be a lot of (true) DE genes in your BadvsGoodforPostGroupB comparison; however, your experimental set-up doesn't have enough power to detect them. Similarly, it's hard to compare the number of DE genes from the comparisons between averages with that from the comparisons between groups, because there's a different number of libraries involved - this will inevitably affect the power of each comparison.

P.S. If you're interested in identifying genes where the treatment response differs between good/bad patients, you could do something like this:

con <- makeContrasts((Post.GroupB.Good - Pre.GroupB.Good) -
    (Post.GroupB.Bad - Pre.GroupB.Bad), levels=design)

This computes the treatment log-fold change within patients, and then compares those log-fold changes between good/bad patients. In this manner, we avoid making direct comparisons between patients, which are what cause the power issue I mentioned above.
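
A short sketch of how this contrast might then be tested (assuming 'fit' is the lmFit() object fitted with the duplicateCorrelation settings above):

fit.int <- contrasts.fit(fit, con)
fit.int <- eBayes(fit.int)
topTable(fit.int, coef = 1)   # genes whose Pre-to-Post change differs between Good and Bad responders in group B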
