Question

Multi-level comparisons with paired data: duplicateCorrelation or not?

1

Entering edit mode

Lluís Revilla Sancho ▴ 760

@lluis-revilla-sancho

Last seen 21 days ago

European Union

I have a dataset of 120 RNAseq files. These files are the results of an experiment where samples from 4 different people where given 30 different treatments (including the control without any stimulus ). This samples have been treated with both cytokines (5 different and some combinations of them: "INFg", "TNFa", "IL1b", "FLA", "INFg+TNFa", "INFg+PBS", "TNFa+PBS", "IL1b+PBS", "FLA+PBS", "INFg+TNFa+PBS", "PBS") and supernatant from culture of different bacteria (three factors: "PBSth", "PBNissle", "PBK12").

Here is the description of the data:

meta <-  structure(list(sample_name = c("1_602", "2_602", "3_602", 
"4_602", "5_602", "6_602", "7_602", "8_602", "9_602", "10_602", 
"1_604", "2_604", "3_604", "4_604", "5_604", "6_604", "7_604", 
"8_604", "9_604", "10_604"), SAMPLE = c(602, 602, 602, 602, 602, 
602, 602, 602, 602, 602, 604, 604, 604, 604, 604, 604, 604, 604, 
604, 604), PB = c(NA, NA, NA, NA, NA, "PBSth", "PBSth", "PBSth", 
"PBSth", "PBSth", NA, NA, NA, NA, NA, "PBSth", "PBSth", "PBSth", 
"PBSth", "PBSth"), stimulus = c("INFg", "TNFa", "IL1b", "FLA", 
"INFg+TNFa", "INFg", "TNFa", "IL1b", "FLA", "INFg+TNFa", "INFg", 
"TNFa", "IL1b", "FLA", "INFg+TNFa", "INFg", "TNFa", "IL1b", "FLA", 
"INFg+TNFa")), row.names = c(NA, -20L), class = "data.frame")
meta
##    sample_name SAMPLE    PB   stimulus
## 1        1_602    602  <NA>      INFg
## 2        2_602    602  <NA>      TNFa
## 3        3_602    602  <NA>      IL1b
## 4        4_602    602  <NA>       FLA
## 5        5_602    602  <NA> INFg+TNFa
## 6        6_602    602 PBSth      INFg
## 7        7_602    602 PBSth      TNFa
## 8        8_602    602 PBSth      IL1b
## 9        9_602    602 PBSth       FLA
## 10      10_602    602 PBSth INFg+TNFa
## 11       1_604    604  <NA>      INFg
## 12       2_604    604  <NA>      TNFa
## 13       3_604    604  <NA>      IL1b
## 14       4_604    604  <NA>       FLA
## 15       5_604    604  <NA> INFg+TNFa
## 16       6_604    604 PBSth      INFg
## 17       7_604    604 PBSth      TNFa
## 18       8_604    604 PBSth      IL1b
## 19       9_604    604 PBSth       FLA
## 20      10_604    604 PBSth INFg+TNFa

I have omitted some conditions and samples for brevity, but it is blocked in the sense that all the 4 samples have all the 30 conditions and in total 120 sample_names : 1_602, 2_602, 3_602, ..., 30_602, 1_603, 2_603, ..., 30_603, 1_604, ..., 30_604. No missing factor or a factor only present in one SAMPLE. We want to know the effect of the PB, the effect of the cytokines and their effect of their interaction. Till now I've been asked to do 43 different comparisons.

My question is about how to take into account the effect of originally having only 4 samples.

My current approach is using:

condition <- as.factor(paste0(meta$PB, meta$stimulus))
mm_complex2 <- model.matrix(~0+condition)
corfit <- duplicateCorrelation(cpm, design = mm_complex2, block = meta$SAMPLE)
fit_c2 <- lmFit(expr, design = mm_complex2, block = meta$SAMPLE, correlation = corfit$consensus.correlation)
....

This is following section section 9.5 about explicit factors and section 9.7 of limma vignette about pairing on multi-level experiments.

But I've received comments to use this instead (which according to the biologists the results make more sense to them):

mm_complex3 <- cbind(mm_complex2. paired = meta$SAMPLE)
fit_c3 <- lmFit(expr, design = mm_complex2)
...

Which strategy is better?

I've read this post about some nuances between having more samples available for contrasts using duplicateCorrelation and blocking them to just make the comparisons about the specific samples that meet the condition, and given that experimentalist like more the second approach I am no longer sure which method is best.

TL;DR: How to take into account paired samples on multi-level data?

limma • 1.0k views

ADD COMMENT • link 3.7 years ago Lluís Revilla Sancho ▴ 760

score 2 · Accepted Answer · 2021-03-25

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 28 minutes ago

WEHI, Melbourne, Australia

This sounds like a well-designed paired-experiment (also called a repeated measures experiment). So take the same approach as for all paired experiments:

design <- model.matrix(~condition + sample)

Your experiment is not multi-level because you don't wish to compare people. All the comparisons are within people, i.e., there is just one error level.

You would only use duplicateCorrelation if you needed to recover information from samples with missing conditions.

ADD COMMENT • link 3.7 years ago Gordon Smyth 52k

0

Entering edit mode

I was worried that as there are some condition that combine different cytokines (INFg+TNFa) it would be multi-level. Thanks for clarifying when to use duplicateCorrelation and your fast reply.

ADD REPLY • link 3.7 years ago Lluís Revilla Sancho ▴ 760