edgeR | paired samples | multiple baseline | averaging baseline or not
1
0
Entering edit mode
gatk • 0
@gatk-9898
Last seen 8.5 years ago

We have a multi-factor RNA-Seq experiment with multiple baseline measures that we would like to model using edgeR:

  • 2x pre-treatment samples per mouse [time=0]
  • 1x post-treatment sample per mouse [time=1]
  • 2x treatment groups (all mice are treated at post-treatment)

The goal is to evaluate the treatment effect between groups adjusted for baseline (see full R code below). Note, all mice are untreated at baseline. We identified the following two approaches to model this in edgeR:

Approach 1: use each of the two pre-treatment samples per mouse as part of the model

Model: mouse+time+time*treatment

  • 2x pre-treatment samples per mouse [time=0]
  • 1x post-treatment sample per mouse [time=1]
  • 2x treatment groups (all mice are treated at post-treatment)

Approach 2: calculate the average counts of the two pre-treatment samples per mouse prior to modeling and use it as single baseline value

Model: mouse+time+time*treatment 

  • 1x one pre-treatment sample [average] per mouse [time=0]
  • 1x post-treatment sample per mouse [time=1]
  • 2x treatment groups (all mice are treated at post-treatment)

Which of the two approaches do you recommend for estimating the treatment effect for this experiment using edgeR?

Full R code :

library(edgeR);

y = matrix(rnbinom(12000,mu=10,size=10),ncol=12);

time = as.factor(rep(c(0,0,1),4));

treatment = as.factor(rep(c(1,2),2,each=3));

mouse = as.factor(rep(c(1:4),each=3));

d = DGEList(counts=y,group=treatment);

design = model.matrix(~mouse+time+time*treatment )[,-6];

d=estimateGLMCommonDisp(d,design)

d=estimateGLMTrendedDisp(d,design)

d=estimateGLMTagwiseDisp(d,design)

d.fit = glmFit(d,design);

d.lrt = glmLRT(d.fit,'time1:treatment2')

edger rnaseq baseline linear model negative binomial model • 1.9k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 2 hours ago
WEHI, Melbourne, Australia

Dear gatk,

I am not quite following what you are worried about.

You seem to have already completed a perfectly satisfactory analysis, as shown by your R code. Personally I would have setup the design matrix in, what seems to me to be, a more transparent fashion:

Treat1vsBaseline <- Treat2vsBaseline <- rep(0,12)
Treat1vsBaseline[time==1 & treatment==1] <- 1
Treat2vsBaseline[time==1 & treatment==2] <- 1
design <- model.matrix(~mouse+Treat1vsBaseline+Treat2vsBaseline)
d=estimateGLMCommonDisp(d,design)
d=estimateGLMTrendedDisp(d,design)
d=estimateGLMTagwiseDisp(d,design)
fit = glmFit(d,design)
lrt <- glmLRT(fit, contrast=c(0,0,0,0,-1,1))
topTags(lrt)

In my approach, we estimate the two treatment effects, each corrected for its baseline level, and then test for a difference between the two. However my code will give exactly the same DE results as yours.

What I don't see is why you are considering discarding this analysis and instead doing some complicated fudge after averaging the counts of two pre-treatment samples. Why would you want to average the samples? Averaging counts is obviously wrong, because edgeR analyses counts and the average of two counts is not a count. Averaging also throws out information unnecessarily. Do you think that edgeR can't figure out for itself that there are two samples or can't compute averages as appropriate as part of the analysis?

Are you worried about correlations between repeated samples from the same mouse? (You ask about that in offline emails to me, although you didn't mention this concern in your post here.) Including mouse as an additive term in the model fully accounts for any such correlation. And, anyway, correlations exist between all the samples on the same mouse, not just the time==0 samples, so averaging wouldn't help.

ADD COMMENT
0
Entering edit mode

Thank you Gordon for sharing your solution and clarifying that combining pre-treatment samples from the same mouse is neither necessary nor optimal using edgeR. 

My main question was which model would be preferred to account for correlations between multiple pre-treatment samples from the same mouse. You fully answered this. Thank you.

ADD REPLY

Login before adding your answer.

Traffic: 456 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6