Question

Limma time course analysis

0

Entering edit mode

Katburton86 • 0

@katburton86-11314

Last seen 8.6 years ago

Hi, I am analysing an RNAseq dataset for which I have two kinetic, postprandial responses that I wish to compare- milk vs yoghurt (time 0, 2h, 4h, 6h). I have 7 repetitions for each condition and time as the experiment was completed with a cross-over design. I have used Limma with an adapted version of the time course model detailed in the Limma Vignette to assess whether there is a response for either condition (milk or yoghurt response versus 0) separately and then between the two conditions (please see my code below). I have three questions:

1. Is it correct to control for the volunteer variable using the blocking factor 'Subject' or is there a better way to exploit the paired nature of the study.

2. With the current coding are the time differences calculated separately for each individual before an assessment of the mean difference i.e mean(T120subject x-T0subjectx) or is the difference assessed by the mean T120-mean T0. (It should be the first option but am not sure whether the blocking is adequate to ensure this).

3. How is each contrast assessed? Independantly or is there any consideration of the fact that the time differences are part of a continuous series (ie. will three small differences show an overall difference even if separately there is no significant difference?)

cols<-c("Day5_dairy","Time5")
Postprandial$MY=do.call(paste, c(Postprandial[cols],sep='.'))

lev <- c("Milk.T0","Milk.T120","Milk.T240","Milk.T360","Yogurt.T0","Yogurt.T120","Yogurt.T240","Yogurt.T360")
f <- factor(Postprandial$MY, levels=lev)
design <- model.matrix(~0+f)
colnames(design) <- lev
Subject=Postprandial$Subject.ID
Counts=counts_pp20

corfit= duplicateCorrelation(Counts,design,block=Subject)
fit <- lmFit(Counts, design,block=Subject, correlation=corfit$consensus)


##########Analysis Milk

cont.M <- makeContrasts("Milk.T120-Milk.T0","Milk.T240-Milk.T120","Milk.T360-Milk.T240", levels=design)
fit2 <- contrasts.fit(fit, cont.M)
fit3 <- eBayes(fit2)
ttMilk=topTableF(fit3, number=nrow(Counts),adjust="fdr", genelist=rownames(Counts))

#Comparison yoghurt versus 0
###Raw counts used without T0 deducted to correct individual baseline
cont.Y <- makeContrasts("Yogurt.T120-Yogurt.T0","Yogurt.T240-Yogurt.T120","Yogurt.T360-Yogurt.T240", levels=design)
fit2 <- contrasts.fit(fit, cont.Y)
fit2 <- eBayes(fit2)
ttYog=topTableF(fit2, number=nrow(Counts),adjust="fdr", genelist=rownames(Counts))

#Use time course approach to assess the differences between milk and yoghurt

cont.dif <- makeContrasts(
  Dif120hr =(Yogurt.T120-Yogurt.T0)-(Milk.T120-Milk.T0),
  Dif240hr =(Yogurt.T240-Yogurt.T120)-(Milk.T240-Milk.T120),
  Dif360hr =(Yogurt.T360-Yogurt.T240)-(Milk.T360-Milk.T240),
  levels=design)
fit2 <- contrasts.fit(fit, cont.dif)
fit2 <- eBayes(fit2)
ttMY=topTableF(fit2, number=nrow(Counts),adjust="fdr", genelist=rownames(Counts))

With many thanks,

Kathryn

limma • 955 views

ADD COMMENT • link 8.6 years ago Katburton86 • 0

score 2 · Answer 1 · 2016-08-17

Blocking on subject with duplicateCorrelation is fine. The alternative is to include Subject in the design matrix. The former is more powerful but depends on the subject-specific effect behaving properly (e.g., homogeneously distributed effects without systematic differences or outliers, sufficient subjects for estimating the correlation and similar correlations across genes). So which one is "best" depends on whether those requirements are met - in your case, they probably are, so I would not worry about using duplicateCorrelation. In addition, with the latter approach, it may not be possible to perform some contrasts, e.g., if the subjects at one time point are different from the subjects at another time point. In such situations, the only option is to use duplicateCorrelation.
I don't think that the calculation is so simple when you have a blocking factor. lmFit will switch over to generalized least squares if block is specified, which will account for the dependencies between samples during model fitting. In general, the fitted values from GLS can't be described with a simple expression like "mean(A-B)" or "mean(A)-mean(B)". But for an end user like you, there's no need to worry about the technical specifics - just rest assured that the subject-specific effects will be considered in the final estimation of the log-fold change between time points.
With topTableF, all contrasts will be considered together in an ANOVA-like test, where the null hypothesis is that all the individual nulls associated with the contrasts are true. So, for cont.Y, you'd be testing for differences between any of the yoghurt time points. If you had small differences between four consecutive time points, this can "add up" to improve the significance of the overall result for the corresponding gene, even if each of the pairwise differences were not significant on their own.

score 0 · Answer 2 · 2016-08-19

0

Entering edit mode

Katburton86 • 0

@katburton86-11314

Last seen 8.6 years ago

Thank you for your quick and clear response. I will check that the subject-specific effect is behaving as expected- thank you for this tip. All subjects are present at all time points so this should allow use of either method at least based on the contrasts that can be performed. I suspected that the calculation using blocking factor was more complex. Are you aware of any publication that provides the details of how the blocking factor is applied with the generalised least squares method? In the Limma Vignette and also the referenced article Peart et al., (2005) on time course analysis do not seem to explain the mathematics behind the model. I would also be grateful if you have any reference for the anova-like test used to create topTableF. It is nevertheless reassuring to understand that the model is analysing the data as we had hoped!

ADD COMMENT • link 8.6 years ago Katburton86 • 0

1

Entering edit mode

For future reference, please respond to existing posts with "add comment" rather than "add answer".

For how blocking would work in GLS, consider a situation where samples 1-3 belong in block A, samples 4-5 in block B, and samples 6-10 in block C. You'd end up with a correlation matrix that looks like:

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    x    x    0    0    0    0    0    0     0
 [2,]    x    1    x    0    0    0    0    0    0     0
 [3,]    x    x    1    0    0    0    0    0    0     0
 [4,]    0    0    0    1    x    0    0    0    0     0
 [5,]    0    0    0    x    1    0    0    0    0     0
 [6,]    0    0    0    0    0    1    x    x    x     x
 [7,]    0    0    0    0    0    x    1    x    x     x
 [8,]    0    0    0    0    0    x    x    1    x     x
 [9,]    0    0    0    0    0    x    x    x    1     x
[10,]    0    0    0    0    0    x    x    x    x     1

... where x is the consensus correlation. This matrix, I think, would be the value of the Omega in https://en.wikipedia.org/wiki/Generalized_least_squares (give or take a scaling factor for the variance). One could then find the least squares solution, while accounting for intra-block correlations. In effect, the off-diagonal entries downweight the contribution of samples in the same block to the residual sum of squares. This is desirable as correlated samples in the same block provide less information than if they were independent. I don't know of a limma-specific citation for the use of GLS, but there's one for duplicateCorrelation.

As for the ANOVA-like F-test, there's a couple of mentions of this behaviour of topTable(F) in the limma book and user's guide, and there's a more mathematical description of the moderated F-statistic in the SAGMB paper.

ADD REPLY • link 8.6 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you Aaron for your clear and complete response- I am grateful for your patience in explaining this to me. I will make reference to the described paper and Limma Vignette.

ADD REPLY • link 8.6 years ago Katburton86 • 0