Search
Question: Limma trend test with pre/post design and technical replicates
0
3 months ago by
Jenny G0
Jenny G0 wrote:

Hi all,

I have 4 dose groups, technical duplicates, and pre- and post-treatment data. I'm interested in detecting a linear trend across dose groups, adjusting for pre-treatment variability. I'm not certain how to set this up. Data is like this:

patients <- data.frame(Patient = rep(c(1,2,3,4,5,6,7,8), each=4), Treatment = factor(rep(c("Placebo", "Dose1", "Dose2", "Dose3"), each=4), levels = c("Placebo", "Dose1", "Dose2", "Dose3"), ordered=TRUE), Visit = factor(rep(c("Baseline","Baseline","Visit1","Visit1"), 8)))

Is this sufficient?

design <- model.matrix(~ Visit + Treatment, data=eSet)

corfit <- duplicateCorrelation(eSet,design,block=patients$patient) fit <- lmFit(eSet, design,block=patients$patient,correlation=corfit$consensus, robust=TRUE) fit2 <- eBayes(fit) topTable(fit = fit2, coef = "Treatment.L" )  Or does there need to be a Treatment * Visit interaction, then look at the Treatment.L * Visit1 effect? ADD COMMENTlink modified 3 months ago by Aaron Lun21k • written 3 months ago by Jenny G0 3 3 months ago by Aaron Lun21k Cambridge, United Kingdom Aaron Lun21k wrote: Firstly, your treatment variable is a factor, not a continuous covariate, so there's no trend to be fitted here. I don't even know how you managed to get a Treatment.L coefficient. Instead, you can test the effect of each treatment against another in a pairwise fashion, or use an ANOVA to test for any effect of treatment. Secondly, it seems that you're mixing technical replicates (same patient and visit) with some other form of replication (same patient, different visit). This will probably distort the estimate of the consensus correlation because the variability between technical replicates is probably much lower than that between separate visits, even after blocking on Visit. I would suggest averaging your technical replicates, see avearrays. Thirdly, your discussion of the interaction effect suggests that "baseline" is actually "untreated", which wasn't obvious from your description of the experiment. If this is true, it would seem that all patients should share the same "baseline" term as the effect of the different treatments should not yet have manifested. In summary, for every sample labelled as "baseline", you would replace its treatment with "untreated", and then only use the modified Treatment in your design matrix. Finally, you set robust=TRUE in lmFit. Are you sure you know what you're doing here? I would suggest setting robust=TRUE in eBayes instead, see A: p-values from robust linear model in LIMMA. ADD COMMENTlink modified 3 months ago • written 3 months ago by Aaron Lun21k Hi Aaron, Thanks for your time and your help. My apologies for not explaining the experiment well, I've tried to clarify below. I have a few points/questions, if you have the time to help me. 1. Treatment is an ordered factor, so limma reports linear, quadratic and cubic effects (same as lm() does, e.g. setting up contr.poly()). I don't want pairwise comparisons, I'm specifically looking for a linear (dose-response) trend across groups. This is important to the experiment, and was specified by the team as the result of interest. 2. I do have technical replicates (same visit, patient), and 2 different visits for each patient. I wasn't sure how to specifiy this in the model - I'll use avearrays as you suggest, thanks. 3. All patients are untreated at baseline, correct. If I understand you correctly, then I recode the treatment group variable to 'untreated' for all of the baseline samples. I could then specify the comparisons of interest (e.g. Placebo vs dose groups) from the treatment variable, and I would not need the 'visit' variable at all. I can see this, but then I lose the trend test across the dose groups. Can I set it up as follows, using orthogonal contrasts? contrast.matrix <- makeContrasts(-3*Treatment.Placebo, -1*Treatment.Dose1, 1 * Treatment.Dose2, 3* Treatment.Dose3, levels=design) Best wishes, Jen ADD REPLYlink written 10 weeks ago by Jenny G0 1 1. Ah, missed that in your patients. Yes, okay. I don't deal with ordered factors much, but it seems that some care may be required in interpretation when the quadratic and cubic terms are significant. For example: X <- factor(rep(LETTERS[1:5], each=3), ordered=TRUE) Y <- rep(1:5, each=3) - rep((-2):2, each=3)^2 Y <- jitter(Y) summary(lm(Y ~ X)) # X.L is significantly positive... plot(X, Y) # but it's hard to say that there is an increasing trend. 3. I would be inclined to formulate the problem in terms of the within-patient difference from the baseline. That is, for each patient, compute the log-fold difference of their visit 1 sample from their baseline. This represents the effect of the dosage, which I presume is the feature that's really of interest. I'll assume you did step 2 above: patients <- data.frame( Patient = factor(rep(c(1,2,3,4,5,6,7,8), each=2)), Treatment = factor(rep(c("Placebo", "Dose1", "Dose2", "Dose3"), each=2), levels = c("Placebo", "Dose1", "Dose2", "Dose3"), ordered=TRUE), Visit = factor(rep(c("Baseline","Visit1"), 8))) With some coaxing, you can formulate the following design matrix: poly.part <- model.matrix(~Treatment, patients) poly.part <- poly.part[,-1] # remove intercept for the time being poly.part[patients$Visit=="Baseline"] <- 0
design.final <- model.matrix(~patients\$Patient + poly.part)

... where the last three coefficients represent the linear, quadratic, cubic terms on the within-patient differences. You can alternatively put Patient in duplicateCorrelation instead of in the design matrix.

Thanks Aaron, I think I follow. A trend across within-patient changes sounds like what I want. I've done it this way and the results make sense. I've calculated percent change from baseline and plotted the hits, and they look good (e.g. effect sizes look right).

Hi Aaron,

Thanks again for your help. I'm struggling now with interpreting and explaining the FCs, with this design. I would like to present a table of results with p-values and FCs, then give a clear explanation of the analysis, and explain the results for one example (say, the top hit). I'd also like to present plots for the top hits - the preferred plots are spaghetti plots and/or boxplots of the percent-change-from-baseline. Relating all of these things to each other, correctly presented and explained, is tricky.

Our data is on a log2 scale. We are then looking at the contrasts specified above, which is, as you say, testing for a linear effect on the within-patient differences. So for each patient we have differences between 2 log-transformed data points, then we average by group and test for trend across 3 groups. Except I don't think the model is doing *exactly* that, is it? I'm struggling to understand and explain that part. So what is the logFC (and the FC) that limma produces, in this scenario? It's not really the log of the ratio of the averages of the change-from-baseline, is it? Some of these changes are negative, so how does that come into a FC?

I want to explain it something like this, but I don't think its quite right:

"This analysis is based on within-patient change-from-baseline values. The means of these changes are compared for linear increase or decrease across dose groups. The FC value for each gene is the average of the ratios of the changes-from-baseline, between successive groups."

Any help with understanding and explaining the FCs would be very much appreciated. Thank you!

Don't think in terms of fold changes. Think in terms of log-fold changes. This allows you to consider positive and negative values without any problems.

Now, from the model that I described earlier, you're getting one value per patient - the log-fold change of visit 1 over baseline. You are then fitting the ~Treatment model to these log-fold changes. So a high-level summary would be that you're looking for changes in the treatment effect with increasing dosage. (Note, these are changes in the effect, not the effects themselves; the former represents a "difference of differences".)

The real problem is with the interpretation of the logFC reported by topTable. This nominally represents the "gradient" of the linear component of the trend  with respect to increasing dosage "levels". However, I'm not sure that this corresponds directly to any value of scientific interest. Interpretation of any gradient requires you to know the x-axis units - for example, increase in expression per day, or per mg of drug - but this is not given in the model. Significant quadratic or cubic components can also interfere with interpretation of the linear component, as I mentioned earlier.

If you're having trouble with interpreting these components, perhaps you should just use an ordinary factor with levels in the right order, as suggested here.