Question: Paired analysis setup using edgeR - follow up question
0
9 weeks ago by
candida.vaz30
Singapore
candida.vaz30 wrote:

Dear Bioconductor support team,

This question is a follow up of the same I asked before, and then I carried out some analysis, but need help:-

I have samples from two time points. The samples belong to two categories: Treatment and Placebo as follows: 005v1 Placebo 005v4 Placebo 008v1 Treatment 008v4 Treatment

The ID number 005 etc represent the same sample at two time points. At V1 time-point there is no treatment given, its just that the samples are assigned to be given a treatment (treatment) or not (placebo). At V4 time-point there would be difference in placebo and treatment sample.

Now, if the question is to identify the changes caused due to the treatment, how should I define the design? Should V1 time-point be taken into consideration or not, given that the treatment intervention was given only after the assignment of samples at V1.

It seems like a paired analysis, where the blocking factor is the IDs of the samples, so using the edgeR guide (3.4), I tried to block on the ID (called as Cid in the Rcode) in the following way:-

Creating DGElist

raw <- DGEList(counts=counts, group=Group) raw$samples <- cbind(raw$samples,Cid,Visit) data.frame(Sample=colnames(raw),Group,Cid,Visit)

Filter

plot (table (rowSums(cpm(raw) >= 1) ) ) keep <- (rowSums(cpm(raw) >= 1) >= 17) raw <- raw[keep, , keep.lib.sizes=FALSE]

Normalization

norm <- calcNormFactors(raw)

design <- model.matrix( ~0 + Group + Visit, data=norm$samples) v <- voomWithQualityWeights(norm, design, plot=T) dupCor <- duplicateCorrelation(v, design, block=norm$samples$Cid) v <- voomWithQualityWeights(norm, design, block=norm$samples$Cid, correlation=dupCor$consensus, plot=T) fit <- lmFit(v, design, block=norm$samples$Cid, correlation=dupCor$consensus) cm <- makeContrasts(treat = Grouptreatment-Groupplacebo, levels=design ) fit2 <-contrasts.fit(fit, cm) efit <-eBayes(fit2) Thanks a ton! Appreciate you kind support ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by candida.vaz30 1 1. I would fuse Group and Visit into a single variable and use a one-way layout. Your additive model assumes that the treatment and time are additive, but this won't be true for the most interesting genes. 2. It wouldn't hurt to run duplicateCorrelation again after the second voomWithQualityWeights call. 3. Your filtering looks odd to me. Why keep 17? What happens to a gene that is not expressed in any group except the treated condition at time point 4 (which presumably would only have non-zero CPMs in a subset of the 17)? ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Aaron Lun23k Thanks a ton Aaron! 1. To fuse into a single variable can I use: Group <- factor(paste(Categ, Visit, sep=".")) raw <- DGEList(counts=counts, group=Group) raw$samples <- cbind(raw$samples,Cid,Visit) (Where Categ =placebo/treatment & Visit=two time points) 2. For filtering (you are right, i'm now using 8 which is the number of samples in treated category)...is this ok? keep <- (rowSums(cpm(raw) >= 1) >= 8) raw <- raw[keep, , keep.lib.sizes=FALSE] 3. Design and DGE calculation design <- model.matrix( ~0 + Group, data=norm$samples) # one way layout

v <- voomWithQualityWeights(norm, design, plot=T)

dupCor <- duplicateCorrelation(v, design, block=norm$samples$Cid)

v <- voomWithQualityWeights(norm, design, block=norm$samples$Cid, correlation=dupCor$consensus, plot=T) dupCor <- duplicateCorrelation(v, design, block=norm$samples$Cid) # second duplicateCorrelation fit <- lmFit(v, design, block=norm$samples$Cid, correlation=dupCor$consensus)

cm <- makeContrasts(treat = Grouptreatment.MV4-Groupplacebo.MV4, levels=design ) # contrast i'm interested in

fit2 <-contrasts.fit(fit, cm)

efit <-eBayes(fit2)

One important fact I would like to stress on, in the experimental design is that some samples remain placebo (9X2) at both time points and some (8X2) get treatment at 2nd time-point. Hope the model is sufficient to get DE genes (Treatment vs Placebo).

Appreciate your time and help on this, getting to learn a lot!

1

If you want to identify differences between treatment and placebo, the most relevant contrast seems to be:

makeContrasts((Grouptreatment.MV4 - Grouptreatment.MV1)
- (Groupplacebo.MV4 - Groupplacebo.MV1), levels=design)


This corresponds to the contrast for edgeR that I proposed in my original answer. It should be better than your cm, as a within-patient comparison avoids the penalty that duplicateCorrelation must impose (i.e., an increase in the standard error, and thus a decrease in power) when comparing across patients.

Dear Aaron,

Thanks a ton for your suggestions!

Dear Aaron,

Just to follow up on the same analysis: was wondering if one could run the double voom and double duplicate correlation on a model that includes adjusting for confounding factors. For example if Age is a confounding factor then the design model will be:

design <- model.matrix( ~0 + Group + Age, data=norm$samples) Can I now run voom and duplicate correlation on this adjusted design model? v <- voomWithQualityWeights(norm, design, plot=T) dupCor <- duplicateCorrelation(v, design, block=norm$samples$Cid) v <- voomWithQualityWeights(norm, design, block=norm$samples$Cid, correlation=dupCor$consensus, plot=T)

dupCor <- duplicateCorrelation(v, design, block=norm$samples$Cid) # second duplicateCorrelation

Thanks for your help on this!

1

That's fine, providing Age and Group are not confounded. Age will probably be confounded with Cid (unless you've recorded that the patient changes their age between visits), but that's not a problem when you block on patient ID in duplicateCorrelation(). You'll still probably have patient-specific effects that are independent of age, so blocking on Cid will still be necessary.

Age of the patients don't change much except by a few months during the two visits. My concern is the effect of "Age" itself on the outcome. I am interested in removing the effect of Age or any such factor and seeing only the effect of Treatment on the outcome. So are the steps mentioned above good enough to do so?

Appreciate your time and help on this!

Blocking on Age will certainly mitigate the effect of age. Whether the effect is removed depends on whether the effect of age on log-expression is linear. In reality, it probably isn't, so if you want to protect against non-linearity, you'd need to use a spline with respect to age (see the user's guide for details).

Another potential issue would be the additivity assumption, which you're also making when you do + Age in model.matrix(). In other words, you're assuming that the effect of age just "adds onto" other effects, which is also unlikely to be true in complex biological systems. You cannot avoid this assumption while still being able to perform your comparisons of interest.

So, it's unlikely that you can fully remove the effect of age in real data. The linearity and additivity assumptions are strong and only the former is really fixable. Your current approach is probably good enough; a spline would avoid the linearity assumption (assuming you have enough samples to tolerate the loss of residual d.f.); but if you want to really get rid of the effect of age, make sure you only collect samples from patients of the same age!

Thanks a lot Aaron for the advice.

I never came across the spline concept in the limma/edgeR guide. Could you please let me know where can I find it?

I will be using the additivity assumption + Age in model.matrix() (understand that it may not be true in complex biological systems.

Can I just confirm with you if its appropriate to further use double voom and double duplicate correlation on this additive model.

Thanks!

1

Yes, double use of voom and duplicateCorrelation is fine.