Search
Question: Which is the best design matrix to compare a timecourse between two conditions?
1
22 months ago by
Australian Institute for Bioengineering and Nanotechnology, The University of Queensland
Camila Orellana10 wrote:

Hi, I am a bit confused with what design matrix to use to test for differential expression. I've used two examples of design matrix for time course analyzis with many time points of two conditions, and I get different results when using voom. For design1 I got 884 genes with adjusted p-value <0.05 while for design 2 I got 146 genes with adjusted p-value<0.05. The code I used was:

> y <- DGEList(counts=counts)
> A <- aveLogCPM(y)
> y2 <- y[A>1,]
> y2 <- calcNormFactors(y2)
> library(splines)
> X <- ns(targets$Time, df=5) > Group <- factor(targets$Group)
> design1 <- model.matrix(~Group+Group:X)
> v1 <- voom(y2, design1, plot=TRUE)
> fit1 <- lmFit(v1, design1)
> fit1 <- eBayes(fit1)
> toptable_voomd1 <-topTable(fit1, coef=ncol(design1), number=30000, adjust="BH")
> write.table(toptable_voomd1, file="DE voom timecourse different trend design1.txt", sep="\t", quote=FALSE)

> design2 <- model.matrix(~Group*X)
> v2 <- voom(y2, design2, plot=TRUE)
> fit2 <- lmFit(v2, design2)
> fit2 <- eBayes(fit2)
> toptable_voomd2 <-topTable(fit2, coef=ncol(design2), number=30000, adjust="BH")
> write.table(toptable_voomd2, file="DE voom timecourse different trend design2.txt", sep="\t", quote=FALSE)

Design1 column names are:

[1] "X.Intercept."    "GroupControl"    "GroupAA.X1"      "GroupControl.X1" "GroupAA.X2"      "GroupControl.X2"
[7] "GroupAA.X3"      "GroupControl.X3" "GroupAA.X4"      "GroupControl.X4" "GroupAA.X5"      "GroupControl.X5"

Design2 column names are:

[1] "X.Intercept."    "GroupControl"    "X1"              "X2"              "X3"              "X4"
[7] "X5"              "GroupControl.X1" "GroupControl.X2" "GroupControl.X3" "GroupControl.X4" "GroupControl.X5"

Is the result in voom showing different trends in time between group1 and group 2?

Which design is more suitable to find the different trends in time?

Below are the top ten genes obtained for each design with voom, and the targets description:

<caption>Design1</caption>
 logFC AveExpr t P.Value adj.P.Val B XLOC_002474 8.93162 5.442491 21.17136 2.14E-16 5.05E-13 25.75543 XLOC_002448 7.358749 6.00706 16.84291 2.79E-14 3.29E-11 22.03328 XLOC_002473 8.914501 14.10273 15.8502 9.92E-14 7.82E-11 20.58166 XLOC_000487 6.142208 8.722301 15.17895 2.43E-13 1.44E-10 20.18806 XLOC_001445 4.23115 6.300768 13.39314 3.14E-12 1.16E-09 17.94857 XLOC_000486 5.548552 7.818772 13.37004 3.25E-12 1.16E-09 17.86896 XLOC_000492 4.955229 8.269234 13.32996 3.45E-12 1.16E-09 17.82936 XLOC_000496 6.022644 4.911473 13.13415 4.65E-12 1.37E-09 17.39038 XLOC_002003 5.620099 4.919423 12.89501 6.71E-12 1.76E-09 17.18281 XLOC_000480 5.166796 7.037035 12.75695 8.32E-12 1.83E-09 17.00794
<caption>Design2</caption>
 logFC AveExpr t P.Value adj.P.Val B XLOC_000487 7.316894 8.722301 10.19615 6.39E-10 1.51E-06 11.95797 XLOC_000481 6.975628 7.602637 8.456752 1.88E-08 1.23E-05 9.198301 XLOC_000486 6.112574 7.818772 8.429785 1.99E-08 1.23E-05 9.142058 XLOC_001439 -10.066 1.592145 -8.4085 2.08E-08 1.23E-05 8.521097 XLOC_000492 5.40706 8.269234 8.004614 4.82E-08 1.90E-05 8.324567 XLOC_002474 5.531465 5.442491 8.012472 4.75E-08 1.90E-05 8.283427 XLOC_000482 6.104077 7.207526 7.521789 1.36E-07 4.60E-05 7.464284 XLOC_000480 5.181319 7.037035 7.314652 2.14E-07 6.03E-05 7.049162 XLOC_000484 6.382048 5.818321 7.279154 2.32E-07 6.03E-05 6.992372 XLOC_000483 6.055545 6.707416 7.235958 2.55E-07 6.03E-05 6.909271
<caption>Targets</caption>
 FileName Group Time C1 Control 6.5 C2 Control 7.5 C3 Control 8.75 C4 Control 10 C5 Control 11.2 C6 Control 12 C7 Control 13 C8 Control 18 C9 Control 20 C10 Control 21 C11 Control 22 C12 Control 23 C13 Control 32 C14 Control 42 C15 Control 51.5 AA1 AA 7 AA2 AA 9 AA3 AA 11 AA4 AA 12 AA5 AA 13 AA6 AA 18 AA7 AA 20 AA8 AA 21 AA9 AA 22 AA10 AA 23 AA11 AA 33 AA12 AA 42 AA13 AA 47

Thank you,

Camila

modified 22 months ago by Gordon Smyth35k • written 22 months ago by Camila Orellana10
1

Note that by using coef = ncol(design) in topTable you are extracting the last contrast of the design matrix. Which is not the same between the design1 and the design2. You should probably set different contrast matrix to test the same hypothesis with different designs.

4
22 months ago by
Aaron Lun21k
Cambridge, United Kingdom
Aaron Lun21k wrote:

Despite having the same name, the last coefficients of your two design matrices represent different things.

Let's have a look at the coefficients of design1:

• The intercept, representing the log-expression in the AA group.
• The GroupControl coefficient, representing the log-fold change in Control over AA.
• The GroupAA:X1-X5 coefficients, representing the spline fit to the AA group against time.
• The GroupControl:X1-X5 coefficients, representing the spline fit to the Control group against time.

Now, let's have a look at the coefficients of design2:

• The intercept and GroupControl coefficient, same interpretation as before.
• The X1-X5 coefficients, representing the spline fit to the AA group against time. This is despite the fact that they do not have "AA" in their column names, which stresses the need to actually look at the design matrix.
• The GroupControl:X1-X5 coefficients, which now represent the difference in the spline fits between the AA and Control groups. This can be interpreted as the effect of time on the log-fold change between groups.

So, the GroupControlX5 coefficient does not have the same meaning between design1 and design2. In the first design, it represents (part of) the effect of time on expression in the control group, whereas in the second design, it represents the effect of time on the log-fold change between the control and AA groups. Little wonder, then, that you get different results.

That said, it's rarely sensible to drop individual spline coefficients. They are not easily interpreted by themselves, as the piecewise spline functions are constructed from all coefficients. If you want to test for a time effect, you should drop all 5 spline coefficients together.

Thank you very much for your explanation!

1
22 months ago by
Gordon Smyth35k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth35k wrote:

I guess you want to find genes for which the time trend is different between the two conditions. In that case,

topTable(fit2, coef=8:12)

will give you what you want.

You could conduct the same test using the first design matrix, but you would need to extract contrasts. For this test, the second design matrix is simpler.