Question: Which is the best design matrix to compare a timecourse between two conditions?
gravatar for Camila Orellana
14 months ago by
Australian Institute for Bioengineering and Nanotechnology, The University of Queensland
Camila Orellana10 wrote:

Hi, I am a bit confused with what design matrix to use to test for differential expression. I've used two examples of design matrix for time course analyzis with many time points of two conditions, and I get different results when using voom. For design1 I got 884 genes with adjusted p-value <0.05 while for design 2 I got 146 genes with adjusted p-value<0.05. The code I used was: 

> counts <- read.delim("counts.txt", row.names=1, stringsAsFactors=FALSE, header=TRUE)
> y <- DGEList(counts=counts)
> A <- aveLogCPM(y)
> y2 <- y[A>1,]
> y2 <- calcNormFactors(y2)
> library(splines)
> targets <- read.delim("targets.txt",header=TRUE)
> X <- ns(targets$Time, df=5)
> Group <- factor(targets$Group)
> design1 <- model.matrix(~Group+Group:X)
> v1 <- voom(y2, design1, plot=TRUE)
> fit1 <- lmFit(v1, design1)
> fit1 <- eBayes(fit1)
> toptable_voomd1 <-topTable(fit1, coef=ncol(design1), number=30000, adjust="BH")
> write.table(toptable_voomd1, file="DE voom timecourse different trend design1.txt", sep="\t", quote=FALSE)

> design2 <- model.matrix(~Group*X)
> v2 <- voom(y2, design2, plot=TRUE)
> fit2 <- lmFit(v2, design2)
> fit2 <- eBayes(fit2)
> toptable_voomd2 <-topTable(fit2, coef=ncol(design2), number=30000, adjust="BH")
> write.table(toptable_voomd2, file="DE voom timecourse different trend design2.txt", sep="\t", quote=FALSE)


Design1 column names are:

[1] "X.Intercept."    "GroupControl"    "GroupAA.X1"      "GroupControl.X1" "GroupAA.X2"      "GroupControl.X2"
[7] "GroupAA.X3"      "GroupControl.X3" "GroupAA.X4"      "GroupControl.X4" "GroupAA.X5"      "GroupControl.X5"


Design2 column names are:

 [1] "X.Intercept."    "GroupControl"    "X1"              "X2"              "X3"              "X4"             
 [7] "X5"              "GroupControl.X1" "GroupControl.X2" "GroupControl.X3" "GroupControl.X4" "GroupControl.X5"


Is the result in voom showing different trends in time between group1 and group 2?

Which design is more suitable to find the different trends in time?


Below are the top ten genes obtained for each design with voom, and the targets description:

  logFC AveExpr t P.Value adj.P.Val B
XLOC_002474 8.93162 5.442491 21.17136 2.14E-16 5.05E-13 25.75543
XLOC_002448 7.358749 6.00706 16.84291 2.79E-14 3.29E-11 22.03328
XLOC_002473 8.914501 14.10273 15.8502 9.92E-14 7.82E-11 20.58166
XLOC_000487 6.142208 8.722301 15.17895 2.43E-13 1.44E-10 20.18806
XLOC_001445 4.23115 6.300768 13.39314 3.14E-12 1.16E-09 17.94857
XLOC_000486 5.548552 7.818772 13.37004 3.25E-12 1.16E-09 17.86896
XLOC_000492 4.955229 8.269234 13.32996 3.45E-12 1.16E-09 17.82936
XLOC_000496 6.022644 4.911473 13.13415 4.65E-12 1.37E-09 17.39038
XLOC_002003 5.620099 4.919423 12.89501 6.71E-12 1.76E-09 17.18281
XLOC_000480 5.166796 7.037035 12.75695 8.32E-12 1.83E-09 17.00794
  logFC AveExpr t P.Value adj.P.Val B
XLOC_000487 7.316894 8.722301 10.19615 6.39E-10 1.51E-06 11.95797
XLOC_000481 6.975628 7.602637 8.456752 1.88E-08 1.23E-05 9.198301
XLOC_000486 6.112574 7.818772 8.429785 1.99E-08 1.23E-05 9.142058
XLOC_001439 -10.066 1.592145 -8.4085 2.08E-08 1.23E-05 8.521097
XLOC_000492 5.40706 8.269234 8.004614 4.82E-08 1.90E-05 8.324567
XLOC_002474 5.531465 5.442491 8.012472 4.75E-08 1.90E-05 8.283427
XLOC_000482 6.104077 7.207526 7.521789 1.36E-07 4.60E-05 7.464284
XLOC_000480 5.181319 7.037035 7.314652 2.14E-07 6.03E-05 7.049162
XLOC_000484 6.382048 5.818321 7.279154 2.32E-07 6.03E-05 6.992372
XLOC_000483 6.055545 6.707416 7.235958 2.55E-07 6.03E-05 6.909271
FileName Group Time
C1 Control 6.5
C2 Control 7.5
C3 Control 8.75
C4 Control 10
C5 Control 11.2
C6 Control 12
C7 Control 13
C8 Control 18
C9 Control 20
C10 Control 21
C11 Control 22
C12 Control 23
C13 Control 32
C14 Control 42
C15 Control 51.5
AA1 AA 7
AA2 AA 9
AA3 AA 11
AA4 AA 12
AA5 AA 13
AA6 AA 18
AA7 AA 20
AA8 AA 21
AA9 AA 22
AA10 AA 23
AA11 AA 33
AA12 AA 42
AA13 AA 47


Thank you, 


ADD COMMENTlink modified 14 months ago by Gordon Smyth33k • written 14 months ago by Camila Orellana10

Note that by using coef = ncol(design) in topTable you are extracting the last contrast of the design matrix. Which is not the same between the design1 and the design2. You should probably set different contrast matrix to test the same hypothesis with different designs.

ADD REPLYlink modified 14 months ago • written 14 months ago by Lluís R310
gravatar for Aaron Lun
14 months ago by
Aaron Lun19k
Cambridge, United Kingdom
Aaron Lun19k wrote:

Despite having the same name, the last coefficients of your two design matrices represent different things.

Let's have a look at the coefficients of design1:

  • The intercept, representing the log-expression in the AA group.
  • The GroupControl coefficient, representing the log-fold change in Control over AA.
  • The GroupAA:X1-X5 coefficients, representing the spline fit to the AA group against time.
  • The GroupControl:X1-X5 coefficients, representing the spline fit to the Control group against time.

Now, let's have a look at the coefficients of design2:

  • The intercept and GroupControl coefficient, same interpretation as before.
  • The X1-X5 coefficients, representing the spline fit to the AA group against time. This is despite the fact that they do not have "AA" in their column names, which stresses the need to actually look at the design matrix.
  • The GroupControl:X1-X5 coefficients, which now represent the difference in the spline fits between the AA and Control groups. This can be interpreted as the effect of time on the log-fold change between groups.

So, the GroupControlX5 coefficient does not have the same meaning between design1 and design2. In the first design, it represents (part of) the effect of time on expression in the control group, whereas in the second design, it represents the effect of time on the log-fold change between the control and AA groups. Little wonder, then, that you get different results.

That said, it's rarely sensible to drop individual spline coefficients. They are not easily interpreted by themselves, as the piecewise spline functions are constructed from all coefficients. If you want to test for a time effect, you should drop all 5 spline coefficients together.

ADD COMMENTlink modified 14 months ago • written 14 months ago by Aaron Lun19k

Thank you very much for your explanation! 

ADD REPLYlink written 14 months ago by Camila Orellana10
gravatar for Gordon Smyth
14 months ago by
Gordon Smyth33k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth33k wrote:

I guess you want to find genes for which the time trend is different between the two conditions. In that case,

topTable(fit2, coef=8:12)

will give you what you want.

You could conduct the same test using the first design matrix, but you would need to extract contrasts. For this test, the second design matrix is simpler.

ADD COMMENTlink modified 14 months ago • written 14 months ago by Gordon Smyth33k

Yes that is what I want to do, thank you very much!

ADD REPLYlink written 14 months ago by Camila Orellana10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 191 users visited in the last hour