Question

design matrix for complicated experimental design in limma

0

Entering edit mode

nashchem • 0

@nashchem-11188

Last seen 7.7 years ago

Hi,

I am working with microarray data and its experimental design is a bit strange. I have 20 samples (expression at 5 time points, at 1 hr with 3 replicates, 2 hr with 4 replicates, 3 hr with 3 replicates, 6hr with 6 replicates and 10hr with 4 replicates).

I want to find the DEGs at each time point vs rest of time points.

For instance, DEGs at 1 hr vs rest of all time points (2, 3, 6, 10)

DEGs at 2 hr vs rest of all time points (1, 3, 6, 10)

DEGs at 3 hr vs rest of all time points (1, 2, 6, 10) etc..

My design matrix is as follows:

xx <- c(paste0(rep("1hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("2hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)), paste0(rep("3hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("6hr", 6), sep = ":", paste0(rep("rep", 6), 1:6)), paste0(rep("10hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)))

design <- cbind(complete=1, hr1=c(rep(1, 3), rep(0, 17)), hr2=c(rep(0, 3), rep(1, 4), rep(0, 13)), hr3=c(rep(0, 7), rep(1, 3), rep(0, 10)), hr6=c(rep(0, 10), rep(1, 6), rep(0, 4)), hr10=c(rep(0, 16), rep(1, 4)))

rownames(design) <- xx

My design matrix looks like this:

complete hr1 hr2 hr3 hr6 hr10
1hr:rep1 1 1 0 0 0 0
1hr:rep2 1 1 0 0 0 0
1hr:rep3 1 1 0 0 0 0
2hr:rep1 1 0 1 0 0 0
2hr:rep2 1 0 1 0 0 0
2hr:rep3 1 0 1 0 0 0
2hr:rep4 1 0 1 0 0 0
3hr:rep1 1 0 0 1 0 0
3hr:rep2 1 0 0 1 0 0
3hr:rep3 1 0 0 1 0 0
6hr:rep1 1 0 0 0 1 0
6hr:rep2 1 0 0 0 1 0
6hr:rep3 1 0 0 0 1 0
6hr:rep4 1 0 0 0 1 0
6hr:rep5 1 0 0 0 1 0
6hr:rep6 1 0 0 0 1 0
10hr:rep1 1 0 0 0 0 1
10hr:rep2 1 0 0 0 0 1
10hr:rep3 1 0 0 0 0 1
10hr:rep4 1 0 0 0 0 1

Is this design matrix correct as per my DEGs comparison?

Moreover, I am getting this warning message: " Coefficients not estimable: hr10
Warning message:
Partial NA coefficients for 20214 probe(s) ".

Can anyone help me what's wrong with my design matrix?

Thank you.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Matrix_1.2-6 matrixStats_0.50.2 ggplot2_2.1.0 dplyr_0.5.0 readxl_0.1.1
[6] limma_3.28.15

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6 lattice_0.20-33 digest_0.6.9 assertthat_0.1 grid_3.3.1
[6] R6_2.1.2 plyr_1.8.4 gtable_0.2.0 DBI_0.4-1 magrittr_1.5
[11] scales_0.4.0 lazyeval_0.2.0 labeling_0.3 tools_3.3.1 munsell_0.4.3
[16] colorspace_1.2-6 tibble_1.1

limma microarray designmatrix • 1.3k views

ADD COMMENT • link updated 7.7 years ago by Aaron Lun ★ 28k • written 7.7 years ago by nashchem • 0

score 2 · Answer 1 · 2016-07-27

2

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 2 hours ago

The city by the bay

Or, more simply:

timepoints <- factor(paste0("hr", rep(c(1,2,3,6,10), c(3,4,3,6,4))))
design <- model.matrix(~0 + timepoints)
colnames(design) <- levels(timepoints)

... and then use contrasts.fit to compare between time points, e.g., for hr1 against the rest:

con <- makeContrasts(hr1 - (hr2+hr3+hr6+hr10)/4, levels=design)

The problem with your original design is that the "complete" column is equal to the row sum of all other columns, which makes it impossible to estimate all of the coefficients. For example, if I had a coefficient of 1 for "complete" and zero for everything else, I would get the same fitted values as if I had a coefficient of zero for "complete" and 1 for everything else. There isn't a single solution, so limma just ignores some of the coefficients (in this case, the hr10 coefficient) to break the linear dependencies.

ADD COMMENT • link 7.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you Aaron. You made it clear. Another question, in the contrasts division by 4 is because of comparing 1 group vs 4 groups?

ADD REPLY • link 7.7 years ago nashchem • 0

0

Entering edit mode

Yes, you want to compare to the average of the four groups.

ADD REPLY • link 7.7 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thank you Aaron.

ADD REPLY • link 7.7 years ago nashchem • 0