Question: design matrix for complicated experimental design in limma
0
gravatar for nashchem
3.1 years ago by
nashchem0
nashchem0 wrote:

Hi,

I am working with microarray data and its experimental design is a bit strange. I have 20 samples (expression at 5 time points, at 1 hr with 3 replicates, 2 hr with 4 replicates, 3 hr with 3 replicates, 6hr with 6 replicates and 10hr with 4 replicates).

I want to find the DEGs at each time point vs rest of time points.

For instance, DEGs at 1 hr vs rest of all time points (2, 3, 6, 10)

DEGs at 2 hr vs rest of all time points (1, 3, 6, 10)

DEGs at 3 hr vs rest of all time points (1, 2, 6, 10) etc..

My design matrix is as follows:

xx <- c(paste0(rep("1hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("2hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)), paste0(rep("3hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("6hr", 6), sep = ":", paste0(rep("rep", 6), 1:6)), paste0(rep("10hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)))

design <- cbind(complete=1, hr1=c(rep(1, 3), rep(0, 17)), hr2=c(rep(0, 3), rep(1, 4), rep(0, 13)), hr3=c(rep(0, 7), rep(1, 3), rep(0, 10)), hr6=c(rep(0, 10), rep(1, 6), rep(0, 4)), hr10=c(rep(0, 16), rep(1, 4)))

rownames(design) <- xx

My design matrix looks like this:

          complete hr1 hr2 hr3 hr6 hr10
1hr:rep1         1   1   0   0   0    0
1hr:rep2         1   1   0   0   0    0
1hr:rep3         1   1   0   0   0    0
2hr:rep1         1   0   1   0   0    0
2hr:rep2         1   0   1   0   0    0
2hr:rep3         1   0   1   0   0    0
2hr:rep4         1   0   1   0   0    0
3hr:rep1         1   0   0   1   0    0
3hr:rep2         1   0   0   1   0    0
3hr:rep3         1   0   0   1   0    0
6hr:rep1         1   0   0   0   1    0
6hr:rep2         1   0   0   0   1    0
6hr:rep3         1   0   0   0   1    0
6hr:rep4         1   0   0   0   1    0
6hr:rep5         1   0   0   0   1    0
6hr:rep6         1   0   0   0   1    0
10hr:rep1        1   0   0   0   0    1
10hr:rep2        1   0   0   0   0    1
10hr:rep3        1   0   0   0   0    1
10hr:rep4        1   0   0   0   0    1

 

Is this design matrix correct as per my DEGs comparison?

Moreover, I am getting this warning message: " Coefficients not estimable: hr10 
Warning message:
Partial NA coefficients for 20214 probe(s)  ".

Can anyone help me what's wrong with my design matrix?

Thank you.

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-6       matrixStats_0.50.2 ggplot2_2.1.0      dplyr_0.5.0        readxl_0.1.1      
[6] limma_3.28.15     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      lattice_0.20-33  digest_0.6.9     assertthat_0.1   grid_3.3.1      
 [6] R6_2.1.2         plyr_1.8.4       gtable_0.2.0     DBI_0.4-1        magrittr_1.5    
[11] scales_0.4.0     lazyeval_0.2.0   labeling_0.3     tools_3.3.1      munsell_0.4.3   
[16] colorspace_1.2-6 tibble_1.1      

 

microarray limma designmatrix • 448 views
ADD COMMENTlink modified 3.1 years ago by Aaron Lun24k • written 3.1 years ago by nashchem0
Answer: design matrix for complicated experimental design in limma
2
gravatar for Aaron Lun
3.1 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Or, more simply:

timepoints <- factor(paste0("hr", rep(c(1,2,3,6,10), c(3,4,3,6,4))))
design <- model.matrix(~0 + timepoints)
colnames(design) <- levels(timepoints)

... and then use contrasts.fit to compare between time points, e.g., for hr1 against the rest:

con <- makeContrasts(hr1 - (hr2+hr3+hr6+hr10)/4, levels=design)

The problem with your original design is that the "complete" column is equal to the row sum of all other columns, which makes it impossible to estimate all of the coefficients. For example, if I had a coefficient of 1 for "complete" and zero for everything else, I would get the same fitted values as if I had a coefficient of zero for "complete" and 1 for everything else. There isn't a single solution, so limma just ignores some of the coefficients (in this case, the hr10 coefficient) to break the linear dependencies.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Aaron Lun24k

Thank you Aaron. You made it clear. Another question, in the contrasts division by 4 is because of comparing 1 group vs 4 groups? 

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by nashchem0

Yes, you want to compare to the average of the four groups.

ADD REPLYlink written 3.1 years ago by Aaron Lun24k

Thank you Aaron.

ADD REPLYlink written 3.1 years ago by nashchem0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 247 users visited in the last hour