design matrix for complicated experimental design in limma
1
0
Entering edit mode
nashchem • 0
@nashchem-11188
Last seen 7.7 years ago

Hi,

I am working with microarray data and its experimental design is a bit strange. I have 20 samples (expression at 5 time points, at 1 hr with 3 replicates, 2 hr with 4 replicates, 3 hr with 3 replicates, 6hr with 6 replicates and 10hr with 4 replicates).

I want to find the DEGs at each time point vs rest of time points.

For instance, DEGs at 1 hr vs rest of all time points (2, 3, 6, 10)

DEGs at 2 hr vs rest of all time points (1, 3, 6, 10)

DEGs at 3 hr vs rest of all time points (1, 2, 6, 10) etc..

My design matrix is as follows:

xx <- c(paste0(rep("1hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("2hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)), paste0(rep("3hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("6hr", 6), sep = ":", paste0(rep("rep", 6), 1:6)), paste0(rep("10hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)))

design <- cbind(complete=1, hr1=c(rep(1, 3), rep(0, 17)), hr2=c(rep(0, 3), rep(1, 4), rep(0, 13)), hr3=c(rep(0, 7), rep(1, 3), rep(0, 10)), hr6=c(rep(0, 10), rep(1, 6), rep(0, 4)), hr10=c(rep(0, 16), rep(1, 4)))

rownames(design) <- xx

My design matrix looks like this:

          complete hr1 hr2 hr3 hr6 hr10
1hr:rep1         1   1   0   0   0    0
1hr:rep2         1   1   0   0   0    0
1hr:rep3         1   1   0   0   0    0
2hr:rep1         1   0   1   0   0    0
2hr:rep2         1   0   1   0   0    0
2hr:rep3         1   0   1   0   0    0
2hr:rep4         1   0   1   0   0    0
3hr:rep1         1   0   0   1   0    0
3hr:rep2         1   0   0   1   0    0
3hr:rep3         1   0   0   1   0    0
6hr:rep1         1   0   0   0   1    0
6hr:rep2         1   0   0   0   1    0
6hr:rep3         1   0   0   0   1    0
6hr:rep4         1   0   0   0   1    0
6hr:rep5         1   0   0   0   1    0
6hr:rep6         1   0   0   0   1    0
10hr:rep1        1   0   0   0   0    1
10hr:rep2        1   0   0   0   0    1
10hr:rep3        1   0   0   0   0    1
10hr:rep4        1   0   0   0   0    1

 

Is this design matrix correct as per my DEGs comparison?

Moreover, I am getting this warning message: " Coefficients not estimable: hr10 
Warning message:
Partial NA coefficients for 20214 probe(s)  ".

Can anyone help me what's wrong with my design matrix?

Thank you.

 

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Matrix_1.2-6       matrixStats_0.50.2 ggplot2_2.1.0      dplyr_0.5.0        readxl_0.1.1      
[6] limma_3.28.15     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      lattice_0.20-33  digest_0.6.9     assertthat_0.1   grid_3.3.1      
 [6] R6_2.1.2         plyr_1.8.4       gtable_0.2.0     DBI_0.4-1        magrittr_1.5    
[11] scales_0.4.0     lazyeval_0.2.0   labeling_0.3     tools_3.3.1      munsell_0.4.3   
[16] colorspace_1.2-6 tibble_1.1      

 

limma microarray designmatrix • 1.3k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 2 hours ago
The city by the bay

Or, more simply:

timepoints <- factor(paste0("hr", rep(c(1,2,3,6,10), c(3,4,3,6,4))))
design <- model.matrix(~0 + timepoints)
colnames(design) <- levels(timepoints)

... and then use contrasts.fit to compare between time points, e.g., for hr1 against the rest:

con <- makeContrasts(hr1 - (hr2+hr3+hr6+hr10)/4, levels=design)

The problem with your original design is that the "complete" column is equal to the row sum of all other columns, which makes it impossible to estimate all of the coefficients. For example, if I had a coefficient of 1 for "complete" and zero for everything else, I would get the same fitted values as if I had a coefficient of zero for "complete" and 1 for everything else. There isn't a single solution, so limma just ignores some of the coefficients (in this case, the hr10 coefficient) to break the linear dependencies.

ADD COMMENT
0
Entering edit mode

Thank you Aaron. You made it clear. Another question, in the contrasts division by 4 is because of comparing 1 group vs 4 groups? 

ADD REPLY
0
Entering edit mode

Yes, you want to compare to the average of the four groups.

ADD REPLY
0
Entering edit mode

Thank you Aaron.

ADD REPLY

Login before adding your answer.

Traffic: 682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6