0
3.1 years ago by
nashchem0
nashchem0 wrote:

Hi,

I am working with microarray data and its experimental design is a bit strange. I have 20 samples (expression at 5 time points, at 1 hr with 3 replicates, 2 hr with 4 replicates, 3 hr with 3 replicates, 6hr with 6 replicates and 10hr with 4 replicates).

I want to find the DEGs at each time point vs rest of time points.

For instance, DEGs at 1 hr vs rest of all time points (2, 3, 6, 10)

DEGs at 2 hr vs rest of all time points (1, 3, 6, 10)

DEGs at 3 hr vs rest of all time points (1, 2, 6, 10) etc..

My design matrix is as follows:

xx <- c(paste0(rep("1hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("2hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)), paste0(rep("3hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("6hr", 6), sep = ":", paste0(rep("rep", 6), 1:6)), paste0(rep("10hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)))

design <- cbind(complete=1, hr1=c(rep(1, 3), rep(0, 17)), hr2=c(rep(0, 3), rep(1, 4), rep(0, 13)), hr3=c(rep(0, 7), rep(1, 3), rep(0, 10)), hr6=c(rep(0, 10), rep(1, 6), rep(0, 4)), hr10=c(rep(0, 16), rep(1, 4)))

rownames(design) <- xx

My design matrix looks like this:

complete hr1 hr2 hr3 hr6 hr10
1hr:rep1         1   1   0   0   0    0
1hr:rep2         1   1   0   0   0    0
1hr:rep3         1   1   0   0   0    0
2hr:rep1         1   0   1   0   0    0
2hr:rep2         1   0   1   0   0    0
2hr:rep3         1   0   1   0   0    0
2hr:rep4         1   0   1   0   0    0
3hr:rep1         1   0   0   1   0    0
3hr:rep2         1   0   0   1   0    0
3hr:rep3         1   0   0   1   0    0
6hr:rep1         1   0   0   0   1    0
6hr:rep2         1   0   0   0   1    0
6hr:rep3         1   0   0   0   1    0
6hr:rep4         1   0   0   0   1    0
6hr:rep5         1   0   0   0   1    0
6hr:rep6         1   0   0   0   1    0
10hr:rep1        1   0   0   0   0    1
10hr:rep2        1   0   0   0   0    1
10hr:rep3        1   0   0   0   0    1
10hr:rep4        1   0   0   0   0    1

Is this design matrix correct as per my DEGs comparison?

Moreover, I am getting this warning message: " Coefficients not estimable: hr10
Warning message:
Partial NA coefficients for 20214 probe(s)  ".

Can anyone help me what's wrong with my design matrix?

Thank you.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-6       matrixStats_0.50.2 ggplot2_2.1.0      dplyr_0.5.0        readxl_0.1.1
[6] limma_3.28.15

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6      lattice_0.20-33  digest_0.6.9     assertthat_0.1   grid_3.3.1
[6] R6_2.1.2         plyr_1.8.4       gtable_0.2.0     DBI_0.4-1        magrittr_1.5
[11] scales_0.4.0     lazyeval_0.2.0   labeling_0.3     tools_3.3.1      munsell_0.4.3
[16] colorspace_1.2-6 tibble_1.1

microarray limma designmatrix • 448 views
modified 3.1 years ago by Aaron Lun24k • written 3.1 years ago by nashchem0
2
3.1 years ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

Or, more simply:

timepoints <- factor(paste0("hr", rep(c(1,2,3,6,10), c(3,4,3,6,4))))
design <- model.matrix(~0 + timepoints)
colnames(design) <- levels(timepoints)

... and then use contrasts.fit to compare between time points, e.g., for hr1 against the rest:

con <- makeContrasts(hr1 - (hr2+hr3+hr6+hr10)/4, levels=design)

The problem with your original design is that the "complete" column is equal to the row sum of all other columns, which makes it impossible to estimate all of the coefficients. For example, if I had a coefficient of 1 for "complete" and zero for everything else, I would get the same fitted values as if I had a coefficient of zero for "complete" and 1 for everything else. There isn't a single solution, so limma just ignores some of the coefficients (in this case, the hr10 coefficient) to break the linear dependencies.

Thank you Aaron. You made it clear. Another question, in the contrasts division by 4 is because of comparing 1 group vs 4 groups?

Yes, you want to compare to the average of the four groups.