1
0
Entering edit mode
nashchem • 0
@nashchem-11188
Last seen 8.0 years ago

Hi,

I am working with microarray data and its experimental design is a bit strange. I have 20 samples (expression at 5 time points, at 1 hr with 3 replicates, 2 hr with 4 replicates, 3 hr with 3 replicates, 6hr with 6 replicates and 10hr with 4 replicates).

I want to find the DEGs at each time point vs rest of time points.

For instance, DEGs at 1 hr vs rest of all time points (2, 3, 6, 10)

DEGs at 2 hr vs rest of all time points (1, 3, 6, 10)

DEGs at 3 hr vs rest of all time points (1, 2, 6, 10) etc..

My design matrix is as follows:

xx <- c(paste0(rep("1hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("2hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)), paste0(rep("3hr", 3), sep = ":", paste0(rep("rep", 3), 1:3)), paste0(rep("6hr", 6), sep = ":", paste0(rep("rep", 6), 1:6)), paste0(rep("10hr", 4), sep = ":", paste0(rep("rep", 4), 1:4)))

design <- cbind(complete=1, hr1=c(rep(1, 3), rep(0, 17)), hr2=c(rep(0, 3), rep(1, 4), rep(0, 13)), hr3=c(rep(0, 7), rep(1, 3), rep(0, 10)), hr6=c(rep(0, 10), rep(1, 6), rep(0, 4)), hr10=c(rep(0, 16), rep(1, 4)))

rownames(design) <- xx

My design matrix looks like this:

complete hr1 hr2 hr3 hr6 hr10
1hr:rep1         1   1   0   0   0    0
1hr:rep2         1   1   0   0   0    0
1hr:rep3         1   1   0   0   0    0
2hr:rep1         1   0   1   0   0    0
2hr:rep2         1   0   1   0   0    0
2hr:rep3         1   0   1   0   0    0
2hr:rep4         1   0   1   0   0    0
3hr:rep1         1   0   0   1   0    0
3hr:rep2         1   0   0   1   0    0
3hr:rep3         1   0   0   1   0    0
6hr:rep1         1   0   0   0   1    0
6hr:rep2         1   0   0   0   1    0
6hr:rep3         1   0   0   0   1    0
6hr:rep4         1   0   0   0   1    0
6hr:rep5         1   0   0   0   1    0
6hr:rep6         1   0   0   0   1    0
10hr:rep1        1   0   0   0   0    1
10hr:rep2        1   0   0   0   0    1
10hr:rep3        1   0   0   0   0    1
10hr:rep4        1   0   0   0   0    1

Is this design matrix correct as per my DEGs comparison?

Moreover, I am getting this warning message: " Coefficients not estimable: hr10
Warning message:
Partial NA coefficients for 20214 probe(s)  ".

Can anyone help me what's wrong with my design matrix?

Thank you.

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Matrix_1.2-6       matrixStats_0.50.2 ggplot2_2.1.0      dplyr_0.5.0        readxl_0.1.1
[6] limma_3.28.15

loaded via a namespace (and not attached):
[1] Rcpp_0.12.6      lattice_0.20-33  digest_0.6.9     assertthat_0.1   grid_3.3.1
[6] R6_2.1.2         plyr_1.8.4       gtable_0.2.0     DBI_0.4-1        magrittr_1.5
[11] scales_0.4.0     lazyeval_0.2.0   labeling_0.3     tools_3.3.1      munsell_0.4.3
[16] colorspace_1.2-6 tibble_1.1

limma microarray designmatrix • 1.4k views
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 14 hours ago
The city by the bay

Or, more simply:

timepoints <- factor(paste0("hr", rep(c(1,2,3,6,10), c(3,4,3,6,4))))
design <- model.matrix(~0 + timepoints)
colnames(design) <- levels(timepoints)

... and then use contrasts.fit to compare between time points, e.g., for hr1 against the rest:

con <- makeContrasts(hr1 - (hr2+hr3+hr6+hr10)/4, levels=design)

The problem with your original design is that the "complete" column is equal to the row sum of all other columns, which makes it impossible to estimate all of the coefficients. For example, if I had a coefficient of 1 for "complete" and zero for everything else, I would get the same fitted values as if I had a coefficient of zero for "complete" and 1 for everything else. There isn't a single solution, so limma just ignores some of the coefficients (in this case, the hr10 coefficient) to break the linear dependencies.

0
Entering edit mode

Thank you Aaron. You made it clear. Another question, in the contrasts division by 4 is because of comparing 1 group vs 4 groups?

0
Entering edit mode

Yes, you want to compare to the average of the four groups.

0
Entering edit mode

Thank you Aaron.