Hi,
We have an experimental design where 8 chemicals plus control were applied at 4 concentrations with and without estrogen (called E2, either a 1 or a 0). We wish to control for lane (i.e. batch, three total). Given this design, the design statement would look like this:
design = ~lane + E2 + chemical+ +chemical:E2 + E2:Conc_uM + E2:chemical:Conc_uM
Problematically however, the eight chemicals were paired off, and share controls between them (see https://support.bioconductor.org/p/96254/#96722). For example two chemicals, PFOA and Tamoxifen, share controls. This led to some collinearity issues, where we were getting the "model matrix not full rank" error.
To address this, we are trying to implement Michael Love's recommendation to remove the "chemical" term, by instructing the program that there is no difference between PFOA and Tam at concentration=0. I believe we've found a solution that should work.
We've found that running DESeq(dds) works fine, but the command DESeq(ddsColl) (obviously run after collapsing replicates with no errors) yields the following:
Error in hatmatrix %*% t(y) : non-conformable arguments
The column data looks like this:
lane chemical Conc_uM E2 Technical_rep plate PL1A_B02 4 Tam 10 0 1 PL1_A PL1A_B03 4 Tam 1 0 1 PL1_A PL1A_B04 4 Tam 0.1 0 1 PL1_A PL1A_B05 4 Tam 0.001 0 1 PL1_A PL1A_B06 4 control 0 0 1 PL1_A PL1A_B07 4 PFOA 10 0 1 PL1_A PL1A_B08 4 PFOA 1 0 1 PL1_A PL1A_B09 4 PFOA 0.1 0 1 PL1_A PL1A_B10 4 PFOA 0.001 0 1 PL1_A PL1A_C02 4 Tam 10 0 2 PL1_A PL1A_C03 4 Tam 1 0 2 PL1_A PL1A_C04 4 Tam 0.1 0 2 PL1_A PL1A_C05 4 Tam 0.001 0 2 PL1_A PL1A_C06 4 control 0 0 2 PL1_A PL1A_C07 4 PFOA 10 0 2 PL1_A PL1A_C08 4 PFOA 1 0 2 PL1_A PL1A_C09 4 PFOA 0.1 0 2 PL1_A PL1A_C10 4 PFOA 0.001 0 2 PL1_A
Here is the code:
#Addressing the E2:chemical term, when conc is a factor design_temp <- model.matrix(~ lane + E2 + E2:chemical + E2:chemical:Conc_uM, GroupA_colData) zero.cols <- colnames(design_temp)[colSums(design_temp) == 0] design <- design_temp[, !(colnames(design_temp) %in% zero.cols)] colnames(design) PFOA <- design[,grepl("PFOA", colnames(design)) & grepl("E21", colnames(design))] # Remove the "chemical" term design_temp2 <- model.matrix(~ lane + E2 + E2:chemical:Conc_uM, GroupA_colData) #removed zero columns design2 <- design_temp2[,!(grepl("control|Conc_uM0", colnames(design_temp2)))] dds <- DESeqDataSetFromMatrix(countData = countData_matrix, colData = GroupA_colData, design= ~ 1)
I find that this works fine:
#Differential gene expression dds = DESeq(dds, full=design2, betaPrior=FALSE)
However, this yields an error:
ddsColl = DESeq(ddsColl, full=design2, betaPrior=FALSE)
Previous posts to this forum have shown a similar problem but didn't post a solution (http://seqanswers.com/forums/showthread.php?t=60614). Any recommendations or ways to think about this?
Thanks so much,
Rachel