Question: DESeq2: how to construct the design matrix?
0
4.6 years ago by
taf0
Germany
taf0 wrote:

Hello,

I have some questions concerning the construction of design matrices within the DESeq2 package.\\
I constructed the following design matrix, which contains 3 different treatments (T1,T2,T3) in duplicates and 4 controls (ctrl).

designMat <- rbind( c(1,0,0,0), c(1,0,0,0), c(0,1,0,0), c(0,1,0,0), c(0,0,1,0), c(0,0,1,0), c(0,0,0,1), c(0,0,0,1), c(0,0,0,1), c(0,0,0,1) )
rownames(designMat) <- 1:10
colnames(designMat) <- c("T1","T2","T3","ctrl")
print(designMat)

I know that it is easily possible to convert it into a "colData" data frame, which DESeq2 "understands".:

colData <- mat.or.vec(nrow(designMat), 1)
colData <- as.data.frame(colData)
colnames(colData) <- "condition"
for(i in 1:nrow(designMat))
{
colData[i,"condition"] <- colnames(designMat)[which(designMat[i,]==1)]
}
print(colData)

And, indeed, if I apply the DESeq work flow to some sampled data and I have a look at the model matrix of the DESeq2 object I get "my" design matrix.



lambda0 <- 1000
T1 <- matrix(rpois(2000,lambda=lambda0), ncol=2)
T2 <- matrix(rpois(2000,lambda=2*lambda0), ncol=2)
T3 <-  matrix(rpois(2000,lambda=4*lambda0), ncol=2)
ctrl <-  matrix(rpois(4000,lambda=0.5*lambda0), ncol=4)

data <- cbind(T1,T2,T3,ctrl)
colnames(data) <- c(rep("T1",2),rep("T2",2),rep("T3",2),rep("ctrl",4))

## create DESeq data sets and check the model matrices
dds1 <- DESeqDataSetFromMatrix(countData = data, colData =colData , design =  ~ condition  )
dds1 <- estimateSizeFactors(dds1)
dds1 <- estimateDispersions(dds1)
dds1 <- nbinomWaldTest(dds1)
print(attr(dds1, "modelMatrix"))

With the R package limma I could deal with that, too. And if I would apply model.matrix() function to my scenario I would come up with

print(model.matrix(~T1 +T2 + T3 +ctrl , designMat))

But in DESeq I get an error message, if I try to apply that formula.:

dds2 <- DESeqDataSetFromMatrix(countData = data, colData =as.data.frame(designMat) , design =  ~ T1 +T2 + T3 + ctrl )

My Problem is, that my design is more complex than that.
It also contains combinations of treatments T1, T2 and T3.
And I would also like to "use" these combinations within the linear model to improve my estimates through a better estimation of variance (like limma() does).

Is there any possibility to pass the design matrix directly to a DESeq2 object?
Is it possible in general to deal with designs like, e.g.:

print(rbind(designMat.f,c(1,1,0,0), c(0,1,1,0)))


in DESeq2?

Best regards,

Franziska

sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[4] GenomicRanges_1.18.1    GenomeInfoDb_1.2.2      IRanges_2.0.0
[7] S4Vectors_0.4.0         BiocGenerics_0.12.0

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.28.1 BBmisc_1.8           BatchJobs_1.5
[4] Biobase_2.26.0       BiocParallel_1.0.0   DBI_0.3.1
[7] Formula_1.1-2        Hmisc_3.14-5         MASS_7.3-35
[10] RColorBrewer_1.0-5   RSQLite_1.0.0        XML_3.98-1.1
[13] XVector_0.6.0        acepack_1.3-3.3      annotate_1.44.0
[16] base64enc_0.1-2      brew_1.0-6           checkmate_1.5.0
[19] cluster_1.15.3       codetools_0.2-9      colorspace_1.2-4
[22] digest_0.6.4         fail_1.2             foreach_1.4.2
[25] foreign_0.8-61       genefilter_1.48.1    geneplotter_1.44.0
[28] ggplot2_1.0.0        grid_3.1.2           gtable_0.1.2
[31] iterators_1.0.7      lattice_0.20-29      latticeExtra_0.6-26
[34] locfit_1.5-9.1       munsell_0.4.2        nnet_7.3-8
[37] plyr_1.8.1           proto_0.3-10         reshape2_1.4
[40] rpart_4.1-8          scales_0.2.4         sendmailR_1.2-1
[43] splines_3.1.2        stringr_0.6.2        survival_2.37-7
[46] tools_3.1.2          xtable_1.7-4

modified 4.6 years ago by Michael Love26k • written 4.6 years ago by taf0
Answer: DESeq2: how to construct the design matrix?
0
4.6 years ago by
Michael Love26k
United States
Michael Love26k wrote:

Hi Franziska,

Maybe this solves your problem most easily: in DESeq2 v1.8 which was released this week, we provided support for users to use a custom model matrix / design matrix, which you can provide to the DESeq() function. See the top NEWS item:

http://bioconductor.org/packages/release/bioc/news/DESeq2/NEWS

You would provide the matrix to the 'full' argument of DESeq(). You need to specify betaPrior=FALSE, though because the routines which calculate the prior use the information in the design formula.

You suggested you want to use this design:

"print(model.matrix(~T1 +T2 + T3 +ctrl , designMat))"

If you wrap designMat in data.frame(), then you get a rank-deficient design matrix. The one you likely would supply to limma also has a + 0, to remove the intercept term. You can use this design with DESeq2 (but you need to turn off the betaPrior as well, because it doesn't make sense to shrink the betas zero in this case -- they do not represent group fold changes anymore.)

But maybe the point above will allow you to work with the design matrices you want to use.