DESeq2 model matrix error : Levels or combinations of levels without any samples have resulted in column(s) of zeros in the model matrix.
2
0
Entering edit mode
Raj • 0
@raj-9784
Last seen 3.2 years ago
USA

I am using DESeq2 1.10.1. I am facing model matrix error with "Levels or combinations of levels without any samples have resulted in  column(s) of zeros in the model matrix". From the manual I understood that I have to manually edit a model matrix to remove columns with only zeros. The issue I am facing is creating DESeqDataSetFromHTSeqCount without the design because the design produces a model error. To work around I did use simple design to make a dataset. When I use the dataset in the DESeq with model provided to "full" produces following error.

 

error: inv(): matrix appears to be singular
Error: inv(): matrix appears to be singular

Please advise. 

 

deseq2 • 2.0k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 2 hours ago
United States
You will need to provide more information before I can advise you on what to do: What does the column data look like? What design are you trying to use? Best to provide all of the code you used. The sessionInfo()
ADD COMMENT
0
Entering edit mode
Raj • 0
@raj-9784
Last seen 3.2 years ago
USA

Thank you very much for the response. Please find my codes below,

# Twelve different mouse was used to find DEG between four somite stages

mouse.lr  = c("one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve")
stage.lr    = c(rep("3L",3), rep("4L",3), rep("5L",3), rep("6L",3))

# HTSeq count files

S3L1.fl = "../HTseq_files/v7_3L1.dat.gene"; S3L2.fl = "../HTseq_files/v7_3L2.dat.gene"; S3L3.fl = "../HTseq_files/v7_3L3.dat.gene"
S4L1.fl = "../HTseq_files/v7_4L1.dat.gene"; S4L2.fl = "../HTseq_files/v7_4L2.dat.gene"; S4L3.fl = "../HTseq_files/v7_4L3.dat.gene"
S5L2.fl = "../HTseq_files/v7_5L2.dat.gene"; S5L3.fl = "../HTseq_files/v7_5L3.dat.gene"; S5L4.fl = "../HTseq_files/v7_5L4.dat.gene"
S6L2.fl = "../HTseq_files/v7_6L2.dat.gene"; S6L3.fl = "../HTseq_files/v7_6L3.dat.gene"; S6L4.fl = "../HTseq_files/v7_6L4.dat.gene"

S3R1.fl = "../HTseq_files/v7_3R1.dat.gene"; S3R2.fl = "../HTseq_files/v7_3R2.dat.gene"; S3R3.fl = "../HTseq_files/v7_3R3.dat.gene"
S4R1.fl = "../HTseq_files/v7_4R1.dat.gene"; S4R2.fl = "../HTseq_files/v7_4R2.dat.gene"; S4R3.fl = "../HTseq_files/v7_4R3.dat.gene"
S5R2.fl = "../HTseq_files/v7_5R2.dat.gene"; S5R3.fl = "../HTseq_files/v7_5R3.dat.gene"; S5R4.fl = "../HTseq_files/v7_5R4.dat.gene"
S6R2.fl = "../HTseq_files/v7_6R2.dat.gene"; S6R3.fl = "../HTseq_files/v7_6R3.dat.gene"; S6R4.fl = "../HTseq_files/v7_6R4.dat.gene"

# Data frame

RNASeqDesign2 = data.frame(
sample.names= c("3L1", "3L2", "3L3", "4L1", "4L2", "4L3", "5L2", "5L3", "5L4", "6L2", "6L3", "6L4",
        "3R1", "3R2", "3R3", "4R1", "4R2", "4R3", "5R2", "5R3", "5R4", "6R2", "6R3", "6R4"),
count.files = c(S3L1.fl, S3L2.fl, S3L3.fl, S4L1.fl, S4L2.fl, S4L3.fl, S5L2.fl, S5L3.fl, S5L4.fl, S6L2.fl, S6L3.fl, S6L4.fl,
        S3R1.fl, S3R2.fl, S3R3.fl, S4R1.fl, S4R2.fl, S4R3.fl, S5R2.fl, S5R3.fl, S5R4.fl, S6R2.fl, S6R3.fl, S6R4.fl),
condition   = c(rep("3left",3),  rep("4left",3),  rep("5left",3),  rep("6left",3),
        rep("3right",3), rep("4right",3), rep("5right",3), rep("6right",3)),
mouse       = c(rep(mouse.lr,2)),
side        = c(rep("left",12),rep("right",12)),
stage       = c(rep(stage.lr,2)))

# our design is complex (pasted below) which gives Error in checkFullRank(modelMatrix) : 
# ddsHTSeq2<-DESeqDataSetFromHTSeqCount(RNASeqDesign2, directory="/data/home/praveen/Pn_data/LR_RNA-seq/DEG_deseq2/",design=~ condition + condition:stage +condition:side+condition:mouse)
#  the model matrix is not full rank, so the model cannot be fit as specified.
#  Levels or combinations of levels without any samples have resulted in
#  column(s) of zeros in the model matrix.
# hence I use simple condition so as to make DESeqDataSet and then manually create and edit model.matrix
ddsHTSeq2<-DESeqDataSetFromHTSeqCount(RNASeqDesign2, directory="/data/home/praveen/Pn_data/LR_RNA-seq/DEG_deseq2/",design=~ condition)
colData(ddsHTSeq2)$condition<-factor(colData(ddsHTSeq2)$condition,levels=c("3left","4left","5left","6left", "3right","4right","5right","6right"))

#creating the model matrix
mm = model.matrix(~ condition + condition:mouse +condition:side,RNASeqDesign2)
mm.2 = mm[,colSums(mm^2)!=0]

# This throws the error

dds.all.2<-DESeq(ddsHTSeq2,full=mm.2,betaPrior=FALSE)

 

error: inv(): matrix appears to be singular
Error: inv(): matrix appears to be singular

#============================SESSION INFO===================================

R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] DESeq2_1.10.1              RcppArmadillo_0.6.500.4.0 
 [3] Rcpp_0.12.3                SummarizedExperiment_1.0.2
 [5] Biobase_2.30.0             GenomicRanges_1.22.4      
 [7] GenomeInfoDb_1.6.3         IRanges_2.4.7             
 [9] S4Vectors_0.8.11           BiocGenerics_0.16.1       

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3          
 [4] XVector_0.10.0       futile.options_1.0.0 zlibbioc_1.16.0     
 [7] rpart_4.1-10         RSQLite_1.0.0        annotate_1.48.0     
[10] gtable_0.1.2         lattice_0.20-33      DBI_0.3.1           
[13] gridExtra_2.0.0      genefilter_1.52.1    cluster_2.0.3       
[16] locfit_1.5-9.1       grid_3.2.0           nnet_7.3-12         
[19] AnnotationDbi_1.32.3 XML_3.98-1.3         survival_2.38-3     
[22] BiocParallel_1.4.3   foreign_0.8-66       latticeExtra_0.6-28 
[25] Formula_1.2-1        geneplotter_1.48.0   ggplot2_2.0.0       
[28] lambda.r_1.1.7       Hmisc_3.17-2         scales_0.3.0        
[31] splines_3.2.0        colorspace_1.2-6     xtable_1.8-2        
[34] acepack_1.3-3.3      munsell_0.4.3       

 

 

ADD COMMENT
1
Entering edit mode
Let's step back a bit, and maybe we can avoid the technical error. What biological question are you trying to answer here and controlling for what? Stage and condition look similar, what's the difference?
ADD REPLY
0
Entering edit mode

Biological question is that we are trying to find DEG between left and right side of embryo in each of the 4 somite stages(3, 4, 5 &6). Yes I understand that stage and condition are similar. I can avoid the column stage. Am I right in using DESeqDataSetFromHTSeqCount with simple design and all?. I can finally do something like this to get DEG for each somite stage.

res.3 <- results(dds.all.2, contrast=c("condition","3left","3right"))
res.4 <- results(dds.all.2, contrast=c("condition","4left","4right"))
res.5 <- results(dds.all.2, contrast=c("condition","5left","5right"))
res.6 <- results(dds.all.2, contrast=c("condition","6left","6right"))

 

Please let me know.

EDIT

confounding factors are different mouse and that there are just two sides among stages

 

 

ADD REPLY
0
Entering edit mode

Yes you can use a design of ~condition and use these comparisons to compare for each somite stage.

It's not so easy to control for the mouse effects using DESeq2 in this particular experimental design, if I've figured it out from your code. An easier approach would be to use the duplicateCorrelation() function in the limma RNA-seq pipeline, to inform which samples are from the same mouse.

ADD REPLY
0
Entering edit mode

Thank you so much. 

ADD REPLY

Login before adding your answer.

Traffic: 939 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6