Use surrogate variables on DESeq2
2
0
Entering edit mode
aristotele_m ▴ 40
@aristotele_m-6821
Last seen 6.9 years ago
Italy

Hi!

I have this problem. I want to model a surrogate variable on my data. I have two  two type of tumors but the PCA can not  show the separation between that.

I try to use sva as describe in vignette  but I found this error:

 

dat <- counts(dds, normalized=TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]
mod <- model.matrix(~ condition, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
n.sv = num.sv(dat, mod0,method="leek")
svseq <- svaseq(dat, mod, mod0, n.sv=3)
png("Sva1.png")
plot(svseq$sv[,1], svseq$sv[,2], col=dds$condition, pch=16)
dev.off()
n.sv
svseq <- svaseq(dat, mod, mod0, n.sv=8)

This is the error and this is the previous image:

> dat <- counts(dds, normalized=TRUE)
> idx <- rowMeans(dat) > 1
> dat <- dat[idx,]
> mod <- model.matrix(~ condition, colData(dds))
> mod0 <- model.matrix(~ 1, colData(dds))
> n.sv = num.sv(dat, mod0,method="leek")
> svseq <- svaseq(dat, mod, mod0, n.sv=3)
Number of significant surrogate variables is:  3
Iteration (out of 5 ):1  2  3  4  5  
> png("Sva1.png")
> plot(svseq$sv[,1], svseq$sv[,2], col=dds$condition, pch=16)
> dev.off()
RStudioGD
        2
> n.sv
[1] 8
> svseq <- svaseq(dat, mod, mod0, n.sv=8)
Number of significant surrogate variables is:  8
Iteration (out of 5 ):
Error in solve.default(t(mod) %*% mod) :
  system is computationally singular: reciprocal condition number = 5.89573e-19

 session_info(
+ )
Session info ----------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.1 (2015-06-18)
 system   i686, linux-gnu             
 ui       RStudio (0.99.441)          
 language en_US                       
 collate  en_US.UTF-8                 
 tz       <NA>                        

Packages --------------------------------------------------------------------------------------------------------------------------
 package        * version     date       source                           
 acepack          1.3-3.3     2014-11-24 CRAN (R 3.2.0)                   
 annotate       * 1.46.1      2015-08-06 Bioconductor                     
 AnnotationDbi  * 1.30.1      2015-06-05 Bioconductor                     
 Biobase        * 2.28.0      2015-06-05 Bioconductor                     
 BiocGenerics   * 0.14.0      2015-06-05 Bioconductor                     
 BiocParallel     1.2.20      2015-08-17 Bioconductor                     
 BioJob         * 0.1         2015-08-19 local                            
 biomaRt        * 2.24.0      2015-06-05 Bioconductor                     
 bitops           1.0-6       2013-08-17 CRAN (R 3.0.2)                   
 calibrate      * 1.7.2       2013-09-10 CRAN (R 3.2.1)                   
 caTools          1.17.1      2014-09-10 CRAN (R 3.2.0)                   
 cluster          2.0.3       2015-07-21 CRAN (R 3.2.1)                   
 colorspace       1.2-6       2015-03-11 CRAN (R 3.1.3)                   
 curl             0.9.2       2015-08-08 CRAN (R 3.2.1)                   
 DBI            * 0.3.1       2014-09-24 CRAN (R 3.1.1)                   
 DESeq2         * 1.8.1       2015-06-05 Bioconductor                     
 devtools       * 1.8.0       2015-05-09 CRAN (R 3.2.1)                   
 digest           0.6.8       2014-12-31 CRAN (R 3.1.3)                   
 edgeR          * 3.10.2      2015-08-18 Bioconductor                     
 foreign          0.8-66      2015-08-19 CRAN (R 3.2.1)                   
 Formula          1.2-1       2015-04-07 CRAN (R 3.2.0)                   
 futile.logger    1.4.1       2015-04-20 CRAN (R 3.2.0)                   
 futile.options   1.0.0       2010-04-06 CRAN (R 3.2.0)                   
 gdata            2.17.0      2015-07-04 CRAN (R 3.2.1)                   
 genefilter     * 1.50.0      2015-06-05 Bioconductor                     
 geneplotter      1.46.0      2015-06-05 Bioconductor                     
 GenomeInfoDb   * 1.4.2       2015-08-17 Bioconductor                     
 GenomicRanges  * 1.20.5      2015-06-22 Bioconductor                     
 ggplot2        * 1.0.1       2015-03-17 CRAN (R 3.1.3)                   
 git2r            0.11.0      2015-08-12 CRAN (R 3.2.1)                   
 GO.db          * 3.1.2       2015-06-05 Bioconductor                     
 gplots         * 2.17.0      2015-05-02 CRAN (R 3.2.0)                   
 gridExtra        2.0.0       2015-07-14 CRAN (R 3.2.1)                   
 gtable           0.1.2       2012-12-05 CRAN (R 3.1.0)                   
 gtools           3.5.0       2015-05-29 CRAN (R 3.2.0)                   
 Hmisc            3.16-0      2015-04-30 CRAN (R 3.2.0)                   
 IRanges        * 2.2.7       2015-08-17 Bioconductor                     
 KernSmooth       2.23-15     2015-06-29 CRAN (R 3.2.1)                   
 labeling         0.3         2014-08-23 CRAN (R 3.1.1)                   
 lambda.r         1.1.7       2015-03-20 CRAN (R 3.2.0)                   
 lattice          0.20-33     2015-07-14 CRAN (R 3.2.1)                   
 latticeExtra     0.6-26      2013-08-15 CRAN (R 3.2.0)                   
 limma          * 3.24.15     2015-08-17 Bioconductor                     
 locfit           1.5-9.1     2013-04-20 CRAN (R 3.1.0)                   
 magrittr         1.5         2014-11-22 CRAN (R 3.1.3)                   
 MASS           * 7.3-43      2015-07-16 CRAN (R 3.2.1)                   
 Matrix           1.2-2       2015-07-08 CRAN (R 3.2.1)                   
 memoise          0.2.1       2014-04-22 CRAN (R 3.2.1)                   
 mgcv           * 1.8-7       2015-07-23 CRAN (R 3.2.1)                   
 munsell          0.4.2       2013-07-11 CRAN (R 3.1.0)                   
 nlme           * 3.1-122     2015-08-19 CRAN (R 3.2.1)                   
 nnet             7.3-10      2015-06-29 CRAN (R 3.2.1)                   
 Nozzle.R1      * 1.1-1       2015-07-31 local                            
 org.Hs.eg.db   * 3.1.2       2015-06-05 Bioconductor                     
 pheatmap       * 1.0.7       2015-07-02 CRAN (R 3.2.1)                   
 plyr             1.8.3       2015-06-12 CRAN (R 3.2.0)                   
 proto            0.3-10      2012-12-22 CRAN (R 3.1.0)                   
 rafalib        * 1.0.0       2015-08-19 Github (ririzarr/rafalib@99a13e7)
 RColorBrewer   * 1.1-2       2014-12-07 CRAN (R 3.1.2)                   
 Rcpp           * 0.12.0      2015-07-25 CRAN (R 3.2.1)                   
 RcppArmadillo  * 0.5.400.2.0 2015-08-17 CRAN (R 3.2.1)                   
 RCurl            1.95-4.7    2015-06-30 CRAN (R 3.2.1)                   
 reshape2         1.4.1       2014-12-06 CRAN (R 3.1.2)                   
 roxygen2       * 4.1.1       2015-04-15 CRAN (R 3.2.1)                   
 rpart            4.1-10      2015-06-29 CRAN (R 3.2.1)                   
 RSQLite        * 1.0.0       2014-10-25 CRAN (R 3.1.1)                   
 rversions        1.0.2       2015-07-13 CRAN (R 3.2.1)                   
 S4Vectors      * 0.6.3       2015-08-06 Bioconductor                     
 scales           0.2.5       2015-06-12 CRAN (R 3.2.0)                   
 stringi          0.5-5       2015-06-29 CRAN (R 3.2.1)                   
 stringr          1.0.0       2015-04-30 CRAN (R 3.1.3)                   
 survival         2.38-3      2015-07-02 CRAN (R 3.2.1)                   
 sva            * 3.14.0      2015-06-22 Bioconductor                     
 XML            * 3.98-1.3    2015-06-30 CRAN (R 3.2.1)                   
 xml2             0.1.1       2015-06-02 CRAN (R 3.2.1)                   
 xtable           1.7-4       2014-09-12 CRAN (R 3.1.1)                   
 XVector          0.8.0       2015-06-05 Bioconductor 

Any idea what can I do?

deseq2 sva deseq differential gene expression • 3.0k views
ADD COMMENT
0
Entering edit mode
Jeff Leek ▴ 650
@jeff-leek-5015
Last seen 3.1 years ago
United States

It looks like when you use the estimated number of surrogate variables (8) rather than the number you tried originally (3) the model matrix becomes singular. This often happens for one of two reasons:

(1) The sample size is relatively small so you end up with too many surrogate variables leading to a computationally singular matrix. This appears likely to be the case because based on the PCA plot you have something like a sample size of 9 and 8 surrogate variables are getting estimated. Here you can either just go with a smaller number (like the 3 you chose or some other number based on "elbow" in the scree plot) or you could try method="be" which is often a bit more conservative. 

(2) This probably isn't the reason for your case, but when unmodeled confounders like batch and the tumor groups are highly confounded you can often see that the model matrix is poorly conditioned leading to that error when the surrogate variables are estimated. 

 

Best

Jeff

ADD COMMENT
0
Entering edit mode
szenitha ▴ 20
@szenitha-10863
Last seen 7.4 years ago

When  the model matrix becomes singular how do we know which of the two reasons to consider? Also should we always consider all the estimted surrogates or the number of surrogates suggested when we use method=be?is there a way to know which surrogates explain most of the variation?

ADD COMMENT

Login before adding your answer.

Traffic: 609 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6