Hello,

I have a question about computationally singular matrices in DESeq and surrogate variables. I am analyzing paired samples (before/after treatment within a sample). After including all the SVs in my design formula, the DESeq function returns an error saying that my matrix is computationally singular. When I reduce the number of surrogate variables to 3, based on the n.sv() function using the "be" method, I no longer receive this error and am able to run the analysis. Why does reducing the number of SVs in my design formula allow the analysis to run?

dds <- DESeqDataSetFromMatrix(countData = countdata, colData = phenotype, design = ~ study_id + SV1 + SV2 + SV3 + SV4 + SV5 + SV6 + SV7 + SV8 + SV9 + SV10 + SV11 + treatment)
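One way to look at a design like this before calling DESeq is to inspect the model matrix directly. The sketch below uses base R only and toy data: the six subjects and the SV1, SV2, and SV3 values are made up for illustration and are not the poster's data. It assumes the "computationally singular" message ultimately comes from a linear solve on a badly conditioned matrix, which is what base R's solve() reports. In a paired design, any SV that is constant (or nearly constant) within each study_id is an exact (or near) linear combination of the study_id columns.

```r
# Toy stand-in for colData: 6 subjects, each sampled before and after.
set.seed(1)
phenotype <- data.frame(
  study_id  = factor(rep(paste0("id", 1:6), each = 2)),
  treatment = factor(rep(c("before", "after"), times = 6),
                     levels = c("before", "after"))
)
subject_effect <- rep(rnorm(6), each = 2)   # constant within each subject

# Made-up surrogate variables (assumptions for illustration only):
phenotype$SV1 <- rnorm(12)                               # varies freely
phenotype$SV2 <- subject_effect                          # exactly subject-level
phenotype$SV3 <- subject_effect + rnorm(12, sd = 1e-4)   # almost subject-level

# Summarise a design: column count, numerical rank, and condition number.
check_design <- function(formula, data) {
  mm <- model.matrix(formula, data = data)
  c(columns   = ncol(mm),
    rank      = qr(mm)$rank,
    condition = kappa(mm, exact = TRUE))
}

ok    <- check_design(~ study_id + SV1 + treatment, phenotype)
exact <- check_design(~ study_id + SV1 + SV2 + treatment, phenotype)
near  <- check_design(~ study_id + SV1 + SV3 + treatment, phenotype)

print(rbind(ok, exact, near))
```

In this toy setup the design with SV2 has fewer independent columns than it has columns, while the design with SV3 is nearly dependent and shows up as a much larger condition number than the baseline design. Dropping the offending SVs removes the dependence, which is one plausible reason a shorter SV list runs.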

I'm not getting a single output. It's a matrix of the pairwise study IDs and their respective correlations. Because I'm looking at before/after treatment within the same sample, these pairwise correlations are 1. Not all of them, just the ones within the same study ID.

I was expecting you would get a matrix of pairwise correlations between all the covariates in the design. There shouldn't be any covariates with correlation 1. Can you give an example?

(Intercept)  1           NA          NA          NA          NA          NA          NA          NA          NA            NA
studyid121  NA  1.000000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.0303030303
studyid123  NA -0.030303030  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.0303030303
studyid124  NA -0.030303030 -0.03030303  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.0303030303
studyid125  NA -0.030303030 -0.03030303 -0.03030303  1.00000000 -0.03030303 -0.03030303 -0.03030303 -0.03030303 -0.0303030303

Warning message:
In cor(model.matrix(design(dds), colData(dds))) :
  the standard deviation is zero

This is a portion of the matrix, and this is after the surrogate variables have been reduced to 3 (after using the n.sv(method = "be") in svaseq). I'm confused as to why this is happening. I suspect I may have set up my design formula incorrectly?

Oh, the diagonal is always 1, that's the correlation of a variable with itself. The concern would be off-diagonal high correlations.
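A minimal sketch of how the off-diagonal check could be done in base R follows. The data are toy values, and SV2 is deliberately built to nearly duplicate SV1 so that one pair is flagged; the 0.9 cutoff is an arbitrary choice for illustration, not a recommended threshold. The intercept column is dropped first because it is constant, which is what produces the "standard deviation is zero" warning and the NA row and column in the output above.

```r
set.seed(2)
phenotype <- data.frame(
  study_id  = factor(rep(paste0("id", 1:6), each = 2)),
  treatment = factor(rep(c("before", "after"), times = 6),
                     levels = c("before", "after"))
)
phenotype$SV1 <- rnorm(12)
# Hypothetical SV2 that nearly duplicates SV1, so one pair gets flagged.
phenotype$SV2 <- phenotype$SV1 + rnorm(12, sd = 0.05)

mm <- model.matrix(~ study_id + SV1 + SV2 + treatment, data = phenotype)

# Drop the constant intercept column before computing correlations.
cc <- cor(mm[, colnames(mm) != "(Intercept)"])

# Two study_id indicator columns in a balanced paired design with n samples
# have correlation -2 / (n - 2); here n = 12.
dummy_cor <- cc["study_idid2", "study_idid3"]

# Keep only the upper triangle: the diagonal is always 1 and the lower
# triangle repeats the upper one.
cc[lower.tri(cc, diag = TRUE)] <- NA
high <- which(abs(cc) > 0.9, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cc)[high[, "row"]],
  var2 = colnames(cc)[high[, "col"]],
  cor  = cc[high]
)
print(high_pairs)
```

The small negative values between study_id columns are expected for indicator variables and are not a sign of trouble. By the same formula, -0.0303 would correspond to 68 samples, although the thread does not state the sample size.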

So there is no concern with the output of this correlation? Were the 11 SVs just extremely correlated with the variation across the samples? Was the model overcorrecting?

I don't know, but I would guess that some of those were too highly correlated with some other covariates.

Thank you. It does seem that the first couple of SVs were highly correlated with cell type proportions.
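For anyone wanting to run a similar check, a rough sketch is to regress each SV on the known covariate and look at the R-squared. Everything here is simulated: `cell_prop` is a hypothetical cell type proportion, SV1 is constructed to track it, and SV2 is constructed to be unrelated. With real data, the SV matrix would come from the sva output and the proportions from the sample annotation.

```r
set.seed(3)
n <- 24

# Hypothetical cell type proportion per sample (not from the thread).
cell_prop <- runif(n, min = 0.2, max = 0.8)

# Simulated surrogate variables (assumptions for illustration only).
svs <- cbind(
  SV1 = 2 * cell_prop + rnorm(n, sd = 0.05),  # tracks cell_prop closely
  SV2 = rnorm(n)                              # unrelated to cell_prop
)

# R-squared from regressing each SV on the known covariate.
r2 <- apply(svs, 2, function(sv) summary(lm(sv ~ cell_prop))$r.squared)
print(round(r2, 3))
```

An SV with an R-squared close to 1 carries almost the same information as the covariate, so including both in one design would add a nearly redundant column.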