SVA methods for estimating surrogate variables in RNA-seq data
Hiya,

I have RNA-seq data for 105 samples, in two expression files: (i) the gene counts matrix (rows genes, columns samples) and (ii) the VST-normalised matrix from DESeq2 (rows genes, columns samples).

The full and null models are:

pheno_LCA$Cat <- factor(pheno_LCA$Cat, levels = c("A","B","C","D"))
full_mod = model.matrix(~as.factor(Cat), data=pheno_LCA)
null_mod = model.matrix(~1, data=pheno_LCA)

I ran each of the following four calls on both (i) and (ii), with LCA being the expression matrix in each case:

svaobj = svaseq(LCA,full_mod,null_mod,n.sv=NULL,numSVmethod="be",B=20)
svaobj = svaseq(LCA,full_mod,null_mod,n.sv=NULL,numSVmethod="leek")
svaobj = sva(LCA,full_mod,null_mod,n.sv=NULL,numSVmethod="be",B=20)
svaobj = sva(LCA,full_mod,null_mod,n.sv=NULL,numSVmethod="leek")
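For comparing only the estimated number of surrogate variables across the two methods, a helper along these lines may be useful. It is a minimal sketch, not a definitive implementation: `count_svs` is a hypothetical name, and it calls `num.sv()` from the sva package directly rather than running the full iterative fit. Here `B` is passed to `num.sv()` as its number of permutations and `seed` is an arbitrary choice.

```r
# Hypothetical helper: estimate the number of surrogate variables with both
# num.sv() methods, reporting (rather than stopping on) any failure.
count_svs <- function(expr, mod, B = 20, seed = 1) {
  expr <- as.matrix(expr)
  sapply(c(be = "be", leek = "leek"), function(m) {
    tryCatch(
      sva::num.sv(expr, mod, method = m, B = B, seed = seed),
      error = function(e) {
        # Print the failure and record NA for this method
        message("num.sv failed for method '", m, "': ", conditionMessage(e))
        NA_integer_
      }
    )
  })
}
```

With the objects above, `count_svs(LCA, full_mod)` would return a named vector with one entry per method. For raw counts, `count_svs(log(LCA + 1), full_mod)` is intended to mimic the log transform that svaseq appears to apply by default; that is an assumption worth checking against the installed version.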

RESULTS:

A/ sva function on counts data:

(i) Method = Leek, Number of significant surrogate variables is: 101
Iteration (out of 5 ):Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning message:
In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced

(ii) Method = Be, Number of significant surrogate variables is: 1

B/ sva function on VST DESeq2 output data:

(i) Method = Leek, Number of significant surrogate variables is: 1

(ii) Method = Be, Number of significant surrogate variables is: 15

A/ svaseq function on counts data:

(i) Method = Leek, Number of significant surrogate variables is: 2

(ii) Method = Be, Number of significant surrogate variables is: 8

B/ svaseq function on VST DESeq2 output data:

(i) Method = Leek, No significant surrogate variables

(ii) Method = Be, Number of significant surrogate variables is: 18

My questions are:

1. Am I correct in thinking that normalised counts (i.e. VST from DESeq2) should be used with svaseq for RNA-seq gene expression data, and therefore that the second B/ (ii) result (svaseq on VST data, method "be", 18 surrogate variables) is the correct output to take forward?

2. What is causing the error in the first A/ (i) result (sva on counts data, method "leek")? I can't find a reason for it; there are no rows in the counts matrix that sum to zero.

3. Is there anywhere that the methods "leek" and "be" are described and contrasted? I couldn't find this in the manual. Also, how does one choose what "B" should be? I just used 20, as in the example I found.

best wishes, Bex

