Question: surrogate variables from sva independent of model variables?
1
gravatar for pixie
2.4 years ago by
pixie10
pixie10 wrote:

Hi,

I'm trying to get understand the full vs null model specifications in the sva package.
My understanding is the following:

full model = Variable-of-interest + all confounders
null model = all confounders

The resulting SVs should then capture noise that is not captured by my confounders. Is this correct?
What I observe though is that when I run sva, the resulting SVs relate stronger to the confounders, when these were included in the model, vs a scenario, where I didn't include the confounders in the models.

Below a reproducible example using the bladderbatch data. "batch" is in this case a measured confounder (not my variable-of-interest).

library(sva)
library(bladderbatch)
data(bladderdata)
library(pamr)
library(limma)
pheno = pData(bladderEset)
edata = exprs(bladderEset)

mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
n.sv = num.sv(edata,mod,method="leek")
n.sv
[1] 2

svobj = sva(edata,mod,mod0,n.sv=n.sv)

Number of significant surrogate variables is:  2 
Iteration (out of 5 ):1  2  3  4  5  

Now, let's see how the derived SVs relate to the batch variable (a covariate not included in the model).

summary(aov(svobj$sv ~ as.factor(pheno$batch)))

 Response 1 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.34913 0.087284  6.9734 0.0001426 ***
Residuals              52 0.65087 0.012517                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Response 2 :
                       Df  Sum Sq  Mean Sq F value   Pr(>F)   
as.factor(pheno$batch)  4 0.29016 0.072541  5.3141 0.001153 **
Residuals              52 0.70984 0.013651                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Ok, let's include "batch" as a covariate and see whether the derived SVs are independent of that covariate.

mod = model.matrix(~as.factor(cancer) + as.factor(batch), data=pheno)
mod0 = model.matrix(~as.factor(batch),data=pheno)
svobj = sva(edata,mod,mod0,n.sv=n.sv)

Number of significant surrogate variables is:  2 
Iteration (out of 5 ):1  2  3  4  5  

summary(aov(svobj$sv ~ as.factor(pheno$batch)))

 Response 1 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.36353 0.090882   7.425 8.269e-05 ***
Residuals              52 0.63647 0.012240                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Response 2 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.42935 0.107339  9.7812 5.631e-06 ***
Residuals              52 0.57065 0.010974                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

I don't get why the association between the SVs and batch is stronger in the second example, in which batch was included in the models. I thought batch should have been "protected". (note: in my own data, the covariate is age; I just used batch here for reproducibility)

 

Thanks, Esther

sva • 440 views
ADD COMMENTlink written 2.4 years ago by pixie10

I have the same problem. Did you manage to find an answer Esther?

ADD REPLYlink written 10 weeks ago by sara.blocquiaux0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 169 users visited in the last hour