I'm trying to understand how sva works and I have two pieces of codes and I want to be sure to understand the differences between them. I added # with my comment, but please if my terminology is wrong or informations are not correct, please correct it so I could understand better :
Code1: library(limma) mod <- model.matrix(~sex) #model could be explained by sex svafit <- sva(geneExpression,mod) # we suspect some batch effect but we don't know from which variable it comes, so we use sva svaX<-model.matrix(~sex+svafit$sv) # we adjust with the sva news covariates lmfit <- lmFit(geneExpression,svaX)# we fit our original data with adjusted model tt<- lmfit$coef[,2]*sqrt(lmfit$df.residual)/(2*lmfit$sigma) # we calculated tt and then conclude with pvalues for sex variable pvals=2*(1-pt(abs(tt),lmfit$df.residual) Code2 library(limma) mod <- model.matrix(~sex+batch) # we explain our model with sex and some batch effect fit <- lmFit(geneExpression,mod) # we fit our model ses <- fit$stdev.unscaled[,2]*fit$sigma ttest <- fit$coef[,2]/ses # and conclude tt and then pvalue for sex variable pvals <- 2*pt(-abs(ttest),fit$df)
So when it comes to the conclusion : what is the difference between these two codes ? (Warning my answers could really be wrong but this is my understanding) : I would say the first code "remove" batch effect from the model then we can estimate more accurately sex effect. While the second to do not remove it but can allow to estimate the correspondant p-value of the batch effect, in the detrimental of sex effect estimation.
Please correct if wrong (surely :( )
Thanks a lot for your precious help