Hi,
I have been trying to find differently express gene in GSE19864 between senescent and growing cells using their GEO raw data.
My first problem is that using reported normalization method (GC-rma) I can't get the same dataset as their normalized GEO dataset. There are 2 experiments in the same dataset wich in a PCA cluster in two groups.So I tried using batch effect / surrogate variables removal from sva R package without much success, my DE gene list is still different from their. Am i missing something ? cf my code is below
gse <- gse_gcrma
expr <- exprs(gse)
pheno <- droplevels(pData(gse))
mod <- model.matrix(~cell_status, data=pheno)
mod0 <- model.matrix(~1, data=pheno)
n.sv <- sva::num.sv(expr,mod,method="leek")
svobj <- sva::sva(as.matrix(expr), mod, mod0, n.sv = n.sv)
modSv = cbind(mod,svobj$sv)
colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = ""))
contrast_mat <- makeContrasts(Senescent-Growing, levels=modSv)
fit = lmFit(gse,modSv)
fit2 <- contrasts.fit(fit, contrast_mat)
fit2 <- eBayes(fit2)
tT_gcrma_sva <- topTable(fit2, adjust="fdr", sort.by="B", number=50)
PS: which normalization would you recommend for affy array: rma, gc-rma or frma ?

Thank you Bernd for this explanation, you are right I was mistaken for the colnames part, I used ~0+cell_status before when creating the variable mod, but I email jeff asking some question about sva and he recommended that I removed the 0.
I tried your solution and it is working great, now at least some of the DE genes are similar with authors results, but I still don't really understand why it is only working when using an intercept for the SVA part (~cell_status).