Problem using sva with limma for microarray DE analysis
2
0
Entering edit mode
Stane ▴ 40
@stane-10974
Last seen 3.4 years ago

Hi,

I have been trying to find differently express gene in GSE19864 between senescent and growing cells using their GEO raw data.

My first problem is that using reported normalization method (GC-rma) I can't get the same dataset as their normalized GEO dataset. There are 2 experiments in the same dataset wich in a PCA cluster in two groups.So I tried using batch effect / surrogate variables removal from sva R package without much success, my DE gene list is still different from their. Am i missing something ? cf my code is below

gse <- gse_gcrma
expr <- exprs(gse)
pheno <- droplevels(pData(gse))

mod <- model.matrix(~cell_status, data=pheno)
mod0 <- model.matrix(~1, data=pheno)

n.sv <- sva::num.sv(expr,mod,method="leek")
svobj <- sva::sva(as.matrix(expr), mod, mod0, n.sv = n.sv)
modSv = cbind(mod,svobj$sv) colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = "")) contrast_mat <- makeContrasts(Senescent-Growing, levels=modSv) fit = lmFit(gse,modSv) fit2 <- contrasts.fit(fit, contrast_mat) fit2 <- eBayes(fit2) tT_gcrma_sva <- topTable(fit2, adjust="fdr", sort.by="B", number=50) PS: which normalization would you recommend for affy array: rma, gc-rma or frma ? sva limma • 1.1k views ADD COMMENT 3 Entering edit mode Bernd Klaus ▴ 590 @bernd-klaus-6281 Last seen 2.7 years ago Germany great!, nice to hear it is working well. the reason for Jeff's recommendation is most probably that SVA downweighs genes according to their association with the experimental groups based on an F-test. For an F-test to be valid the two model you compare need to be nested, which is not the case when using the  model.matrix(~ 0 + cell_status, data=pheno) matrix as it does not have an intercept while mod0 has one. However, once you have the SV, you can switch to the no intercept model, which is more convenient for tests using contrasts of coefficients. ADD COMMENT 2 Entering edit mode Bernd Klaus ▴ 590 @bernd-klaus-6281 Last seen 2.7 years ago Germany Hi Stane, In principle, this analysis is correct. However, I think there is one line that will potentially make you compare fold changes against a base line group. You create a model matrix like so: mod <- model.matrix(~cell_status, data=pheno)  This means that the first column represent the intercept, which later on you incorrectly rename to the first level of your cell_status colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = ""))  Let's assume your first level is "Growing" then, the intercept will be called "Growing", and if "Senescent" is the second entry, the test will compare the fold change of "Senescent"/"Growing" against the expression level of the "Growing"  group. So you need to change the model matrix input for limma like so: modSV <- cbind(model.matrix(~ 0 + cell_status, data=pheno), svobj$sv)

In order for your column renaming scheme making sense.

Regarding the normalization algorithms: try a tool like arrayQualityMetrics

to assess their quality on your data set at hand.

Reproducibility: Unless the authors explicitly specify which software versions they used and which steps they performed, exactly reproducing a DE list can be very difficult. But some overlap should be there of course, especially concerning enriched GO terms and things like that

Bernd

0
Entering edit mode

Thank you Bernd for this explanation, you are right I was mistaken for the colnames part, I used ~0+cell_status before when creating the variable mod, but I email jeff asking some question about sva and he recommended that I removed the 0.

I tried your solution and it is working great, now at least some of the DE genes are similar with authors results, but I still don't really understand why it is only working when using an intercept for the SVA part (~cell_status).