Question

Problem using sva with limma for microarray DE analysis


Stane ▴ 40


Hi,

I have been trying to find differentially expressed genes in GSE19864 between senescent and growing cells, starting from the GEO raw data.

My first problem is that with the reported normalization method (GC-RMA) I can't reproduce their normalized GEO dataset. There are 2 experiments in the same dataset, which cluster into two groups in a PCA. So I tried batch effect / surrogate variable removal with the sva R package, without much success: my DE gene list is still different from theirs. Am I missing something? My code is below.

gse <- gse_gcrma
expr <- exprs(gse)
pheno <- droplevels(pData(gse))

mod <- model.matrix(~cell_status, data=pheno) 
mod0 <- model.matrix(~1, data=pheno)

n.sv <- sva::num.sv(expr,mod,method="leek")
svobj <- sva::sva(as.matrix(expr), mod, mod0, n.sv = n.sv)
modSv = cbind(mod,svobj$sv)

colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = "")) 
contrast_mat <- makeContrasts(Senescent-Growing, levels=modSv)

fit = lmFit(gse,modSv)
fit2 <- contrasts.fit(fit, contrast_mat)
fit2 <- eBayes(fit2)

tT_gcrma_sva <- topTable(fit2, adjust="fdr", sort.by="B", number=50)

PS: which normalization would you recommend for Affymetrix arrays: RMA, GC-RMA or fRMA?

sva limma • 2.0k views

updated 7.5 years ago by Bernd Klaus ▴ 610 • written 7.6 years ago by Stane ▴ 40

score 3 · Accepted Answer · 2016-10-12

Great, nice to hear it is working well.

The reason for Jeff's recommendation is most probably that SVA downweights genes according to their association with the experimental groups, based on an F-test.

For an F-test to be valid, the two models you compare need to be nested, which is not the case when using the model.matrix(~ 0 + cell_status, data=pheno) matrix, as it does not have an intercept while mod0 has one.

However, once you have the SVs, you can switch to the no-intercept model, which is more convenient for tests using contrasts of coefficients.
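To make the two-step idea concrete, here is a rough sketch: estimate the surrogate variables with the intercept models, then build a no-intercept design for limma. The data are simulated (the matrix `expr`, the hidden `batch` vector and the fixed `n.sv = 1` are all assumptions for illustration, not the GSE19864 data), so treat it as a template rather than a drop-in analysis.

```r
library(sva)
library(limma)

set.seed(1)

# Simulated stand-in for the real data: 1000 genes x 12 samples.
n_genes <- 1000
cell_status <- factor(rep(rep(c("Growing", "Senescent"), each = 3), times = 2),
                      levels = c("Growing", "Senescent"))
batch <- rep(c(0, 1), each = 6)  # hidden "experiment" effect, never shown to sva

expr <- matrix(rnorm(n_genes * 12, mean = 8), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", 1:12)))
# Genes 1-50 respond to senescence; genes 201-500 carry the batch shift.
sen <- cell_status == "Senescent"
expr[1:50, sen] <- expr[1:50, sen] + 2
expr[201:500, batch == 1] <- expr[201:500, batch == 1] + 1.5

pheno <- data.frame(cell_status = cell_status)

# Step 1: estimate surrogate variables using the intercept models.
mod  <- model.matrix(~ cell_status, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)
# n.sv is fixed at 1 here because one batch factor was simulated;
# on real data you would typically estimate it with num.sv().
svobj <- sva(expr, mod, mod0, n.sv = 1)

# Step 2: switch to the no-intercept (group means) design for limma.
design <- cbind(model.matrix(~ 0 + cell_status, data = pheno), svobj$sv)
colnames(design) <- c(levels(pheno$cell_status),
                      paste0("Surrogate", seq_len(svobj$n.sv)))

contrast_mat <- makeContrasts(Senescent - Growing, levels = design)
fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_mat))
tt   <- topTable(fit2, adjust.method = "fdr", sort.by = "B", number = 50)
```

With real data you would replace the simulated `expr` and `pheno` with `exprs(gse)` and `pData(gse)` from your own ExpressionSet.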

score 2 · Accepted Answer · 2016-10-10

Hi Stane,

In principle, this analysis is correct. However, I think there is one line that will make you compare fold changes against a baseline group. You create a model matrix like so:

mod <- model.matrix(~cell_status, data=pheno)

This means that the first column represents the intercept, which you later incorrectly rename to the first level of your cell_status:

colnames(modSv) <- c(levels(pheno$cell_status), paste("Surrogate",seq_along(1:svobj$n.sv),sep = ""))

Let's assume your first level is "Growing". The intercept will then be called "Growing", and if "Senescent" is the second level, the contrast will compare the "Senescent" versus "Growing" fold change against the expression level of the "Growing" group. So you need to change the model matrix input for limma like so:

modSv <- cbind(model.matrix(~ 0 + cell_status, data=pheno), svobj$sv)

That way your column renaming scheme makes sense.
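A quick way to see the difference is to print the column names of both parameterizations on a toy phenotype table (the four sample labels below are made up for illustration; only base R is needed):

```r
# Toy phenotype table; the labels are invented for this illustration.
pheno <- data.frame(
  cell_status = factor(c("Growing", "Growing", "Senescent", "Senescent"),
                       levels = c("Growing", "Senescent"))
)

# With an intercept: column 1 is the baseline ("Growing") mean,
# column 2 is already the Senescent - Growing difference.
mod_intercept <- model.matrix(~ cell_status, data = pheno)
print(colnames(mod_intercept))

# Without an intercept: one column per group mean, so renaming the
# columns to the factor levels is legitimate.
mod_means <- model.matrix(~ 0 + cell_status, data = pheno)
colnames(mod_means) <- levels(pheno$cell_status)
print(mod_means)
```

In the first matrix the columns are "(Intercept)" and "cell_statusSenescent", so relabelling them as "Growing" and "Senescent" changes what a Senescent - Growing contrast actually tests.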

Regarding the normalization algorithms: try a tool like arrayQualityMetrics to assess their quality on your data set at hand.

Reproducibility: Unless the authors explicitly specify which software versions they used and which steps they performed, exactly reproducing a DE list can be very difficult. But some overlap should be there, of course, especially concerning enriched GO terms and things like that.

Bernd