Question

Limma DE analysis: use all microarrays or only the subset of interest?

Stane ▴ 40

@stane-10974

Last seen 7.5 years ago

Hello,

I am using limma to find the differentially expressed (DE) genes between two conditions from microarray data.

The dataset, a GEO series (GSE), contains 38 arrays of senescent, quiescent and growing samples. I am only interested in the DE genes between senescent and growing samples.

So far I have been removing the extra arrays with the code below, but I recently came across some posts about keeping all arrays because of the "degrees of freedom". When I keep all arrays I get completely different results, so I am wondering which of the two methods is more appropriate.

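library(limma)
library(Biobase)  # provides pData()

# Keep only the growing and senescent arrays
idx <- which(pData(gse)$cell_status %in% c("Growing", "Senescent"))
gse <- gse[, idx]

# Drop factor levels that are no longer present (e.g. quiescent)
pheno <- droplevels(pData(gse))

# One column per group, no intercept
mod <- model.matrix(~ 0 + cell_status, data = pheno)
colnames(mod) <- levels(pheno$cell_status)
contrast_mat <- makeContrasts(Senescent - Growing, levels = mod)

fit <- lmFit(gse, mod)
fit2 <- contrasts.fit(fit, contrast_mat)
fit2 <- eBayes(fit2)

topTable(fit2, adjust.method = "fdr", sort.by = "B", number = Inf)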

limma • 1.6k views

ADD COMMENT • link 8.8 years ago Stane ▴ 40

score 1 · Answer 1 · 2016-10-08

Aaron Lun ★ 28k

@alun

Last seen 15 hours ago

The city by the bay

Generally, if you include more samples, they'll provide more residual degrees of freedom in the model to estimate the variance, even if they're not related to the contrast of interest. This leads to more precise estimates and greater power to detect significant differences. It would help if you were more specific about how your results are "completely different" with and without these extra samples, but I'd naturally expect an increase in the number of putative DE genes when you leave them in.
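To make the degrees-of-freedom point concrete, here is a minimal sketch on simulated data. The group sizes (13/12/13) and the helper `fit_contrast` are made up for illustration and are not taken from the actual dataset. It fits the same Senescent - Growing contrast once with all arrays and once with the subset, then compares the residual degrees of freedom.

```r
library(limma)

set.seed(1)
# Hypothetical layout: 38 arrays in three states (group sizes are invented).
cell_status <- factor(rep(c("Growing", "Quiescent", "Senescent"),
                          times = c(13, 12, 13)))
# Simulated log-expression values, 1000 probes, no true differences.
expr <- matrix(rnorm(1000 * length(cell_status)), nrow = 1000,
               dimnames = list(paste0("probe", 1:1000), NULL))

# Hypothetical helper: group-means design, then the Senescent - Growing contrast.
fit_contrast <- function(expr, status) {
  status <- droplevels(status)
  design <- model.matrix(~ 0 + status)
  colnames(design) <- levels(status)
  fit <- lmFit(expr, design)
  cm <- makeContrasts(Senescent - Growing, levels = design)
  eBayes(contrasts.fit(fit, cm))
}

keep <- cell_status %in% c("Growing", "Senescent")
fit_all <- fit_contrast(expr, cell_status)              # all 38 arrays
fit_sub <- fit_contrast(expr[, keep], cell_status[keep]) # 26 arrays

fit_all$df.residual[1]  # 38 arrays - 3 groups = 35
fit_sub$df.residual[1]  # 26 arrays - 2 groups = 24
```

With this design the estimated log-fold changes are the same in both fits, since each is just a difference of group means. What changes is the variance estimate and its degrees of freedom, and therefore the t-statistics and p-values.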

P.S. You don't mention what the accession number is, but I'm guessing it's GSE19864. It would seem unwise to use only the growing/senescent status of the cells to formulate your design matrix, given that there's a whole bunch of shRNA treatments happening at the same time.
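One possible way to account for the shRNA treatments while keeping every array is to combine status and treatment into a single grouping factor and then contrast only the control groups. The sketch below uses simulated data; the column names `cell_status` and `shRNA`, the treatment labels and the replicate counts are assumptions, not the real annotation of the dataset.

```r
library(limma)

set.seed(2)
# Hypothetical annotation: 2 states x 3 treatments x 3 replicates = 18 arrays.
pheno <- expand.grid(rep = 1:3,
                     shRNA = c("control", "shA", "shB"),
                     cell_status = c("Growing", "Senescent"))

# One group per status/treatment combination, e.g. "Senescent_control".
group <- factor(paste(pheno$cell_status, pheno$shRNA, sep = "_"))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Simulated log-expression values standing in for the real arrays.
expr <- matrix(rnorm(500 * nrow(pheno)), nrow = 500)

fit <- lmFit(expr, design)
# Compare senescent vs growing within the control arrays only;
# the other groups still contribute to the variance estimate.
cm <- makeContrasts(SenVsGrow_control = Senescent_control - Growing_control,
                    levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))

topTable(fit2, coef = "SenVsGrow_control", number = 5)
```

The contrast itself only involves the two control groups, so the comparison is the same as in the subset analysis, but all arrays are used to estimate the residual variance.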

ADD COMMENT • link 8.8 years ago Aaron Lun ★ 28k

Thanks for your help, Aaron.

You're right, I have been working on GSE19864, trying to reproduce their results. I also think I have a normalization problem: with the same design, I never get the same DE genes from their normalized data as from the raw data that I normalized using the method they described. In my case, a PCA of my normalized data shows a batch effect, whereas a PCA of their data doesn't exhibit this problem.

Concerning the shRNAs, I know about them, so when comparing senescent and growing I would usually keep only the control senescent and control growing arrays and remove all the others before building my design and running limma.

So in my case, should I keep the other arrays even if their condition/treatment differs from what I am interested in, or just remove them?

ADD REPLY • link 8.8 years ago Stane ▴ 40