Limma DE analysis: use all microarrays or only the subset of interest?
Stane ▴ 40

Hello, 

I am using limma to compute the differentially expressed (DE) genes between two conditions from microarray data.

The dataset, a GSE from GEO, contains 38 arrays of senescent, quiescent and growing samples. I am only interested in the DE genes for senescent vs growing.

So far I have been removing the extra arrays with the code below, but I recently came across some posts about keeping all arrays because of the "degrees of freedom". When I keep all arrays I get completely different results, so I am wondering which of the two methods is the most appropriate.

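library(limma)
library(Biobase)  # provides pData()

# Keep only the growing and senescent arrays
idx <- which(pData(gse)$cell_status %in% c("Growing", "Senescent"))
gse <- gse[, idx]

# Drop unused levels so the design has one column per remaining group
pheno <- droplevels(pData(gse))
pheno$cell_status <- factor(pheno$cell_status)
mod <- model.matrix(~ 0 + cell_status, data = pheno)
colnames(mod) <- levels(pheno$cell_status)
contrast_mat <- makeContrasts(Senescent - Growing, levels = mod)

fit <- lmFit(gse, mod)
fit2 <- contrasts.fit(fit, contrast_mat)
fit2 <- eBayes(fit2)

topTable(fit2, adjust.method = "fdr", sort.by = "B", number = Inf)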
Aaron Lun ★ 28k

Generally, if you include more samples, they'll provide more residual degrees of freedom in the model to estimate the variance, even if they're not related to the contrast of interest. This leads to more precise estimates and greater power to detect significant differences. It would help if you were more specific about how your results are "completely different" with and without these extra samples, but I'd naturally expect an increase in the number of putative DE genes when you leave them in.
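As a rough illustration of the degrees-of-freedom point, here is a minimal base-R sketch. The group sizes are hypothetical (the thread only says there are 38 arrays in total), and the variable names are made up for the example.

```r
# Hypothetical group sizes; only the total of 38 arrays comes from the thread.
cell_status <- factor(rep(c("Growing", "Senescent", "Quiescent"),
                          times = c(12, 12, 14)))

# Design using all arrays: one coefficient per group.
design_all <- model.matrix(~ 0 + cell_status)

# Design after dropping the arrays that are not part of the contrast.
keep <- cell_status %in% c("Growing", "Senescent")
status_sub <- droplevels(cell_status[keep])
design_sub <- model.matrix(~ 0 + status_sub)

# Residual df = number of arrays - number of estimated coefficients.
df_all <- nrow(design_all) - ncol(design_all)
df_sub <- nrow(design_sub) - ncol(design_sub)

cat("Residual df, all arrays:  ", df_all, "\n")
cat("Residual df, subset only: ", df_sub, "\n")
```

With these made-up group sizes, keeping the quiescent arrays adds residual degrees of freedom for the variance estimate even though they do not enter the Senescent - Growing contrast.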

P.S. You don't mention what the accession number is, but I'm guessing it's GSE19864. It would seem unwise to use only the growing/senescent status of the cells to formulate your design matrix, given that there's a whole bunch of shRNA treatments happening at the same time.
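One possible way to account for the shRNA treatments while keeping all arrays is to fit a single factor with one level per combination of cell status and shRNA, then contrast only the two control groups. The sketch below uses simulated data; the `shRNA` column, its levels (`control`, `shA`) and the group sizes are assumptions for illustration and are not taken from the GSE19864 annotation.

```r
library(limma)

set.seed(1)

# Hypothetical sample annotation (not the real GSE19864 phenotype table).
pheno <- data.frame(
  cell_status = rep(c("Growing", "Senescent", "Quiescent"), each = 6),
  shRNA       = rep(rep(c("control", "shA"), each = 3), times = 3)
)

# One group per cell status / shRNA combination.
pheno$group <- factor(paste(pheno$cell_status, pheno$shRNA, sep = "_"))

design <- model.matrix(~ 0 + group, data = pheno)
colnames(design) <- levels(pheno$group)

# Simulated log-expression values: 100 genes x 18 arrays.
expr <- matrix(rnorm(100 * nrow(pheno)), nrow = 100)

# Compare the control arrays only; the other groups still contribute
# residual degrees of freedom to the variance estimate.
contrast_mat <- makeContrasts(Senescent_control - Growing_control,
                              levels = design)

fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, contrast_mat))
res  <- topTable(fit2, adjust.method = "fdr", sort.by = "B", number = Inf)
```

Here the contrast is the same control senescent vs control growing comparison, but every array is used when estimating the gene-wise variances.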

Thanks for your help, Aaron.

You're right, I have been working on GSE19864, trying to reproduce their results. I also think I have a normalization problem: with the same design, I never get the same DE genes from their normalized data as from the raw data that I normalized using the method they described. In my case, a PCA of my normalized data shows a batch effect, whereas a PCA of their data does not exhibit this problem.

Concerning the shRNAs, I know about them, so when comparing senescent vs growing I would usually keep only the control senescent and control growing arrays and remove all the others before building my design and running limma.

So in my case, should I keep the other arrays even if their condition/treatment differs from what I am interested in, or just remove them?
