Hello,
I am using limma to compute the differentially expressed genes between two conditions from microarray data.
The dataset, a GSE from GEO, contains 38 arrays of senescent, quiescent and growing samples. I am only interested in the senescent vs growing DE genes.
So far I have been removing the extra arrays with the code below, but I recently came across some posts about keeping all arrays because of the "degrees of freedom". When I keep all arrays I get completely different results, so I am wondering which of the two methods is the most appropriate.
library(Biobase)  # pData()
library(limma)

# Keep only the growing and senescent arrays
idx <- which(cell_status == "Growing" | cell_status == "Senescent")
gse <- gse[, idx]
pheno <- droplevels(pData(gse))

# One coefficient per group, then the senescent vs growing contrast
mod <- model.matrix(~ cell_status + 0, data = pheno)
colnames(mod) <- levels(pheno$cell_status)
contrast_mat <- makeContrasts(Senescent - Growing, levels = mod)

fit <- lmFit(gse, mod)
fit2 <- contrasts.fit(fit, contrast_mat)
fit2 <- eBayes(fit2)
topTable(fit2, adjust = "fdr", sort.by = "B", number = Inf)
Thanks for your help, Aaron.
You are right, I have been working on GSE19864, trying to reproduce their results. I also think that I have a normalization problem: with the same design, I never get the same DE genes from their normalized data as from the raw data that I normalized using the method they described. In my case, a PCA of my own normalized data shows a batch effect, whereas a PCA of their data does not exhibit this problem.
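For reference, the kind of PCA check I mean can be sketched roughly as below. This is only an illustration on simulated data, not GSE19864: the `expr` matrix, the `batch` labels and the size of the shift are all made up, and I am assuming a plain `prcomp()` on the transposed expression matrix.

```r
set.seed(42)

# Simulated stand-in for a normalized expression matrix: 500 probes x 12 arrays.
# `batch` is a hypothetical label (e.g. hybridization date), not real metadata.
batch <- factor(rep(c("A", "B"), each = 6))
expr <- matrix(rnorm(500 * 12), nrow = 500)

# Artificial batch effect: shift 100 probes in batch B.
expr[1:100, batch == "B"] <- expr[1:100, batch == "B"] + 3

# PCA on arrays (rows = arrays after transposing), centered per probe.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# If arrays cluster by batch on the first components, a batch effect is likely.
plot(pca$x[, 1], pca$x[, 2], col = as.integer(batch), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(batch), col = 1:2, pch = 19)
```

In this toy example the arrays split by `batch` along PC1, which is the pattern I see with my own normalization but not with the authors' data.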
Concerning the shRNAs, I know about them, so when comparing senescent vs growing I would usually keep only the control senescent and control growing arrays and remove all the others before building my design and running the limma analysis.
So in my case, should I keep the other arrays even if their condition/treatment is different from what I am interested in, or should I just remove them?
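To make the two options concrete, here is a minimal sketch on simulated data (again not GSE19864; the group sizes, the `expr` matrix and the spiked-in probes are assumptions for illustration only). It fits the same senescent vs growing contrast once with all arrays in the model and once after subsetting.

```r
library(limma)

set.seed(1)

# Simulated stand-in: 200 probes x 12 arrays, 4 arrays per group.
status <- factor(rep(c("Growing", "Quiescent", "Senescent"), each = 4))
expr <- matrix(rnorm(200 * 12), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("array", 1:12)))
# Spike a difference into the first 10 probes of the senescent arrays.
expr[1:10, status == "Senescent"] <- expr[1:10, status == "Senescent"] + 3

# Option A: keep every array, model all three groups, test one contrast.
design_all <- model.matrix(~ 0 + status)
colnames(design_all) <- levels(status)
contrast_all <- makeContrasts(Senescent - Growing, levels = design_all)
fit_all <- eBayes(contrasts.fit(lmFit(expr, design_all), contrast_all))

# Option B: subset to the two groups of interest before fitting.
keep <- status %in% c("Growing", "Senescent")
status_sub <- droplevels(status[keep])
design_sub <- model.matrix(~ 0 + status_sub)
colnames(design_sub) <- levels(status_sub)
contrast_sub <- makeContrasts(Senescent - Growing, levels = design_sub)
fit_sub <- eBayes(contrasts.fit(lmFit(expr[, keep], design_sub), contrast_sub))

# Compare estimated log fold changes and residual degrees of freedom.
same_logfc <- all.equal(unname(fit_all$coefficients[, 1]),
                        unname(fit_sub$coefficients[, 1]))
df_resid <- c(all = fit_all$df.residual[1], subset = fit_sub$df.residual[1])
print(same_logfc)
print(df_resid)
```

With a one-way design like this, the estimated log fold changes are identical in both fits. What differs is the residual variance estimate and its degrees of freedom, so the t-statistics, p-values and B-statistics can change, especially if the extra groups are more or less variable than the two groups being compared.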