I'm trying to determine genes differentially expressed between a disease group (n=16), and healthy controls (n=10), using the limma package. I'm wanting to generate mainly adjusted p values and logFC for all genes, so that I can then sort them by significance on another platform (excel). My data is from a microarray experiment (Affymetrix Human Gene Array) stored in the GEO repository. The issue I'm having is that results generated differ from previous analysis done on the same samples, so I believe that as a (very) new user I have missed some (or more likely many) steps.
I manually downloaded a GEO supplementary file (GSE...Normalised_data_with_annotation) and then imported the dataset into R. I do not believe this dataset has been log transformed. From this I created a text file with groups of interest ("disease_healthy") and imported it into R: The first 16 columns are disease patients and the remaining 10 are controls. There are 25582 rows of genes. To interrogate the data I used the following steps after loading the limma package:
> disease_healthy <- read.delim("~/disease_healthy.txt", header=FALSE)
> rma.file <- read.table("disease_healthy.txt", header=FALSE)
> eset <- new("ExpressionSet", exprs=as.matrix(rma.file))
> design <-model.matrix(~0+factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)))
> colnames(design) <- c("disease", "healthy")
> fit <- lmFit(eset, design)
> contrast.matrix<-makeContrasts(d.vs.h=disease-healthy, levels=design)
> fit2<-contrasts.fit(fit, contrast.matrix)
I'm not sure whether inconsistencies with previous analysis are because the data needed further alterations before the steps I took here, or if I'm completely off base with my analysis. I also used the same steps with log2 transformed data but this didn't help.
Any help/direction that could be provided would be much appreciated!