Question

Using LIMMA for differential gene analysis of microarray data

0

Entering edit mode

Francesca McGrath • 0

@francesca-mcgrath-18224

Last seen 5.6 years ago

Australia

Hi Forum,

I'm trying to determine genes differentially expressed between a disease group (n=16), and healthy controls (n=10), using the limma package. I'm wanting to generate mainly adjusted p values and logFC for all genes, so that I can then sort them by significance on another platform (excel). My data is from a microarray experiment (Affymetrix Human Gene Array) stored in the GEO repository. The issue I'm having is that results generated differ from previous analysis done on the same samples, so I believe that as a (very) new user I have missed some (or more likely many) steps.

I manually downloaded a GEO supplementary file (GSE...Normalised_data_with_annotation) and then imported the dataset into R. I do not believe this dataset has been log transformed. From this I created a text file with groups of interest ("disease_healthy") and imported it into R: The first 16 columns are disease patients and the remaining 10 are controls. There are 25582 rows of genes. To interrogate the data I used the following steps after loading the limma package:

> disease_healthy <- read.delim("~/disease_healthy.txt", header=FALSE)
> View(disease_healthy)
> rma.file <- read.table("disease_healthy.txt", header=FALSE)
> eset <- new("ExpressionSet", exprs=as.matrix(rma.file))
> design <-model.matrix(~0+factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0)))
> colnames(design) <- c("disease", "healthy")
> fit <- lmFit(eset, design)
> contrast.matrix<-makeContrasts(d.vs.h=disease-healthy, levels=design)
> fit2<-contrasts.fit(fit, contrast.matrix)
> fit2.eBayes<-eBayes(fit2)
> topTable(fit2.eBayes,coef="d.vs.h",n=Inf,sort.by="none",adjust.method="fdr")

I'm not sure whether inconsistencies with previous analysis are because the data needed further alterations before the steps I took here, or if I'm completely off base with my analysis. I also used the same steps with log2 transformed data but this didn't help.

Any help/direction that could be provided would be much appreciated!

limma differential gene expression ncbi geo affymetrix microarrays • 1.6k views

ADD COMMENT • link updated 5.6 years ago by Aaron Lun ★ 28k • written 5.6 years ago by Francesca McGrath • 0

0

Entering edit mode

I wonder what "previous analyses" you are referring to, since GEO doesn't describe or store analyses?

ADD REPLY • link 5.6 years ago Gordon Smyth 50k

score 2 · Answer 1 · 2018-11-08

Reproducing existing analyses from a brief textual description is notoriously difficult, and part of the reason why journals mandate code availability as part of the submission process. Updates to packages or R itself can also lead to differences. So unless you've got the code that they used for the previous analyses, the best you can do is to guess.

With that in mind, the limma part of your analysis looks mostly fine, assuming that (i) the data are already background-corrected, normalized and log-transformed, and (ii) the design matrix is correct with no additional blocking factors required. You may also need to do some filtering or array weighting, read the limma user's guide for details.