Using LIMMA for differential gene analysis of microarray data
1
0
Entering edit mode
@francesca-mcgrath-18224
Last seen 5.6 years ago
Australia

Hi Forum, 

I'm trying to determine genes differentially expressed between a disease group (n=16), and healthy controls (n=10), using the limma package. I'm wanting to generate mainly adjusted p values and logFC for all genes, so that I can then sort them by significance on another platform (excel). My data is from a microarray experiment (Affymetrix Human Gene Array) stored in the GEO repository. The issue I'm having is that results generated differ from previous analysis done on the same samples, so I believe that as a (very) new user I have missed some (or more likely many) steps. 

I manually downloaded a GEO supplementary file (GSE...Normalised_data_with_annotation) and then imported the dataset into R. I do not believe this dataset has been log transformed. From this I created a text file with groups of interest ("disease_healthy") and imported it into R: The first 16 columns are disease patients and the remaining 10 are controls. There are 25582 rows of genes. To interrogate the data I used the following steps after loading the limma package:

> disease_healthy <- read.delim("~/disease_healthy.txt", header=FALSE)
>   View(disease_healthy)
> rma.file <- read.table("disease_healthy.txt", header=FALSE) 
> eset <- new("ExpressionSet", exprs=as.matrix(rma.file))
> design <-model.matrix(~0+factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0))) 
> colnames(design) <- c("disease", "healthy") 
> fit <- lmFit(eset, design)
> contrast.matrix<-makeContrasts(d.vs.h=disease-healthy, levels=design)
> fit2<-contrasts.fit(fit, contrast.matrix)
> fit2.eBayes<-eBayes(fit2)
> topTable(fit2.eBayes,coef="d.vs.h",n=Inf,sort.by="none",adjust.method="fdr")

I'm not sure whether inconsistencies with previous analysis are because the data needed further alterations before the steps I took here, or if I'm completely off base with my analysis. I also used the same steps with log2 transformed data but this didn't help.

Any help/direction that could be provided would be much appreciated!

limma differential gene expression ncbi geo affymetrix microarrays • 1.6k views
ADD COMMENT
0
Entering edit mode

I wonder what "previous analyses" you are referring to, since GEO doesn't describe or store analyses?

ADD REPLY
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 13 hours ago
The city by the bay

Reproducing existing analyses from a brief textual description is notoriously difficult, and part of the reason why journals mandate code availability as part of the submission process. Updates to packages or R itself can also lead to differences. So unless you've got the code that they used for the previous analyses, the best you can do is to guess.

With that in mind, the limma part of your analysis looks mostly fine, assuming that (i) the data are already background-corrected, normalized and log-transformed, and (ii) the design matrix is correct with no additional blocking factors required. You may also need to do some filtering or array weighting, read the limma user's guide for details.

ADD COMMENT
0
Entering edit mode

Thank you for your answer! 

All the modifications you mentioned have already been done, so I think I'll run with the analysis I have. 

ADD REPLY

Login before adding your answer.

Traffic: 639 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6