Question

Analyzing gene expression data( not celfiles) in limma and annotating results

0

Entering edit mode

kaushal Raj Chaudhary ▴ 10

@kaushal-raj-chaudhary-6165

Last seen 9.0 years ago

United States

I have gene expression data (not image files) obtained from affymetrix chip and I used limma to analyze it. The row names of top table does not match the probe id. I saw this thread for annotating results from celfiles (https://support.bioconductor.org/p/63834/).

I wonder how can I annotate the results of toptable().

Thanks for any insights and help.

Code:

dat<-read.csv("Rat_Gene_Expression_Array.csv") ### importing data
names(dat)
Heart_dat<-dat[c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
"HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1")] ## subsetting data

library(data.table) ### load the library data.table
setnames(Heart_dat, old=c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
"HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1"),
new=c("Probe_ID","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND",
"HF.DM","HF.DM","HF.DM","HF.DM","HF.DM","Gene.name")) ## renaming columns
names(Heart_dat)
head(Heart_dat$Gene.name)

test<-Heart_dat[!(Heart_dat$Gene.name=='---'),] ## removing probe ids which does not have gene name
dim(test)
test<-test[,c(-1,-15) ] ### removing first and last column in data
dim(test)

### now fitting linear model

library(limma)

Treat<-c("CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","HF.DM","HF.DM","HF.DM","HF.DM","HF.DM")
f<-factor(Treat, levels=c("CD.ND","HF.DM"))
design<-model.matrix(~f)
fit<-lmFit(test, design)
fit
fit2<-eBayes(fit)
fit2
tt<-topTable(fit2, coef=2, adjust="BH")

> tt
logFC AveExpr t P.Value adj.P.Val B
14352 ...........................................................................
3210 ...........................................................................
10120 ............................................................................

limma annotation • 1.4k views

ADD COMMENT • link updated 9.1 years ago by Steve Lianoglou ★ 13k • written 9.1 years ago by kaushal Raj Chaudhary ▴ 10

score 2 · Accepted Answer · 2015-03-20

2

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

You want test to be a numeric matrix and set its rownames to be the ids/names you want.

You have the following:

test<-test[,c(-1,-15)]

Where it seems the first column and column 15 are the probe ID and genenames, respectively.

Instead of that, what you might want to do is something like:

e <- as.matrix(test[, c(-1,-15)]
rownames(e) <- test[[1]] ## set rownames to probeIDs
stopifnot(is.numeric(e)) ## just making sure we have legit data here
## ... stuff ...
fit <- lmFit(e, design)
fit2 <- eBayes(fit)
tt <- topTable(fit2, coef=2, adjust="BH")

Also, you didn't say what types of numbers are in the (now called) e matrix, but make sure these data have been log2 transformed before you shoot it into lmFit

ADD COMMENT • link 9.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi Steve,

Thank you for help. Yes, expression values are log2 transformed. Now I suppose I can merge annotation table and results from toptable().

ADD REPLY • link 9.1 years ago kaushal Raj Chaudhary ▴ 10

2

Entering edit mode

For me the preferred way to annotate data is to put the annotation information into the 'genes' list item of your MArrayLM object, so topTable() will automagically output the annotation for you.

You say above that this is an Affy rat chip of some sort, so let's say for the sake of argument that it is a Rat Gene 1.0 ST array. In addition, let's assume that whomever summarized these data did so at the transcript level.

library(ragene10sttranscriptcluster.db)

gns <- select(ragene10sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","GENENAME"))

you will get a warning that there are some one-to-many mappings, which you have to deal with. Here I just assume you are a dolt like me, and just want to do the most naive thing possible.

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)

ADD REPLY • link 9.1 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks, Jim.

ADD REPLY • link 9.1 years ago kaushal Raj Chaudhary ▴ 10

0

Entering edit mode

Hi Jim,

How can I remove duplicate probe ids before fitting in lmfit ( not in MArrayLM object)? Your code removes the duplicates only after fitting the model.

Thanks !!!

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)

ADD REPLY • link 9.1 years ago kaushal Raj Chaudhary ▴ 10