Question: Analyzing gene expression data( not celfiles) in limma and annotating results
0
gravatar for kaushal Raj Chaudhary
4.6 years ago by
United States
kaushal Raj Chaudhary10 wrote:

I have gene expression data (not image files) obtained from affymetrix chip and I used limma to analyze it.  The row names of top table does not match the probe id. I saw this thread for annotating results from celfiles (https://support.bioconductor.org/p/63834/).

 I wonder how can I annotate the results of toptable().  

Thanks for any insights and help.

 

Code:


dat<-read.csv("Rat_Gene_Expression_Array.csv")   ### importing data 
names(dat)
Heart_dat<-dat[c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
             "HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1")]  ## subsetting data 


library(data.table)        ### load the library data.table 
setnames(Heart_dat, old=c("Probe_ID","CD.ND.H1","CD.ND.H2","CD.ND.H3","CD.ND.H4","CD.ND.H5","CD.ND.H6","CD.ND.H7","CD.ND.H8",
                          "HF.DM.H1","HF.DM.H2","HF.DM.H3","HF.DM.H4","HF.DM.H5","X.1"),
                    new=c("Probe_ID","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND",
                          "HF.DM","HF.DM","HF.DM","HF.DM","HF.DM","Gene.name"))   ## renaming columns
names(Heart_dat)
head(Heart_dat$Gene.name)

test<-Heart_dat[!(Heart_dat$Gene.name=='---'),]   ## removing probe ids which does not have gene name 
dim(test)
test<-test[,c(-1,-15) ]         ### removing first and last column in  data
dim(test)

### now fitting linear model 

library(limma)

Treat<-c("CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","CD.ND","HF.DM","HF.DM","HF.DM","HF.DM","HF.DM")
f<-factor(Treat, levels=c("CD.ND","HF.DM"))
design<-model.matrix(~f)
fit<-lmFit(test, design)
fit
fit2<-eBayes(fit)
fit2
tt<-topTable(fit2, coef=2, adjust="BH")


> tt
           logFC   AveExpr         t      P.Value adj.P.Val         B
14352   ...........................................................................
3210     ...........................................................................
10120  ............................................................................

annotation limma • 634 views
ADD COMMENTlink modified 4.6 years ago by Steve Lianoglou12k • written 4.6 years ago by kaushal Raj Chaudhary10
Answer: Analyzing gene expression data( not celfiles) in limma and annotating results
2
gravatar for Steve Lianoglou
4.6 years ago by
Denali
Steve Lianoglou12k wrote:

You want test to be a numeric matrix and set its rownames to be the ids/names you want.

You have the following:

test<-test[,c(-1,-15)]

Where it seems the first column and column 15 are the probe ID and genenames, respectively.

Instead of that, what you might want to do is something like:

e <- as.matrix(test[, c(-1,-15)]
rownames(e) <- test[[1]] ## set rownames to probeIDs
stopifnot(is.numeric(e)) ## just making sure we have legit data here
## ... stuff ...
fit <- lmFit(e, design)
fit2 <- eBayes(fit)
tt <- topTable(fit2, coef=2, adjust="BH")

Also, you didn't say what types of numbers are in the (now called) e matrix, but make sure these data have been log2 transformed before you shoot it into lmFit

 

 

ADD COMMENTlink written 4.6 years ago by Steve Lianoglou12k

Hi Steve,

Thank you for help. Yes, expression values are log2 transformed.  Now I suppose I can merge annotation table and  results from toptable().

ADD REPLYlink written 4.6 years ago by kaushal Raj Chaudhary10
2

For me the preferred way to annotate data is to put the annotation information into the 'genes' list item of your MArrayLM object, so topTable() will automagically output the annotation for you.

You say above that this is an Affy rat chip of some sort, so let's say for the sake of argument that it is a Rat Gene 1.0 ST array. In addition, let's assume that whomever summarized these data did so at the transcript level.

library(ragene10sttranscriptcluster.db)

gns <- select(ragene10sttranscriptcluster.db, row.names(fit2$coef), c("ENTREZID","SYMBOL","GENENAME"))

you will get a warning that there are some one-to-many mappings, which you have to deal with. Here I just assume you are a dolt like me, and just want to do the most naive thing possible.

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by James W. MacDonald51k

Thanks, Jim.

ADD REPLYlink written 4.6 years ago by kaushal Raj Chaudhary10

Hi Jim,

How can I remove duplicate probe ids before fitting in lmfit ( not in MArrayLM object)? Your code removes the duplicates only after fitting the model.   

Thanks !!!

gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2,2)
ADD REPLYlink written 4.6 years ago by kaushal Raj Chaudhary10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 326 users visited in the last hour