Question

Annotating limma Results with Gene Names for Affy Microarrays

0

Entering edit mode

alakatos ▴ 130

@alakatos-6983

Last seen 4.6 years ago

United States

Hello,

I would like to annotate my "limma result". I tried to implement a script found @ (http://gettinggeneticsdone.blogspot.com/2012/01/annotating-limma-results-with-gene.html ). Unfortunately, it did not work and no error message to understand the issue.

What I noticed are the followings:

1. The author used package "affy" to import his cell files. I had to use "oligo" for my cel files. "affy" produced an error.

Data <- ReadAffy() 
Error: 

The affy package is not designed for this array type.
Please use either the oligo or xps package.

2. In the author's result file before and after annotation, the probeset column has a name "ID". My result file does not have any name in the first column.

Author's file:

ID logFC AveExpr t P.Value adj.P.Val B

7902702 -6.8 7.8 -46 1.0e-09 3.3e-05 11.0

8117594 -4.3 10.0 -33 9.6e-09 1.5e-04 10.0

My file:

           logFC  AveExpr        t      P.Value    adj.P.Val        B
17550599 13.85776 13.84370 250.1408 4.249890e-33 1.492401e-28 53.95407
7550553 13.74935 13.73611 248.1621 4.897051e-33 1.492401e-28 53.92340

My understanding is that the the common identification is "ID" in the script. Could it be the source of the problem?

Would you please advise how to fix it?

Your help much appreciated.

All the best,

Anita

My procedure:

Creating ExpressionSet

pheno<-read.AnnotatedDataFrame("pheno.txt", header=T, row.names="sampleNames",sep="\t")

raw = read.celfiles(list.celfiles())

phenoData(raw) <- pheno

colnames(raw) <- row.names(pheno)

ExpressionSet = rma(raw, target="core")

annotation(ExpressionSet) <- "mogene20sttranscriptcluster.db"

ExpressionSet (storageMode: lockedEnvironment)
assayData: 41345 features, 20 samples 
  element names: exprs 
protocolData
  rowNames: 1112F-1.CEL 1112F-2.CEL ...1112F-3.CEL (20 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 1112F-1.CEL 1112F-2.CEL ...1112F-3.CEL (20 total)
  varLabels: Mice Geno_Treatm ... Treatment (5 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene20sttranscriptcluster.db

ADDING ANNOTATION to ExpressionSet based on Stephen Turner's script.

ID <- featureNames(ExpressionSet)

Symbol <- getSYMBOL(ID, "mogene20sttranscriptcluster.db")

Name <- as.character(lookUp(ID, "mogene20sttranscriptcluster.db", "GENENAME"))
Ensembl <- as.character(lookUp(ID, "mogene20sttranscriptcluster.db", "ENSEMBL"))
Ensembl <- ifelse(Ensembl=="NA", NA, paste("<a href='http://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=", Ensembl, "'>", Ensembl, "</a>", sep=""))
tmp <- data.frame(ID=ID, Symbol=Symbol, Name=Name, Ensembl=Ensembl, stringsAsFactors=F)
tmp[tmp=="NA"] <- NA
# set the featureData for your expressionset using the data frame you created above.
fData(ExpressionSet) <- tmp
# Clean up
rm(ID, Symbol, Name, Ensembl, tmp)

ExpressionSet after adding annotation fields:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 41345 features, 20 samples 
  element names: exprs 
protocolData
  rowNames: 1112F-1.CEL 1112F-2.CEL ...1112F-3.CEL (20 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: 1112F-1.CEL 1112F-2.CEL ...1112F-3.CEL (20 total)  
  varLabels: Mice Geno_Treatm ... Treatment (5 total)
  varMetadata: labelDescription
featureData
  featureNames: 17200001 17200003 ... 17551169 (41345 total)
  fvarLabels: ID Symbol Name Ensembl
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: mogene20sttranscriptcluster.db

limma annotation pd.mogene.2.0.st mogene20sttranscriptcluster.db affy • 8.4k views

ADD COMMENT • link updated 9.3 years ago by Gordon Smyth 50k • written 9.3 years ago by alakatos ▴ 130

score 1 · Answer 1 · 2014-12-27

You can put the probe-set annotation into the ExpressionSet data object, or into the MArrayLM fit object, or enter its as part of the topTable() function call. They all work. limma will find the annotation anyway that it is entered.

The estrogen case study in Section 17.2 of the limma User's Guide gives a complete worked example with Affymetrix arrays that includes adding gene symbols. Admittedly it uses the oldish function getSymbol() because it was written some years ago, but that still works.

Your top table doesn't include a column called "ID" because limma now sets the probe-set IDs as row names.This should not be a problem.

score 0 · Answer 2 · 2014-12-23

Slight edits.

You can add the data to your ExpressionSet, and according to MIAME principles I suppose you really should. And maybe limma will extract those data out and incorporate into your topTable() output. I never do that, however, so cannot say for sure. Instead, I just annotate my MArrayLM object, and then topTable() will output nice data.frames with all the annotation included.

Please note that things like getSYMBOL() and lookUp() are vestiges of an earlier age (like 5 years ago ;-D), and have been superceded with new methods that are better. And also note that 'ExpressionSet' is not a good object name, as it is also a function name as well as a class description, so you are asking R to distinguish between those three things needlessly.

Anyway, if I were doing something like this, I would proceed like this:

eset <- rma(raw)

design <- model.matrix(<do something here>)

fit <- lmFit(eset, design)

fit <- contrasts.fit(fit, contrast) ## if necessary

fit2 <- eBayes(fit)

library(mogene20sttranscriptcluster.db)

gns <- select(mogene20sttranscriptcluster.db, featureNames(eset), c("ENSEMBL","SYMBOL","GENENAME"))
## there will be duplicates; the most naive way to resolve is to use the first one
gns <- gns[!duplicated(gns[,1]),]

fit2$genes <- gns

topTable(fit2, 1)

And if you want to have HTML output with links to things, I would recommend ReportingTools. First you need a function to create the links you want. You could do affy links:

 affyLinks <- function (df, ...) {
           df$PROBEID <- hwrite(as.character(df$PROBEID), link = paste0("https://www.affymetrix.com/LinkServlet?probeset=",
        as.character(df$PROBEID)), table = FALSE)
    return(df)
}

and ensLinks:

ensLinks <- function (df, ...) {
           df$ENSEMBL <- hwrite(as.character(df$ENSEMBL), link = paste0("http://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=",
        as.character(df$ENSEMBL)), table = FALSE)
    return(df)
}

Then you can do

tab <- HTMLReport("index", "Sweet genes right here!")
publish(topTable(fit2, 1), tab, .modifyDF = list(affyLinks, ensLinks))
finish(tab)
browseURL("index.html")

See the vignettes for ReportingTools if you are interested.

score 0 · Answer 3 · 2014-12-23

0