Question

eset annotation issues, plus generate heatmap with correct gene symbol as row label

Folks:

I need some help here. Why is it so error-prone and hard to work with an eset (ExpressionSet)?

# Read in the .CEL files, normalize with RMA, then try to attach annotation

library(affy)

library(hgu133plus2.db)

mydata <- ReadAffy()

mydata.rma <- rma(mydata)

allprobe <- row.names(exprs(mydata.rma))

all.gs <- select(hgu133plus2.db, allprobe, c("SYMBOL"), keytype="PROBEID")

featureData(mydata.rma) <- new("AnnotatedDataFrame", data=all.gs)

However, the annotation has more rows than there are probesets:

> length(allprobe)
[1] 54675
> dim(all.gs)
[1] 58608 2

Is there a way to attach gene annotation information?

If I want to do this on a small scale:

mygene.probe <-read.table(file="myprobesets.txt", blabla)

gs<-select(hgu133plus2.db, mygene.probe, c("SYMBOL", "ENTREZID"))

subset.for.heatmap <- mydata.rma[featureNames(mydata.rma) %in% gs$PROBEID,]

featureData(subset.for.heatmap) <-new("AnnotatedDataFrame",data=gs)

heatmap.2(exprs(subset.for.heatmap), scale="row", trace="none", col=colorpanel(100, "green", "white", "red"), labCol=pData(subset.for.heatmap)$treatment_time, ColSideColors=pData(subset.for.heatmap)$color, labRow=fData(subset.for.heatmap)[[2]])

Then the probeset IDs and gene symbols are all mixed up on the heatmap; they do not match. I guess this is like attaching pData: I need to check that the rows match before attaching...

Apparently there is no error checking, and a lot of mistakes could result from this.

Does anyone have an easy and elegant way to solve this, or can you point me to a good resource?

Thanks

updated 8.8 years ago by James W. MacDonald • written 8.8 years ago by colonppg

score 2 · Answer 1 · 2015-07-01

The issue here is that some probesets map to more than one gene. Because those probesets have one-to-many mappings, select() returns one row per probeset-gene pair, so you get more rows than probesets.
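To make the row-count mismatch concrete, here is a minimal base R sketch. It does not call select(); toy.gs is a hypothetical stand-in for the data frame select() returns, and the probe IDs and symbols are made up for illustration.

```r
# Toy stand-in for the output of select(); IDs and symbols are invented.
toy.probes <- c("p1_at", "p2_at", "p3_at")
toy.gs <- data.frame(
  PROBEID = c("p1_at", "p2_at", "p2_at", "p3_at"),
  SYMBOL  = c("GENEA", "GENEB", "GENEC", NA),
  stringsAsFactors = FALSE
)

# One row per (probe, symbol) pair, so the table is longer than the key vector.
n.keys <- length(toy.probes)
n.rows <- nrow(toy.gs)

# Probes that occur more than once are the multi-mapping ones.
multi.probes <- unique(toy.gs$PROBEID[duplicated(toy.gs$PROBEID)])
print(multi.probes)
```

Running the same duplicated() check on all.gs$PROBEID should show which probesets account for the extra rows in the real data.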

There are any number of ways to deal with this. One is to use mapIds() instead of select(); its multiVals argument controls what happens with multi-mapping probes. From the help page:

multiVals: What should mapIds do when there are multiple values that
          could be returned?  Options include:

          first: This value means that when there are multiple matches
              only the 1st thing that comes back will be returned. This
              is the default behavior

          list: This will just returns a list object to the end user

          filter: This will remove all elements that contain multiple
              matches and will therefore return a shorter vector than
              what came in whenever some of the keys match more than
              one value

          asNA: This will return an NA value whenever there are
              multiple matches

          CharacterList: This just returns a SimpleCharacterList object

          FUN: You can also supply a function to the multiVals
              argument for custom behaviors.  The function must take a
              single argument and return a single value.  This function
              will be applied to all the elements and will serve a
              'rule' that for which thing to keep when there is more
              than one element.  So for example this example function
              will always grab the last element in each result: 
              last <- function(x){x[[length(x)]]}

How best to deal with multiple mapping probes is an individual decision, and we leave that up to the end user to decide.
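As a rough illustration of one such choice, the base R sketch below keeps the first symbol per probe and aligns the result to the probe order with match(), which is also what keeps heatmap row labels in step with the expression rows. The toy data are hypothetical; the commented-out lines show what the corresponding calls on the real objects might look like, assuming mydata.rma and hgu133plus2.db from the question, and are not run here.

```r
# Toy data: probe order as it would come from featureNames(), plus a
# select()-style table with one multi-mapping probe. All names are invented.
toy.probes <- c("p3_at", "p1_at", "p2_at")
toy.gs <- data.frame(
  PROBEID = c("p1_at", "p2_at", "p2_at", "p3_at"),
  SYMBOL  = c("GENEA", "GENEB", "GENEC", NA),
  stringsAsFactors = FALSE
)

# Keep the first symbol per probe, in the same order as toy.probes.
# match() returns the index of the first hit, similar in spirit to
# multiVals = "first".
sym.first <- toy.gs$SYMBOL[match(toy.probes, toy.gs$PROBEID)]
names(sym.first) <- toy.probes

# Alternative, similar in spirit to multiVals = "asNA": blank out
# probes that have more than one mapping.
n.hits <- vapply(toy.probes, function(p) sum(toy.gs$PROBEID == p), integer(1))
sym.asNA <- sym.first
sym.asNA[n.hits > 1] <- NA

# Row labels for a heatmap: fall back to the probe ID when no symbol exists.
row.labels <- ifelse(is.na(sym.first), toy.probes, sym.first)

# With the real objects (assumed, not run here):
# sym <- mapIds(hgu133plus2.db, keys = featureNames(mydata.rma),
#               column = "SYMBOL", keytype = "PROBEID", multiVals = "first")
# fData(mydata.rma)$SYMBOL <- sym[featureNames(mydata.rma)]
```

Because the symbols are indexed by probe ID rather than by position, subsetting the eset afterwards keeps each row paired with its own label.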