eset annotation issues, plus generating a heatmap with the correct gene symbols as row labels
colonppg
@colonppg-7771
United States

Folks:

I need some help here. Why is it so error-prone and hard to deal with an eset?

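# read in .cel files, normalize them, then try to attach annotation
library(affy)            # ReadAffy(), rma(); also loads Biobase
library(hgu133plus2.db)  # probe set annotation; also loads AnnotationDbi for select()

mydata <- ReadAffy()

mydata.rma <- rma(mydata)

allprobe <- row.names(exprs(mydata.rma))

all.gs <- select(hgu133plus2.db, keys = allprobe, columns = c("SYMBOL"), keytype = "PROBEID")

featureData(mydata.rma) <- new("AnnotatedDataFrame", data = all.gs)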

however--------------------

> length(allprobe)
[1] 54675
> dim(all.gs)
[1] 58608     2

 

Is there a way to attach the gene annotation information to the whole eset?

If I do this on a small scale:

mygene.probe <- read.table(file = "myprobesets.txt", blabla)

# assumes the probe set IDs are in the first column of myprobesets.txt
gs <- select(hgu133plus2.db, keys = as.character(mygene.probe[[1]]),
             columns = c("SYMBOL", "ENTREZID"), keytype = "PROBEID")

subset.for.heatmap <- mydata.rma[featureNames(mydata.rma) %in% gs$PROBEID, ]

featureData(subset.for.heatmap) <- new("AnnotatedDataFrame", data = gs)

library(gplots)  # heatmap.2(), colorpanel()
heatmap.2(exprs(subset.for.heatmap), scale = "row", trace = "none",
          col = colorpanel(100, "green", "white", "red"),
          labCol = pData(subset.for.heatmap)$treatment_time,
          ColSideColors = pData(subset.for.heatmap)$color,
          labRow = fData(subset.for.heatmap)[[2]])

 

Then the probe set IDs and gene symbols on the heatmap are all messed up; they do not match. I guess that, just as when attaching pData, I need to check that they match before doing the attachment.

Apparently there is no error checking, and a lot of mistakes could result from this.
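A check of the kind described above can be made explicit before any annotation is attached. The sketch below uses base R only; the small matrix and table are made up and stand in for exprs(mydata.rma) and the select() output, and the helper name check_annotation() is hypothetical.

```r
# Made-up stand-ins: an expression matrix whose row names are probe set IDs,
# and an annotation table shaped like select() output, where "p1" maps twice.
expr <- matrix(c(5.1, 7.3, 6.2, 4.8, 8.0, 6.9), nrow = 3,
               dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
annot <- data.frame(PROBEID = c("p2", "p1", "p1", "p3"),
                    SYMBOL  = c("GENE_B", "GENE_A", "GENE_X", "GENE_D"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: list the reasons the table cannot be attached
# row-for-row to the matrix (an empty result means it lines up).
check_annotation <- function(expr, annot, id_col = "PROBEID") {
  ids <- annot[[id_col]]
  problems <- character(0)
  if (nrow(annot) != nrow(expr))
    problems <- c(problems, "row counts differ")
  if (anyDuplicated(ids))
    problems <- c(problems, "some probe IDs appear more than once")
  if (!identical(ids, rownames(expr)))
    problems <- c(problems, "probe IDs are not in the same order as the matrix rows")
  problems
}

check_annotation(expr, annot)
```

Calling such a check right before featureData(...) <- ... turns a silent label mismatch into something visible.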

 

Does anyone have an easy and elegant way to solve this, or can you point me to a good resource?

Thanks

@james-w-macdonald-5106
United States

The issue here is that some of the probesets measure more than one thing, i.e. they map to more than one gene. Because of these one-to-many mappings, select() returns more rows (58608) than there are probesets (54675).
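To see where the extra rows come from, the sketch below mimics a select() lookup with base R's merge() on a made-up mapping table (the probe IDs p1 to p3 and the gene names are hypothetical). On the real data, table(all.gs$PROBEID) would similarly show which probesets appear more than once.

```r
# Hypothetical mapping: probe "p2" maps to two genes.
probes  <- c("p1", "p2", "p3")
mapping <- data.frame(PROBEID = c("p1", "p2", "p2", "p3"),
                      SYMBOL  = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
                      stringsAsFactors = FALSE)

# A select()-style lookup returns one row per (probe, symbol) pair...
looked_up <- merge(data.frame(PROBEID = probes, stringsAsFactors = FALSE),
                   mapping, by = "PROBEID")
nrow(looked_up)   # 4 rows for 3 probes

# ...and counting the IDs shows which probes caused the expansion.
counts <- table(looked_up$PROBEID)
names(counts)[counts > 1]
```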

There are any number of ways to deal with this. One way is to use mapIds() instead of select(); its multiVals argument controls what happens with multiple-mapping probes. From the help page:

multiVals: What should mapIds do when there are multiple values that
          could be returned?  Options include:

          first: This value means that when there are multiple matches
              only the 1st thing that comes back will be returned. This
              is the default behavior

          list: This will just returns a list object to the end user

          filter: This will remove all elements that contain multiple
              matches and will therefore return a shorter vector than
              what came in whenever some of the keys match more than
              one value

          asNA: This will return an NA value whenever there are
              multiple matches

          CharacterList: This just returns a SimpleCharacterList object

          FUN: You can also supply a function to the multiVals
              argument for custom behaviors.  The function must take a
              single argument and return a single value.  This function
              will be applied to all the elements and will serve a
              'rule' that for which thing to keep when there is more
              than one element.  So for example this example function
              will always grab the last element in each result: 
              last <- function(x){x[[length(x)]]} 
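To make the options above concrete, the sketch below reproduces their effect in plain R on a small named list shaped like what mapIds(..., multiVals = "list") returns. It does not call mapIds() itself, and the probe and gene names are made up; with the real package the equivalent call would be along the lines of mapIds(hgu133plus2.db, keys = allprobe, column = "SYMBOL", keytype = "PROBEID", multiVals = "first").

```r
# Hypothetical probe -> symbols lookup; "p2" has two matches.
hits <- list(p1 = "GENE_A", p2 = c("GENE_B", "GENE_C"), p3 = "GENE_D")

# like "first": keep the first value for every probe
first <- vapply(hits, function(x) x[[1]], character(1))

# like "asNA": NA for any probe with more than one value
as_na <- vapply(hits, function(x) if (length(x) > 1) NA_character_ else x[[1]],
                character(1))

# like "filter": drop multi-mapping probes, so the result is shorter
filtered <- unlist(hits[lengths(hits) == 1])

# like a custom FUN: the help page's "last" function
last <- function(x) { x[[length(x)]] }
last_vals <- vapply(hits, last, character(1))

# another possible custom rule: keep every symbol in a single label
collapsed <- vapply(hits, function(x) paste(x, collapse = " /// "), character(1))
```

Whichever rule is chosen, every variant except "filter" returns exactly one value per input probe, in input order, which is what makes the result safe to use as row labels.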

How best to deal with multiple-mapping probes is an individual decision, which we leave up to the end user.


Thanks, James. I think I have found a workaround:

It seems this has to be done for individual genes.

mygene.probe <- read.table(file = "myprobesets.txt", blabla)

# assumes the probe set IDs are in the first column of myprobesets.txt
gs <- select(hgu133plus2.db, keys = as.character(mygene.probe[[1]]),
             columns = c("SYMBOL", "ENTREZID"), keytype = "PROBEID")

# this takes care of it, given that the probe IDs in the eset are sorted
# (it only lines up if each PROBEID occurs exactly once in gs)
gs <- gs[order(gs$PROBEID), ]

subset.for.heatmap <- mydata.rma[featureNames(mydata.rma) %in% gs$PROBEID, ]

featureData(subset.for.heatmap) <- new("AnnotatedDataFrame", data = gs)

library(gplots)  # heatmap.2(), colorpanel()
heatmap.2(exprs(subset.for.heatmap), scale = "row", trace = "none",
          col = colorpanel(100, "green", "white", "red"),
          labCol = pData(subset.for.heatmap)$treatment_time,
          ColSideColors = pData(subset.for.heatmap)$color,
          labRow = fData(subset.for.heatmap)[[2]])
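The sort-based workaround depends on gs having exactly one row per probe in the same order as the eset. A sketch of an alternative that aligns labels by name instead is below; it uses base R only, and the matrix and table are made up (in the thread's code they would be exprs(subset.for.heatmap) and gs).

```r
# Made-up expression subset (rows named by probe ID) and a select()-style
# table in which "p2" maps to two symbols.
expr_sub <- matrix(1:6, nrow = 3,
                   dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
gs <- data.frame(PROBEID = c("p3", "p2", "p1", "p2"),
                 SYMBOL  = c("GENE_D", "GENE_B", "GENE_A", "GENE_C"),
                 stringsAsFactors = FALSE)

# Sorting alone still leaves 4 annotation rows for 3 matrix rows.
gs_sorted <- gs[order(gs$PROBEID), ]
nrow(gs_sorted) == nrow(expr_sub)   # FALSE

# match() gives, for each matrix row, the position of the first matching
# PROBEID in gs, so there is exactly one label per row, in row order.
row_labels <- gs$SYMBOL[match(rownames(expr_sub), gs$PROBEID)]

# Fall back to the probe ID where no symbol was found.
row_labels <- ifelse(is.na(row_labels), rownames(expr_sub), row_labels)
row_labels
```

The resulting vector could be passed directly as labRow to heatmap.2(). Keeping the first match found in gs is one arbitrary choice among the multiple-mapping rules discussed in the answer.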

 

An alternative is to read the .csv version of the annotation into an object and attach it to the whole eset as fData.
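A minimal sketch of that approach, assuming a CSV with one row per probe set: the column names ProbeSetID and GeneSymbol, the inline CSV text, and the probe IDs are all hypothetical, so check the header of the real file. The key step is reordering the table with match() so its rows follow the eset's feature names exactly.

```r
# Hypothetical annotation CSV, inlined so the sketch is self-contained.
csv_text <- "ProbeSetID,GeneSymbol
p3,GENE_D
p1,GENE_A
p2,GENE_B /// GENE_C"
annot_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Probe IDs in eset row order (featureNames(mydata.rma) in the thread's code).
feature_ids <- c("p1", "p2", "p3", "p4")

# Reorder the table to follow feature_ids; a probe missing from the CSV
# ("p4" here) gets NA annotation.
idx <- match(feature_ids, annot_csv$ProbeSetID)
fdata <- annot_csv[idx, , drop = FALSE]
rownames(fdata) <- feature_ids
fdata$ProbeSetID <- feature_ids

# Row names now equal the feature names, in the same order, so with Biobase
# loaded this could be attached with, e.g.:
#   featureData(mydata.rma) <- AnnotatedDataFrame(fdata)
stopifnot(identical(rownames(fdata), feature_ids))
fdata
```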

 

Thanks again

