Proper use of annotation package


Kamila Naxerova ▴ 100

@kamila-naxerova-4164

Last seen 9.6 years ago

Hi all,

I have a little problem using an annotation package that I know how to work around, but I am wondering how to do it more elegantly and efficiently.

I am analyzing a bunch of Mouse Gene 2.0 ST arrays. I built my own annotation package (with much help from all of you!). I am using limma and want to look up annotation for the differentially expressed genes returned by topTable(). So it's really a standard situation. On the Bioconductor website, this sequence of commands is suggested (http://www.bioconductor.org/help/workflows/annotation-data/):

    tbl <- topTable(efit, coef=2)
    ids <- tbl[["ID"]]
    entrez <- hgu95av2ENTREZID[ids]

Looks beautiful! But when I try to do the same thing, I get:

    tbl <- topTable(fit2all, number=100)
    ids <- tbl[["ID"]]
    mogene20sttranscriptclusterACCNUM[ids]

    Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
      value for "17549282" not found

This error is evidently generated because some of the ids don't map to any accession numbers. I can work around this by filtering my ids first, but am I doing it wrong? Of course lots of probe ids on the array are not going to map to any accession numbers or symbols or names, so why can't they just come back with NA instead of an error message that aborts the whole process?

Thanks!
Kamila

Annotation probe • 1.0k views

11.1 years ago Kamila Naxerova ▴ 100

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

Hi Kamila,

You could use the select() method instead:

    select(mogene20sttranscriptcluster.db, ids, "ACCNUM")

which will return NA appropriately.

Best,
Jim

On Mar 18, 2013 2:48 PM, "Naxerova, Kamila" wrote:
> mogene20sttranscriptclusterACCNUM[ids]
>
> Error in .checkKeys(value, Lkeys(x), x@ifnotfound) :
>   value for "17549282" not found
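For what it's worth, here is a minimal sketch of how the select() result could be joined back onto the topTable() output. It assumes the custom mogene20sttranscriptcluster.db package is installed, that `fit2all` from the question exists in the session, and that the probe IDs live under the "PROBEID" keytype (the usual convention for chip annotation packages, but not confirmed in this thread). `attach_annotation()` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: add one annotation column to a topTable() result.
# select() can return several rows per probe ID, so only the first match
# per ID is used; IDs with no match get NA.
attach_annotation <- function(tbl, annot, column,
                              id_col = "ID", key_col = "PROBEID") {
  ids <- as.character(tbl[[id_col]])
  hit <- match(ids, as.character(annot[[key_col]]))  # first match, NA if absent
  tbl[[column]] <- annot[[column]][hit]
  tbl
}

# Usage with the objects from this thread (assumed to exist; skipped otherwise).
if (requireNamespace("mogene20sttranscriptcluster.db", quietly = TRUE) &&
    exists("fit2all")) {
  library(limma)
  library(mogene20sttranscriptcluster.db)

  tbl <- topTable(fit2all, number = 100)
  annot <- AnnotationDbi::select(mogene20sttranscriptcluster.db,
                                 keys = as.character(tbl[["ID"]]),
                                 columns = "ACCNUM",
                                 keytype = "PROBEID")
  tbl <- attach_annotation(tbl, annot, "ACCNUM")
}
```

In recent limma releases topTable() may return the probe IDs as row names rather than in an ID column; in that case add them as a column first, e.g. `tbl$ID <- rownames(tbl)`.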

11.1 years ago James W. MacDonald 65k

Kamila Naxerova ▴ 100

@kamila-naxerova-4164

Last seen 9.6 years ago

Thanks for your prompt reply, Jim!

Actually I am afraid there is some deeper lack of knowledge on my part here... Some of the ids that topTable() returns simply don't exist in the annotation package, period. They also don't exist when I search for them in the NetAffx Analysis Center. What are these? What is, e.g., "17549282"?

After reading my CEL files and normalizing them, my eset looks like this:

    > dim(eset)
    Features  Samples
       41345       12

The Affy transcript cluster file only has about 39400 entries. What are these 2000 ids that are not overlapping? And some of them clearly are differentially expressed...

Kamila
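One way to see exactly which IDs are affected is to compare the feature names of the eset against the keys of the annotation package. The sketch below assumes `eset` is the ExpressionSet from this post, that the custom mogene20sttranscriptcluster.db package is installed, and that its probe IDs are stored under the "PROBEID" keytype. `split_by_annotation()` is a hypothetical helper for illustration only.

```r
# Hypothetical helper: split feature IDs by whether the annotation knows them.
split_by_annotation <- function(feature_ids, annotated_ids) {
  feature_ids <- as.character(feature_ids)
  known <- feature_ids %in% as.character(annotated_ids)
  list(known = feature_ids[known], unknown = feature_ids[!known])
}

# Usage with the objects from this thread (assumed to exist; skipped otherwise).
if (requireNamespace("mogene20sttranscriptcluster.db", quietly = TRUE) &&
    exists("eset")) {
  library(Biobase)
  library(mogene20sttranscriptcluster.db)

  parts <- split_by_annotation(
    featureNames(eset),
    AnnotationDbi::keys(mogene20sttranscriptcluster.db, keytype = "PROBEID"))

  print(length(parts$unknown))  # number of IDs absent from the package
  print(head(parts$unknown))    # a few of them, for lookup elsewhere
}
```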

11.1 years ago Kamila Naxerova ▴ 100

Hi Kamila,

The probeset you note is found by NetAffx, and it is a 'rescue' probeset. This has a special use, and will not have an annotation associated with it.

I generally remove all non-main probe sets after the eBayes() step. You can use the getMainProbes() function in the affycoretools package to accomplish that task.

Best,
Jim
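A rough sketch of that filtering step, under several assumptions: `eset` was produced by oligo::rma() (so that annotation(eset) names a pd.* platform package, which getMainProbes() needs for the probeset types), and `fit2all` is the fit after eBayes(). Whether getMainProbes() accepts the fit object directly may depend on the affycoretools version, so this sketch takes the conservative route of getting the main probeset IDs from the ExpressionSet and subsetting the fit by row name. `keep_rows()` is a hypothetical helper, not part of affycoretools or limma.

```r
# Hypothetical helper: keep only the rows of `x` whose row names are in `ids`.
# Works for objects with rownames() and two-dimensional `[`, such as
# data frames and limma's MArrayLM fit objects.
keep_rows <- function(x, ids) {
  x[rownames(x) %in% as.character(ids), ]
}

# Usage with the objects from this thread (assumed to exist; skipped otherwise).
if (requireNamespace("affycoretools", quietly = TRUE) &&
    exists("eset") && exists("fit2all")) {
  # IDs of the 'main' probesets only (controls, rescue probesets etc. dropped)
  main_ids <- Biobase::featureNames(affycoretools::getMainProbes(eset))

  # Subset after eBayes(), so the moderated statistics still use all probesets
  fit_main <- keep_rows(fit2all, main_ids)
  tbl <- limma::topTable(fit_main, number = 100)
}
```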

11.1 years ago James W. MacDonald 65k
